[HN Gopher] AMD 3D Stacks SRAM Bumplessly
       ___________________________________________________________________
        
       AMD 3D Stacks SRAM Bumplessly
        
       Author : rbanffy
       Score  : 143 points
       Date   : 2021-06-08 11:10 UTC (1 day ago)
        
 (HTM) web link (fuse.wikichip.org)
 (TXT) w3m dump (fuse.wikichip.org)
        
       | 55873445216111 wrote:
       | Really impressive and a pleasantly surprising feature from AMD. I
       | imagine the real target market for this is Epyc server CPUs.
       | Eventually they might be able to cut the L3 cache out of the CCD
       | die completely and rely only on external SRAM die as L3. This
       | would give AMD a very flexible portfolio where they can offer
       | different SKUs with different amounts of L3 cache at different
       | price points, all using the exact same CCD die. What will be
       | interesting is to see how much the die stacking impacts thermals.
        
         | techrat wrote:
         | Also consider that cache is typically a substantial portion of
         | the die, that's some impressive cost savings to be able to etch
         | the wafer without having to worry about the greater area with
         | cache resulting in a potentially defective chip. Smaller chips,
         | more chips on a wafer, less loss overall with a defect.
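
A rough illustration of the die-area argument above: a minimal sketch using the simple Poisson yield model (yield = exp(-area * defect density)) and a common dies-per-wafer approximation. The die areas and defect density are made-up numbers, not AMD or TSMC figures, and the function names are hypothetical.

```python
import math


def dies_per_wafer(die_area_mm2: float, wafer_diameter_mm: float = 300.0) -> int:
    """Approximate gross dies per wafer: area ratio minus an edge-loss term."""
    radius = wafer_diameter_mm / 2
    area_term = math.pi * radius ** 2 / die_area_mm2
    edge_term = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return int(area_term - edge_term)


def poisson_yield(die_area_mm2: float, defects_per_mm2: float) -> float:
    """Fraction of dies with zero defects under a Poisson defect model."""
    return math.exp(-die_area_mm2 * defects_per_mm2)


def good_dies(die_area_mm2: float, defects_per_mm2: float) -> float:
    """Expected number of defect-free dies per wafer."""
    return dies_per_wafer(die_area_mm2) * poisson_yield(die_area_mm2, defects_per_mm2)


# Illustrative comparison: a die with its cache on board (160 mm^2, assumed)
# versus the same logic with the cache moved to a separate die (80 mm^2, assumed).
DEFECT_DENSITY = 0.001  # defects per mm^2, assumed
for area in (160.0, 80.0):
    print(area, dies_per_wafer(area), round(poisson_yield(area, DEFECT_DENSITY), 3))
```

Under this model, halving the die area more than doubles the expected number of good dies per wafer, because both the edge loss and the per-die defect probability shrink.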
        
         | oscardssmith wrote:
         | It wouldn't surprise me if they have a really expensive Epyc
         | chip with 8 cores and 1GB of cache. A chip like that would be
         | incredible for applications with per-core licensing.
        
           | tromp wrote:
           | 1GB on-chip SRAM is quite feasible at 7nm according to
           | several companies designing a single chip ASIC for Grin's
           | Cuckatoo32 Proof-of-Work, that needs 512MB of sequential
           | access memory and 512MB of pure random access memory for a
           | maximally efficient solver [1].
           | 
           | [1]https://forum.grin.mw/t/cuckatoo32-feasibility
        
         | gameswithgo wrote:
         | the first thing they did in the presentation was show a 12%
         | gaming FPS uplift.
         | 
         | market is anyone who wants faster stuff.
         | 
         | But yes, databases will love it, so will compilers.
        
           | [deleted]
        
       | gbrown_ wrote:
       | Previous discussion of Anandtech article on this
       | https://news.ycombinator.com/item?id=27350632
        
       | oblak wrote:
       | Since I am not exactly in the industry, it's always funny looking
       | at these diagrams. Actual cores are so tiny compared to various
       | caches, SIMD blocks and what have you
        
         | cogman10 wrote:
         | Yup, been like that for a while. The vast majority of
         | transistors in a CPU are dedicated to memory. Only a real tiny
         | fraction are dedicated to logic.
        
           | rbanffy wrote:
           | That's so true. In college I designed a discrete stack-based
           | CPU and by far the biggest chip count was in the microcode
           | EPROMs and SRAM for the register file. In those days RAM was
           | fast, so it didn't have a cache (which would be even more
           | SRAM).
        
           | lacksconfidence wrote:
           | Perhaps particularly interesting about this approach by AMD
           | is that the typical memory on a CPU die isn't as dense as it
           | can be, because of the processes they have to apply to the
           | wafer to also build the CPU transistors. With AMD moving to
           | separate chips they can use a process that builds denser
           | memory than what is typically seen on a cpu.
        
             | hinkley wrote:
             | I would think this helps with yield issues on your new
             | manufacturing node as well.
             | 
             | If you have a 33% yield on one new chiplet that doesn't
             | triple the price per unit for the package.
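
The arithmetic behind that claim can be sketched as follows. This assumes each chiplet is tested before assembly so only good dies are packaged, and it ignores assembly losses. The costs and yields are hypothetical placeholders.

```python
def cost_per_good_die(raw_cost: float, yield_fraction: float) -> float:
    """Silicon cost attributed to each working die."""
    return raw_cost / yield_fraction


def package_cost(chiplets):
    """Sum of per-good-die costs for a list of (raw_cost, yield) pairs."""
    return sum(cost_per_good_die(cost, y) for cost, y in chiplets)


# Three chiplets on a mature process (assumed 90% yield, $10 raw cost each).
mature = [(10.0, 0.9)] * 3

# Fourth chiplet: either equally mature, or on a new node at 33% yield.
baseline = package_cost(mature + [(10.0, 0.9)])
with_new_node = package_cost(mature + [(10.0, 0.33)])

ratio = with_new_node / baseline
print(f"package cost ratio: {ratio:.2f}")
```

The low-yield chiplet costs roughly three times as much per good die, but the package as a whole rises by much less because the other chiplets are unaffected.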
        
               | nine_k wrote:
               | Yields are normally kept high by having extra device
               | blocks, and achieving a working config by cutting some
               | links on the die with a laser.
               | 
               | Some chips get downgraded in the process: if you can't
               | sell a 4-core CPU with 8MB of L3 cache, you can disable
               | the core that fails tests, and / or disable the parts of
               | the cache that fail tests, and sell a 3-core part with
               | 4MB cache; AMD did just that back in the day.
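
The downgrading described above can be sketched as a lookup against a product table. The SKU names and the `bin_die` helper are hypothetical, modelled only on the 4-core/8MB and 3-core/4MB example in the comment. Real binning also considers frequency, power, and much more.

```python
from typing import Optional, Sequence

# Hypothetical SKU table, best part first: (name, cores required, cache MB required).
SKUS = [
    ("4-core / 8MB", 4, 8),
    ("3-core / 4MB", 3, 4),
]


def bin_die(core_passes: Sequence[bool], good_cache_mb: int) -> Optional[str]:
    """Return the best SKU a die qualifies for, or None if it must be scrapped."""
    good_cores = sum(core_passes)  # cores that passed test; failed ones get fused off
    for name, cores_needed, cache_needed in SKUS:
        if good_cores >= cores_needed and good_cache_mb >= cache_needed:
            return name
    return None


print(bin_die([True, True, True, True], 8))   # fully working die
print(bin_die([True, True, False, True], 8))  # one bad core: sold as the smaller part
```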
        
               | mschuster91 wrote:
               | > Some chips get downgraded in the process: if you can't
               | sell a 4-core CPU with 8MB of L3 cache, you can disable
               | the core that fails tests, and / or disable the parts of
               | the cache that fail tests, and sell a 3-core part with
               | 4MB cache; AMD did just that back in the day.
               | 
               | How does that work anyway? I mean, how is a processor
               | actually tested at the pre-packaging stage, given that
               | you'd need to provide it with power and cooling for a
               | test?
        
               | doikor wrote:
               | A camera takes photos and compares them with how it should
               | look.
        
               | cogman10 wrote:
               | Hard to really say if it will negatively or positively
               | impact yield.
               | 
               | You should be able to cram in more chips per wafer. But,
               | you might see more of those turn out to be duds due to
               | the more complex layers. This 3d stacking has an
               | amplifying effect to flaws in lower layers.
               | 
               | We'll see if Zen 4 or Zen 5 has 100MB caches... that'll
               | be the true test.
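
One way to picture the amplifying effect: if every die in a stack and every bond between dies must be good, the probabilities multiply. This is a simplified sketch that treats failures as independent. The yields are placeholder values and `stack_yield` is a hypothetical helper.

```python
from math import prod
from typing import Sequence


def stack_yield(die_yields: Sequence[float], bond_yield: float) -> float:
    """Probability that every die and every die-to-die bond in a stack is good."""
    bonds = len(die_yields) - 1  # one bonding step per additional layer
    return prod(die_yields) * bond_yield ** bonds


# Assumed 90% yield per die and 95% per bonding step.
for height in (1, 2, 3):
    print(height, round(stack_yield([0.9] * height, 0.95), 3))
```

Testing dies before bonding removes most of the per-die term in practice, which leaves the bonding step as the main added risk.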
        
               | derefr wrote:
               | The proper yield comparison for TSV wouldn't be against
               | the one-chip, less-stuff version, though. It'd be against
               | what you'd have to do to achieve the _same_ capacities
               | without TSV: a multiplication of the number of mask
               | layers per chip, to produce a single extremely "tall"
               | chiplet. That'd be an _extremely_ low-yield process
               | (which is why nobody's doing it).
        
       | Sephr wrote:
       | You can also stack SRAM bumplessly using the wireless ThruChip
       | Interface: https://en.wikichip.org/wiki/thruchip_interface
        
         | jl2718 wrote:
         | I do not understand on-chip wireless. Why build an antenna for
         | each channel? You could just multiplex high-impedance RF into
           | any common wire and it's the same thing.
        
           | monocasa wrote:
           | Because it works better. Two very close antennas and two
           | halves of a transformer are just a matter of tuning and
           | opinion.
        
             | kabdib wrote:
             | Is this affected by external magnetic fields, or other
             | interference that a hard-wired connection is immune to?
             | 
             | I'm not in the habit of waving magnets around near CPUs,
             | but I worry about susceptibility to EMI and transients.
        
               | lazide wrote:
               | The only difference between a handheld magnet, radio
               | waves, inductance, etc. is speed, precision, and
               | magnitude.
               | 
               | So in practice it is highly unlikely a handheld magnet is
               | going to do anything, for the same reason a cloud passing
               | overhead isn't going to break a rock. The difference in air
               | pressure is too low, the energy gradient too gradual, and
               | it's too far away anyway.
               | 
               | Take the same aggregate amount of energy into a tank of
               | compressed air, and use a jackhammer and that rock is
               | dust in no time.
               | 
               | Same physics principles; radically different impact in
               | practice.
               | 
               | Assuming the magnet you have isn't the electromagnetic
               | version of a tornado and you aren't waving it around at
               | several thousand RPM anyway.
        
         | rbanffy wrote:
         | I remember Sun working on wireless interconnects, but I think
         | it was horizontal, inter-package
        
         | Escapado wrote:
         | Interesting technology. Is everyone using it? And if not (I
         | recall TSV being used a lot) then why? The wiki entry paints a
         | very positive picture here.
        
           | jleahy wrote:
           | Large size and poor thermals, if I had to guess.
        
       | twotwotwo wrote:
       | The BIOS on an AMD reference platform refers to there being 1, 2,
       | or 4 "X3D stacks", suggesting that eventually you might be
       | talking about more cache than this:
       | https://twitter.com/aschilling/status/1399701274489151489
       | 
       | (Who knows when/if they get 2/4-high working and at what price:
       | the I/O die needs to be able to track tags proportionate to the
       | total cache size, the compute chiplet needs to support it, and
       | packaging/power/everything else needs to work out. Switch in BIOS
       | doesn't mean the rest is there.)
       | 
       | Other clever thing about this for them is they can keep focusing
       | on one base compute chiplet but turn it into a wider range of
       | SKUs, on top of how they already do that with core count etc.
       | Same chiplet can end up in a product with 32 or 96MB L3/CCD as
       | soon as the first two-high stack comes out and, obviously, more
       | if/when they get 2/4-high going.
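
A quick sketch of the capacities this would imply, assuming each stacked layer adds the 64MB difference between the 32MB and 96MB figures above. The 64-byte line size used for the tag count is also an assumption. Nothing here is a confirmed product configuration.

```python
BASE_L3_MB = 32                    # on-die L3 per CCD, from the comment
STACK_MB = 96 - BASE_L3_MB         # 64MB per stacked layer, inferred from 32 vs 96
LINE_BYTES = 64                    # assumed cache line size


def l3_per_ccd_mb(layers: int) -> int:
    """Total L3 per CCD with the given number of stacked SRAM layers."""
    return BASE_L3_MB + layers * STACK_MB


def cache_lines(total_mb: int) -> int:
    """Number of lines whose tags would need tracking at this capacity."""
    return total_mb * 1024 * 1024 // LINE_BYTES


capacities = {layers: l3_per_ccd_mb(layers) for layers in (0, 1, 2, 4)}
print(capacities)  # {0: 32, 1: 96, 2: 160, 4: 288}
print({layers: cache_lines(mb) for layers, mb in capacities.items()})
```

The line count grows in step with capacity, which is the tag-tracking cost the comment mentions.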
        
         | baybal2 wrote:
         | They already have 4 dies in 1 layer, just some of them being
         | dummy, or possibly dead dies.
        
           | touisteur wrote:
           | Or more thermally ideally placed. I had a 7371 with 16 cores
           | like the 7351 but a higher base frequency, 3.1GHz against
           | 2.4GHz (more €€€ too!). The thing was a beauty.
        
       ___________________________________________________________________
       (page generated 2021-06-09 23:00 UTC)