[HN Gopher] AMD 3D Stacks SRAM Bumplessly
___________________________________________________________________
AMD 3D Stacks SRAM Bumplessly
Author : rbanffy
Score : 143 points
Date : 2021-06-08 11:10 UTC (1 days ago)
(HTM) web link (fuse.wikichip.org)
(TXT) w3m dump (fuse.wikichip.org)
| 55873445216111 wrote:
| Really impressive and a pleasantly surprising feature from AMD. I
| imagine the real target market for this is EPYC server CPUs.
| Eventually they might be able to cut the L3 cache out of the CCD
| die completely and rely only on external SRAM die as L3. This
| would give AMD a very flexible portfolio where they can offer
| different SKUs with different amounts of L3 cache at different
| price points, all using the exact same CCD die. What will be
| interesting is to see how much the die stacking impacts thermals.
| techrat wrote:
| Also consider that cache is typically a substantial portion of
| the die. Being able to etch the wafer without that extra cache
| area, where a defect could ruin the whole chip, is an impressive
| cost saving. Smaller chips mean more chips per wafer and less
| loss overall from each defect.
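| A rough way to see the die-size effect is the simple Poisson
| yield model, Y = exp(-A * D0), combined with the usual
| dies-per-wafer approximation. The defect density and both die
| areas below are hypothetical round numbers, not AMD figures:

```python
import math

WAFER_DIAMETER_MM = 300.0  # standard 300 mm wafer
D0_PER_CM2 = 0.1           # hypothetical killer-defect density, defects per cm^2


def dies_per_wafer(area_mm2: float, diameter_mm: float = WAFER_DIAMETER_MM) -> int:
    """Common gross-die approximation: wafer area / die area, minus edge loss."""
    d = diameter_mm
    gross = math.pi * d**2 / (4 * area_mm2) - math.pi * d / math.sqrt(2 * area_mm2)
    return int(gross)


def poisson_yield(area_mm2: float, d0_per_cm2: float = D0_PER_CM2) -> float:
    """Poisson model: probability that a die contains zero killer defects."""
    return math.exp(-(area_mm2 / 100.0) * d0_per_cm2)


def good_dies(area_mm2: float) -> float:
    return dies_per_wafer(area_mm2) * poisson_yield(area_mm2)


# Hypothetical areas: a die that carries its cache vs. one with the cache moved off-die.
for label, area in [("cache on-die, 120 mm^2", 120.0), ("cache off-die, 80 mm^2", 80.0)]:
    print(f"{label}: {dies_per_wafer(area)} dies/wafer, "
          f"yield {poisson_yield(area):.1%}, ~{good_dies(area):.0f} good dies")
```

| Shrinking the die helps twice: there are more candidates per
| wafer, and each one is less likely to contain a killer defect.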
| oscardssmith wrote:
| It wouldn't surprise me if they have a really expensive EPYC
| chip with 8 cores and 1GB of cache. A chip like that would be
| incredible for applications with per-core licensing.
| tromp wrote:
| 1GB on-chip SRAM is quite feasible at 7nm according to
| several companies designing a single chip ASIC for Grin's
| Cuckatoo32 Proof-of-Work, that needs 512MB of sequential
| access memory and 512MB of pure random access memory for a
| maximally efficient solver [1].
|
| [1]https://forum.grin.mw/t/cuckatoo32-feasibility
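| The 512MB figures line up with simple bitmap arithmetic,
| assuming (this is not stated in the thread) that each structure
| holds one bit for each of 2^32 elements:

```python
# Assumption: both 512MB structures are one-bit-per-element bitmaps over
# 2**32 elements (the "32" in Cuckatoo32 read as log2 of the edge count).
LOG2_ELEMENTS = 32
MIB = 1 << 20


def bitmap_bytes(log2_elements: int) -> int:
    """Bytes for a bitmap holding one bit per element."""
    return (1 << log2_elements) // 8


sequential = bitmap_bytes(LOG2_ELEMENTS)
random_access = bitmap_bytes(LOG2_ELEMENTS)
total = sequential + random_access
print(sequential // MIB, random_access // MIB, total // MIB)  # 512 512 1024
```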
| gameswithgo wrote:
| The first thing they did in the presentation was show a 12%
| gaming FPS uplift.
|
| The market is anyone who wants faster stuff.
|
| But yes, databases will love it, so will compilers.
| [deleted]
| gbrown_ wrote:
| Previous discussion of Anandtech article on this
| https://news.ycombinator.com/item?id=27350632
| oblak wrote:
| Since I am not exactly in the industry, it's always funny looking
| at these diagrams. Actual cores are so tiny compared to the various
| caches, SIMD blocks and what have you.
| cogman10 wrote:
| Yup, been like that for a while. The vast majority of
| transistors in a CPU are dedicated to memory. Only a real tiny
| fraction are dedicated to logic.
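| For a sense of scale, here is a back-of-the-envelope count of
| the transistors in SRAM bit cells alone, assuming classic
| 6-transistor cells and ignoring tags, ECC and peripheral
| circuitry (so real totals would be higher):

```python
TRANSISTORS_PER_BIT = 6  # classic 6T SRAM cell (assumption; ignores tags, ECC, decoders, sense amps)


def sram_cell_transistors(capacity_mib: int) -> int:
    """Transistors in the bit cells of an SRAM array of the given size."""
    bits = capacity_mib * (1 << 20) * 8
    return bits * TRANSISTORS_PER_BIT


for name, mib in [("32 MiB on-die L3", 32), ("64 MiB stacked SRAM die", 64)]:
    print(f"{name}: {sram_cell_transistors(mib) / 1e9:.2f} billion transistors in bit cells")
```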
| rbanffy wrote:
| That's so true. In college I designed a discrete stack-based
| CPU and by far the biggest chip count was in the microcode
| EPROMs and SRAM for the register file. In those days RAM was
| fast, so the CPU didn't have a cache (which would have been
| even more SRAM).
| lacksconfidence wrote:
| What's perhaps most interesting about AMD's approach is that
| the memory on a typical CPU die isn't as dense as it could be,
| because the wafer also has to go through the process steps
| that build the CPU's logic transistors. With the cache on a
| separate chip, AMD can use a process that builds denser
| memory than what is typically seen on a CPU.
| hinkley wrote:
| I would think this helps with yield issues on your new
| manufacturing node as well.
|
| If one new chiplet has a 33% yield, that doesn't triple the
| per-unit price of the whole package.
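| A toy cost model of that argument, assuming dies are tested
| before assembly (known-good-die), so a bad chiplet is discarded
| on its own instead of taking the whole package with it. All
| costs and yields here are hypothetical, and bonding/assembly
| losses are ignored:

```python
from dataclasses import dataclass


@dataclass
class Die:
    name: str
    raw_cost: float  # hypothetical cost to fabricate one die (arbitrary units)
    yield_: float    # fraction of dies that pass wafer-level test

    def cost_per_good_die(self) -> float:
        # Known-good-die assumption: failed dies are discarded before packaging,
        # so each good die absorbs the cost of the failed ones from its own wafer.
        return self.raw_cost / self.yield_


def package_cost(dies: list[Die], assembly_cost: float) -> float:
    return sum(d.cost_per_good_die() for d in dies) + assembly_cost


ASSEMBLY = 10.0                      # hypothetical packaging + final-test cost
io_die = Die("I/O die", 15.0, 0.90)  # hypothetical mature-node die

mature = package_cost([Die("compute chiplet", 20.0, 0.90), io_die], ASSEMBLY)
new_node = package_cost([Die("compute chiplet", 20.0, 0.33), io_die], ASSEMBLY)
print(f"mature: {mature:.2f}, new node: {new_node:.2f}, ratio: {new_node / mature:.2f}x")
```

| The compute chiplet's cost per good die rises by 0.90 / 0.33,
| about 2.7x, but the package cost rises by much less because the
| other parts of the package are unaffected.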
| nine_k wrote:
| Yields are normally kept high by having extra device
| blocks, and achieving a working config by cutting some
| links on the die with a laser.
|
| Some chips get downgraded in the process: if you can't
| sell a 4-core CPU with 8MB of L3 cache, you can disable
| the core that fails tests, and / or disable the parts of
| the cache that fail tests, and sell a 3-core part with
| 4MB cache; AMD did just that back in the day.
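| The harvesting decision itself is essentially a lookup from
| test results to the best sellable configuration. A minimal
| sketch with a hypothetical two-SKU ladder (bin_die and SKUS are
| made-up names, not any vendor's tooling):

```python
from typing import Optional

# Hypothetical SKU ladder: (SKU name, good cores required, good cache slices required).
# Assumes a 4-core die whose 8MB of cache is split into four 2MB slices.
SKUS = [
    ("4-core / 8MB", 4, 4),
    ("3-core / 4MB", 3, 2),
]


def bin_die(core_ok: list[bool], slice_ok: list[bool]) -> Optional[str]:
    """Return the best SKU this die can be sold as, or None if it is scrap.

    Failing units would be fused off; only the count of passing units matters here.
    """
    good_cores = sum(core_ok)
    good_slices = sum(slice_ok)
    for name, cores, slices in SKUS:
        if good_cores >= cores and good_slices >= slices:
            return name
    return None


print(bin_die([True, True, True, True], [True] * 4))
print(bin_die([True, False, True, True], [True, True, False, True]))
print(bin_die([True, False, False, True], [True] * 4))
```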
| mschuster91 wrote:
| > Some chips get downgraded in the process: if you can't
| sell a 4-core CPU with 8MB of L3 cache, you can disable
| the core that fails tests, and / or disable the parts of
| the cache that fail tests, and sell a 3-core part with
| 4MB cache; AMD did just that back in the day.
|
| How does that work anyway? I mean, how is a processor
| actually tested at the pre-packaging stage, given that
| you'd need to provide it with power and cooling for a
| test?
| doikor wrote:
| A camera takes photos and compares them with what the die
| should look like.
| cogman10 wrote:
| Hard to really say if it will negatively or positively
| impact yield.
|
| You should be able to cram in more chips per wafer. But
| you might see more of those turn out to be duds due to
| the more complex layers. This 3D stacking amplifies the
| effect of flaws in the lower layers.
|
| We'll see if Zen 4 or Zen 5 has 100MB caches... that'll
| be the true test.
| derefr wrote:
| The proper yield comparison for TSV wouldn't be against
| the one-chip, less-stuff version, though. It'd be against
| what you'd have to do to achieve the _same_ capacities
| without TSV: a multiplication of the number of mask
| layers per chip, to produce a single extremely "tall"
| chiplet. That'd be an _extremely_ low-yield process
| (which is why nobody's doing it.)
| Sephr wrote:
| You can also stack SRAM bumplessly using the wireless ThruChip
| Interface: https://en.wikichip.org/wiki/thruchip_interface
| jl2718 wrote:
| I do not understand on-chip wireless. Why build an antenna for
| each channel? You could just multiplex high-impedance RF onto
| any common wire, and it's the same thing.
| monocasa wrote:
| Because it works better. Two very close antennas and two
| halves of a transformer are just a matter of tuning and
| opinion.
| kabdib wrote:
| Is this affected by external magnetic fields, or other
| interference that a hard-wired connection is immune to?
|
| I'm not in the habit of waving magnets around near CPUs,
| but I worry about susceptibility to EMI and transients.
| lazide wrote:
| The only difference between a handheld magnet, radio
| waves, inductance, etc. is speed, precision, and
| magnitude.
|
| So in practice it is highly unlikely a handheld magnet is
| going to do anything, for the same reason a cloud passing
| overhead isn't going to break a rock. The difference in air
| pressure is too low, the energy gradient too gradual, and
| it's too far away anyway.
|
| Put the same aggregate amount of energy into a tank of
| compressed air, use it to drive a jackhammer, and that rock
| is dust in no time.
|
| Same physics principles; radically different impact in
| practice.
|
| Assuming the magnet you have isn't the electromagnetic
| version of a tornado and you aren't waving it around at
| several thousand RPM anyway.
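| A Faraday's-law estimate (EMF = N * A * dB/dt) shows why speed
| dominates. Every number below is an illustrative assumption,
| not a ThruChip specification, and treating the receiver as
| seeing the transmitter loop's centre field overstates the real
| coupling:

```python
import math

MU_0 = 4 * math.pi * 1e-7  # vacuum permeability, T*m/A


def loop_area(radius_m: float) -> float:
    return math.pi * radius_m**2


def induced_emf(area_m2: float, delta_b_tesla: float, delta_t_s: float, turns: int = 1) -> float:
    """Faraday's law, magnitude only: EMF = N * A * dB/dt (uniform field assumed)."""
    return turns * area_m2 * delta_b_tesla / delta_t_s


# All values below are illustrative assumptions.
RADIUS = 50e-6             # 50 um receiver coil
area = loop_area(RADIUS)

# Hand-waved magnet: field at the chip changes by 0.1 T over 0.1 s.
hand = induced_emf(area, 0.1, 0.1)

# On-chip transmitter: 10 mA in a same-sized loop, switched in 100 ps.
b_signal = MU_0 * 10e-3 / (2 * RADIUS)  # field at the centre of a circular current loop
signal = induced_emf(area, b_signal, 100e-12)

print(f"hand magnet: {hand * 1e9:.1f} nV, on-chip signal: {signal * 1e3:.1f} mV, "
      f"ratio ~{signal / hand:.0e}")
```

| Under these assumptions the magnet's field is about 800 times
| stronger, but it changes about a billion times more slowly, so
| the on-chip signal still wins by roughly a factor of a million.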
| rbanffy wrote:
| I remember Sun working on wireless interconnects, but I think
| it was horizontal, inter-package.
| Escapado wrote:
| Interesting technology. Is everyone using it? And if not (I
| recall TSVs being used a lot), why not? The wiki entry paints
| a very positive picture here.
| jleahy wrote:
| Large size and poor thermals, if I had to guess.
| twotwotwo wrote:
| The BIOS on an AMD reference platform refers to there being 1, 2,
| or 4 "X3D stacks", suggesting that eventually you might be
| talking about more cache than this:
| https://twitter.com/aschilling/status/1399701274489151489
|
| (Who knows when/if they get 2/4-high working and at what price:
| the I/O die needs to be able to track tags proportional to the
| total cache size, the compute chiplet needs to support it, and
| packaging/power/everything else needs to work out. Switch in BIOS
| doesn't mean the rest is there.)
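| To illustrate why tag tracking grows with capacity, here is
| generic set-associative cache arithmetic. It is not AMD's
| actual L3 organization: the address width, line size, and the
| choice to add capacity as extra ways are all assumptions:

```python
# Illustrative assumptions, not AMD's actual design:
PHYS_ADDR_BITS = 48  # assumed physical address width
LINE_BYTES = 64      # assumed cache line size
SETS = 1 << 15       # assumed fixed set count; extra capacity is added as extra ways

OFFSET_BITS = (LINE_BYTES - 1).bit_length()  # bits selecting a byte within a line
INDEX_BITS = (SETS - 1).bit_length()         # bits selecting a set
TAG_BITS = PHYS_ADDR_BITS - INDEX_BITS - OFFSET_BITS


def lines_in(capacity_mib: int) -> int:
    return capacity_mib * (1 << 20) // LINE_BYTES


def tag_storage_bits(capacity_mib: int) -> int:
    """Total tag array size, ignoring valid/dirty/coherence state bits."""
    return lines_in(capacity_mib) * TAG_BITS


for mib in (32, 96):
    ways = lines_in(mib) // SETS
    print(f"{mib} MiB: {ways}-way, {TAG_BITS}-bit tags, "
          f"{tag_storage_bits(mib) / 8 / (1 << 20):.2f} MiB of tag storage")
```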
|
| The other clever thing about this for them is they can keep
| focusing on one base compute chiplet but turn it into a wider
| range of SKUs, on top of how they already do that with core
| count etc. The same chiplet can end up in a product with 32 or
| 96MB L3/CCD as soon as the first stacked part (one SRAM die on
| top of the CCD) comes out and, obviously, more if/when they
| get 2/4-high going.
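| The capacity arithmetic, taking the 32MB base and 64MB per
| stacked SRAM die from the 32/96MB figures; the 2- and 4-layer
| rows and the 8-CCD package are hypothetical:

```python
BASE_L3_MB = 32       # on-die L3 per CCD
LAYER_MB = 64         # per stacked SRAM die (96MB = 32MB + 64MB)
CCDS_PER_PACKAGE = 8  # assumption: an 8-CCD EPYC-style package


def l3_per_ccd(layers: int) -> int:
    """L3 per CCD with the given number of stacked SRAM layers."""
    return BASE_L3_MB + LAYER_MB * layers


for layers in (0, 1, 2, 4):
    per_ccd = l3_per_ccd(layers)
    print(f"{layers} layer(s): {per_ccd} MB/CCD, {per_ccd * CCDS_PER_PACKAGE} MB/package")
```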
| baybal2 wrote:
| They already have 4 dies in 1 layer, just some of them being
| dummy, or possibly dead dies.
| touisteur wrote:
| Or placed in more thermally ideal positions. I had a 7371, which
| has 16 cores like the 7351 but a higher base frequency: 3.1GHz
| against 2.4GHz (more EUREUREUR too!). The thing was a beauty.
___________________________________________________________________
(page generated 2021-06-09 23:00 UTC)