[HN Gopher] Cerebras' new monster AI chip adds 1.4T transistors
       ___________________________________________________________________
        
       Cerebras' new monster AI chip adds 1.4T transistors
        
       Author : Anon84
       Score  : 109 points
       Date   : 2021-04-22 19:37 UTC (3 hours ago)
        
 (HTM) web link (spectrum.ieee.org)
 (TXT) w3m dump (spectrum.ieee.org)
        
       | AlexCoventry wrote:
       | Imagine a beowulf cluster of these.
        
         | faichai wrote:
         | This made me chuckle!
        
       | xhrpost wrote:
        | How can the chip itself consume that kind of power? Or is the
        | 15 kW value for the entire unit? That's like 10 residential space
       | heaters all turned to max. I'm surprised that much heat could be
       | dissipated over such a small surface area. Does it use
       | refrigerant for cooling? If my math is correct, if you had a
       | 6500BTU window air conditioner, you'd need 8 of them to move the
       | heat from this chip.
        
         | philjohn wrote:
          | Well, a single 7nm CPU (AMD Ryzen) can pull around 95 watts
          | at peak (granted, the IO die accounts for a fair bit of that,
          | since it's on an older process), but if you extrapolate that
          | out across a giant wafer, 15 kW is "only" about 157 Ryzen 7s.
        
         | YetAnotherNick wrote:
          | It would melt if you just used air cooling. I did some math
          | for liquid nitrogen, whose heat of vaporization is about
          | 200 kJ/kg: it would boil off 3600*15/200 = 270 kg of nitrogen
          | every hour. Just insane.
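          | 
          | A rough sketch of that arithmetic in Python (assuming a
          | steady 15 kW draw and ~200 kJ/kg heat of vaporization for
          | liquid nitrogen):
          | 
          |     power_kw = 15           # assumed steady power draw, kW
          |     h_vap_kj_per_kg = 200   # approx. latent heat of LN2
          |     kj_per_hour = power_kw * 3600         # 15 kJ/s * 3600 s
          |     print(kj_per_hour / h_vap_kj_per_kg)  # -> 270.0 kg/hour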
        
       | terafo wrote:
        | Is there any information on how this thing performs on actual AI
        | workloads? It is priced similarly to a dozen DGX A100s, but is
        | it faster on something like training big transformer models
        | (such as CLIP or GPT-3)?
        
       | 01100011 wrote:
       | This thing is awesome and, as someone working for a competitor,
       | kind of scary. I applaud their approach though. I think we're a
       | couple years off from it, but we'll probably see wider adoption
       | of larger silicon, with more specialized functional units, which
       | are used with a lower duty cycle to manage heat. If nothing else,
       | they're probably developing some good IP and techniques to handle
       | other sorts of ultra-mega-insane-scale-integration.
       | 
       | I wonder what their software stack looks like. Can they support
       | the sort of virtualization and sharing you'd want to keep this
       | expensive beast fully utilized 24/7?
        
         | ohazi wrote:
         | LSI, VLSI, UMISI
         | 
         | I like it!
         | 
          | In previous articles they've gone into some detail about how
          | they deal with reticle limits, jumping over the scribe line
          | area, and other such stuff. Between that, chiplets, HBM-style
          | die stacks, etc... the developments here have been more
          | interesting than I expected.
        
       | KETpXDDzR wrote:
        | I think I once saw one of the founders with a wafer at an In-N-
        | Out with a potential investor. Looking at what Apple achieved
        | with their M1 and the demand for "AI" - or training neural
        | networks, which is what it really is - they have a lot of
        | potential. At least as long as the AI bubble doesn't burst.
        
         | arisAlexis wrote:
          | How can the future burst? It's like saying medicine will
          | burst, or physics.
        
           | foobiekr wrote:
           | Personalized medicine, as an example, absolutely burst.
        
           | mattkrause wrote:
           | Same way as the last time: rampant over-promising and under-
           | delivering, maybe catalyzed by some high profile mishaps.
           | 
            | A bubble popping isn't always a verdict on the objective
            | quality of something; it can just be about its assumed
            | value _relative to other plausible options._ Homes mostly
            | don't become uninhabitable when a real estate bubble pops.
        
           | lainga wrote:
           | I don't know, expert systems haven't been so hot lately...
        
             | arisAlexis wrote:
              | It's very surprising what people with downvote power do,
              | but it's *** to downvote anyone that disagrees. Anyway,
              | expert systems were a precursor to the technology, and
              | they got replaced much like new physics replaces old
              | physics.
        
               | ASalazarMX wrote:
               | Disagreeing doesn't mean you're right. AI is not
               | equivalent to the future, and it will burst if it stalls
               | and another AI winter cools the current hype cycle.
        
               | jefft255 wrote:
               | It's because you made a false analogy. AI isn't literally
               | "the future". Billions of $ are being invested in deep-
               | learning focused AI right now (which you call "the
               | future"), and yes it could be a bubble and it could
               | burst. You can disagree, but it's still a sensible thing
               | to predict.
        
               | arisAlexis wrote:
               | By bursting you mean humanity will never create
               | artificial intelligence? Or you mean that there will be a
               | cool off period as for example what happened with quantum
               | physics at some point? Because it sure looks to me that
               | there is no future without AI regardless of cool off
               | periods. That makes my statement true. If you think
               | humanity will never progress from where we are now then
               | we pretty much are on very opposite schools of thought.
        
       | etaioinshrdlu wrote:
       | How much memory is on the chip, and what kind is it?
       | 
       | Under what circumstances does the chip need to access external
       | memory?
       | 
       | What type of communication interfaces does this chip have?
       | 
       | Also, if the chip is the size of a wafer, is it appropriate to
       | call it a Chip?
        
         | verdverm wrote:
         | Tom's Hardware has some nice tables comparing the specs:
         | https://www.tomshardware.com/news/cerebras-wafer-scale-engin...
         | 
         | (more than the IEEE)
         | 
         | This thing pulls 15-20kW of juice!
        
           | meepmorp wrote:
           | > This thing pulls 15-20kW of juice!
           | 
           | If you look at the wafer she's holding at the top, it's
           | seemingly segmented into a 12x7 grid of roughly chip-sized
           | rectangles. That's 84 "CPUs" at 200-240 watts each, which is
           | pretty well in line with discrete server CPUs.
           | 
           | The amount of heat coming off this thing must be amazing,
           | though.
        
           | teruakohatu wrote:
            | I did a double take when I realised that was kilowatts, not
            | watts. This chip uses more energy in an hour than the average
            | household (in my country at least) does in a day.
           | 
           | It may be a very large wafer but dissipating that heat is
           | still very impressive.
        
             | kllrnohj wrote:
             | It sounds like a lot but it almost isn't? Like this is ~50x
             | bigger than an Nvidia A100, and the A100 pulls up to 400w.
             | 50 * 400 ~= 20kW. So in terms of thermal density it's in-
             | line with existing GPUs.
             | 
             | That said, I'd be fascinated to see the cooling solution.
              | Is it just a _massive_ copper heatsink & a boatload of
              | airflow? Typical approaches of using heatpipes to expand
             | the heatsink won't really work with something this big
             | after all. Or is it a massive waterblock with multiple
             | inlets/outlets so it can hit up a stack of radiators? How
             | do they get even mounting pressure across that large of an
             | area?
        
               | mvanaltvorst wrote:
               | There's an image on their website[1], pretty huge water
               | pumps.
               | 
               | "To solve the 70-year-old problem of wafer-scale, we
               | needed not only to yield a big chip, but to invent new
               | mechanisms for powering, packaging, and cooling it.
               | 
               | The traditional method of powering a chip from its edges
               | creates too much dissipation at a large chip's center. To
               | prevent this, CS-2's innovative design delivers power
               | perpendicularly to each core.
               | 
               | To uniformly cool the entire wafer, pumps inside CS-2
               | move water across the back of the WSE-2, then into a heat
               | exchanger where the internal water is cooled by either
               | cold datacenter water or air."
               | 
               | [1]: https://cerebras.net/product/
        
         | typon wrote:
         | 40GB of SRAM. Not quite big enough to fit the big models like
         | GPT3
        
           | etaioinshrdlu wrote:
           | It seems like you typically want a balance of memory
           | size/bandwidth to compute ratio for typical deep learning
           | applications.
           | 
           | The 40GB of SRAM probably has tremendous bandwidth (it could
           | all be updated every few cycles!), but the memory size is
           | very small compared to the amount of compute available.
           | 
            | However, maybe a different way of looking at it is that this
            | chip will allow the training steps of deep learning models to
            | take a fraction of the time they would on a GPU. Perhaps what
            | takes 1s on a GPU could take 10ms on this chip.
           | 
           | So, this product may be effective at making training happen
           | very fast, but without substantial model size or efficiency
           | gains.
           | 
            | That's still groundbreaking -- you can't achieve this result
            | on GPUs. You can't achieve this result by any parallelization
           | or distributed training, either. The large batch sizes in
           | distributed training do not result in the same model or one
           | that generalizes as well.
        
           | claytonius wrote:
           | I don't think it's straightforward to do a head to head
           | comparison.
           | 
           | from: https://www.youtube.com/watch?v=yso2S2Svdlg
           | 
           | @ 25:14
           | 
           | James Wang: "If a model doesn't fit into a GPU's HBM, is it
           | smaller when it's laid out in the Cerebras way relative to
           | your 18 gigabytes?"
           | 
           | Andrew Feldman: "It is -- it's smaller in that we hold
           | different things in memory than they do. One can imagine a
           | model that has more parameters than we can hold -- one can
           | posit one, but remember our memory is doing different things.
           | Our memory is basically holding parameters. That's not what
           | their memory is doing. Their memory is holding the shape of
           | the model, their model is holding the results of the batches.
           | We use memory rather differently. We haven't found models
           | that we can't place and train on a chip. We expect them to
           | emerge, that's why we support clustering of chips and
            | systems, that's why we do that in what's called a "model
            | parallel" way, where if you put two chips together you get
           | twice the memory capacity. That's not what you get when you
           | put multiple GPUs together. When you put multiple GPUs
           | together you get two versions of the same amount of memory,
           | you actually don't get twice the memory. I see you smiling
           | here because you know that's a problem... ...With us if we
           | support 4 billion parameters and you add a second wafer scale
            | engine, now you support 8 billion parameters, and if you add
           | a third you can support 12 billion. That's not the way it
           | works with GPUs. With GPUs you just support two chips, each
           | with a few million - tens of millions of parameters."
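            | 
            | A toy sketch of the scaling distinction Feldman is drawing
            | (the 4-billion-parameters-per-wafer figure is just the number
            | used in the quote, not a measured spec):
            | 
            |     PARAMS_PER_WAFER = 4_000_000_000  # figure from the quote
            | 
            |     def cerebras_model_parallel_capacity(n_wafers):
            |         # claimed: each wafer holds a different slice of the
            |         # parameters, so capacity grows linearly
            |         return n_wafers * PARAMS_PER_WAFER
            | 
            |     def gpu_data_parallel_capacity(n_gpus, params_per_gpu):
            |         # plain data parallelism replicates the model, so
            |         # adding GPUs doesn't raise the maximum model size
            |         return params_per_gpu
            | 
            |     print(cerebras_model_parallel_capacity(3))  # 12 billion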
        
             | frogblast wrote:
             | Are there any good resources out there describing in
             | practice how existing training workloads are distributed
             | among GPUs? (using tensorflow, pytorch, or whatever else?).
             | 
             | I'm curious how the problem effectively gets sliced.
        
           | MuffinFlavored wrote:
           | SRAM (static RAM) vs DRAM (dynamic RAM) for anybody else
           | curious: https://computer.howstuffworks.com/question452.htm
        
         | IshKebab wrote:
         | It's 40 GB of SRAM. I doubt it supports external memory.
         | 
         | > Also, if the chip is the size of a wafer, is it appropriate
         | to call it a Chip?
         | 
         | Good question. I think it is. I mean the word "chip" isn't
         | really that well defined (is HBM one chip?), but given that
         | they sell it as a single unit and you can't really cut it in
         | half I think it's one chip.
        
           | NaturalPhallacy wrote:
           | Looking at the picture, the thing is a platter. Not a chip.
           | 
           | It's a really cool picture too.
        
       | fredfoobar wrote:
       | Time for an AI winter I guess.
        
         | Logon90 wrote:
          | Just from the headline you can see the irrelevance of this
          | chip. Who talks about transistor count as a proxy for
          | performance?
        
         | streetcat1 wrote:
          | You do realize that AI crossed human expert performance in
          | NLP / vision tasks?
        
           | semi-extrinsic wrote:
           | Exactly how does one outperform a human expert in natural
           | language processing?
        
             | zetazzed wrote:
             | See, if you were an AI, you would understand EXACTLY what
             | the poster means by this.
        
             | twic wrote:
             | Maybe the gibberish GPT-3 spits out is actually true, and
             | our puny monke brains are just too weak to understand it.
        
             | pulse7 wrote:
             | Parent probably meant that it outperformed a human expert
             | in some specific task in the area of natural language
             | processing - for example the task of converting a spoken
             | language into a written language...
        
             | PeterisP wrote:
              | They don't really outperform human experts on real tasks
              | yet (no matter what some SuperGLUE or other benchmark
              | shows); but in general, once a system can solve a
              | particular task well, it can plausibly outperform human
              | experts simply by not making random errors.
             | 
             | If we have multiple human experts annotate a NLP task and
             | measure inter-annotator agreement, it will be far from
             | 100%; part of that will be genuine disagreements or fuzzy
             | gray area, but part of the identified differences will be
             | simply obviously wrong answers given by the experts -
             | everyone makes mistakes. The same applies for many other
             | domains - business process automation, data entry, etc; no
             | employee will produce error-free output in a manual
             | process, no matter how simple and unambiguous the task is.
             | 
              | And for simpler tasks the computer can easily make fewer
              | mistakes than a human - especially if you measure human
              | reliability not over a few minutes of focus, but over a
              | whole tedious working day.
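              | 
              | As a concrete illustration of how that agreement gets
              | measured, a minimal Cohen's kappa calculation on made-up
              | labels (the data and the ~0.58 result are purely
              | illustrative):
              | 
              |     from collections import Counter
              | 
              |     def cohens_kappa(a, b):
              |         n = len(a)
              |         # observed agreement
              |         p_o = sum(x == y for x, y in zip(a, b)) / n
              |         # chance agreement from each label distribution
              |         ca, cb = Counter(a), Counter(b)
              |         p_e = sum(ca[k] * cb[k] for k in ca) / (n * n)
              |         return (p_o - p_e) / (1 - p_e)
              | 
              |     # two "experts" labelling ten items, disagreeing twice
              |     a = [1, 1, 0, 0, 1, 0, 1, 1, 0, 1]
              |     b = [1, 0, 0, 0, 1, 0, 1, 1, 1, 1]
              |     print(cohens_kappa(a, b))  # ~0.58, far from perfect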
        
           | firebaze wrote:
           | I'm not sure if you're assuming best intentions. If you do,
           | it'd be nice to provide sources - to my knowledge, vision
           | under non-optimal conditions (rain, snow, sunlight ahead) is
           | only partially solved by resorting to sensors resistant to
           | the disturbance.
           | 
           | I'd be glad to learn I'm wrong.
        
           | king_magic wrote:
           | No, it hasn't. Not even close.
           | 
           | AI has "crossed human expert performance" on _extremely
           | narrow_ NLP /CV tasks.
           | 
           | AI is still light years away from human-level performance.
        
           | anthk wrote:
          | GPT-3 is not even close to a minimal, coherent text adventure
          | made with Inform 6 by a novice, even one written by a
          | non-native English speaker.
          | 
          | Those networks couldn't match "Detective", a crappy story
          | written by a 12-year-old.
        
         | trhway wrote:
         | >Time for an AI winter
         | 
          | with AI chips burning 15 kW? No chance of a winter in sight.
         | Some chances for AI hell though.
        
       | zitterbewegung wrote:
       | Much better article from anandtech at
       | https://www.anandtech.com/show/16626/cerebras-unveils-wafer-...
        
         | XCSme wrote:
         | I laughed at the "arm+leg" official cost of the CPU.
         | 
         | EDIT: Or GPU, or whatever it is.
        
         | wyxuan wrote:
          | Here's a video made by the author, Ian Cutress, which goes
         | into more detail as well.
         | 
         | https://www.youtube.com/watch?v=FNd94_XaVlY
        
           | smithza wrote:
            | This is very cool. The explanation of how yield/defects are
            | handled was interesting: they can bypass defective cores by
            | routing around them and account for the statistical defect
            | rate, allowing them to claim a 100% yield.
        
       | sillysaurusx wrote:
       | I'm bearish on new hardware for AI training. The most important
       | thing is the software stack, and thus far everyone has failed to
       | support pytorch in a drop-in way.
       | 
       | The philosophy here seems to be "if we build it, they'll buy it."
       | But suppose you wanted to train a gpt model with this specialized
       | hardware. That means you're looking at two months of R&D minimum
       | to get everything rewritten, running, tested, trained, and with
       | an inferencing pipeline to generate samples.
       | 
       | And that's _just_ for gpt -- you lose all the other libraries
       | people have written. This matters more in GAN training, since for
       | example you can find someone else's FID implementation and drop
       | it in without too much hassle. But with this specialized chip,
       | you'd have to write it from scratch.
       | 
       | We had a similar situation in gamedev circa 2003-2009.
       | Practically every year there was a new GPU, which boasted similar
       | architectural improvements. But, for all its flaws, GL made these
       | improvements "drop-in" --- just opt in to the new extension, and
       | keep writing your gl code as you have been.
       | 
       | Ditto for direct3d, except they took the attitude of "limit to a
       | specific API, not arbitrary extensions." (Pixel shader 2.0 was an
       | awesome upgrade from 1.1.)
       | 
       | AI has no such standards, and it hurts. The M1 GPU in my new Air
       | is supposedly ready to do AI training. Imagine my surprise when I
       | loaded up tensorflow and saw that it doesn't support any GPU
       | devices whatsoever. They seem to transparently rewrite the cpu
       | ops to run on the gpu automatically, which isn't the expected
       | behavior.
       | 
       | So I dig into Apple's actual api for doing training, and holy
       | cow, that looks miserable to write in swift. I like how much
       | control it gives you over allocation patterns, but I can't
       | imagine trying to do serious work in it on a daily basis.
       | 
       | What we need is a unified API that can easily support multiple
       | backends -- something like "pytorch, but just enough pytorch to
       | trick everybody" since supporting the full api seems to be beyond
       | hardware vendors' capabilities at the moment. (Lookin' at you,
       | google. Love ya though.)
        
         | whimsicalism wrote:
         | I'm on board with you that there should be a "drop-in" cross
         | support of these chips, but pytorch is at a way higher
         | abstraction level than what should be commonly supported.
        
         | socialdemocrat wrote:
          | Maybe people will come to their senses and switch to Julia
          | instead of having to waste all this time on Python bindings.
        
           | habibur wrote:
            | Python here basically works as a binding layer over C, which
            | is what everything is actually written in.
        
         | zucker42 wrote:
         | > The philosophy here seems to be "if we build it, they'll buy
         | it."
         | 
         | Supposedly Cerebras is already profitable, so it's hardly a
         | situation where they are building something and hoping people
         | buy it eventually.
         | 
         | > That means you're looking at two months of R&D minimum to get
         | everything rewritten, running, tested, trained, and with an
         | inferencing pipeline to generate samples.
         | 
          | Again, based on the company's representations, Cerebras
          | transparently supports PyTorch and TensorFlow, only requiring
          | a few lines of changed code.
         | 
         | Source: https://www.anandtech.com/show/16626/cerebras-unveils-
         | wafer-... (Dr. Cutress's video on TechTechPotato is also good).
        
       | michelpp wrote:
       | This thing needs a GraphBLAS[1] implementation yesterday. 100
       | billion edge graphs and up are the new norm. This monster could
       | smoke the competition if the implementation was tuned right!
       | 
       | [1] http://graphblas.org
        
         | blueyes wrote:
         | Creating the ecosystem of both software and adjacent hardware
         | for wafers this size is the real challenge for a company like
         | Cerebras (which is doing amazing work). At first, they thought
         | they just needed to make a chip 56x the size of its
         | predecessor, and somehow get around the issue of defects and
         | yield. After they solved those problems (which blocked Gene
         | Amdahl, among others), they found they needed to bring an
         | entire ecosystem into being to work with their hardware.
        
           | michelpp wrote:
            | Agreed, that's why I think the GraphBLAS would be such a
            | great fit for this hardware. The ecosystem is growing pretty
            | fast. There are, for example, Python bindings: you could do
            | a sparse 'A @ B' over millions of elements in parallel on
            | this wafer-chip. MATLAB 2021a now has GraphBLAS built in, so
            | you could drive this thing directly from your notebooks.
            | 
            | I'm sure there's a compiler and low-level primitives to
            | really get the maximum performance out of it, but the trade-
            | off may be worth it in many cases to work through an
            | abstraction like the linear algebra approach.
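            | 
            | For flavor, the kind of pattern being described -- a sparse
            | matrix product driving a graph traversal -- sketched here
            | with scipy.sparse rather than an actual GraphBLAS binding:
            | 
            |     import numpy as np
            |     from scipy import sparse
            | 
            |     # random adjacency matrix standing in for a large graph
            |     n, nnz = 1_000_000, 5_000_000
            |     rng = np.random.default_rng(0)
            |     A = sparse.csr_matrix(
            |         (np.ones(nnz, dtype=np.float32),
            |          (rng.integers(0, n, nnz), rng.integers(0, n, nnz))),
            |         shape=(n, n))
            | 
            |     # one hop of a BFS-style frontier expansion: v @ A
            |     frontier = sparse.csr_matrix(
            |         (np.ones(10, dtype=np.float32),
            |          (np.zeros(10, dtype=int), rng.integers(0, n, 10))),
            |         shape=(1, n))
            |     print((frontier @ A).nnz)  # neighbours reached in one hop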
        
       | systemvoltage wrote:
        | The most interesting aspect of wafer-scale manufacturing is
        | yield. Even if we have 95% _chip_ yield, as chip size approaches
        | wafer-level dimensions the yield is going to plummet drastically
        | (I don't know the exact math off the top of my head). My guess is
        | that they're handling this in the chip logic, building in
        | resiliency by turning off cells in the wafer that didn't yield.
        | That raises the question: how are they probing them? A probe card
        | the size of the wafer is unheard of. How are they running
        | validation? Pretty mindblowing to say the least!
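        | 
        | A back-of-the-envelope with the simple Poisson yield model,
        | Y = exp(-A * D0), just to show how fast yield collapses at wafer
        | scale without redundancy (the defect density below is an
        | illustrative guess, not TSMC data):
        | 
        |     import math
        | 
        |     d0 = 0.1             # assumed defects per cm^2 (illustrative)
        |     die_area = 1.0       # a "normal" ~100 mm^2 die, in cm^2
        |     wafer_area = 462.25  # WSE-2 is ~46,225 mm^2
        | 
        |     print(math.exp(-die_area * d0))    # ~0.90 yield, small die
        |     print(math.exp(-wafer_area * d0))  # ~8e-21, effectively zero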
        
         | Zardoz84 wrote:
          | It's the old Wafer Scale Integration idea again, but now in a
          | successful product:
          | https://en.wikipedia.org/wiki/Wafer-scale_integration
        
         | pradn wrote:
         | The first Cerebras Wafer Scale Engine used "... breakthrough
         | techniques in cross-reticle patterning, but with the level of
         | redundancy built into the design, ensured a yield of 100%,
         | every time." I'm unsure what to think of this.
         | 
         | "When we spoke to Cerebras last year, the company stated that
         | they already had orders in the 'strong double digits'."
         | 
          | And they cost $2-2.5 million each!
         | 
         | https://www.anandtech.com/show/15838/cerebras-wafer-scale-en...
        
           | robocat wrote:
           | "the price has risen from ~$2-3 million to 'several'
           | million".
           | 
           | "The CEO Andrew Feldman tells me that as a company they are
           | already profitable, with dozens of customers already with
           | CS-1 deployed and a number more already trialling CS-2
           | remotely as they bring up the commercial systems".
           | 
           | Quotes from https://www.anandtech.com/show/16626/cerebras-
           | unveils-wafer-...
        
         | belval wrote:
         | Not an expert in chip manufacturing but my guess is that they
         | just disable the parts that don't work and their big numbers
         | represent ~80% of the actual number of transistors in the wafer
         | because they account for that manufacturing loss.
        
           | KETpXDDzR wrote:
            | Correct. You already see the same with modern CPUs and GPUs:
            | you just disable the defective parts. Obviously, that sounds
            | way easier than it is.
        
         | vkazanov wrote:
         | Disabling parts of the chip? The secret sauce then is to make
         | it defect-proof
        
           | gsnedders wrote:
           | Yup. This. From
           | https://www.anandtech.com/show/16626/cerebras-unveils-
           | wafer-...:
           | 
           | > Cerebras achieves 100% yield by designing a system in which
           | any manufacturing defect can be bypassed - initially Cerebras
           | had 1.5% extra cores to allow for defects, but we've since
           | been told this was way too much as TSMC's process is so
           | mature.
        
           | baybal2 wrote:
           | Yep, not all semiconductor defects can be repaired, and at
           | such scale repair circuits themselves will need to be
           | redundant.
        
         | [deleted]
        
         | bob1029 wrote:
         | They probably aren't bothering to. The extreme economics for
         | producing this type of chip are likely acceptable to the
         | stakeholders.
         | 
          | Also, there is no reason they can't have some redundancy
         | throughout the design so you can fuse off bad parts. It all
         | really depends on the nature of the anticipated vs actual
         | defects, which is an extraordinarily deep rabbit hole to climb
         | into.
        
           | Pet_Ant wrote:
            | How does this fusing work? I assume there are a bunch of
            | wires that are either hot or ground, and that determines
            | whether a part of the chip gets used?
        
             | mechagodzilla wrote:
             | It might actually use a traditional fuse block, where at
             | some point in the packaging/testing process, you literally
             | apply a sufficiently high voltage that you can permanently
             | 'set' some part of it (whether that's actually melting a
             | tiny wire, I'm not sure). But that's basically just
             | programming a ROM that gets read in at boot time, and sets
             | a bunch of logic on the chip to route around the bad parts.
             | You could just use an external EEPROM to track that info
             | too, and it would basically work the same.
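              | 
              | A hypothetical sketch of what "read the fuse ROM at boot
              | and route around the bad parts" can boil down to (the
              | bitmask layout and names here are invented for
              | illustration):
              | 
              |     # fuse/EEPROM contents as a bitmask, one bit per core:
              |     # 1 = core passed wafer test, 0 = core is fused off
              |     fuse_bits = 0b1101_1111_1011_1111  # 16 cores, 2 bad
              | 
              |     def build_core_map(bits, n_cores):
              |         # map logical core IDs onto surviving physical cores
              |         good = [i for i in range(n_cores) if bits >> i & 1]
              |         return dict(enumerate(good))
              | 
              |     core_map = build_core_map(fuse_bits, 16)
              |     print(len(core_map), "usable cores")  # 14 usable cores
              |     print(core_map[7])  # logical 7 -> physical 8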
        
             | PeterisP wrote:
             | I believe you design special fuses on the chip that can be
             | "blown" with a laser after testing and before putting the
             | silicon in the protective packaging.
        
               | p_j_w wrote:
               | Fuses aren't blown with a laser, it's purely electrical.
               | You apply a sufficiently high voltage to some port on the
               | part from a source that can drive a high enough current
               | and then tell the digital block of the chip which address
               | to fuse and if you want it high or low. Repeat for the
               | entire fuse bank and you're done.
               | 
               | https://en.wikipedia.org/wiki/Efuse
        
           | systemvoltage wrote:
           | I wonder how much this "chip" costs!
        
             | aryonoco wrote:
             | According to Anandtech, an arm + leg. Also known as several
             | million.
        
             | meepmorp wrote:
             | The previous generation was the $2-3 million range, and
             | these are now in the neighborhood of "several million". Or,
             | arm+leg.
        
             | LASR wrote:
             | Products like these are often sold as part of a contract to
             | deliver complete solutions + support + maintenance over
             | some number of years.
             | 
              | It's hard to estimate a per-unit cost, but suffice it to
              | say it would be priced similarly to other datacenter
              | compute solutions on a performance/$ level.
        
         | baybal2 wrote:
         | I bet they just probe the chip:
         | 
         | 1. piece by piece
         | 
         | 2. on die test circuits
        
         | seniorivn wrote:
          | They exclude/disable cores with issues, so as long as the
          | infrastructure parts of the chip are not affected, the chip is
          | functional.
        
       | zetazzed wrote:
       | I wonder what the dev story looks like for these? I know they say
       | "just use TF/Pytorch" but surely developers need to actually test
       | stuff out on silicon and run CI on code... do they offer a baby
       | version for testing?
        
       | noobydoobydoo wrote:
       | What does the programming model look like for one of these (like
       | at the assembly level)? I'm not even sure what to google.
        
       | cpr wrote:
       | Interesting--the placement of code-to-processor so that things
       | will be done roughly at the same time sounds a lot like the VLIW
       | compiler problem of scheduling execution units so things are
       | available at exactly the right time, without hardware interlocks.
        
       | bhewes wrote:
       | This thing reminds me of Daniel Hillis's "The Connection
       | Machine". Just 35 years later.
        
       | syntaxing wrote:
        | Super curious, how much does one of these cost? Like the 10K,
        | 100K, or 1M range?
        
       | lastrajput wrote:
       | Who said Moore's law is dead?
        
         | miohtama wrote:
         | It's not dead, but melting.
        
       | LASR wrote:
       | This is an interesting approach. Are there any
       | benchmark/performance indicators for these wafer-scale chips?
        
       | tandr wrote:
       | How do they do heat management and dissipation on such a big
        | wafer? I can imagine different parts heating up differently,
        | putting mechanical strain on the wafer and leading to cracks.
        
         | punnerud wrote:
         | A large portion of the article is answering most of this
        
           | tandr wrote:
            | No, it does not. There is a paragraph _talking_ about it, but
            | still not much actual info.
        
         | NortySpock wrote:
         | "Carefully."
         | 
         | The article mentions a year of engineering went into dealing
         | with the entire wafer thermally expanding under load.
        
         | addaon wrote:
         | If you look at the die shots (wafer shots?), you'll notice
         | small holes a few millimeters across spaced roughly at reticle
         | spacing. Those are drilled holes to allow through-wafer liquid
         | cooling. With liquid cooling not just around but through the
         | wafer, the temperature differential is minimized.
        
       | durst wrote:
       | Does anyone here have firsthand experience using the compiler?
       | Can you give a rough approximation of performance tuning with the
       | compiler compared to performance tuning with compilers targeting
       | the tensor cores on an A100 or a TPU?
        
       ___________________________________________________________________
       (page generated 2021-04-22 23:00 UTC)