[HN Gopher] The World's Largest Computer Chip
       ___________________________________________________________________
        
       The World's Largest Computer Chip
        
       Author : doe88
       Score  : 97 points
       Date   : 2021-08-21 09:50 UTC (13 hours ago)
        
 (HTM) web link (www.newyorker.com)
 (TXT) w3m dump (www.newyorker.com)
        
       | thethethethe wrote:
       | > In a big cluster, as many as forty-eight pizza-box-size servers
       | slide into a rack as tall as a person; these racks stand in rows,
       | filling buildings the size of warehouses. The neural networks in
       | such systems can tackle daunting problems, but they also face
       | clear challenges. A network spread across a cluster is like a
       | brain that's been scattered around a room and wired together.
       | Electrons move fast, but, even so, cross-chip communication is
       | slow, and uses extravagant amounts of energy.
       | 
       | Why wouldn't these giant chips be wired together into a cluster
       | too?
        
         | Nevermark wrote:
          | I am sure it will happen, but the cost-effectiveness and
          | efficiency will drop dramatically just going from 1 wafer to
          | 2 wafers.
         | 
         | 1 wafer, doing X work, in Y time
         | 
         | = 1 wafer, doing 2X work, in 2Y time
         | 
         | = Two wafers, doing 2X work, in something still close to 2Y
         | time
         | 
          | I.e. the slowness of between-wafer communication, compared
          | with in-wafer communication, will dwarf the compute time
          | saved. Obviously there is some N where N wafers would be
          | worth clustering, but it might be quite high.
         | 
         | Maybe the company is working on ways to cut down cross-wafer
         | communication too. Vertical optical connections for instance
         | would be awesome.
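          | 
          | A toy model of that intuition, with all numbers made up just
          | to show the shape of the curve (the comm penalty here is a
          | purely hypothetical fraction, not a Cerebras figure):
          | 
          |   def time_on(n, t1=1.0, comm=0.6):
          |       # t1: time for 1 wafer to do the whole job alone
          |       # comm: assumed cost of between-wafer traffic, as a
          |       # fraction of t1 (made up for illustration)
          |       return t1 / n + comm * t1 * (n - 1) / n
          | 
          |   for n in (1, 2, 4, 8, 16):
          |       s = time_on(1) / time_on(n)
          |       print(n, "wafers: speedup", round(s, 2))
          | 
          | With that invented 0.6 penalty the speedup crawls from 1.25x
          | at 2 wafers toward an asymptote of ~1.7x, however many wafers
          | you add; clustering only pays off once the between-wafer
          | penalty gets small.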
        
       | delaaxe wrote:
       | Looks a bit like Tesla's new Dojo AI chip
        
         | kken wrote:
          | I may be mistaken, but Tesla's Dojo chip still seems to be
          | relatively small. They can connect many of them into a 2D
          | fabric, though.
         | 
         | Cerebras still seems to have an advantage here, because they
         | can use on-chip interconnects, which potentially allows higher
         | bandwidth between the tiles.
        
           | Robotbeat wrote:
           | From what I understand, both Tesla and Cerebras use TSMC's
           | on-wafer fan out technology.
        
           | handol wrote:
           | I think the Tesla "Tiles" of 25 D1 chips are on a single
           | wafer with integrated interconnects. But there is certainly a
            | huge difference in the memory bandwidth claims: Cerebras
            | claims 20 PB/s, and Tesla claims 10 Tbps.
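            | 
            | Putting the two quoted figures in the same units (they very
            | likely measure different things, on-wafer memory bandwidth
            | vs. off-chip links, so treat this as a unit check only):
            | 
            |   cerebras_bps = 20e15 * 8   # 20 PB/s in bits per second
            |   tesla_bps = 10e12          # 10 Tbps as quoted
            |   print(cerebras_bps / tesla_bps)   # -> 16000.0
            | 
            | So taken at face value, the Cerebras figure is about four
            | orders of magnitude larger than the Tesla one.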
        
         | Robotbeat wrote:
         | I don't think that's an accident. Tesla undoubtedly took
         | inspiration from Cerebras. Also, I think some TSMC processes
          | enable this kind of wafer-scale chip, and both companies use
          | them.
        
       | [deleted]
        
       | riazrizvi wrote:
        | So currently chips are printed into regions that are limited in
        | size by physics to roughly a 3 cm square. And processing power
        | is traditionally increased by stacking them in up to a dozen
        | layers that are interconnected at the edges.
        | 
        | Here, instead of that, the circuits are overlap-printed so that
        | a single wafer can support a set of 80 connected circuits,
        | which are now physically coolable because of the flat design?
        | While they must be sacrificing some interconnection richness
        | because of geometric placement, for AI applications this
        | probably doesn't matter so much. Very interesting.
        
       | actually_a_dog wrote:
       | I know this is way OT, but the title immediately made me think of
       | this: http://web.cecs.pdx.edu/~harry/Relay/index.html
        
       | willis936 wrote:
       | You can see a photo of one of the chips and get some more
       | technical details here:
       | 
       | https://youtu.be/FNd94_XaVlY
        
         | Frost1x wrote:
         | I find it odd that an article written about scale, and not just
         | scale but the biggest scale, didn't include a photo that
         | demonstrates... relative sizes.
         | 
          | The official page demonstrates relative size at a glance (I
          | guess they do use fingernails and dinner plates, but eh):
         | https://cerebras.net/chip/
        
           | IAmEveryone wrote:
            | But their "product" link in the footer goes to
            | https://cerebras.net/?page_id=632, which is a 404. So I'll
            | wait before judging the two sites' relative technical
            | abilities.
        
       | baybal2 wrote:
        | Inaccuracy there. Cerebras is by far not the first
        | trillion-transistor chip.
        | 
        | The first 1-trillion-transistor chip was Samsung's 3D NAND
        | chip, and it arrived with rather little fanfare.
        | 
        | P.S. Google is also by far not the first company to do
        | "automatic floorplanning." That is what literally every EDA
        | tool does.
        
       | fortran77 wrote:
        | I didn't know Gene Amdahl killed someone. According to the NY
        | Times, he was actually convicted of manslaughter.
       | 
       | Considering that he ruined someone's life, why is he revered in
       | computing circles?
        
       | high_derivative wrote:
        | I am not really buying the excuse for the lack of
        | participation in MLPerf. Just give us the numbers; don't skirt
        | the issue with 'not being made for these benchmarks'.
        
         | unnouinceput wrote:
          | You missed the point. They won't do it because the second
          | Nvidia sees how much better Cerebras is, Nvidia will enter
          | that market and pull the rug out from under them, just like
          | 40 years ago when IBM decided to enter the PC market and a
          | lot of PC makers went belly up by 1983, unless they made IBM
          | PC compatible clones (Apple was the only one without an IBM
          | clone and barely survived).
        
           | high_derivative wrote:
           | NVIDIA is going to do that either way. Huang is not asleep at
           | the wheel.
        
       | bloopernova wrote:
       | When I saw how large that chip was, I immediately thought of
       | cooling such a beast.
       | 
        | Can any materials scientists or engineers comment on whether
        | other materials withstand heat better than silicon? It seems
        | like it would be better to run such a large chip at a higher
        | temperature than to budget for huge and elaborate cooling.
        | (This is very much a layman's question. The people who designed
        | the chip and its cooling are far, far smarter than me!)
        
         | Robotbeat wrote:
         | Silicon Carbide wafers can withstand higher temperatures, which
         | makes dumping heat easier.
        
         | sandworm101 wrote:
         | Water cooling would handle this without issue. No need for
         | fancy tricks. A big heat spreader and some 2" piping would be
         | more than enough.
        
           | codeflo wrote:
           | The article gives a figure of 15 kW for the chip. That's the
           | kind of heat usually generated by a small room full of
           | servers. Radiating that away on the outside is not the main
            | issue; solutions exist for that. But getting that kind of
           | heat away from the chip and into the water in the first place
           | has to be a nontrivial challenge.
        
             | MauranKilom wrote:
              | The chip is approximately 21 cm x 21 cm (that's about 8.5
              | inches on a side).
             | 
              | My kettle is rated at 2 kW and doesn't take long to boil
              | water from room temperature. I reckon you could fit
              | roughly four such kettles on the chip's area. That means
              | that, per unit of area, the chip would heat water roughly
              | twice as fast as my kettle, were it used for that
              | purpose.
             | 
             | While that does pose reasonably interesting engineering
             | challenges regarding coolant throughput etc., I don't think
             | there's anything particularly difficult there. You probably
             | would want a better heat transfer medium between chip and
             | water than my kettle has (well, I have not disassembled
             | it), but I agree with GP that a bunch of water pipes will
             | work well enough as a cooling solution.
             | 
              | Edit: Actually, screw it, we can calculate how much water
              | we need to put through there. Warming water from 20 to
              | 100 degC takes about 334 kJ/kg (specific heat ~4.18
              | kJ/kg/degC times an 80 degC rise; that comes out to 167
              | seconds to heat 1 l of water in a 2 kW kettle, for
              | reference). To remove 15 kW of heat with water cooling,
              | assuming the water goes in at 20 and comes out at 100
              | degC, we need a throughput of 0.045 kg/s = 45 g/s = 45
              | ml/s.
             | 
             | Sure, the temperature range may be a bit optimistic, but 45
             | ml/s (one liter every 22 seconds) is literally "just hold
             | it under a running tap". The main engineering challenge
             | would be making sure that heat is removed evenly enough, I
             | guess.
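              | 
              | The same arithmetic as a quick script (using the 15 kW
              | figure from the article and the 20-to-100 degC rise
              | assumed above):
              | 
              |   cp = 4.18      # kJ/kg/degC, specific heat of water
              |   dT = 100 - 20  # assumed inlet -> outlet rise, degC
              |   heat = 15.0    # kW dissipated (article figure)
              | 
              |   kj_per_kg = cp * dT       # ~334 kJ per kg of water
              |   flow = heat / kj_per_kg   # kg/s, since kW = kJ/s
              |   print(round(flow * 1000), "ml/s")      # ~45 (1 kg ~ 1 l)
              |   print(round(1 / flow), "s per liter")  # ~22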
        
               | Nevermark wrote:
                | As the previous comment pointed out, it's not the net
                | energy that needs to be removed that is the problem.
                | 
                | I.e. it's like a kettle that isn't just radiating the
                | stove coil's energy away, but is actually trying to
                | keep the stove coil cool while it's turned up to 10!
                | 
                | The same amount of energy movement, but not the same
                | problem at all.
        
             | xmcqdpt2 wrote:
              | Yes, the whitepaper talks a lot about this cooling:
             | 
             | https://f.hubspotusercontent30.net/hubfs/8968533/Cerebras-
             | CS...
             | 
              | The traditional computer included in the box is probably
              | quite high-end and power-hungry too, so that it can feed
              | enough data to sustain those bandwidths. They don't
              | appear to sell the chips by themselves.
              | 
              | I think the comparison is with an equivalent GPU cluster
              | like the Nvidia DGX systems, or with HPC CPU nodes. The
              | DGX A100 is 6.5 kW, for example:
             | 
             | https://images.nvidia.com/aem-dam/Solutions/Data-
             | Center/nvid...
             | 
              | The Cerebras system fills 15 rack units, which is more
              | than 2x larger than the DGX (6.5U). A similar 15-node
              | CPU-based HPC system is probably not that far from 15 kW
              | either (2 sockets per node at 250 W per CPU is already
              | 7.5 kW, then add RAM etc.), so by HPC standards it's less
              | "full room of servers" than "single cabinet".
        
             | willis936 wrote:
             | If the thermal interfaces are done correctly then you can
             | put an absurd flow rate of coolant across the fins and
             | bring it out to a huge heat exchange system.
             | 
              | The trick isn't total power, it's power density on the
              | die. In that regard, I don't think this is pushing the
              | boundaries. It just needs custom-built interfaces.
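              | 
              | Back-of-the-envelope power density, using the 15 kW and
              | ~21 cm x 21 cm figures from elsewhere in the thread (the
              | 400 W / 826 mm^2 A100 numbers are my own ballpark, for
              | comparison only):
              | 
              |   wafer_w = 15000.0         # W, article figure
              |   wafer_cm2 = 21.0 * 21.0   # cm^2, ~21 cm per side
              |   gpu_w = 400.0             # W, A100-class ballpark
              |   gpu_cm2 = 8.26            # cm^2, ~826 mm^2 die
              | 
              |   print(round(wafer_w / wafer_cm2), "W/cm^2 wafer")  # ~34
              |   print(round(gpu_w / gpu_cm2), "W/cm^2 GPU die")    # ~48
              | 
              | So per square centimeter the wafer is, if anything, a bit
              | tamer than a flagship GPU die; the hard part is moving 15
              | kW of heat off one uninterrupted surface evenly.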
        
           | fortran77 wrote:
            | The article talks about the need for special alloys that
            | have minimal thermal expansion. There are 15 kilowatts of
            | heat to dissipate in a very small space, and the chip
            | really can't expand or contract differently from the
            | cooling block. This seems like a hard problem to solve.
        
         | boshomi wrote:
          | This chip needs about 21 kW of power, enough to heat a house
          | in Central Europe.
        
           | lisper wrote:
           | Or enough to cool one in the U.S.
        
             | rootusrootus wrote:
             | That should cool a half dozen decently big homes.
        
         | namibj wrote:
         | GaN-on-SiC and native SiC both support far higher temperatures,
         | with native SiC lasting thousands of hours even at 500C Tj [0]
         | and commercial GaN-on-SiC being rated for e.g. 225C Tjmax [1].
         | 
         | [0]: https://de.wikipedia.org/wiki/Siliciumcarbid#cite_ref-22
         | [1]: CREE Wolfspeed's CGHV1J070D ; datasheet:
         | https://cms.wolfspeed.com/app/uploads/2020/12/CGHV1J070D.pdf
        
           | petermcneeley wrote:
            | At such large die scales and high temperatures, heat
            | engines become practical as a means of both cooling and
            | extracting work. I wonder if there is serious research in
            | this area.
        
         | prvc wrote:
          | Based on a high-school-level understanding, the cooling
          | requirements would just be proportional to the surface area,
          | nothing special. Maybe there's an added risk of physical
          | fissures developing, but that's hard to know a priori.
        
           | Robotbeat wrote:
           | Surface area and also heat rejection temperature.
        
       ___________________________________________________________________
       (page generated 2021-08-21 23:01 UTC)