[HN Gopher] The World's Largest Computer Chip
___________________________________________________________________
The World's Largest Computer Chip
Author : doe88
Score : 97 points
Date : 2021-08-21 09:50 UTC (13 hours ago)
(HTM) web link (www.newyorker.com)
(TXT) w3m dump (www.newyorker.com)
| thethethethe wrote:
| > In a big cluster, as many as forty-eight pizza-box-size servers
| slide into a rack as tall as a person; these racks stand in rows,
| filling buildings the size of warehouses. The neural networks in
| such systems can tackle daunting problems, but they also face
| clear challenges. A network spread across a cluster is like a
| brain that's been scattered around a room and wired together.
| Electrons move fast, but, even so, cross-chip communication is
| slow, and uses extravagant amounts of energy.
|
| Why wouldn't these giant chips be wired together into a cluster
| too?
| Nevermark wrote:
| I am sure it will happen, but the cost effectiveness and
| efficiency will drop dramatically going from 1 wafer to just
| 2 wafers.
|
| 1 wafer, doing X work, in Y time
|
| = 1 wafer, doing 2X work, in 2Y time
|
| = Two wafers, doing 2X work, in something still close to 2Y
| time
|
| I.e. the slowness of between-wafer communication, vs. in-wafer
| communication, will dwarf the computing time. Obviously there
| is some N, where N wafers would be worth clustering, but it
| might be quite high.
|
| Maybe the company is working on ways to cut down cross-wafer
| communication too. Vertical optical connections for instance
| would be awesome.
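|
| A minimal sketch of the scaling argument above, in Python. The cost
| model is an assumption for illustration, not a Cerebras figure:
| compute time divides evenly across wafers, and any run on more than
| one wafer pays a cross-wafer communication cost per unit of work
| (the hypothetical comm_per_unit parameter).
|

```python
def run_time(work, wafers, compute_per_unit=1.0, comm_per_unit=0.0):
    """Toy estimate of the time to finish `work` units on `wafers` wafers."""
    if wafers < 1:
        raise ValueError("need at least one wafer")
    # Assumption: compute parallelizes perfectly across wafers.
    compute = work * compute_per_unit / wafers
    # Assumption: cross-wafer traffic scales with the work and only
    # exists when more than one wafer is involved.
    comm = work * comm_per_unit if wafers > 1 else 0.0
    return compute + comm


def speedup(work, wafers, **model):
    """How much faster `wafers` wafers are than one wafer on the same work."""
    return run_time(work, 1, **model) / run_time(work, wafers, **model)


if __name__ == "__main__":
    # X = 1 unit of work, Y = 1 unit of time; comm_per_unit is made up.
    print(run_time(1, 1))                        # one wafer, X work
    print(run_time(2, 1))                        # one wafer, 2X work
    print(run_time(2, 2, comm_per_unit=0.4375))  # two wafers, 2X work
    for n in (2, 4, 16, 64):
        print(n, round(speedup(2, n, comm_per_unit=0.4375), 2))
```

|
| In this toy model the speedup can never exceed compute_per_unit /
| comm_per_unit, however many wafers are added, so the N at which
| clustering pays off depends entirely on how cheap cross-wafer
| traffic can be made.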
| delaaxe wrote:
| Looks a bit like Tesla's new Dojo AI chip
| kken wrote:
| I may be mistaken, but Tesla's Dojo chip still seems to be
| relatively small. They can connect many of them into a 2D
| fabric, though.
|
| Cerebras still seems to have an advantage here, because they
| can use on-chip interconnects, which potentially allows higher
| bandwidth between the tiles.
| Robotbeat wrote:
| From what I understand, both Tesla and Cerebras use TSMC's
| on-wafer fan out technology.
| handol wrote:
| I think the Tesla "Tiles" of 25 D1 chips are on a single
| wafer with integrated interconnects. But there is certainly a
| huge difference in the memory bandwidth claims. Cerebras
| claims 20PB/s, and Tesla claims 10Tbps.
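|
| To put the two claims in the same unit, here is a rough conversion
| sketch. It assumes decimal SI prefixes, that "PB/s" means petabytes
| per second and that "Tbps" means terabits per second; the two
| figures may also not measure the same kind of bandwidth.
|

```python
PETA = 10**15  # decimal SI prefix (assumed)
TERA = 10**12  # decimal SI prefix (assumed)
BITS_PER_BYTE = 8


def to_bits_per_second(value, prefix, unit_is_bytes):
    """Convert a prefixed bandwidth figure to plain bits per second."""
    bits = value * prefix
    return bits * BITS_PER_BYTE if unit_is_bytes else bits


cerebras_bps = to_bits_per_second(20, PETA, unit_is_bytes=True)  # 20 PB/s
tesla_bps = to_bits_per_second(10, TERA, unit_is_bytes=False)    # 10 Tbps

# How many times larger the first claim is than the second.
ratio = cerebras_bps // tesla_bps

if __name__ == "__main__":
    print(f"{cerebras_bps // TERA} Tbps vs {tesla_bps // TERA} Tbps")
    print(f"ratio: {ratio}")
```

|
| Under those assumptions the claims differ by a factor of 16,000.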
| Robotbeat wrote:
| I don't think that's an accident. Tesla undoubtedly took
| inspiration from Cerebras. Also, I think some TSMC processes
| enable this kind of wafer-scale chip, and both companies use
| this.
| riazrizvi wrote:
| So currently chips are printed into regions that are limited in
| size by physics, to a 3cm square. And processing power is
| traditionally increased by stacking them in up to a dozen layers
| that are interconnected at the edges.
|
| Here, instead of that, the circuits are overlap-printed so that a
| single wafer can support a set of 80 connected circuits, which
| are now physically coolable because of the flat design? While
| they must be sacrificing some interconnection richness, because
| of geometrical placement, for AI applications this probably
| doesn't matter so much. Very interesting.
| actually_a_dog wrote:
| I know this is way OT, but the title immediately made me think of
| this: http://web.cecs.pdx.edu/~harry/Relay/index.html
| willis936 wrote:
| You can see a photo of one of the chips and get some more
| technical details here:
|
| https://youtu.be/FNd94_XaVlY
| Frost1x wrote:
| I find it odd that an article written about scale, and not just
| scale but the biggest scale, didn't include a photo that
| demonstrates... relative sizes.
|
| The official page demonstrates relative size at quick glance (I
| guess they do use fingernails and dinnerplates but eh):
| https://cerebras.net/chip/
| IAmEveryone wrote:
| But their "product" link in the footer goes to
| https://cerebras.net/?page_id=632, which is 404. So I'll wait
| before judging the relative technical abilities of the two
| sites.
| baybal2 wrote:
| Inaccuracy there. Cerebras is by no means the first trillion
| transistor chip.
|
| The first 1 trillion transistor chip was Samsung's 3D NAND chip,
| and it went with rather little fanfare.
|
| P.S. 2 -- Google is by no means the first company to do "automatic
| floorplanning." This is what literally every EDA tool does.
| fortran77 wrote:
| I didn't know Gene Amdahl killed someone. According to the NY
| Times he actually was convicted of manslaughter.
|
| Considering that he ruined someone's life, why is he revered in
| computing circles?
| high_derivative wrote:
| I am not really buying the lack of participation in MLPerf. Just
| give us the numbers, don't skirt around about 'not being made for
| these benchmarks'.
| unnouinceput wrote:
| You missed the point. They won't do it because the second
| nVidia sees how much better Cerebras is, they are going to
| enter the market and pull the rug out from under them, just
| like 40 years ago IBM decided to enter the PC market and a lot
| of PC makers went belly up by 1983, unless they made IBM PC
| compatible clones (Apple was the only one without an IBM clone
| and barely survived).
| high_derivative wrote:
| NVIDIA is going to do that either way. Huang is not asleep at
| the wheel.
| bloopernova wrote:
| When I saw how large that chip was, I immediately thought of
| cooling such a beast.
|
| Can any materials scientists or engineers comment on if other
| elements will withstand higher heat better than silicon? Seems
| like such a large chip would be somewhat better to run at higher
| temperature rather than budget for huge and elaborate cooling.
| (This is very much a layman's question. The people who designed
| the chip and its cooling are far, far smarter than me!)
| Robotbeat wrote:
| Silicon Carbide wafers can withstand higher temperatures, which
| makes dumping heat easier.
| sandworm101 wrote:
| Water cooling would handle this without issue. No need for
| fancy tricks. A big heat spreader and some 2" piping would be
| more than enough.
| codeflo wrote:
| The article gives a figure of 15 kW for the chip. That's the
| kind of heat usually generated by a small room full of
| servers. Radiating that away on the outside is not the main
| issue, solutions exist for that. But getting that kind of
| heat away from the chip and into the water in the first place
| has to be a nontrivial challenge.
| MauranKilom wrote:
| The chip is approximately 21 cm x 21 cm (that's 8.5
| inches).
|
| My kettle has 2 kW and doesn't take long to boil water from
| room temperature. I reckon you could fit four such kettles
| on the chip area (roughly). That means the chip would
| roughly boil water two times as fast as my kettle, were it
| used for that purpose.
|
| While that does pose reasonably interesting engineering
| challenges regarding coolant throughput etc., I don't think
| there's anything particularly difficult there. You probably
| would want a better heat transfer medium between chip and
| water than my kettle has (well, I have not disassembled
| it), but I agree with GP that a bunch of water pipes will
| work well enough as a cooling solution.
|
| Edit: Actually, screw it, we can calculate how much water
| we need to put through there. Warming water from 20 to 100
| degC takes 334 kJ/kg. (That comes out to 167 seconds to
| heat 1l of water in a 2 kW kettle, for reference.) To
| remove 15 kW of heat with water cooling, assuming the water
| goes in at 20 and comes out at 100 degC, we need a
| throughput of 0.045 kg/s = 45 g/s = 45 ml/s.
|
| Sure, the temperature range may be a bit optimistic, but 45
| ml/s (one liter every 22 seconds) is literally "just hold
| it under a running tap". The main engineering challenge
| would be making sure that heat is removed evenly enough, I
| guess.
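|
| The arithmetic above can be checked with a short sketch. It assumes
| a specific heat of about 4.18 kJ/(kg*K) for liquid water, treats
| one liter as one kilogram, and ignores all losses; the function
| names are made up for this example.
|

```python
SPECIFIC_HEAT_WATER = 4.18  # kJ/(kg*K), approximate value for liquid water


def heat_per_kg(t_in_c, t_out_c):
    """Energy (kJ) absorbed by 1 kg of water warmed from t_in_c to t_out_c."""
    return SPECIFIC_HEAT_WATER * (t_out_c - t_in_c)


def flow_rate_kg_per_s(power_kw, t_in_c, t_out_c):
    """Water mass flow needed to carry away power_kw (1 kW = 1 kJ/s)."""
    return power_kw / heat_per_kg(t_in_c, t_out_c)


def kettle_seconds(power_kw, litres, t_in_c=20, t_out_c=100):
    """Time for an ideal, lossless kettle to heat `litres` of water."""
    return litres * heat_per_kg(t_in_c, t_out_c) / power_kw


if __name__ == "__main__":
    print(round(heat_per_kg(20, 100)))                # kJ per kg
    print(round(kettle_seconds(2, 1)))                # seconds, 2 kW kettle
    print(round(flow_rate_kg_per_s(15, 20, 100), 3))  # kg/s for 15 kW
    # Hypothetical narrower range: water leaves at 40 degC instead.
    print(round(flow_rate_kg_per_s(15, 20, 40), 3))
```

|
| With a less optimistic 20 K temperature rise, the required flow is
| four times higher, roughly 0.18 kg/s, which is still modest.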
| Nevermark wrote:
| As the previous comment pointed out, it's not the net
| energy that needs to be removed that is the problem.
|
| I.e. it's like a kettle that isn't just radiating the
| stove coil's energy away, it is actually trying to keep
| your stove coil cool while it's turned up to 10!
|
| The same amount of energy movement, but not the same
| problem at all.
| xmcqdpt2 wrote:
| Yes the whitepaper talks a lot about this cooling.
|
| https://f.hubspotusercontent30.net/hubfs/8968533/Cerebras-
| CS...
|
| The traditional computer included in the box is probably
| quite high end and power hungry too so that it can provide
| enough data to maintain those bandwidths. They don't appear
| to sell the chips by themselves.
|
| I think the comparison is with an equivalent gpu cluster
| like the nvidia DGX systems or HPC CPU nodes. The DGX A100
| is 6.5kW for example,
|
| https://images.nvidia.com/aem-dam/Solutions/Data-
| Center/nvid...
|
| The Cerebras system fits 15 rack units which is more than
| 2x larger than the DGX (6.5U). A similar 15 node HPC server
| with CPU is probably not that far from 15kW either (2
| socket per node, 250W per CPU is already 7.5kW, then add
| RAM etc.) so by HPC standards it's less "full room of
| servers" than "single cabinet".
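|
| A quick sketch of that estimate, using only the figures quoted in
| this thread (15 kW and 15U for the Cerebras system, 6.5 kW and 6.5U
| for the DGX A100, 250 W per CPU). None of these numbers are
| verified here, and the function names are hypothetical.
|

```python
def cpu_power_kw(nodes, sockets_per_node, watts_per_cpu):
    """Total CPU power draw in kW, ignoring RAM, disks and fans."""
    return nodes * sockets_per_node * watts_per_cpu / 1000


def kw_per_rack_unit(power_kw, rack_units):
    """Power drawn per rack unit of height."""
    return power_kw / rack_units


if __name__ == "__main__":
    # 15 dual-socket nodes at 250 W per CPU, as in the comment above.
    print(cpu_power_kw(15, 2, 250))
    # Power per rack unit, from the thread's figures.
    print(kw_per_rack_unit(15, 15))    # Cerebras system
    print(kw_per_rack_unit(6.5, 6.5))  # DGX A100
```

|
| On these figures both systems come out at about 1 kW per rack unit.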
| willis936 wrote:
| If the thermal interfaces are done correctly then you can
| put an absurd flow rate of coolant across the fins and
| bring it out to a huge heat exchange system.
|
| The trick isn't total power, it's power density on the die.
| In that regard, I don't think this is pushing the
| boundaries. It just needs custom built interfaces.
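|
| As a rough illustration of total power versus power density, the
| sketch below uses the thread's figures for the wafer (15 kW over
| about 21 cm x 21 cm). The "conventional die" is a hypothetical
| placeholder, not a measured part.
|

```python
def power_density_w_per_cm2(power_w, width_cm, height_cm):
    """Average power density over a rectangular die, in W/cm^2."""
    return power_w / (width_cm * height_cm)


# Figures quoted in the thread for the wafer-scale chip.
wafer = power_density_w_per_cm2(15_000, 21, 21)

# Hypothetical conventional die: 300 W over 2.5 cm x 2.5 cm.
conventional = power_density_w_per_cm2(300, 2.5, 2.5)

if __name__ == "__main__":
    print(f"wafer-scale chip:    {wafer:.1f} W/cm^2")
    print(f"hypothetical die:    {conventional:.1f} W/cm^2")
```

|
| The average hides hot spots, so this only shows that a very large
| total power does not by itself imply an unusual power density.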
| fortran77 wrote:
| The article talks about the need for special alloys that have
| minimal expansion. There are 15 kilowatts of heat to
| dissipate in a very small space, and the chip really can't
| expand or contract differently from the cooling block. This
| seems like a hard problem to solve.
| boshomi wrote:
| This chip needs about 21 kW of power, enough to heat a house in
| Central Europe.
| lisper wrote:
| Or enough to cool one in the U.S.
| rootusrootus wrote:
| That should cool a half dozen decently big homes.
| namibj wrote:
| GaN-on-SiC and native SiC both support far higher temperatures,
| with native SiC lasting thousands of hours even at 500C Tj [0]
| and commercial GaN-on-SiC being rated for e.g. 225C Tjmax [1].
|
| [0]: https://de.wikipedia.org/wiki/Siliciumcarbid#cite_ref-22
| [1]: CREE Wolfspeed's CGHV1J070D ; datasheet:
| https://cms.wolfspeed.com/app/uploads/2020/12/CGHV1J070D.pdf
| petermcneeley wrote:
| At such large die scales and high temperatures heat engines
| become practical as a means of both cooling and also
| extracting work. I wonder if there is serious research in
| this area?
| prvc wrote:
| Based on a high-school level understanding, the cooling
| requirements would just be proportional to the surface area,
| nothing special. Maybe there's an added risk of physical
| fissures developing, but that's hard to know a priori.
| Robotbeat wrote:
| Surface area and also heat rejection temperature.
___________________________________________________________________
(page generated 2021-08-21 23:01 UTC)