[HN Gopher] How thermal management is changing in the age of the...
       ___________________________________________________________________
        
       How thermal management is changing in the age of the kilowatt chip
        
       Author : rntn
       Score  : 119 points
       Date   : 2023-12-26 16:10 UTC (6 hours ago)
        
 (HTM) web link (www.theregister.com)
 (TXT) w3m dump (www.theregister.com)
        
       | ksec wrote:
        | Assuming the upcoming Zen 5c was capped at 192 cores because
        | of bandwidth and not thermals: we could have had 256 cores +
        | IOD (70W), and if every core were to use 3.6W, that is nearly
        | 1000W for the CPU socket.
        | 
        | In a 2U 2-node system, that is a potential of 1024 vCPUs in a
        | single server.
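The back-of-the-envelope arithmetic here can be sketched quickly; the 3.6 W/core and 70 W IOD figures are the commenter's assumptions, and the vCPU count assumes a dual-socket node with two threads per core:

```python
# Hypothetical Zen 5c socket power, using the commenter's assumed
# figures: 256 cores at 3.6 W each plus a 70 W IO die (IOD).
cores = 256
watts_per_core = 3.6
iod_watts = 70

socket_watts = cores * watts_per_core + iod_watts
print(socket_watts)  # ~991.6 -- "nearly 1000W for the CPU socket"

# Dual-socket node, two threads (vCPUs) per core:
vcpus_per_node = 2 * cores * 2
print(vcpus_per_node)  # 1024
```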
        
         | gfv wrote:
         | In datacenters, you're mostly limited by the power (and thus
         | cooling). Most commercial DCs only let you use up to about 10kW
         | per rack. For standard 40U racks it's just 250W/RU, give or
         | take.
         | 
         | There are niche expensive datacenters with higher power
         | density, but as it stands, exotic multi-kW hardware at scale
         | makes sense if you either save a ton on per-node licensing, or
         | you need extreme bandwidth and/or low latency.
        
           | hinkley wrote:
           | I have been told by tech writers that Google discovered that
           | at some point electricians will refuse to route more power
           | into a building. So even if you created a separate thermal
           | plant, you still have issues.
        
             | repiret wrote:
             | Some electricians just assume you're on a budget.
             | 
             | There's always some limiting factor, and there's always
             | some (possibly crazy expensive) way to resolve it and get a
             | bit more power until you run into the next limiting factor.
        
               | kuchenbecker wrote:
               | You build another datacenter rather than make the
               | existing bigger.
        
             | eternauta3k wrote:
             | Really? I imagine datacenters are not reaching powers
             | anywhere near those of e.g. arc furnaces.
        
               | zamfi wrote:
               | Arc furnace: peak of about 250 MW to melt the steel [1]
               | 
               | Datacenter: seems to cap out at around 850 MW [2]
               | 
               | Same ballpark I guess? Probably both are limited by
               | inexpensive power availability + other connectivity
               | factors (road/rail, fiber).
               | 
               | [1]: "Therefore, a 300-tonne, 300 MVA EAF will require
               | approximately 132 MWh of energy to melt the steel, and a
               | "power-on time" (the time that steel is being melted with
               | an arc) of approximately 37 minutes." via
               | https://en.m.wikipedia.org/wiki/Electric_arc_furnace
               | 
               | [2]: https://www.racksolutions.com/news/blog/how-many-
               | servers-doe...
        
           | ksec wrote:
           | >Most commercial DCs only let you use up to about 10kW per
           | rack.
           | 
            | I think that was the case in 2020:
           | 
           | >By 2020, that was up to 8-10 kW per rack. Note, though, that
           | two-thirds of U.S. data centers surveyed said that they were
           | already experiencing peak demands in the 16-20 kW per rack
           | range. The latest numbers from 2022 show 10% of data centers
           | reporting rack densities of 20-29 kW per rack, 7% at 30-39 kW
           | per rack, 3% at 40-49 kW per rack and 5% at 50 kW or greater.
           | 
            | We don't have 2023 numbers and we are coming up on 2024,
            | but it is clear that demand for high power density is
            | growing (and hopefully at a much faster pace).
        
         | wmf wrote:
         | You can already overclock a 96C Threadripper 7000 to 1000W.
        
       | layer8 wrote:
       | Maybe we'll be able to heat our homes with self-hosted AI in a
       | few years.
        
         | lisper wrote:
         | You can already heat your house with bitcoin mining rigs. (Hm,
         | that's actually not such a bad idea!)
        
           | aftbit wrote:
           | If you're going to run an electric space heater anyway, might
           | as well do some mining instead. :P
        
             | lisper wrote:
             | The more I think about this the more I think there could be
             | a real business model here. Make a device that looks and
             | acts like a space heater but is in fact a miner loaded with
             | your private keys. Market it as a "smart heater" that
             | requires a wifi connection to operate. Sell it as a loss
             | leader, or maybe even give it away for free. It doesn't
             | have to be state-of-the-art hardware, so it can be cheap to
             | make. Hmm...
        
               | idiotsecant wrote:
               | It's been done. Space heaters are like 200 bucks. Hard to
               | compete with that.
        
               | talldatethrow wrote:
               | Space heaters are $20* at Walmart for basically as much
               | energy draw as a typical household socket can handle.
        
           | newaccount74 wrote:
           | Your bitcoin mining rig will need three times as much power
           | as a heat pump to produce the same amount of heat. I doubt it
           | would be economical for most people.
        
             | winwang wrote:
             | That means the electric cost of the mining rig is
             | essentially 1/3 "subsidized".
        
           | mminer237 wrote:
           | Heating with electricity is a profound waste though when
           | natural gas is a fraction of the price. And if you do have to
           | heat with electricity, a heat pump is going to be far better
           | than mere resistance.
        
             | dist-epoch wrote:
             | Unless you were going to spend that electricity on compute
             | anyway. In that case might as well reuse the waste heat.
        
             | seper8 wrote:
              | However, a heat pump doesn't yield you any money.
        
         | colechristensen wrote:
         | A spa in Brooklyn is opening which is going to heat its pools
         | with bitcoin miners.
         | 
         | https://fortune.com/crypto/2023/12/21/bathhouse-nyc-bitcoin-...
        
         | Snorap101 wrote:
         | This idea has been around - here is a Microsoft paper from
         | 2011, they dub it the "data furnace"
         | 
         | https://www.microsoft.com/en-us/research/publication/the-dat...
        
         | miksumiksu wrote:
          | With district heating, you can technically heat your home
          | with AI quite easily.
         | 
         | Here is an example of large scale project from Finland.
         | https://www.fortum.com/media/2022/03/fortum-and-microsoft-an...
        
       | joe_the_user wrote:
       | OK,
       | 
       | So what's the limit of a system you could run from your own house
       | (or apartment)?
       | 
       | A single standard outlet yields 1500-1800 watts but there are
       | higher voltage/amperage outlets in many houses.
        
         | ygra wrote:
          | 3.5 kW here at least (16 A x 230 V). If you replace the
          | stove with a computer, you get three phases, which supply a
          | bit more.
        
           | LeonM wrote:
           | I think you are confusing a breaker group with your total
           | residential connection.
           | 
           | Assuming you are in Europe, and you have 2.5mm2 cabling
           | (which is the standard for residential applications) then you
           | are indeed limited to 16A per group.
           | 
           | However, there is nothing preventing you from using multiple
           | groups for one appliance. This is actually typical for high-
           | power appliances, such as induction cooking.
           | 
            | Ultimately it is your main fuse that limits your total
            | power consumption, which in most European countries is
            | typically rated at 25A (5750W); on request you can
            | usually have this raised to 35A, 50A or even 80A, if
            | supply is sufficient.
        
         | CapitalistCartr wrote:
         | If you're in USA or Canada, your house is probably over-wired
         | and can handle some extra load. Plugging into the electric
         | dryer outlet is good for 5760 continuous watts. The stove
         | outlet is 9600 continuous watts. An electrician could easily
         | add an additional one.
        
           | xattt wrote:
           | Tankless electric hot water heaters can be rated for 24-36 kW
           | (max current draw 150 A)! These are wired with 3-4 8-gauge
           | wires in parallel and connected to 40A breakers.
           | 
           | The only limit to household wiring would be the capacity of
           | the distribution coming into your home.
        
             | repiret wrote:
              | A plumber who used to do work for me told me a story
              | about the first time he installed one: when he first
              | turned it on, it blew a fuse at the substation a couple
              | miles away.
             | 
             | A few years later I got to know the person whose house it
             | was installed in. And when the homeowner was talking about
             | it he complained that the plumber didn't install a big
             | enough one and he had to have it redone.
        
               | acquacow wrote:
               | That certainly shouldn't be able to happen. The top of
               | panel breaker should pop with a large enough load, and
               | the mains on the street can handle quite a number of
               | homes with supply. There's no way a single in-home unit
               | should/could pop anything at a substation. The component
               | at the substation likely just failed at that time.
        
               | xattt wrote:
               | I am wondering if there were no household loads on that
               | circuit that had that level of in-rush current, the
               | component was on its way out, and the new load pushed it
               | over the edge.
        
             | CapitalistCartr wrote:
             | Yes, they are, but most single family homes here have
             | either a 150 or 200 amp main. Most families rarely exceed
             | 120 amps, so adding 20-35 amps of intermittent load isn't a
             | problem.
             | 
             | On the other hand, tankless electrics often cause a service
             | upgrade unless planned for new construction. Great for us
             | electricians, less great for the homeowner.
        
         | bragr wrote:
         | A typical US residential breaker panel is rated for between 100
         | and 200 amps so probably around there across all circuits
         | without major electrical work.
        
         | metafunctor wrote:
         | For a typical house in my parts (Finland), it would be 230V and
         | 3 x 25A fuses, so maximum power draw is 17kW.
         | 
         | Most houses would be able to upgrade to 35A without extensive
         | reworking of cables, getting to a 24kW maximum.
         | 
         | Also 50A and 63A are available for consumers in most locations,
         | but would require re-evaluating the cabling coming to the
         | house.
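The three-phase figures above fall out of phases x volts x amps; a quick sketch (230 V line-to-neutral, as in Finland):

```python
# Three-phase residential service capacity: phases x volts x amps.
def service_kw(phases, volts, amps):
    return phases * volts * amps / 1000

print(service_kw(3, 230, 25))  # 17.25 kW -- the 3 x 25 A fuse figure
print(service_kw(3, 230, 35))  # 24.15 kW -- after a 35 A upgrade
```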
        
         | c2h5oh wrote:
          | The "largest" single outlet I have is 3-phase 32A (3F32A),
          | which caps out at 22kW. The entire house is wired for
          | 3-phase power capped at 125A per phase - just over 86kW
          | total.
        
         | mtreis86 wrote:
          | We just upgraded our lines from the street from 100A to
          | 200A (at 240V), so that pair of wires can support around
          | 48kW. I think my stove is 20A at 240V, so the fattest wire
          | in the house protected by a breaker can safely handle
          | around 4.8kW. On the 120V side the biggest wire I've got is
          | 20A, so 2.4kW. Many house receptacles are only 15A max,
          | hence your 1.8kW.
        
           | jtriangle wrote:
           | Also understand that you can only use 80% of a circuit's
           | capacity in a continuous fashion, so the usable power for
           | compute is a fair bit lower than it seems.
        
         | quickthrowman wrote:
         | A typical (new) residential service in the US is 200A @ 240v
         | single-phase, or 48kW. Assuming the circuit is protected by a
         | breaker, you can use up to 38.4kW of that 48kW. If you used
         | fuses instead of breakers, you could use the full 48kW.
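The service-size and derating arithmetic in this subthread can be sketched as:

```python
# US residential service: 200 A at 240 V split-phase.
amps, volts = 200, 240
service_watts = amps * volts
print(service_watts)      # 48000

# Breaker sizing holds continuous loads (>3 h at max current, per
# the NEC definition cited below) to 80% of the rating.
continuous_watts = service_watts * 0.8
print(continuous_watts)   # 38400.0
```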
        
           | Kirby64 wrote:
           | The 80% derating only applies to 'continuous loads' which are
           | defined by NEC to be anything >3 hours continuously at
           | maximum current.
           | 
           | Any circuit breaker will not trip at the rated current,
           | though. They're designed not to. So, you can run all 48kW
           | indefinitely without tripping a circuit breaker, assuming
           | everything else is sized appropriately (i.e., wire,
           | interconnects, etc).
        
         | calamari4065 wrote:
         | The limit is space, really. Well, that and money. If you're
         | willing to pay, you can probably convince your utility to hook
         | you up to three phase power, which is typically reserved for
         | industrial use.
         | 
         | If you want to stay with your normal residential circuit, in
         | the US they're commonly 100 or 150A. 200A isn't uncommon, but
         | you might have to pay for an upgrade.
         | 
         | That leaves you with 200A*240V=480kW minus whatever you need
         | for normal house things.
         | 
         | So probably more compute than you have physical space for.
        
           | adastra22 wrote:
            | Also, in the US at least, residential power is way more
            | expensive than commercial power, so at some point (much
            | earlier than you'd think) you stop saving money on rack
            | space rent.
        
             | bdavbdav wrote:
             | Yep - when I used to rent rack space, the space itself
             | wasn't the expensive part, it was the transit and power.
        
           | quickthrowman wrote:
           | > That leaves you with 200A*240V=480kW
           | 
           | You're off by an order of magnitude: 200 * 240 = 48000
           | 
           | > If you're willing to pay, you can probably convince your
           | utility to hook you up to three phase power, which is
           | typically reserved for industrial use.
           | 
           | Three-phase power isn't only for industrial use, (in the
           | United States) small commercial buildings will have a 208v
           | three-phase service drop and larger commercial buildings will
           | have a 480v service drop, or 13.8kV medium voltage drop if
           | it's big enough. Large enough industrial customers will have
           | dedicated substations.
        
           | mciancia wrote:
           | 200A*240V is 48kW ;)
           | 
           | Also, in Europe 3 phase power afair is fairly common.
        
             | calamari4065 wrote:
             | Math is hard, that's why I became an engineer!
        
         | slavik81 wrote:
         | I'm in Canada and have a 30A 240V breaker for the Debian ROCm
         | Team's CI [1]. It's only rated for 80% continuously, so that's
         | roughly 5.7 kW. In theory, the four systems currently hooked up
         | to it could draw almost that much, but their workload only uses
         | ~1 kW in practice.
         | 
         | [1]: https://lists.debian.org/debian-ai/2023/12/msg00031.html
        
       | brucethemoose2 wrote:
       | If y'all missed it, the Cerebras CS-2 teardown is amazing:
       | 
       | https://www.youtube.com/watch?v=pzyZpauU3Ig
       | 
        | The engineering to handle that power density is insane.
        | _Technically_ it's less power per mm^2, but the chip is the
        | size of a dinner plate.
       | 
       | EDIT: The video was taken down, but looks like the web archive
       | got it:
       | 
       | https://web.archive.org/web/20230812020202/https://www.youtu...
       | 
       | As well as Vimeo (thanks morcheeba): https://vimeo.com/853557623
        
         | webstrand wrote:
         | That video is no longer available?
        
           | brucethemoose2 wrote:
           | Huh. I copied the link from here:
           | https://news.ycombinator.com/item?id=37096214
           | 
           | There are reposts with that same link.
           | 
           | Maybe it revealed _too_ much and was taken down?
        
         | morcheeba wrote:
         | Is it the same as this video? https://vimeo.com/853557623
        
           | brucethemoose2 wrote:
           | Yeah, that's it!
        
         | rwmj wrote:
         | That video and the cooling system is insane. Insanely cool
         | even. Thanks for posting.
         | 
         | I wonder, how do all the contacts in the 20,000A power
         | distribution plate that is bolted on top of the wafer-scale die
         | line up? The engineering involved in just making that part work
         | must be crazy.
        
           | vpribish wrote:
            | 20,000 amps has to be a mis-statement, or there's some
            | qualifier they are not mentioning. It doesn't look like
            | an industrial arc furnace.
        
             | rwmj wrote:
             | She definitely says "the way we bring 20,000 amps into the
             | front side of the wafer" (around 2'17'' into the video). It
             | does seem an awful lot.
        
               | FastFT wrote:
                | Core voltage is going to be in the ballpark of 1V,
                | and given the stated power consumption is 15kW, that
                | means a minimum of 15kA. So it is indeed a lot, but
                | the math checks out.
        
               | vpribish wrote:
               | how much of that power actually reaches the chip though?
               | (it's hilarious that this is one chip)? this thing is
               | mostly a water pump - I just can't - everything about it
               | is just wild
        
               | Kirby64 wrote:
               | Most of it. The water pump and other stuff consumes
               | probably less than 500W total. There's some efficiency
               | loss in the actual power converters, but they're likely
               | designed to be >95% efficient (probably >98%), otherwise
               | cooling them would be a nightmare.
        
               | pwg wrote:
               | This pdf (https://f.hubspotusercontent30.net/hubfs/896853
               | 3/CS-2%20Data...) says the power supplies are 6+6
               | redundant 4kW supplies. Assuming 100% utilization (not
               | realistic, but it makes the "math" easy) that's 24kW of
               | "power supply".
               | 
               | If we presume the wafer consumes a large percentage of
               | that (say 20kW out of the 24kW max) and that they are
               | feeding the "wafer" with DC at 1v, then they /do/ need to
               | feed in 20,000 amps to deliver 20kW of power at 1v.
               | 
                | So yes, 20kA is a lot of current, but it is within
                | the "power budget" the device seems to express in its
                | marketing material.
        
             | mikeInAlaska wrote:
             | The low voltage power rails on consumer Ryzen and Intel
             | chips exceed 100 amps.
        
             | Kirby64 wrote:
              | It's not. The CS-2 is claimed to do 23kW peak, and the
              | voltage is very low: if 23kW and 20kA are right, that's
              | a 1.15V core voltage, which is pretty normal these
              | days.
              | 
              | For comparison, one of the workstation AMD EPYC
              | processors uses ~400W under peak load and would draw
              | approximately ~320A peak current, so the CS-2 is only
              | ~60x more current... modern CPUs use an enormous amount
              | of current these days.
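The implied core voltage falls out of V = P / I; a sketch using the rough figures from this subthread:

```python
# Implied core voltage from power and current: V = P / I.
def implied_voltage(watts, amps):
    return watts / amps

# Cerebras CS-2: ~23 kW peak at the claimed 20 kA.
print(implied_voltage(23_000, 20_000))  # 1.15

# The EPYC comparison figures: ~400 W at ~320 A.
print(implied_voltage(400, 320))        # 1.25
```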
        
               | brucethemoose2 wrote:
               | They mention that the CS-2 runs at very low, efficient
               | clocks compared to a CPU or GPU. So yeah, the voltage
               | should be quite low.
        
             | LASR wrote:
             | 850k cores apparently.
             | 
             | We have EPYC chips with ~100 cores at ~200w TDP. So each
             | core is around 2W in the AMD chips. Core voltages are ~1v
             | with modern CPUs. So that's 2A per core.
             | 
              | 850k cores at 20kA is a much lower per-core current
              | than the AMD chips. They must be massively parallel,
              | lower-performing cores. But it's quite feasible that it
              | needs 20kA.
             | 
             | The CS-2 system on their website specs out at 23kW peak. So
             | all this lines up with each other.
             | 
             | As far as benchmarks and utility of such systems, I am not
             | sure if they've proven it out.
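The per-core comparison above can be sketched quickly; all figures are the thread's rough estimates:

```python
# Per-core current draw, using the thread's rough numbers.
# EPYC: ~200 W at ~1 V is ~200 A, spread over ~100 cores.
epyc_amps_per_core = 200 / 100
# CS-2: the claimed 20 kA spread over 850k cores.
cs2_amps_per_core = 20_000 / 850_000

print(epyc_amps_per_core)           # 2.0
print(round(cs2_amps_per_core, 4))  # 0.0235
```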
        
             | dist-epoch wrote:
             | My desktop CPU uses 200 Amps. At 1v.
        
           | CamperBob2 wrote:
           | They don't actually have to line up precisely. It seems to be
           | an elastomeric or "Zebra strip" connector. Basically like
           | those little rubber strips that you see when you take the LCD
           | out of a calculator, used to bring power and data from the
           | PCB to the glass surface:
           | https://en.wikipedia.org/wiki/Elastomeric_connector
           | 
           | Cerebras has evidently scaled them up a bit (and repatented
           | them, according to the video).
           | 
           | Interesting early application (if not the first) from the
           | 1970s: http://www.hp9825.com/html/hybrid_microprocessor.html
        
         | murphyslaw wrote:
         | Christ, 850000 cores!
        
         | throwawaymaths wrote:
          | I hear even A100s have a not-so-great service life and need
          | to be replaced relatively frequently. Wonder how it is for
          | Cerebras.
        
         | samstave wrote:
          | Jiminy Crickets - that Cerebras is dope - don't miss the
          | only other tech vid from Rebecca: https://vimeo.com/lewie221
          | <-- About CFD.
         | 
         | Weirdly - CFD has been zeitgeisting me here on HN the last
         | couple of days - I have been talking about FluidX3D - and have
         | been attempting to compile it this AM locally on windows with
         | failures (about to see if I can plop it into a docker)
         | 
          | Never thought you'd be doing CFD calculations to keep a
          | stable temperature flow over 1.6 trillion transistors so
          | they stay evenly cooled - did ya?
         | 
         | --
         | 
          | Watching that CFD vid above, I started thinking about
          | whether CFD could be applied to the ways AI models/GPTs
          | communicate or calculate.
         | 
          | I wonder if one could apply CFD analogies to the data
          | flows through AI models/systems such as OpenAI.
         | 
         | It would be interesting to look at the OpenAI GPT Store's
         | entanglements through the lens of CFD and determine where
         | relations might be made for how stacking GPTs might communicate
         | through their Tapestry.
         | 
          | Any CFD-heads care to dive in?
         | 
          | I wonder if one could treat "token flow" in a CFD manner,
          | visualizing how tokens are assigned attention scores.
         | 
         | GPT claims to not be able to visualize the attention score
         | matrix for the tokens - but assuming it could - it would seem
         | as though it should be easy to visualize attention matrices in
         | a CFD visual.
         | 
         | https://i.imgur.com/jzZ1wsP.png
         | 
         | --
         | 
          | Or I sound like an idiot. Let's ask GPT. haha
         | 
         | ---
         | 
          | (Also - in super expensive machines, why aren't sheets of
          | aerogel used as gaskets if you want thermal separation?
          | 
          | Imagine taking an aerogel powder and mixing it with
          | silicone - and having a super thin, flexible material,
          | such as D3, which has a melting point of 134C/273F...
          | 
          | So mixing aerogel with D3 as a gasket would be good, and
          | with D3 being non-Newtonian it works well for shocks. A
          | space gasket, as it were.)
        
         | robomartin wrote:
         | I can appreciate the thermal engineering that has gone into
         | this. I have executed some extremely challenging thermal
         | designs over time. They generally used heat transfer plates of
         | the type shown in this video.
         | 
          | My largest design was approximately 23 x 12 inches. Maintaining
         | thermal efficiency and uniformity across the entire surface is
         | where the real challenge lies. We had an extremely tight
         | uniformity specification (0.1 deg C). This could only be
         | achieved through a complex design process that entailed writing
         | genetic algorithms to evolve and test solutions using FEA. In
         | the end it was a combination of sophisticated impingement
         | cooling and other techniques that did the job. That project was
         | seriously challenging. I like projects that absolutely kick my
         | butt. This one definitely did.
        
           | allenrb wrote:
           | Can you share anything more about the project? This sounds
           | fascinating.
        
             | robomartin wrote:
             | Sadly, I can't share details or the application domain.
             | 
             | I can tell you that we worked very hard to try and see if
             | we could accomplish the objectives using forced air (fans).
              | That effort involved laser-welded fins with
              | sophisticated airflow management, techniques to break
              | up the boundary layer (which impedes optimal heat
              | transfer), and powerful centrifugal fans.
             | centrifugal fans. It worked well, yet it was large and
             | sounded like a jet engine.
             | 
             | Ultimately, while still complex, fluid-based thermal
             | management offered a far more compact solution that could
             | exist in a room with people not having to wear hearing
             | protection. In addition to that, with a fluid-based system
             | you can move the hot side to a different room.
        
               | KMag wrote:
               | Very interesting, though a bit odd to contrast air
               | cooling with fluid cooling, given it takes pretty extreme
               | conditions for air to not be a fluid.
        
         | tomcam wrote:
         | That's wild. Especially the rough similarity to an air-cooled 3
         | cylinder engine.
        
       | avereveard wrote:
        | This site has pictures for everything but the current article.
        
       | jl6 wrote:
       | One hopes this kind of heat density in a single system is still
       | more efficient than operating multiple less-power-hungry systems.
        
       | bluenose69 wrote:
       | I had to smile at the sentence "It essentially boils down to how
       | much surface area you have to dissipate the heat" in the article
       | :-)
        
       | hinkley wrote:
       | I can't help but think that someday we will see chips designed
       | like Sierpinski gaskets, with "holes" used for thermal transport
       | and the solid parts used for computation. Such chips would behave
       | more or less like a toroidal structure.
       | 
       | Now that chiplets are maturing that is a little less far-fetched.
        
       | dist-epoch wrote:
       | Why don't they build data centers where it's really cold, for
       | example north of Canada, and just pipe outside air in?
       | 
       | If you train an AI, you don't need low latency/high bandwidth
       | Internet access.
        
         | skirmish wrote:
         | Google does in Finland, https://templ.io/blog/hamina-google-
         | data-center/
         | 
          | Sea water is corrosive and hard to use for cooling.
        
         | yetanotherloss wrote:
          | They do; Google had/has one in The Dalles in Oregon next
          | to the dam, and there are several in Canada and Finland.
          | 
          | Part of the problem is consistent humidity management, but
          | the other is that air, even really cold air, isn't dense
          | enough to move heat as effectively as the same volume of a
          | denser working fluid.
         | 
          | For a practical example in the other direction, look at
          | combined-cycle gas turbines, which use heated air for the
          | primary turbine and then recapture much denser steam for
          | the secondary.
        
       | ChuckMcM wrote:
       | Getting closer to Jim Gray's "smoking hairy golf ball" (Jim also
       | forecasted that chips would eventually become spherical to limit
       | the lengths of the data paths.)
       | 
        | It will be interesting to see if commercial data centers
        | re-fit with cooling water transport under the floor or above
        | the machines (riskier). This will be a challenge for DCs
        | without a lot of space under there; presumably they could
        | boost the floor height after a door transition. Still, how
        | many of them have 20-40kW of power allocated per rack?
       | 
        | It's one of the few times I miss being at Google because they
       | approached this sort of problem very creatively and with an
       | effectively unlimited budget to try different things. I'm sure
       | their data centers are very much different from my time there!
        
         | icefo wrote:
         | I had the chance to visit the new datacenter of my college and
         | on the server exhaust side there is a radiator as tall as the
         | rack with cold water in it. All the pipes are under the floor.
         | 
         | IIRC they mainly put power hungry compute nodes for the
         | clusters in this new datacenter and I remember that servers
         | full of GPUs had crazy power draw. The water then goes through
          | a heat exchanger to help generate hot water to heat the campus
         | and for the taps.
        
       ___________________________________________________________________
       (page generated 2023-12-26 23:00 UTC)