[HN Gopher] How thermal management is changing in the age of the...
___________________________________________________________________
How thermal management is changing in the age of the kilowatt chip
Author : rntn
Score : 119 points
Date : 2023-12-26 16:10 UTC (6 hours ago)
(HTM) web link (www.theregister.com)
(TXT) w3m dump (www.theregister.com)
| ksec wrote:
| Assuming the upcoming Zen 5c was capped at 192 cores because of
| bandwidth and not thermals: we could have had 256 cores + IOD (70W).
| If every core were to use 3.6W, that is nearly 1000W for the CPU
| socket.
|
| In a 2U 2Node system, this is a potential of 1024 vCPU in a
| single server.
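The comment's back-of-envelope arithmetic, as a quick Python sketch. The 256-core part, 3.6 W/core, and 70 W IOD figures are the commenter's assumptions, not confirmed specs.

```python
# Hypothetical 256-core Zen 5c socket power, per the comment's figures.
cores = 256
watts_per_core = 3.6
iod_watts = 70

socket_watts = cores * watts_per_core + iod_watts
print(f"socket: {socket_watts:.0f} W")  # ~992 W, i.e. "nearly 1000W"

# 2U chassis, 2 nodes, 2 sockets per node:
vcpus = 2 * 2 * cores
print(f"vCPUs per server: {vcpus}")  # 1024
```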
| gfv wrote:
| In datacenters, you're mostly limited by the power (and thus
| cooling). Most commercial DCs only let you use up to about 10kW
| per rack. For standard 40U racks it's just 250W/RU, give or
| take.
|
| There are niche expensive datacenters with higher power
| density, but as it stands, exotic multi-kW hardware at scale
| makes sense if you either save a ton on per-node licensing, or
| you need extreme bandwidth and/or low latency.
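The 250W/RU figure above follows directly from the 10kW rack limit, assuming all 40U are usable for gear:

```python
# Per-rack-unit power budget for a 10 kW limit over a standard 40U rack.
rack_limit_w = 10_000
rack_units = 40
print(rack_limit_w / rack_units)  # 250.0 W per rack unit
```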
| hinkley wrote:
| I have been told by tech writers that Google discovered that
| at some point electricians will refuse to route more power
| into a building. So even if you created a separate thermal
| plant, you still have issues.
| repiret wrote:
| Some electricians just assume you're on a budget.
|
| There's always some limiting factor, and there's always
| some (possibly crazy expensive) way to resolve it and get a
| bit more power until you run into the next limiting factor.
| kuchenbecker wrote:
| You build another datacenter rather than making the
| existing one bigger.
| eternauta3k wrote:
| Really? I imagine datacenters are not reaching powers
| anywhere near those of e.g. arc furnaces.
| zamfi wrote:
| Arc furnace: peak of about 250 MW to melt the steel [1]
|
| Datacenter: seems to cap out at around 850 MW [2]
|
| Same ballpark I guess? Probably both are limited by
| inexpensive power availability + other connectivity
| factors (road/rail, fiber).
|
| [1]: "Therefore, a 300-tonne, 300 MVA EAF will require
| approximately 132 MWh of energy to melt the steel, and a
| "power-on time" (the time that steel is being melted with
| an arc) of approximately 37 minutes." via
| https://en.m.wikipedia.org/wiki/Electric_arc_furnace
|
| [2]: https://www.racksolutions.com/news/blog/how-many-
| servers-doe...
| ksec wrote:
| >Most commercial DCs only let you use up to about 10kW per
| rack.
|
| I think that was the case in 2020:
|
| >By 2020, that was up to 8-10 kW per rack. Note, though, that
| two-thirds of U.S. data centers surveyed said that they were
| already experiencing peak demands in the 16-20 kW per rack
| range. The latest numbers from 2022 show 10% of data centers
| reporting rack densities of 20-29 kW per rack, 7% at 30-39 kW
| per rack, 3% at 40-49 kW per rack and 5% at 50 kW or greater.
|
| We don't have 2023 numbers and we are coming up on 2024, but
| it is clear that demand for high power density is growing (and
| hopefully at a much faster pace).
| wmf wrote:
| You can already overclock a 96C Threadripper 7000 to 1000W.
| layer8 wrote:
| Maybe we'll be able to heat our homes with self-hosted AI in a
| few years.
| lisper wrote:
| You can already heat your house with bitcoin mining rigs. (Hm,
| that's actually not such a bad idea!)
| aftbit wrote:
| If you're going to run an electric space heater anyway, might
| as well do some mining instead. :P
| lisper wrote:
| The more I think about this the more I think there could be
| a real business model here. Make a device that looks and
| acts like a space heater but is in fact a miner loaded with
| your private keys. Market it as a "smart heater" that
| requires a wifi connection to operate. Sell it as a loss
| leader, or maybe even give it away for free. It doesn't
| have to be state-of-the-art hardware, so it can be cheap to
| make. Hmm...
| idiotsecant wrote:
| It's been done. Space heaters are like 200 bucks. Hard to
| compete with that.
| talldatethrow wrote:
| Space heaters are $20* at Walmart for basically as much
| energy draw as a typical household socket can handle.
| newaccount74 wrote:
| Your bitcoin mining rig will need three times as much power
| as a heat pump to produce the same amount of heat. I doubt it
| would be economical for most people.
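The "three times as much power" claim is a coefficient-of-performance (COP) argument: resistive heat (which is what a mining rig produces) has COP = 1, while COP ≈ 3 is an assumed typical heat-pump figure.

```python
# Heat delivered per unit of electricity, resistive rig vs heat pump.
def heat_output_w(electrical_w, cop):
    return electrical_w * cop

rig_heat = heat_output_w(1000, cop=1.0)   # 1000 W of heat from 1000 W in
pump_heat = heat_output_w(1000, cop=3.0)  # 3000 W of heat from 1000 W in
print(pump_heat / rig_heat)  # 3.0: the rig needs ~3x the electricity
```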
| winwang wrote:
| That means the electric cost of the mining rig is
| essentially 1/3 "subsidized".
| mminer237 wrote:
| Heating with electricity is a profound waste though when
| natural gas is a fraction of the price. And if you do have to
| heat with electricity, a heat pump is going to be far better
| than mere resistance.
| dist-epoch wrote:
| Unless you were going to spend that electricity on compute
| anyway. In that case might as well reuse the waste heat.
| seper8 wrote:
| However, a heat pump doesn't yield you any money.
| colechristensen wrote:
| A spa in Brooklyn is opening which is going to heat its pools
| with bitcoin miners.
|
| https://fortune.com/crypto/2023/12/21/bathhouse-nyc-bitcoin-...
| Snorap101 wrote:
| This idea has been around - here is a Microsoft paper from
| 2011, they dub it the "data furnace"
|
| https://www.microsoft.com/en-us/research/publication/the-dat...
| miksumiksu wrote:
| With district heating, you can technically heat your home with
| AI quite easily.
|
| Here is an example of large scale project from Finland.
| https://www.fortum.com/media/2022/03/fortum-and-microsoft-an...
| joe_the_user wrote:
| OK,
|
| So what's the limit of a system you could run from your own house
| (or apartment)?
|
| A single standard outlet yields 1500-1800 watts but there are
| higher voltage/amperage outlets in many houses.
| ygra wrote:
| 3.5 kW here at least (16 A x 230 V). If you replace the stove
| with a computer, you get three phases, which supplies a bit
| more.
| LeonM wrote:
| I think you are confusing a breaker group with your total
| residential connection.
|
| Assuming you are in Europe, and you have 2.5mm2 cabling
| (which is the standard for residential applications) then you
| are indeed limited to 16A per group.
|
| However, there is nothing preventing you from using multiple
| groups for one appliance. This is actually typical for high-
| power appliances, such as induction cooking.
|
| Ultimately it is your main fuse that limits your total power
| consumption, which in most European countries is typically
| rated at 25A (5750W), but on request you can usually have this
| raised to 35A, 50A or even 80A, if supply is sufficient.
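The wattages for those fuse ratings follow from single-phase P = V × I at the European 230 V nominal:

```python
# Single-phase power at 230 V for the main-fuse ratings mentioned above.
volts = 230
for amps in (25, 35, 50, 80):
    print(f"{amps}A -> {volts * amps}W")
# 25A -> 5750W, matching the figure in the comment.
```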
| CapitalistCartr wrote:
| If you're in USA or Canada, your house is probably over-wired
| and can handle some extra load. Plugging into the electric
| dryer outlet is good for 5760 continuous watts. The stove
| outlet is 9600 continuous watts. An electrician could easily
| add an additional one.
| xattt wrote:
| Tankless electric hot water heaters can be rated for 24-36 kW
| (max current draw 150 A)! These are wired with 3-4 8-gauge
| wires in parallel and connected to 40A breakers.
|
| The only limit to household wiring would be the capacity of
| the distribution coming into your home.
| repiret wrote:
| A plumber who used to do work for me told me a story
| about the first time he installed one: when he first turned
| it on, it blew a fuse at the substation a couple miles
| away.
|
| A few years later I got to know the person whose house it
| was installed in. And when the homeowner was talking about
| it he complained that the plumber didn't install a big
| enough one and he had to have it redone.
| acquacow wrote:
| That certainly shouldn't be able to happen. The top of
| panel breaker should pop with a large enough load, and
| the mains on the street can handle quite a number of
| homes with supply. There's no way a single in-home unit
| should/could pop anything at a substation. The component
| at the substation likely just failed at that time.
| xattt wrote:
| I am wondering if there were no household loads on that
| circuit that had that level of in-rush current, the
| component was on its way out, and the new load pushed it
| over the edge.
| CapitalistCartr wrote:
| Yes, they are, but most single family homes here have
| either a 150 or 200 amp main. Most families rarely exceed
| 120 amps, so adding 20-35 amps of intermittent load isn't a
| problem.
|
| On the other hand, tankless electrics often cause a service
| upgrade unless planned for new construction. Great for us
| electricians, less great for the homeowner.
| bragr wrote:
| A typical US residential breaker panel is rated for between 100
| and 200 amps so probably around there across all circuits
| without major electrical work.
| metafunctor wrote:
| For a typical house in my parts (Finland), it would be 230V and
| 3 x 25A fuses, so maximum power draw is 17kW.
|
| Most houses would be able to upgrade to 35A without extensive
| reworking of cables, getting to a 24kW maximum.
|
| Also 50A and 63A are available for consumers in most locations,
| but would require re-evaluating the cabling coming to the
| house.
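The Finnish figures above are three-phase: total power is three times the per-phase current times the 230 V phase voltage (power factor ignored for this sketch):

```python
# Three-phase power at 230 V per phase: P = 3 * V_phase * I_phase.
def three_phase_watts(volts, amps):
    return 3 * volts * amps

print(three_phase_watts(230, 25))  # 17250 -> the ~17 kW figure
print(three_phase_watts(230, 35))  # 24150 -> the ~24 kW figure
```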
| c2h5oh wrote:
| The "largest" single outlet I have is 3F32A which caps out at
| 22kW. Entire house is wired for 3 phase power capped at 125A
| per phase - just over 86kW total
| mtreis86 wrote:
| We just upgraded our lines from the street from 100A to 200A
| (at 240V) so that pair of wires can support around 48KW. I
| think my stove is 20A at 240V so the fattest wire in the house
| protected by a breaker can safely handle around 4.8KW. On the
| 120 side the biggest wire I've got is 20A, so 2.4kw. Many house
| receptacles are only 15A max, hence your 1.8kw.
| jtriangle wrote:
| Also understand that you can only use 80% of a circuit's
| capacity in a continuous fashion, so the usable power for
| compute is a fair bit lower than it seems.
| quickthrowman wrote:
| A typical (new) residential service in the US is 200A @ 240v
| single-phase, or 48kW. Assuming the circuit is protected by a
| breaker, you can use up to 38.4kW of that 48kW. If you used
| fuses instead of breakers, you could use the full 48kW.
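The 38.4kW figure is the 80% continuous-load rule applied to the full service:

```python
# 80% continuous-load derating on a 200 A @ 240 V residential service.
service_w = 200 * 240           # 48000 W total
continuous_w = 0.8 * service_w  # 38400 W usable for continuous loads
print(service_w, continuous_w)
```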
| Kirby64 wrote:
| The 80% derating only applies to 'continuous loads' which are
| defined by NEC to be anything >3 hours continuously at
| maximum current.
|
| Any circuit breaker will not trip at the rated current,
| though. They're designed not to. So, you can run all 48kW
| indefinitely without tripping a circuit breaker, assuming
| everything else is sized appropriately (i.e., wire,
| interconnects, etc).
| calamari4065 wrote:
| The limit is space, really. Well, that and money. If you're
| willing to pay, you can probably convince your utility to hook
| you up to three phase power, which is typically reserved for
| industrial use.
|
| If you want to stay with your normal residential circuit, in
| the US they're commonly 100 or 150A. 200A isn't uncommon, but
| you might have to pay for an upgrade.
|
| That leaves you with 200A*240V=480kW minus whatever you need
| for normal house things.
|
| So probably more compute than you have physical space for.
| adastra22 wrote:
| Also, in the US at least, residential power is way more
| expensive than commercial power, so at some point (much
| earlier than you'd think) you stop saving money on rack
| space rent.
| bdavbdav wrote:
| Yep - when I used to rent rack space, the space itself
| wasn't the expensive part, it was the transit and power.
| quickthrowman wrote:
| > That leaves you with 200A*240V=480kW
|
| You're off by an order of magnitude: 200 * 240 = 48000
|
| > If you're willing to pay, you can probably convince your
| utility to hook you up to three phase power, which is
| typically reserved for industrial use.
|
| Three-phase power isn't only for industrial use, (in the
| United States) small commercial buildings will have a 208v
| three-phase service drop and larger commercial buildings will
| have a 480v service drop, or 13.8kV medium voltage drop if
| it's big enough. Large enough industrial customers will have
| dedicated substations.
| mciancia wrote:
| 200A*240V is 48kW ;)
|
| Also, in Europe 3 phase power afair is fairly common.
| calamari4065 wrote:
| Math is hard, that's why I became an engineer!
| slavik81 wrote:
| I'm in Canada and have a 30A 240V breaker for the Debian ROCm
| Team's CI [1]. It's only rated for 80% continuously, so that's
| roughly 5.7 kW. In theory, the four systems currently hooked up
| to it could draw almost that much, but their workload only uses
| ~1 kW in practice.
|
| [1]: https://lists.debian.org/debian-ai/2023/12/msg00031.html
| brucethemoose2 wrote:
| If y'all missed it, the Cerebras CS-2 teardown is amazing:
|
| https://www.youtube.com/watch?v=pzyZpauU3Ig
|
| The engineering to handle that power density is insane.
| _Technically_ it's less power per mm^2, but the chip is the size
| of a dinner plate.
|
| EDIT: The video was taken down, but looks like the web archive
| got it:
|
| https://web.archive.org/web/20230812020202/https://www.youtu...
|
| As well as Vimeo (thanks morcheeba): https://vimeo.com/853557623
| webstrand wrote:
| That video is no longer available?
| brucethemoose2 wrote:
| Huh. I copied the link from here:
| https://news.ycombinator.com/item?id=37096214
|
| There are reposts with that same link.
|
| Maybe it revealed _too_ much and was taken down?
| morcheeba wrote:
| Is it the same as this video? https://vimeo.com/853557623
| brucethemoose2 wrote:
| Yeah, that's it!
| rwmj wrote:
| That video and the cooling system is insane. Insanely cool
| even. Thanks for posting.
|
| I wonder, how do all the contacts in the 20,000A power
| distribution plate that is bolted on top of the wafer-scale die
| line up? The engineering involved in just making that part work
| must be crazy.
| vpribish wrote:
| 20,000 amps has to be a mis-statement, or there's some
| qualifier they are not mentioning. It doesn't look like an
| industrial arc-furnace.
| rwmj wrote:
| She definitely says "the way we bring 20,000 amps into the
| front side of the wafer" (around 2'17'' into the video). It
| does seem an awful lot.
| FastFT wrote:
| Core voltage is going to be in the ballpark of 1V, and
| given the stated power consumption is 15kW, that means a
| minimum of 15kA. So, it is indeed a lot but the math
| checks out.
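The sanity check is just Ohm's-law-style division, I = P / V:

```python
# Current required to deliver a given power at a given voltage.
def amps(power_w, volts):
    return power_w / volts

print(amps(15_000, 1.0))   # 15000.0 A at a 1 V core voltage
print(amps(23_000, 1.15))  # ~20000 A -- the 20 kA figure
```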
| vpribish wrote:
| How much of that power actually reaches the chip, though?
| (It's hilarious that this is one chip.) This thing is
| mostly a water pump - I just can't - everything about it
| is just wild.
| Kirby64 wrote:
| Most of it. The water pump and other stuff consumes
| probably less than 500W total. There's some efficiency
| loss in the actual power converters, but they're likely
| designed to be >95% efficient (probably >98%), otherwise
| cooling them would be a nightmare.
| pwg wrote:
| This pdf (https://f.hubspotusercontent30.net/hubfs/896853
| 3/CS-2%20Data...) says the power supplies are 6+6
| redundant 4kW supplies. Assuming 100% utilization (not
| realistic, but it makes the "math" easy) that's 24kW of
| "power supply".
|
| If we presume the wafer consumes a large percentage of
| that (say 20kW out of the 24kW max) and that they are
| feeding the "wafer" with DC at 1v, then they /do/ need to
| feed in 20,000 amps to deliver 20kW of power at 1v.
|
| So yes, 20kA is a lot of current, but it is within the
| "power budget" the device seems to express in its
| marketing material.
| mikeInAlaska wrote:
| The low voltage power rails on consumer Ryzen and Intel
| chips exceed 100 amps.
| Kirby64 wrote:
| It's not. Cerebras claims the CS-2 can do 23kW peak. The
| voltage is very low. If 23kW and 20kA is right, it's a 1.15V
| core voltage, which is pretty normal these days.
|
| For comparison, one of the server-class AMD EPYC processors
| uses ~400W under peak load, and would draw approximately
| ~320A peak current. The CS-2 is only ~60x more current...
| modern CPUs use an enormous amount of current these days.
| brucethemoose2 wrote:
| They mention that the CS-2 runs at very low, efficient
| clocks compared to a CPU or GPU. So yeah, the voltage
| should be quite low.
| LASR wrote:
| 850k cores apparently.
|
| We have EPYC chips with ~100 cores at ~200W TDP. So each
| core is around 2W in the AMD chips. Core voltages are ~1V
| with modern CPUs. So that's 2A per core.
|
| 850k cores at 20kA works out to far less current per core
| than the AMD chips. Must be massively parallel,
| lower-performing cores. But it's quite feasible that it
| needs 20kA.
|
| The CS-2 system on their website specs out at 23kW peak. So
| all this lines up with each other.
|
| As far as benchmarks and utility of such systems, I am not
| sure if they've proven it out.
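Dividing out the per-core current from the thread's rough numbers (the EPYC figures are the commenter's estimates, not datasheet values):

```python
# Rough per-core current comparison, all numbers approximate.
epyc_amps_per_core = (200 / 100) / 1.0  # ~200 W, ~100 cores, ~1 V -> 2 A
cs2_amps_per_core = 20_000 / 850_000    # ~0.024 A per core
print(epyc_amps_per_core / cs2_amps_per_core)  # EPYC cores draw ~85x more
```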
| dist-epoch wrote:
| My desktop CPU uses 200 amps. At 1V.
| CamperBob2 wrote:
| They don't actually have to line up precisely. It seems to be
| an elastomeric or "Zebra strip" connector. Basically like
| those little rubber strips that you see when you take the LCD
| out of a calculator, used to bring power and data from the
| PCB to the glass surface:
| https://en.wikipedia.org/wiki/Elastomeric_connector
|
| Cerebras has evidently scaled them up a bit (and repatented
| them, according to the video).
|
| Interesting early application (if not the first) from the
| 1970s: http://www.hp9825.com/html/hybrid_microprocessor.html
| murphyslaw wrote:
| Christ, 850000 cores!
| throwawaymaths wrote:
| I hear even A100s don't have a very long service life and need
| to be replaced relatively frequently. Wonder how it is for
| Cerebras.
| samstave wrote:
| Jiminy Crickets - that Cerebras is dope - don't miss the only
| other tech vid from Rebecca: https://vimeo.com/lewie221 <--
| About CFD.
|
| Weirdly - CFD has been zeitgeisting me here on HN the last
| couple of days - I have been talking about FluidX3D - and have
| been attempting to compile it this AM locally on windows with
| failures (about to see if I can plop it into a docker)
|
| Never thought you'd be doing CFD calcs to keep a stable temp
| flow over 1.6 trillion transistors to keep them evenly cooled
| - did ya?
|
| --
|
| In watching that above CFD vid - I was led to start thinking if
| CFD could be applied to the ways AI models/GPTs communicate or
| calculate.
|
| I wonder if one could use CFD analogies for the data flows
| through AI models/systems such as OpenAI.
|
| It would be interesting to look at the OpenAI GPT Store's
| entanglements through the lens of CFD and determine where
| relations might be made for how stacking GPTs might communicate
| through their Tapestry.
|
| Any CFD-heads care to dive in?
|
| I wonder if one could treat 'token flow' in a CFD manner,
| visualizing how tokens are assigned attention scores.
|
| GPT claims to not be able to visualize the attention score
| matrix for the tokens - but assuming it could - it would seem
| as though it should be easy to visualize attention matrices in
| a CFD visual.
|
| https://i.imgur.com/jzZ1wsP.png
|
| --
|
| Or I sound like an idiot. Let's ask GPT. haha
|
| ---
|
| (Also - in super expensive machines, why aren't sheets of
| aerogel used as gaskets if you want thermal separation?
|
| Imagine taking an aerogel powder and mixing it with silicone -
| and having a super thin, flexible material, such as D3 - which
| has a melting point of 134C/273F...
|
| So - mixing aerogel with D3 as a gasket would be good, and
| with D3 being a non-Newtonian material it works well for
| shocks. A space gasket, as it were.)
| robomartin wrote:
| I can appreciate the thermal engineering that has gone into
| this. I have executed some extremely challenging thermal
| designs over time. They generally used heat transfer plates of
| the type shown in this video.
|
| My largest design was approximately 23 x 12 inches. Maintaining
| thermal efficiency and uniformity across the entire surface is
| where the real challenge lies. We had an extremely tight
| uniformity specification (0.1 deg C). This could only be
| achieved through a complex design process that entailed writing
| genetic algorithms to evolve and test solutions using FEA. In
| the end it was a combination of sophisticated impingement
| cooling and other techniques that did the job. That project was
| seriously challenging. I like projects that absolutely kick my
| butt. This one definitely did.
| allenrb wrote:
| Can you share anything more about the project? This sounds
| fascinating.
| robomartin wrote:
| Sadly, I can't share details or the application domain.
|
| I can tell you that we worked very hard to try and see if
| we could accomplish the objectives using forced air (fans).
| That effort involved laser-welded fins with sophisticated
| airflow management, techniques to break-up the boundary
| layer (which impedes optimal heat transfer) and powerful
| centrifugal fans. It worked well, yet it was large and
| sounded like a jet engine.
|
| Ultimately, while still complex, fluid-based thermal
| management offered a far more compact solution that could
| exist in a room with people not having to wear hearing
| protection. In addition to that, with a fluid-based system
| you can move the hot side to a different room.
| KMag wrote:
| Very interesting, though a bit odd to contrast air
| cooling with fluid cooling, given it takes pretty extreme
| conditions for air to not be a fluid.
| tomcam wrote:
| That's wild. Especially the rough similarity to an air-cooled 3
| cylinder engine.
| avereveard wrote:
| This site has pictures for everything but the current article.
| jl6 wrote:
| One hopes this kind of heat density in a single system is still
| more efficient than operating multiple less-power-hungry systems.
| bluenose69 wrote:
| I had to smile at the sentence "It essentially boils down to how
| much surface area you have to dissipate the heat" in the article
| :-)
| hinkley wrote:
| I can't help but think that someday we will see chips designed
| like Sierpinski gaskets, with "holes" used for thermal transport
| and the solid parts used for computation. Such chips would behave
| more or less like a toroidal structure.
|
| Now that chiplets are maturing that is a little less far-fetched.
| dist-epoch wrote:
| Why don't they build data centers where it's really cold, for
| example north of Canada, and just pipe outside air in?
|
| If you train an AI, you don't need low latency/high bandwidth
| Internet access.
| skirmish wrote:
| Google does, in Finland: https://templ.io/blog/hamina-google-
| data-center/
|
| Sea water is corrosive and hard to use for cooling.
| yetanotherloss wrote:
| They do. Google had/has one in The Dalles in Oregon next to
| the dam, and there are several in Canada and Finland.
|
| Part of the problem is consistent humidity management, but the
| other is that air, even really cold air, isn't dense enough to
| carry heat as effectively as the same volume of a denser
| working fluid.
|
| For a practical example in the other direction, look at
| combined-cycle gas turbines, which use hot combustion gas for
| the primary turbine and then recapture the waste heat as much
| denser steam for the secondary.
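The air-vs-water point comes down to volumetric heat capacity (density times specific heat), which sets how much heat a given volume of coolant carries per kelvin of temperature rise. The figures below are textbook approximations at room conditions:

```python
# Volumetric heat capacity: density [kg/m^3] * specific heat [J/(kg*K)].
air_j_per_m3_k = 1.2 * 1005     # ~1206 J/(m^3*K)
water_j_per_m3_k = 1000 * 4186  # ~4.19e6 J/(m^3*K)
print(water_j_per_m3_k / air_j_per_m3_k)  # water carries ~3500x more heat
```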
| ChuckMcM wrote:
| Getting closer to Jim Gray's "smoking hairy golf ball" (Jim also
| forecasted that chips would eventually become spherical to limit
| the lengths of the data paths.)
|
| It will be interesting to see if commercial data centers re-fit
| with cooling water transport under the floor or above the
| machines (riskier). This will be a challenge for DCs without a
| lot of space under there, presumably they could boost the floor
| height after a door transition. Still, how many of them have
| 20-40kW of power allocated per rack?
|
| It's one of the few times I miss being at Google because they
| approached this sort of problem very creatively and with an
| effectively unlimited budget to try different things. I'm sure
| their data centers are very much different from my time there!
| icefo wrote:
| I had the chance to visit the new datacenter of my college and
| on the server exhaust side there is a radiator as tall as the
| rack with cold water in it. All the pipes are under the floor.
|
| IIRC they mainly put power hungry compute nodes for the
| clusters in this new datacenter and I remember that servers
| full of GPUs had crazy power draw. The water then goes through
| a heat exchanger to help generate hot water to heat the campus
| and for the taps.
___________________________________________________________________
(page generated 2023-12-26 23:00 UTC)