[HN Gopher] DRAM thermal issues reach crisis point
___________________________________________________________________
DRAM thermal issues reach crisis point
Author : rbanffy
Score : 165 points
Date : 2022-07-18 13:37 UTC (9 hours ago)
(HTM) web link (semiengineering.com)
(TXT) w3m dump (semiengineering.com)
| anthony_r wrote:
| we're lucky that this happens at 360 Kelvin and not at 200 Kelvin
| or even lower.
| p1mrx wrote:
| Note that "kelvin" is lowercase:
| https://english.stackexchange.com/questions/329629/is-kelvin...
| 8jy89hui wrote:
| If our habitable temperature were cooler or hotter, we would use
| different materials to best reflect that environment. I'm not
| so sure it is luck.
| tinus_hn wrote:
| It's lucky in the northern hemisphere there is an easily
| recognizable star pointing almost exactly at the North Pole,
| which makes navigation much easier.
|
| It's lucky some available material worked the right way to
| make a transistor.
|
| It's lucky some person smart enough to make that work got to
| work on that.
|
| History is full of lucky coincidences like that. How many
| Einsteins have died out in the jungle, without access to our
| scientific knowledge or a way to add to it? For most of
| history and partly still today, being a scientist wasn't
| possible for just anyone; you had to be from the right
| family. It's _all_ about luck.
| H8crilA wrote:
| So let's just use those that work up to 400K :)
| somebodynew wrote:
| There is a bit of luck in even having any viable materials
| that work at the required temperature to choose from.
|
| For example, humanity hasn't been able to find a single
| appropriate material for a superconductor at room
| temperature/atmospheric pressure despite significant
| research, but a civilization living below 100 K has a myriad
| of options to choose from. Superconductors are high
| technology to us, but if your planet is cold enough then
| superconducting niobium wire would be a boring household item
| like copper wire is for us.
| dodobirdlord wrote:
| Niobium superconducts at 9.3K, so that would be a pretty
| cold household!
| marcosdumay wrote:
| Hum... We inhabit that temperature exactly because it
| allows for a wide range of chemical reactions in a
| controlled fashion.
|
| The Anthropic Principle is not luck.
|
| We are lucky that those interesting things are possible. We
| are also unlucky that many interesting things are not
| possible. But given that they are possible, it was almost
| inevitable that most of them would be possible around us.
| YakBizzarro wrote:
| well, depends how you define lucky. at cryogenic temperature,
| the leakage current of a transistor is so small that you
| virtually don't require DRAM refresh. I tested DRAM cells with
| discharge times of hours, and the transistor was not at all
| optimized. See https://www.rambus.com/blogs/part-1-dram-goes-
| cryogenic/ (not my work)
| klodolph wrote:
| It's a combination of chemistry and geometry (and other
| factors). Maybe there's some luck.
|
| There are ICs and components built for operating in extreme
| environments, like drilling. You can get SiC (silicon carbide)
| chips that operate above 200°C (473 K), if that's important to
| you. There are also various semiconductors that are worse than
| silicon at handling high temperatures, like germanium. Old
| germanium circuits sometimes don't even work correctly on a hot
| day.
|
| If we lived at 200K, I'm sure that there's a host of
| semiconductor materials which would be available to us which
| don't work at 300K.
| dusted wrote:
| Sounds like nothing a little liquid nitrogen can't fix.
|
| > (as a standard metric, about once every 64 milliseconds)
|
| 64 milliseconds? wow.. I thought they'd need refreshing way more
| often
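| For a rough sense of scale, here is a back-of-the-envelope sketch of
| the refresh cadence implied by that window, using typical DDR4 JEDEC
| figures (8192 REF commands per 64 ms window; exact values vary by
| density and vendor, so treat the numbers as illustrative):
|
|     # Approximate DRAM refresh cadence (illustrative DDR4-style numbers).
|     RETENTION_WINDOW_MS = 64    # all rows must be refreshed within this window
|     REFRESH_COMMANDS = 8192     # REF commands spread across the window
|
|     tREFI_us = RETENTION_WINDOW_MS * 1000 / REFRESH_COMMANDS
|     print(f"average refresh interval: {tREFI_us:.2f} us")      # ~7.81 us
|
|     # Above ~85°C the window is typically halved to 32 ms, doubling the
|     # refresh rate and the bandwidth/energy spent on refresh.
|     print(f"hot-temperature interval: {tREFI_us / 2:.2f} us")  # ~3.91 us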
| brutusborn wrote:
| I loved this part at the end: "By contrast, allowing a
| temperature increase for chips in large data centers could have
| surprising environmental benefits. To this point, Keysight's
| White recalled that a company once requested JEDEC increase the
| spec for an operating temperature by five degrees. The estimate
| of the potential savings was stunning. Based on how much energy
| they consumed annually for cooling, they calculated a five degree
| change could translate to shutting down three coal power plants
| per year. JEDEC ultimately compromised on the suggestion."
| JJMcJ wrote:
| I've heard of some large companies that run their data centers
| hot.
|
| Cheaper to have a slightly higher failure rate, or have the
| computers throttle their clock speed, than to pay for extra air
| conditioning.
| woleium wrote:
| It could also be that it's cheaper to extend the
| life of an older DC by accepting higher temperatures and
| failure rates than to upgrade the HVAC to accommodate newer,
| higher-density designs.
| klysm wrote:
| Hard to do the math here though because you don't know the
| failure statistics in advance. Kind of a multi-armed bandit
| problem of sorts.
| magicalhippo wrote:
| Wouldn't the Arrhenius equation[1] be a good approximation?
| It's used in the industry[2] from what I know.
|
| Of course you'll need some data for calibrating the model,
| but if you got that?
|
| [1]: https://en.wikipedia.org/wiki/Arrhenius_equation
|
| [2]: https://www.ti.com/lit/an/snva509a/snva509a.pdf
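| As a minimal sketch of how such an Arrhenius-based estimate is used
| (the 0.7 eV activation energy below is only a placeholder assumption;
| real values depend on the failure mechanism and must come from
| calibration data, as noted above):
|
|     import math
|
|     K_BOLTZMANN_EV = 8.617e-5  # Boltzmann constant in eV/K
|
|     def acceleration_factor(t_use_c, t_stress_c, ea_ev=0.7):
|         """Arrhenius acceleration factor between two temperatures (in °C)."""
|         t_use, t_stress = t_use_c + 273.15, t_stress_c + 273.15
|         return math.exp(ea_ev / K_BOLTZMANN_EV * (1 / t_use - 1 / t_stress))
|
|     # e.g. how much faster the same mechanism ages at 90°C vs. 85°C
|     print(acceleration_factor(85, 90))  # ~1.4x under these assumptions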
| buescher wrote:
| Well, sometimes, and more frequently now at really small
| process nodes, but mostly no. Most electronic failures
| are basically mechanical and thermal cycling will cause
| fatigue failures more than elevated temperatures will
| accelerate things like electromigration. Lots of people
| still use 1960s style handbook methods anyway because
| there's no plug-and-chug replacement.
|
| The groundbreaking work here was by Michael Pecht back in
| the early nineties:
| https://apps.dtic.mil/sti/pdfs/ADA275029.pdf
| gjsman-1000 wrote:
| _Only three?_ That's not an immediate win. What if the
| temperature increase causes ever so slightly more failures,
| causing ever so slightly more replacements, and each replacement
| requires energy to make, ship, install, replace, and recycle,
| plus the effects of increased demand... What if it lasts slightly less
| long, causing more early failures and eWaste? After all that
| potential risk, is it a benefit still, and if so, how much?
|
| We don't know and it is hard to know - but I don't blame JEDEC
| and would not call it a "compromise" on their part like it was
| a superior option.
| Spooky23 wrote:
| For one company? That's pretty impressive.
|
| When I was on an architecture team that consolidated ~80
| datacenters to 3 circa 2010, this was a key dollar driver. We
| raised the temperature ~6 degrees from the average temp,
| which meant kicking out a few vendors initially. The cost
| savings for doing that were essentially the total operational
| costs of 5 datacenters.
|
| The annual failure rates for the hardware did not change at
| all by any metric. Number of service impacting hardware
| failures went to zero due to the consolidation.
|
| In general, if you operate within the operating ranges of
| your hardware, you won't have failures. You will have
| complaints from employees, because computers will operate at
| temperatures not comfortable for humans.
| benlivengood wrote:
| It's almost certainly Google, since they've historically
| run their data centers hotter than most [0]. Cooling
| efficiency increases with a higher delta-T to the working
| fluid, and Google uses a gigawatt or two continuously [1].
| From PUE numbers that's hundreds of MW spent on cooling, so
| making it more efficient is quite worth it.
|
| [0] https://www.google.com/about/datacenters/efficiency/
| [1] https://www.cnbc.com/2022/04/13/google-data-center-
| goal-100p...
| sllabres wrote:
| We did the same several years ago. At the time I found
| http://www.cs.toronto.edu/~bianca/papers/temperature_cam.pdf
| quite interesting. I didn't find many other precise papers
| about issues when running at higher temperatures, but
| many "should" and "can".
|
| Based on ASHRAE, there are two guidelines from HPE and IBM:
| https://www.chiltrix.com/documents/HP-ASHRAE.pdf
| https://www.ibm.com/downloads/cas/1Q94RPGE
|
| We found (by measurement) that some places in the
| datacenter with suboptimal airflow are well over the
| average or simulated temperature so one can leave the safe
| temperature envelope if one isn't careful.
| Spooky23 wrote:
| The beauty of a big project like this is that you get the
| engineering resources to make sure the datacenter is
| working right.
|
| The hyper scale people take this to the next degree.
| Freestyler_3 wrote:
| Don't you want to have AC running in cooling mode or
| dehumidification mode to get water out of the air?
|
| edit: this makes me wonder, what is the ideal humidity in a
| data centre? Is too dry a thing?
| Dylan16807 wrote:
| Dry air makes static electricity build up more.
| picture wrote:
| C'mon... logically it has to be more than just three, right?
| "Cooling efficiency" sometimes comes in units of W/°C
| difference, so I'd imagine that a few more degrees would be a
| huge deal.
| marcosdumay wrote:
| You usually have to budget a few °C of difference just
| for pushing enough energy through the heat exchangers. So the
| ratio of chip temperature to external temperature is lower
| than the ratio that effectively determines the cooling
| efficiency.
| gjsman-1000 wrote:
| I don't know how much power efficiency would be saved - my
| concern is more that it is completely logical that running
| any part at higher temperatures causes increased risk of
| failure, whether it be a computer part or a mechanical
| part. _How much?_ I don't know - I just don't blame JEDEC
| for recognizing this is not a clear and obvious win.
|
| Imagine if the failure rate were raised by as little as 1%.
| RAM failure is not uncommon compared to other parts - I've
| had it happen before and render a system unable to boot;
| that's why we have Memtest86 and not CPUtest86 or
| SSDtest86. A 1% increase in failures over 5 years would have
| effects just as striking as the power savings from
| increasing the temperature. How many smartphones
| would be junked? How many PCs would be thrown out for not
| working by people who are average Joes who can't diagnose
| them, and the extra waste that generates from both
| disposing of the old PC and purchasing a new one? Perhaps the
| new PC is more efficient, but which is better: the greater
| emissions from keeping the old one, or more e-waste in the
| ground from replacing it with a likely more efficient new one?
|
| The point is that it is not a clear win. With further
| research it might be, and I might be all for it. I'm only
| nitpicking the description of it as being a "compromise" as
| though it were obvious.
|
| [@picture: I'm at my posting limit for the day because HN
| is, well... I'll leave their censorship policies for
| another day. I would agree with you if the RAM with the 90C
| limit were strictly ECC RAM because that is most often used
| in data centers and not consumer parts. Maybe we have non-
| ECC/85 RAM and ECC/90 RAM options...]
| dcow wrote:
| Well now you're just being hyperbolic. As you say, this
| is an engineering problem, so solutions are far from their
| ideal states in either direction. However, a 1% increase
| in RAM failure rates ruining the world? That doesn't
| sound right. Errors are encountered in RAM _all the time_
| and guess what, they're corrected often by the hardware
| before even bothering the system. I'm sure we could deal
| with a 1% increase...
| Spooky23 wrote:
| Most datacenter hardware is fine at 95 degrees F (inlet
| temp). Approved configurations are usually available to
| 105 degrees F or slightly higher. Some devices can run as
| high as 130F.
|
| In the operating range, you're not going to have any
| measurable change in operations or failure rate - if you
| do, the parts are defective. All of the stories you hear
| about this and that are conjecture.
| smolder wrote:
| Interestingly, computer chips can often be run at lower
| voltage and wattage for a given frequency if they are
| kept at a colder temperature. As a home user I can
| significantly reduce power draw for a CPU/GPU by
| improving the cooling solution and lowering voltages.
|
| The reasons this doesn't work for datacenters are two-
| fold, I think: First, they won't see efficiency
| improvements just by keeping their CPUs and GPUs (or RAM)
| cooler because the power levels/tables for the chips are
| baked-in, and operators aren't going to the trouble of
| tweaking voltages themselves. Second, even if they did
| tweak voltages, the cost of sustaining lower temperatures
| with better cooling likely won't outweigh the savings
| resulting from lower power draw for the chips.
|
| Still, this raises the question of whether designing
| hardware for higher operating temperatures is always the
| right move. At some point there's going to be a cost in
| performance and/or efficiency that outweighs the savings
| from allowing higher temperatures. Ideally these
| tradeoffs should be balanced as a whole.
| kllrnohj wrote:
| I think you missed the biggest reason this doesn't do
| much for servers - they _already_ run at low frequencies
| & voltages.
|
| For example take the Epyc 7742, at 225W it sounds super
| power hungry. But the 64-core chip only boosts to 3.4ghz
| max (2.25ghz base). That's less than the base clock of
| almost any of the Ryzen consumer CPUs. And if you look at
| the lower frequency end of
| https://images.anandtech.com/doci/16214/PerCore-1-5950X.png
| there's not a whole heck of a lot of efficiency gains likely
| to be had below that ~3-3.4GHz mark. They're already
| basically sipping power at something like 3W per CPU core or
| less. 225W / 64c = 3.5W/c, _but_ the IO uncore isn't exactly
| cheap to run
| and iirc sits more like in the 50-70w range. So subtract
| that out and you're at more like 2.5-2.7w/c. I don't
| think throwing cooling at this is really going to get you
| much of a gain.
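| Roughly, the per-core budget being described, treating the 50-70 W
| uncore figure as an estimate rather than a spec:
|
|     # Rough per-core power budget for the Epyc 7742 example above.
|     TDP_W = 225
|     CORES = 64
|     for uncore_w in (50, 70):            # assumed IO-die/uncore power
|         per_core_w = (TDP_W - uncore_w) / CORES
|         print(f"uncore {uncore_w} W -> ~{per_core_w:.1f} W per core")
|     # ~2.7 and ~2.4 W per core - little room left for undervolting gains.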
| bradstewart wrote:
| This usually isn't true _at scale_ though. The chip
| manufacturers do a ton of validation and qualification to
| set the operating parameters (voltage, etc).
|
| You can undervolt (or overclock) one specific chip,
| individuals have been doing this at home for basically
| ever, but there's (almost) always a system-specific
| validation process you then do to make sure the system is
| stable for a specific workload with the new parameters.
|
| And these parameters differ between batches of chips, or
| even between chips within a batch.
|
| It's also significantly harder to drastically reduce the
| temperature of the chips inside of a single server, given
| the machine density of a typical data center.
| rbanffy wrote:
| > It's also significantly harder to drastically reduce
| the temperature of the chips inside of a single server,
| given the machine density of a typical data center.
|
| It'd be fun, however, if we could dynamically adjust that
| according to workload. If workload is light, you could
| consolidate load into fewer sockets/memory sticks and
| power down everything in that socket.
| picture wrote:
| That's for sure. I 100% agree that increased temperature
| will statistically increase failure rate. I'm just
| thinking that, the most common mechanisms of thermal
| failure in electronics involve repeated thermal
| cycling, which causes fatigue and stress failures at
| interconnects (solder bumps, silicon bonding, etc.). Data
| centers are designed to be operated in a relatively very
| constant temperature environment, so I would suspect that
| the failure rate may not be raised significantly.
| uoaei wrote:
| One single company changing one single design parameter and
| enabling savings on the scale of _multiple power plants_?
| That is as immediate as wins get.
| __alexs wrote:
| Maybe RAM will finally get more than a 4mm thermal pad and a
| random bit of Alu for cooling. Seems like most cooling designs
| have treated RAM as even more of an afterthought than VRMs up
| until recently.
|
| Even in most servers the accommodation for RAM cooling has
| basically just been orientating the DIMMs to line up with
| airflow. They are still packed together with minimal clearance.
| dcow wrote:
| > cooling has basically just been orientating the DIMMs to line
| up with airflow
|
| Isn't that server cooling in a nutshell? Ram high volumes of
| airflow through the chassis with stupidly loud fans and hope
| the parts stay cool?
| dodobirdlord wrote:
| You still need to conduct the heat away from the sources to a
| radiator of some sort, since cooling is proportional to
| surface area and it's much easier to increase surface area by
| adding fins than by increasing airflow. You can only speed up
| the air to a certain point, past which better cooling becomes
| a matter of shaping the components for more contact with the
| air.
| kllrnohj wrote:
| Sure but the airflow over DIMMs in a server chassis is
| already _vastly_ more cooling than RAM gets in any consumer
| application other than GDDR on GPUs.
| __alexs wrote:
| The density is also vastly higher.
| Ekaros wrote:
| Yeah, kinda weird that on ATX the RAM is placed in a way
| that is perpendicular to the usual CPU cooling or even
| general airflow. The top-mounted fans do change this, but I don't
| think those are very common.
| sbierwagen wrote:
| Makes it easier to keep all the traces the same length:
| https://electronics.stackexchange.com/questions/74789/purpos...
| kllrnohj wrote:
| Although the entire socket can be rotated 90° for even
| better traces, which is what the EVGA Kingpin
| motherboards do (
| https://www.evga.com/articles/01543/EVGA-Z690-DARK-KINGPIN/ )
| mjevans wrote:
| This design would have made so much more sense before
| top-of-case closed-loop watercooler radiator setups
| became popular.
|
| I still like this a lot, but now the top down fan and
| some kind of ducting to help direct the air out the top /
| side vent makes more sense. There's so much heat these
| days everyone needs the baffles inside of a case.
| AshamedCaptain wrote:
| I am not sure how much this is related to external cooling
| versus actual internal thermal dissipation. DDR JEDEC
| standards have actually decreased power consumption with every
| generation.
| __alexs wrote:
| They have reduced voltage but power consumption per sq-mm has
| gone up with increased densities. Many people run DRAM at
| above JEDEC speeds which usually requires higher voltages
| too.
|
| Peak power consumption of DDR4 is around 375mW/GB @ 1.2V,
| DDR5 drops this by about 10% but also increases the maximum
| density of a DIMM by 8x to 512GB, which is something like 150W
| for a single DIMM.
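| A quick sanity check of that worst case, taking the figures above at
| face value (they are rough estimates, not datasheet numbers):
|
|     # Ballpark peak power for a max-density DDR5 DIMM.
|     ddr4_w_per_gb = 0.375                 # ~375 mW/GB peak, per the estimate above
|     ddr5_w_per_gb = ddr4_w_per_gb * 0.9   # "drops this by about 10%"
|     module_gb = 512
|     print(f"~{ddr5_w_per_gb * module_gb:.0f} W peak")  # ~173 W, same ballpark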
| formerly_proven wrote:
| There are only three (tiny) 12 V power pins on a DDR5
| module, neither that nor the form factor allows for
| dissipating anywhere close to 150 W. The teased 512 GB
| Samsung module doesn't even have a heatspreader.
| __alexs wrote:
| VIN_BULK is 5V with a max current of 2A but every data
| pin provides current that is used on the DIMM in some
| respect.
| deelowe wrote:
| There's been talk of eliminating sockets for years. Something
| has got to give.
| jnwatson wrote:
| You can still actively cool socketed RAM.
| deelowe wrote:
| Sort of. Trace length is already a nightmare.
| zeroth32 wrote:
| more compact chips will have a higher failure rate. Not a
| great idea for servers.
| to11mtm wrote:
| There's a fun curve on this to be sure.
|
| If I had to guess, Servers would not go any further than
| some sort of memory-backplane where the memory for multiple
| channels was integrated onto a single PCB.
|
| Even then, IIRC hot-swapping of memory modules is a thing
| for some servers, so that will have to be handled somehow.
| wallaBBB wrote:
| Question that comes to mind: is the M2 (with the thermal issues on
| the new Air) affected, considering how the RAM is packed there?
| Toutouxc wrote:
| What thermal issues? All I've seen so far are people who don't
| seem to understand how passive cooling works, despite the M1
| Air being out for two years and working the same way.
| nostrademons wrote:
| The M2 chip generates more heat than the M1, with 20% more
| transistors and about a 12% higher clock speed. M2 Mac Pro
| has thermal issues compared to M1 Mac Pro as well, even with
| the fan.
| buryat wrote:
| mac pro doesn't have m1/2
| ywain wrote:
| They were likely referring to the laptop Macbook Pro, not
| the desktop Mac Pro.
| webmobdev wrote:
| Perhaps OP came across this recent article - _Reviewers
| agree: The M2 MacBook Air has a heat problem_ -
| https://www.digitaltrends.com/computing/m2-macbook-air-
| revie... .
| GeekyBear wrote:
| Throttling under load isn't a heat problem.
|
| This review of Lenovo's Thinkpad Yoga is what a heat
| problem looks like:
|
| >Unfortunately, the laptop got uncomfortably hot in its
| Best performance mode during testing, even with light
| workloads.
|
| https://arstechnica.com/gadgets/2022/07/review-lenovos-
| think...
|
| Too hot to comfortably touch, even under light workloads,
| unless you set it to throttle all the time? That's a heat
| problem.
| tedunangst wrote:
| Is it really a problem if it's designed to thermally
| throttle?
| EricE wrote:
| It is if you are expecting maximum performance.
| tinus_hn wrote:
| Perhaps that's the problem. Their expectations are
| unrealistic. Did Apple promise no thermal throttle?
| Dylan16807 wrote:
| Marketing usually talks about unthrottled speed only,
| including Apple's here as far as I have seen.
| jhallenworld wrote:
| Maybe DRAM becomes non-viable, so switch to SRAM. Which is
| denser, 14 nm DRAM or 5 nm SRAM?
| 55873445216111 wrote:
| SRAM is ~10x higher cost per bit (due to memory cell size) than
| DRAM
| [deleted]
| to11mtm wrote:
| DRAM.
|
| IIRC TSMC's 135MBit 5nm example is 79.8mm^2, although that's
| got other logic.
|
| In the abstract, a 0.021 square-micrometer-per-bit size [1]
| says you'd need about 21mm^2 for a gigabit (base 10) of 5nm
| SRAM, without other logic.
|
| Micron claimed 0.315Gb/mm^2 on their 14nm process, [2] so
| somewhere between a factor of 6 and 7.
|
| That said, my understanding is that there is some sort of wall
| around 10nm, where we can't really make smaller capacitors and
| thus the limitation on things. (This may have changed since I
| last was aware however.)
|
| (There is also the way that 'nm' works these days... but I'm
| not qualified to speak on that)
|
| Also, AFAIK SRAM is still broadly speaking more power hungry
| than DRAM (I may be completely out of date on this though...)
|
| [1] - https://fuse.wikichip.org/news/3398/tsmc-details-5-nm/
|
| [2] - https://semiengineering.com/micron-d1%CE%B1-the-most-advance...
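| A quick sketch of the density comparison implied by those two figures
| (cell-area numbers taken from the sources linked above; everything
| else is just unit conversion):
|
|     # 5 nm SRAM: ~0.021 um^2 per bit; 14 nm-class DRAM: ~0.315 Gbit/mm^2.
|     sram_um2_per_bit = 0.021
|     dram_gbit_per_mm2 = 0.315
|
|     sram_mm2_per_gbit = sram_um2_per_bit * 1e9 / 1e6  # ~21 mm^2 per Gbit
|     dram_mm2_per_gbit = 1 / dram_gbit_per_mm2         # ~3.2 mm^2 per Gbit
|     print(f"DRAM density advantage: ~{sram_mm2_per_gbit / dram_mm2_per_gbit:.1f}x")
|     # ~6.6x, i.e. "somewhere between a factor of 6 and 7"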
| Victerius wrote:
| > A few overheated transistors may not greatly affect
| reliability, but the heat generated from a few billion
| transistors does. This is particularly true for AI/ML/DL designs,
| where high utilization increases thermal dissipation, but thermal
| density affects every advanced node chip and package, which are
| used in smart phones, server chips, AR/VR, and a number of other
| high-performance devices. For all of them, DRAM placement and
| performance is now a top design consideration.
|
| I know this may not be a cheap solution, but why not start
| selling pre-built computers with active cooling systems?
| Refrigerant liquids like those used in refrigerators or water
| cooling could be an option. The article addresses this:
|
| > Although it sounds like a near-perfect solution in theory, and
| has been shown to work in labs, John Parry, industry lead,
| electronics and semiconductor at Siemens Digital Industries
| Software, noted that it's unlikely to work in commercial
| production. "You've got everything from erosion by the fluid to
| issues with, of course, leaks because you're dealing with
| extremely small, very fine physical geometry. And they are
| pumped. One of the features that we typically find has the lowest
| reliability associated with it are electromechanical devices like
| fans and pumps, so you end up with complexity in a number of
| different directions."
|
| So instead of integrating fluids within the computer, build
| powerful mini-freezers for computers and store the computer
| inside. Or split the warm transistors from the rest of the build
| and store only those inside the mini freezer, with cables to
| connect to the rest of the computer outside.
| CoolGuySteve wrote:
| I've always wondered why motherboards aren't placed at a slight
| angle like the wing of a car so that the air moving over it has
| a higher angle of incidence, higher pressure, and higher
| thermal capacity.
|
| With the angle, you can also place cable connectors and whatnot
| on the bottom of the board so they don't obstruct airflow as
| much.
|
| Basically, optimize PV = nRT inside the computer case at no
| extra cost other than a redesign.
| saltcured wrote:
| I'm struggling slightly to envision the effect you are
| seeking. My motherboards don't tend to be flying through the
| air and so lack a well-defined angle of attack... :-) There
| already exist horizontal and vertical motherboard mounts in
| different computer cases, including ones that could be stood
| either way to suit the desktop. In my experience, this
| doesn't affect cooling that much.
|
| I think the fan, internal baffle, and vent positions dominate
| the airflow conditions inside the case. So, rather than
| tilting a motherboard, wouldn't you get whatever you are
| after with just a slight change in these surrounding
| structures?
| CoolGuySteve wrote:
| You seem to be ignoring that all the punch through
| connectors on a board are currently on the side that air
| must pass over.
|
| Furthermore, I've never seen a case, either desktop or
| rackmount, that allows one to angle the fans at anything
| other than a 90 degree angle or parallel to the board.
|
| None of this makes sense in terms of fluid dynamics.
| saltcured wrote:
| Having a smooth board seems at odds with having a large
| surface area for heat transfer, doesn't it? And wouldn't
| laminar flow also have less movement near the surface?
| For optimal cooling, would you actually want turbulence
| to mix the layers? Instead of mounting fans at different
| angles, add some vanes or even duct work to aim and
| accelerate the flow where it needs to transfer heat.
|
| But, given that boards do not have completely
| standardized layouts, it seems like you eventually need
| to assume a forest of independent heat sinks sticking up
| in the air. You lose the commodity market if everything
| has to be tailor made, like the integrated heat sink and
| heat pipe systems in laptops.
| SketchySeaBeast wrote:
| I would assume because the things with the greatest heat have
| typically had such a requirement for active cooling that
| minor optimization wouldn't have helped much and for
| everything else you really didn't worry about (though my
| motherboard now has heat-pipes across the VRMs and my RAM and
| northbridge have got big old heat spreaders).
| CoolGuySteve wrote:
| Yeah, the way my case is laid out, airflow to the VRM is
| blocked by the heat spreaders on the RAM and the ATX power
| connector. AMD systems in particular seem to require better
| memory thermals.
|
| It seems like we're reaching a point where a new ATX
| standard is required to ensure the memory and GPU can make
| contact with a large heatsink similar to how the trashcan
| Mac Pro and XBox Series X are designed. Doing so would also
| cut down on the ridiculous number of fans an overclocked
| gaming PC needs these days, my GPU and CPU heatsinks have 5
| 80mm fans mounted to them.
|
| ATX is great but it seems like only minor improvements to
| power connectors and whatnot have been made since it was
| introduced in 1995.
| Macha wrote:
| Are the trashcan Mac Pro and Xbox Series X considered
| efficient cooling solutions? I thought the trashcan Pro
| had issues at higher temperatures, which limited
| its ability to use higher-end parts and in turn forced
| the return of the cheese grater?
|
| The Series X GPU is considered equivalent to a
| desktop 3070, and laptop 3080s exist and are also
| considered equivalent to a desktop 3070, so they don't require
| anything particularly novel in terms of cooling solutions
| (3080 laptops are loud under load, but so is the series
| X).
|
| Overclocked components are so heavy in cooling needs as
| they're being run so far outside their most efficient
| window to get the maximum performance - which is why
| datacenters which care more about energy usage than
| gamers tend to use lower clocked parts.
| CoolGuySteve wrote:
| Both systems are a fraction of the size of an ATX case
| and as efficient as they needed to be to meet their
| predetermined convective cooling needs. In both cases,
| profit margin is increased by reducing material and
| shipping volume requirements.
|
| A similar single heatsink design for high end PCs would
| need to be much larger than either of those designs but
| considering how much empty space is in an ATX case, I
| don't think it would be much larger than current PCs.
|
| Consider that the best PC cooling solutions all look like
| this: https://assets1.ignimgs.com/2018/01/18/cpucooler-1280-149617...
|
| Or they pass liquid through a radiator of comparable volume.
| Standardizing the contact points for a single-block
| heatsink with larger fans would make computers more
| efficient and quieter.
| picture wrote:
| It won't be a simple redesign to tilt boards "slightly"
| because manufacturing processes that are already honed in
| need to be completely retooled, with likely more complexity
| (many different lengths of standoffs per board?).
|
| And additionally, there are only a few key components of a
| motherboard that need cooling. Most of the passive components
| like the many many decoupling capacitors don't generate
| significant heat. The components that do require access to
| cool air are already fitted with finned heat sinks and even
| additional fans. They interact with the air enough that a
| slight tilt cannot make a meaningful difference.
|
| Basically just adding a small piece of aluminum to key areas
| will work better than angling the whole board.
| kllrnohj wrote:
| You don't really need to pass any air over the PCB, though.
| Anything that needs cooling sticks up above it. Also the
| airflow through a case isn't perfectly parallel to the
| motherboard PCB anyway. GPU fans throw the air in all sorts
| of directions, including straight down into the motherboard.
| And so do CPU coolers.
|
| Cables also don't really obstruct the airflow like at all.
| dangrossman wrote:
| The article mentions that the automotive industry demands some
| of the largest temperature ranges for these parts. New cars are
| basically computers on wheels (especially something like a
| Tesla), and the cabin on a hot day under a glass roof can
| easily exceed 170F. Where will the freezer you build around all
| the computers go, and how will it be powered while the car is
| sitting parked in a lot?
| outworlder wrote:
| > I know this may not be a cheap solution, but why not start
| selling pre-built computers with active cooling systems?
| Refrigerant liquids like those used in refrigerators or water
| cooling could be an option.
|
| Before going into water cooling, a change in form factor to
| allow for better airflow (and mounting of larger heat sinks)
| would be in order.
|
| Water cooling would require a water cooling block, not sure how
| it would work with the current form factor.
|
| > So instead of integrating fluids within the computer, build
| powerful mini-freezers for computers and store the computer
| inside. Or split the warm transistors from the rest of the
| build and store only those inside the mini freezer, with cables
| to connect to the rest of the computer outside.
|
| That's impractical. You are heat exchanging with the air, then
| you are cooling down the air? Versus exhausting the hot air and
| bringing in more from outside. You just need to dissipate
| heat; active cooling is not needed.
| kube-system wrote:
| Heat pipes are the phase-change cooling solution that solves
| all of those issues. People don't really think of their cheap
| laptop as having a phase-change liquid cooling system, but it
| actually does.
| _jal wrote:
| For most commercial use, you're talking about refrigerated
| racks. They exist, but they're pretty niche.
|
| In a typical data center, all this does is decentralize your
| cooling. Now you have many smaller (typically less robust)
| motors to monitor and replace, and many drain lines much closer
| to customer equipment and power.
|
| Those units take up a lot more space, too, because of the
| insulation.
| tbihl wrote:
| The elevated temperatures of the overheating components are
| such that fluid flow, not temperature difference, is the thing
| to go after, and it also has the advantage of being much
| simpler than adding a whole refrigeration cycle.
|
| These problems start to read like problems from nuclear power,
| where sufficiently uniform flow is a huge deal so that various
| materials aren't compromised in the reactor.
| beckingz wrote:
| Condensation in most environments gets really rough on
| computers.
|
| In theory you can eliminate condensation.
|
| But in practice, there's a difference between theory and
| practice.
| tonetheman wrote:
| I would hang the memory upside down so that condensation goes
| away from the electronics, then put in a catch tray at the
| bottom for evaporation.
|
| I am sure there is a lot more to it than that though... ha
| beckingz wrote:
| More of an issue on the motherboards where it will
| eventually get into something.
| dclowd9901 wrote:
| Heat also moves upward so that would probably cause the
| board and its components to get too hot.
| dtx1 wrote:
| Why not integrate the RAM into the package like Apple does
| anyway and use a slightly larger SoC cooling solution for the
| chips? Or just attach heatspreaders to RAM modules (like gaming
| modules) and add a fan for them, like servers already do due to
| their general front-to-back airflow design. The only thing you
| can't do anymore is rely on the passive cooling of the chip's
| own surface, something CPUs haven't been able to do since the
| early '90s.
| toast0 wrote:
| Apple does a ram on top system right?
|
| That's not going to be viable for servers for two big
| reasons:
|
| a) it would be a major capacity limitation; you're not
| fitting 8-16 DIMMs worth of RAM on top of the CPU. Sure, not
| everyone fills up their servers, but many do.
|
| b) if you put the ram on top of the cpu, all of the cpu heat
| needs to transit the ram, which practically means you need a
| low heat cpu. This works for Apple, their laptop cooling
| design has never been appropriate for a high heat cpu, but
| servers manage to cool hundred watt chips in 1U through
| massive airflow, so high heat enables more computation.
|
| Heatspreaders may make their way into server ram though
| (although not so big, cause a lot of servers are 1U)
|
| Otoh, the article says
|
| > 'From zero to 85°C, it operates one way, and at 85° to
| 90°C, it starts to change,'" noted Bill Gervasi, principal
| systems architect at Nantero and author of the JEDEC DDR5
| NVRAM spec. "From 90° to 95°C, it starts to panic. Above
| 95°C, you're going to start losing data, so you'd better
| start shutting the system down."
|
| CPUs commonly operate in that temperature range, but RAM
| doesn't pull that much power, so it doesn't get too much
| above ambient as long as there's some airflow, and if ambient
| hits 50°C, most people are going to shut down their servers
| anyway.
| kube-system wrote:
| Maybe we could architect servers with more CPU packages and
| fewer cores per package?
|
| Maybe instead of 32 RAM packages and 4 CPU packages, we
| could have 16 CPU packages each with onboard RAM?
| nsteel wrote:
| Will these CPUs talk to each other with a similar latency
| hit as we get from talking to DRAM today?
| __alexs wrote:
| The M1/M2 has the RAM on the same package as the CPU but
| it's not actually on top of the die, it's adjacent to it.
| Here's a pic of one someone on reddit delidded
| https://imgur.com/a/RhGk1xw
|
| Obviously this is still a lot of heat in a small space but
| it does mean the cooler gets to have good coupling with the
| die rather than going all the way through some DRAM first.
| toast0 wrote:
| That's more tractable. Gotta make sure everything hits
| the same z-height and the contact patches are right...
| But you still have a capacity issue.
| SketchySeaBeast wrote:
| Big old SoCs really do seem like the future. CPU, GPU, RAM,
| motherboard controllers - throw all those different problems
| onto a big old die and optimize for cooling that guy.
| foobiekr wrote:
| SOCs are harder, not easier, to cool.
| AtlasBarfed wrote:
| Yeah I don't understand why a dedicated fan and other basic
| CPU cooling techniques don't apply here. It's probably
| because the DRAM industry doesn't want to change form factors
| and standards to a substantial degree...
|
| ... probably because they do the bare minimum to keep up with
| CPU design and routinely get busted for cartel price fixing
| and predatory pricing.
| dtx1 wrote:
| I mean literally this https://youtu.be/TFE9wfAfudE?t=611
| Problem solved
| jackmott42 wrote:
| Active cooling tends to have the challenge of controlling
| condensation, and then of course now you are drawing even MORE
| power from the wall.
| mrtranscendence wrote:
| I've seen YouTube videos of overclockers employing
| refrigeration techniques (or coolants like liquid nitrogen),
| and it does seem like condensation is a major issue. Maybe
| that's not as much of a problem at more reasonable
| temperatures?
|
| But yeah, I'd be just as or more concerned about the amount
| of power it would take to run a freezer like that ... I'm
| already drawing as much as 850 watts for my PC, with a max of
| a couple hundred watts for my OLED TV and speakers, and don't
| forget the modem and router, and a lamp to top it all off;
| would a powerful enough mini freezer to cool my PC even fit
| on the circuit?
|
| Actually, it's even worse because I've got an air purifier
| running there too ... but I could move that, I suppose.
| Ekaros wrote:
| Cascade cooling is a fun thing. The next step after water
| cooling before getting to liquid nitrogen...
|
| Still, I wouldn't really go for that. Knowing how noisy the
| average compressor and fan for that size is. I much prefer
| my nearly silent fan cooled machine...
| EricE wrote:
| If you are going to go extreme enough to have a
| compressor and fan, you can always put them in another
| room :p
| snarfy wrote:
| The biological solution to leaks is clotting. Do we have
| cooling liquids that clot like blood does, say when exposed to
| oxygen?
| kansface wrote:
| Great, now your computer can have a thrombosis or a stroke!
| xxpor wrote:
| the reliability there isn't particularly great ;)
| mgsouth wrote:
| 50-100 yrs between major overhaul? When's the last time you
| had to manually top-up or bleed air out of your circulatory
| system? I'd say that's impressively robust.
| chmod775 wrote:
| It will clot inside the cooling circuit because of the air
| within it, or the air that will get into it.
|
| However, there are ways to prevent and detect leaks in current
| systems using negative pressure:
| https://www.youtube.com/watch?v=UiPec2epHfc
| SketchySeaBeast wrote:
| So you're taking a 300W-1000W space heater and putting it into
| a freezer that needs to be able to bleed that much heat? Going
| to need another breaker.
| Victerius wrote:
| I'm just brainstorming. I can troubleshoot my computer and
| write basic code but I'm not a computer engineer.
| 7speter wrote:
| This has come up often in the comments section of articles
| I've seen about prospective 600-900w 40 series nvidia cards.
| SketchySeaBeast wrote:
| Honestly, the fact that my 3080 can draw 400W makes me kind
| of sick and I limit FPS specifically so it doesn't. I can't
| ever see myself buying a card that draws double that.
| max51 wrote:
| You can reduce the power limit a lot on a 3080 before it
| impacts performance. The last 3 - 5% of performance they
| are getting out of their chip is responsible for more
| than a third of the power draw on higher clocked cards.
| SketchySeaBeast wrote:
| Yeah, I've significantly undervolted both my GPU and CPU.
| I now never see 300W, really helped with thermals as
| well.
| baybal2 wrote:
| nonrandomstring wrote:
| Still waiting to see the first micro-engineered Stirling engine
| that can self-cool. Any physicists care to comment on why that
| won't work yet, or ever?
| Chabsff wrote:
| You can only cool something by making something else warmer by
| a larger amount. The heat has to go somewhere, and moving that
| heat in any non-passive way will invariably produce yet more
| heat in the process.
| nonrandomstring wrote:
| I think some people are interpreting that as a joke. I'm not
| talking about a _net gain_ of energy or any crazy perpetual
| motion machine. Think of something like a "heat brake".
| Differential heat energy can be converted to mechanical work.
| Some of that can be used to cool the system elsewhere,
| creating a negative feedback loop. Another way to think of
| such a system is like the "reluctance" of an inductor.
|
| With present thermoelectric effects, using a Seebeck junction
| to generate current for a fan is hopelessly ineffective. But
| is that necessarily the case for all designs which could help
| to hold a system under a critical temperature when heat
| spikes?
| acomjean wrote:
| do you mean something like a solar chimney, where heat is
| used to draw air through the rest of the building?
|
| https://en.wikipedia.org/wiki/Solar_chimney
| nonrandomstring wrote:
| That's an example of a similar system, but probably
| impractical for use in an electronics context. I have in
| my imagination a fantasy "smart" material that in the
| limit can transfer 0.5 * k^m joules of heat per square
| meter per second from one side to the other (where m is
| somewhere between 1 and 2). Such a material would always
| feel slightly warmer on one side and cooler on the other,
| and this effect would actually increase in the presence
| of ambient heat, hence it could act as a thermal "brake"
| or active heat pipe/diode. I believe such a device is
| "allowable" within the laws of physics.
| ta8645 wrote:
| > You can only cool something by making something else warmer
| by a larger amount.
|
| Why isn't it also true that you can only make something
| warmer, by cooling something else by a larger amount?
|
| The movement of electricity generates waste heat, why isn't
| that process reversible? Making the heat disappear into a
| cold wire, rather than just dissipating into the atmosphere?
| (not suggesting it would be easy or even practical).
| nostrademons wrote:
| 2nd law of thermodynamics - entropy is always increasing.
| Heat transfer is never 100% efficient, you always lose
| something in transmission. This is also why it's not
| possible to create a perpetual-motion machine.
|
| https://en.wikipedia.org/wiki/Second_law_of_thermodynamics
| nonrandomstring wrote:
| Peltier coolers [1] do exist for specialist applications
| but they are not at all effective. You can even buy them on
| Amazon. If the goal is to iron out a spike to stop your
| semiconductor from going into thermal runaway (instead of
| generating net energy as is the knee-jerk of some
| unimaginative down-voters here) then it's a possible
| saviour.
|
| [1] https://www.britannica.com/science/Seebeck-effect
|
| [2] https://www.amazon.com/Peltier-
| Cooler/s?k=Peltier+Cooler
| dylan604 wrote:
| Next, we'll have a generation of mobile devices that will be
| liquid cooled. Of course, because of the miniaturization, there
| will be no way to refill the liquid coolant without getting a new
| device. This will naturally happen before the batteries die
| creating an even shorter life cycle in devices. Sounds like a
| perfect pitch for an upcoming WWDC type of event.
| superkuh wrote:
| RAM has been parallel for ages. IBM's new POWER10 architecture
| switches to serial control of ram with firmware running on the
| ram sticks. As long as complex mitigations and monitoring are
| going to be required this might be the way to go.
| bilsbie wrote:
| It's at least partly caused by climate change too
___________________________________________________________________
(page generated 2022-07-18 23:00 UTC)