[HN Gopher] Since when did SSDs need water cooling?
       ___________________________________________________________________
        
       Since when did SSDs need water cooling?
        
       Author : LinuxBender
       Score  : 65 points
       Date   : 2023-05-27 12:41 UTC (10 hours ago)
        
 (HTM) web link (www.theregister.com)
 (TXT) w3m dump (www.theregister.com)
        
       | jeffbee wrote:
       | This author is confused, or the article is just badly written.
       | The thing that draws all the power in these newer SSDs is the
       | controller, not the memory. Shoveling 2 million IOPS to the host
       | CPU is a difficult task ... your high-power host CPU can barely
       | keep up. But the article goes on and on about the flash and
       | hardly mentions the controller.
        
         | justin66 wrote:
         | > This author is confused, or the article is just badly
         | written.
         | 
         | I see you have discovered _The Register._ Welcome!
        
         | zokier wrote:
          | I'd question whether we could not make more use of the
          | powerful, well-cooled CPU that we already have in our
          | computers instead of pushing SSD controller complexity and
          | power ever further. What if we used something like UBIFS or
          | F2FS and removed/simplified the FTL?
        
           | phire wrote:
            | Once you get to this level of performance, the bulk of an
            | SSD controller ASIC is essentially just high-performance
            | switching fabric, directing the flow of data between the 8-12
            | channels of NAND flash, the PCIE bus, its internal buffers
            | and the DRAM.
            | 
            | If you have any experience with high performance networking
            | equipment, you know that pure switching fabric ASICs
            | generate a lot of heat on their own. Hell, even a dumb 5-port
            | gigabit ethernet switch generates a surprising amount of
            | heat; they are always warm to the touch.
           | 
           | I really doubt that handling the FTL layer on the controller
           | adds that much extra power draw. A dumb PCIE <-> NAND
           | switching ASIC will also have cooling problems.
        
             | trogdor wrote:
             | I recently upgraded my home network to 10 gigabit and was
             | surprised by the amount of heat generated by 10 gig
             | switches. Why do pure switching fabric ASICs generate so
             | much heat?
        
               | nine_k wrote:
               | Gates consume electricity when they switch: some energy
               | is needed to flip a FET from "open" to "closed" or back.
               | Then the gate stays in a particular state for some time,
               | allowing the circuit to operate.
               | 
                | The faster you switch a gate, the more often you have to
                | pay the switching price, which cannot be made too low, or
                | else thermal noise would overwhelm the signal. So you
                | spend roughly 10x the energy switching a 10 Gbps stream
                | as you do for a 1 Gbps stream. Newer, smaller gates
                | consume less energy per switch, but not 10x less.
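                | 
                | A back-of-the-envelope sketch of the dynamic-power
                | relation behind this (roughly P ~ alpha * C * V^2 * f);
                | the activity factor, capacitance and voltage below are
                | made-up illustrative numbers, not real SerDes figures:
                | 
                |     # Python: switching power scales linearly with
                |     # toggle rate f for a fixed capacitance and voltage.
                |     def dynamic_power_w(alpha, c_farads, v_volts, f_hz):
                |         return alpha * c_farads * v_volts ** 2 * f_hz
                | 
                |     p_1g  = dynamic_power_w(0.2, 1e-9, 1.0, 1.25e9)
                |     p_10g = dynamic_power_w(0.2, 1e-9, 1.0, 12.5e9)
                |     print(p_1g, p_10g, p_10g / p_1g)  # 0.25 W, 2.5 W, 10x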
        
           | Sakos wrote:
           | SSD controllers are just ASICs. I'm not sure why you want to
           | go from a chip tailored for a specific task to a general
           | purpose one that already has way too much on its plate. Then
            | there are things like latency and how the controller abstracts
           | away the details of how an SSD works internally. All that
           | complexity doesn't go away by putting it on your CPU. You're
           | just moving it from one place to another for no benefit, and
           | adding other complexities.
           | 
            | You could ask the same thing about all the extra silicon in
            | GPUs that adds hardware acceleration for video
            | encoding/decoding.
        
             | jeffbee wrote:
             | The point of doing it in software on the host kernel would
             | be to allow the flash layer and the filesystem to co-
             | evolve, instead of being agnostic at best and antagonistic
             | at worst.
        
               | Sakos wrote:
               | Who's going to write the code for it? Microsoft? Does
               | every manufacturer write their own kernel-level driver?
               | What happens to Linux/Unix? I don't want any of these
               | manufacturers anywhere near the kernel, or even doing any
               | more in software than they already do. Samsung isn't
               | exactly known for code quality.
               | 
               | This is a fantasy with questionable benefits at best that
               | don't outweigh the downsides.
        
               | jeffbee wrote:
               | > Microsoft?
               | 
               | Yes. That seems ideal to me. Microsoft, Apple, open
               | source contributors. Today what you have is a closed-
               | source translation layer written by the kinds of people
               | who write PC BIOSes, i.e. the biggest idiots in the
               | software industry. I would be _much_ happier with an OS
               | vendor flash storage stack. For all I know, I am already
               | using something like that from Apple. And I assure you
               | that large-scale server builders like Amazon and Google
               | are already doing it this way.
        
               | mistrial9 wrote:
                | Calling the people in those layers idiots is doing them
                | a favor; it excuses the de facto practices as being
                | simply dumb. The truth includes a different layer, the
                | business side of the business: who pays, and who gets to
                | do what. Booting your own hardware is the real subject,
                | and the actors there are not the ones that come to mind
                | when thinking of consumer advocacy.
               | 
               | The largest companies have other alignments that are not
               | often discussed openly.
        
               | [deleted]
        
               | [deleted]
        
               | zokier wrote:
               | > don't want any of these manufacturers anywhere near the
               | kernel, or even doing any more in software than they
               | already do. Samsung isn't exactly known for code quality.
               | 
                | That seems like such a bizarre take. You think it's
                | better that the crappy code is given to you as black-box
                | firmware with no oversight, rather than in the open,
                | written to the kernel's quality standards, where it can
                | at least hypothetically be improved?
        
           | numpad0 wrote:
            | Speaking from memory and a coarse-grained understanding:
            | 
            | 1) MLC/TLC/QLC work more like 4/8/16-tone grayscale e-paper
            | than like simple flash: e.g. a QLC cell holds a 4-bit value
            | such as 0b1010, i.e. "4 bits per cell" means 16 voltage
            | levels. And programming isn't a single pulse of the target
            | voltage into a memory cell; it's more like repetitive pulses
            | stepping from the erased state (0b1111) until the cell sits
            | just enough millivolts below the boundary for, say, 0b1011
            | (a toy sketch follows at the end of this comment). Readout
            | is probably more complicated, let alone lifecycle
            | management. That business might be too involved to be worth
            | filesystem researchers' time.
           | 
            | 2) It was often said, at least years ago, that a
            | considerable fraction of the heat in NVMe SSDs comes from
            | PCIe serialization/deserialization (SerDes), rather than
            | payload data processing or NAND programming.
            | 
            | If both of the above are true, maybe it's PCIe that should
            | be replaced, with something more like the original PCI?
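            | 
            | A toy sketch of the multi-level idea in point 1 (bits per
            | cell vs. voltage levels, plus a crude step-and-verify
            | programming loop); the step size and level spacing are made
            | up for illustration, not real NAND parameters:
            | 
            |     # Python: n bits per cell means 2**n threshold levels.
            |     for name, bits in [("SLC", 1), ("MLC", 2),
            |                        ("TLC", 3), ("QLC", 4)]:
            |         print(name, bits, "bits/cell ->", 2 ** bits, "levels")
            | 
            |     # Crude model of incremental programming: nudge the
            |     # threshold up in small pulses until it reaches the
            |     # window for the target level, verifying after each one.
            |     def program_cell(target_level, step_v=0.05, width_v=0.4):
            |         vth, pulses = 0.0, 0
            |         while vth < target_level * width_v:
            |             vth += step_v      # one program pulse
            |             pulses += 1        # then a verify read
            |         return vth, pulses
            | 
            |     print(program_cell(11))    # e.g. QLC value 0b1011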
        
             | nine_k wrote:
             | The original PCI was parallel. You can't have an
             | excessively fast parallel bus, because the tiniest
             | differences between lanes make different pins receive the
             | signal out of sync. This is why RAM interface is so hard to
             | get right, and the lanes there are as short as possible.
        
         | KennyBlanken wrote:
          | Because the controller and flash are on the same physical
          | device in a very small amount of space, at least somewhat
          | thermally coupled, not just by the PCB but by the "heat
          | spreaders" often sold with the drive or built into the
          | motherboard. Google around and you'll see lots of thermal
          | camera images of M.2 drives.
         | 
          | As the article points out, these drives consume up to ~10 W
          | under load. That's actually a lot of power for something with
          | very, very little thermal mass - around 10 grams, and a heat
          | capacity of around 400 J/kg-C is common for PCBs and chips.
          | 0.4 J/g-C means that for just one second under full load, if
          | the heat is generated evenly across the entire device, it will
          | heat up 2.5 degrees C. Assuming no cooling, that's 24 seconds
          | until it hits its thermal throttling point.
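          | 
          | As a minimal sketch of that back-of-the-envelope estimate (the
          | 10 g mass, 0.4 J/g-C heat capacity, 25 C start and 85 C
          | throttle point are all assumptions, not measured values):
          | 
          |     # Python: lumped thermal mass, no cooling at all.
          |     power_w     = 10.0   # assumed sustained SSD power draw
          |     mass_g      = 10.0   # assumed drive mass
          |     c_j_per_g_c = 0.4    # assumed specific heat of PCB + chips
          |     rise_per_s = power_w / (mass_g * c_j_per_g_c)
          |     print(rise_per_s)                  # 2.5 C per second
          |     print((85.0 - 25.0) / rise_per_s)  # ~24 s to throttle
          | 
          | Real drives obviously shed some heat to the board and the air,
          | so this is a worst case.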
         | 
         | From the article:
         | 
         | > The amount of activity taking place on the gumstick-sized M.2
         | form factor means higher temps not only for the storage
         | controller, but for the NAND flash itself.
         | 
         | > NAND, Tanguy explains, is happiest within a relatively narrow
         | temperature band. "NAND flash actually likes to be 'hot' in
          | that 60deg to 70deg [Celsius] range in order to program a cell
         | because when it's that hot, those electrons can move a little
         | bit easier," he explained.
         | 
         | > Go a little too hot -- say 80degC -- and things become
         | problematic, however. At these temps, you risk the SSD's built-
         | in safety mechanisms forcibly powering down the system to
         | prevent damage. However, before this happens users are likely
         | to see the performance of their drives plummet, as the SSD's
         | controller throttles itself to prevent data loss.
         | 
          | FYI, Tanguy, according to his LinkedIn, is the principal
          | product engineer for Micron.
        
         | Out_of_Characte wrote:
         | Although, with flash memory cells nearing their physical limits
         | in lithography, pretty soon you'll need active cooling for
         | bigger stacks.
        
         | marcosdumay wrote:
         | I doubt one is far from the other.
         | 
         | Shoveling IOPS into a bus is an easily parallelizable problem,
         | while NAND-flash memory has a very high theoretical floor on
         | its capacitance. Any good engineer would optimize the CPU part
         | up to the point where it's only a bit worse than the flash, and
          | stop there because there isn't much gain in going further.
         | 
         | If that's the case, you will see the CPU being the bottleneck
         | on your device, but it's actually the memory that constrains
         | the design.
         | 
          | That is, unless the CPU comes from some off-the-shelf design
          | that can't be changed due to volume constraints. But I don't
          | think SSDs have that kind of low volume.
        
           | zinekeller wrote:
            | > That is, unless the CPU comes from some off-the-shelf
            | design that can't be changed due to volume constraints.
           | 
           | Most SSDs (with exceptions like Samsung's) simply use
           | SiliconMotion's IP
           | (https://www.siliconmotion.com/products/client/detail) for
           | their controllers.
           | 
           | > But I don't think SSDs have that kind of low volume.
           | 
            | If a custom design only added a cent or two to the BOM then
            | it wouldn't matter, but when you need to verify that the
            | changes work as intended _and_ that the data isn't corrupted
            | (beyond specifications), that's a lot more than a few cents
            | to be saved. Plus, SiliconMotion can ask TSMC to fabricate
            | it at a lower cost per unit (because there is only one
            | pattern to manufacture) than it would cost to customise the
            | controller for each drive.
        
             | wtallis wrote:
             | You're vastly overestimating Silicon Motion's market share.
             | Samsung, Micron, Western Digital, SK Hynix(+Intel), and
             | Kioxia all use in-house SSD controller designs for at least
             | some of their product line. Among second-tier SSD brands
             | that don't have in-house chip design or fabrication, Phison
             | is dominant for high-performance consumer SSDs.
             | 
             | Speaking about SSD controllers in general: they _do_ use
             | off-the-shelf ARM CPU core designs (eg. Cortex-R series),
             | but those are usually the least important IP blocks in the
             | chip. The ARM CPU cores are mostly handling the control
             | plane rather than the data plane, and the latter is what is
             | performance-critical and power-hungry when pushing many GB
             | /s.
        
       | frou_dh wrote:
        | Design priorities probably get warped when doing well at the
        | artificial benchmarks/torture-tests used in reviews comes to the
        | fore.
        
         | wtallis wrote:
         | Absolutely. CrystalDiskMark is bad for the consumer SSD market.
        
           | tinus_hn wrote:
           | Unfortunately the alternative is trusting the manufacturers
           | data which leads to cheating.
        
             | wtallis wrote:
             | The choices aren't exactly between a bad benchmark and no
             | benchmark at all. And the widespread use of CrystalDiskMark
             | as a _de facto_ standard by both independent testers and
             | drive vendors has done nothing to slow the rise of behavior
             | that an informed consumer would consider to be cheating.
        
       | ksec wrote:
        | There were early PCI-E 5.0 SSD samples pushing closer to the
        | theoretical max of 16GB/s, but they were consuming up to 25W.
        | The current PCI-E 5.0 drives only reach about ~11GB/s, but stay
        | within a 12W power envelope.
        | 
        | I do wonder if we have hit the law of diminishing returns. With
        | games optimised for the PS5's storage system and for Xbox's
        | DirectStorage, developers are already showing that 80-95% of
        | load time is spent on the CPU.
        
         | swarnie wrote:
         | What's the issue with a jump from 12w to 25w?
        
           | ilyt wrote:
            | The M.2 sockets on motherboards don't exactly have great
            | cooling, and SSDs don't usually even come with a heatsink.
            | 
            | The positioning can also be pretty iffy: my board has one
            | next to the CPU and another just under the GPU (no chance of
            | getting a fan there), and those are the "fast" (directly
            | connected to the CPU) ones!
            | 
            | Another 2 slots are again under the GPU (one filled with a
            | wifi/bt card), and only the last 2 are far away from other
            | hot components and get their own heatsink, but those are not
            | directly connected to the CPU.
        
             | kdmytro wrote:
              | I don't think this would be a factor. Some PCI-E 4.0 SSDs
              | already come with metal heatsinks out of the box. If future
              | SSDs need additional cooling, this will be communicated
              | to the buyer.
              | 
              | I think that the bigger question is whether 25W can be
              | physically supplied to the drives by contemporary
              | motherboards. What is the power limit for the M.2 slots?
        
               | ilyt wrote:
                | At least according to Wikipedia, each pin is rated up to
                | 0.5A, with (I think) nine 3.3V pins, so technically just
                | around ~15W peak.
                | 
                | Technically that's what U.2 (the 2.5 inch form factor
                | for SSDs) would be for.
                | 
                | They get 5V/12V and a thicker connector. I severely
                | doubt M.2 could swing 25W as it only has 3.3V on it.
        
           | wtallis wrote:
           | We're talking about an SSD form factor that's 22x80mm and is
           | fed by a couple of card edge pins carrying 3.3V. 12W was
           | already pushing it.
        
             | formerly_proven wrote:
              | According to the one-page datasheet of a Foxconn M.2 M-key
              | socket, maximum current per pin is 0.5 A (they're tiny,
              | after all). Since M.2 M-key has a total of nine pins
              | carrying 3.3 V, this would limit power to 15 W, before any
              | heat dissipation considerations plus connector derating
              | (the toasty SSD is heating the connector up).
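              | 
              | A quick arithmetic check of that figure (pin count and
              | per-pin rating as quoted above; no derating applied):
              | 
              |     # Python: M.2 M-key budget from the 3.3 V pins alone.
              |     pins_3v3     = 9     # 3.3 V supply pins
              |     amps_per_pin = 0.5   # per-pin rating from the datasheet
              |     volts        = 3.3
              |     print(pins_3v3 * amps_per_pin * volts)  # 14.85 -> ~15 W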
        
               | [deleted]
        
             | KennyBlanken wrote:
              | ...on a device that weighs about 10 grams with a heat
              | capacity likely around 0.4 J/gram-degree-C.
              | 
              | 10 W into such a device, if I did the math right, is
              | around a 2.5 C/sec rise in device temperature.
        
         | londons_explore wrote:
          | As soon as SSDs are faster, developers will find ways to waste
          | more space and do more IO operations...
        
       | j16sdiz wrote:
       | For those don't read the article:
       | 
       | > "NAND flash actually likes to be 'hot' in that 60deg to 70deg
        | [Celsius] range in order to program a cell because when it's that
       | hot, those electrons can move a little bit easier," he explained.
       | ... Go a little too hot -- say 80degC -- and things become
       | problematic
        
         | londons_explore wrote:
          | Sure... erasure energy is lower when it's hot... But there are
          | lots of other downsides, like much reduced endurance and more
          | noise in the sense amplifiers, meaning there is a higher
          | chance of needing to repeat read operations.
        
       | rowanG077 wrote:
        | I don't understand why pushing 16GB/s requires so much power. A
        | fully custom IC where the data path is in silicon should be able
        | to handle that speed, no sweat.
        
         | wtallis wrote:
         | SSD controllers aren't just moving a lot of data. Between the
         | PCIe PHY and the ONFI PHY there's a lot of other functionality.
         | In particular, doing LDPC decoding at 16GB/s (128Gb/s) is not
         | trivial.
        
         | andromeduck wrote:
          | ECC/crypto is pretty energy intensive - other bookkeeping like
          | wear leveling and read/write disturb handling is also quite
          | complicated.
        
       | formerly_proven wrote:
       | One of the issues with M.2 in desktop PCs is how buried from the
       | airflow they are, and often they're literally on the exhaust side
       | of the GPU (many GPUs exhaust on both long edges, and many
       | motherboards just so happen to have an M.2 slot under the PEG) or
       | in the air-flow dead-zone between PEG and CPU cooler.
       | 
       | Overall the AT(X) form factor, with extension cards slotting in
       | at a 90deg angle, just doesn't work all that well for efficient
       | heat removal. DHE takes away I/O slot space and requires high
       | static pressures (so high fan RPMs), it works for headless
       | servers, but that's about it. The old-fashioned way of a
       | backplane and orthogonal airflow does work much better for stuff
       | like this; but it also requires a card cage and is not very
       | flexible in terms of card dimensions. The one saving grace of ATX
       | is that cards and their cooling solutions can grow in length and
       | height, GPUs are much taller than a normal full-height card, and
       | many are much longer than a full-length card is supposed to be as
       | well.
        
         | jackmott42 wrote:
         | This isn't really an issue because everyone's M.2 is working
         | fine. You have to construct absurd scenarios to cause problems.
         | Use a case with bad airflow, a hot GPU, and a workload that is
         | pushing the gpu and m.2 and cpu to their limits indefinitely,
         | which isn't a real life thing.
         | 
         | and if it IS a real life thing because you have some special
         | use case, you use a case with good airflow.
        
           | zamadatix wrote:
           | I think this depends on what the definition of fine and
           | problems is. IMO most don't even notice when their drive
           | throttles due to thermals so it's probably fine that the
           | drives get hot. At the same time, as newer drives keep
           | drawing more and more power, this is going to start to push
           | the limits of "well why did I buy the fast drive in the first
           | place" if they didn't come with these ever increasing cooling
           | solutions as well.
        
         | numpad0 wrote:
          | People are buying gaming-branded GPU propping sticks and
          | suspension wires because high-end PEG cards are sagging, and
          | neither the case nor the card supports the old front retention
          | slot for full-length cards. It's well past time for a card
          | cage spec, as far as I can see from the user perspective.
        
         | alberth wrote:
         | > _"Overall the AT(X) form factor, with extension cards
         | slotting in at a 90deg angle, just doesn 't work all that well
         | for efficient heat removal."_
         | 
         | To give an example of this, here's a server from a huge cloud
         | provider for a brand new AMD 7700 on an ATX board.
         | 
         | Those 90deg angles make for horrible airflow.
         | 
         | https://twitter.com/PetrCZE01/status/1637122488025923585
        
           | ilyt wrote:
            | That looks more like "you put the power supply on the wrong
            | side".
           | 
           | But yeah, most servers have risers that flip the cards to be
           | parallel to the board.
        
           | mordae wrote:
           | Yeah, but X470D4U and similar boards are so overpriced one
           | can somewhat relate to people using gaming boards for
           | servers. Especially since a lot of them route ECC pins
           | nowadays.
           | 
           | I sure wasn't happy paying extra just to have a different
           | board layout with mostly the same components.
           | 
           | Well, there's IPMI at least. Still not worth the price tag.
        
           | dx034 wrote:
           | It should be clear that this was only shot for marketing
           | purposes. I don't think they actually run cables like that,
           | but it probably looked better to have cables visible in the
           | picture.
        
             | alberth wrote:
             | Do you know this for a fact?
             | 
              | Or are you speculating?
        
           | hgsgm wrote:
           | That looks like a ribbon cable blocking the fan?
        
         | Asooka wrote:
          | I'm water cooling both the CPU and GPU in my PC and have found
          | out that this leads to virtually no airflow over the M.2 slot.
          | For now I've simply placed a fan on top of the GPU aimed
          | directly at the slot, and that keeps the SSD at 50 to 60C. I
          | am considering installing a water block on the SSD when I do
          | maintenance next.
        
         | CrimsonRain wrote:
          | None of it matters because SSDs like being hot.
        
           | ilyt wrote:
            | Sure, if you don't like your data.
        
           | coldtea wrote:
            | In the same way that an ice cream is too cold and could use
            | some heat.
        
       | jmclnx wrote:
       | >While NAND flash tends to prefer higher temperatures there is
       | nothing wrong with running it closer to ambient temperatures
       | 
        | New one on me :) I did not know NAND liked to be hot; if true,
        | that does not bode well for laptops or for over-clockers.
        | 
        | To me, the end result seems to be: yes and no, it's up to you.
        | But I still prefer HDDs anyway; I am very old school.
        
         | jackmott42 wrote:
         | You do not actually prefer HDD, and nothing bodes badly for
         | overclockers or laptops. You are just looking for ways to be
         | contrarian.
        
           | detrites wrote:
            | Preferring a storage medium for its reliability regardless
            | of the number of writes it endures is utility - I could see
            | how that might be preferable to an SSD in some specific
            | case. Maybe there are other upsides, e.g. it's often much
            | cheaper.
           | 
           | Regardless, some people drive an old, dangerous, slow, gas-
           | guzzling car - and maintain it at great expense - just
           | because they prefer it. Aesthetic and sentimental appeal is
           | highly personal and knows no bounds.
        
             | seized wrote:
             | Except hard drives aren't celebrated for reliability. Or
             | speed. Or low latency. Or durability (try knocking one). Or
             | power and heat. Old cars you can at least make some
             | arguments for... Hard drives as primary storage/boot, no.
             | 
             | Really they're good for bulk storage. And that's it. For
             | use in primary compute they're really great if you want to
             | slow everything down.
        
               | hulitu wrote:
                | AFAIK, compared with SSDs they are better in terms of
                | reliability.
               | 
               | And running any electronic component hot is just asking
               | for trouble.
        
               | fuzzfactor wrote:
               | >hard drives aren't celebrated for reliability.
               | 
                | With no revelry whatsoever, my 2006 early-SATA Maxtor
                | 100GB HDD is still going strong with Windows 11 on a
                | Dell Vista-era PC.
                | 
                | It boots no slower than the 2-year-old SSD W10 PCs our
                | IT guys have at the office.
        
               | justsomehnguy wrote:
               | > AFAIK compared with SSDs they are better (reliability).
               | 
               | Depends on the price point.
               | 
                | Just days ago a PM1725 gave us trouble. Yet five
                | WD10JUCT drives I bought recently (in RAID 5) beat it on
                | price and available capacity, even with their abysmal
                | performance.
               | 
               | >any electronic component hot is just asking for trouble
               | 
               | I'd say running _too hot_.
        
           | EscapeFromNY wrote:
           | What's wrong with HDD? It's actually quite convenient having
           | time for your morning jog and a shower while you wait for
           | your computer to boot up.
        
             | consp wrote:
              | As a bonus, if it's really old, it sounds like an old
              | gravity-fed drip coffee machine.
              | 
              | On a heat note: a spinning rust disk also draws quite a
              | few watts, every time, all the time. Power-wise, the
              | high-wattage SSDs are still less power hungry over time.
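              | 
              | A hedged back-of-the-envelope comparison of that claim;
              | every wattage, speed and workload number below is an
              | assumed round figure, not a measurement:
              | 
              |     # Python: energy to move 500 GB once, then idle out
              |     # the rest of an hour.
              |     gb, hour_s = 500, 3600
              |     def energy_wh(speed_gbs, active_w, idle_w):
              |         busy_s = gb / speed_gbs
              |         return (active_w * busy_s
              |                 + idle_w * (hour_s - busy_s)) / 3600
              |     print(energy_wh(5.0, 10.0, 0.05))  # SSD: ~0.33 Wh
              |     print(energy_wh(0.2, 6.0, 5.0))    # HDD: ~5.69 Wh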
        
               | pmontra wrote:
                | A 1 TB 2.5" HDD I attached to an Odroid consumes little
                | more than 1 W. A 3.5" 2 TB one consumes 10 W. I turn
                | them off in software when I don't need them. They are
                | backup storage.
        
             | pixl97 wrote:
             | Nothing at all, I like to be able to count my IOPS on my
             | fingers.
        
           | coldtea wrote:
           | Please tell us more about the psychology of the parent
           | commenter.
           | 
           | You seem to have studied it quite well, or perhaps find that
           | ad-hominems make for the best arguments!
        
         | formerly_proven wrote:
         | NAND is bad for cold storage, because writes cause more wear
         | when it's not warm. Meanwhile data retention benefits from
         | lower temperatures.
        
           | adgjlsfhk1 wrote:
            | Cold storage by its nature doesn't have a lot of writes...
        
             | flaminHotSpeedo wrote:
              | And if there were a hot (as in frequently used) drive, it
              | would still heat up (as in temperature).
              | 
              | But I guess the other commenter's point might be valid if
              | you run a datacenter in a blast chiller.
        
       | theknocker wrote:
       | [dead]
        
       | valine wrote:
        | Does anyone know of a PCIE 5 SSD designed for sustained
        | read/write? Most of these new drives are meant for short bursts
        | of data transfer; I can't imagine water cooling would be
        | necessary for most drives.
        
         | matja wrote:
          | Kioxia CD8: https://www.storagereview.com/news/kioxia-cd8-series-pcie-5-...
        
       ___________________________________________________________________
       (page generated 2023-05-27 23:02 UTC)