[HN Gopher] I tested four NVMe SSDs from four vendors - half los...
       ___________________________________________________________________
        
       I tested four NVMe SSDs from four vendors - half lose FLUSH'd data
       on power loss
        
       Author : ahachete
       Score  : 263 points
       Date   : 2022-02-21 19:29 UTC (3 hours ago)
        
 (HTM) web link (twitter.com)
 (TXT) w3m dump (twitter.com)
        
       | throwaway984393 wrote:
       | Aren't there filesystem options that affect this? Wasn't there a
       | whole controversy over ext4 and another filesystem not committing
       | changes even after flush (under specific options/scenarios)?
        
       | jrockway wrote:
       | I always liked the embedded system model where you get flash
       | hardware that has two operations -- erase block and write block.
       | GC, RAID, error correction, etc. are then handled at the
       | application level. It was never clear to me that the current
       | tradeoff with consumer-grade SSDs was right. On the one hand,
        | things like the error correction, redundancy, and garbage
        | collection don't require attention from the CPU (and, more
        | importantly, don't tie up any bus). On the other hand, the user
       | has no control over what the software on the SSD's chip does.
       | Clearly vendors and users are at odds with each other here;
       | vendors want the best benchmarks (so you can sort by speed
       | descending and pick the first one), but users want their files to
       | exist after their power goes out.
       | 
       | It would be nice if we could just buy dumb flash and let the
       | application do whatever it wants (I guess that application would
       | be your filesystem; but it could also be direct access for
       | specialized use cases like databases). If you want maximum speed,
       | adjust your settings for that. If you want maximum write
       | durability, adjust your settings for that. People are always
       | looking for that one size fits all use case, but it's hard here.
       | Some people may be running cloud providers and already have
       | software to store that block on 3 different continents. Some
       | people may be an embedded system with a fixed disk image that
       | changes once a year, with some temporary storage for logs. There
       | probably isn't a single setting that gets optimal use out of the
       | flash memory for both use cases. The cloud provider doesn't care
       | if a block, flash chip, drive, server rack, availability zone, or
       | continent goes away. The embedded system may be happy to lose
       | logs in exchange for having enough writes left to install the
       | next security update.
       | 
       | It's all a mess, but the constraints have changed since we made
       | the mess. You used to be happy to get 1/6th of a PCI Express lane
       | for all your storage. Now processors directly expose 128 PCIe
       | lanes and have a multitude of underused efficiency cores waiting
       | to be used. Maybe we could do all the "smart" stuff in the OS and
       | application code, and just attach commodity dumb flash chips to
       | our computer.
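The erase-block/write-page model described in the comment above can be sketched as a toy Python class. The geometry constants and method names are illustrative, not any real device's interface:

```python
PAGES_PER_BLOCK = 4   # illustrative geometry; real NAND blocks are much larger
PAGE_SIZE = 16        # bytes; real NAND pages are 4-16 KiB

class RawNAND:
    """'Dumb' flash exposing only two operations: erase block, program page."""
    def __init__(self, num_blocks):
        # None marks an erased, programmable page
        self.blocks = [[None] * PAGES_PER_BLOCK for _ in range(num_blocks)]

    def erase_block(self, b):
        # Erasure only works at whole-block granularity on real NAND
        self.blocks[b] = [None] * PAGES_PER_BLOCK

    def program_page(self, b, p, data):
        # A page can be programmed only once per erase cycle
        if self.blocks[b][p] is not None:
            raise ValueError("page must be erased before programming")
        self.blocks[b][p] = bytes(data[:PAGE_SIZE])

    def read_page(self, b, p):
        return self.blocks[b][p]
```

A host-side flash translation layer (GC, wear leveling, error correction) would then sit on top of these two primitives, in the filesystem or directly in an application like a database.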
        
         | the_duke wrote:
         | I can recommend the related talk "It's Time for Operating
         | Systems to Rediscover Hardware". [1]
         | 
         | It explores how modern systems are a set of cooperating devices
         | (each with their own OS) while our main operating systems still
         | pretend to be fully in charge.
         | 
         | [1] https://www.youtube.com/watch?v=36myc8wQhLo
        
         | ATsch wrote:
         | There's really two problems here:
         | 
          | 1. Contemporary mainstream OSes have not risen to the challenge
         | of dealing appropriately with the multi-CPU, multi-address
         | space nature of modern computers. The proportion of the
         | computer that the "OS" runs on has been shrinking for a long
         | time and there have only been a few efforts to try to fix that
         | (e.g. HarmonyOS, nrk, RTKit)
         | 
         | 2. Hardware vendors, faced with proprietary or non-malleable
         | OSes and incentives to keep as much magic in the firmware as
         | possible, have moved forward by essentially sandboxing the user
         | OS behind a compatibility shim. And because it works well
         | enough, OS developers do not feel the need to adjust to the
         | hardware, continuing the cycle.
         | 
         | There is one notable recent exception in adjusting filesystems
         | to SMR/Zoned devices. However this is only on Linux, so desktop
         | PC component vendors do not care. (Quite the opposite: they
         | disable the feature on desktop hardware for market
         | segmentation)
        
           | btown wrote:
           | Are there solutions to this in the high-performance computing
           | space, where random access to massive datasets is frequent
           | enough that the "sandboxing" overhead adds up?
        
         | bob1029 wrote:
         | I would absolutely love to have access to "dumb" flash from my
         | application logic. I've got append only systems where I could
         | be writing to disk many times faster if the controller weren't
         | trying to be clever in anticipation of block updates.
        
         | ghshephard wrote:
         | Nothing says that you can't both offload everything to
         | hardware, and have the application level configure it. Just
         | need to expose the API for things like FLUSH behavior and
         | such...
        
           | jrockway wrote:
           | Yeah, you're absolutely right. I'd prefer that the world
           | dramatically change overnight, but if that's not going to
           | happen, some little DIP switch on the drive that says "don't
           | acknowledge writes that aren't durable yet" would be fine ;)
        
         | wtallis wrote:
         | Consumer SSDs don't have much room to offer a different
         | abstraction from emulating the semantics of hard drives and
         | older technology. But in the enterprise SSD space, there's a
         | lot of experimentation with exactly this kind of thing. One of
         | the most popular right now is zoned namespaces, which separates
         | write and erase operations but otherwise still abstracts away
         | most of the esoteric details that will vary between products
         | and chip generations. That makes it a usable model for both
         | flash and SMR hard drives. It doesn't completely preclude
         | dishonest caching, but removes some of the incentive for it.
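The zoned model mentioned here (sequential writes within a zone, erase as an explicit whole-zone reset) can be sketched as a toy Python class. This illustrates the semantics only; it is not the actual NVMe ZNS command set:

```python
class Zone:
    """One zone of a zoned device: append-only until explicitly reset."""
    def __init__(self, capacity):
        self.capacity = capacity   # writable blocks per zone
        self.write_pointer = 0
        self.data = []

    def append(self, block):
        # Writes must land at the write pointer; no overwrite in place
        if self.write_pointer >= self.capacity:
            raise IOError("zone full; reset required before reuse")
        self.data.append(block)
        self.write_pointer += 1

    def reset(self):
        # The only way to make space reusable; maps naturally onto a
        # flash block erase or an SMR shingle rewrite
        self.write_pointer = 0
        self.data = []
```

Because the same two operations map onto both NAND erase blocks and SMR shingles, one filesystem layout can target both kinds of device.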
        
           | sitkack wrote:
           | Check out https://www.snia.org/ if you want to track
           | development in this area.
        
       | eptcyka wrote:
       | Hey, now I want/need to know if my SSD is flushing correctly or
       | not.
        
       | hardwaresofton wrote:
       | I've actually run into some data loss running simple stuff like
       | pgbench on Hetzner due to this -- I ended up just turning off
       | write-back caching at the device level for all the machines in my
       | cluster:
       | 
       | https://vadosware.io/post/everything-ive-seen-on-optimizing-...
       | 
        | Granted, I was doing something highly questionable (running
        | postgres with fsync off on ZFS). It was _very_ painful to get to
        | the actual issue, but I'm glad I found out.
       | 
       | I've always wondered if it was worth pursuing to start a simple
       | data product with tests like these on various cloud providers to
       | know where these corners are and what you're _really_ getting for
       | the money (or lack thereof).
       | 
        | [EDIT] To save people some time (that post is long), the command
        | to set the feature is the following:
        | 
        |     nvme set-feature -f 6 -v 0 /dev/nvme0n1
       | 
       | The docs for `nvme` (nvme-cli package, if you're Ubuntu based)
       | can be pieced together across some man pages:
       | 
       | https://man.archlinux.org/man/nvme.1
       | 
       | https://man.archlinux.org/man/nvme-set-feature.1.en
       | 
       | It's a bit hard to find all the NVMe features but 6 is the one
       | for controlling write-back caching.
       | 
       | https://unix.stackexchange.com/questions/472211/list-feature...
        
       | colanderman wrote:
       | I'm curious whether the drives are at least maintaining write-
       | after-ack ordering of FLUSHed writes in spite of a power failure.
       | (I.e., whether the contents of the drives after power loss are
       | nonetheless crash consistent.) That still isn't great, as it
       | messes with consistency between systems, but at least a system
       | solely dependent on that drive would not suffer loss of
       | integrity.
        
       | nyc_pizzadev wrote:
       | I'm actually interested in testing this scenario, a drive getting
       | power loss. Is there a thing which will cut power to a server
       | device on command? Or do you just pull the drive out of its bay?
        
       | dboreham wrote:
       | Surprised this is being rediscovered. Years ago only "enterprise"
       | or "data center" SSDs had the supercap necessary to provide power
       | for the controller to finish pending writes. Did we ever expect
       | consumer SSDs to not lose writes on power fail?
        
         | evancox100 wrote:
         | It's not losing pending writes, it's the drive saying there are
         | no pending writes, but losing them anyways. ie the drive is
         | most likely lying
        
           | dboreham wrote:
           | Understood. Low-end drives have always done this because:
           | performance.
        
           | monocasa wrote:
           | As I said in the recent Apple discussion, pretty much all
           | drives are lying and have been for decades at this point. The
           | good brands just spec out enough capacitance that you don't
           | see the difference externally.
           | 
           | https://news.ycombinator.com/item?id=30370551#30374585
        
             | brigade wrote:
             | So if SSDs rely solely on capacitors for data integrity and
             | lie about flushes, what _do_ they do on a flush that takes
             | any amount of time? Are they just taking a speed hit for
             | funsies? Heck, from this test, the magnitude of the speed
              | hit isn't even correlated with whether they lose writes...
        
               | dboreham wrote:
               | When you look at how long it takes to perform a block
               | write on a flash device, you'll see that no SSD is going
               | to honor flush semantics.
        
               | monocasa wrote:
               | At one point it was different barriers on the different
               | submission queues inside the drive. Not externally
               | visible queues, but between internal implementation
               | layers.
               | 
               | It's been a few years since I've checked up on this and
               | it was for the most part pre SSDs though.
        
               | supermatt wrote:
               | Probably implementing a barrier for ordering io...
        
             | WrtCdEvrydy wrote:
             | The more you look at speculative execution and these drive
              | issues, the more you see that we're giving up a lot of
              | what makes computing "safe" just for performance.
        
               | _greim_ wrote:
               | Brings to mind Goodhart's law:
               | 
               | > When a measure becomes a target, it ceases to be a good
               | measure.
               | 
               | In this case, performance as a measure of value.
        
               | blibble wrote:
               | it's mongodb all over again
               | 
               | gotta get dem sweet sweet benchmarks, to hell if we're
               | persisting garbage
        
             | consp wrote:
              | Considering the number of 10uF tantalum caps (30) on one of
              | the bricked enterprise SSDs I opened, I'm not surprised at
              | all.
        
               | dboreham wrote:
               | 300uF is not a supercap.
        
         | phkahler wrote:
         | This is not about pulling power during writes. Flush is
         | supposed to force all non-committed (i.e. cached) writes to
         | complete. Once that has been acknowledged there is no need for
         | any further writes. So those drives are effectively lying about
         | having completed the flush. I also have to wonder when they
         | intended to write the data...
        
           | dboreham wrote:
           | Also well known for many years.
        
       | mjw1007 wrote:
        | << Data loss occurred with a Korean and US brand, but it will turn
       | into a whole "thing" if I name them so please forgive me. >>
        
         | LeifCarrotson wrote:
         | > _The models that never lost data: Samsung 970 EVO Pro 2TB and
         | WD Red SN700 1TB._
         | 
         | The others would probably be SK Hynix and Micron/Crucial,
         | right? Curious why he's reluctant to name and shame. A drive
         | not conforming to requirements and losing data is a legitimate
         | problem that should be a "thing"!
        
           | 0xcde4c3db wrote:
           | Crucial seems plausible, but there's a surprising number of
           | US brands for NVMe SSDs. I was able to find: Crucial,
           | Corsair, Seagate, Plextor, Sandisk, Intel, Kingston, Mushkin,
           | PNY, Patriot Memory, and VisionTek.
        
             | LeifCarrotson wrote:
             | Micron/Crucial is the 3rd largest manufacturer of flash
             | memory, most of the other brands in your list just make the
             | PCB and whitelabel other flash chips and controllers
             | (perhaps with some firmware customization, but they're
             | usually not responsible for implementing features like
             | FLUSH).
             | 
             | Toshiba/Kioxia is another big one, but they're based in
             | Japan. The US brand could be Intel instead of Crucial, I
             | suppose.
        
           | 420official wrote:
           | > Curious why he's reluctant to name and shame
           | 
           | My sense is he wants to shame review sites for not paying
           | attention to this rather than shame manufacturers directly at
           | this point.
        
           | nijave wrote:
           | Looks like he works at Apple. Maybe what he's testing is work
           | related or covered by some sort of NDA (e.g. doesn't want to
           | risk harming supplier relations for the brands misbehaving)
        
           | alliao wrote:
           | I thought Crucial specifically designed some power loss
           | protection as a differentiating selling point? Well at least
           | that was the reason why I bought one back in M.2 days (gosh
           | my PC is ancient...)
        
             | wtallis wrote:
             | I think the most they ever promised for some of their
             | consumer drives was that a write interrupted by a power
             | failure would not cause corruption of data that was already
             | at rest. Such a failure can be possible when storing
             | multiple bits per memory cell but programming the cells in
             | multiple passes, especially if later passes are writing
             | data from an entirely different host write command.
        
             | iforgotpassword wrote:
              | _Back_ in M.2 days? I know I'm getting old, but what did I
              | miss?
        
               | cestith wrote:
               | M.2 SATA as opposed to NVMe perhaps?
        
               | alliao wrote:
               | it's the storage form factor of my current PC.. which is
               | probably close to 10yrs old :P
               | 
               | https://en.wikipedia.org/wiki/M.2
        
               | sascha_sl wrote:
               | It is still being used in very recent PCIe Gen 4 NVMe
               | drives.
               | 
               | Though you likely have a different key only taking SATA
               | drives.
        
       | tr33house wrote:
       | I'm a systems engineer but I've never done low level
       | optimizations on drives. How does one even go about even testing
       | something like this? It sounds like something cool that I'd like
       | to be able to do
        
         | klysm wrote:
         | Write and flush and unplug the cable!
        
         | xenadu02 wrote:
         | My script repeatedly writes a counter value "lines=$counter" to
         | a file, then calls fcntl() with F_FULLFSYNC against that file
         | descriptor which on macOS ends up doing an NVMe FLUSH to the
         | drive (after sending in-memory buffers and filesystem metadata
         | to the drive).
         | 
         | Once those calls succeed it increments the counter and tries
         | again.
         | 
          | As soon as the write() or fcntl() fails it prints the last
          | successfully written counter value, which can be checked against
          | the contents of the file. Remember: the semantics of the API
          | and the NVMe spec require that, after a successful return from
          | fcntl(fd, F_FULLFSYNC) on macOS, the data is durable at that
          | point no matter what filesystem metadata OR drive internal
          | metadata is needed to make that happen.
         | 
         | In my test while the script is looping doing that as fast as
         | possible I yank the TB cable. The enclosure is bus powered so
         | it is an unceremonious disconnect and power off.
         | 
         | Two of the tested drives always matched up: whatever the
         | counter was when write()+fcntl() succeeded is what I read back
         | from the file.
         | 
         | Two of the drives sometimes failed by reporting counter values
         | < the most recent successful value, meaning the write()+
         | fcntl() reported success but upon remount the data was gone.
         | 
          | Anytime a drive reported a counter value +1 from what was
          | expected I still counted that as a success... after all
         | there's a race window where the fcntl() has succeeded but the
         | kernel hasn't gotten the ACK yet. If disconnect happens at that
         | moment fcntl() will report failure even though it succeeded. No
         | data is lost so that's not a "real" error.
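A minimal Python sketch of the test loop described above, assuming a macOS host where fcntl's F_FULLFSYNC forces a device-level flush; on other systems it falls back to os.fsync(), which is weaker and does NOT guarantee the drive's volatile cache is drained. The function name and file layout are illustrative:

```python
import fcntl
import os

def write_counter_loop(path, iterations):
    """Repeatedly write 'lines=<counter>' and flush it to the drive.

    Returns the last counter value that was acknowledged as durable;
    after a power pull, the on-disk value should never be older than
    the last value this function reported as flushed.
    """
    last_acked = None
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        for counter in range(iterations):
            os.lseek(fd, 0, os.SEEK_SET)
            os.write(fd, b"lines=%d\n" % counter)
            # F_FULLFSYNC (macOS) asks the drive to flush its volatile
            # cache; plain fsync() only pushes OS buffers to the drive.
            full_fsync = getattr(fcntl, "F_FULLFSYNC", None)
            if full_fsync is not None:
                fcntl.fcntl(fd, full_fsync)
            else:
                os.fsync(fd)  # weaker fallback for non-macOS hosts
            # Only after the flush succeeds is this value "durable"
            last_acked = counter
    finally:
        os.close(fd)
    return last_acked
```

To reproduce the experiment you would run this against a file on the bus-powered enclosure, yank the cable mid-loop, remount, and compare the file's contents with the last acknowledged counter.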
        
       | monocasa wrote:
       | Relevant recent discussion about Apple's NVMe being very slow on
       | FLUSH.
       | 
       | Apple's custom NVMes are amazingly fast - if you don't care about
       | data integrity
       | 
       | https://news.ycombinator.com/item?id=30370551
        
          | hughrr wrote:
          | Not really a problem when your computer has a large UPS built
          | into it. Desktop Macs, not so good.
          | 
          | But really, isn't the point of a journaling file system to make
          | sure it is consistent at some guaranteed point in time, not
          | necessarily without incidental data loss?
        
           | withinboredom wrote:
           | > Not really a problem when your computer has a large UPS
           | built into it.
           | 
           | Except that _one time_ you need to work until the battery
           | fails to power the device, at 8%, because the battery's
           | capacity is only 80%. Granted, this is only after a few years
           | of regular use...
        
             | monocasa wrote:
             | In Apple's defense, they probably have enough power even in
             | the worst case to limp along enough to flush in laptop form
             | factor, even if the power management components refuse to
             | power the main CCXs. Speccing out enough caps in the
             | desktop case would be very Apple as well.
        
               | withinboredom wrote:
               | Once voltage from the battery gets too low (despite
               | reporting whatever % charge), you aren't getting anything
               | from the battery.
        
               | monocasa wrote:
                | It's the other way around. The PMIC cuts the main system
                | off at a certain voltage, and even in the worst case you
                | have the extra watt-seconds to flush everything at that
                | point.
        
               | withinboredom wrote:
               | I'm really hoping the battery has a low-voltage cutoff...
               | I guess the question is: does the battery cut the power,
               | or does the laptop? In the latter case, this may be "ok"
               | for some definition of ok. The former, there's probably
               | not enough juice to do anything.
        
               | monocasa wrote:
               | > does the battery cut the power, or does the laptop?
               | 
               | Last time I checked (and I could very well be out of date
               | on this), there wasn't really a difference. It wasn't
                | like an 18650 where the cells themselves have protection,
               | but a cohesive power management subsystem that managed
               | the cells more or less directly. It had all the
               | information to correctly make such choices (but, you
               | know, could always have bugs, or it's a metric that was
               | never a priority, etc.).
        
               | mardifoufs wrote:
                | Batteries can technically be used until their voltage is
                | 0 (it would be hard to get any current below ~1 volt for
                | lithium cells but still). The cutoff is either due to the
               | BMS (battery management system) cutting off power to
               | protect the cells from permanent damage or because the
               | voltage is just too low to power the device (but in that
               | case there's still voltage).
               | 
               | Running lithium cells under 2.7v leads to permanent
               | damage. But, I'm sure laptops have a built-in safety
               | margin and can cut off power to the system selectively.
               | That's why you can still usually see some electronics
               | powered (red battery light, flashing low battery on a
               | screen, etc) even after you "run out" of battery.
               | 
               | I've never designed a laptop battery pack, but from my
               | experience in battery packs in general, you always try to
               | keep safety/sensing/logging electronics powered even
               | after a low voltage cutoff.
               | 
               | Even in very cheap commodity packs that are device
               | agnostic, the basic internal electronics of the battery
               | itself always keep themselves powered even after cutting
               | off any power output. Laptops have the advantage of a
               | purpose built battery and BMS so they can have a very
               | fine grained power management even at low voltages.
        
               | bbarnett wrote:
               | There is no defense for lying about sync like this. Ever.
        
           | joenathanone wrote:
            | UPS won't help if the kernel panics.
        
             | colanderman wrote:
             | It doesn't need to, kernel panic alone does not cause
             | acknowledged data not to be written to the drive.
             | 
             | UPS is not perfect though, it's better if your data
             | integrity guarantees are valid independent of power supply.
             | All that requires is that the drive doesn't lie.
        
             | metalliqaz wrote:
             | kernel panic wouldn't take out the SSD firmware...
        
               | dathinab wrote:
               | To quote Apples man pages:
               | 
               | > Specifically, if the drive loses power or the OS
               | crashes, the application may find that only some or none
               | of their data was written.
        
               | colanderman wrote:
               | mac OS's fsync(2) doesn't wait for SCSI-level flush,
               | unless F_FULLFSYNC is provided.
        
           | colanderman wrote:
           | Hard drive write caches are supposed to be battery-backed
           | (i.e., internal to the drive) for exactly this reason.
           | (Apparently the drives tested are not.) Data integrity should
           | not be dependent on power supply (UPS or not) in any way;
           | it's unnecessary coupling of failure domains (two different
           | domains nonetheless -- availability vs. integrity).
        
             | oceanplexian wrote:
             | As a systems engineer, I think we should be careful
             | throwing words around like "should". Maybe the data
             | integrity isn't something that's guaranteed by a single
             | piece of hardware but instead a cluster or a larger
             | eventually consistent system?
             | 
             | There will always be trade-offs to any implementation. If
             | you're just using your M2 SSD to store games downloaded off
             | Steam I doubt it really matters how well they flush data.
              | However if your financial startup is using them without an
             | understanding of the risks and how to mitigate them, then
             | you may have a bad time.
        
             | nomel wrote:
             | > it's unnecessary coupling
             | 
             | I think much improved write performance is a good example
             | of how it can be beneficial, with minimal risk.
             | 
             | Everything can be nice ideals of abstraction, until you
             | want to push the envelope.
        
               | colanderman wrote:
               | Accidental drive pulls happen -- think JBODs and RAID.
               | Ideally, if an operator pulls the wrong drive, and then
               | shoves it back in in a short amount of time, you want to
               | be able to recover from that without a full RAID rebuild.
               | You can't do that correctly if the RAID's bookkeeping
               | structures (e.g. write-intent bitmap) are not consistent
               | with the rest of the data on the drive. (To be fair, in
               | practice, an error arising in this case would likely be
               | caught by RAID parity.)
               | 
               | Not saying UPS-based integrity solutions don't make
               | sense, you are right it's a tradeoff. The issue to me is
               | more device vendors misstating their devices'
               | capabilities.
        
            | dathinab wrote:
            | > Not really a problem when your computer has a large UPS
            | built into it.
            | 
            | Actually it is (though a small one). To name some examples
            | where data can still be lost without a full sync:
            | 
            | - OS crashes
            | 
            | - random hard reset, e.g. due to bit flips due to e.g. cosmic
            | radiation (happens). Or someone putting their magnetic
            | earphone cases or similar on your laptop.
            | 
            | Also any application which cares about data integrity will do
            | full syncs and in turn will get hit by a huge perf. penalty.
            | 
            | I have no idea why people are so adamant to defend Apple in
            | this case; it's pretty clear that they messed up, as
            | performance with full flush is just WAY too low, and this
            | affects anything which uses full flushes, which any
            | application should at least do on (auto-)save.
            | 
            | The point of a journaling file system is to make it less
            | likely that the file system _itself_ is corrupted. Not that
            | the files are not corrupted if they don't use full sync!
        
             | colanderman wrote:
             | OS crashes do not cause acknowledged writes to be lost.
             | They are already in the drive's queue.
        
               | dathinab wrote:
                | They do if you don't use F_FULLFSYNC; even Apple
                | acknowledges it (quoting Apple's man pages):
                | 
                | > Specifically, if the drive loses power or the OS
                | crashes, the application may find that only some or none
                | of their data was written.
                | 
                | It's also worse than just write losses:
                | 
                | > The disk drive may also re-order the data so that later
                | writes may be present, while earlier writes are not.
        
               | colanderman wrote:
               | I'm using "acknowledged" in the sense of SCSI-level flush
               | (per the thread OP), not mac OS's peculiar implementation
               | of fsync.
        
               | dathinab wrote:
                | But the thread OP is about it not being a problem that
                | SCSI-level flushes are super slow, which is only not a
                | problem if you don't do them (e.g. only use fsync on
                | Mac)?
               | 
               | But reading it again there might have been some confusion
               | about what was meant.
        
             | marginalia_nu wrote:
             | I had an NVMe controller randomly reset itself a few days
             | ago. I think it was a heat issue. Not really sure though,
             | may be that the motherboard is dodgy.
             | 
             | This shit does happen.
        
           | emodendroket wrote:
           | It seems pretty clear that desktop Macs are an afterthought
           | for Apple.
        
           | monocasa wrote:
           | Journaling file systems depend on commands like FLUSH to
           | create the appropriate barriers to build their journal's
           | semantics.
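The dependence of journal semantics on honest FLUSH barriers can be shown with a toy simulation: the "drive" below buffers writes in a volatile cache that vanishes on power loss, and a lying drive acknowledges flushes without draining that cache. All names are illustrative:

```python
class Drive:
    """Simulated drive with a volatile write cache."""
    def __init__(self, honest_flush=True):
        self.media = {}   # durable storage
        self.cache = {}   # volatile write cache
        self.honest_flush = honest_flush

    def write(self, addr, val):
        self.cache[addr] = val  # acked as soon as it's cached

    def flush(self):
        # A lying drive acknowledges the flush without draining its cache
        if self.honest_flush:
            self.media.update(self.cache)
            self.cache.clear()

    def power_loss(self):
        self.cache.clear()  # volatile contents vanish

def journaled_update(drive, addr, val):
    # Simplified journal commit protocol: each FLUSH is an ordering
    # barrier the filesystem's crash recovery relies on
    drive.write("journal", (addr, val))
    drive.flush()            # journal entry durable before the data
    drive.write(addr, val)
    drive.flush()            # data durable before the commit record
    drive.write("commit", addr)
    drive.flush()
```

With an honest drive, power loss at any point leaves either the journal entry or the committed data on media, so replay can recover; with a lying drive, writes the filesystem believes are durable can silently evaporate.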
        
             | rowanG077 wrote:
                | I don't have a clue how a journaling FS works. But any
                | ordering should not be observable unless you have a power
                | outage. Can you give an example of how a journaling FS
                | could observe something that shouldn't be observable?
        
               | nwallin wrote:
               | > But any ordering should not be observable unless you
               | have a power outage.
               | 
               | But what if the front _does_ fall off?
        
               | bbarnett wrote:
               | A crash, lockup, are the same as a power failure.
        
               | colanderman wrote:
               | No, during a crash or lockup, acknowledged writes are not
               | lost. (Because the drive has acknowledged them, they are
               | in the drive's internal queue and thus need no further
               | action from the OS to be committed to durable storage.)
               | Only power loss/power cycle causes this.
        
               | rowanG077 wrote:
               | Why? During a crash or lockup acked writes still reached
               | the drive. They will be flushed to the storage eventually
               | by the SSD controller. As long as you have power that is.
        
               | joosters wrote:
               | The key word is 'eventually'. How long? Seconds, or even
               | minutes? If your machine locks up, you turn it off and on
               | again. If the drive didn't bother to flush its internal
               | caches by then, that data is lost, just as in a power
               | failure.
        
               | monocasa wrote:
               | 100s of milliseconds is the right order of magnitude.
        
               | colanderman wrote:
               | > you turn it off and on again
               | 
               | That would be a power failure. Kernel crash is not
               | equivalent to that.
               | 
               | System reboot doesn't entail power failure either. The
               | disks may be powered by an independent external enclosure
               | (e.g. JBOD). All they see is that they stopped receiving
               | commands for a short while.
        
               | monocasa wrote:
               | > unless you have a power outage
               | 
               | Journaling FSes are all about safety in the face of such
               | things. That is, unless the drive lies.
        
       | formerly_proven wrote:
       | > I guess review sites don't test this stuff.
       | 
       | Most review sites don't really test much at all.
        
         | sokoloff wrote:
         | Except for their affiliate links; they do test those.
        
       | post_break wrote:
       | The problem is you can't trust a model number of SSD. They change
       | controllers, chips, etc after the reviews are out and they can
       | start skimping on components.
       | 
       | https://www.tomshardware.com/news/adata-and-other-ssd-makers...
        
         | TheJoYo wrote:
          | this is even worse in automotive ECUs. this shortage is only
          | going to make things more difficult to test, let alone
          | secure.
        
         | infogulch wrote:
         | This needs to be cracked down on from a consumer protection
         | lens. Like, any product revision that could potentially produce
         | a different behavior must have a discernable revision number
         | published as part of the model number.
        
           | sokoloff wrote:
           | While I agree with the sentiment, even a firmware revision
           | could cause a difference in behavior and it seems
           | unreasonable to change the model number on every firmware
           | release.
        
             | jay_kyburz wrote:
              | I don't think it's unreasonable to just add a date next
              | to the serial number showing when the firmware was
              | released. It's 6 extra numbers
        
               | yjftsjthsd-h wrote:
               | But firmware can be updated after the unit was shipped; a
               | date stamped on the hardware can't promise accuracy.
        
               | carlhjerpe wrote:
               | Is this the definition of whataboutism?
        
               | mardifoufs wrote:
               | Whataboutism is not pointing out flaws in a proposal, no.
               | But I guess the word is so overly used these days that
               | the definition becomes blurry.
        
               | robertlagrant wrote:
               | No, it's not.
        
               | yjftsjthsd-h wrote:
               | Er, no? Whataboutism is an attempt to claim hypocrisy by
               | drawing in something else with the same flaw. This is
               | pointing out a way for _this_ exact proposal to fail.
        
               | pezezin wrote:
                | _It's 8 extra numbers_
                | 
                | FTFY, unless you use the YYYY-WW date format. But
                | please don't use just two digits for year numbers; we
                | should have learned from Y2K years ago.
        
             | matheusmoreira wrote:
             | It seems unreasonable to me that there is unknown
             | proprietary software running on my storage devices to begin
             | with. This leads to insanity such as failure to report read
             | errors or falsely reporting unwritten data as committed.
             | This should be part of the operating system, not some
             | obscure firmware hidden away in some chip.
        
           | mjw1007 wrote:
           | Right.
           | 
           | And no switching the chipset to a different supplier
           | requiring entirely different drivers between the XYZ1001 and
           | the XYZ1001a, either.
           | 
           | If I ruled the world I'd do it via trademark law: if you
           | don't follow my set of sensible rules, you don't get your
           | trademarks enforced.
        
             | toss1 wrote:
             | Years ago, that kind of behavior got Dell crossed off my
             | list of suppliers I'd work with for clients. We had to
             | setup 30+ machines of the exact same model number, and same
             | order, and set of pallets -- yet there were at least 7
             | DIFFERENT random configurations of chips on the
             | motherboards & video cards -- all requiring different
             | versions of the drivers. This was before the days of auto-
             | setup drivers. Absolute flaming nightmare. It was basically
             | random - the different chipsets didn't even follow serial
             | number groupings, it was just hunt around versions for
             | every damn box on the network. Dell's driver resources &
             | tech support even for VARs was worthless.
             | 
             | This wasn't the first incident, but after such a blatant
             | set of quality control failures I'll never intentionally
             | select or work with a Dell product again.
        
           | metalliqaz wrote:
           | usually the manufacturers are careful not to list official
           | specs that these part swaps affect. all you get is a vague
           | "up to" some b/sec or iops.
        
           | alerighi wrote:
           | It's complicated. Nowadays we have shortage of electronic
           | components and it's difficult to know what will be not
           | available the next month. So it's obvious that manufacturers
           | have to make different variants of a product that can mount
           | different components.
        
           | closeparen wrote:
           | The PC laptop manufacturers have worked around this for
           | decades by selling so many different short-lived model
           | numbers that you can rarely find information about the
           | specific models for sale at a given moment.
        
             | renewiltord wrote:
             | True. It's the Gish Gallop of model numbering. Fortunately,
             | it is the preserve of the crap brands. It's sort of like
             | seeing "in exchange for a free product, I wrote this honest
             | and unbiased review". Bam! Shitty product marker! Asus
             | GL502V vs Asus GU762UV? Easy, neither. They're clearly both
             | shit or they wouldn't try to hide in the herd.
        
               | gruez wrote:
               | >Shitty product marker! Asus GL502V vs Asus GU762UV?
               | Easy, neither. They're clearly both shit or they wouldn't
               | try to hide in the herd.
               | 
               | Is this based on empirical evidence or something? My
               | sample size isn't very big, but I haven't really noticed
               | any correlation between this practice and whether the
               | product is crappy. I just chalked this up to
               | manufacturers colluding with retailers to stop price
               | matches, rather than because "clearly both shit or they
               | wouldn't try to hide in the herd".
        
               | closeparen wrote:
               | Which laptop makers (other than Apple) don't do this?
        
           | duped wrote:
           | I don't want to live in a world where electronic components
           | can't be commoditized because of fundamentally misinformed
           | regulation.
           | 
           | There are alternatives to interchangeable parts, and none of
           | them are good for consumers. And that is what you're talking
           | about - the only reason for any part to supplant another in
           | feature or performance or cost is if manufacturers can change
            | them!
        
           | gruez wrote:
           | >Like, any product revision that could potentially produce a
           | different behavior must have a discernable revision number
           | published as part of the model number.
           | 
           | AFAIK samsung does this, but it doesn't really help anyone
           | except enthusiasts because the packaging still says "980 PRO"
           | in big bold letters, and the actual model number is something
           | indecipherable like "MZ-V8P1T0B/AM". If this was a law they
           | might even change the model number randomly for CYA/malicious
           | compliance reasons. eg. firmware updated? new model number.
           | DRAM changed, but it's the same spec? new model number.
           | changed the supplier for the SMD capacitors? new model
           | number. PCB etchant changed? new model number.
        
         | n00bface wrote:
         | This practice is false advertising at a minimum, and possibly
         | fraud. I'm shocked there hasn't been State AG or CFPB
         | investigations and fines yet.
         | 
         | Edit: Being mad and making mistakes go hand in hand. FTC is the
         | appropriate organization to go after these guys.
        
           | oceanplexian wrote:
           | What do you expect? These companies are making toys for
           | retail consumers. If you want devices that guarantee data
           | integrity for life or death, or commercial applications,
           | those exist, come with lengthy contracts, and cost 100-1000x
           | more than the consumer grade stuff. Like I seriously have a
           | hard time empathizing with someone who thinks they are
           | entitled to anything other than a basic RMA if their $60 SSD
            | loses data.
        
           | R0b0t1 wrote:
           | It's definitely fraud. The only reason to hide the things
           | they do is to mislead the customer as evidenced by previous
           | cases of this that caused serious harm to consumers.
        
           | gruez wrote:
           | >or CFPB investigations and fines yet
           | 
           | >CFPB
           | 
           | "The Consumer Financial Protection Bureau (CFPB) is an agency
           | of the United States government responsible for consumer
           | protection in the financial sector. CFPB's jurisdiction
           | includes banks, credit unions, securities firms, payday
           | lenders, mortgage-servicing operations, foreclosure relief
           | services, debt collectors, and other financial companies
           | operating in the United States. "
        
             | blacksmith_tb wrote:
             | Right, presumably they meant to say the FTC[1].
             | 
             | 1: https://www.usa.gov/stop-scams-frauds
        
               | n00bface wrote:
               | Thanks. You got it.
        
       | supermatt wrote:
       | So, seems those drives may have been ignoring the F_FULLFSYNC
       | after all...
       | 
       | https://news.ycombinator.com/item?id=30371857
       | 
       | The Samsung EVO drives are interesting because they have a few GB
       | of SLC that they use as a secondary buffer before they reflush to
       | the MLC.
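For context on F_FULLFSYNC: on macOS, `fsync()` only pushes data to the drive, while the `F_FULLFSYNC` fcntl additionally asks the drive to flush its own cache. A hedged sketch of the usual portability shim (falling back to plain `fsync()` where the flag doesn't exist):

```python
import fcntl
import os

def full_sync(fd: int) -> None:
    """Best-effort 'really flush to media'.

    On macOS, F_FULLFSYNC requests that the drive flush its internal
    cache as well. On platforms without it, os.fsync() is the strongest
    primitive available.
    """
    if hasattr(fcntl, "F_FULLFSYNC"):
        fcntl.fcntl(fd, fcntl.F_FULLFSYNC)
    else:
        os.fsync(fd)
```

Of course, as the thread shows, even the strongest request only helps if the drive honors it.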
        
       | hughrr wrote:
       | The important quote:
       | 
       |  _> The models that never lost data: Samsung 970 EVO Pro 2TB and
       | WD Red SN700 1TB._
       | 
       | I always buy the EVO Pro's for external drives and use TB to NVMe
       | bridges and they are pretty good.
        
         | Trellmor wrote:
          | There is a 970 Evo, a 970 Pro and a 970 Evo Plus, but no 970
          | Evo Pro as far as I am aware. It would be interesting to know
          | what model OP is actually talking about and if it is the same
          | for other Samsung NVMe SSDs. I also prefer Samsung SSDs
          | because they are reliable and they usually don't change parts
          | to lower-spec ones while keeping the same model number like
          | some other vendors.
        
           | qwertox wrote:
           | I mostly buy Samsung Pro. Today I put an Evo in a box which
           | I'm sending back for RMA because of damaged LBAs. I guess I'm
           | stopping my tests on getting anything else but the Pros.
           | 
           | But IIRC Samsung was also called out for switching
           | controllers last year.
           | 
           | "Yes, Samsung Is Swapping SSD Parts Too | Tom's Hardware"
        
           | hughrr wrote:
           | Sorry I should have said EVO plus there in my original post.
           | I'll leave the error in so your comment makes sense.
        
             | zargon wrote:
             | The "EVO Pro" error was made by the OP. So it would be nice
             | to know which drive OP actually tested.
        
               | hughrr wrote:
               | Indeed. I use the EVO Plus NVMe's though.
        
               | cgriswald wrote:
               | OP has since appended to his post:
               | 
               | > Correction: "Plus" not "Pro". Exact model and date
               | codes:
               | 
                | > Samsung 970 Evo Plus: MZ-V7S2T0, 2021.10
                | 
                | > WD Red: WDS100T1R0C-68BDK0, 04Sept2021
        
             | fuzzybear3965 wrote:
              | How do you know the model? People are asking the same
              | question on Twitter and OP doesn't seem to have supplied
              | an answer there.
        
         | xenadu02 wrote:
         | Sorry, it was the 970 Evo Plus. Here are the exact model and
         | date codes from the drives:
         | 
         | Samsung 970 Evo Plus: MZ-V7S2T0, 2021.10
         | 
         | WD Red: WDS100T1R0C-68BDK0, 04Sept2021
        
           | willis936 wrote:
           | Which drives were tested/confirmed to lose data? Did a
           | Samsung Pro drive have this behavior?
        
             | xenadu02 wrote:
             | These two drives never lost FLUSH'd writes in any test I
             | ran.
        
               | willis936 wrote:
               | What drives did you test?
        
           | ysleepy wrote:
           | Were samsung and WD consistent in this or did you have drives
           | from them that behaved differently?
        
           | hughrr wrote:
           | Thanks for confirming - appreciated!
        
       | shadowgovt wrote:
       | As a frame of reference, how much loss of FLUSH'd data should be
       | expected on power loss for a semi-permanent storage device
       | (including spinning-platter hard drives, if anyone still installs
       | them in machines these days)?
       | 
       | I'm far more used to the mainframe space where the rule is
       | "Expect no storage reliability; redundancy and checksums or you
       | didn't want that data anyway" and even long-term data is often
       | just stored in RAM (and then periodically cold-storage'd to
       | tape). I've lost sight of what expected practice is for desktop /
       | laptop stuff anymore.
        
         | xenadu02 wrote:
          | The semantics of a FLUSH command (per the NVMe spec) are
          | that all previously sent write commands, along with any
          | internal metadata, must be written to durable storage before
          | returning success.
         | 
         | Basically the drive is saying "yup, it's all on NAND - not in
         | some internal buffer. You can power off or whatever you want,
         | nothing will be lost".
         | 
         | Some drives are doing work in response to that FLUSH but still
         | lose data on power loss.
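The kind of test the thread describes can be sketched as a write-then-log protocol (illustrative Python; the file names and record format are invented). Each record is only logged as "acknowledged durable" after the `fsync()`, which issues the device FLUSH, returns; ideally the log lives on a different device or machine so it survives independently:

```python
import os

def write_and_log(data_fd: int, log_path: str, seq: int, block: bytes) -> None:
    # Write a record and flush it; the drive is now claiming durability.
    os.write(data_fd, block)
    os.fsync(data_fd)                     # drive claims: durable

    # Only then record the sequence number as acknowledged.
    with open(log_path, "a") as log:
        log.write(f"{seq}\n")
        log.flush()
        os.fsync(log.fileno())

# After a power cut, every seq present in the log must be readable from
# the data device; any missing record means the drive lost a FLUSH'd write.
```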
        
         | colanderman wrote:
         | None. If the drive responds that the data has been written, it
         | is expected to be there after a power failure.
        
         | dathinab wrote:
         | > how much loss of FLUSH'd data should be expected on power
         | loss for
         | 
         | 0%
         | 
         | In enterprise you are expected to expect lost data, but only if
         | your drive fails and needs to be replaced, or if it's not yet
         | flushed.
        
       | iforgotpassword wrote:
       | I think this is something LTT could handle with their new test
       | lab. They already said they want to set new standards when it
        | comes to hardware testing, so if they can live up to what they
        | promised and hire enough experts, this should be a trivial
        | thing to add to a test course for disk drives.
        
         | goodpoint wrote:
         | LTT is more focused on entertaining the audience than providing
         | thorough, professional testing.
        
         | OJFord wrote:
         | That's like Ellen DeGeneres declaring a desire to set new
         | standards for film critique.
        
         | balls187 wrote:
         | LTT's commentary makes it difficult to trust they are objective
         | (even if they are).
         | 
         | I loved seeing how giddy Linus got while testing Valve's
         | Steamdeck, but when it comes to actual benchmarks and testing,
         | I would appreciate if they dropped the entertainment aspect.
        
       | timbit42 wrote:
       | What is an EVO Pro?
        
         | tromp wrote:
         | See comment above; it's an EVO Plus
        
           | timbit42 wrote:
           | I don't have twitter. Someone should tell the tweeter to fix
           | their tweet.
        
             | closewith wrote:
             | Tweets can't be edited, unfortunately.
        
             | [deleted]
        
       | gruez wrote:
       | Correct me if I'm wrong, but if these drives are used for
       | consumer applications, this behavior is probably not a big deal?
       | If you made changes to a document, pressed control-S, and then 1
       | second later the power went out, then you might lose that last
       | save. That'd suck, but you would have lost the data anyways if
       | the power loss occurred 2s before, so it's not that bad. As long
       | as other properties weren't violated (eg. ordering), your data
       | should mostly be okay, aside from that 1s of data. It's a much
       | bigger issue for enterprise applications, eg. a bank's mainframe
       | responsible for processing transactions told a client that the
       | transaction went through, but a power loss occurred and the
       | transaction was lost.
        
         | __david__ wrote:
         | > As long as other properties weren't violated (eg. ordering),
         | your data should mostly be okay, aside from that 1s of data.
         | 
         | That's the thing though--ordering isn't guaranteed as far as I
         | remember. If you want ordering you do syncs/flushes, and if the
         | drive isn't respecting those, then ordering is out of the
         | window. That means FS corruption and such. Not good.
        
           | gruez wrote:
           | The tweet only mentioned data loss when you yanked the power
           | cable. That doesn't say anything about whether the ordering
           | is preserved. It's possible to have a drive that lies about
           | data written to persistent storage, but still keeps the
           | writes in order.
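That speculated design can be sketched abstractly (purely illustrative; no real controller is documented to work this way). If acknowledged writes go into an append-only log, then after power loss only some prefix of the log survives, and replaying that prefix in order preserves write ordering even though durability was a lie:

```python
class TinyWAL:
    """Toy model of a drive-side write-ahead log."""

    def __init__(self):
        self.log = []          # stands in for the drive's NAND log
        self.store = {}        # the "real" home location of each block

    def write(self, lba: int, data: bytes) -> None:
        # Append to the log, then acknowledge immediately (the lie).
        self.log.append((lba, data))

    def recover(self, surviving_entries: int) -> dict:
        # After power loss, only a prefix of the log made it to NAND.
        # Replaying it in original order yields a consistent prefix of
        # the acknowledged writes -- ordering holds, durability doesn't.
        state = {}
        for lba, data in self.log[:surviving_entries]:
            state[lba] = data
        return state
```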
        
         | colanderman wrote:
         | > As long as other properties weren't violated (eg. ordering)
         | 
         | That is primarily what fsync is used to ensure. (SCSI provides
         | other means of ensuring ordering, but AFAIK they're not widely
         | implemented.)
         | 
         | EDIT: per your other reply, yes, it's possible the drives
         | maintain ordering of FLUSHed writes, but not durability. I'm
         | curious to see that tested as well. (Still an integrity issue
         | for any system involving more than just one single drive
         | though.)
        
           | [deleted]
        
         | supermatt wrote:
         | It's a big deal because they are lying. That sets false
         | expectations for the system. There are different commands for
         | ensuring write ordering.
        
         | ak217 wrote:
         | Modern SSDs, and especially NVMe drives, have extensive logic
         | for reordering both reads and writes, which is part of why they
         | perform best at high queue depths. So it's not just possible
         | but expected that the drive will be reordering the queue. Also,
         | as batteries age, it becomes quite common to lose power without
         | warning while on a battery.
         | 
         | In general it's strange to hear excuses for this behavior since
         | it's obviously an attempt to pass off the drive's performance
         | as better than it really is by violating design constraints
         | that are basic building blocks of data integrity.
        
           | gruez wrote:
           | >Modern SSDs, and especially NVMe drives, have extensive
           | logic for reordering both reads and writes, which is part of
           | why they perform best at high queue depths. So it's not just
           | possible but expected that the drive will be reordering the
           | queue.
           | 
           | If we're already in speculation territory, I'll further
           | speculate that it's not hard to have some sort of WAL
           | mechanism to ensure the writes appear in order. That way you
           | can lie to the software that the writes made it to persistent
           | memory, but still have consistent ordering when there's a
           | crash.
           | 
           | >Also, as batteries age, it becomes quite common to lose
           | power without warning while on a battery.
           | 
           | That's... totally consistent with my comment? If you're going
           | for hours without saving and only saving when the OS tells
           | you there's only 3% battery left, then you're already playing
           | fast and loose with your data. Like you said yourself, it's
           | common for old laptops to lose power without warning, so
           | waiting until there's a warning to save is just asking for
           | trouble. Play stupid games, win stupid prizes. Of course, it
            | doesn't excuse their behavior, but I'm just pointing out
            | that for the typical consumer, the actual impact isn't as
            | bad as people think.
        
         | whartung wrote:
         | > That'd suck, but you would have lost the data anyways if the
         | power loss occurred 2s before,
         | 
          | But if you knew power was failing, which is why you did the
          | ^S in the first place, it would not just suck, it would be
          | worse than that because your expectations were shattered.
         | 
         | It's all fine and good to have the computers lie to you about
         | what they're doing, especially if you're in on the gag.
         | 
         | But when you're not, it makes the already confounding and
         | exasperating computing experience just that much worse.
         | 
          | Go back to floppies; at least you know the data is saved
          | when the disk stops spinning.
        
           | gruez wrote:
           | >But if you knew power was failing, which is why you did the
           | ^S in the first place, it would not just suck, it be worse
           | than that because your expectations were shattered.
           | 
           | The only situation I can think of this being applicable is
           | for a laptop running low on battery. Even then, my guess is
           | that there is enough variance in terms of battery
           | chemistry/operating conditions that you're already playing
           | fast and loose with regards to your data if you're saving
            | data when there's only a few seconds of battery left. I
            | agree that having it not lose data is objectively better
            | than having it lose data, but that's why I characterized it
            | as "not a _big_ deal".
        
         | nwallin wrote:
         | > If you made changes to a document, pressed control-S, and
         | then 1 second later the power went out, then you might lose
         | that last save.
         | 
         | If you made changes to a document, pressed control-S, and then
         | 1 second later the power went out, then the entire filesystem
         | might become corrupted and you lose all data.
         | 
         | Keep in mind that small writes happen a lot -- a _lot_ a lot.
         | Every time you click a link in a web page it will hit cookies,
         | update your browser history, etc etc, all of which will trigger
         | writes to the filesystem. If one of these writes triggers a
         | modification to the superblock, and during the update a FLUSH
         | is ignored and the superblock is in a temporary invalid state,
         | and the power goes out, you may completely hose your OS.
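This is also why the classic crash-safe save pattern that editors and browsers rely on assumes honest flushes. A hedged sketch (the directory-fsync step is the POSIX-portable way to make the rename itself durable):

```python
import os

def atomic_save(path: str, data: bytes) -> None:
    # Write a temp file, flush it to media, then rename it over the
    # original and flush the directory. The old file is never left in a
    # half-written state -- but only if the drive honors the flushes in
    # the order they were issued.
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())          # temp contents durable first
    os.rename(tmp, path)              # atomic swap in the FS namespace
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)              # make the rename itself durable
    finally:
        os.close(dir_fd)
```

A drive that reorders the rename ahead of the temp file's contents can leave you with a correctly named, garbage-filled file after power loss.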
        
        | kortilla wrote:
        | Nope, the problem here is that it violates a very basic
        | ordering guarantee that all kinds of applications build on top
        | of. Consider all the cases of hybrid drives, or just multiple
        | drives, where you fsync on one to journal that you are doing
        | something on the other (e.g. Steam storing the actual games on
        | another drive).
         | 
         | This behavior will cause all kinds of weird data
         | inconsistencies in super subtle ways.
        
       ___________________________________________________________________
       (page generated 2022-02-21 23:00 UTC)