[HN Gopher] Update on Samsung SSD Reliability
       ___________________________________________________________________
        
       Update on Samsung SSD Reliability
        
       Author : Akharin
       Score  : 95 points
       Date   : 2023-02-03 21:44 UTC (1 hours ago)
        
 (HTM) web link (www.pugetsystems.com)
 (TXT) w3m dump (www.pugetsystems.com)
        
       | jeffbee wrote:
       | Would like to know more. Were failures in the field from wear-out
       | or sudden death? Are the health indicators losing 1% per week
       | consistent with the datasheet TBW, or worse?
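jeffbee's second question comes down to arithmetic. Convert the weekly drop in the health indicator into the host writes it would imply under the rated TBW, then compare that with what the machine actually writes. This is a minimal sketch. It assumes wear tracks host writes linearly and that 100% "Percentage Used" corresponds exactly to the rated TBW, although real firmware estimates are vendor-specific. The helper names and the 1200 TBW input are hypothetical, so check the datasheet for the actual model.

```python
# Sketch: relate the NVMe "Percentage Used" health value to rated endurance.
# ASSUMPTION: wear is linear in host writes and 100% used == rated TBW.
# Real drives compute this estimate in vendor-specific ways.

def implied_writes_tb(tbw_rating_tb: float, percent_used: float) -> float:
    """Host writes (TB) that a given 'Percentage Used' value would imply."""
    return tbw_rating_tb * percent_used / 100.0


def expected_percent_used(tbw_rating_tb: float, writes_tb: float) -> float:
    """'Percentage Used' you would expect after writing writes_tb terabytes."""
    return 100.0 * writes_tb / tbw_rating_tb


if __name__ == "__main__":
    # Hypothetical inputs: a 1200 TBW rating and 1% of health lost per week.
    per_week = implied_writes_tb(1200, 1.0)
    print(f"1% per week implies ~{per_week:.1f} TB written per week")
    print(f"(~{per_week / 7:.2f} TB per day)")
```

If the implied weekly figure is far above the workload's real writes, the indicator is falling faster than the datasheet TBW would predict.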
        
       | adrenvi wrote:
       | Samsung 870 EVO drives were also known to fail early, including
       | my 2TB model.
       | 
       | https://www.techpowerup.com/forums/threads/samsung-870-evo-b...
        
         | acabal wrote:
         | Yes, this bit me just last month. Around October 2022 I
         | purchased 3 Samsung 870 EVO 2TBs for use in a RAID array. _By
         | January 2023, all three of them failed within a week of each
         | other!_
         | 
         | Fortunately they failed one by one, so I was barely able to
         | recover my RAID array by pulling out one drive at a time,
         | powering the computer off, and waiting for the RMA replacement
         | to arrive.
         | 
         | But imagine my shock to see one drive fail... only to replace
         | it with an RMA... and then days later, seeing the next drive
         | fail... and the next!
        
           | bombcar wrote:
           | A perfect example of why RAID ain't backup.
        
       | ssdpain wrote:
       | I installed 2 x 980 Pro 2TB drives in a laptop in Nov 2022. A
       | daily Robocopy batch script that backs up a folder on C: to D:
       | would freeze a couple of times a week and lock up the D: drive.
       | After a reboot, a drive check would find no errors and
       | everything would work as normal. I've used the same script for
       | years with no issues.
       | 
       | Since the firmware update last week, Robocopy has not frozen the
       | drive at all this week.
        
         | issafram wrote:
         | Could you provide that batch script please? Like in a GitHub
         | Gist or something similar.
        
           | ssdpain wrote:
           | Sure...
           | 
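           | @echo off
           | rem Wait for a keypress before starting the copy
           | pause
           | 
           | rem /mir mirrors the tree (it implies /e and also deletes files
           | rem on D: that no longer exist on C:), /np hides progress, /v is
           | rem verbose, /tee also prints to the console, /r:0 /w:0 disable
           | rem retries, and /log+ appends to the log. Keep it on one line.
           | robocopy "C:\Users\o\Desktop\2023" "D:\2023" /e /mir /np /v /tee /r:0 /w:0 /log+:"C:\Users\o\Desktop\log_robocopy.txt"
           | 
           | pause
           | 
           | @echo on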
        
         | dmitrygr wrote:
         | > the firmware update last week
         | 
         | Link to specific firmware version please?
        
           | ssdpain wrote:
           | The new firmware is version 5B2QGXA7, updated via Samsung
           | Magician on Windows. I didn't make a note of the earlier
           | firmware versions. It's still too soon to know whether the SSD
           | freeze will recur.
        
       | smiley1437 wrote:
       | I've been trying to find a decent high-endurance NVMe drive in
       | the M.2 form factor for write-heavy applications, and it appears
       | that true 2-bit MLC has all but disappeared, replaced by 3-bit
       | TLC and higher (with a commensurate loss of endurance).
       | 
       | The high-endurance SSDs appear to be available only in
       | U.2/U.3/HHHL and god-help-me EDSFF form factors.
       | 
       | Any suggestions? Micron's 7450 isn't readily available.
        
         | mnadkvlb wrote:
         | I recommend the Samsung PM9A3. It's not as popular, but it's an
         | enterprise product, and the endurance is something like 3 times
         | that of the 980 Pro, I believe (please check, I'm not 100% sure).
         | 
         | I've been using them in my Threadripper workstation with a lot
         | of VMs, which are put to sleep every day, with around 0.25TB
         | written and read each time the VMs are started. Keep in mind
         | these are the 22110 form factor.
        
       | flyinglizard wrote:
       | Had issues with a 2TB 980 Pro. Things have stabilized with recent
       | updates.
        
       | pifm_guy wrote:
       | I really want to see SSD manufacturers offer a decent warranty...
       | 
       | This drive costs $100, and will last 10 years or until 100TB has
       | been written to it, as long as you keep it within the specified
       | temperature/humidity/power conditions.
       | 
       | If it fails to do that, we will return $1000 to you.
        
         | jeffbee wrote:
         | I am not sure why you want a 10x refund, but it seems like your
         | request is _easily_ met by current warranties. A 1TB WD SN850X
         | advertises 1200TBW endurance, rather more than you require.
        
         | joenathanone wrote:
         | Lifetime warranties used to be commonplace, I wish we could
         | return to those times, or at least to a time of repairability.
        
         | walterbell wrote:
         | In theory, a 3rd-party insurance equivalent to AppleCare could
         | be constructed for some technology products, but this is
         | hampered by short product lifecycles, lack of BOM transparency
         | (e.g. components changed within a single product generation),
         | and the ability of firmware updates to change product behavior
         | and invalidate previously collected reliability data.
         | 
         | Open-source SSD firmware would provide more transparency on
         | performance and reliability.
        
           | CharlesW wrote:
           | > _Open-source SSD firmware would provide more transparency
           | on performance and reliability._
           | 
           | This seems fantastic. Are you saying you could review the
           | firmware source and know that the 980 Pro would lose ~1% of
           | its endurance per week?
        
         | TacticalCoder wrote:
         | Back when HDDs failed a lot, warranties actually worked. I'd
         | happily fill out an online form, Web 1.0 style, and then send
         | my Seagate disks (I'm in Europe; I was sending them to the
         | Netherlands, IIRC), and a few weeks later I'd receive a new
         | drive.
         | 
         | I probably still have a few screenshots of these forms
         | somewhere.
        
         | mrtksn wrote:
         | This sounds like an SLA; it's very unlikely you'll get that
         | for 100 bucks. Even if this manufacturer somehow perfected
         | their process and had zero defects, they would still be taking
         | on a 10-year liability for 100 dollars of revenue.
        
       | Aardwolf wrote:
       | I have this exact model, 980 Pro 2TB.
       | 
       | It says to update firmware, but how can you do that from Linux?
       | The instructions are all about some Windows program. Thanks!
        
         | pentamassiv wrote:
         | A few months ago I updated my Samsung SSD by following this
         | procedure: https://askubuntu.com/a/1386451. In theory they
         | provide a bootable image for the update, but the image seems
         | very outdated and did not recognize my keyboard, so it was
         | unusable.
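Whichever update route is used (the askubuntu procedure, the bootable image, or Magician on Windows), it helps to confirm the firmware revision before and after. This is a minimal sketch, assuming a Linux kernel that exposes the `model` and `firmware_rev` attributes under `/sys/class/nvme/<controller>/`. The function name `nvme_firmware` is hypothetical. The sketch only reads the version and does not perform an update.

```python
# Sketch: report model and firmware revision of each NVMe controller on Linux.
# ASSUMPTION: the kernel exposes /sys/class/nvme/<ctrl>/{model,firmware_rev}.
# This is read-only; it does not update anything.
from __future__ import annotations

from pathlib import Path


def nvme_firmware(sysfs_root: str = "/sys/class/nvme") -> dict[str, tuple[str, str]]:
    """Map controller name (e.g. 'nvme0') to (model, firmware revision)."""
    result: dict[str, tuple[str, str]] = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return result  # no NVMe controllers, or not running on Linux
    for ctrl in sorted(root.iterdir()):
        model_file = ctrl / "model"
        fw_file = ctrl / "firmware_rev"
        # Skip entries that lack either attribute
        if model_file.is_file() and fw_file.is_file():
            # sysfs values are space-padded and newline-terminated
            result[ctrl.name] = (model_file.read_text().strip(),
                                 fw_file.read_text().strip())
    return result


if __name__ == "__main__":
    for name, (model, fw) in nvme_firmware().items():
        print(f"{name}: {model} firmware {fw}")
```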
        
         | TacticalCoder wrote:
         | From TFA, it's not just the 980 Pro 2TB but also all the newer
         | 990s, so it's problematic.
        
         | acidburnNSA wrote:
         | Following, from my Linux desktop. I lost a system SSD that was
         | a 980 2TB and recently reinstalled everything, thinking it was
         | a fluke. Now I'm worried it will happen again quickly.
        
       | xoa wrote:
       | Ars had a piece covering this as well, and I do wonder if there
       | is something going on elsewhere in the Samsung stack, not just
       | the NVMe 900 series line. Pure anecdote, but two years ago I
       | built a NAS for a client using 24x 2TB Samsung 870 Evo drives
       | (they'd gotten an incentive deal on them). While the drives were
       | all one model rather than mixed, we did have the "luxury" of
       | time, because at that point the system they wanted had a
       | significant lead time. So I made sure the drives were purchased
       | over the course of around 7 months, from multiple reputable
       | sellers (B&H, CDW, Provantage, etc.) in separate batches. The
       | system was solid: an Epyc 2-based SuperMicro server running
       | TrueNAS.
       | 
       | And then last year, at around 5,500-7,500 fairly light hours of
       | runtime (primarily reads, ~0.08 DWPD, well under the official
       | rating of 0.3 DWPD), drives started failing. These were
       | definitely real failures: the first indication came from
       | regular automated ZFS scrubs reporting increasing checksum and
       | ATA errors. It hit so many drives, and I'd always considered
       | Samsung SSDs relatively reliable (even the consumer ones), that
       | at first I thought it was a SATA controller failure. Our rep
       | agreed and took the server back under warranty. They were great
       | (gold-plated support contracts pay off once in a while), and
       | after a motherboard replacement and thorough testing it was back
       | in service. Then more drive problems. SMART short tests said
       | everything was healthy, and so did the first long tests. But
       | then drives exceeded error limits and started getting faulted,
       | and at last SMART long tests started failing. Digging in showed
       | worrisome stats. So began swapping out and warrantying drives
       | (credit to TrueNAS for handling the stress test: in the end,
       | zero downtime and no need to restore from backups). In the end,
       | _THIRTEEN (13) out of 24 failed_. A brutal >50% dead-drive rate.
       | I talked to some others in the region and they had also seen
       | failure rates of 30-60% in under a year. Big :\. Our rep also
       | said they were hearing more about Samsung failures.
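The write load described above can be put in absolute terms. This is a minimal sketch. It assumes the quoted ~0.08 DWPD is an average over the whole runtime and that the server ran around the clock, so power-on hours approximate elapsed days. `tb_written` and `fraction_of_rating` are hypothetical helpers. With these inputs, each 2TB drive would have absorbed roughly 37-50 TB of writes at about 27% of the rated write intensity. That makes plain wear-out an unlikely explanation.

```python
# Sketch: turn an average DWPD figure plus power-on hours into total writes.
# ASSUMPTIONS: DWPD is an average over the whole runtime, and the system ran
# 24/7, so power-on hours / 24 approximates elapsed days.

def tb_written(dwpd: float, capacity_tb: float, power_on_hours: float) -> float:
    """Total terabytes written at an average DWPD over the given runtime."""
    days = power_on_hours / 24.0
    return dwpd * capacity_tb * days


def fraction_of_rating(observed_dwpd: float, rated_dwpd: float) -> float:
    """Observed write intensity as a fraction of the rated DWPD."""
    return observed_dwpd / rated_dwpd


if __name__ == "__main__":
    # Figures from the comment: 2TB drives, ~0.08 DWPD, 5500-7500 hours,
    # rated at 0.3 DWPD.
    for hours in (5500, 7500):
        print(f"{hours} h: ~{tb_written(0.08, 2.0, hours):.0f} TB per drive")
    print(f"load: {fraction_of_rating(0.08, 0.3):.0%} of the rated DWPD")
```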
       | 
       | Anyway, it gave me a talking point going forward to really,
       | really press management on "it's worth buying drives from 3-4
       | different brands, and maybe splurging on higher-rated drives
       | instead of consumer ones too". But it also made me wonder
       | whether something is going on (or was, pandemic-related?) at
       | Samsung's storage division. It's definitely pure anecdote, but
       | still: I spread those drive purchases out reasonably hard, and
       | they had radically different serial numbers. Same with other
       | folks I know at other businesses using various Samsung drives:
       | everyone has gone to real effort, following decent practices to
       | avoid buying all their drives from a single lot. Even a 10%
       | failure rate for consumer drives I could have understood, but
       | 54%? And not a bathtub curve front-loaded into the first month
       | or two, but failures after 7-11 months? That seems high.
       | Samsung did replace them all, no questions asked, and paid for
       | shipping too. I don't have any global insight into how this all
       | looks, and it could just be plain bad luck for all of us in the
       | region, but still.
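The "plain bad luck" question can be checked with a simple binomial model. This is a minimal sketch, assuming each drive fails independently with the same probability. The 10% rate is the commenter's "I could have seen" figure, not a measured one, and `prob_at_least` is a hypothetical helper. Under that model, 13 or more failures out of 24 comes out below one in a million. So either the per-drive rate was far higher or the failures were not independent, for example because of a shared firmware or component issue.

```python
# Sketch: probability of at least k failures among n drives, each failing
# independently with probability p (binomial tail).
# ASSUMPTION: independence; shared firmware/batches would correlate failures.
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    # 13 of 24 drives failing, if the true per-drive failure rate were 10%
    print(f"P(>=13 of 24 | p=0.10) = {prob_at_least(13, 24, 0.10):.2e}")
```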
        
       | Dalewyn wrote:
       | I wasn't aware the 980 Pro 2TB was also affected; I have four of
       | those in a new machine I put together last year.
       | 
       | Time to install some bloatware and see about updating their
       | firmware, I guess...
        
       | gjsman-1000 wrote:
       | There is always the possibility that their S.M.A.R.T.
       | implementation is borked...
        
         | TillE wrote:
         | The article does say they've seen "abnormally high failure
         | rates in the field", so it's not just that.
        
         | jpk wrote:
         | If that's all it was, then it's likely a firmware update would
         | not only prevent the issue, but also reverse it if the storage
         | is actually healthy. That doesn't seem to be the case here,
         | though.
        
       ___________________________________________________________________
       (page generated 2023-02-03 23:00 UTC)