[HN Gopher] Update on Samsung SSD Reliability
___________________________________________________________________
Update on Samsung SSD Reliability
Author : Akharin
Score : 95 points
Date : 2023-02-03 21:44 UTC (1 hours ago)
(HTM) web link (www.pugetsystems.com)
(TXT) w3m dump (www.pugetsystems.com)
| jeffbee wrote:
| Would like to know more. Were failures in the field from wear-out
| or sudden death? Are the health indicators losing 1% per week
| consistent with the datasheet TBW, or worse?
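|
| One rough way to frame this question, as a sketch only: if the
| health indicator falls at a constant rate and tracks the rated TBW
| linearly (both assumptions), the implied write volume can be
| compared against actual usage. The 1%-per-week figure comes from
| the comment above; the 1200 TBW rating and the function names are
| illustrative assumptions, not values taken from the article.

```python
def weeks_until_worn_out(percent_per_week: float) -> float:
    """Weeks for a health indicator to fall from 100% to 0% at a constant rate."""
    if percent_per_week <= 0:
        raise ValueError("percent_per_week must be positive")
    return 100.0 / percent_per_week


def implied_tb_per_week(rated_tbw: float, percent_per_week: float) -> float:
    """TB written per week that would account for the observed health loss,
    assuming health tracks the rated TBW linearly (an assumption)."""
    return rated_tbw * percent_per_week / 100.0


if __name__ == "__main__":
    weeks = weeks_until_worn_out(1.0)
    print(f"worn out after {weeks:.0f} weeks (~{weeks / 52:.1f} years)")
    # 1200 TBW is an assumed datasheet rating, used here only for illustration.
    print(f"implied writes: {implied_tb_per_week(1200, 1.0):.0f} TB/week")
```

| If the real workload writes far less per week than the implied
| figure, the indicator is dropping faster than the datasheet would
| predict.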
| adrenvi wrote:
| Samsung 870 EVO drives were also known to fail early, including my
| 2TB model.
|
| https://www.techpowerup.com/forums/threads/samsung-870-evo-b...
| acabal wrote:
| Yes, this bit me just last month. Around October 2022 I
| purchased 3 Samsung 870 EVO 2TBs for use in a RAID array. _By
| January 2023, all three of them failed within a week of each
| other!_
|
| Fortunately they failed one by one, so I was barely able to
| recover my RAID array by pulling out one drive at a time,
| powering the computer off, and waiting for the RMA replacement
| to arrive.
|
| But imagine my shock to see one drive fail... only to replace
| it with an RMA... and then days later, seeing the next drive
| fail... and the next!
| bombcar wrote:
| A perfect example why RAID ain't backup
| ssdpain wrote:
| I installed 2 x 980 Pro 2TB in a laptop in Nov 2022. Running a
| daily Robocopy bat script to back up a folder in C: to D: would
| freeze a couple of times a week and lock the D: drive. After
| reboot, a drive check would find no errors and everything would
| work as normal. I've used the same script for years with no
| issues.
|
| Since the firmware update last week Robocopy has not frozen the
| drive at all this week.
| issafram wrote:
| Could you provide that batch script please? Like in a GitHub
| Gist or something similar.
| ssdpain wrote:
| Sure...
|
| @echo off
|
| pause
|
| robocopy "C:\Users\o\Desktop\2023" "D:\2023" /e /mir /np /v
| /tee /r:0 /w:0 /log+:"C:\Users\o\Desktop\log_robocopy.txt"
|
| pause
|
| @echo on
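|
| The script above runs with /r:0 /w:0 (no retries) and never checks
| the result, so a failed run only shows up in the log. A minimal
| sketch of decoding Robocopy's bit-flag exit code, for anyone
| wrapping the same command in a scheduler; the function names are
| hypothetical and this is not part of the original script.

```python
from __future__ import annotations

# Robocopy's exit code is a bitmask; 0 means nothing was copied and nothing failed.
ROBOCOPY_FLAGS = {
    1: "files copied",
    2: "extra files or directories detected",
    4: "mismatched files or directories detected",
    8: "some files or directories could not be copied",
    16: "serious error, nothing copied",
}


def decode_robocopy_exit(code: int) -> list[str]:
    """Return a human-readable description of each bit set in the exit code."""
    if code == 0:
        return ["no changes, no errors"]
    return [text for bit, text in ROBOCOPY_FLAGS.items() if code & bit]


def is_failure(code: int) -> bool:
    """Codes of 8 or higher mean at least one copy operation failed."""
    return code >= 8


if __name__ == "__main__":
    for code in (0, 1, 3, 9, 16):
        print(code, is_failure(code), decode_robocopy_exit(code))
```

| A hard freeze of the drive would more likely hang the process than
| return a code, so a timeout around the call would still be needed
| to catch the behaviour described here.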
| [deleted]
| dmitrygr wrote:
| > the firmware update last week
|
| Link to specific firmware version please?
| ssdpain wrote:
| The new firmware is version 5B2QGXA7, updated via Samsung Magician
| on Windows. I didn't make a note of earlier firmware versions.
| It's still too soon to know if the SSD freeze will recur.
| smiley1437 wrote:
| I've been trying to find a decent-endurance NVMe drive in the M.2
| form factor for write-heavy applications, and it appears that true
| 2-bit MLC has all but disappeared, replaced by 3-bit TLC and
| higher (with a commensurate loss of endurance).
|
| The high-endurance SSDs appear to be available only in
| U.2/U.3/HHHL and god-help-me EDSFF form factors.
|
| Any suggestions? Micron's 7450 isn't readily available.
| mnadkvlb wrote:
| I recommend the Samsung PM9A3 drives. They're not as popular, but
| they are enterprise products, and I believe the endurance is about
| 3 times that of the 980 Pro (please check, I'm not 100% sure).
|
| I've been using them in my Threadripper workstation with a lot of
| VMs that are put to sleep every day, with around 0.25 TB written
| and read each time the VMs are started. Keep in mind these are
| 22110 form factor.
| flyinglizard wrote:
| Had issues with a 2TB 980 Pro. Things have stabilized with recent
| updates.
| pifm_guy wrote:
| I really want to see ssd manufacturers offer a decent warranty...
|
| This drive costs $100, and will last 10 years or until 100TB has
| been written to it, as long as you keep it within the specified
| temperature/humidity/power conditions.
|
| If it fails to do that, we will return $1000 to you.
| [deleted]
| jeffbee wrote:
| I am not sure why you want a 10x refund, but it seems like your
| request is _easily_ met by current warranties. A 1TB WD SN850X
| advertises 1200TBW endurance, rather more than you require.
| joenathanone wrote:
| Lifetime warranties used to be commonplace, I wish we could
| return to those times, or at least to a time of repairability.
| walterbell wrote:
| In theory, a 3rd party insurance equivalent to AppleCare could
| be constructed for some technology products, but this is
| hampered by short product lifecycles, lack of BOM transparency
| (e.g components changed within a single product generation) and
| ability of firmware updates to change product behavior and
| invalidate previously collected data on reliability.
|
| Open-source SSD firmware would provide more transparency on
| performance and reliability.
| CharlesW wrote:
| > _Open-source SSD firmware would provide more transparency
| on performance and reliability._
|
| This seems fantastic. Are you saying you could review the
| firmware source and know that the 980 Pro would lose ~1% of
| its endurance per week?
| TacticalCoder wrote:
| Back when HDDs failed a lot, warranties actually worked. I'd
| happily fill an online form, Web 1.0 style, and then send my
| Seagate (I'm in Europe, was sending them to the Netherlands
| IIRC) disks and a few weeks later I'd receive a new drive.
|
| I probably still have a few screenshots of these forms
| somewhere.
| mrtksn wrote:
| This sounds like an SLA; it's very unlikely you'll get
| that for 100 bucks. Even if this manufacturer somehow perfected
| their process and had zero defects, they would still be taking on
| a 10-year liability for 100 dollars of revenue.
| Aardwolf wrote:
| I have this exact model, 980 Pro 2TB.
|
| It says to update firmware, but how can you do that from Linux?
| The instructions are all about some Windows program. Thanks!
| pentamassiv wrote:
| A few months ago I already updated my Samsung SSD by following
| this procedure: https://askubuntu.com/a/1386451. Theoretically
| they provide an image to boot from to do the update, but the
| image seems very outdated and did not recognize my keyboard so
| it was unusable.
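|
| Before or after following a procedure like the one linked above,
| it can help to confirm which firmware revision each drive reports.
| A minimal sketch, assuming a Linux kernel that exposes the `model`
| and `firmware_rev` attributes under /sys/class/nvme; it only reads
| values and does not flash anything. The target string is the
| version ssdpain reports elsewhere in this thread, and the function
| name is hypothetical.

```python
from __future__ import annotations

from pathlib import Path

# Firmware version quoted by another commenter in this thread (not verified here).
TARGET_FIRMWARE = "5B2QGXA7"


def read_nvme_firmware(sysfs_root: str = "/sys/class/nvme") -> dict[str, tuple[str, str]]:
    """Map each NVMe controller name to its (model, firmware revision)."""
    result: dict[str, tuple[str, str]] = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return result  # not Linux, or no NVMe controllers present
    for ctrl in sorted(root.iterdir()):
        model = ctrl / "model"
        firmware = ctrl / "firmware_rev"
        if model.is_file() and firmware.is_file():
            result[ctrl.name] = (model.read_text().strip(), firmware.read_text().strip())
    return result


if __name__ == "__main__":
    for name, (model, firmware) in read_nvme_firmware().items():
        status = "matches" if firmware == TARGET_FIRMWARE else "differs from"
        print(f"{name}: {model} firmware {firmware} ({status} {TARGET_FIRMWARE})")
```

| The comparison is a plain equality check; firmware strings are not
| guaranteed to sort in release order.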
| TacticalCoder wrote:
| From TFA it's not just the 980 Pro 2 TB but also all the newer
| 990, so it's problematic.
| acidburnNSA wrote:
| Following from my Linux desktop. I lost a system SSD that was a
| 980 2 TB and recently reinstalled everything, thinking it was a
| fluke. Now worried it will happen again rapidly.
| xoa wrote:
| Ars had a piece covering this as well, and I do wonder if there
| is something going on somewhere else in the Samsung stack, not
| just the NVMe 900 series line. Pure anecdote, but two years ago I
| did a NAS for a client using 24x 2TB Samsung 870 Evo drives
| (they'd gotten some incentive deal for it). While it was all one
| type vs mixed, there was the "luxury" of time because at that
| point getting the system they wanted together had a significant
| lead time. So I did ensure that the drives were purchased over
| the course of around 7 months, from multiple different reputable
| sellers (B&H, CDW, Provantage etc) in separate batches. System
| was solid, an Epyc 2 based SuperMicro server, running TrueNAS.
|
| And then last year with around 5500-7500 quite light hours of
| runtime (primarily reads, ~0.08 DWPD, well under official rating
| of 0.3 DWPD), drives started failing. These were definitely real
| failures; the first indication came from regular automated ZFS
| scrubs reporting increasing checksum errors and ATA errors. So many
| drives were affected, and I'd always considered Samsung SSDs
| relatively reliable (even the consumer ones), that at first I
| thought it was a SATA controller failure. Our rep agreed and took
| the server back under warranty. They were great (gold-plated
| support contracts pay off once in a while), and after a motherboard
| replacement and thorough testing it was back in service. More drive
| problems. SMART short tests said everything was healthy, and the
| first long tests did too. But then drives exceeded error limits and
| started getting faulted, and at last SMART long tests started
| failing. Digging in showed worrisome stats. So I began swapping out
| and warrantying drives (cheers to TrueNAS for surviving the stress
| test: ultimately zero downtime and no need to restore from
| backups). In the end, _THIRTEEN
| (13) out of 24 failed_. Brutal >50% dead drive rate. I talked to
| some others around and they'd seen <1 year rates also at 30-60%.
| Big :\\. Rep also indicated they were hearing more about Samsung
| failures.
|
| Anyway, gave me a talking point going forward to really, really
| press management on "it's worth paying for drives from 3-4x
| brands and maybe splurging for higher rated vs consumer too", but
| it also made me wonder if there is something going on, or was
| (pandemic related?), at Samsung's storage division. It's
| definitely pure anecdote but still, I spread those drive
| purchases out reasonably hard, and they had radically different
| serial numbers. Same with other folks I know at other businesses
| using various Samsung drives, everyone has been going to real
| effort following decent practices to prevent buying drives all
| from a single lot. Even 10% failure rate for consumer drives I
| could have seen, but 54%? And not a bathtub curve all frontloaded
| in the first month or two but after 7-11 months? That feels high?
| Samsung did replace them all no questions asked, they paid for
| shipping too. I don't have any global insight into how this all
| looks and it could be just plain bad luck for all of us in the
| region, but still.
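|
| To put the figures in this comment side by side, here is a rough
| sketch of the arithmetic: drive writes per day (DWPD) times
| capacity times days powered on gives total data written. The
| inputs are the numbers quoted above (2 TB drives, ~0.08 DWPD,
| 5500-7500 hours, 13 of 24 failed); the function names are
| hypothetical, and it assumes writes were spread evenly over the
| powered-on time.

```python
def tb_written(capacity_tb: float, dwpd: float, power_on_hours: float) -> float:
    """Total TB written, assuming a constant DWPD over the powered-on time."""
    return capacity_tb * dwpd * power_on_hours / 24.0


def failure_rate(failed: int, total: int) -> float:
    """Fraction of drives that failed."""
    if total <= 0:
        raise ValueError("total must be positive")
    return failed / total


if __name__ == "__main__":
    low = tb_written(2, 0.08, 5500)
    high = tb_written(2, 0.08, 7500)
    print(f"estimated writes per drive: {low:.1f} to {high:.1f} TB")
    print(f"failure rate: {failure_rate(13, 24):.0%}")
```

| Under those assumptions each drive had absorbed only a few tens of
| terabytes when it failed, which supports the point that wear from
| writes was not the obvious cause.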
| Dalewyn wrote:
| I wasn't aware the 980 Pro 2TB was also affected; I have four of
| those in a new machine I put together last year.
|
| Time to install some bloatware and see about updating their
| firmware, I guess...
| gjsman-1000 wrote:
| There is always the possibility that their S.M.A.R.T.
| implementation is borked...
| TillE wrote:
| The article does say they've seen "abnormally high failure
| rates in the field", so it's not just that.
| jpk wrote:
| If that's all it was, then it's likely a firmware update would
| not only prevent the issue, but also reverse it if the storage
| is actually healthy. That doesn't seem to be the case here,
| though.
___________________________________________________________________
(page generated 2023-02-03 23:00 UTC)