[HN Gopher] Backblaze Drive Stats for Q1 2024
       ___________________________________________________________________
        
       Backblaze Drive Stats for Q1 2024
        
       Author : TangerineDream
       Score  : 211 points
       Date   : 2024-05-02 13:25 UTC (9 hours ago)
        
 (HTM) web link (www.backblaze.com)
 (TXT) w3m dump (www.backblaze.com)
        
       | GGO wrote:
       | I buy hard drives based on these reports. Thank you Backblaze.
        
         | Scene_Cast2 wrote:
         | Where do you buy your drives? Last time I was in the market, I
         | couldn't find a reputable seller selling the exact models in
         | the report. I'm afraid that the less reputable sellers (random
         | 3rd party sellers on Amazon) are selling refurbished drives.
         | 
          | I ended up buying a similar-sounding but not identical
          | model from CDW.
        
           | cm2187 wrote:
            | In Europe, LambdaTek is my go-to for enterprise hardware
            | as a retail customer.
        
           | secabeen wrote:
           | These are useful data points, but I've found that at my risk
           | tolerance level, I get a lot more TB/$ buying refurbished
           | drives. Amazon has a couple of sellers that specialize in
            | server pulls from datacenters; even after 3 years of minimal
           | use, the vendors provide 5 years of additional warranty to
           | you.
        
             | pronoiac wrote:
             | Buying refurbished also makes it much easier to avoid
             | having the same brand/model/batch/uptime, for firmware and
             | hardware issues. I do carefully test for bad sectors and
             | verify capacity, just in case.
        
             | WarOnPrivacy wrote:
             | > even after 3 years of minimal use, the vendors provide 5
             | years of additional warranty to you.
             | 
             | The Amazon refurb drives (in this class) typically come
             | with 40k-43k hours of data center use. Generally they're
              | well used for 4.5-5 yrs. Price is ~30% of new.
             | 
             | I think refurb DC drives have their place (replaceable
             | data). I've bought them - but I followed other buyers'
             | steps to maximize my odds.
             | 
              | I chose my model (of HGST) carefully, put it through an
              | intensive 24h test, and checked SMART stats afterward.
             | 
             | As far as the 5yr warranty goes, it's from the seller and
             | they don't all stick around for 5 years. But they are
             | around for a while -> heavy test that drive after purchase.
        
             | dehrmann wrote:
             | I think you're better off buying used and using the savings
             | for either mirroring or off-site backup. I'd take two
             | mirrored used drives from different vendors over one new
             | drive any day.
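The rough arithmetic behind this tradeoff can be sketched. The failure rates below are hypothetical placeholders (not from the report), and the model naively assumes independent failures over one year, ignoring rebuild windows and correlated batch failures:

```python
# Hypothetical annual failure rates -- placeholders, not Backblaze figures
afr_new = 0.01   # one new drive
afr_used = 0.05  # each of two used drives from different vendors

# Crude model: independent failures within one year; losing data means
# the single drive fails, or both halves of the mirror fail.
p_loss_single = afr_new              # the lone new drive fails
p_loss_mirror = afr_used * afr_used  # both used drives fail the same year

print(p_loss_single, p_loss_mirror)
```

Even with used drives five times less reliable per unit, the mirror's data-loss odds come out roughly four times lower in this toy model.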
        
               | ethbr1 wrote:
               | There was a Backblaze report a while ago that said,
               | essentially, that most individual drives are either
               | immediate lemons or run to warranty.
               | 
               | If you buy used, you're avoiding the first form of
               | failure.
        
             | malfist wrote:
             | A lot of those resellers do not disclose that the drive
             | isn't new, even labeling the item as new.
             | 
             | GoHardDrive is notorious for selling "new" drives with
             | years of power on time. Neither Newegg nor Amazon seem to
             | do anything about those sellers
        
             | 2OEH8eoCRo0 wrote:
              | Indeed, RAID used to stand for Redundant Array of
              | _Inexpensive_ Disks. The point was to throw a bunch of
              | disks together and with redundancy it didn't matter how
              | unreliable they were. Using blingy drives w/ RAID feels
              | counter-intuitive, at least as a hobbyist.
        
           | havaloc wrote:
           | B&H has quite a few
        
             | bee_rider wrote:
             | I guess it isn't that surprising given the path the
             | development took, but it is always funny to me that one of
             | the most reputable consumer tech companies is a photography
             | place.
        
               | fallingsquirrel wrote:
               | Similar to how the most popular online retailer is a
                | bookstore. Successful businesses are able to expand,
                | and I wish B&H the best of luck on that path; we need
                | more companies like them.
        
               | squigz wrote:
               | I'd rather companies stick to one thing and do it well,
               | rather than expand into every industry out there and
               | slowly creep into every facet of society.
               | 
               | Like that bookstore that just happens to retail some
               | stuff too.
        
               | ssl-3 wrote:
               | B&H seems to be pretty focused on techy things (and
                | cameras of all sorts have always been techy things,
                | though that corner of the tech market has been
                | declining for a long time now).
               | 
               | When they branch out to selling everything including
               | fresh vegetables, motor oil, and computing services, then
               | maybe they might be more comparable to the overgrown
               | bookstore.
        
               | Modified3019 wrote:
                | I definitely lean towards B&H for electronic things.
                | It's quite a bit less "internet flea market" than
                | Amazon often is.
        
               | philistine wrote:
                | _B&H alone is a Fortune 500 company_
        
               | ghaff wrote:
                | There used to be a much more distinct market for
                | cameras and all the ancillary gear and consumables
                | than there is now.
               | Though B&H still sells a ton of lighting and audio gear
               | as well as printers and consumables for same.
               | 
               | They sell other stuff too but they're still pretty photo
               | and video-centric, laptops notwithstanding.
        
             | Wistar wrote:
             | I buy most, but not all, of my tech at B&H and have now for
             | more than a decade. Especially peripherals.
        
           | SoftTalker wrote:
           | And for stuff like this, many companies will have an approved
           | vendor, and you have to buy what they offer or go through a
           | justification for an exception.
        
           | nikisweeting wrote:
           | Lots of good options here: https://diskprices.com/
        
             | dsr_ wrote:
             | Note that they list at least one vendor as selling "New"
             | drives when they are not even close to being new.
        
           | user_7832 wrote:
           | What's the risk of buying Amazon & running a
           | SMART/crystaldisk test?
        
         | speedgoose wrote:
         | I don't buy hard drives based on these reports. I buy SSDs and
         | let my cloud providers deal with hard drives.
        
       | bluedino wrote:
       | > The 4TB Toshiba (model: MD04ABA400V) are not in the Q1 2024
       | Drive Stats tables. This was not an oversight. The last of these
       | drives became a migration target early in Q1 and their data was
       | securely transferred to pristine 16TB Toshiba drives.
       | 
       | That's a milestone. Imagine the racks that were eliminated
        
         | bombcar wrote:
         | > That's a milestone. Imagine the racks that were eliminated
         | 
         | I'm imagining about 3/4ths ;)
        
           | djbusby wrote:
           | I'm imagining 4x capacity
        
           | seabrookmx wrote:
           | 3/4ths of the racks that had 4TB drives, assuming they didn't
           | also expand capacity as part of this.
           | 
           | But they run many drive types.
        
         | toomuchtodo wrote:
         | Perhaps not eliminated, but repurposed with fresh 16TB drives.
         | And the power savings per byte stored!
        
         | Dylan16807 wrote:
         | Yeah, but just thinking about it reminds me how annoyed I am
         | that they increased the B2 pricing by 20% last year.
         | 
         | Right after launching B2, in late 2015, they made their post
         | about storage pod 5.0, saying it "enabled" B2 at the $5/TB
         | price, at 44 cents per gigabyte and a raw 45TB per rack unit.
         | 
         | In late 2022 they posted about supermicro servers costing 20
         | cents per gigabyte and fitting a raw 240TB per rack unit.
         | 
         | So as they migrate or get new data, that's 1/5 as many servers
         | to manage, costing about half as much per TB.
         | 
         | It's hard to figure out how the profit margin wasn't _much_
          | better, despite the various price increases they surely had to
         | deal with.
         | 
         | The free egress based on data stored was nice, but the change
         | still stings.
         | 
         | Maybe I'm overlooking something but I'm not sure what it would
         | be.
         | 
         | In contrast the price increases they've had for their unlimited
         | backup product have always felt fine to me. Personal data keeps
         | growing, and hard drive prices haven't been dropping fast. Easy
         | enough. But B2 has always been per byte.
         | 
         | And don't think I'm being unfair and only blaming them because
         | they release a lot of information. I saw hard drives go from
         | 4TB to 16TB myself, and I would have done a similar analysis
         | even if they were secretive.
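Plugging in the comment's own figures (units as quoted above: cents per gigabyte and raw TB per rack unit) makes the comparison concrete:

```python
# Figures as quoted in the comment above (cents per GB, raw TB per rack unit)
cost_2015, density_2015 = 44, 45
cost_2022, density_2022 = 20, 240

rack_units_ratio = density_2015 / density_2022  # fraction of rack units needed
cost_ratio = cost_2022 / cost_2015              # hardware cost per TB ratio

print(rack_units_ratio, cost_ratio)
```

That works out to roughly 1/5 the rack units and a bit under half the hardware cost per TB, matching the comment's summary.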
        
           | philistine wrote:
           | Inflation. At the rate it went up the last couple of years, a
           | 20% price increase to put them back on the right side of
           | profits is more than probable.
        
             | Moru wrote:
             | Also a storage inflation on the users side. People have
             | more data on bigger drives that wants a backup.
        
               | Dylan16807 wrote:
               | This is B2, the service that charges per byte. More data
               | makes it _easier_ for them to profit.
        
             | Dylan16807 wrote:
             | Maybe I wasn't clear, but the hardware costs and the
             | operation costs should all have dropped between 2x and 5x
             | as a baseline before price increases.
             | 
             | Inflation is not even close to that level.
             | 
             | And those hardware costs already take into account
             | inflation up through the end of 2022.
        
               | justsomehnguy wrote:
               | > e hardware costs and the operation costs should all
               | have dropped between 2x and 5x
               | 
               | That would work if they fully recouped the costs of
               | obtaining _and running_ the drives, including racks,
               | PSUs, cases, _drive and PSU replacements_ , control
               | boards, datacenter/whatever costs, electricity, HVAC etc.
               | _and_ generated a solid profit not only to buy all the
               | new hardware but a new yacht for the owners too.
               | 
                | But usually that is not how it works, because nobody
                | sane buys the hardware with cash. And even if they
                | have new fancy 240TB/rack units, that doesn't mean they
                | just migrated outright and threw out the old ones ASAP.
               | 
                | So while there is a 5x lower cost per U for the new rack
                | unit, it doesn't translate to 5x lower cost of storage
                | _for the seller_.
        
       | rokkamokka wrote:
       | I always click these every time they come up. Can't tell you how
       | much I appreciate them releasing stats like this!
        
       | sdwr wrote:
       | Says the annual failure rate is 1.5%, but average time to failure
       | is 2.5 years? Those numbers don't line up.
       | 
       | Are most drives retired without failing?
        
         | bombcar wrote:
         | Obviously yes. At an AFR of 1.5% they'd have to have the drives
         | run for (about) 67 years to have them all retire from failure.
         | 
         | (in reality they'd probably have failure rates spike at some
         | point, but the idea stands. And they explicitly said they
         | retired a bunch of 4TBs)
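A quick sanity check of those figures, assuming a constant 1.5% annual failure rate (real drives follow more of a bathtub curve, as the parenthetical above notes):

```python
afr = 0.015  # constant 1.5% annual failure rate (ignores the bathtub curve)

# If failure were the only way drives left the fleet, mean lifetime
# would be 1/afr years -- roughly 67 years, absurdly long.
mean_lifetime_years = 1 / afr

# Fraction of drives still alive after 2.5 years at this rate:
survival_2_5_years = (1 - afr) ** 2.5  # ~96%, so most never fail in service

print(mean_lifetime_years, survival_2_5_years)
```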
        
         | scottlamb wrote:
         | > Are most drives retired without failing?
         | 
         | I'd expect so given that HDDs are still having significant
         | density advancements. After a while old drives aren't worth the
         | power and sled/rack space that could be used for a higher
         | capacity drive. And, yeah, it makes these statistics make more
         | sense together.
         | 
         | Edit: plus they are just increasing drive count so most drives
         | haven't hit the time when they would fail or be retired...
        
         | rovr138 wrote:
         | They just retired 4TB ones.
         | 
         | While they seem to get retired, it's not as quick as we'd
         | think.
        
         | codemac wrote:
          | Drives have warranties, after which the manufacturer doesn't
          | make any claims about their durability. This could put your
          | fleet at wild and significant risk if things start hitting a
          | wall and failing en masse. You may not be able to repair your
          | way out if, as you rebuild, you're copying data to yet
          | another dying drive.
          | 
          | So you usually have lifetime drive throughput and start/stop
          | thresholds you want to stay under, and depending on how
          | accurate your data is for each drive you may push beyond the
          | drive warranties. But you will generally stop before the
          | drive actually fails.
        
         | rsync wrote:
         | "Are most drives retired without failing?"
         | 
         | Yes, certainly.
         | 
         | One can watch both SMART indicators as well as certain ZFS
         | stats and catch a problem drive before it actually fails.
         | 
         | I like to remove drives from zpools early because there is a
         | common intermediate state they can fall into where they have
         | not failed out but dramatically impact ZFS performance as they
         | timeout/retry certain operations _thousands and thousands of
         | times_.
        
           | favorited wrote:
           | What's the best way to monitor those ZFS stats? I just rely
           | on scheduled ZFS scrubs, and the occasional `zpool status
           | -v`...
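One low-tech approach is to scrape the per-device error counters out of `zpool status` and alert on anything nonzero. A sketch, assuming the usual five-column device lines (NAME STATE READ WRITE CKSUM) with plain integer counters; note zpool abbreviates large counts (e.g. `1.2K`), which this simple version would skip:

```python
def nonzero_error_devices(zpool_status_output):
    """Return (device, read, write, cksum) tuples for devices whose
    zpool error counters are nonzero. Assumes the typical five-column
    device lines of `zpool status`: NAME STATE READ WRITE CKSUM."""
    flagged = []
    for line in zpool_status_output.splitlines():
        fields = line.split()
        # Device lines have exactly five fields, the last three numeric;
        # headers and pool-summary lines fail this test and are skipped.
        if len(fields) == 5 and all(f.isdigit() for f in fields[2:]):
            read, write, cksum = map(int, fields[2:])
            if read or write or cksum:
                flagged.append((fields[0], read, write, cksum))
    return flagged
```

Run it from cron against `zpool status` output alongside the scheduled scrubs; a slowly climbing READ or CKSUM count is exactly the "not failed yet, but degrading" state described above.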
        
         | mnw21cam wrote:
         | Yes, and because of that the numbers on the average time to
          | failure are completely meaningless. The drives that don't ever
         | fail skew the numbers completely. If a fantastically reliable
         | drive were to have 5/5000 drives fail, but they all failed in
         | the first month and then the rest carried on forever, then that
         | would show here as having a lower "reliability" than a dire
         | drive where 4000/5000 drives fail after a year.
         | 
         | I'd like to see instead something like mean time until 2% of
         | the drives fail. That'd actually be comparable between drives.
         | And yes, it would also mean that some drive types haven't
         | reached 2% failure yet, so they'd be shown as ">X months".
         | 
         | This is what a Kaplan-Meier survival curve was meant for [0].
         | Please use it.
         | 
         | Also, it'd be great to see the confidence intervals on the
         | annualised failure rates.
         | 
         | [0]
         | https://en.wikipedia.org/wiki/Kaplan%E2%80%93Meier_estimator
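For readers who haven't met it, here is a minimal sketch of the Kaplan-Meier estimator applied to the hypothetical reliable drive above (an illustration only, not how Backblaze computes its tables):

```python
def kaplan_meier(observations):
    """Kaplan-Meier survival curve.
    observations: list of (time, failed) pairs; failed=False means the
    drive was retired (censored) at that time without failing."""
    failure_times = sorted({t for t, failed in observations if failed})
    survival, curve = 1.0, []
    for t in failure_times:
        # Drives still under observation at time t...
        at_risk = sum(1 for ot, _ in observations if ot >= t)
        # ...and the failures recorded at exactly time t.
        deaths = sum(1 for ot, f in observations if ot == t and f)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve

# The "fantastically reliable" drive above: 5 of 5000 fail in month 1,
# the remaining 4995 retired (censored) at month 36 while still healthy.
curve = kaplan_meier([(1, True)] * 5 + [(36, False)] * 4995)
print(curve)  # one step down at month 1, to 99.9% survival
```

The curve correctly credits the 4995 survivors instead of letting five early lemons dominate an "average time to failure" figure.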
        
       | kayson wrote:
       | Does Backblaze ever buy refurbs? I'm guessing not, but I'd be
       | curious to see any data on how failure rates compare after
       | manufacturers recertify.
        
         | jjeaff wrote:
          | I can't think of any reason why the lifetime would be any
          | different for a refurb. Of course, you need to start counting
          | from when the drive was originally used, and there is probably
          | also some additional wear and tear just due to the removal,
          | handling, and additional shipping of the drives.
        
         | from-nibly wrote:
         | In some ways that would be incredibly noisy to test. However it
         | could be a good way to measure the practicality of S.M.A.R.T
         | metrics. Finding out how accurate they are at predicting hdd
         | lifespan would be a great finding.
        
           | SoftTalker wrote:
           | Does anyone find value in SMART metrics?
           | 
           | In my experience, the drives report "healthy" until they
           | fail, then they report "failed"
           | 
           | I've personally never tracked the detailed metrics to see if
           | anything is predictive of impending failure, but I've never
           | seen the overall status be anything but "healthy" unless the
           | drive had already failed.
        
             | Sohcahtoa82 wrote:
             | The SMART metrics aren't binary, and any application that
             | is presenting them as binary (Either HEALTHY or FAILED) is
             | doing you a disservice.
             | 
             | > I've personally never tracked the detailed metrics to see
             | if anything is predictive of impending failure
             | 
             | Backblaze has!
             | 
             | https://www.backblaze.com/blog/hard-drive-smart-stats/
        
               | SoftTalker wrote:
               | From that link:
               | 
                | From experience, we have found the following five SMART
                | metrics indicate impending disk drive failure:
                | 
                | SMART 5: Reallocated_Sector_Count
                | SMART 187: Reported_Uncorrectable_Errors
                | SMART 188: Command_Timeout
                | SMART 197: Current_Pending_Sector_Count
                | SMART 198: Offline_Uncorrectable
               | 
               | That's good to know, I might start tracking that. I
               | manage several clusters of servers and hard drive
               | failures just seem pretty random.
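A minimal way to start tracking those five attributes, sketched as a scraper over `smartctl -A` text output (it assumes the standard ATA attribute-table layout with RAW_VALUE in the last column, and plain-integer raw values, which some vendors don't guarantee):

```python
WATCHED = {5, 187, 188, 197, 198}  # the five attributes Backblaze calls out

def watched_raw_values(smartctl_a_output):
    """Extract RAW_VALUE (last column) for the watched attribute IDs
    from the text output of `smartctl -A /dev/sdX`."""
    values = {}
    for line in smartctl_a_output.splitlines():
        fields = line.split()
        # Attribute rows start with the numeric attribute ID.
        if fields and fields[0].isdigit() and int(fields[0]) in WATCHED:
            if fields[-1].isdigit():  # skip vendor-specific raw formats
                values[int(fields[0])] = int(fields[-1])
    return values
```

Run it periodically and store the values; as the comments below note, the useful signal is a counter like Reallocated_Sector_Count climbing over time, not any single snapshot.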
        
               | vel0city wrote:
               | I've had several hard drives that started gradually
               | increasing a reallocated sector count, then start getting
               | reported uncorrectable errors, then eventually just give
               | up the ghost. Usually whenever reallocated sectors starts
               | climbing a drive is nearing death and should be replaced
               | as soon as possible. You might not have had corruption
                | _yet_, but it's coming. Once you get UREs, you've lost
               | some data.
               | 
               | However, one time a drive got a burst of reallocated
               | sectors, it stabilized, then didn't have any problems for
               | a long time. Eventually it wouldn't power on years later.
        
             | favorited wrote:
             | I've had an M.2 NVMe drive start reporting bad blocks via
             | SMART. I kept using it for non-critical storage, but
             | replaced it as my boot drive. Obviously not the same
             | failure pattern as spinning rust, but I was glad for the
             | early warning anyway.
        
             | Sakos wrote:
             | Absolutely. I've looked at the SMART data of easily over
             | 1000 drives. Many of them ok, many of them with
             | questionable health, many failing and many failed. The
             | SMART data has always been a valuable indicator as to
             | what's going on. You need to look at the actual values
             | given by tools like smartctl or CrystalDiskInfo. Everything
             | you need to evaluate the state of your drives is there.
             | 
             | I've never seen an HDD fail overnight without any
             | indication at all.
        
       | jpgvm wrote:
        | Amazing these have continued. I base my NAS purchase decisions on
        | these and so far they haven't led me astray.
        
         | objektif wrote:
         | Which specific ones do you like so far?
        
         | Marsymars wrote:
          | How _would_ they lead you astray? I wouldn't consider a drive
         | failure in a home NAS to indicate that - even their most
         | statistically reliable drives still require redundancy/backup -
         | if you haven't experienced a drive failure yet, that's just
         | chance.
        
       | MarkG509 wrote:
       | I, too, love Backblaze's reports. But they provide no information
       | regarding drive endurance. While I became aware of this with
       | SSDs, HDD manufacturers are reporting this too, usually as a
       | warranty item, and with surprisingly lower numbers than I would
       | have expected.
       | 
       | For example, in the Pro-sumer space, both WD's Red Pro and Gold
       | HDDs report[1] their endurance limit as 550TB/year total bytes
       | "transferred* to or from the drive hard drive", regardless of
       | drive size.
       | 
       | [1] See Specifications, and especially their footnote 1 at the
       | bottom of the page:
       | https://www.westerndigital.com/products/internal-drives/wd-r...
        
         | wtallis wrote:
         | The endurance figures for hard drives are probably derived from
         | the rated number of seek operations for the heads, which is why
         | it doesn't matter whether the operations are for reading or
         | writing data. But that bakes in some assumptions about the mix
         | of random vs sequential IO. And of course the figures are
         | subject to de-rating when the company doesn't want the warranty
         | to cover anything close to the real expected lifespan,
         | especially for products further down the lineup.
        
       | WarOnPrivacy wrote:
       | > _A Few Good Zeroes: In Q1 2024, three drive models had zero
       | failures_
       | 
        | They go on to list 3 Seagate models that share one common factor:
        | sharply lower drive counts. Backblaze had a lot fewer of these
        | drives.
        | 
        |  _All of their <5 failure counts_ come from low-quantity drives.
        | 
        | I have confidence in the rest of their report - but not in the
        | inference that those 3 Seagate models are more reliable.
        
         | matmatmatmat wrote:
         | This uncertainty should be accounted for in the confidence
         | intervals of their stats.
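The zero-failure case has a handy closed form. If failures are modeled as a Poisson process, the exact one-sided upper bound on the AFR after observing zero failures is -ln(1 - confidence) divided by the drive-years observed, roughly 3/drive-years at 95% (the "rule of three"). A sketch with a hypothetical fleet size:

```python
import math

def afr_upper_bound_zero_failures(drive_days, confidence=0.95):
    """Exact one-sided Poisson upper bound on the annualized failure
    rate when zero failures were observed over `drive_days` of exposure."""
    drive_years = drive_days / 365.25
    return -math.log(1 - confidence) / drive_years

# Hypothetical: 500 drives observed for one 91-day quarter, zero failures.
bound = afr_upper_bound_zero_failures(500 * 91)
print(f"{bound:.1%}")
```

For that hypothetical fleet the 95% upper bound is still around 2.4% AFR, i.e. worse than the fleet-wide average, which is exactly why a quarter of zero failures on a small model count says little.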
        
       | dehrmann wrote:
       | I find the stats interesting, but it's hard to actually inform
       | any decisions because by the time the stats come out, who knows
       | what's actually shipping.
        
       | gangstead wrote:
       | I wonder how the pricing works out. I look at the failure rates
        | and my general takeaway is "buy Western Digital" for my qty 1
       | purchases. But if you look within a category, say 14TB drives,
       | they've purchased 4 times as many Toshiba drives as WD. Are the
       | vendors pricing these such that it's worth a slightly higher
       | failure rate to get the $/TB down?
        
         | mijamo wrote:
         | If you are a large company owning hundreds of thousands of them
         | and knowing you will have disk failures regardless, maybe. If
          | you own just a few hundred and a failure costs you money, the
          | logic may be completely different.
        
         | Marsymars wrote:
         | I'd assume so. Also consider that if a drive fails under
         | warranty, and you're already dealing with a bunch of failing
         | drives on a regular basis, the marginal cost to get a warranty
         | replacement is close to zero.
        
       | Whatarethese wrote:
        | And people will still say they don't trust Seagate because of the
       | 3TB drives that failed over a decade ago.
        
         | kstrauser wrote:
         | Anecdata is such a weird thing. In my own NAS, I've had 3 out
         | of 3 WD Red drives, each a different size, die in an identical
         | manner well before their warranty expired over the last several
         | years. SMART says everything is fine, but the drive's
         | utilization creeps up to a constant 100% and its write IOPS
         | decrease until the whole array is slow as frozen molasses.
         | That's in a constantly comfortable operating environment that's
         | never too hot, cold, or otherwise challenging. And yet it looks
         | like I'm the statistical outlier. Other people -- like
         | Backblaze here -- have decent luck with the same drives that
         | have a 100% failure rate here.
         | 
         | Probability is a strange thing, yo. The odds of a specific
         | person winning the lottery are effectively 0, but someone's
         | going to. Looks like I've won the "WD means Waiting Death"
         | sweepstakes.
        
           | freedomben wrote:
           | Indeed, and anecdata is weighted so heavily by our minds,
           | even when we are aware of it and consciously look at the
           | numbers. That's what evolution gives us though. The best
           | brains at survival are the ones that learned from their
           | observations, so we're battling our nature by trying to
           | disregard that. I'll never buy another Seagate because of
           | that one piece of shit I got :-D
        
           | vel0city wrote:
           | Sounds like you're a victim of WD selling Reds with Shingled
           | Magnetic Recording (SMR). Quite a scandal a few years ago.
           | 
           | SMR takes advantage of the fact read heads are often smaller
           | than write heads, so it "shingles" the tracks to get better
           | density. However, if you need to rewrite in between tracks
           | that are full, you need to shuffle the data around so it can
           | re-shingle the tracks. This means as your array gets full or
           | even just fragmented, your drives can start to need to
           | shuffle data all over the place to rewrite a random sector.
            | This plays hell with drives in an array, since a lot of
            | controllers have no knowledge of this shingling behavior.
           | 
            | Shingled drives are OK when you're just constantly writing a
            | stream of data and not going to do a lot of rewriting in
            | between. Think security cameras and database backups and
           | what not. They're complete hell if you're doing lots of
           | random files that get a lot of modifications.
           | 
           | https://www.servethehome.com/wd-red-smr-vs-cmr-tested-
           | avoid-...
        
             | kstrauser wrote:
             | No, these were 100% CMR drives. I checked them very closely
             | when the scandal broke and confirmed that mine were not
             | shingled.
        
               | vel0city wrote:
               | Huh, weird, because that's 100% the failure mode friends
               | of mine who _did_ have shingled drives experienced. Maybe
               | your drives were shingled despite labeling suggesting
                | otherwise, or maybe they hit some different failure
                | mode with the same symptoms, without SMR being what
                | killed the arrays in the end.
               | 
               | Either way it made me never want to use WD for drives in
               | arrays and not trust their labeling anymore. "WD Red"
               | drives lost all meaning to me; who knows what they're
               | doing inside.
        
               | kstrauser wrote:
               | > Maybe your drives were shingled despite labeling
               | suggesting otherwise
               | 
               | I'm not ruling that out. The whole debacle was so
               | amazingly tonedeaf that I wouldn't be surprised if they
               | did that behind the scenes. I wrote this at the time:
               | https://honeypot.net/2020/04/15/staying-away-from.html
        
         | redox99 wrote:
         | I've had so many Seagate drives fail that I won't buy Seagate
         | again.
         | 
         | If a brand sells bad drives, they should be aware of the
         | reputational damage it causes. Otherwise there is no downside
         | to selling bad drives.
        
       | pcurve wrote:
       | Looks like WDC reliability has improved a lot in the past decade.
       | 
       | Seagate continues to trail behind competitors.
       | 
       | I guess they're basically competing on price? Because with data
        | like this, I don't know why anyone running a data center would buy
       | Seagate over WD?
        
         | formerly_proven wrote:
         | The WDC models which are only somewhat more expensive than
         | Toshiba or Seagate tend to perform quite a lot worse than
         | those. Models with the same performance are significantly more
         | expensive.
        
       | louwrentius wrote:
        | If you buy drives based on their reports, make sure your drives
        | are operating within the same environmental parameters, or these
        | stats may not apply.
        
       | fencepost wrote:
        | As with every time these come out, _Remember that Backblaze's
        | usage pattern is different from yours!_
       | 
       | Well, unless you're putting large numbers of consumer SATA drives
       | into massive storage arrays with proper power and cooling in a
       | data center.
        
       ___________________________________________________________________
       (page generated 2024-05-02 23:01 UTC)