[HN Gopher] Backblaze Drive Stats for 2022
       ___________________________________________________________________
        
       Backblaze Drive Stats for 2022
        
       Author : TangerineDream
       Score  : 175 points
       Date   : 2023-01-31 14:13 UTC (8 hours ago)
        
 (HTM) web link (www.backblaze.com)
 (TXT) w3m dump (www.backblaze.com)
        
       | controversial97 wrote:
       | Backblaze has stated in a blog post that they purchase drives in
       | large quantities direct from manufacturers and negotiate for good
       | prices. Backblaze says that a model of drive which is
       | significantly cheaper and somewhat less reliable can be ok for
       | them.
       | 
       | This means that these stats may not be a very meaningful
       | indication when purchasing a few drives from a retailer.
       | 
        | Nevertheless, the world is full of situations where you have to
       | make choices with insufficient data. I'm still going to prefer to
       | avoid Seagate hard drives where practical.
        
         | itchyouch wrote:
         | Even when they were shucking retail drives in the 1-4TB era,
         | Seagate didn't have the greatest stats, with about a 5-30%
         | failure rate (IIRC, over ~3 years or so), especially their 3T
         | drives.
         | 
          | I ran an array with 1.5T and 3T Seagates and within 1-3
         | years, I replaced at least 1/3 of my Seagates, but I made sure
         | to replace the Seagates with Hitachis and Western Digitals,
         | even though they were slightly more expensive.
        
         | metadat wrote:
         | 100% with you.
         | 
         | Seagate is the manufacturer with drives that fail at 10x the
          | rate of the competition. I'll avoid them, because drive
          | failures are annoying.
        
         | neilv wrote:
         | > _they purchase drives in large quantities direct from
         | manufacturers_
         | 
         | Given that the manufacturer knows that Backblaze publishes its
         | influential drive stats... do we know that the drive units that
         | the manufacturer ships to Backblaze are representative of the
         | quality of the same models available at retail?
        
           | zh3 wrote:
           | If we assume that Backblaze gets special treatment from every
            | manufacturer that supplies it, that would suggest we're seeing
           | stats that are the best the manufacturer can manage. Which
           | seems a reasonable proxy for what they sell (lots of
           | statistical quibbling about this view aside).
        
             | nekoashide wrote:
             | Any more special treatment than they give AWS, GCP or
             | Azure? I feel they just manufacture them all the same
             | because there's no point in doing anyone any favors.
        
               | didgetmaster wrote:
               | I don't think anyone imagines that the hard drive
               | manufacturers are making special drives for Backblaze.
               | The question is 'Are they getting the cream of the crop?'
               | out of all the drives that are manufactured.
               | 
               | Every drive must go through a series of tests before it
               | is judged suitable for sale to the consumer. Drives that
               | fall below a certain threshold are judged 'failed' but
               | there can be a variety of drives that fall within the
               | 'passed' range. Some are just barely above the threshold,
               | others are way above.
               | 
               | It is kind of like binning for CPUs. The very best ones
               | can have their base clocks increased and sold at a
               | premium. The same kind of differentiation could be done
               | for hard drives.
        
           | brianwski wrote:
           | Disclaimer: I work at Backblaze.
           | 
           | > Do we know if these drives are the same as you would
           | purchase in a retail outlet
           | 
           | Seagate (for instance) won't sell ANYBODY drives directly, so
           | they force us to get bids from various resellers and
           | distributors. So when we pick the lowest price, I don't think
           | Seagate knows who the drives are going to, but there might be
           | a trick in there somewhere I am not aware of.
        
             | toomuchtodo wrote:
             | Have you considered becoming a distributor yourself? I
             | would assume at a certain purchasing scale, the cost
             | benefit is apparent, but perhaps not.
        
               | Kye wrote:
               | This is why every major tech company eventually becomes a
               | domain registrar.
        
               | jjeaff wrote:
                | Do all major tech companies really register that many
                | domains? I assumed that becoming a registrar was more
                | about pushing your control up the pipeline to reduce
                | risks like losing your domain name, rather than about
                | saving money on registrations.
        
         | tjoff wrote:
         | The implication being that they might knowingly buy a subpar
         | batch or that a manufacturer would offer rebates for a subpar
         | batch, or what?
        
           | Macha wrote:
           | If drive A has a 3% failure/year rate and drive B a 1%
           | failure/year rate, but you can get 12 of drive A for the
           | price of 10 drive B and you're going to have stock on hand to
            | replace drives anyway and you're confident in your backup
           | strategy then for a business it may still make sense to buy
           | drive A.
           | 
           | As a consumer, you're unlikely to keep an in-box spare, so
           | even if you have a good backup system, a drive failure is
           | still going to be an inconvenience, and leave you without
           | redundancy until your replacement is shipped to you, so you
            | might pay 20% more for drive B.
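The trade-off described above can be sketched as a quick expected-cost comparison. The failure rates are from the comment; the prices and the simple linear-replacement model are assumptions:

```python
# Expected 5-year cost per drive slot, assuming each failure is
# replaced at the same unit price (a simplification; ignores labor,
# rebuild risk, and compounding). Prices are hypothetical.
price_b = 120.0               # assumed price of reliable drive B
price_a = price_b * 10 / 12   # 12 of drive A for the price of 10 of B

def expected_cost(price, annual_failure_rate, years=5):
    """Purchase price plus expected replacement spend over the period."""
    return price * (1 + annual_failure_rate * years)

cost_a = expected_cost(price_a, 0.03)  # 3%/year failure rate
cost_b = expected_cost(price_b, 0.01)  # 1%/year failure rate
print(f"drive A: {cost_a:.2f}, drive B: {cost_b:.2f}")  # A stays cheaper
```

Even with triple the failure rate, drive A comes out ahead on raw cost at fleet scale; the consumer-side argument is about the inconvenience per failure, not the arithmetic.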
        
             | tjoff wrote:
              | I don't follow. The statistics Backblaze posts don't
              | imply that you should buy the same drives they do.
              | 
              | It is because of these statistics that you have (at least
              | an inkling of) a fighting chance to pick the drive that
              | better suits your use case. Or perhaps a notion of brand
              | performance.
             | 
             | Of course, you won't get statistics about drives that they
             | don't buy. And if you then assume that more expensive
             | drives have better failure rates then that might skew the
              | data. But therein lie a lot of assumptions you really
              | can't make. For one, that a drive that is expensive for a
              | consumer is also expensive for Backblaze. And also, that
              | there is a correlation between end-user price and failure
              | rates.
              | 
              | Generally, I don't think there is, at least within the
              | alternatives that the consumer has access to (and within
              | reasonable price ranges). And it probably varies between
              | different markets as well.
        
               | Macha wrote:
               | Backblaze's article points out that sometimes they will
               | buy drives that are less reliable but cheaper, as that
               | makes sense for their business model.
               | 
               | The top level commenter points out that this strategy may
               | not make as much sense for the individual consumer where
               | you don't have backblaze levels of drives to amortise the
               | failure probability and deal with it.
               | 
               | That's it, that's the point. It's not a rebuttal of the
               | backblaze post at all, nor is it intended to be. The top
               | level commenter was simply pointing out "don't buy mostly
               | seagate drives because backblaze does" as backblaze
               | explained their strategy in drive selection and some
               | people may not consider the differences between their use
               | case and backblaze's.
        
               | tjoff wrote:
               | The whole point of the statistics is that you can pick
               | what you want. Why would you dismiss the statistics and
               | buy what backblaze does? Defeats the entire point of the
                | statistics and article in the first place.
               | 
               | Top level comment: > _This means that these stats may not
               | be a very meaningful indication when purchasing a few
               | drives from a retailer._
               | 
               | That doesn't follow.
        
               | Dalewyn wrote:
               | >Why would you dismiss the statistics and buy what
               | backblaze does?
               | 
               | You assume people read the fine print and consider fine
               | details.
               | 
               | The vast majority of Joes are going to ask who/what
               | Backblaze is and what they're buying most and buy that,
               | screw the details.
        
           | mardifoufs wrote:
           | Not sure if that's what OP meant, but it could be that they
           | get better batches from manufacturers, while retail gets the
           | rest?
           | 
           | The other way you could read the comment would be that the
           | stats don't mean much since Backblaze can tolerate higher
           | defect rates if they negotiate a low enough price. But to me,
           | that doesn't make sense since it wouldn't affect the failure
           | rate statistics at all anyway.
        
         | iot_devs wrote:
         | I really struggle to follow this logic.
         | 
          | You could argue for removing the drives that are represented
          | fewer than 100 times (and even that would be a stretch).
          | 
          | But once you have so many samples, there is no reason not to
          | believe that your drive will follow the same distribution.
        
         | cm2187 wrote:
         | You converge to the mean fairly quickly. Build a NAS of 12
         | drives and those stats become meaningful.
         | 
          | That being said, with 50TB drives just around the corner [1]
          | and SSD caching becoming the norm, 12-bay consumer NASes
          | will become rare.
         | 
         | [1] https://www.anandtech.com/show/18733/seagate-
         | confirms-30tb-h...
        
           | aidenn0 wrote:
            | At what point do increased rebuild times mean that the
           | increased redundancy requirements overwhelm the increased
           | capacity? If you're running large parity groups, the
           | redundancy required is rather small, but when you move to
           | smaller raid groups allowing for a second (or third) drive
           | failure significantly impacts total storage.
        
             | cm2187 wrote:
             | Agree, but that's what backups are for. RAID5 isn't a
             | backup, it's a way to stay online during the rebuild. If
             | you are using RAID5 but don't backup your files, you
             | probably don't care about those files that much.
        
               | tux3 wrote:
               | RAID or not, if you lose an active drive you will need to
               | 1) restore from backup, 2) fill a new backup drive
               | 
               | If it takes days to fill your new 50 TB backup drive, you
               | have the same problem RAID5 has with rebuild times. The
               | drive might fail in the middle.
               | 
               | RAID isn't a backup mostly because if you overwrite RAID
               | data, you wrote over every copy at once. Non RAID offsite
               | backups don't solve any problem related to drive size,
               | they just make it a lot less likely for a single event to
               | take everything out in the same minute.
        
               | cm2187 wrote:
               | Absolutely, you want RAID + backup, not one or the other.
               | 
               | 50TB at 200MB/s is about 72h. Doesn't seem to be a
               | particularly problematic rebuild time (and that's
               | assuming 100% filled). Of course you need to do regular
               | data scrubbing. If your rebuild is the first time the
               | data is being read in 5y, that might not go so well.
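The 72h figure above is easy to sanity-check. Decimal units and a sustained sequential speed are assumed; real rebuilds under load are often slower:

```python
# Naive full-drive rebuild time: capacity / sustained throughput.
capacity_tb = 50        # drive size from the comment
throughput_mb_s = 200   # assumed sustained sequential speed

seconds = capacity_tb * 1e12 / (throughput_mb_s * 1e6)
hours = seconds / 3600
print(f"{hours:.1f} h")  # ~69.4 h, i.e. roughly the 72h quoted
```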
        
               | sn0wf1re wrote:
               | > 50TB at 200MB/s is about 72h.
               | 
               | Assuming the device is offline for users during that
               | time, otherwise there may be reduced throughput.
        
               | sliken wrote:
               | > RAID or not, if you lose an active drive you will need
               | to 1) restore from backup
               | 
               | Er what? I've had production machines lose a drive, have
               | it replaced, and never leave production. Why would you
               | need to restore from backups?
        
               | ThePowerOfFuet wrote:
               | > RAID or not, if you lose an active drive you will need
               | to 1) restore from backup
               | 
               |  _RAID 1 joins the chat_
        
               | aidenn0 wrote:
               | If you want <X% chance of having to restore from backups,
               | and rebuild times go up, you'll need to switch from RAID5
               | to RAID6 in order to maintain that.
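One way to make the RAID5-vs-RAID6 point concrete: estimate the chance of losing more drives than you can tolerate during the rebuild window. Independent failures, a constant annualized failure rate, and the drive counts below are all simplifying assumptions:

```python
from math import comb

def p_extra_failures(n_remaining, afr, rebuild_hours, tolerated):
    """Probability that more than `tolerated` additional drives fail
    during the rebuild window (binomial model, rough approximation)."""
    p = afr * rebuild_hours / (24 * 365)  # per-drive failure prob in window
    return 1 - sum(comb(n_remaining, k) * p**k * (1 - p)**(n_remaining - k)
                   for k in range(tolerated + 1))

# 12-bay array with one drive already lost, 1.5% AFR, 72h rebuild:
raid5 = p_extra_failures(11, 0.015, 72, tolerated=0)  # any extra loss is fatal
raid6 = p_extra_failures(11, 0.015, 72, tolerated=1)  # survives one more
print(f"RAID5: {raid5:.2%}, RAID6: {raid6:.6%}")
```

RAID5's risk grows roughly linearly with rebuild time, while RAID6's grows quadratically from a far smaller base, which is why longer rebuilds push arrays toward double parity.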
        
           | Teever wrote:
           | Why do you think that 50TB drives will make 12 bay NAS become
           | rare?
        
             | Hamuko wrote:
              | We'll just hoard bigger files, as we have thus far.
             | 
             | My iPhone shoots 90 megabyte photos.
        
             | cm2187 wrote:
              | I don't know; my own experience is that drive capacities
              | increase exponentially but my storage needs are fairly
              | linear - bigger files, but not that many more of them.
        
               | sn0wf1re wrote:
               | I've noticed that media files seem to look "good enough"
               | at 720p for cartoons and 1080p for live action. I know
               | some people obsess over the highest quality of everything
                | (4K 10-bit HDR), but even those came out in 2016[1]
                | and aren't that popular. Higher definitions like 8K
                | are basically nonexistent: they can't run over old
                | HDMI cables and need both a beefy decoder (blasting
                | heat) and a powerful display driver as well.
               | 
               | [1] https://en.wikipedia.org/wiki/Ultra_HD_Blu-ray
        
               | crazygringo wrote:
               | > _Higher definition like 8k is basically non existent_
               | 
               | You're forgetting about VR. 8K in VR is very popular now.
               | If mainstream 2D HD is 720p and 1080p, the approximate
               | mainstream quality equivalents in 3D VR are 5K and 8K.
               | And you can watch it on a cheap Quest 2 headset.
        
       | tke248 wrote:
        | I have a theory that solar flares are the cause of some hard
        | drive failures. It would be interesting to see if a few lead-
        | shielded cases would reduce the number of failures. I used to
        | manage a large fleet of computers, and any time we got radio
        | interference from solar flares we would have 3-5 hard drive
        | failures that day.
        
         | rom-antics wrote:
         | What were the failure modes? Was it corrupted data or were the
         | drives permanently fried?
        
           | tke248 wrote:
            | Corrupt data that would compromise the operating systems.
            | These were Dell computers with multiple different brands
            | of hard drives. Dell would have us run their hardware
            | diagnostic tool, which would put the drives in the range
            | to receive free replacements. We didn't have to send back
            | the old ones under the contract we had with them; they
            | would still work when reformatted, but were less reliable
            | after that.
        
             | maccam94 wrote:
             | If you didn't disable the write cache on those drives,
             | flares could have caused bit flips in the cache memory
             | before it was flushed to disk.
        
             | GTP wrote:
              | So far I always assumed that, when talking about HDD
              | failure rates, they were considering the typical
              | mechanical failure. I never considered that they could
              | declare a failure due to some corrupted data, although
              | it would be reasonable for a datacenter to do so.
        
               | tke248 wrote:
                | Mechanical failures were more prevalent in my
                | experience in the 90s; most of what I see recently is
                | controller failures. I rarely hear the head-crash
                | clicking of the old days.
        
         | dekhn wrote:
         | Any sufficiently large cluster is effectively a cosmic ray
         | detector with terrible sensitivity.
        
         | lazide wrote:
         | Out of what fleet size?
        
           | tke248 wrote:
            | Around a thousand.
        
       | ck2 wrote:
       | I don't trust any of these helium filled drives to last more than
       | a decade.
       | 
       | Helium atom is too small and leaks through everything,
       | eventually.
       | 
       | Really hope I am proven wrong.
       | 
       | Unfortunately what backblaze is excellently documenting is not
       | archival use.
       | 
        | I've got half-TB and 1TB WDC drives that are over a decade old
        | and still spin up fine - single platter, running cool and
        | quiet, even air-filled.
       | 
       | I think 4TB is the cutoff for air-filled but not sure anymore.
        
         | tzs wrote:
         | > Helium atom is too small and leaks through everything,
         | eventually.
         | 
         | Suppose the drive were encased in solid hydrogen? Hydrogen
         | freezes at 14 K and helium boils at 4 K so there's a 10 K range
         | where you could have both solid hydrogen and gaseous helium.
         | 
         | Hydrogen atoms are bigger than helium atoms, but what matters
         | is the gaps between the hydrogen in the solid. I was unable
         | with a bit of Googling to find out anything about the physical
         | structure of solid hydrogen.
        
         | geerlingguy wrote:
          | Many 8 TB, and I believe even some 12 TB, drives don't have
          | helium. It depends on the model and manufacturer, so you
          | have to dig into the spec sheets to figure it out.
        
         | aequitas wrote:
         | But is the helium really required for the drive to function or
         | is it just an efficiency thing? So that if all the helium leaks
         | out the drive would just be slower or consume more power?
        
         | dannyw wrote:
         | There are helium sensors in the drives. It's reported via
         | SMART. It's monitorable.
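As a sketch of what that monitoring looks like: on many HGST/WD helium drives the level is reported as SMART attribute 22, "Helium_Level". The attribute ID and name vary by vendor, and the sample output below is illustrative, not captured from a real drive:

```python
# Parse a helium level out of `smartctl -A /dev/sdX`-style output.
# SAMPLE is illustrative; real attribute tables differ by vendor.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  0
 22 Helium_Level            0x0023   100   100   025    Pre-fail  100
"""

def helium_level(smart_output):
    """Return the normalized VALUE of a helium attribute, or None."""
    for line in smart_output.splitlines():
        fields = line.split()
        if len(fields) >= 4 and "helium" in fields[1].lower():
            return int(fields[3])  # normalized VALUE column
    return None

print(helium_level(SAMPLE))  # 100 in this sample (full helium level)
```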
        
         | dehrmann wrote:
         | I don't trust any spinning rust for a decade, but I suppose a
         | pre-helium model stored in good, stable conditions should still
         | spin up in a decade.
        
       | RDaneel0livaw wrote:
       | Looks like HGST continues to rule the pack, similar to prior
       | years. Though, I thought the entire brand of HGST was set to be
       | discontinued and Western Digital just taking over all naming /
       | marketing / etc... I even see when I search for "ultrastar" or
       | similar, it's all currently branded Western Digital. Does
       | backblaze get some special enterprise gear?
        
         | deeesstoronto wrote:
         | Western Digital got most of HGST - however HGST's 3.5in drive
         | business was sold to Toshiba. The HGST branding went to WD.
         | 
          | I believe these new HGST drives are WDs.
         | 
         | The HGST drives in the older backblaze stats (which showed good
         | reliability) continue to be manufactured by/evolved into
         | Toshibas. There are several HGST model numbers that remained in
         | Toshiba's lineup.
         | 
         | https://www.anandtech.com/show/5635/western-digital-to-sell-...
        
         | segfaultbuserr wrote:
          | Some (all?) rebranded "WD Ultrastar" drives still report
          | "HGST" as their SMART device model. I'm not sure how
          | Backblaze counts them. Perhaps they're (correctly) counted
          | via the SMART model regardless of branding?
          | 
          | Update: This issue was covered in the Backblaze 2020 report.
          | The two lines apparently existed in parallel. The original
          | HGST drives, while still in service as of 2020, were being
          | gradually phased out...
         | 
         | > _These drives obviously share their lineage with the HGST
         | drives, but they report their manufacturer as WDC versus HGST.
         | The model numbers are similar with the first three characters
         | changing from HUH to WUH and the last three characters changing
         | from 604, for example, to 6L4. We don't know the significance
         | of that change, perhaps it is the factory location, a firmware
         | version, or some other designation. If you know, let everyone
         | know in the comments. As with all of the major drive
         | manufacturers, the model number carries patterned information
         | relating to each drive model and is not randomly generated, so
         | the 6L4 string would appear to mean something useful._
        
         | andrewmunsell wrote:
          | I tend to buy my HDDs from Provantage nowadays. I've never
         | bought an HGST drive (I have a combo of WD and Toshiba drives),
         | but the HGST branding seems more prominent at Provantage
         | (https://www.provantage.com/hgst-0f30146~7HGST0H9.htm) versus
         | elsewhere (ex. Amazon: https://www.amazon.com/HGST-Ultrastar-
         | HUH721212ALE600-3-5-In..., though I'd never buy an HDD from
         | Amazon today due to their shipping issues)
        
           | dspillett wrote:
           | _> though I 'd never buy an HDD from Amazon today due to
           | their shipping issues_
           | 
           | I'm wary of Amazon due to co-mingling issues. I can't be sure
           | I'm not getting something incorrectly packaged/labelled by
           | another seller so I get a reconditioned (or simply
           | counterfeit) unit instead of a new one.
        
         | numpad0 wrote:
          | I'm a total consumer in this topic, but I believe WD tried
          | that and externally communicated it as the plan, and drives
          | are marked as such, but due to antitrust rulings(?) HGST
          | remains somewhat independent and the heritages remain
          | somewhat separate from each other. Also, there seem to be
          | just a handful of "actual" drive models in the HDD market -
          | they get badge-swapped like Ladas and VW Beetles. Perhaps
          | that makes calling them by their old names make sense.
        
       | bluedino wrote:
       | I wonder what the numbers need to look like to replace the
       | remaining 4TB pods with 16TB pods.
        
         | atYevP wrote:
         | Yev from Backblaze here -> That's a factor of time and power!
         | The 16TB drives consume more power so that's part of the
         | calculation when we swap from the 4TBs, but we are monitoring
         | them b/c we certainly want to avoid cluster failures and have
         | what we call "rolling migration" projects going on all the time
         | where we perpetually migration hard drives and hardware based
         | on durability projections, power balancing, physical capacity,
         | etc...
        
       | 2c2c2c wrote:
       | Anyone have experience using their s3 drop in replacement at
       | scale?
        
         | hedora wrote:
         | I have single digit TB in it, and my synology HyperBackups fail
         | extremely infrequently. (I'm not willing to attribute the
         | failures to their S3 implementation, since our ISP kind of
         | sucks.)
         | 
         | Anyway, I haven't seen any apparent data loss over the last few
         | years. I'm considering doing a full restore, just for the heck
         | of it. (Especially since this article convinced me I should
         | start replacing the aging drives in my NAS!)
        
           | a10c wrote:
           | HyperBackup is extremely disappointing.
           | 
            | At no point has it risen above 1MB/s when backing up to
            | S3 (Backblaze), while other methods routinely saturate my
            | upload (40Mb/s).
        
         | dylan604 wrote:
         | My only experience was moving data from a b2 to an s3. While
         | there were some inefficiencies with the workflow, it went as
         | well as could be hoped for when cross chatting between cloud
         | vendors.
         | 
         | Installing their linux app was out of bounds of normal
         | procedures, but it be what it b2 <hangsHeadInShame>
        
         | james_in_the_uk wrote:
         | b2 or not b2, is that the question?
        
       | jms703 wrote:
       | If you're struggling to pick a drive, maybe the best approach is
       | to find a drive that meets your needs from a maker that has a
       | solid warranty program. With a good backup strategy, the maker of
       | the drive may not matter as much.
        
       | Hamuko wrote:
       | I'll never be able to understand how Seagate manages to make so
       | many failing hard drives. A 5.7% failure rate with an average age
       | of 2 years? The only one that comes close is the 5.3% HGST with
       | over twice the average age and overall low sample size.
       | 
       | Out of the many hard drives that I've owned in my life, the only
       | ones to die on me have both been Seagates, with the first one
       | being the infamous ST3000DM001, a hard drive so shit it has its
       | own Wikipedia article
       | (https://en.wikipedia.org/wiki/ST3000DM001).
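For context on how percentages like the 5.7% are produced: Backblaze computes an annualized failure rate from drive days rather than raw drive counts. A sketch of that formula, with illustrative numbers (not taken from the report):

```python
def annualized_failure_rate(failures, drive_days):
    """Backblaze-style AFR as a percentage: failures per drive-year."""
    return failures / (drive_days / 365) * 100

# Illustrative: 120 failures over ~2,100 drive-years of service.
afr = annualized_failure_rate(failures=120, drive_days=766_500)
print(f"{afr:.1f}%")  # 5.7%
```

Using drive days means drives added or retired mid-year are weighted by how long they actually ran, which is why average age matters alongside the rate itself.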
        
         | the_third_wave wrote:
         | Either you're lucky or you just have not had that many drives
         | around...
         | 
          | The first to fail was a Miniscribe 5.25" 20MB; it died after
          | a week. I replaced it with a 3.5" Seagate connected to an
          | Adaptec RLL controller - this being before IDE was a thing,
          | the controller decided the encoding scheme, and RLL gave you
          | 50% extra storage space. That Seagate never died on me; I
          | probably still have it around somewhere.
          | 
          | Then came the Western Digital IDE drives: two of those died
          | and took ~250MB of data with them to data heaven. They were
          | followed by another WD, this one 1.2GB - it died. When I
          | built a system around the famous Abit BP-6 motherboard I put
          | in two Maxtor drives - which I should not have done, as one
          | of them died within 2 years. Meanwhile I'd helped my father
          | get a new drive, one of those fancy IBM Deskstars. Within a
          | year it had turned itself into a Deathstar, like most of its
          | brethren seem to have done.
          | 
          | Then there was the 10GB Travelstar which died, then the 20GB
          | Travelstar which also died - but survived the canoe
          | expedition down the Yukon safely tucked away in the water-
          | and probably bulletproof solar-powered Virgin Webplayer I
          | had made to document the 3.5-month trip - and the 20GB
          | Toshiba. The 2TB WD Green: dead. One of the 1TB WD Greens:
          | dead - but its mate is still running strong at 120,000+
          | running hours, with its 2TB brand mates coming in second at
          | 100,000+ hours. In the DS4243 array I have replaced five 15K
          | 600GB drives; good that these were cheap as dirt (but they
          | take quite a bit of power to run, hence the low price).
         | 
         | Notice that I have not had a Seagate drive die on me yet. There
         | are a few in an array somewhere here but those still do their
         | job even though they're quite old by now. Maybe I'm just lucky
         | in that I never bought any of the failing types since these
         | problems seem to be related to specific types.
        
         | pooloo wrote:
          | They also have a drive on that list with an average age of
          | 92 months, and they have significantly more Seagate drives
          | than the rest. Additionally, most of these drives are
          | consumer drives, which are not intended to be run at 100%
          | for months on end.
        
           | paulmd wrote:
            | having high failure rates _despite_ having lots of drives
            | is actually more damning - there is a chance the HGST
            | failures are just statistical noise from a small number
            | of drives, but that possibility does not exist for
            | Seagate, with its much larger statistical samples.
           | 
           | AFAIK there hasn't been much of a difference shown between
           | consumer workloads and enterprise workloads for HDD lifespan
           | either, it's just cope and theorycrafting from people who are
           | emotionally attached to the idea of Seagate not being shit
           | for some reason.
           | 
           | No other drives seem to have such problems with being used in
           | this fashion: what is your theory for why Seagate drives are
           | _uniquely_ affected by being in the pods in some fashion that
           | would not also affect WD drives or HGST drives or whoever
           | else? Are WD drives not affected by vibration for some
            | reason? For a while Seagate Deniers latched onto the
            | first-gen pods as maybe being the answer, but they're all
            | long gone at this point; this failure-rate anomaly
            | continues even in the newer pods.
           | 
           | The only reasonable possibility would be that Seagate drives
           | are simply constructed in an entirely different, less
           | resilient fashion, which (a) is not factually supported in
           | any way afaik, and (b) would still be very relevant for
           | consumers to know! It's not like a home PC is vibration-free
           | either after all.
           | 
           | There comes a point when it's not "steelmanning" it's just
           | denial of reality in the face of consistent evidence. Like
           | you're not "steelmanning" climate change you're just a
           | denier.
           | 
            | The data has pretty consistently shown the same thing for
           | 10+ years. It's not a "random sampling bias" that uniformly
           | affects everyone except Seagate in the exact same way almost
           | every single survey, it's not some magical factor that makes
           | Seagate drives uniquely unsuited to storage-array usage but
            | magically resilient when used in a home PC, it's not the
            | first-gen Backblaze pods being bad; it's just Seagate
            | putting out shitty drives, period, the end. All these
            | extremely complex
           | theories to get around the very simple conclusion that
           | Seagate has shitty parts or shitty QC and the failure rates
           | are slightly higher as a result.
           | 
           | A lot of the Seagate models are relatively OK, but almost all
           | of the "outlier" drives with really high failure rates are
           | Seagate. It is the old bayesian probability thing: get rid of
           | Seagate and you've gotten rid of almost all of the models
           | with high failure rates.
           | 
           | edit: sorry Samsung on the brain since they have another wave
           | of SSD failures too /laugh
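The sample-size argument above can be made quantitative with a rough confidence interval on an observed failure rate (normal approximation; the drive counts below are illustrative, not Backblaze's):

```python
from math import sqrt

def failure_rate_ci(failures, drives, z=1.96):
    """Rough 95% normal-approximation CI for an observed failure rate."""
    p = failures / drives
    margin = z * sqrt(p * (1 - p) / drives)
    return p - margin, p + margin

# A small population vs a large one, both observing a 5% failure rate:
small = failure_rate_ci(10, 200)       # e.g. a minor model in the table
large = failure_rate_ci(1500, 30000)   # e.g. a high-volume model
print(f"small: {small[0]:.3f}-{small[1]:.3f}, "
      f"large: {large[0]:.3f}-{large[1]:.3f}")
```

With 200 drives the interval spans roughly 2-8%, so a "bad" rate could plausibly be noise; with 30,000 drives it tightens to about 4.8-5.2%, which is the sense in which a high rate over a large fleet is hard to explain away.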
        
       ___________________________________________________________________
       (page generated 2023-01-31 23:02 UTC)