[HN Gopher] Backblaze Hard Drive Stats for 2020
       ___________________________________________________________________
        
       Backblaze Hard Drive Stats for 2020
        
       Author : TangerineDream
       Score  : 279 points
       Date   : 2021-01-26 16:12 UTC (6 hours ago)
        
 (HTM) web link (www.backblaze.com)
 (TXT) w3m dump (www.backblaze.com)
        
       | peter_d_sherman wrote:
       | This brings up a good related question, which is:
       | 
       |  _What is the most reliable hard drive of all time, and why?_
       | 
        | In other words, let's say I didn't care about capacity, access
        | time, or data transfer rate, and I wanted a drive to store data
        | 100 or 200+ years into the future with a reasonable chance of
        | that data surviving -- then what kind of hard drive should I
        | choose, and why?
       | 
       | It's a purely philosophical question...
       | 
       | Perhaps I, or someone else, should ask this question on Ask HN...
       | I think the responses would be interesting...
        
       | lousken wrote:
        | With such high-capacity drives I wonder if they changed anything
        | in terms of redundancy - something like ZFS resilvering probably
        | takes days on these.
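        | 
        | (Back-of-envelope, with assumed numbers: a 16TB drive at an
        | optimistic ~200 MB/s sustained sequential write)
        | 
        |     tb = 16
        |     mb_s = 200   # optimistic sustained write for 7K2 rust
        |     hours = tb * 1e12 / (mb_s * 1e6) / 3600
        |     print(round(hours, 1))   # ~22h best case, so multi-day
        |                              # once real workload is added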
        
         | andy4blaze wrote:
         | Andy from Backblaze here. Larger drives do take longer to
         | rebuild, but to date we haven't changed the encoding algorithms
         | we built. There are other strategies like cloning which can
         | reduce rebuild time. We can also prioritize rebuilds or drop a
         | drive into read-only mode as needed. The system was built
         | expecting drive failures.
        
       | _nickwhite wrote:
       | If a Backblaze engineer (or Andy) is reading this, could you
       | comment on environmental temps and vibration you guys keep the
       | disks at? Thanks.
        
         | andy4blaze wrote:
          | Andy at Backblaze here. All the drives are in data centers with
          | temps around the 75-78 degF mark. Vibrations are kept to a
          | minimum via the chassis design. We publish the data, including
          | the SMART stats for all of the drives, and there are attributes
          | for temperature (SMART 194) and vibration (multiple); see
          | https://en.wikipedia.org/wiki/S.M.A.R.T. for more info on SMART
          | attributes.
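          | 
          | If you want to spot-check your own drives, a rough sketch
          | using smartmontools (not our internal tooling; the device
          | path is just an example) that reads SMART 194:
          | 
          |     import subprocess
          | 
          |     def smart_raw(device, attr_id):
          |         cmd = ["smartctl", "-A", device]
          |         out = subprocess.run(cmd, capture_output=True,
          |                              text=True).stdout
          |         for line in out.splitlines():
          |             f = line.split()
          |             if f and f[0] == str(attr_id):
          |                 return int(f[9])   # first raw-value token
          |         return None
          | 
          |     print("Temp (C):", smart_raw("/dev/sda", 194))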
        
           | _nickwhite wrote:
           | Thanks Andy! Without me digging through the SMART data, has
           | there been any difference in data center temps over the years
           | for Backblaze? I ask because I've personally seen disk life
           | vary greatly between "warm" datacenters (closer to 79F) and
           | "cool" datacenters (closer to 72F). I don't have a huge
           | dataset, only anecdotal evidence, but it seems to me
           | temperature plays a pretty big role in drive longevity. Have
           | you guys found the same, or is this a variable not
           | controllable by Backblaze?
        
       | busterarm wrote:
       | I know that they're not always the most available, but I find it
       | interesting that they don't purchase _any_ Fujitsu drives.
       | 
       | After an experiment with their helium-filled drives, I've gone
       | all-in to great success.
        
         | dsr_ wrote:
         | Fujitsu sold their disk drive business to Toshiba in 2009,
         | didn't they?
        
         | numpad0 wrote:
          | They still make drives!? I've seen relics like 147GB 15K SAS
          | marked "FUJITSU LIMITED" but I don't think they make the 14TB
          | 7K2 SATA drives that Backblaze uses.
        
           | busterarm wrote:
           | derp, see other reply.
        
         | busterarm wrote:
         | Stupid me, wrong company. I was thinking Toshiba.
         | 
         | I bought Toshiba 16TB drives -- MG08ACA16TE
         | 
         | And hey! 0.0% failure rate!
        
           | zepearl wrote:
           | Bought 8 Toshiba N300 (HDWN180, 8TB) consumer HDDs, maybe
           | about one year ago, and they're all still working.
        
       | numpad0 wrote:
        | The stats look stable and consistent with common knowledge --
        | HGST > WD >> Seagate, Toshiba inconclusive.
        | 
        | Does anyone have anecdata on Toshiba MD/MG/MN 7K2 drives
        | (excluding DT because those are HGST OEMs)? They are always
        | less price competitive and thus always light on real-world
        | stories, though they seem comparably reliable to HGST.
        
         | dbalan wrote:
         | HGST is sadly just WD now.
        
           | hinkley wrote:
           | And Audi is just Volkswagen. Except it isn't (Audi tech shows
           | up on VW when it reaches economies of scale, eg DSG gearboxes
           | ~12 years ago)
           | 
           | Is HGST still a separate department or "just" their luxury
           | brand now?
        
             | freeone3000 wrote:
             | It's still a separate factory making separate drives. This
             | line even uses a different storage controller. But this is
             | also true for luxury ranges, in general, so you may be
             | asking for too fine of a distinction. (Their usual luxury
             | range is the WD Red, however.)
        
               | robhu wrote:
               | A luxury range where they occasionally sneak shingled
               | drives in without telling you
               | https://blocksandfiles.com/2020/04/14/wd-red-nas-drives-
               | shin...
        
               | hinkley wrote:
               | Which is why I asked. That sounds like the sort of thing
               | that happens when it's just a label instead of a
               | division.
        
           | ksec wrote:
            | I don't think that is the case, as in they are not mixed up
            | in production and sold under a different brand.
            | 
            | They tried to get rid of the HGST brand but failed, and had
            | to go back to HGST branding specifically for HDDs coming
            | from the acquired HGST factories.
            | 
            | i.e. AFAIK HGST is still HGST.
        
         | nolok wrote:
          | Following Backblaze's report on them a few years ago ("they
          | appear to be great, but we buy in bulk and there is not enough
          | volume of them for us"), I decided to use Toshiba drives almost
          | exclusively, as a sort of fun experiment.
          | 
          | Fewer than 400 drives deployed, used exclusively in NAS (RAID
          | 1, RAID 10 and RAID 6) at companies with fewer than 50
          | employees. They appear to be insanely reliable and high
          | performers, to the point that the +15% price premium for them
          | in French stores seems well justified to me.
         | 
         | Purely anecdotal results, of course.
        
         | R0b0t1 wrote:
          | Unsure about the >> Seagate in there, that info is over a
         | decade old now(?). It's worth pointing out they have a 12%
         | failure rate on their highest density units but the other ones
         | seem to do well outside of a DC environment.
        
       | neogodless wrote:
       | > Over the last year or so, we moved from using hard drives to
       | SSDs as boot drives. We have a little over 1,200 SSDs acting as
       | boot drives today. We are validating the SMART and failure data
       | we are collecting on these SSD boot drives. We'll keep you posted
       | if we have anything worth publishing.
       | 
        | Would love to see SSD stats like this in the future. Recently I
        | was talking to some friends about what SSD to buy. I personally
       | really like my HP EX950 - one friend said he'd never buy HP
       | hardware. He said he was getting an Intel - I said I had an early
       | Intel SSD fail on me, and I don't think QLC is the best option,
       | but it is a nice value play. For performance, I do like Samsung,
       | though they are expensive. Another friend said he'd never buy a
       | Samsung SSD, as he had a reliability issue, and found lots of
       | similar stories when he was researching it.
       | 
       | Of course these are all anecdotes and they aren't useful in
       | making an informed choice. I suspect most SSDs are reliable
       | "enough" for most consumer use, and not nearly reliable enough
       | for certain kinds of critical storage needs. But it would still
       | be nice to see the big picture, and be able to factor that into
       | your SSD purchase decisions.
        
       | Merman_Mike wrote:
        | I'm planning a few encrypted long-term backups (i.e. stick it in
        | a temperature-controlled safe for a few years).
       | 
       | What's the best medium for this? SSD or HDD?
        
         | louwrentius wrote:
         | Tape. But that's probably unrealistic.
         | 
         | Otherwise HDD.
         | 
         | Archiving is not a one-time thing, but a process.
         | 
         | If you care about the data, you should periodically check the
         | media and at some point replace the media as they age.
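          | 
          | A minimal sketch of that process (not a full tool): keep a
          | checksum manifest with the archive and re-verify it on a
          | schedule, e.g.
          | 
          |     import hashlib, json, pathlib
          | 
          |     def digest(path, chunk=1 << 20):
          |         h = hashlib.sha256()
          |         with open(path, "rb") as f:
          |             while block := f.read(chunk):
          |                 h.update(block)
          |         return h.hexdigest()
          | 
          |     def build(root, manifest="manifest.json"):
          |         files = [p for p in pathlib.Path(root).rglob("*")
          |                  if p.is_file()]
          |         sums = {str(p): digest(p) for p in files}
          |         pathlib.Path(manifest).write_text(json.dumps(sums))
          | 
          |     def verify(manifest="manifest.json"):
          |         raw = pathlib.Path(manifest).read_text()
          |         return [p for p, d in json.loads(raw).items()
          |                 if digest(p) != d]
          | 
          |     # build("/mnt/archive") at backup time; verify() later
          |     # returns the files that no longer match their checksum.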
        
         | stevezsa8 wrote:
         | This is what I'd recommend.
         | 
         | 1. Backup to external SSD or NAS. This is the backup you will
         | rely on if your PC loses all data. It will be fast to replicate
         | to.
         | 
         | 2. Mirror the external backup to a second external SSD. And
         | sync it every week or month. Sync more often if your data is
         | changing a lot.
         | 
         | 3. The third layer is an external HDD mirror for the long term
         | off-site backups. HDD are cheaper and more suited for being
         | switched off long term.
         | 
          | 4. If you can afford the expense of a fourth step, every year
         | buy another external HDD and put the previous one aside as an
         | archive to be brought into service if the current one fails to
         | boot.
         | 
          | I recommend separating your data into some sort of hierarchy
         | and choose what needs to be backed up to what level. So if you
         | have some software ISOs that you could repurchase/redownload,
         | then have a separate drive for junk like that and don't have it
         | go all the way through the backup steps listed above.
        
         | Spooky23 wrote:
         | Figure out what you really need and print it on good paper. Put
         | that in a safe place, away from direct light and dampness.
         | 
         | Save the rest on two of Google Drive, OneDrive, iCloud, some
         | other cloud storage, a backup service or copy to a computer in
         | your home. Make your selection based on things that you will
         | "touch" in some way at least every 12-24 months. Everything
         | else will fail in a few years.
         | 
         | Don't save crap you don't need. Don't futz around with optical
         | media, tape or other nonsense. Don't buy safes or safe deposit
         | boxes unless that's going to be part of your routine in some
         | way.
        
           | ghaff wrote:
           | >Don't save crap you don't need.
           | 
           | I tend to agree with this although it can be hard to
           | determine what you won't want/need in advance and it probably
           | takes at least some effort to winnow things down.
           | 
           | That said, I'm in the middle of going through my photos right
           | now and deleting a bunch of stuff. (Which is a big job.) It's
           | not so much for the storage space as I'll "only" be deleting
           | a few hundred GB. But it's a lot easier to look for stuff and
           | manage it when you don't have reams of near-identical or just
           | lousy pics. One of my takeaways from this exercise is that I
           | should really be better at pruning when I ingest a new batch.
        
         | WrtCdEvrydy wrote:
         | I'd argue SSD here... those memory chips should be good for a
         | few years.
        
           | kiririn wrote:
           | More like 1 year at best
           | 
           | Modern SSDs not only sacrifice endurance and sustained
           | performance, they also sacrifice power off data retention
        
           | Aardwolf wrote:
           | How long is a few years? What would be a good recommendation
           | for decades? Time goes fast!
        
           | ineedasername wrote:
           | A few years isn't archival quality. An HDD will last longer
           | and is cheaper, and speed is much less of an issue for a
           | drive that will be written to and then chucked in a safe.
        
         | cm2187 wrote:
          | I'd suggest saving the encryption software along with the
          | drive (unencrypted)!
         | 
         | Sounds like a good fit for SMR archive HDD.
        
           | theandrewbailey wrote:
           | Separately storing the encryption software isn't needed if
           | you use LUKS.
        
         | Proven wrote:
         | Amazon Glacier
        
         | ineedasername wrote:
          | SSDs are not as bad as they used to be, but still not rated
          | for long-term unpowered storage. HDD would be better for that.
         | 
         | But HDD isn't your only other option. How important is the
          | data, how often will you need to access it, and will you need
         | to rewrite to the storage medium? You might want to consider
         | Blu Ray. Or both, stored in different locations. Also look into
         | LTO tape drives. LTO 6 drives should be cheaper than 7/8
         | (though still not cheap) and have a capacity around 6TB.
        
           | gruez wrote:
           | >Also look into LTO tape drives. LTO 6 drives should be
           | cheaper than 7/8 (though still not cheap) and have a capacity
           | around 6TB.
           | 
           | AFAIK a post on /r/datahoarders says that the breakeven point
           | for tapes vs shucked hard drives from a pure storage
           | perspective is around 50TB. Given the hassle associated with
           | dealing with tapes, it's probably only really worth it if you
           | have 100+TB of data to store.
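            | 
            | The arithmetic is easy to redo with your own prices; a
            | rough sketch with made-up numbers:
            | 
            |     def tape(tb, drive=1500, cart_tb=6.0, cart=30.0):
            |         return drive + (tb / cart_tb) * cart
            | 
            |     def hdd(tb, per_tb=20.0):   # e.g. shucked externals
            |         return tb * per_tb
            | 
            |     for tb in (10, 25, 50, 100, 200):
            |         print(tb, round(tape(tb)), round(hdd(tb)))
            | 
            |     # Crosses at ~100TB with these prices; a cheaper used
            |     # drive or pricier HDDs pulls it toward ~50TB.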
        
             | klodolph wrote:
             | I can vouch for the 50TB figure, it's around there.
             | 
             | The amount of hassle depends on your workflow. If you
             | create a backup every day and then bring the media off-
             | site, tape is easier. Easy enough to put a tape in your
             | drive, make the backup, and eject. Tape is not sensitive to
                | shock and you can just chuck the tapes in your car or
             | shove them in your backpack.
        
               | bch wrote:
               | > Tape is not sensitive to shock and you can just chuck
               | the tapes in your car
               | 
               | Apocryphal story from university - somebody did this and
               | reckons electro-magnetic leakage from their heated seats
               | wrecked their info
        
               | klodolph wrote:
               | Modern media is much more resistant to this kind of
               | stuff.
        
             | dehrmann wrote:
             | What do you think the availability of LTO 6 drives will be
              | in 10 years? The major benefit of SATA, and even Blu-ray,
              | is that the interface and drive will likely still exist in
              | 10 years.
        
               | _jal wrote:
               | Given that you can buy LTO-1 (commercialized in 2000)
               | drives and tapes today, and given the size of the market,
               | I suspect they'll be around.
        
               | fl0wenol wrote:
               | I'm still able to interface with an LTO 1 tape drive.
               | It's all SCSI or SAS. Secondary markets like Ebay have
               | made this surprisingly affordable (used drive, unopened
               | older media).
               | 
               | LTO is nice in that they mandate backwards compatibility
               | by two revisions, which come out once every 3 years or
                | so. That gives you time to roll forward to new media on
                | a new drive without breaking the bank, and gives the
                | secondary market time to settle.
               | 
               | Adding: This was a deliberate decision by the LTO
               | Consortium; they wanted users to perceive LTO as the
               | safest option for data retention standards.
        
               | cptskippy wrote:
               | LTO 6 is like 10 years old, so the availability in 10
               | years will probably be limited. That being said, LTO 7
               | drives are able to read LTO 6 so that might increase your
               | chances.
        
           | kiririn wrote:
           | > SSD's are not as bad as they used to be
           | 
           | Those extra bits they squeeze into QLC etc literally do make
           | SSDs worse at power off retention
        
         | comboy wrote:
         | Why not b2 or glacier since you're encrypting anyway? If you
         | don't have that much data then maybe M-DISC?
         | 
          | Personally I think a safe is... unnecessary. What is it
          | protecting you from when your data is encrypted? If you put it
          | in a safe then you probably care enough about the data not to
          | keep it in a single location, no matter how secure it
          | seemingly is.
        
           | [deleted]
        
           | sigstoat wrote:
           | > What is it protecting you from when your data is encrypted?
           | 
           | various forms of physical damage, including fire and
           | accidental crushing
           | 
           | where do you think they ought to store their drives?
           | 
            | a little safe that will easily hold 100TB costs $50 and can
            | hold your passport and such too.
        
             | emidln wrote:
             | Ignoring for a moment how insecure most cheap locks are
             | (including locks on safes), little safes are rarely
             | effective vs a prybar + carrying them away to be cut into
             | at the attacker's leisure. Larger safes have some of the
             | same issues w.r.t. cutting, but you can make it less
             | convenient for an adversary to do it (and make them spend
             | more time where they might be caught).
        
               | rmorey wrote:
               | All true, but I think the threat model here really is
               | fire, flooding, etc
        
               | sigio wrote:
               | The $50 safes are not fire-rated... and hardly break-in
               | rated. For fire-safety you need something big, and mostly
               | heavy, which will be costly (shipping/moving it alone)
        
               | dharmab wrote:
               | Honeywell and First Alert sell small fire safes for
               | around $100 that actually hold up to fire and water
               | damage.
               | 
               | https://www.nytimes.com/wirecutter/reviews/best-
               | fireproof-do...
               | 
               | Break-ins are not in my threat model for a document safe.
               | If they were, I'd get a deposit box at a bank. I just
               | want some of my personal mementos and documents to
               | survive a fire.
        
         | parliament32 wrote:
         | It'd probably be cheaper to stick it in Glacier or GC Archive
         | ($0.0012/GB/month).
        
         | robotmay wrote:
         | What about tape? I suppose the cost of the drive is prohibitive
        | but I was under the impression that it was used for a lot of
        | long-term storage.
        
           | derekp7 wrote:
           | I would imagine previous generation tape drives (used) can be
           | economical. Just need to find a reliable place that handles
           | testing / refurbishing (cleaning, alignment, belts, etc) used
            | drives. Also, the other big item is needing the appropriate
            | controller and cabling.
        
             | kiririn wrote:
             | Tape drives are open about both their condition and the
             | condition of tapes. It's all there in the scsi log pages,
             | more detailed than SMART on hard drives.
             | 
             | Mechanically and electrically, everything is rated to last
             | several times longer than the head
             | 
             | In other words, you just need to buy two used drives (one
             | as spare) and verify they can write a full tape and their
             | head hours and other error counters are sane. There is no
             | reasonable need to refurbish a tape drive other than a head
             | replacement, which is easy to do at home but so expensive
             | (for older generations) that you might as well buy a new
             | drive. All the testing you could hope for is done in POST
             | and by LTT/equivalent (writing a tape and reading logs is
             | good enough)
        
             | klodolph wrote:
             | You (more or less) just need a fiber channel card, they're
             | pretty mundane otherwise.
        
         | DanBC wrote:
         | Good quality DVD, in tyvek sleeves, with copious amounts of
         | PAR2 data, in multiple places.
        
           | Hamuko wrote:
           | Why tyvek sleeves in particular?
        
             | DanBC wrote:
             | It's easier to find tyvek sleeves that are sold as being
             | suitable for archive purposes.
        
         | theandrewbailey wrote:
         | How long is 'a few years'? Controlled environments shouldn't be
         | necessary for unplugged drives, just keep them at or slightly
         | below room temperature.
         | 
          | I've had three external hard drives for 7 years, and none have
          | stopped working. I keep one with me, and keep two somewhere
          | else (office, family). I connect one for a few hours every
          | week/month to update it, then leave it alone until needed, or
          | rotate it with one elsewhere.
        
           | Merman_Mike wrote:
           | I'd want to verify the existing data and maybe add some data
           | once a year or less.
        
         | mrkurt wrote:
         | Probably writable blu ray.
        
           | xellisx wrote:
           | Check out M Disc: https://mdisc.com/
        
             | Hamuko wrote:
             | AFAIK, M Disc really only matters for DVDs due to their
             | organic materials. (Non-LTH) BDs on the other hand have
             | inorganic materials and last pretty well.
             | 
             | I think there was a French study that compared DVDs, M
             | Discs and BDs and the HTL BDs fared very well. Can't find
             | the document though.
        
               | [deleted]
        
               | smarx007 wrote:
               | I don't think MDisks were compared if that's the study.
               | 
               | https://club.myce.com/t/french-study-on-bd-r-for-
               | archival/30...
               | 
               | https://francearchives.fr/file/5f281a39048987dcef88202816
               | a5c...
        
               | Hamuko wrote:
               | I think it was a separate one.
        
             | kamranjon wrote:
             | this is the weirdest website
        
         | toomuchtodo wrote:
         | LTO tape (specifically that which is rated for 15-30 years of
         | archival storage) with the drive. The tape is usually rated for
         | a couple hundred full passes, which should more than meet your
         | needs if you're writing once and sticking them somewhere safe.
         | 
         | SSDs don't have this archival longevity yet, and hard drives
         | are better when powered up and the data is always hot for
         | scrubbing and migrating when indicators of drive failure
         | present.
        
           | einpoklum wrote:
           | Don't LTO tape drives cost about a zillion dollars each?
        
             | toomuchtodo wrote:
             | I recommend acquiring them second hand (but validated) for
             | personal use.
        
         | Hamuko wrote:
          | I've had the impression that just having an HDD sit around
          | doesn't do it any good and it might just fail the next time
          | you plug it in.
        
       | tyingq wrote:
       | I'm always excited for this yearly post. Are there other vendors
       | that provide this kind of insightful info for other types of
       | infrastructure?
       | 
       | Also, kudos to Backblaze. I'm sure there's some side brand
       | benefit to all the work that goes into making this public, but
       | it's clear it's mostly just altruism.
        
       | louwrentius wrote:
       | I wonder why Western Digital is almost absent, does anyone know
       | why?
        
         | atYevP wrote:
         | Yev from Backblaze here -> We just started deploying more of
         | them! We added almost 6k WDC drives in Q4 - so we're getting
         | more of them in the fleet! They have a pretty low AFR - but
         | haven't been deployed for too long, so they'll be interesting
         | to follow!
        
           | hinkley wrote:
           | Is your drive usage really homogenous or do you have
           | situations where slightly more reliable drives are
           | prioritized? Like say for servers or logging.
        
             | atYevP wrote:
             | We do have some drives that we use as boot drives and for
             | logging. We write about them a bit in the post - they're
             | primarily SSDs, so not included in the overall mix of data
             | drives!
        
           | louwrentius wrote:
           | Thanks for sharing!
        
         | brianwski wrote:
         | Disclaimer: I work at Backblaze.
         | 
         | > I wonder why Western Digital is almost absent, does anyone
         | know why?
         | 
         | Most of the time the answer comes down to price/GByte. But it
         | isn't QUITE as simple as that.
         | 
         | Backblaze tries to optimize for total cost most of the time.
          | That isn't just the cost of the drive: a drive that is twice
          | as large in storage still takes the identical amount of rack
          | space, and often the same electricity, as a drive with half
          | the storage. This means that we have a spreadsheet and
         | calculate what the total cost over a 5 year expected lifespan
         | will turn out to be. So for example, even if the drive that is
         | twice as large costs MORE than twice as much it can still make
         | sense to purchase it.
         | 
         | As to failure rates, Backblaze essentially doesn't care what
         | the failure rate of a drive is, other than to factor that into
         | the spreadsheet. If we think one particular drive fails 2% more
         | of the time, we still buy it if it is 2% cheaper, make sense?
         | 
         | So that's the answer most of the time, although Backblaze is
         | always making sure we have alternatives, so we're willing to
         | purchase a small number of pretty much anybody's drives of
         | pretty much any size in order to "qualify" them. It means we
         | run one pod of 60 of them for a month or two, then we run a
         | full vault of 1,200 of that drive type for a month or two, just
         | in case a good deal floats by where we can buy a few thousand
         | of that type of drive. We have some confidence they will work.
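          | 
          | A toy version of that spreadsheet (numbers invented purely
          | for illustration, not our actual costs):
          | 
          |     YEARS = 5
          |     SLOT_PER_YEAR = 40.0    # rack space / chassis share
          |     POWER_PER_YEAR = 10.0   # roughly size-independent
          | 
          |     def cost_per_tb(drive_price, tb):
          |         fixed = YEARS * (SLOT_PER_YEAR + POWER_PER_YEAR)
          |         return (drive_price + fixed) / tb
          | 
          |     print(cost_per_tb(160, 8))    # 8TB  -> ~51.2 $/TB
          |     print(cost_per_tb(360, 16))   # 16TB -> ~38.1 $/TB
          |     # The bigger drive costs more than 2x as much up front
          |     # and still wins once rack space and power are included.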
        
           | louwrentius wrote:
           | Thank you for this very elaborate and detailed answer!
           | 
           | Disclaimer: (I'm a customer)
        
           | Hamuko wrote:
           | > _As to failure rates, Backblaze essentially doesn 't care
           | what the failure rate of a drive is, other than to factor
           | that into the spreadsheet._
           | 
           | Guessing shit like the ST3000DM001 is a whole different thing
           | entirely.
        
             | brianwski wrote:
             | > Guessing shit like the ST3000DM001 is a whole different
             | thing entirely.
             | 
             | :-) Yeah, there are times where the failure rate can rise
             | so high it threatens the data durability. The WORST is when
             | failures are time correlated. Let's say the same one
             | capacitor dies on a particular model of drive after
             | precisely 6 months of being powered up. So everything is
             | all calm and happy and smooth in operations, and then our
             | world starts going sideways 1,200 drives at a time (one
             | "vault" - our minimum unit of deployment).
             | 
             | Internally we've talked some about staggering drive models
             | and drive ages to make these moments less impactful. But at
             | any one moment one drive model usually stands out at a good
             | price point, and buying in bulk we get a little discount,
             | so this hasn't come to be.
        
               | benlivengood wrote:
               | > Internally we've talked some about staggering drive
               | models and drive ages to make these moments less
               | impactful. But at any one moment one drive model usually
               | stands out at a good price point, and buying in bulk we
               | get a little discount, so this hasn't come to be.
               | 
               | I don't know what your software architecture looks like
               | right now (after reading the 2019 Vault post) but at some
               | point it probably makes sense to move file shard location
               | to a metadata layer to support more flexible layouts to
               | work around failure domains (age, manufacturer, network
               | switch, rack, power bus, physical location, etc.), reduce
               | hotspot disks, and allow flexible hardware maintenance.
               | Durability and reliability can be improved with two
               | levels of RS codes as well; low level (M of N) codes for
               | bit rot and failed drives and a higher level of (M2 of
               | N2) codes across failure domains. It costs the same
               | (N/M)*(N2/M2) storage as a larger (M*M2 of N*N2) code but
               | you can use faster codes and larger N on the (N,M) layer
               | (e.g. sse-accelerated RAID6) and slower, larger codes
               | across transient failure domains under the assumption
               | that you'll rarely need to reconstruct from the top-level
               | parity, and any 2nd-level shards that do need to be
               | reconstructed will be using data from a much larger
               | number of drives than N2 to reduce hotspots. This also
               | lets you rewrite lost shards immediately without physical
               | drive replacement which reduces the number of parities
               | required for a given durability level.
               | 
               | This paper does something similar with product codes:
               | http://pages.cs.wisc.edu/~msaxena/new/papers/hacfs-
               | fast15.pd...
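                | 
                | A quick numeric check of the overhead claim, with
                | parameters picked only for illustration:
                | 
                |     M, N = 17, 20    # inner: any 17 of 20 shards
                |     M2, N2 = 4, 5    # outer: any 4 of 5 groups
                | 
                |     print((N / M) * (N2 / M2))     # stacked codes
                |     print((N * N2) / (M * M2))     # flat 68-of-100
                |     # both print ~1.47x raw storage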
        
           | hinkley wrote:
            | Is it safe to say that Backblaze essentially has an O(log n)
            | algorithm for labor due to drive installation and
            | maintenance, so up-front costs and opportunity costs due to
            | capacity weigh heavier in the equation?
           | 
           | The rest of us don't have that, so a single disk loss can
           | ruin a whole Saturday. Which is why we appreciate that you
           | guys post the numbers as a public service/goodwill generator.
        
             | brianwski wrote:
             | > algorithm for labor due to drive installation and
             | maintenance ... the rest of us don't have that so a single
             | disk loss can ruin Saturday
             | 
             | TOTALLY true. We staff our datacenters with our own
             | datacenter technicians (Backblaze employees) 7 days a week.
             | When they arrive in the morning the first thing they do is
             | replace any drives that failed during the night. The last
             | thing they do before going home is replacing the drives
             | that failed during the day so the fleet is "whole".
             | 
             | Backblaze currently runs at 17 + 3. 17 data drives with 3
             | calculated parity drives, so we can lose ANY THREE drives
             | out of a "tome" of 20 drives. Each of the 20 drives in one
             | tome is in a different rack in the datacenter. You can read
             | a little more about that in this blog post:
             | https://www.backblaze.com/blog/vault-cloud-storage-
             | architect...
             | 
             | So if 1 drive fails at night in one 20 drive tome we don't
             | wake anybody up, and it's business as usual. That's totally
             | normal, and the drive is replaced at around 8am. However,
             | if 2 drives fail in one tome pagers start going off and
             | employees wake up and start driving towards the datacenter
             | to replace the drives. With 2 drives down we ALSO
             | automatically stop writing new data to that particular tome
             | (but customers can still read files from that tome),
              | because we have noticed that less drive activity can lower
              | failure rates. In the VERY unusual situation that 3 drives
             | are down in one tome every single tech ops and datacenter
             | tech and engineer at Backblaze is awake and working on THAT
             | problem until the tome comes back from the brink. We do NOT
             | like being in that position. In that situation we turn off
             | all "cleanup jobs" on that vault to lighten load. The
             | cleanup jobs are the things that are running around
             | deleting files that customers no longer need, like if they
             | age out due to lifecycle rules, etc.
             | 
             | The only exceptions to our datacenters having dedicated
             | staff working 7 days a week are if a particular datacenter
             | is small or just coming online. In that case we lean on
             | "remote hands" to replace drives on weekends. That's more
             | expensive per drive, but it isn't worth employing
             | datacenter technicians that are just hanging out all day
             | Saturday and Sunday bored out of their minds - instead we
             | just pay the bill for remote hands.
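              | 
              | As a rough feasibility sketch (assumed numbers, not our
              | actual durability model): how likely is a tome that just
              | lost one drive to lose three more before it's repaired?
              | 
              |     from math import comb
              | 
              |     afr = 0.01        # ~1% annualized failure rate
              |     repair_days = 3   # assumed rebuild window
              |     p = afr * repair_days / 365
              |     n = 19            # drives left in the tome
              | 
              |     p3 = sum(comb(n, k) * p**k * (1 - p)**(n - k)
              |              for k in range(3, n + 1))
              |     print(p3)         # ~5e-10 per incident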
        
               | quyse wrote:
               | Is it actually required to have employees wake up and
               | replace that specific failed drive to restore full
               | capacity of a tome? I would expect an automatic process -
                | disable and remove the failed drive completely from the
                | tome, add SOME free reserve drive in SOME rack in the
                | datacenter to the tome, and start populating it
                | immediately. The originally failed drive can then be
                | replaced afterwards without hurry.
        
               | hinkley wrote:
               | > because we have notice less drive activity can lighten
               | failure rates
               | 
               | It's a rite of passage to experience a second drive
               | failure during RAID rebuild/ZFS resilvering.
               | 
               | I got to experience this when I built a Synology box
               | using drives I had around and ordering new ones.
               | 
               | One of the old drives ate itself and I had to start over.
               | Then I did the math on how long the last drive was going
               | to take, realized that since it was only 5% full it was
               | going to be faster to kill the array and start over a 3rd
               | time. Plus less wear and tear on the drives.
        
               | [deleted]
        
         | Hamuko wrote:
         | Aren't all of the HGSTs essentially WDs?
        
           | louwrentius wrote:
           | Yes, but does that explain why there are almost no WD branded
           | drives?
        
           | numpad0 wrote:
            | IIRC, WD > 6TB are all 7K2 and HGST. HGST itself is "a WD
            | company" but for antitrust reasons the corporate entities
            | are separate.
        
             | syshum wrote:
              | HGST as a brand was phased out years ago, and the company
              | was not kept separate for antitrust reasons.
              | 
              | WD had to sell and license some HGST assets for antitrust
              | reasons, but there was never a requirement for them to be
              | a separate company.
              | 
              | HGST websites and other things today just redirect to WD;
              | there are no new HGST-branded things being made as far as
              | I am aware.
        
       | wellthisisgreat wrote:
       | Since this is a thread about HDDs, can someone recommend a quiet
       | HDD? Or is it a pipe dream and one should stick to SSDs?
        
         | glenneroo wrote:
         | Since moving to Fractal Design Define (i.e. soundproofed)
         | version 4/5/6/7 cases over the past ~10 years, the only noise I
         | ever hear anymore is from the fans, primarily GPU and case fans
         | when running heavier jobs. In my main system I have 6 spinning
         | rust drives (8-16TB from various manufacturers) running 24/7
         | and I never hear them, even during heavy read/writes... and I
         | often sleep on the couch nearby ;)
        
         | lostlogin wrote:
          | I can do the opposite - 16TB Seagate Exos drives are very
          | loud. Great drives, but horrible noise.
          | 
          | If I were you I'd be looking at slower ones, like 5400 rpm WD
          | drives.
        
       | jedberg wrote:
       | > The AFR for 2020 dropped below 1% down to 0.93%. In 2019, it
       | stood at 1.89%. That's over a 50% drop year over year... In other
       | words, whether a drive was old or new, or big or small, they
       | performed well in our environment in 2020.
       | 
       | If every drive type, new and old, big and small, did better this
       | year, maybe they changed something in their environment this
       | year? Better cooling, different access patterns, etc.
       | 
       | If this change doesn't have an obvious root cause, I'd be
       | interested in finding out what it is if I were Backblaze. It
       | could be something they could optimize around even more.
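        | 
        | For reference, the AFR they report is annualized from drive
        | days rather than being a simple share of the fleet; a sketch of
        | the calculation with illustrative inputs:
        | 
        |     def afr(drive_days, failures):
        |         return failures / (drive_days / 365) * 100
        | 
        |     print(afr(50_000_000, 1_274))   # ~0.93%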
        
         | cm2187 wrote:
         | Or perhaps as time passes, a greater portion of their storage
         | is rarely accessed archives so more disks in % are sitting
         | doing nothing.
        
           | codezero wrote:
           | I'm just assuming that folks doing archival storage aren't
            | using these kinds of spinning disks as it would be super
           | expensive compared to other mediums, right?
           | 
           | I do think access patterns in general should contribute to
           | the numbers so that kind of thing can be determined.
        
             | freeone3000 wrote:
             | Compared to what, exactly? Tape is cheaper per GB, but the
             | drives and libraries tip that over the other way. Blu-Ray
             | discs are now more expensive per GB than hard drives,
             | thanks to SMR and He offerings.
             | 
             | Also note that Backblaze does _backups_ -- by definition,
             | these are infrequently accessed, usually write-once-read-
                | never. I've personally been a customer for three years and
             | performed a restore exactly once.
        
               | guenthert wrote:
               | Despite claims to the contrary, tape isn't dead just yet.
               | They are still _considerably_ cheaper than drives. An
               | LTO-8 tape (12TB uncompressed capacity) can be had for
               | about $100, while a 12TB HDD goes for some $300. Tape
                | drives/libraries are quite expensive though, but that
                | just shifts the break-even point out. For the largest
                | sites, it's still economical. Not sure if Backblaze is
                | big enough (I'm sure they did their numbers).
                | Backglacier, anyone?
        
               | birdman3131 wrote:
               | I bought a pair of 12tb drives for $199 the other day and
               | they often go cheaper. Now admittedly if you shuck
                | externals you lose the warranty but we are keeping them in
               | the enclosures as these are for backups and thus the ease
               | of taking them off site is great for us.
        
               | R0b0t1 wrote:
               | They claim you lose the warranty but are wrong, they
               | still have to prove you damaged it. Federal law: https://
               | en.wikipedia.org/wiki/Magnuson%E2%80%93Moss_Warranty...
        
               | StillBored wrote:
               | And a number of the library vendors' libraries last for
               | decades with only drive/tape swaps along the way. The
               | SL8500 is on its second decade of sales for example.
               | Usually what kills them is the vendor deciding not to
               | release firmware updates to support the newer drives. The
               | stock half inch cartridge form factor dates from 1984
                | with DLT & 3480. Given that there have been libraries
                | with grippers capable of moving a wide assortment of
                | DLT/LTO/TXX/etc cartridges at the same time, it's
                | doubtful that will change anytime in the future. So if
                | you buy
               | one of the big libraries today it will likely last
               | another decade or two, maybe three. There aren't many
               | pieces of IT technology you can utilize that long.
        
               | codezero wrote:
               | I was specifically thinking of the SKUs - I assumed they
               | were using faster disks rather than high volume disks
               | that make trade-offs for costs. Just assumptions on my
               | part - and I am mostly curious for more data, but given
               | the historical trends, I'm not terribly suspicious of the
               | actual results here.
        
               | thebean11 wrote:
               | Backblaze has backup and raw storage S3 type services.
               | I'm not sure what uses the majority of their disk space.
        
               | StillBored wrote:
               | Drive enclosures, raid/etc interfaces, and motherboards
               | burning electricity make it a lot more complex than raw
               | HD's vs raw tape. Tape libraries cost a fortune, but so
               | do 10+ racks of cases+power supplies+servers needed to
               | maintain the disks of equal capacity.
               | 
               | Tape suffers from "enterprise" which means the major
               | vendors price it so that its just a bit cheaper than
               | disk, and they lower their prices to keep that equation
               | balanced because fundamentally coated mylar/etc wrapped
               | around a spindle in an injection molded case is super
               | cheap.
        
             | cm2187 wrote:
             | But even if it is hot storage, do you touch all your files
              | every day? You are bound to accumulate, over time, more
              | and more files that you never access.
        
               | codezero wrote:
                | Yeah, hard to say. I assume the diversity of their
                | customers and a normal distribution should average those
                | patterns out, but I have no clue :)
                | 
                | I also wonder if failures are related to physical
                | location on disk vs other things like controller or
                | mechanical failures.
                | 
                | You may not be reading old files but you're reading
                | files, or not, so it still also depends on overall
                | utilization.
        
           | hinkley wrote:
           | If they've hit on a different access pattern that is more
           | gentle, that might be something useful for posterity and I
           | hope they dig into that possibility.
           | 
           | There's also just the possibility that failure rates are
           | bimodal and so they've hit the valley of stability.
           | 
           | Are they tracking wall clock time or activity for their
           | failure data?
        
             | brianwski wrote:
             | Disclaimer: I work at Backblaze.
             | 
             | > If they've hit on a different access pattern that is more
             | gentle, that might be something useful for posterity and I
             | hope they dig into that possibility.
             | 
             | Internally at Backblaze, we're WAY more likely to be
             | spending time trying to figure out why drives (or something
             | else like power supplies or power strips or any other pain
             | point) is failing at a higher rate, than looking into why
             | something is going well. I'm totally serious, if something
             | is going "the same as always or getting better" it just
             | isn't going to get much of any attention.
             | 
             | You have to understand that with these stats, we're just
             | reporting on what happened in our datacenter - the outcome
             | of our operations. We don't really have much time to do
             | more research and there isn't much more info than what you
             | have. And if we stumbled upon something useful we would
             | most likely blog about it. :-)
             | 
             | So we read all of YOUR comments looking for the insightful
             | gems. We're all in this together desperate for the same
             | information.
        
               | hinkley wrote:
               | Seems to me that every drive failure causes read/write
               | amplification, so a small decrease in failure rates would
               | compound. Have you folks done any other work to reduce
               | write amplification this year?
        
           | ddorian43 wrote:
           | The bottleneck in HDD in this scenario is bandwidth. What you
            | do is split & spread files as much as possible, so your HDDs
           | are all serving the same amount of bandwidth. A disk doing
           | nothing is wasted potential bandwidth (unless it's turned
           | off).
        
             | cm2187 wrote:
             | But do they actively move around files to spread bandwidth
             | after the initial write? If they don't, and if I am right
             | that older files tend to be rarely accessed, I would expect
             | entire disks to become unaccessed over time.
        
               | jeffbee wrote:
               | If they allow that to happen, they are leaving a ton of
               | money on the table. It's typical in the industry to move
               | hot and cold files around to take advantage of the IOPS
               | you already paid for. See, for example, pages 22-23 of
               | http://www.pdsw.org/pdsw-discs17/slides/PDSW-DISCS-
               | Google-Ke...
        
         | einpoklum wrote:
         | > If every drive type, new and old, big and small, did better
         | this year, maybe they changed something in their environment
         | this year?
         | 
         | It can also be the case that newer drives this year are better
         | than newer drives last year, while older drives are over a
         | "hill" in the failure statistics, e.g. it could be the case
         | that there are more 1st-year failures than 2nd-year failures
         | (for a fixed number of drives starting the year).
        
         | numpad0 wrote:
          | What about air quality? There are actually air filters, in the
          | form of a patch or a packet similar to miniature mustard
          | packets, through which drives breathe. Supposedly those are
          | super fine filters but toxic gas molecules might still pass
          | through them.
        
           | brianwski wrote:
           | > There are actually air filters ... through which drives
           | breathe
           | 
           | Although the helium drives are more sealed up, which also
           | might be a factor?
        
         | codezero wrote:
         | I was wondering if the state of the world in 2020 might have
         | dramatically changed their business / throughput / access
         | patterns in a meaningful enough way to cause this dip.
         | 
         | I'm not sure if they have a measure of the disk utilization or
         | read/write load along with the failure rate.
        
           | brianwski wrote:
           | Disclaimer: I work at Backblaze, but mostly on the client
           | that runs on desktops and laptops.
           | 
           | > I'm not sure if Backblaze has a measure of the disk
           | utilization or read/write load along with the failure rate.
           | 
           | We publish the complete hard drive SMART stats for anybody to
            | attempt these analyses. Most of us in Backblaze engineering
           | get giddy with excitement when a new article comes out that
           | looks at correlating SMART stats and failures. :-) For
           | example, this article circulated widely at Backblaze a few
           | days ago: https://datto.engineering/post/predicting-hard-
           | drive-failure...
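            | 
            | If anyone wants to poke at it, a minimal pandas sketch
            | against one day's CSV from the public dataset (filename
            | hypothetical; columns are roughly date, serial_number,
            | model, capacity_bytes, failure, plus smart_*_raw and
            | smart_*_normalized):
            | 
            |     import pandas as pd
            | 
            |     df = pd.read_csv("2020-12-31.csv")
            | 
            |     # Failure rate with vs. without reallocated sectors
            |     # (SMART 5), a classic predictor:
            |     flag = df["smart_5_raw"].fillna(0) > 0
            |     print(df.groupby(flag)["failure"].mean())
            | 
            |     # Failures and fleet size per model for that day:
            |     print(df.groupby("model")["failure"]
            |             .agg(["sum", "count"])
            |             .sort_values("count", ascending=False)
            |             .head(10))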
        
             | codezero wrote:
             | I anticipate these reports every year and have strong trust
             | in the data - I want to make that clear - Backblaze has
             | done a massive service to the entire industry by collecting
             | and aggregating this kind of data.
             | 
             | I'm really super curious about the dip in errors over the
             | past year :)
        
               | rootusrootus wrote:
               | Whether intentional or not, it's also great word-of-mouth
               | advertising. My preexisting experience with Backblaze's
               | hard drive stats reporting definitely worked positively
               | in their favor when I was looking for a new backup
               | service.
        
             | willis936 wrote:
             | There are other interesting factors to look for as well.
             | Temperature, moisture, electrical noise on the power rails,
             | infrasound, etc.
        
             | matmatmatmat wrote:
             | Hi Brian, just a note of thanks to you and Backblaze for
             | publishing these data. I always refer to them before a
             | purchase and they're really helpful.
        
         | jeffbee wrote:
          | That seems like a misleading aggregation. Their total AFR
          | could have been affected just by a mix shift from early death
          | to mid-life. It looks that way to me from their tables.
        
         | alinspired wrote:
          | Perhaps the margin of error should be raised to accommodate
          | this change of about 1%, although the set of drives under test
          | is likely not the same between years.
        
         | andruby wrote:
          | I guess these Hard Drive Stats posts cover disks used for
          | their B2 service as well? Maybe the service mix is changing (a
          | larger percentage being used for B2 versus their traditional
          | backup service).
          | 
          | I'm not sure how a more B2-like access pattern would improve
          | the stat, though.
        
           | brianwski wrote:
           | Disclaimer: I work at Backblaze.
           | 
           | > I guess these Hard Drive Stats post cover disks used for
           | their B2 service as well?
           | 
           | Yes. The storage layer is storing both Backblaze Personal
           | Backup files and B2 files. It's COMPLETELY interleaved, every
           | other file might be one or the other. Same storage. And we
           | are reporting the failure rates of drives in that storage
           | layer.
           | 
           | We THINK (but don't know for certain) that the access
           | patterns are relatively similar. For example, many of the 3rd
           | party integrations that store files in B2 are backup programs
           | and those will definitely have similar access patterns.
           | However, B2 is used in some profoundly different
           | applications, like the origin store for a Cloudflare fronted
           | website. So that implies more "reads" than the average
           | backup, and that could be changing the profile over time as
           | that part of our business grows.
        
       | thom wrote:
       | I have always loved these posts, they paint a picture of smart
       | people with good processes in place. It's confidence-building.
       | Unfortunately we were evaluating Backblaze around the time they
       | went down for a weekend and didn't even update their status page,
       | which was a bit of a blow to that confidence.
        
         | atYevP wrote:
         | Yev from Backblaze here -> sorry about that Thom - it's one of
         | the things we're definitely working on hammering down. Right
          | now we're growing our engineering and infrastructure teams
          | quite a bit, and one of the projects we'd like to see
         | is more automated status updates. We typically will throw
         | updates onto Twitter or our blog if it's a large outage - or
         | affecting many different people, but totally recognize that
         | process can use some sprucing.
        
         | csnover wrote:
         | Their marketing blog does a great job of painting them as smart
         | people with good processes. Sadly, I learned the hard way (by
         | trialling their software and immediately discovering a handful
         | of dumb bugs that should've been caught by QA, plus serious
         | security problems[0], and some OSS licence violations[1]) that
         | it seems to not actually be the case. This situation where they
         | continue to pump out blog posts about hard drive stats, yet
         | don't even _have_ a status page for reporting outages, is
         | another example of their marketing-driven approach to
         | development.
         | 
         | I have mentioned this on HN a couple times now[2][3], including
         | again just yesterday. I really dislike doing this because I
         | feel like I am piling on--but as much as I hate it, I feel even
         | more strongly that people deserve to be informed about these
         | serious ongoing failures at Backblaze so that they can make
         | more informed choices about what storage/backup provider to
         | use. I also genuinely hope that I can incentivise them to
         | actually start following software development best practices,
         | since they do provide a valuable service and I'd like them to
         | succeed. If they keep doing what they're doing now, I
         | absolutely expect to see a massive data breach and/or data loss
         | event at some point, since they are clearly unwilling or unable
         | to properly handle security or user privacy today--and I've
         | gotten the impression over time that some prominent people at
         | the company think these criticisms are invalid and they need
         | not make any substantive changes to their processes.
         | 
         | [0] https://twitter.com/zetafleet/status/1304664097989054464
         | 
         | [1] https://twitter.com/bagder/status/1215311286814281728
         | 
         | [2] https://news.ycombinator.com/item?id=25899802
         | 
         | [3] https://news.ycombinator.com/item?id=24839757
        
           | brianwski wrote:
           | Disclaimer: I work at Backblaze.
           | 
           | > as much as I hate it I'm following Backblaze around and
           | posting incorrect information about them
           | 
           | I get the impression Backblaze did something to upset you.
           | Can you let me know what it is so I can try to fix it?
           | 
           | If there wasn't a pandemic on I would invite you to come to
           | our office and I could buy you lunch and I could try to make
           | up for whatever we did to upset you.
        
           | atYevP wrote:
           | Yev from Backblaze here -> rest assured that we do read what
           | you're writing on these posts and they've spurred some
           | internal process discussions. I believe the bugs you
           | mentioned were cleared/fixed with version 7.0.0.439 which was
           | released in Q1 of 2020. We did leave HackerOne and switched
           | over to BugCrowd to handle our bug program. It's private at
           | the moment, but easy enough to get invited (by emailing
           | bounty@backblaze.com). While we spin that program up (it's a
           | new vendor for us) we may stay private, but hopefully that's
           | not a permanent state.
           | 
           | Edit -> I just noticed the Daniel Stenberg libcurl citation.
            | Oof, yeah, that was certainly a whiff on our end. Luckily we
           | were able to make up for it (he has a write-up here:
           | https://daniel.haxx.se/blog/2020/01/14/backblazed/).
        
         | andruby wrote:
         | I hadn't heard about this downtime. When did that happen?
        
           | thom wrote:
           | This was the HN thread:
           | 
           | https://news.ycombinator.com/item?id=25147951
           | 
           | And I must apologise, I was wrong when I said they didn't
           | update their status page during the outage. They don't have a
           | status page.
        
       | i5heu wrote:
       | It is always amazing to me how cheap, small and reliable storage
       | has become.
        
         | ramraj07 wrote:
         | Au contraire I feel like drive reliability has gone _down_...
         | Especially for consumers - the big difference between
          | Backblaze and regular users is that they have their disks
          | spinning continuously, and the reliability numbers seem to
          | only apply in that scenario. If you switch off and store a
          | drive, my experience is that after a year there's a very high
          | probability it won't spin up again. This is a big problem in
          | academic labs
         | where grad students generate terabytes of data and professors
         | and departments are too stingy to provide managed storage
         | services in that scale so it all sits in degrading drives in
         | some drawer in the lab.
        
           | mmsimanga wrote:
           | This. I bought a hard drive docking station and the idea was
           | to go through all my 6 hard drives from the past 10 years
           | which I haven't used. Only the laptop drives worked.
        
             | guenthert wrote:
             | Is the docking station supplied with enough current? 3.5"
             | drives tend to take considerably more power (and use 12V
             | for their motor) particularly when spinning up. I'd give
             | those drives another try in a different docking station or
             | connect them directly to a PC, RAID or NAS device.
        
           | lostlogin wrote:
           | My experience of data storage among academics has been
           | disturbing. Masses and masses of work is stored on USB sticks
           | and laptops. Hundreds of hours of work, maybe even thousands
           | of hours and no backups. I've hit it multiple times and it
           | blows my mind each time.
           | 
           | Yes, buying a basic backup solution is going to set you back
           | a few hundred dollars minimum (or not, if you go for BB or
           | similar) but it seems like a basic minimum.
           | 
           | I don't know how you change the culture but it's bad among
            | those I have worked alongside.
           | 
           | I haven't bought large drives in years and recently started
           | doing so. I have been really impressed with how good they are
           | and how well they perform in an always-on NAS. I'm so
           | impressed with the Synology I got and can't speak highly
           | enough of it. I just wish I'd bought one with more bays.
        
           | omgwtfbyobbq wrote:
           | My experience has been the opposite. Every HDD drive I have
           | works. The new ones are fine, even if I let them sit for
           | months, as are the old ones after years.
        
       | patentatt wrote:
        | Apologies for being slightly off-topic, but presenting a table
        | of text as an image is annoying to me. A table of text ought to
        | be rendered in just plain old HTML in my old-school opinion.
        
         | qwertox wrote:
         | I agree.
         | 
         | I was reading the Q3 2020 stats yesterday because I'm looking
         | for a new drive.
         | 
         | It was somewhat annoying to have to type the HDD-model into the
         | Google search bar instead of just double-clicking and selecting
         | search from the context menu. It irritated me that it was an
         | image.
        
         | andy4blaze wrote:
         | Andy from Backblaze here. Actually you can download a
         | spreadsheet with all the data from the tables. There's a link
         | at the end of the post. Better than parsing HTML for the data.
        
           | JxLS-cpgbe0 wrote:
           | The spreadsheets are identical to the screenshots you took of
           | them. The images aren't responsive, aren't available to
           | assistive technologies, provide no text alternative, do not
           | respect a user's text settings, cannot be translated etc. Why
           | is that better than HTML?
        
       ___________________________________________________________________
       (page generated 2021-01-26 23:01 UTC)