[HN Gopher] Backblaze Hard Drive Stats for 2020
___________________________________________________________________
Backblaze Hard Drive Stats for 2020
Author : TangerineDream
Score : 279 points
Date : 2021-01-26 16:12 UTC (6 hours ago)
(HTM) web link (www.backblaze.com)
(TXT) w3m dump (www.backblaze.com)
| peter_d_sherman wrote:
| This brings up a good related question, which is:
|
| _What is the most reliable hard drive of all time, and why?_
|
| In other words, let's say I didn't care about capacity, and I
| didn't care about access time or data transfer rate, and I'd like
| a drive to store data 100 or 200+ years into the future with a
| reasonable chance of getting that data there -- then what kind of
| hard drive should I choose, and why?
|
| It's a purely philosophical question...
|
| Perhaps I, or someone else, should ask this question on Ask HN...
| I think the responses would be interesting...
| lousken wrote:
| With such high-capacity drives I wonder if they changed anything
| in terms of redundancy - something like ZFS resilvering probably
| takes days on these
| andy4blaze wrote:
| Andy from Backblaze here. Larger drives do take longer to
| rebuild, but to date we haven't changed the encoding algorithms
| we built. There are other strategies like cloning which can
| reduce rebuild time. We can also prioritize rebuilds or drop a
| drive into read-only mode as needed. The system was built
| expecting drive failures.
| _nickwhite wrote:
| If a Backblaze engineer (or Andy) is reading this, could you
| comment on environmental temps and vibration you guys keep the
| disks at? Thanks.
| andy4blaze wrote:
| Andy at Backblaze here. All the drives are in data centers with
| temps around the 75-78 degree Fahrenheit mark. Vibrations are
| kept to a minimum via the chassis design. We publish the data,
| including the SMART stats for all of the drives, and there are
| attributes for temperature (SMART 194) and vibration (multiple);
| see https://en.wikipedia.org/wiki/S.M.A.R.T. for more info on
| SMART attributes.
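|
| (For anyone who wants to poke at that themselves, a minimal
| sketch of pulling temperature out of the published daily
| drive-stats CSVs -- the file name and the model / smart_194_raw
| column names below are assumed from the public dataset's usual
| layout:)
|
|     import pandas as pd
|
|     # One day of the published drive-stats data (file name
|     # assumed for illustration).
|     df = pd.read_csv("2020-12-31.csv")
|
|     # SMART 194 raw is typically the drive temperature in
|     # degrees Celsius.
|     temps = (df.dropna(subset=["smart_194_raw"])
|                .groupby("model")["smart_194_raw"]
|                .agg(["mean", "max", "count"]))
|
|     print(temps.sort_values("mean"))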
| _nickwhite wrote:
| Thanks Andy! Without me digging through the SMART data, has
| there been any difference in data center temps over the years
| for Backblaze? I ask because I've personally seen disk life
| vary greatly between "warm" datacenters (closer to 79F) and
| "cool" datacenters (closer to 72F). I don't have a huge
| dataset, only anecdotal evidence, but it seems to me
| temperature plays a pretty big role in drive longevity. Have
| you guys found the same, or is this a variable not
| controllable by Backblaze?
| busterarm wrote:
| I know that they're not always the most available, but I find it
| interesting that they don't purchase _any_ Fujitsu drives.
|
| After an experiment with their helium-filled drives, I've gone
| all-in to great success.
| dsr_ wrote:
| Fujitsu sold their disk drive business to Toshiba in 2009,
| didn't they?
| numpad0 wrote:
| They still make drives!? I've seen relics like 147GB 15K SAS
| drives marked "FUJITSU LIMITED", but I don't think they make the
| 14TB 7K2 SATA drives that Backblaze uses.
| busterarm wrote:
| derp, see other reply.
| busterarm wrote:
| Stupid me, wrong company. I was thinking Toshiba.
|
| I bought Toshiba 16TB drives -- MG08ACA16TE
|
| And hey! 0.0% failure rate!
| zepearl wrote:
| Bought 8 Toshiba N300 (HDWN180, 8TB) consumer HDDs, maybe
| about one year ago, and they're all still working.
| numpad0 wrote:
| The stats look stable and consistent with common knowledge --
| HGST > WD >> Seagate, Toshiba inconclusive.
|
| Does anyone have anecdata on Toshiba MD/MG/MN 7K2 drives
| (excluding DT because those are HGST OEMs)? They are always less
| price competitive and thus always light on real-world stories,
| though they seem about as reliable as HGST.
| dbalan wrote:
| HGST is sadly just WD now.
| hinkley wrote:
| And Audi is just Volkswagen. Except it isn't (Audi tech shows
| up on VW when it reaches economies of scale, eg DSG gearboxes
| ~12 years ago)
|
| Is HGST still a separate department or "just" their luxury
| brand now?
| freeone3000 wrote:
| It's still a separate factory making separate drives. This
| line even uses a different storage controller. But this is
| also true for luxury ranges, in general, so you may be
| asking for too fine of a distinction. (Their usual luxury
| range is the WD Red, however.)
| robhu wrote:
| A luxury range where they occasionally sneak shingled
| drives in without telling you
| https://blocksandfiles.com/2020/04/14/wd-red-nas-drives-
| shin...
| hinkley wrote:
| Which is why I asked. That sounds like the sort of thing
| that happens when it's just a label instead of a
| division.
| ksec wrote:
| I don't think that is the case, as in they are not mixed up in
| production and sold under a different brand.
|
| They tried to get rid of the HGST brand but failed, and had to
| go back to HGST branding specifically for HDDs coming from the
| acquired HGST factory.
|
| i.e. AFAIK HGST is still HGST.
| nolok wrote:
| Following Backblaze's report on them a few years ago ("they
| appear to be great, but we buy in bulk and there is not enough
| volume of them for us"), I decided to use Toshiba drives
| almost exclusively, as a sort of fun experiment.
|
| Fewer than 400 drives deployed, used exclusively in NAS (RAID 1,
| RAID 10 and RAID 6) at companies with under 50 employees. They
| appear to be insanely reliable and high performers, to the
| point that the ~15% price premium for them in French stores
| seems well justified to me.
|
| Purely anecdotal results, of course.
| R0b0t1 wrote:
| Unsure about the >> Seagate in there, that info is over a
| decade old now(?). It's worth pointing out they have a 12%
| failure rate on their highest-density units, but the other ones
| seem to do well outside of a DC environment.
| neogodless wrote:
| > Over the last year or so, we moved from using hard drives to
| SSDs as boot drives. We have a little over 1,200 SSDs acting as
| boot drives today. We are validating the SMART and failure data
| we are collecting on these SSD boot drives. We'll keep you posted
| if we have anything worth publishing.
|
| Would love to see SSD stats like this in the future. Recently I
| was talking to some friends about what SSD to buy. I personally
| really like my HP EX950 - one friend said he'd never buy HP
| hardware. He said he was getting an Intel - I said I had an early
| Intel SSD fail on me, and I don't think QLC is the best option,
| but it is a nice value play. For performance, I do like Samsung,
| though they are expensive. Another friend said he'd never buy a
| Samsung SSD, as he had a reliability issue, and found lots of
| similar stories when he was researching it.
|
| Of course these are all anecdotes and they aren't useful in
| making an informed choice. I suspect most SSDs are reliable
| "enough" for most consumer use, and not nearly reliable enough
| for certain kinds of critical storage needs. But it would still
| be nice to see the big picture, and be able to factor that into
| your SSD purchase decisions.
| Merman_Mike wrote:
| I'm planning a few encrypted long-term backups (i.e. stick it in
| a temperature-controlled safe for a few years).
|
| What's the best medium for this? SSD or HDD?
| louwrentius wrote:
| Tape. But that's probably unrealistic.
|
| Otherwise HDD.
|
| Archiving is not a one-time thing, but a process.
|
| If you care about the data, you should periodically check the
| media and at some point replace the media as they age.
| stevezsa8 wrote:
| This is what I'd recommend.
|
| 1. Back up to an external SSD or NAS. This is the backup you will
| rely on if your PC loses all data. It will be fast to replicate
| to.
|
| 2. Mirror the external backup to a second external SSD. And
| sync it every week or month. Sync more often if your data is
| changing a lot.
|
| 3. The third layer is an external HDD mirror for the long term
| off-site backups. HDD are cheaper and more suited for being
| switched off long term.
|
| 4. If you can afford the expense of a fourth step, every year
| buy another external HDD and put the previous one aside as an
| archive to be brought into service if the current one fails.
|
| I recommend separating your data into some sort of hierarchy
| and choosing what needs to be backed up to what level. So if you
| have some software ISOs that you could repurchase/redownload,
| then have a separate drive for junk like that and don't have it
| go all the way through the backup steps listed above.
| Spooky23 wrote:
| Figure out what you really need and print it on good paper. Put
| that in a safe place, away from direct light and dampness.
|
| Save the rest on two of: Google Drive, OneDrive, iCloud, some
| other cloud storage, a backup service, or a copy on a computer
| in your home. Make your selection based on things that you will
| "touch" in some way at least every 12-24 months. Everything
| else will fail in a few years.
|
| Don't save crap you don't need. Don't futz around with optical
| media, tape or other nonsense. Don't buy safes or safe deposit
| boxes unless that's going to be part of your routine in some
| way.
| ghaff wrote:
| >Don't save crap you don't need.
|
| I tend to agree with this although it can be hard to
| determine what you won't want/need in advance and it probably
| takes at least some effort to winnow things down.
|
| That said, I'm in the middle of going through my photos right
| now and deleting a bunch of stuff. (Which is a big job.) It's
| not so much for the storage space as I'll "only" be deleting
| a few hundred GB. But it's a lot easier to look for stuff and
| manage it when you don't have reams of near-identical or just
| lousy pics. One of my takeaways from this exercise is that I
| should really be better at pruning when I ingest a new batch.
| WrtCdEvrydy wrote:
| I'd argue SSD here... those memory chips should be good for a
| few years.
| kiririn wrote:
| More like 1 year at best
|
| Modern SSDs not only sacrifice endurance and sustained
| performance, they also sacrifice power-off data retention.
| Aardwolf wrote:
| How long is a few years? What would be a good recommendation
| for decades? Time goes fast!
| ineedasername wrote:
| A few years isn't archival quality. An HDD will last longer
| and is cheaper, and speed is much less of an issue for a
| drive that will be written to and then chucked in a safe.
| cm2187 wrote:
| I'd suggest saving the encryption software along with the drive
| (unencrypted)!
|
| Sounds like a good fit for SMR archive HDD.
| theandrewbailey wrote:
| Separately storing the encryption software isn't needed if
| you use LUKS.
| Proven wrote:
| Amazon Glacier
| ineedasername wrote:
| SSDs are not as bad as they used to be, but still not rated
| for long-term unpowered storage. HDD would be better for that.
|
| But HDD isn't your only other option. How important is the
| data, how often will you need to access it, and will you need
| to rewrite to the storage medium? You might want to consider
| Blu-ray. Or both, stored in different locations. Also look into
| LTO tape drives. LTO 6 drives should be cheaper than 7/8
| (though still not cheap) and have a capacity around 6TB.
| gruez wrote:
| >Also look into LTO tape drives. LTO 6 drives should be
| cheaper than 7/8 (though still not cheap) and have a capacity
| around 6TB.
|
| AFAIK a post on /r/datahoarders says that the breakeven point
| for tapes vs shucked hard drives from a pure storage
| perspective is around 50TB. Given the hassle associated with
| dealing with tapes, it's probably only really worth it if you
| have 100+TB of data to store.
| klodolph wrote:
| I can vouch for the 50TB figure, it's around there.
|
| The amount of hassle depends on your workflow. If you
| create a backup every day and then bring the media off-
| site, tape is easier. Easy enough to put a tape in your
| drive, make the backup, and eject. Tape is not sensitive to
| shock and you can just chuck the tapes in your car or
| shove them in your backpack.
| bch wrote:
| > Tape is not sensitive to shock and you can just chuck
| the tapes in your car
|
| Apocryphal story from university - somebody did this and
| reckons electro-magnetic leakage from their heated seats
| wrecked their info
| klodolph wrote:
| Modern media is much more resistant to this kind of
| stuff.
| dehrmann wrote:
| What do you think the availability of LTO 6 drives will be
| in 10 years? The major benefit of SATA, and even Blu-ray, is
| that the interface and drive will likely still exist in 10
| years.
| _jal wrote:
| Given that you can buy LTO-1 (commercialized in 2000)
| drives and tapes today, and given the size of the market,
| I suspect they'll be around.
| fl0wenol wrote:
| I'm still able to interface with an LTO 1 tape drive.
| It's all SCSI or SAS. Secondary markets like Ebay have
| made this surprisingly affordable (used drive, unopened
| older media).
|
| LTO is nice in that they mandate backwards compatibility
| by two revisions, which come out once every 3 years or
| so. That gives you time to roll forward to new media
| onto a new drive without breaking the bank, and gives
| time for the secondary market to settle.
|
| Adding: This was a deliberate decision by the LTO
| Consortium; they wanted users to perceive LTO as the
| safest option for data retention standards.
| cptskippy wrote:
| LTO 6 is like 10 years old, so the availability in 10
| years will probably be limited. That being said, LTO 7
| drives are able to read LTO 6 so that might increase your
| chances.
| kiririn wrote:
| > SSD's are not as bad as they used to be
|
| Those extra bits they squeeze into QLC etc literally do make
| SSDs worse at power off retention
| comboy wrote:
| Why not b2 or glacier since you're encrypting anyway? If you
| don't have that much data then maybe M-DISC?
|
| Personally I think a safe is... unnecessary. What is it
| protecting you from when your data is encrypted? If you put it
| in a safe then you probably care enough about the data not to
| have it in a single location, no matter how secure it seemingly
| is.
| [deleted]
| sigstoat wrote:
| > What is it protecting you from when your data is encrypted?
|
| various forms of physical damage, including fire and
| accidental crushing
|
| where do you think they ought to store their drives?
|
| a little safe that will easily hold 100TB costs $50 and can
| hold your passport and such too.
| emidln wrote:
| Ignoring for a moment how insecure most cheap locks are
| (including locks on safes), little safes are rarely
| effective vs a prybar + carrying them away to be cut into
| at the attacker's leisure. Larger safes have some of the
| same issues w.r.t. cutting, but you can make it less
| convenient for an adversary to do it (and make them spend
| more time where they might be caught).
| rmorey wrote:
| All true, but I think the threat model here really is
| fire, flooding, etc
| sigio wrote:
| The $50 safes are not fire-rated... and hardly break-in
| rated. For fire-safety you need something big, and mostly
| heavy, which will be costly (shipping/moving it alone)
| dharmab wrote:
| Honeywell and First Alert sell small fire safes for
| around $100 that actually hold up to fire and water
| damage.
|
| https://www.nytimes.com/wirecutter/reviews/best-
| fireproof-do...
|
| Break-ins are not in my threat model for a document safe.
| If they were, I'd get a deposit box at a bank. I just
| want some of my personal mementos and documents to
| survive a fire.
| parliament32 wrote:
| It'd probably be cheaper to stick it in Glacier or GC Archive
| ($0.0012/GB/month).
| robotmay wrote:
| What about tape? I suppose the cost of the drive is prohibitive,
| but I was under the impression that it was used for a lot of
| long-term storage.
| derekp7 wrote:
| I would imagine previous-generation tape drives (used) can be
| economical. You just need to find a reliable place that handles
| testing / refurbishing (cleaning, alignment, belts, etc.) of
| used drives. The other big item is needing the appropriate
| controller and cabling.
| kiririn wrote:
| Tape drives are open about both their condition and the
| condition of tapes. It's all there in the scsi log pages,
| more detailed than SMART on hard drives.
|
| Mechanically and electrically, everything is rated to last
| several times longer than the head
|
| In other words, you just need to buy two used drives (one
| as spare) and verify they can write a full tape and their
| head hours and other error counters are sane. There is no
| reasonable need to refurbish a tape drive other than a head
| replacement, which is easy to do at home but so expensive
| (for older generations) that you might as well buy a new
| drive. All the testing you could hope for is done in POST
| and by LTT/equivalent (writing a tape and reading logs is
| good enough)
| klodolph wrote:
| You (more or less) just need a fiber channel card, they're
| pretty mundane otherwise.
| DanBC wrote:
| Good quality DVD, in tyvek sleeves, with copious amounts of
| PAR2 data, in multiple places.
| Hamuko wrote:
| Why tyvek sleeves in particular?
| DanBC wrote:
| It's easier to find tyvek sleeves that are sold as being
| suitable for archive purposes.
| theandrewbailey wrote:
| How long is 'a few years'? Controlled environments shouldn't be
| necessary for unplugged drives, just keep them at or slightly
| below room temperature.
|
| I've had three external hard drives for 7 years, and none have
| stopped working. I keep one with me, and two somewhere else
| (office, family). I connect one for a few hours every
| week/month to update it, then leave it alone until needed, or
| rotate it with one elsewhere.
| Merman_Mike wrote:
| I'd want to verify the existing data and maybe add some data
| once a year or less.
| mrkurt wrote:
| Probably writable Blu-ray.
| xellisx wrote:
| Check out M Disc: https://mdisc.com/
| Hamuko wrote:
| AFAIK, M Disc really only matters for DVDs due to their
| organic materials. (Non-LTH) BDs on the other hand have
| inorganic materials and last pretty well.
|
| I think there was a French study that compared DVDs, M
| Discs and BDs and the HTL BDs fared very well. Can't find
| the document though.
| [deleted]
| smarx007 wrote:
| I don't think MDisks were compared if that's the study.
|
| https://club.myce.com/t/french-study-on-bd-r-for-
| archival/30...
|
| https://francearchives.fr/file/5f281a39048987dcef88202816
| a5c...
| Hamuko wrote:
| I think it was a separate one.
| kamranjon wrote:
| this is the weirdest website
| toomuchtodo wrote:
| LTO tape (specifically that which is rated for 15-30 years of
| archival storage) with the drive. The tape is usually rated for
| a couple hundred full passes, which should more than meet your
| needs if you're writing once and sticking them somewhere safe.
|
| SSDs don't have this archival longevity yet, and hard drives
| are better when powered up with the data always hot, for
| scrubbing and migrating when indicators of drive failure
| present themselves.
| einpoklum wrote:
| Don't LTO tape drives cost about a zillion dollars each?
| toomuchtodo wrote:
| I recommend acquiring them second hand (but validated) for
| personal use.
| Hamuko wrote:
| I've had the impression that just having an HDD sit around
| doesn't do it any good, and it might just fail the next time
| you plug it in.
| tyingq wrote:
| I'm always excited for this yearly post. Are there other vendors
| that provide this kind of insightful info for other types of
| infrastructure?
|
| Also, kudos to Backblaze. I'm sure there's some side brand
| benefit to all the work that goes into making this public, but
| it's clear it's mostly just altruism.
| louwrentius wrote:
| I wonder why Western Digital is almost absent, does anyone know
| why?
| atYevP wrote:
| Yev from Backblaze here -> We just started deploying more of
| them! We added almost 6k WDC drives in Q4 - so we're getting
| more of them in the fleet! They have a pretty low AFR - but
| haven't been deployed for too long, so they'll be interesting
| to follow!
| hinkley wrote:
| Is your drive usage really homogeneous, or do you have
| situations where slightly more reliable drives are
| prioritized? Like, say, for servers or logging.
| atYevP wrote:
| We do have some drives that we use as boot drives and for
| logging. We write about them a bit in the post - they're
| primarily SSDs, so not included in the overall mix of data
| drives!
| louwrentius wrote:
| Thanks for sharing!
| brianwski wrote:
| Disclaimer: I work at Backblaze.
|
| > I wonder why Western Digital is almost absent, does anyone
| know why?
|
| Most of the time the answer comes down to price/GByte. But it
| isn't QUITE as simple as that.
|
| Backblaze tries to optimize for total cost most of the time.
| That isn't just the cost of the drive: a drive with twice the
| storage still takes the identical amount of rack space, and
| often the same electricity, as the drive with half the storage.
| This means that we have a spreadsheet and calculate what the
| total cost over a 5-year expected lifespan will turn out to be.
| So for example, even if the drive that is twice as large costs
| MORE than twice as much, it can still make sense to purchase
| it.
|
| As to failure rates, Backblaze essentially doesn't care what
| the failure rate of a drive is, other than to factor that into
| the spreadsheet. If we think one particular drive fails 2% more
| of the time, we still buy it if it is 2% cheaper, make sense?
|
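| (Purely as an illustration of that spreadsheet, a toy version of
| the math -- every price and rate below is made up, not an actual
| Backblaze number:)
|
|     # Toy 5-year total-cost comparison for one drive slot.
|     YEARS = 5
|     SLOT_COST_PER_YEAR = 30.0   # rack space + power, per slot
|
|     def cost_per_tb(price, tb, afr, replace_cost=25.0):
|         """Rough expected 5-year cost per TB in one slot."""
|         fixed = price + YEARS * SLOT_COST_PER_YEAR
|         expected_failures = afr * YEARS   # crude approximation
|         return (fixed + expected_failures * replace_cost) / tb
|
|     # A 16 TB drive that costs MORE than 2x an 8 TB drive can
|     # still win, because the slot overhead is paid only once.
|     print(cost_per_tb(price=160, tb=8, afr=0.01))    # ~38.9/TB
|     print(cost_per_tb(price=400, tb=16, afr=0.012))  # ~34.5/TB
|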
| So that's the answer most of the time, although Backblaze is
| always making sure we have alternatives, so we're willing to
| purchase a small number of pretty much anybody's drives of
| pretty much any size in order to "qualify" them. It means we
| run one pod of 60 of them for a month or two, then we run a
| full vault of 1,200 of that drive type for a month or two, just
| in case a good deal floats by where we can buy a few thousand
| of that type of drive. We have some confidence they will work.
| louwrentius wrote:
| Thank you for this very elaborate and detailed answer!
|
| Disclaimer: (I'm a customer)
| Hamuko wrote:
| > _As to failure rates, Backblaze essentially doesn't care
| what the failure rate of a drive is, other than to factor
| that into the spreadsheet._
|
| Guessing shit like the ST3000DM001 is a whole different thing
| entirely.
| brianwski wrote:
| > Guessing shit like the ST3000DM001 is a whole different
| thing entirely.
|
| :-) Yeah, there are times where the failure rate can rise
| so high it threatens the data durability. The WORST is when
| failures are time correlated. Let's say the same one
| capacitor dies on a particular model of drive after
| precisely 6 months of being powered up. So everything is
| all calm and happy and smooth in operations, and then our
| world starts going sideways 1,200 drives at a time (one
| "vault" - our minimum unit of deployment).
|
| Internally we've talked some about staggering drive models
| and drive ages to make these moments less impactful. But at
| any one moment one drive model usually stands out at a good
| price point, and buying in bulk we get a little discount,
| so this hasn't come to be.
| benlivengood wrote:
| > Internally we've talked some about staggering drive
| models and drive ages to make these moments less
| impactful. But at any one moment one drive model usually
| stands out at a good price point, and buying in bulk we
| get a little discount, so this hasn't come to be.
|
| I don't know what your software architecture looks like
| right now (after reading the 2019 Vault post) but at some
| point it probably makes sense to move file shard location
| to a metadata layer to support more flexible layouts to
| work around failure domains (age, manufacturer, network
| switch, rack, power bus, physical location, etc.), reduce
| hotspot disks, and allow flexible hardware maintenance.
| Durability and reliability can be improved with two
| levels of RS codes as well; low level (M of N) codes for
| bit rot and failed drives and a higher level of (M2 of
| N2) codes across failure domains. It costs the same
| (N/M)*(N2/M2) storage as a larger (M*M2 of N*N2) code but
| you can use faster codes and larger N on the (N,M) layer
| (e.g. sse-accelerated RAID6) and slower, larger codes
| across transient failure domains under the assumption
| that you'll rarely need to reconstruct from the top-level
| parity, and any 2nd-level shards that do need to be
| reconstructed will be using data from a much larger
| number of drives than N2 to reduce hotspots. This also
| lets you rewrite lost shards immediately without physical
| drive replacement which reduces the number of parities
| required for a given durability level.
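|
| A quick numeric sanity check of that overhead equivalence (the
| inner/outer parameters below are just example values, not a
| claim about anyone's actual layout):
|
|     from fractions import Fraction
|
|     # Inner code: M-of-N inside a failure domain (e.g. 17 data
|     # + 3 parity). Outer code: M2-of-N2 across domains.
|     M, N = 17, 20     # example only
|     M2, N2 = 4, 5     # example only
|
|     nested = Fraction(N, M) * Fraction(N2, M2)  # (N/M)*(N2/M2)
|     flat = Fraction(N * N2, M * M2)             # (M*M2 of N*N2)
|
|     # Same raw-storage overhead either way.
|     print(nested, flat, nested == flat)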
|
| This paper does something similar with product codes:
| http://pages.cs.wisc.edu/~msaxena/new/papers/hacfs-
| fast15.pd...
| hinkley wrote:
| Is it safe to say that Backblaze essentially has an O(log n)
| algorithm for labor due to drive installation and
| maintenance, so up-front costs and opportunity costs due to
| capacity weigh heavier in the equation?
|
| The rest of us don't have that, so a single disk loss can
| ruin a whole Saturday. Which is why we appreciate that you
| guys post the numbers as a public service/goodwill generator.
| brianwski wrote:
| > algorithm for labor due to drive installation and
| maintenance ... the rest of us don't have that so a single
| disk loss can ruin Saturday
|
| TOTALLY true. We staff our datacenters with our own
| datacenter technicians (Backblaze employees) 7 days a week.
| When they arrive in the morning the first thing they do is
| replace any drives that failed during the night. The last
| thing they do before going home is replacing the drives
| that failed during the day so the fleet is "whole".
|
| Backblaze currently runs at 17 + 3. 17 data drives with 3
| calculated parity drives, so we can lose ANY THREE drives
| out of a "tome" of 20 drives. Each of the 20 drives in one
| tome is in a different rack in the datacenter. You can read
| a little more about that in this blog post:
| https://www.backblaze.com/blog/vault-cloud-storage-
| architect...
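|
| As a back-of-the-envelope illustration of why three parities
| across 20 drives is comfortable (a toy model only, assuming
| independent failures and a made-up one-week repair window, not
| the real durability math):
|
|     from math import comb
|
|     n, parity = 20, 3
|     p = 0.01 / 52   # ~1% annual failure rate, one-week window
|
|     def p_at_least(k):
|         return sum(comb(n, i) * p**i * (1 - p)**(n - i)
|                    for i in range(k, n + 1))
|
|     # Data loss needs 4+ concurrent failures in one tome.
|     print(p_at_least(parity + 1))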
|
| So if 1 drive fails at night in one 20 drive tome we don't
| wake anybody up, and it's business as usual. That's totally
| normal, and the drive is replaced at around 8am. However,
| if 2 drives fail in one tome pagers start going off and
| employees wake up and start driving towards the datacenter
| to replace the drives. With 2 drives down we ALSO
| automatically stop writing new data to that particular tome
| (but customers can still read files from that tome),
| because we have noticed less drive activity can lighten
| failure rates. In the VERY unusual situation that 3 drives
| are down in one tome every single tech ops and datacenter
| tech and engineer at Backblaze is awake and working on THAT
| problem until the tome comes back from the brink. We do NOT
| like being in that position. In that situation we turn off
| all "cleanup jobs" on that vault to lighten load. The
| cleanup jobs are the things that are running around
| deleting files that customers no longer need, like if they
| age out due to lifecycle rules, etc.
|
| The only exceptions to our datacenters having dedicated
| staff working 7 days a week are if a particular datacenter
| is small or just coming online. In that case we lean on
| "remote hands" to replace drives on weekends. That's more
| expensive per drive, but it isn't worth employing
| datacenter technicians that are just hanging out all day
| Saturday and Sunday bored out of their minds - instead we
| just pay the bill for remote hands.
| quyse wrote:
| Is it actually required to have employees wake up and
| replace that specific failed drive to restore full
| capacity of a tome? I would expect an automatic process:
| disable and remove the failed drive completely from the
| tome, add SOME free reserve drive in SOME rack in the
| datacenter to the tome, and start populating it
| immediately. The originally failed drive can then be
| replaced afterwards without hurry.
| hinkley wrote:
| > because we have noticed less drive activity can lighten
| failure rates
|
| It's a rite of passage to experience a second drive
| failure during RAID rebuild/ZFS resilvering.
|
| I got to experience this when I built a Synology box
| using drives I had around and ordering new ones.
|
| One of the old drives ate itself and I had to start over.
| Then I did the math on how long the last drive was going
| to take, realized that since it was only 5% full it was
| going to be faster to kill the array and start over a 3rd
| time. Plus less wear and tear on the drives.
| [deleted]
| Hamuko wrote:
| Aren't all of the HGSTs essentially WDs?
| louwrentius wrote:
| Yes, but does that explain why there are almost no WD branded
| drives?
| numpad0 wrote:
| IIRC, WD > 6TB are all 7K2 and HGST. HGST itself is "a WD
| company", but for antitrust reasons the corporate entities
| are separate.
| syshum wrote:
| HGST as a brand was phased out years ago, and the company
| was not kept separate for anti-trust reasons.
|
| WD had to sell and license some HGST assets for anti-trust
| reasons, but there was never a requirement for them to be a
| separate company.
|
| HGST websites and other things today just redirect to WD;
| there are no new HGST-branded things being made as far as I
| am aware.
| wellthisisgreat wrote:
| Since this is a thread about HDDs, can someone recommend a quiet
| HDD? Or is it a pipe dream and one should stick to SSDs?
| glenneroo wrote:
| Since moving to Fractal Design Define (i.e. soundproofed)
| version 4/5/6/7 cases over the past ~10 years, the only noise I
| ever hear anymore is from the fans, primarily GPU and case fans
| when running heavier jobs. In my main system I have 6 spinning
| rust drives (8-16TB from various manufacturers) running 24/7
| and I never hear them, even during heavy read/writes... and I
| often sleep on the couch nearby ;)
| lostlogin wrote:
| I can do the opposite - 16TB Seagate Exos drives are very loud.
| Great drives, but horrible noise.
|
| If I were you I'd be looking at slow ones, 5400 RPM WD drives.
| jedberg wrote:
| > The AFR for 2020 dropped below 1% down to 0.93%. In 2019, it
| stood at 1.89%. That's over a 50% drop year over year... In other
| words, whether a drive was old or new, or big or small, they
| performed well in our environment in 2020.
|
| If every drive type, new and old, big and small, did better this
| year, maybe they changed something in their environment this
| year? Better cooling, different access patterns, etc.
|
| If this change doesn't have an obvious root cause, I'd be
| interested in finding out what it is if I were Backblaze. It
| could be something they could optimize around even more.
| cm2187 wrote:
| Or perhaps as time passes, a greater portion of their storage
| is rarely accessed archives, so a larger percentage of disks
| are sitting doing nothing.
| codezero wrote:
| I'm just assuming that folks doing archival storage aren't
| using these kinds of spinning disks, as it would be super
| expensive compared to other media, right?
|
| I do think access patterns in general should contribute to
| the numbers so that kind of thing can be determined.
| freeone3000 wrote:
| Compared to what, exactly? Tape is cheaper per GB, but the
| drives and libraries tip that over the other way. Blu-Ray
| discs are now more expensive per GB than hard drives,
| thanks to SMR and He offerings.
|
| Also note that Backblaze does _backups_ -- by definition,
| these are infrequently accessed, usually write-once-read-
| never. I've personally been a customer for three years and
| performed a restore exactly once.
| guenthert wrote:
| Despite claims to the contrary, tape isn't dead just yet.
| Tapes are still _considerably_ cheaper than drives. An
| LTO-8 tape (12TB uncompressed capacity) can be had for
| about $100, while a 12TB HDD goes for some $300. Tape
| drives/libraries are quite expensive though, but that
| just shifts the break-even point out. For the largest
| sites, it's still economical. Not sure if Backblaze is
| big enough (I'm sure they did their numbers). Backglacier,
| anyone?
| birdman3131 wrote:
| I bought a pair of 12TB drives for $199 the other day, and
| they often go cheaper. Now admittedly if you shuck
| externals you lose the warranty, but we are keeping them in
| the enclosures as these are for backups, and thus the ease
| of taking them off site is great for us.
| R0b0t1 wrote:
| They claim you lose the warranty, but they are wrong; they
| still have to prove you damaged it. Federal law: https://
| en.wikipedia.org/wiki/Magnuson%E2%80%93Moss_Warranty...
| StillBored wrote:
| And a number of the library vendors' libraries last for
| decades with only drive/tape swaps along the way. The
| SL8500 is on its second decade of sales for example.
| Usually what kills them is the vendor deciding not to
| release firmware updates to support the newer drives. The
| stock half-inch cartridge form factor dates from 1984
| with DLT & 3480, and given that there have been libraries
| with grippers capable of moving a wide assortment of
| DLT/LTO/TXX/etc cartridges at the same time, it's doubtful
| that will change anytime in the future. So if you buy
| one of the big libraries today it will likely last
| another decade or two, maybe three. There aren't many
| pieces of IT technology you can utilize that long.
| codezero wrote:
| I was specifically thinking of the SKUs - I assumed they
| were using faster disks rather than high volume disks
| that make trade-offs for costs. Just assumptions on my
| part - and I am mostly curious for more data, but given
| the historical trends, I'm not terribly suspicious of the
| actual results here.
| thebean11 wrote:
| Backblaze has backup and raw storage S3 type services.
| I'm not sure what uses the majority of their disk space.
| StillBored wrote:
| Drive enclosures, RAID/etc interfaces, and motherboards
| burning electricity make it a lot more complex than raw
| HDDs vs raw tape. Tape libraries cost a fortune, but so
| do 10+ racks of cases+power supplies+servers needed to
| maintain the disks of equal capacity.
|
| Tape suffers from "enterprise" pricing, which means the
| major vendors price it so that it's just a bit cheaper
| than disk, and they lower their prices to keep that
| equation balanced, because fundamentally coated mylar/etc
| wrapped around a spindle in an injection-molded case is
| super cheap.
| cm2187 wrote:
| But even if it is hot storage, do you touch all your files
| every day? You are bound to accumulate, over time, more
| files that you never access.
| codezero wrote:
| Yeah, hard to say. I assume the diversity of their
| customers and a normal distribution should smooth those
| patterns out statistically, but I have no clue :)
|
| I also wonder if failures are related to physical
| location on disk vs other things like controller or
| mechanical failures.
|
| You may not be reading old files but you're reading
| files, or not, so it still also depends on overall
| utilization.
| hinkley wrote:
| If they've hit on a different access pattern that is more
| gentle, that might be something useful for posterity and I
| hope they dig into that possibility.
|
| There's also just the possibility that failure rates are
| bimodal and so they've hit the valley of stability.
|
| Are they tracking wall clock time or activity for their
| failure data?
| brianwski wrote:
| Disclaimer: I work at Backblaze.
|
| > If they've hit on a different access pattern that is more
| gentle, that might be something useful for posterity and I
| hope they dig into that possibility.
|
| Internally at Backblaze, we're WAY more likely to be
| spending time trying to figure out why drives (or something
| else like power supplies or power strips or any other pain
| point) is failing at a higher rate, than looking into why
| something is going well. I'm totally serious, if something
| is going "the same as always or getting better" it just
| isn't going to get much of any attention.
|
| You have to understand that with these stats, we're just
| reporting on what happened in our datacenter - the outcome
| of our operations. We don't really have much time to do
| more research and there isn't much more info than what you
| have. And if we stumbled upon something useful we would
| most likely blog about it. :-)
|
| So we read all of YOUR comments looking for the insightful
| gems. We're all in this together desperate for the same
| information.
| hinkley wrote:
| Seems to me that every drive failure causes read/write
| amplification, so a small decrease in failure rates would
| compound. Have you folks done any other work to reduce
| write amplification this year?
| ddorian43 wrote:
| The bottleneck for HDDs in this scenario is bandwidth. What you
| do is split & spread files as much as possible, so your HDDs
| are all serving the same amount of bandwidth. A disk doing
| nothing is wasted potential bandwidth (unless it's turned
| off).
| cm2187 wrote:
| But do they actively move around files to spread bandwidth
| after the initial write? If they don't, and if I am right
| that older files tend to be rarely accessed, I would expect
| entire disks to become unaccessed over time.
| jeffbee wrote:
| If they allow that to happen, they are leaving a ton of
| money on the table. It's typical in the industry to move
| hot and cold files around to take advantage of the IOPS
| you already paid for. See, for example, pages 22-23 of
| http://www.pdsw.org/pdsw-discs17/slides/PDSW-DISCS-
| Google-Ke...
| einpoklum wrote:
| > If every drive type, new and old, big and small, did better
| this year, maybe they changed something in their environment
| this year?
|
| It can also be the case that newer drives this year are better
| than newer drives last year, while older drives are over a
| "hill" in the failure statistics, e.g. it could be the case
| that there are more 1st-year failures than 2nd-year failures
| (for a fixed number of drives starting the year).
| numpad0 wrote:
| What about air quality? There are actually air filters, in the
| form of a patch or a package similar to miniature mustard
| packets, through which drives breathe. Supposedly those are
| super fine filters, but toxic gas molecules might still pass
| through them.
| brianwski wrote:
| > There are actually air filters ... through which drives
| breathe
|
| Although the helium drives are more sealed up, which also
| might be a factor?
| codezero wrote:
| I was wondering if the state of the world in 2020 might have
| dramatically changed their business / throughput / access
| patterns in a meaningful enough way to cause this dip.
|
| I'm not sure if they have a measure of the disk utilization or
| read/write load along with the failure rate.
| brianwski wrote:
| Disclaimer: I work at Backblaze, but mostly on the client
| that runs on desktops and laptops.
|
| > I'm not sure if Backblaze has a measure of the disk
| utilization or read/write load along with the failure rate.
|
| We publish the complete hard drive SMART stats for anybody to
| attempt these analyses. Most of us in Backblaze engineering
| get giddy with excitement when a new article comes out that
| looks at correlating SMART stats and failures. :-) For
| example, this article circulated widely at Backblaze a few
| days ago: https://datto.engineering/post/predicting-hard-
| drive-failure...
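|
| For anybody who wants a starting point, a minimal sketch of
| computing an AFR per model from those published daily CSVs (the
| directory name is made up, and the model / failure column names
| are assumed from the dataset's usual layout):
|
|     import glob
|     import pandas as pd
|
|     # One row per drive per day; failure is a 0/1 flag.
|     days = pd.concat(
|         pd.read_csv(f, usecols=["model", "failure"])
|         for f in glob.glob("data_Q4_2020/*.csv"))
|
|     stats = days.groupby("model")["failure"].agg(
|         drive_days="count", failures="sum")
|
|     # Annualized failure rate: failures per drive-year.
|     stats["afr_pct"] = (stats["failures"]
|                         / (stats["drive_days"] / 365) * 100)
|     print(stats.sort_values("afr_pct"))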
| codezero wrote:
| I anticipate these reports every year and have strong trust
| in the data - I want to make that clear - Backblaze has
| done a massive service to the entire industry by collecting
| and aggregating this kind of data.
|
| I'm really super curious about the dip in errors over the
| past year :)
| rootusrootus wrote:
| Whether intentional or not, it's also great word-of-mouth
| advertising. My preexisting experience with Backblaze's
| hard drive stats reporting definitely worked positively
| in their favor when I was looking for a new backup
| service.
| willis936 wrote:
| There are other interesting factors to look for as well.
| Temperature, moisture, electrical noise on the power rails,
| infrasound, etc.
| matmatmatmat wrote:
| Hi Brian, just a note of thanks to you and Backblaze for
| publishing these data. I always refer to them before a
| purchase and they're really helpful.
| jeffbee wrote:
| That seems like a misleading aggregation. Their total AFR could
| have been affected just by a mix shift from early death to mid-
| life. It looks that way to me from their tables.
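|
| (Toy numbers to show the effect -- hold each cohort's failure
| rate fixed, change only the fleet mix, and the blended AFR still
| drops; both rates here are invented:)
|
|     def blended_afr(mix_young, afr_new=0.03, afr_mature=0.008):
|         return (mix_young * afr_new
|                 + (1 - mix_young) * afr_mature)
|
|     print(blended_afr(0.50))   # lots of first-year drives
|     print(blended_afr(0.10))   # mostly mid-life drives a year on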
| alinspired wrote:
| Perhaps the margin of error should be raised to accommodate
| this change of about 1%, although the set of drives under test
| is likely not the same between years.
| andruby wrote:
| I guess these Hard Drive Stats posts cover disks used for their
| B2 service as well? Maybe the service mix is changing (a larger
| percentage being used for B2 versus their traditional backup
| service).
|
| I'm not sure how more B2-like access patterns would improve the
| stat, though.
| brianwski wrote:
| Disclaimer: I work at Backblaze.
|
| > I guess these Hard Drive Stats posts cover disks used for
| their B2 service as well?
|
| Yes. The storage layer is storing both Backblaze Personal
| Backup files and B2 files. It's COMPLETELY interleaved, every
| other file might be one or the other. Same storage. And we
| are reporting the failure rates of drives in that storage
| layer.
|
| We THINK (but don't know for certain) that the access
| patterns are relatively similar. For example, many of the 3rd
| party integrations that store files in B2 are backup programs
| and those will definitely have similar access patterns.
| However, B2 is used in some profoundly different
| applications, like the origin store for a Cloudflare fronted
| website. So that implies more "reads" than the average
| backup, and that could be changing the profile over time as
| that part of our business grows.
| thom wrote:
| I have always loved these posts, they paint a picture of smart
| people with good processes in place. It's confidence-building.
| Unfortunately we were evaluating Backblaze around the time they
| went down for a weekend and didn't even update their status page,
| which was a bit of a blow to that confidence.
| atYevP wrote:
| Yev from Backblaze here -> sorry about that Thom - it's one of
| the things we're definitely working on hammering down. Right
| now we're growing our engineering and infrastructure teams
| quite a bit, and one of the projects we'd like to see is more
| automated status updates. We typically will throw updates onto
| Twitter or our blog if it's a large outage - or one affecting
| many different people - but totally recognize that process
| could use some sprucing.
| csnover wrote:
| Their marketing blog does a great job of painting them as smart
| people with good processes. Sadly, I learned the hard way (by
| trialling their software and immediately discovering a handful
| of dumb bugs that should've been caught by QA, plus serious
| security problems[0], and some OSS licence violations[1]) that
| it seems to not actually be the case. This situation where they
| continue to pump out blog posts about hard drive stats, yet
| don't even _have_ a status page for reporting outages, is
| another example of their marketing-driven approach to
| development.
|
| I have mentioned this on HN a couple times now[2][3], including
| again just yesterday. I really dislike doing this because I
| feel like I am piling on--but as much as I hate it, I feel even
| more strongly that people deserve to be informed about these
| serious ongoing failures at Backblaze so that they can make
| more informed choices about what storage/backup provider to
| use. I also genuinely hope that I can incentivise them to
| actually start following software development best practices,
| since they do provide a valuable service and I'd like them to
| succeed. If they keep doing what they're doing now, I
| absolutely expect to see a massive data breach and/or data loss
| event at some point, since they are clearly unwilling or unable
| to properly handle security or user privacy today--and I've
| gotten the impression over time that some prominent people at
| the company think these criticisms are invalid and they need
| not make any substantive changes to their processes.
|
| [0] https://twitter.com/zetafleet/status/1304664097989054464
|
| [1] https://twitter.com/bagder/status/1215311286814281728
|
| [2] https://news.ycombinator.com/item?id=25899802
|
| [3] https://news.ycombinator.com/item?id=24839757
| brianwski wrote:
| Disclaimer: I work at Backblaze.
|
| > as much as I hate it I'm following Backblaze around and
| posting incorrect information about them
|
| I get the impression Backblaze did something to upset you.
| Can you let me know what it is so I can try to fix it?
|
| If there wasn't a pandemic on I would invite you to come to
| our office and I could buy you lunch and I could try to make
| up for whatever we did to upset you.
| atYevP wrote:
| Yev from Backblaze here -> rest assured that we do read what
| you're writing on these posts and they've spurred some
| internal process discussions. I believe the bugs you
| mentioned were cleared/fixed with version 7.0.0.439 which was
| released in Q1 of 2020. We did leave HackerOne and switched
| over to BugCrowd to handle our bug program. It's private at
| the moment, but easy enough to get invited (by emailing
| bounty@backblaze.com). While we spin that program up (it's a
| new vendor for us) we may stay private, but hopefully that's
| not a permanent state.
|
| Edit -> I just noticed the Daniel Stenberg libcurl citation.
| Oof, yeah, that was certainly a whiff on our end. Luckily we
| were able to make up for it (he has a write-up here:
| https://daniel.haxx.se/blog/2020/01/14/backblazed/).
| andruby wrote:
| I hadn't heard about this downtime. When did that happen?
| thom wrote:
| This was the HN thread:
|
| https://news.ycombinator.com/item?id=25147951
|
| And I must apologise, I was wrong when I said they didn't
| update their status page during the outage. They don't have a
| status page.
| i5heu wrote:
| It is always amazing to me how cheap, small and reliable storage
| has become.
| ramraj07 wrote:
| Au contraire, I feel like drive reliability has gone _down_...
| Especially for consumers - the big difference between
| Backblaze and regular users is that they have their disks
| spinning continuously, and the reliability numbers seem to only
| apply in that scenario. If you switch off and store a drive, my
| experience is that after a year there's a very high probability
| it won't switch on again. This is a big problem in academic
| labs, where grad students generate terabytes of data and
| professors and departments are too stingy to provide managed
| storage services at that scale, so it all sits on degrading
| drives in some drawer in the lab.
| mmsimanga wrote:
| This. I bought a hard drive docking station, and the idea was
| to go through all 6 of my hard drives from the past 10 years
| which I hadn't used. Only the laptop drives worked.
| guenthert wrote:
| Is the docking station supplied with enough current? 3.5"
| drives tend to take considerably more power (and use 12V
| for their motor) particularly when spinning up. I'd give
| those drives another try in a different docking station or
| connect them directly to a PC, RAID or NAS device.
| lostlogin wrote:
| My experience of data storage among academics has been
| disturbing. Masses and masses of work is stored on USB sticks
| and laptops. Hundreds of hours of work, maybe even thousands
| of hours and no backups. I've hit it multiple times and it
| blows my mind each time.
|
| Yes, buying a basic backup solution is going to set you back
| a few hundred dollars minimum (or not, if you go for BB or
| similar) but it seems like a basic minimum.
|
| I don't know how you change the culture, but it's bad among
| those I have worked alongside.
|
| I haven't bought large drives in years and recently started
| doing so. I have been really impressed with how good they are
| and how well they perform in an always-on NAS. I'm so
| impressed with the Synology I got and can't speak highly
| enough of it. I just wish I'd bought one with more bays.
| omgwtfbyobbq wrote:
| My experience has been the opposite. Every HDD I have
| works. The new ones are fine, even if I let them sit for
| months, as are the old ones after years.
| patentatt wrote:
| Apologies for being slightly off-topic, but presenting a table of
| text as an image is annoying to me. A table of text ought to be
| rendered in just plain old HTML, in my old-school opinion.
| qwertox wrote:
| I agree.
|
| I was reading the Q3 2020 stats yesterday because I'm looking
| for a new drive.
|
| It was somewhat annoying to have to type the HDD-model into the
| Google search bar instead of just double-clicking and selecting
| search from the context menu. It irritated me that it was an
| image.
| andy4blaze wrote:
| Andy from Backblaze here. Actually you can download a
| spreadsheet with all the data from the tables. There's a link
| at the end of the post. Better than parsing HTML for the data.
| JxLS-cpgbe0 wrote:
| The spreadsheets are identical to the screenshots you took of
| them. The images aren't responsive, aren't available to
| assistive technologies, provide no text alternative, do not
| respect a user's text settings, cannot be translated etc. Why
| is that better than HTML?
___________________________________________________________________
(page generated 2021-01-26 23:01 UTC)