[HN Gopher] My 71 TiB ZFS NAS After 10 Years and Zero Drive Fail...
       ___________________________________________________________________
        
       My 71 TiB ZFS NAS After 10 Years and Zero Drive Failures
        
       Author : louwrentius
       Score  : 400 points
       Date   : 2024-09-13 23:21 UTC (23 hours ago)
        
 (HTM) web link (louwrentius.com)
 (TXT) w3m dump (louwrentius.com)
        
       | ggm wrote:
        | There have been drives where power cycling was hazardous. So,
        | while I agree with the model, it shouldn't be assumed this is
        | always good, all the time, for all people. Some SSDs need to be
        | powered periodically. The duty cycle for a NAS probably meets
        | that burden.
        | 
        | Probably good, and definitely cheaper on power costs. Those
        | drives with extra grease on the axle were a blip in time.
        | 
        | I wonder if Backblaze does a drive on-off lifetime stats model? I
        | think they are in the always-on problem space.
        
         | louwrentius wrote:
         | > There have been drives where power cycling was hazardous.
         | 
         | I know about this story from 30+ years ago. It may have been
          | true then. It may even be true now.
         | 
         | Yet, in my case, I don't power cycle these drives often. At
         | most a few times a month. I can't say or prove it's a huge
          | risk. I only believe it's not. I have accepted this risk for
          | 15+ years.
         | 
         | Update: remember that hard drives have an option to spin down
         | when idle. So hard drives can handle many spinups a day.
        
           | neilv wrote:
           | In the early '90s, some Quantum 105S hard drives had a
           | "stiction" problem, and were shipped with Sun SPARCstations.
           | 
            | IME at the time, if you powered off a bunch of workstations,
            | such as for building electrical work, probably at least one of
            | them wouldn't spin back up the next business day.
           | 
           | Pulling the drive sled, and administering percussive
           | maintenance against the desktop, could work.
           | 
           | https://sunmanagers.org/1992/0383.html
        
             | ghaff wrote:
             | Stiction was definitely a thing back in that general period
             | when you'd sometimes knock a drive to get it back to
             | working again.
        
           | ggm wrote:
           | I debated posting because it felt like shitstirring. I think
            | overwhelmingly what you're doing is right. And if remote
            | power-on, e.g. WOL, works on the device, so much the better.
            | If I could wish for one thing, it's changes to the code or
            | documentation on how to handle drive power-down on ZFS. The
            | rumour mill says ZFS doesn't like spindown.
        
             | louwrentius wrote:
              | I did try using HDD spindown on ZFS, but I remember (it's a
              | long time ago) that I encountered too many vague errors
              | that scared me, and I just disabled spindown altogether.
        
             | Kirby64 wrote:
             | What is there to handle? I have a ZFS array that works just
             | fine with hard drives that automatically spin down. ZFS
             | handles this without an issue.
             | 
             | The main gotchas tend to be: if you use the array for many
             | things, especially stuff that throws off log files, you
             | will constantly be accessing that array and resetting the
              | spin-down timers. Or you might be just at the threshold for
              | spin-down and you'll put a ton of cycles on it as it bounces
              | from spin-down to access to spin-up.
             | 
             | For a static file server (rarely accessed backups or
             | media), partitioned correctly, it works great.
        
         | bluedino wrote:
         | Long ago I had a client who could have been an episode of "IT
         | Nightmares".
         | 
         | They used internal 3.5" hard drives along with USB docks to
         | backup a couple Synology devices...It seemed like 1/10 times
         | when you put a drive back in the dock to restore a file or make
         | another backup, the drive wouldn't power back up.
        
       | 2OEH8eoCRo0 wrote:
       | > The 4 TB HGST drives have roughly 6000 hours on them after ten
       | years.
       | 
        | So they mostly sit idle? Mine are roughly 4 years old with ~35,000
       | hours.
        
         | louwrentius wrote:
         | To quote myself from a few lines below that fragment:
         | 
         | > My NAS is turned off by default. I only turn it on (remotely)
         | when I need to use it.
        
           | markoman wrote:
            | I loved the idea of the 10Gb NICs, but wondered: what model of
            | switch are you connecting to? And what throughput is this
            | topology delivering?
        
             | louwrentius wrote:
             | I don't have a 10Gb switch. I connect this server directly
             | to two other machines as they all have 2 x 10Gbit. This NAS
             | can saturate 10Gbit but the other side can't, so I'm stuck
              | at 500-700 MB/s; I haven't measured it in a while.
        
       | russfink wrote:
       | What does one do with all this storage?
        
         | flounder3 wrote:
         | This is a drop in the bucket for photographers, videographers,
         | and general backups of RAW / high resolution videos from mobile
         | devices. 80TB [usable] was "just enough" for my household in
         | 2016.
        
           | patchymcnoodles wrote:
           | Exactly that. I'm not even shooting in ProRes or similar
           | "raw" video. But one video project easily takes 3TB. And I'm
           | not even a professional.
        
             | jiggawatts wrote:
             | Holy cow, what are you shooting with!?
             | 
             | I have a Nikon Z8 that can output up to 8.3K @ 60 fps raw
             | video, and my biggest project is just 1 TB! Most are on the
             | order of 20 GB, if that.
        
               | patchymcnoodles wrote:
               | I use a Sony a1, my videos are 4k 100fps. But on the last
               | project I also had an Insta 360 x4 shooting B-Roll. So on
               | some days that adds up a lot.
        
         | andrelaszlo wrote:
         | Especially since it's mostly turned off and it seems like the
         | author is the only user
        
         | rhcom2 wrote:
         | For me I have about 35TB and growing in Unraid for
         | Plex/Torrents/Backups/Docker
        
         | tbrownaw wrote:
         | It's still not enough to hold a local copy of sci-hub, but
         | could probably hold quite a few recorded conference talks (or
         | similar multimedia files) or a good selection of huggingface
         | models.
        
         | Nadya wrote:
         | If you prefer to own media instead of streaming it, are into
         | photography, video editing, 3D modelling, any AI-related stuff
         | (models add up) or are a digital hoarder/archivist you blow
         | through storage rather quickly. I'm sure there are some other
         | hobbies that routinely work with large file sizes.
         | 
          | Storage is cheap enough that rather than deleting thousands of
          | photos and never being able to reclaim or look at them again,
          | I'd rather buy another drive. I'd rather have a RAW of an
          | 8-year-old photo that I overlooked and decide I really like and
          | want to edit/work with than an 87 KB resized and compressed JPG
          | of the same file. Same for a mostly-edited 240GB video file.
          | What if I want or need to make some changes to it in the future?
          | May as well hold onto it rather than have to re-edit or re-shoot
          | the video if the original footage was also deleted.
         | 
         | Content creators have deleted their content often enough that
         | if I enjoyed a video and think future me might enjoy rewatching
         | the video - I download it rather than trust that I can still
         | watch it in the future. Sites have been taken offline
         | frequently enough that I download things. News sites keep
         | restructuring and breaking all their old article links so I
          | download the articles locally. JP artists are so notorious for
          | deleting their entire accounts and restarting under a new alias
          | that I routinely archive entire Pixiv/Twitter accounts if I
          | like their art, as there is no guarantee it will still be there
          | to enjoy the next day.
         | 
         | It all adds up and I'm approaching 2 million well-organized and
         | (mostly) tagged media files in my Hydrus client [0]. I have
         | many scripts to automate downloading and tagging content for
         | these purposes. I very, very rarely delete things. My most
         | frequent reason for deleting anything is "found in higher
         | quality" which conceptually isn't _really_ deleting.
         | 
         | Until storage costs become unreasonable I don't see my habits
         | changing anytime soon. On the contrary - storage keeps getting
         | cheaper and cheaper and new formats keep getting created to
         | encode data more and more efficiently.
         | 
         | [0] https://hydrusnetwork.github.io/hydrus/index.html
        
         | denkmoon wrote:
         | avoid giving disney money
        
         | complex1314 wrote:
         | Versioned datasets for machine learning.
        
       | fiddlerwoaroof wrote:
       | This sounds to me like it's just a matter of luck and not really
       | a model to be imitated.
        
         | louwrentius wrote:
         | It's likely that you are right and that I misjudged the
         | likelihood of this result being special.
         | 
          | You can still imitate the model to save money on power, but your
          | drives may not last longer; there's indeed no evidence for that.
        
       | turnsout wrote:
       | I've heard the exact opposite advice (keep the drives running to
       | reduce wear from power cycling).
       | 
       | Not sure what to believe, but I like having my ZFS NAS running so
       | it can regularly run scrubs and check the data. FWIW, I've run my
       | 4 drive system for 10 years with 2 drive failures in that time,
       | but they were not enterprise grade drives (WD Green).
        
         | louwrentius wrote:
         | Hard drives are often configured to spin down when idle for a
         | certain time. This can cause many spinups and spindowns per
          | day. So I don't buy this at all. But I don't have supporting
          | evidence to back up this notion.
        
           | turnsout wrote:
           | I think NAS systems in particular are often configured to
           | prevent drives from spinning down
        
           | max-ibel wrote:
           | There seems to be a huge difference between spin-down while
            | NAS is up vs shutting the whole NAS down and restarting it.
            | When I start my NAS, it takes a bunch of time to be back up: it
            | seems to do a lot of checking on / syncing of the drives and
            | puts a fair amount of load on them (the same is true for the
            | CPU; just look at your CPU load right after startup).
            | 
            | OTOH, when the NAS spins up a single disk again, I haven't
            | noticed any additional load. Presumably, the read operation
           | just waits until the disk is ready.
        
           | Wowfunhappy wrote:
           | > Hard drives are often configured to spin down when idle for
           | a certain time. This can cause many spinups and spindowns per
           | day.
           | 
            | I was under the impression that this _was_, in fact, known
           | to reduce drive longevity! It is done anyway in order to save
           | power, but the reliability tradeoff is known.
           | 
           | No idea where I read that though, I thought it was "common
           | knowledge" so maybe I'm wrong.
        
         | Dalewyn wrote:
         | >Not sure what to believe
         | 
         | Keep them running.
         | 
         | Why?:
         | 
         | * The read/write heads experience literally next to no wear
         | while they are floating above the platters. They physically
         | land onto shelves or onto landing zones on the platters
         | themselves when turned off; landing and takeoff are by far the
         | most wear the heads will suffer.
         | 
         | * Following on the above, in the worst case the read/write
         | heads might be torn off during takeoff due to stiction.
         | 
         | * Bearings will last longer; they might also seize up if left
         | stationary for too long. Likewise the drive motor.
         | 
         | * The rush of current when turning on is an electrical
         | stressor, no matter how minimal.
         | 
         | The only reasons to turn your hard drives off are to save
         | power, reduce noise, or transport them.
        
           | tomxor wrote:
           | > Keep them running [...] Bearings will last longer; they
           | might also seize up if left stationary for too long. Likewise
           | the drive motor.
           | 
           | All HDD failures I've ever seen in person (5 across 3
            | decades) were bearing failures, in machines that were almost
           | always on with drives spun up. It's difficult to know for
           | sure without proper A-B comparisons, but I've never seen a
           | bearing failure in a machine where drives were spun down
           | automatically.
           | 
           | It also seems intuitive that for mechanical bearings the
           | longer they are spun up the greater the wear and the greater
           | the chance of failure.
        
             | manwe150 wrote:
              | I think I have lost half a dozen hard drives (and a couple of
              | DVD-RW drives) over the decades because they sat in a box
              | for a couple of years on a shelf. (I recall that one came
              | back to life with a higher-amperage 12V supply, but only long
              | enough to copy off most of the data.)
        
               | mulmen wrote:
               | My experience with optical drives is similar to ink jet
               | printers. They work once when new then never again.
        
           | telgareith wrote:
           | Citations needed.
           | 
           | Counterpoints for each: Heads don't suffer wear when parking.
           | The armature does.
           | 
            | If the platters are not spinning fast enough, or the air
            | density is too low, the heads will crash into the sides of
            | the platters.
            | 
            | The main wear on platter bearings is vibration; it takes an
            | extremely long time for the lube to "gum up," if that's still
            | a thing at all. I suspect it used to happen because they were
            | petroleum distillate lubes, so the shorter chains would
            | evaporate/sublimate, leaving longer, more viscous chains, or
            | would outright polymerize.
            | 
            | With fully synthetic PAO oils and other options, they won't
            | do that anymore.
            | 
            | What inrush? They're polyphase steppers. The only reason for
            | inrush is that the engineers didn't think it'd affect
            | lifetime.
            | 
            | Counter: turn your drives off. The saved power of 8 drives
            | being off half the day easily totals $80 a year - enough to
            | replace all but the highest-capacity drives.
        
           | akira2501 wrote:
           | Using power creates heat. Thermal cycles are never good.
           | Heating parts up and cooling them down often reduces their
           | life.
        
           | bigiain wrote:
           | > The only reasons to turn your hard drives off are to save
           | power, reduce noise, or transport them.
           | 
           | One reason some of my drives get powered down 99+% of the
            | time is that it's a way to guard against the whole network
            | getting cryptolockered. I have a weekly backup run by a
            | script that powers up a pair of RAID1 USB drives, does an
            | incremental no-delete backup, then unmounts and powers them
            | back down again. Even in a busy week they're rarely running
            | for more than an hour or two. I'd have to get unlucky enough
            | for the powerup script to not detect being cryptolockered
            | (it checks md5 hashes of a few "canary files") and to power
            | up the backup drives anyway. I figure that's a worthwhile
           | reason to spin them down weekly...
        
           | philjohn wrote:
           | Yes - although it's worth bearing in mind the number of
           | load/unload cycles a drive is rated for over its lifetime.
           | 
           | In the case of the IronWolf NAS drives in my home server,
           | that's 600,000.
           | 
           | I spin the drives down after 20 minutes of no activity, which
           | I feel is a good balance between having them be too thrashy
            | and saving energy. After 3 years I'm at about 60,000
            | load/unload cycles.
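
            At that pace the rated budget leaves plenty of headroom; a quick
            back-of-the-envelope check in Python, using only the figures
            from the comment above:

                rated_cycles = 600_000        # lifetime load/unload rating
                cycles_per_year = 60_000 / 3  # ~20,000 observed per year
                print(rated_cycles / cycles_per_year)  # ~30 years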
        
         | CTDOCodebases wrote:
         | I think a lot of the advice around keeping the drives running
         | is about avoiding wear caused by spin downs and startups i.e.
         | keeping the "Start Stop Cycles" low.
         | 
          | There's a difference between spinning a drive up/down once or
          | twice a day and spinning it down every 15 minutes or less.
          | 
          | Also, WD Green drives are not recommended for NAS usage. I know
          | in the past they used to park the read/write head every few
          | seconds or so, which is fine if data is being accessed
          | infrequently, but on a server that accesses data continuously
          | this can result in continuous wear, which leads to premature
          | failure.
        
           | larusso wrote:
            | There used to be some tutorials going around about flashing the
            | firmware to turn Greens into Reds, I believe, which simply
            | disables the head parking.
        
           | cm2187 wrote:
           | Agree. I do weekly backups, the backup NAS is only switched
           | on and off 52 times a year. After 5-6 years the disks are
            | probably close to new in terms of usage vs disks that have
           | been running continuously over that same period.
           | 
           | Which leads to another strategy which is to swap the primary
           | and the backup after 5 years to get a good 10y out of the two
           | NAS.
        
         | foobarian wrote:
         | > regularly run scrubs and check the data
         | 
         | Does this include some kind of built-in hash/checksum system to
         | record e.g. md5 sums of each file and periodically test them? I
         | have a couple of big drives for family media I'd love to
         | protect with a bit more assurance than "the drive did not
         | fail".
        
           | Filligree wrote:
           | It's ZFS, so that's built-in. A scrub does precisely that.
        
           | adastra22 wrote:
           | Yes, zfs includes file-level checksums.
        
             | giantrobot wrote:
             | Block level checksums.
        
             | jclulow wrote:
             | This is not strictly accurate. ZFS records checksums of the
             | records of data that make up the file storage. If you want
             | an end to end file-level checksum (like a SHA-256 digest of
             | the contents of the file) you still need to layer that on
             | top. Which is not to say it's bad, and it's certainly
             | something I rely on a lot, but it's not quite the same!
        
           | ghostly_s wrote:
           | https://en.wikipedia.org/wiki/ZFS?wprov=sfti1#Resilvering_an.
           | ..
        
           | Cyph0n wrote:
           | Yep, ZFS reads everything in the array and validates
           | checksums. ZFS (at least on Linux) ships with scrub systemd
           | timers: https://openzfs.github.io/openzfs-
           | docs/man/master/8/zpool-sc...
        
         | bongodongobob wrote:
          | This is completely dependent on access frequency. Do you have a
         | bunch of different people accessing many files frequently? Are
         | you doing frequent backups?
         | 
         | If so then yes, keeping them spinning may help improve lifespan
         | by reducing frequent disk jerk. This is really only applicable
         | when you're at a pretty consistent high load and you're trying
         | to prevent your disks from spinning up and down every few
         | minutes or something.
         | 
         | For a homelab, you're probably wasting way more money in
         | electricity than you are saving in disk maintenance by leaving
          | your disks spinning.
        
       | mvanbaak wrote:
        | The 'secret' is not that you turn them off. It's simply luck.
        | 
        | I have 4TB HGST drives running 24/7 for over a decade. OK, not
        | 24 hours a day but 8, and also 0 failures. But I'm also lucky,
        | like you. Some of the people I know have several RMAs with the
        | same drives, so there's that.
        | 
        | My main question is: What is it that takes 71TB but can be turned
        | off most of the time? Is this the server where you store backups?
        
         | louwrentius wrote:
         | It can be luck, but with 24 drives, it feels very lucky.
          | Somebody with proper statistics knowledge can probably
          | calculate, with a guesstimated 1% yearly failure rate, how
          | likely it would be to have all 24 drives remaining.
         | 
         | And remember, my previous NAS with 20 drives also didn't have
         | any failures. So N=44, how lucky must I be?
         | 
         | It's for residential usage, and if I need some data, I often
         | just copy it over 10Gbit to a system that uses much less power
         | and this NAS is then turned off again.
        
           | the_gorilla wrote:
           | We don't really have to guess. Backblaze posted their stats
           | for 4 TB HGST drives for 2024, and of their 10,000 drives, 5
           | failed. If OP's 2014 4 TB HGST drives are anything like this,
           | then this is just snake oil and magic rituals and it doesn't
           | really matter what you do.
        
             | louwrentius wrote:
             | 5 drives failed in Q1 2024. 8 died in Q2. That is still a
             | very low failure rate.
        
               | renewiltord wrote:
               | Drives have a bathtub curve, but if you want you can be
               | conservative and estimate first year failure rates
                | throughout. So that's p = 5/10000 for drive failure. So the
                | chance of no failure per year (because of our assumption)
                | is 1-p, and the chance of no failure over ten years is
                | (1-p)^10, or about 99.5%.
        
               | louwrentius wrote:
               | Is that for one drive, or for all 24 drives to survive 10
               | years?
        
               | dn3500 wrote:
               | That's for one drive. For all 24 it's about 88.7%.
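
              A quick check of these figures, as a minimal Python sketch. It
              assumes a constant annual failure rate and independent drives
              (a simplification; real drives follow a bathtub curve), using
              the 5-in-10,000 figure above and the 1% guesstimate mentioned
              upthread:

                  def survival(p_annual, drives, years):
                      # chance that every drive survives the whole period
                      return (1 - p_annual) ** (drives * years)

                  print(survival(0.0005, 1, 10))   # ~0.995, one drive
                  print(survival(0.0005, 24, 10))  # ~0.887, all 24 drives
                  print(survival(0.01, 24, 10))    # ~0.090, at a 1% rate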
        
             | toast0 wrote:
             | > If OP's 2014 4 TB HGST drives are anything like this,
             | then this is just snake oil and magic rituals and it
             | doesn't really matter what you do.
             | 
             | It might matter what you do, but we only have public data
             | for people in datacenters. Not a whole lot of people with
             | 10,000 drives are going to have them mostly turned off, and
             | none of them shared their data.
        
             | formerly_proven wrote:
             | Those are different drives though, they're MegaScale DC
             | 4000 while OP is using 4 TB Deskstars. Not sure if they're
             | basically the same (probably). I've also had a bunch of
             | these 4TB Megascale drives and absolutely no problems
              | whatsoever in about 10 years as well. They run very cool
              | (I think they're 5400 rpm, not 7200 rpm).
             | 
             | The main issue with drives like these is that 4 TB is just
             | so little storage compared to 16-20 TB class drives, it
             | kinda gets hard to justify the backplane slot.
        
           | manquer wrote:
           | The failure rate is not truly random with a nice normal
           | distribution of failures over time. There are sometimes
            | higher rates in specific batches, or a whole batch can start
            | failing all at once, etc.
            | 
            | Backblaze reports are always interesting insights into how
            | consumer drives behave under constant load.
        
           | dastbe wrote:
            | it's (1-p)^(24x10), where p is the drive failure rate per year
            | (assuming it doesn't go up over time). So at 1% that's about
            | 9%, or a 1-in-10 chance of this result. Not exactly great, but
            | not impossible.
            | 
            | The Backblaze rates are all over the place, but it does
            | appear they have drives at this rate or lower:
           | https://www.backblaze.com/blog/backblaze-drive-stats-
           | for-q2-...
        
             | louwrentius wrote:
              | I can see that; my assumption that my result (no drive
              | failures over 10 years) was rare is wrong. So I've updated my
             | blog about that.
        
           | CTDOCodebases wrote:
            | I'm curious what the "Start/Stop cycle count" is on these
           | drives and roughly how many times per week/day you are
           | accessing the server.
        
         | monocasa wrote:
         | In fact, the conventional wisdom for a long time was to not
         | turn them off if you want longevity. Bearings seize when cold
         | for instance.
        
         | ryanjshaw wrote:
         | > What is it that takes 71TB but can be turned off most of the
         | time?
         | 
         | Still waiting for somebody to explain this to me as well.
        
           | leptons wrote:
           | I have a 22TB RAID10 system out in my detached garage that
           | works as an "off-site" backup server for all my other
           | systems. It stays off most of the time. It's on when I'm
           | backing up data to it, or if it's running backups to LTO
           | tape. Or it's on when I'm out in the garage doing whatever
           | project, I use it to play music and look up stuff on the web.
           | Otherwise it's off, most of the time.
        
       | naming_the_user wrote:
       | For what it's worth this isn't that uncommon. Most drives fail in
       | the first few years, if you get through that then annualized
       | failure rates are about 1-2%.
       | 
       | I've had the (small) SSD in a NAS fail before any of the drives
       | due to TBW.
        
       | rkagerer wrote:
       | I have a similar-sized array which I also only power on nightly
       | to receive backups, or occasionally when I need access to it for
       | a week or two at a time.
       | 
       | It's a whitebox RAID6 running NTFS (tried ReFS, didn't like it),
       | and has been around for 12+ years, although I've upgraded the
       | drives a couple times (2TB --> 4TB --> 16TB) - the older Areca
       | RAID controllers make it super simple to do this. Tools like Hard
       | Disk Sentinel are awesome as well, to help catch drives before
       | they fail.
       | 
       | I have an additional, smaller array that runs 24x7, which has
       | been through similar upgrade cycles, plus a handful of clients
       | with whitebox storage arrays that have lasted over a decade.
       | Usually the client ones are more abused (poor temperature control
       | when they delay fixing their serveroom A/C for months but keep
       | cramming in new heat-generating equipment, UPS batteries not
       | replaced diligently after staff turnover, etc...).
       | 
       | Do I notice a difference in drive lifespan between the ones that
       | are mostly-off vs. the ones that are always-on? Hard to say. It's
       | too small a sample size and possibly too much variance in 'abuse'
       | between them. But definitely seen a failure rate differential
       | between the ones that have been maintained and kept cool, vs.
       | allowed to get hotter than is healthy.
       | 
       | I _can_ attest those 4TB HGST drives mentioned in the article
        | were tanks. Anecdotally, they're the most reliable ones I've
       | ever owned. And I have a more reasonable sample size there as I
       | was buying dozens at a time for various clients back in the day.
        
         | louwrentius wrote:
         | I bought the HGSTs specifically because they showed good stats
          | in those Backblaze drive stats reports that they had just
          | started to publish back then.
        
       | lostmsu wrote:
       | I have a mini PC + 4x external HDDs (I always bought used) on
       | Windows 10 with ReFS since probably 2016 (recently upgraded to
       | Win 11), maybe earlier. I don't bother powering off.
       | 
        | The only time I had problems was when I tried to add a 5th disk
        | using a USB hub, which caused drives attached to the hub to get
        | disconnected randomly under load. This actually happened with 3
        | different hubs, so I have since stopped trying to expand that
        | monstrosity and just replace drives with larger ones instead.
        | Don't use hubs for storage; the majority of them are shitty.
       | 
       | Currently ~64TiB (less with redundancy).
       | 
       | Same as OP. No data loss, no broken drives.
       | 
       | A couple of years ago I also added an off-site 46TiB system with
       | similar software, but a regular ATX with 3 or 4 internal drives
       | because the spiderweb of mini PC + dangling USBs + power supplies
       | for HDDs is too annoying.
       | 
       | I do weekly scrubs.
       | 
       | Some notes: https://lostmsu.github.io/ReFS/
        
       | anjel wrote:
        | Can't help but wonder how much electricity would have been
       | consumed if you had left it on 24/7 for ten years...
        
         | louwrentius wrote:
         | On the original blog it states that the machine used 200W idle.
          | That's 4.8 kWh a day, or 17,520 kWh over 10 years. At around
          | 0.30 euro per kWh, that's 5K+ euro if I'm not mistaken.
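
          A minimal sketch of the arithmetic, using only the figures above
          (200 W idle, roughly 0.30 euro per kWh):

              idle_watts = 200
              kwh_per_day = idle_watts * 24 / 1000    # 4.8 kWh per day
              kwh_ten_years = kwh_per_day * 365 * 10  # 17,520 kWh
              print(kwh_ten_years * 0.30)             # ~5,256 euro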
        
         | nine_k wrote:
          | Why wonder? Let's approximate.
         | 
         | A typical 7200 rpm disk consumes about 5W when idle. For 24
         | drives, it's 120W. Rather substantial, but not an electric
         | kettle level. At $0.25 / kWh, it's $0.72 / day, or about $22 /
         | mo, or slightly more than $260 / year. But this is only the
         | disks; the CPU + mobo can easily consume half as much on
         | average, so it would be more like $30-35 / mo.
         | 
         | And if you have electricity at a lower price, the numbers
         | change accordingly.
         | 
         | This is why my ancient NAS uses 5400 RPM disks, and a future
         | upgrade could use even slower disks if these were available.
         | The reading bandwidth is multiplied by the number of disks
         | involved.
        
       | louwrentius wrote:
       | Because a lot of people pointed out that it's not that unlikely
       | that all 24 drives survive without failure over 10 years, given
       | the low failure rate reported by Backblaze Drive Stats Reports,
        | I've also updated the article to reflect that.
        
       | Jedd wrote:
       | Surprised to not find 'ecc' on that page.
       | 
       | I know it's not a guarantee of no-corruption, and ZFS without ECC
       | is probably no more dangerous than any other file system without
       | ECC, but if data corruption is a major concern for you, and
        | you're building out a _pretty hefty_ system like this, I can't
       | imagine not using ECC.
       | 
       | Slow on-disk data corruption resulting from gradual and near-
       | silent RAM failures may be like doing regular 3-2-1 backups --
       | you either mitigate against the problem because you've been stung
       | previously, or you're in that blissful pre-sting phase of your
       | life.
       | 
       | EDIT: I found TFA's link to the original build out - and happily
       | they are in fact running a Xeon with ECC. Surprisingly it's a
       | 16GB box (I thought ZFS was much hungrier on the RAM : disk
       | ratio.) Obviously it hasn't helped for physical disk failures,
       | but the success of _the storage array_ owes a lot to this
       | component.
        
         | louwrentius wrote:
         | The system is using ECC and I specifically - unrelated to ZFS -
         | wanted to use ECC memory to reduce risk of data/fs corruption.
         | I've also added 'ecc' to the original blog post to clarify.
         | 
         | Edit: ZFS for home usage doesn't need a ton of RAM as far as
         | I've learned. There is the 1 GB of RAM per 1TB of storage rule
         | of thumb, but that was for a specific context. Maybe the ill-
         | fated data deduplication feature, or was it just to sustain
         | performance?
        
           | rincebrain wrote:
           | It was a handwavey rule of estimation for dedup, handwavey
           | because dedup scales on number of records, which is going to
           | vary wildly by recordsize.
        
             | InvaderFizz wrote:
             | Additionally unless it's changed in the last six years, you
             | should pretend ZFS dedupe doesn't exist.
        
               | rincebrain wrote:
               | Not in a stable release yet, but check out
               | https://github.com/openzfs/zfs/discussions/15896 if you
               | have a need for that.
        
           | Jedd wrote:
           | Thanks, and all good - it was my fault for not following the
           | link in this story to your post about the actual build,
           | before starting on my mini-rant.
           | 
           | I'd heard the original ZFS memory estimations were somewhat
           | exuberant, and recommendations had come down a lot since the
           | early days, but I'd imagine given your usage pattern -
           | powered on periodically - a performance hit for whatever
           | operations you're doing during that time wouldn't be
           | problematic.
           | 
           | I used to use mdadm for software RAID, but for several years
           | now my home boxes are all hardware RAID. LVM2 provides the
           | other features I need, so I haven't really ever explored zfs
           | as a replacement for both - though everyone I know that uses
           | it, loves it.
        
         | hinkley wrote:
          | I accidentally unplugged my RAID 5 array and thought I had
          | damaged the RAID card. Hours after boot I'd get problems. It
          | turned out I had glitched a RAM chip and the array was picking
          | it up as disk corruption.
        
         | Filligree wrote:
         | It's difficult as a home user to find ECC memory, harder to
         | make sure it actually works in your hardware configuration, and
         | near-impossible to find ECC memory that doesn't require lower
         | speeds than what you can get for $50 on amazon.
         | 
         | I would very much like to put ECC memory in my home server, but
         | I couldn't figure it out this generation. After four hours I
         | decided I had better things to do with my time.
        
           | Jedd wrote:
           | Indeed. I'd started to add an aside to the effect of 'ten
            | years ago it was probably _easier_ to go ECC'. I'll add it
            | here instead.
            | 
            | A decade ago if you wanted ECC your choice was basically
            | Xeon, and all Xeon motherboards would accept ECC.
            | 
            | I agree that these days it's much more complex, since you are
            | ineluctably going to get sucked into the despair-spiral of
            | trying to work out what combination of Ryzen + motherboard +
            | ECC RAM will give you _actual, demonstrable_ ECC (with
           | correction, not just detection).
        
             | rpcope1 wrote:
             | Sounds like the answer is to just buy another Xeon then,
             | even if it's a little older and maybe secondhand. I think
             | there's a reason the vast majority of Supermicro
             | motherboards are still just Intel only.
        
               | Filligree wrote:
               | You might also need performance. Or efficiency.
        
       | throw0101c wrote:
       | > _This NAS is very quiet for a NAS (video with audio)._
       | 
        | Big (large-radius) fans can move a lot of air even at low RPM,
        | and they can be much more energy efficient.
       | 
       | Oxide Computer, in one of their presentations, talks about using
       | 80mm fans, as they are quiet and (more importantly) don't use
        | much power. They observed that, in other servers, as much as 25%
        | of the power went just to powering the fans, versus ~1% in
        | theirs:
       | 
       | * https://www.youtube.com/shorts/hTJYY_Y1H9Q
       | 
       | * https://www.youtube.com/watch?v=4vVXClXVuzE
        
         | louwrentius wrote:
         | +1 for mentioning 0xide. I love that they went this route and
         | that stat is interesting. I hate the typical DC high RPM small
         | fan whine.
         | 
         | I also hope that they do something 'smart' when they control
         | the fan speed ;-)
        
           | mkeeter wrote:
           | It's moderately smart - there's a PID loop with per-component
           | target temperatures, so it's trying not to do more work than
           | necessary.
           | 
           | (source: I wrote it, and it's all published at https://github
           | .com/oxidecomputer/hubris/tree/master/task/the... )
           | 
           | We also worked with the fan vendor to get parts with a lower
           | minimum RPM. The stock fans idle at about 5K RPM, and ours
           | idle at 2K, which is already enough to keep the system cool
           | under light loads.
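
            For readers who want the general shape of such a loop, here is a
            minimal, illustrative Python sketch (not Oxide's implementation,
            which is the Rust code linked above). read_temp_c() and
            set_fan_duty() are hypothetical stand-ins for whatever sensor
            and PWM interfaces a given system exposes:

                import time

                def pid_fan_loop(read_temp_c, set_fan_duty,
                                 target_c=45.0, kp=2.0, ki=0.1,
                                 kd=0.5, period_s=5.0):
                    integral = 0.0
                    prev_err = 0.0
                    while True:
                        err = read_temp_c() - target_c  # >0 when too hot
                        integral += err * period_s
                        deriv = (err - prev_err) / period_s
                        prev_err = err
                        duty = kp * err + ki * integral + kd * deriv
                        # clamp between an idle floor and full speed
                        set_fan_duty(max(20.0, min(100.0, duty)))
                        time.sleep(period_s)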
        
             | louwrentius wrote:
             | Ha! thanks a lot for sharing, love it. Nice touch to use
             | low idle RPM fans.
             | 
             | Same thing for my ancient NAS: after boot, the fans run at
             | idle for hours and the PID controller just doesn't have to
             | do anything at all.
        
         | sss111 wrote:
          | Just curious, are you associated with them? These are very
          | obscure YouTube videos :D
         | 
         | Love it though, even the reduction in fan noise is amazing. I
         | wonder why nobody had thought of it before, it seems so simple.
        
           | throw0101c wrote:
           | > _just curious, are you associated with them, as these are
           | very obscure youtube videos :D_
           | 
           | Unassociated, but tech-y videos are often recommended to me,
           | and these videos got pushed to me. (I have viewed other,
            | unrelated Tech Day videos, so that's probably why I got that
            | short. I'm also an old Solaris admin, so I'm aware of Cantrill,
            | especially his rants.)
           | 
           | > _Love it though, even the reduction in fan noise is
           | amazing. I wonder why nobody had thought of it before, it
           | seems so simple._
           | 
            | Depends on the size of the server: you can't really fit bigger
            | fans in 1U or even 2U pizza boxes. And for general-purpose
            | servers, I'm not sure how many 4U+ systems are purchased --
            | perhaps more now that GPU cards may be a popular add-on.
           | 
           | For a while chassis systems (e.g., HP c7000) were popular,
           | but I'm not sure how they are nowadays.
        
             | baby_souffle wrote:
              | > I'm not sure how many 4U+ systems are purchased -- perhaps
              | more now that GPU cards may be a popular add-on.
             | 
              | Going from what I see at eCycle places, 4U dried up years
              | ago. Everything is either 1U or 2U, or massive blade
              | receptacles (10+ U).
              | 
              | We (the home-lab-on-a-budget people) may see a return to 4U
              | now that GPUs are in vogue, but I'd bet that the hyperscalers
              | are going to drive that back down to something that'll be 3U
              | with water cooling or so over the longer term.
              | 
              | We may also see similar with storage systems; it's only
              | a matter of time before SSDs get "close enough" to spinning
              | rust on the $/gig/unit-volume metrics.
        
         | daemonologist wrote:
         | Interesting - I'm used to desktop/workstation hardware where
          | 80mm is the _smallest_ standard fan (aside from 40mm's in the
         | near-extinct Flex ATX PSU), and even that is kind of rare.
         | Mostly you see 120mm or 140mm.
        
           | mustache_kimono wrote:
           | > 80mm is the smallest standard fan (aside from 40mm's in the
           | near-extinct Flex ATX PSU)
           | 
           | Those 40mm PSU fans, and the PSU, are what they are replacing
           | with a DC bus bar.
        
             | throw0101c wrote:
             | > _Those 40mm PSU fans, and the PSU, are what they are
             | replacing with a DC bus bar._
             | 
             | DC (power) in the DC (building) isn't anything new: the
             | telco space has used -48V (nominal) power for decades. Do a
             | search for (say) "NEBS DC power" and you'll get a bunch of
             | stuff on the topic.
             | 
              | Lots of chassis-based systems centralized the AC-DC power
             | supplies.
        
           | globular-toast wrote:
           | Yeah. In a home environment you should absolutely use desktop
           | gear. I have 5 80mm and one 120mm PWM fans in my NAS and they
           | are essentially silent as they can't be heard over the sound
           | of the drives (which is essentially the noise floor for a
           | NAS).
           | 
           | It is necessary to use good PWM fans though if concerned
           | about noise as cheaper ones can "tick" annoyingly. Two brands
           | I know to be good in this respect are Be Quiet! and Noctua.
            | DC fan control would in theory be better, but most motherboards
            | don't support it (it would require an external controller and
            | thermal sensors, I think).
        
         | chiph wrote:
         | My Synology uses two 120mm fans and you can barely hear them
         | (it's on the desk next to me). I'm sold on the idea of moving
         | more volume at less speed.
         | 
         | (which I understand can't happen in a 1U or 2U chassis)
        
       | bearjaws wrote:
       | I feel like 10 years is when my drives started failing the most.
       | 
        | I run an 8x8TB array with raidz2 redundancy. Initially it was an
        | 8x2TB array, but drives started failing once every 4 months;
        | after 3 drives failed I upgraded the remaining ones.
       | 
       | Only downside to hosting your own is power consumption. OS
       | upgrades have been surprisingly easy.
        
       | leighleighleigh wrote:
       | Regarding the custom PID controller script: I could have sworn
       | the Linux kernel had a generic PID controller available as a
        | module, which you could set up via the device tree, but I can't
       | seem to find it! (grepping for 'PID' doesn't provide very helpful
       | results lol).
       | 
       | I think it was used on nVidia Tegra systems, maybe? I'd be
       | interested to find it again, if anyone knows. :)
        
         | ewalk153 wrote:
         | Maybe related to this?
         | 
         | https://github.com/torvalds/linux/blob/master/tools/thermal/...
        
       | rnxrx wrote:
       | In my experience the environment where the drives are running
       | makes a huge difference in longevity. There's a ton more
       | variability in residential contexts than in data center (or even
       | office) space. Potential temperature and humidity variability is
       | a notable challenge but what surprised me was the marked effect
       | of even small amounts of dust.
       | 
       | Many years ago I was running an 8x500G array in an old Dell
       | server in my basement. The drives were all factory-new Seagates -
       | 7200RPM and may have been the "enterprise" versions (i.e. not
       | cheap). Over 5 years I ended up averaging a drive failure every 6
       | months. I ran with 2 parity drives, kept spares around and RMA'd
       | the drives as they broke.
       | 
       | I moved houses and ended up with a room dedicated to lab stuff.
       | With the same setup I ended up going another 5 years without a
       | single failure. It wasn't a surprise that the new environment was
       | better, but it was surprising how _much_ better a cleaner, more
       | stable environment ended up being.
        
         | stavros wrote:
         | How does dust affect things? The drives are airtight.
        
           | kenhwang wrote:
           | They're airtight now (at the high end or enterprise level).
            | They weren't airtight until not very long ago, and had
            | filters to regulate the air exchange.
        
             | Kirby64 wrote:
             | They're not airtight in the true sense (besides the helium
             | filled ones nowadays), but every drive made in the past...
             | 30? 40 years is airtight in the sense that no dust can ever
             | get into the drive. There's a breather hole somewhere (with
             | a big warning to not cover it!) to equalize pressure, and a
             | filter that doesn't allow essentially any particles in.
        
               | kenhwang wrote:
               | No dust is supposed to get "in" the drive, but dust can
               | very well clog the breather hole and cause pressure
               | issues that could kill the drive.
        
               | Kirby64 wrote:
               | Unless you're moving the altitude of the drive
               | substantially after it's already clogged, how would this
               | happen? There's no air exchange on hard drives.
        
               | kenhwang wrote:
               | Unless your drives are in a perfectly controlled
               | temperature, humidity, and atmospheric pressure
               | environment, those will all impact the internal pressure.
               | Temperature being the primary concern because drives do
               | get rather warm internally while operating.
        
               | Kirby64 wrote:
               | Sure, it has some impact, but we're not talking about
               | anything too crazy. And that also assumes full total
               | clogging of all pores... which is unlikely to happen. You
               | won't have perfect sealing and pressure will just
               | equalize.
        
           | deafpolygon wrote:
            | Everything else isn't. The dust can get into power supplies
           | and cause irregularities.
        
         | Loughla wrote:
         | Are your platters open to air? Or was it the cooling system?
         | I'm confused.
        
         | ylee wrote:
         | >Many years ago I was running an 8x500G array in an old Dell
         | server in my basement. The drives were all factory-new Seagates
         | - 7200RPM and may have been the "enterprise" versions (i.e. not
         | cheap). Over 5 years I ended up averaging a drive failure every
         | 6 months. I ran with 2 parity drives, kept spares around and
         | RMA'd the drives as they broke.
         | 
         | Hah! I had a 16x500GB Seagate array and also averaged an RMA
         | every six months. I think there was a firmware issue with that
         | generation.
        
         | kalleboo wrote:
         | A drive failure every 6 months almost sounds more like dirty
         | power than dust, I've always kept my NAS/file servers in dusty
         | residential environments (I have a nice fuzzy gray Synology
         | logo visible right now) and never seen anything like that
        
           | bitexploder wrote:
           | Drives are sealed anyway. Humidity maybe. Dust can't really
           | get in. Power or bad batch of drives.
        
             | rkagerer wrote:
             | Don't know the details, but dust could have been impeding
             | the effectiveness of his fans or clumping to create other
             | hotspots in the system (including in the PSU).
        
             | userbinator wrote:
             | Except for the helium-filled ones, they aren't sealed;
             | there is a very fine filter that equalises atmospheric
             | pressure. (This is also why they have a maximum operating
             | altitude --- the head needs a certain amount of atmospheric
             | pressure to float.)
        
               | aaronmdjones wrote:
               | Yup, this is why the label will say something along the
               | lines of "DO NOT COVER DRIVE HOLES".
        
               | bitexploder wrote:
               | How does the helium stay in if it is not sealed? I am not
               | familiar with hard drive construction, but helium is
               | notoriously good at escaping.
        
               | lorax wrote:
               | I think he meant in general drives aren't sealed, except
               | the helium ones are sealed.
        
               | bitexploder wrote:
               | Oh, I see. Makes sense. I wonder if dust really can
               | infiltrate a drive? Hmm.
        
         | daniel-s wrote:
         | Does dust matter for SSD drives?
        
           | earleybird wrote:
           | Only when checking for finger prints :-)
        
         | sega_sai wrote:
         | It is most likely the model's fault. I once had a machine with
          | 36 Seagate ST3000DM001 drives; they were failing almost once a
          | month -- see the annual failure rate here:
         | https://www.backblaze.com/blog/best-hard-drive-q4-2014/
        
         | mapt wrote:
         | "Do you think that's air you're breathing?"
         | 
         | This is no longer much of an issue with sealed, helium filled
         | drives, if it ever was.
        
       | 8n4vidtmkvmk wrote:
       | I run a similar but less sophisticated setup. About 18 TiB now,
       | and I run it 16 hours a day. I let it sleep 8 hours per night so
       | that it's well rested in the morning. I just do this on a cron
       | because I'm not clever enough to SSH into a turned off (and
       | unplugged!) machine.
       | 
       | 4 drives: 42k hours (4.7 years), 27k hours (3 years), 15k hours
       | (1.6 years), and the last drive I don't know because apparently
       | it isn't SMART.
       | 
       | 0 errors according to scrub process.
       | 
        | ... but I guess I can't claim 0 HDD failures. There have been 1 or
       | 2, but not for years now. Knock on wood. No data loss because of
       | mirroring. I just can't lose 2 in a pair. (Never run RAID5 BTW,
       | lost my whole rack doing that)
        
         | BenjiWiebe wrote:
         | Looks like you're quite clever actually, if you can get cron to
         | run on a powered off unplugged machine.
         | 
         | I think I'm missing something.
        
           | alanfranz wrote:
            | Some BIOSes and firmware support turning on at a certain time.
           | Maybe cron was a way to simplify.
        
           | bigiain wrote:
            | I use a wifi-controlled power point (wall outlet) to power up
            | and down a pair of RAID1 backup drives.
           | 
           | A weekly cronjob on another (always on) machine does some
           | simple tests (md5 checksums of "canary files" on a few
           | machines on the network) then powers up and mounts the
           | drives, runs an incremental backup, waits for it to finish,
           | then unmounts and powers them back down. (There's also a
           | double-check cronjob that runs 3 hours later that confirms
           | they are powered down, and alerts me if they aren't. My
           | incrementals rarely take more than an hour.)
        
           | 8n4vidtmkvmk wrote:
           | Just power, not unplugged. It's simply
           | 
           | 0 2 * * * /usr/sbin/rtcwake -m off -s 28800 # off from 2am to
           | 10am
           | 
           | "and unplugged" was referring to OP's setup, not mine
        
       | lvl155 wrote:
        | I have a similar approach but I don't use ZFS. It's a bit
        | superfluous, especially if you're using your storage periodically
        | (turning it on and off). I use redundant NVMe drives in two stages
        | and periodically save important data onto multiple HDDs (cold
        | storage). Worth noting: it's important to prune your data.
        | 
        | I also do not back up photos and videos locally. It's a major
        | headache, and they just take up a crap ton of space when Amazon
        | Prime will give you photo storage for free.
        | 
        | Anecdotally, the only drives that failed on me were enterprise-
        | grade HDDs. And they all failed within a year, in an always-on
        | system. I also think RAID is over-utilized and frankly a big
        | money pit outside of enterprise-level environments.
        
       | ed_mercer wrote:
       | > Losing the system due to power shenanigans is a risk I accept.
       | 
        | A UPS provides more than just that: it delivers constant energy
       | without fluctuations and thus makes your hardware last longer.
        
         | vunderba wrote:
          | Yeah, this definitely caused me to raise an eyebrow. A UPS covers
          | brownouts and obviously the occasional temporary power outage.
          | All those drives spinning at full speed suddenly coming to a
          | grinding halt as the power is cut, and you're
         | quibbling over a paltry additional 10 watts? I can only assume
         | that the data is not that important.
        
           | flemhans wrote:
           | As part of resilience testing I've been turning off our 24
           | drive backup drive array daily for two years, by flicking the
           | wall switch. So far nothing happened.
        
       | snvzz wrote:
       | >but for residential usage, it's totally reasonable to accept the
       | risk.
       | 
       | Polite disagree. Data integrity is the natural expectation humans
       | have from computers, and thus we should stick to filesystems with
       | data checksums such as ZFS, as well as ECC memory.
        
         | naming_the_user wrote:
         | I agree.
         | 
         | I think that the author may not have experienced these sorts of
         | errors before.
         | 
         | Yes, the average person may not care about experiencing a
         | couple of bit flips per year and losing the odd pixel or block
         | of a JPEG, but they will care if some cable somewhere or
         | transfer or bad RAM chip or whatever else manages to destroy a
         | significant amount of data before they notice it.
        
           | AndrewDavis wrote:
           | I had a significant data loss years ago.
           | 
           | I was young and only had a desktop, so all my data was there.
           | 
           | So I purchased a 300GB external usb drive to use for periodic
           | backup. It was all manual copy/paste files across with no
           | real schedule, but it was fine for the time and life was
           | good.
           | 
           | Over time my data grew and the 300GB drive wasn't large
           | enough to store it all. For a while some of it wasn't backed
           | up (I was young with much less disposable income).
           | 
           | Eventually I purchased a 500GB drive.
           | 
           | But what I didn't know is my desktop drive was dying. Bits
           | were flipping, a lot of them.
           | 
           | So when I did my first backup with the new drive I copied all
           | my data off my desktop along with the corruption.
           | 
           | It was months before I realised a huge amount of my files
           | were corrupted. By that point I'd wiped the old backup drive
           | to give to my Mum to do her own backups. My data was long
           | gone.
           | 
           | Once I discovered ZFS I jumped on it. It was the exact thing
           | that would have prevented this because I could have detected
           | the corruption when I purchased the new backup drive and did
           | the initial backup to it.
           | 
           | (I made up the drive sizes because I can't remember, but the
           | ratios will be about right).
        
             | 369548684892826 wrote:
             | There's something disturbing about the idea of silent data
             | loss; it totally undermines the peace of mind of having
             | backups. ZFS is good, but you can also just run rsync
             | periodically with the --checksum and --dry-run flags and
             | check the output for diffs.
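             | 
             | Something like this (paths are placeholders) lists any file
             | whose contents differ from the copy, without transferring
             | anything:
             | 
             | rsync -avcn --delete /data/ /backup/data/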
        
               | willis936 wrote:
               | It happens all the time. Have a plan, perform fire
               | drills. It's a lot of time and money, but there's no
               | feeling quite like unfucking yourself and getting your
               | lost, fragile data back.
        
               | lazide wrote:
               | The challenge with silent data loss is your backups will
               | eventually not have the data either - it will just be
               | gone, silently.
               | 
               | After having that happen a few times (pre-ZFS), I started
               | running periodic find | md5sum > log.txt type jobs and
               | keeping archives.
               | 
               | It's caught more than a few problems over the years, and
               | allows manual double checking even when using things like
               | ZFS. In particular, some tools/settings just aren't sane
               | to use to copy large data sets, and I only discovered
               | that when... some of it didn't make it to its
               | destination.
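               | 
               | Roughly this, in case anyone wants to copy it (the path
               | and log location are placeholders):
               | 
               | find /data -type f -exec md5sum {} + \
               |     > /var/log/md5sums/$(date +%F).log
               | 
               | Comparing against the previous log shows anything that
               | changed or disappeared since the last run.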
        
               | AndrewDavis wrote:
               | Absolutely, if you can't use a filesystem with checksums
               | (zfs, btrfs, bcachefs) then rsync is a great idea.
               | 
               | I think filesystem checksums have one big advantage vs
               | rsync. With rsync if there's a difference it isn't clear
               | which one is wrong.
        
               | bombcar wrote:
               | I have an MP3 file that still skips to this day, because
               | a few frames were corrupted on disk twenty years ago.
               | 
               | I could probably find a new copy of it online, but that
               | click is a good reminder about how backups aren't just
               | copies but have to be verified.
        
             | taneq wrote:
             | My last spinning hard drive failed silently like this, but
             | I didn't lose too much data... I think. The worst part is
             | not knowing.
        
         | mustache_kimono wrote:
         | > Data integrity is the natural expectation humans have from
         | computers
         | 
         | I've said it once, and I'll say it again: the only reason ZFS
         | isn't the norm is because we all once lived through a
         | primordial era when it didn't exist. No serious person
         | designing a filesystem today would say it's okay to misplace
         | your data.
         | 
         | Not long ago, on this forum, someone told me that ZFS is only
         | good because _it had no competitors_ in its space. Which is
         | kind of like saying the heavyweight champ is only good because
         | no one else could compete.
        
           | wolrah wrote:
           | To paraphrase, "ZFS is the worst filesystem, except for all
           | those other filesystems that have been tried from time to
           | time."
           | 
           | It's far from perfect, but it has no peers.
           | 
           | I spent many years stubbornly using btrfs and lost data
           | multiple times. Never once did the redundancy I had
           | supposedly configured actually do anything to help me. ZFS
           | has identified corruption caused by bad memory and a bad CPU
           | and let me know immediately which files were damaged.
        
           | Too wrote:
           | The reason ZFS isn't the norm is because it historically was
           | difficult to set up. Outside of NAS solutions, it's only
           | since Ubuntu 20.04 it has been supported out of the box on
           | any high profile customer facing OS. The reliability of the
            | early versions was also questionable, with high zsys CPU
            | usage and sometimes arcane commands needed to rebuild pools.
           | Anecdotally, I've had to support lots of friends with zfs
           | issues, never so with other file systems. The data always
           | comes back, it's just that it needs petting.
           | 
            | Earlier, there used to be a lot of fear around the license,
            | with Torvalds advising against its use, both for that reason
            | and for lack of maintainers. Now I believe that has mostly
            | been ironed out and should be less of an issue.
        
             | mustache_kimono wrote:
             | > The reason ZFS isn't the norm is because it historically
             | was difficult to set up. Outside of NAS solutions, it's
             | only since Ubuntu 20.04 it has been supported out of the
             | box on any high profile customer facing OS.
             | 
             | In this one very narrow sense, we are agreed, if we are
             | talking about Linux on root. IMHO it should also have been
             | virtually everywhere else. It should have been in MacOS,
             | etc.
             | 
             | However, I think your particular comment may miss the
             | forest for the trees. Yes, ZFS was difficult to set up for
             | Linux, because Linux people disfavored its use (which you
             | do touch upon later).
             | 
             | People sometimes imagine that purely technical
             | considerations govern the technical choices of remote
             | groups. However, I think when people say "all tech is
             | political" in the cultural-war-ing American politics sense,
             | they may be right, but they are absolutely right in the
             | small ball open source politics sense.
             | 
             | Linux communities were convinced not to include or build
             | ZFS support. Because licensing was a problem. Because btrfs
             | was coming and would be better. Because Linus said ZFS was
             | mostly marketing. So they didn't care to build support. Of
             | course, this was all BS or FUD or NIH, but it was what
              | happened, not that ZFS had new and different recovery tools,
             | or was less reliable in the arbitrary past. It was because
             | the Linux community engaged in its own (successful) FUD
             | campaign against another FOSS project.
        
             | zvr wrote:
              | Was there any change in the license that made you believe
              | that it should be less of an issue?
             | 
             | Or do you think people simply stopped paying attention?
        
               | Too wrote:
               | Canonical took a team of lawyers to deeply review the
               | license in 2016. It's beyond my legal skills to say if
                | the conclusion made it more or less of an issue; at least
               | the boundaries should now be more clear, for those who
               | understand these matters more.
               | 
               | https://canonical.com/blog/zfs-licensing-and-linux
               | 
               | https://softwarefreedom.org/resources/2016/linux-kernel-
               | cddl...
        
             | hulitu wrote:
             | > The reason ZFS isn't the norm is because it historically
             | was difficult to set up.
             | 
              | Has this changed? ZFS comes with a BSD view of the world
              | (i.e. slices). It also needed a sick amount of RAM to
              | function properly.
        
           | KMag wrote:
           | > No serious person designing a filesystem today would say
           | it's okay to misplace your data.
           | 
           | Former LimeWire developer here... the LimeWire splash screen
           | at startup was due to experiences with silent data
           | corruption. We got some impossible bug reports, so we created
           | a stub executable that would show a splash screen while
           | computing the SHA-1 checksums of the actual application DLLs
           | and JARs. Once everything checked out, that stub would use
           | Java reflection to start the actual application. After moving
           | to that, those impossible bug reports stopped happening. With
           | 60 million simultaneous users, there were always some of them
           | with silent disk corruption that they would blame on
           | LimeWire.
           | 
           | When Microsoft was offering free Win7 pre-release install
           | ISOs for download, I was having install issues. I didn't want
           | to get my ISO illegally, so I found a torrent of the ISO, and
           | wrote a Python script to download the ISO from Microsoft, but
           | use the torrent file to verify chunks and re-download any
            | corrupted chunks. Something was very wrong on some device
            | between my desktop and Microsoft's servers, but the script
            | eventually got a non-corrupted ISO.
           | 
           | It annoys me to no end that ECC isn't the norm for all
           | devices with more than 1 GB of RAM. Silent bit flips are just
           | not okay.
           | 
           | Edit: side note: it's interesting to see the number of
           | complaints I still see from people who blame hard drive
           | failures on LimeWire stressing their drives. From very early
           | on, LimeWire allowed bandwidth limiting, which I used to keep
           | heat down on machines that didn't cool their drives properly.
           | Beyond heat issues that I would blame on machine vendors,
           | failures from write volume I would lay at the feet of drive
           | manufacturers.
           | 
           | Though, I'm biased. Any blame for drive wear that didn't fall
           | on either the drive manufacturers or the filesystem
           | implementers not dealing well with random writes would
           | probably fall at my feet. I'm the one who implemented
           | randomized chunk order downloading in order to rapidly
           | increase availability of rare content, which would increase
           | the number of hard drive head seeks on non-log-based
           | filesystems. I always intended to go back and (1) use
           | sequential downloads if tens of copies of the file were in
           | the swarm, to reduce hard drive seeks and (2) implement
           | randomized downloading of rarest chunks first, rather than
           | the naive randomization in the initial implementation. I say
           | naive, but the initial implementation did have some logic to
           | randomize chunk download order in a way to reduce the size of
           | the messages that swarms used to advertise which peers had
           | which chunks. As it turns out, there were always more
           | pressing things to implement and the initial implementation
           | was good enough.
           | 
           | (Though, really, all read-write filesystems should be copy-
           | on-write log-based, at least for recent writes, maybe having
           | some background process using a count-min-sketch to estimate
           | locality for frequently read data and optimize read locality
           | for rarely changing data that's also frequently read.)
           | 
           | Edit: Also, it's really a shame that TCP over IPv6 doesn't
           | use CRC-32C (to intentionally use a different CRC polynomial
           | than Ethernet, to catch more error patterns) to end-to-end
           | checksum data in each packet. Yes, it's a layering
           | abstraction violation, but IPv6 was a convenient point to
           | introduce a needed change. On the gripping hand, it's
           | probably best in the big picture to raise flow control,
           | corruption/loss detection, retransmission (and add forward
           | error correction) in libraries at the application layer (a la
           | QUIC, etc.) and move everything to UDP. I was working on
           | Google's indexing system infra when they switched
           | transatlantic search index distribution from multiple
           | parallel transatlantic TCP streams to reserving dedicated
           | bandwidth from the routers and blasting UDP using rateless
           | forward error codes. Provided that everyone is implementing
           | responsible (read TCP-compatible) flow control, it's really
           | good to have the rapid evolution possible by just using UDP
           | and raising other concerns to libraries at the application
           | layer. (N parallel TCP streams are useful because they
           | typically don't simultaneously hit exponential backoff, so
           | for long-fat networks, you get both higher utilization and
           | lower variance than a single TCP stream at N times the
           | bandwidth.)
        
             | pbhjpbhj wrote:
             | It sounds like a fun comp sci exercise to optimise the algo
             | for randomised block download to reduce disk operations but
             | maintain resilience. Presumably it would vary significantly
             | by disk cache sizes.
             | 
             | It's not my field, but my impression is that it would be
             | equally resilient to just randomise the start block (adjust
             | spacing of start blocks according to user bandwidth?) then
             | let users just run through the download serially; maybe
             | stopping when they hit blocks that have multiple sources
             | and then skipping to a new start block?
             | 
              | It's kinda mindboggling to me to think of all the
             | processes that go into a 'simple' torrent download at the
             | logical level.
             | 
             | If AIs get good enough before I die then asking it to
             | create simulations on silly things like this will probably
             | keep me happy for all my spare time!
        
               | KMag wrote:
               | For the completely randomized algorithm, my initial
               | prototype was to always download the first block if
               | available. After that, if fewer than 4 extents
               | (continuous ranges of available bytes) were downloaded
                | locally, randomly choose any available block. (So, we
               | first get the initial block, and 3 random blocks.) If 4
               | or more extents were available locally, then always try
               | the block after the last downloaded block, if available.
               | (This is to minimize disk seeks.) If the next block isn't
               | available, then the first fallback was to check the list
               | of available blocks against the list of next blocks for
               | all extents available locally, and randomly choose one of
                | those. (This is to choose a block that hopefully can be
               | the start of a bunch of sequential downloads, again
               | minimizing disk seeks.) If the first fallback wasn't
               | available, then the second fallback was to compute the
               | same thing, except for the blocks before the locally
               | available extents rather than the blocks after. (This is
               | to avoid increasing the number of locally available
               | extents if possible.) If the second fallback wasn't
               | available, then the final fallback was to randomly
               | uniformly pick one of the available blocks.
               | 
               | Trying to extend locally available extents if possible
               | was desirable because peers advertised block availability
               | as pairs of <offset, length>, so minimizing the number of
               | extents minimized network message sizes.
               | 
               | This initial prototype algorithm (1) minimized disk seeks
               | (after the initial phase of getting the first block and 3
               | other random blocks) by always downloading the block
               | after the previous download, if possible. (2) Minimized
               | network message size for advertising available extents by
               | extending existing extents if possible.
               | 
               | Unfortunately, in simulation this initial prototype
               | algorithm biased availability of blocks in rare files,
               | biasing in favor of blocks toward the end of the file.
               | Any bias is bad for rapidly spreading rare content, and
               | bias in favor of the end of the file is particularly bad
               | for audio and video file types where people like to start
               | listening/watching while the file is still being
               | downloaded.
               | 
               | Instead, the algorithm in the initial production
               | implementation was to first check the file extension
               | against a list of extensions likely to be accessed by the
               | user while still downloading (mp3, ogg, mpeg, avi, wma,
               | asf, etc.).
               | 
               | For the case where the file extension indicates the user
               | is unlikely to access the content until the download is
               | finished (the general case algorithm), look at the number
               | of extents (continuous ranges of bytes the user already
               | has). If the number of extents is less than 4, pick any
               | block randomly from the list of blocks that peers were
               | offering for download. If there are 4 or more extents
               | available locally, for each end of each extent available
               | locally, check the block before it and the block after it
               | to see if they're available for download from peers. If
               | this list of available adjacent blocks is non-empty, then
                | randomly choose one of those adjacent blocks for download.
                | If the list of available adjacent blocks is empty, then
                | uniformly randomly choose from one of the blocks available
               | from peers.
               | 
               | In the case of file types likely to be viewed while being
               | downloaded, it would download from the front of the file
               | until the download was 50% complete, and then randomly
               | either download the first needed block, or else use the
               | previously described algorithm, with the probability of
               | using the previous (randomized) algorithm increasing as
               | the percentage of the download completed increased. There
               | was also some logic to get the last few chunks of files
               | very early in the download for file formats that required
               | information from a file footer in order to start using
               | them (IIRC, ASF and/or WMA relied on footer information
               | to start playing).
               | 
               | Internally, there was also logic to check if a chunk was
               | corrupted (using a Merkle tree using the Tiger hash
               | algorithm). We would ignore the corrupted chunks when
               | calculating the percentage completed, but would remove
               | corrupted chunks from the list of blocks we needed to
               | download, unless such removal resulted in an empty list
               | of blocks needed for download. In this way, we would
               | avoid re-downloading corrupted blocks unless we had
               | nothing else to do. This would avoid the case where one
               | peer had a corrupted block and we just kept re-requesting
               | the same corrupted block from the peer as soon as we
               | detected corruption. There was some logic to alert the
               | user if too many corrupted blocks were detected and give
               | the user options to stop the download early and delete
               | it, or else to keep downloading it and just live with a
               | corrupted file. I felt there should have been a third
               | option to keep downloading until a full-but-corrupt
               | download was had, retry downloading every corrupt block
               | once, and then re-prompt the user if the file was still
               | corrupt. However, this option would have resulted in more
               | wasted bandwidth and likely resulted in more user
               | frustration due to some of them hitting "keep trying"
               | repeatedly instead of just giving up as soon as it was
               | statistically unlikely they were going to get a non-
               | corrupted download. Indefinite retries without prompting
               | the user were a non-starter due to the amount of
               | bandwidth they would waste.
        
           | KMag wrote:
           | How are the memory overheads of ZFS these days? In the old
           | days, I remember balking at the extra memory required to run
           | ZFS on the little ARM board I was using for a NAS.
        
             | doublepg23 wrote:
             | That was always FUD more or less. ZFS uses RAM as its
              | primary cache...like every other filesystem, so if you
             | have very little RAM for caching the performance will
             | degrade...like every other filesystem.
        
               | KMag wrote:
               | But if you have a single board computer with 1 GB of RAM
               | and several TB of ZFS, will it just be slow, or actually
               | not run? Granted, my use case was abnormal, and I was
               | evaluating in the early days when there were both license
               | and quality concerns with ZFS on Linux. However, my
               | understanding at the time was that it wouldn't actually
               | work to have several TB in a ZFS pool with 1 GB of RAM.
               | 
               | My understanding is that ZFS has its own cache apart from
               | the page cache, and the minimum cache size scales with
                | the storage size. Did I misunderstand/is my information
               | outdated?
        
               | homebrewer wrote:
               | > will it just be slow
               | 
               | This. I use it on a tiny backup server with only 1 GB of
               | RAM and a 4 TB HDD pool, it's fine. Only one machine
               | backs up to that server at a time, and they do that at
               | network speed (which is admittedly only 100 Mb/s, but it
                | should go somewhat higher if it had a faster network).
               | Restore also runs ok.
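                | 
                | If memory is that tight, you can also just cap the ARC
                | explicitly; a sketch for Linux/OpenZFS (the 256 MiB
                | figure is arbitrary):
                | 
                | # runtime knob; needs root
                | echo $((256*1024*1024)) > \
                |     /sys/module/zfs/parameters/zfs_arc_max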
        
               | KMag wrote:
               | Thanks for this. I initially went with xfs back when
               | there were license and quality concerns with zfs on Linux
               | before btrfs was a thing, and moved to btrfs after btrfs
               | was created and matured a bit.
               | 
               | These days, I think I would be happier with zfs and one
               | RAID-Z pool across all of the disks instead of individual
               | btrfs partitions or btrfs on RAID 5.
        
               | BSDobelix wrote:
               | >That was always FUD more or less
               | 
               | Thank you thank you, exactly this! And additionally that
                | cache is compressed. In the days of 4GB machines ZFS was
               | overkill but today...no problem.
        
               | magicalhippo wrote:
               | > That was always FUD more or less.
               | 
                | To give some context: ZFS supports de-duplication, and
               | until fairly recently, the de-duplication data structures
               | _had_ to be resident in memory.
               | 
               | So if you used de-duplication earlier, then yes, you
               | absolutely _did_ need a certain amount of memory per byte
               | stored.
               | 
               | However, there is absolutely no requirement to use de-
               | duplication, and without it the memory _requirements_ are
               | just a small, fairly fixed amount.
               | 
               | It'll store writes in memory until it commits them in a
               | so-called transaction group, so you need to have room for
                | that. But the limits on a transaction group are
               | configurable, so you can lower the defaults.
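                | 
                | For example, on Linux/OpenZFS these are exposed as module
                | parameters; a sketch (the value is just illustrative):
                | 
                | # limit dirty, not-yet-committed data to 256 MiB
                | echo $((256*1024*1024)) > \
                |     /sys/module/zfs/parameters/zfs_dirty_data_max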
        
               | doublepg23 wrote:
               | I don't think I came across anyone suggesting zfs dedupe
               | without insisting that it was effectively broken except
               | for very specific workloads.
        
           | hulitu wrote:
           | > the only reason ZFS isn't the norm is because we all once
           | lived through a primordial era when it didn't exist.
           | 
           | There were good filesystems before ZFS. I would love to have
           | a versioning filesystem like Apollo had.
        
         | Timshel wrote:
          | Just checked my scrub history: for 20TB on consumer hardware
          | over the last two years it made repairs twice, around 2 and 4
          | blocks each time.
          | 
          | So not much, but with a special kind of bad luck those could
          | have been in an encrypted archive ^^.
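          | 
          | (For anyone curious, per-scrub results show up in the pool
          | status; the pool name is a placeholder:)
          | 
          | zpool scrub tank      # runs in the background
          | zpool status -v tank  # the scan line reports what was
          |                       # repaired, -v lists damaged files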
        
           | nolok wrote:
           | That's all fine and good until that one random lone broken
           | block stops you from opening that file you really need.
        
             | lazide wrote:
             | Or in my case, a key filesystem metadata block that ruins
             | everything. :s
        
               | pbhjpbhj wrote:
               | I only know about FAT but these "key file metadata
               | blocks" are redundant, so you need really special double-
               | plus bad luck to do that.
        
               | Szpadel wrote:
               | So I can consider myself very lucky and unlucky at the
               | same time. I had data corruption on a ZFS filesystem that
               | destroyed the whole pool to an unrecoverable state (ZFS
               | was segfaulting while trying to import; all the ZFS
               | recovery features crashed the module and required a
               | reboot). The lucky part is that this happened just after
               | (something like the next day) I migrated the whole pool
               | to another (bigger) server/pool, so that system was
               | already scheduled for a full disk wipe.
        
               | howard941 wrote:
               | This happened to me too. The root cause was a bad memory
               | stick.
        
               | lazide wrote:
               | It was ext4, and I've had it happen two different times -
               | in fact, I've never had it happen in a 'good' recoverable
               | way before that I've ever seen.
               | 
               | It triggered a kernel panic in every machine that I
               | mounted it in, and it wasn't a media issue either. Doing
               | a block level read of the media had zero issues and
               | consistently returned the exact same data the 10 times I
               | did it.
               | 
               | Notably, I had the same thing happen using btrfs due to
               | power issues on a Raspberry Pi (partially corrupted
               | writes resulting in a completely unrecoverable
               | filesystem, despite it being in 2x redundancy mode).
               | 
               | Should it be impossible? Yes. Did it definitely, 100% for
               | sure happen? You bet.
               | 
               | I never actually lost data on ZFS, and I've done some
               | terrible things to pools before that took quite awhile to
               | unbork, including running it under heavy write load with
               | a machine with known RAM problems and no ECC.
        
               | hulitu wrote:
               | Good to know. Ext2 is much more robust against
                | corruption. Or at least it was 10 years ago when I had
               | kernel crashes or power failures.
        
             | mrjin wrote:
             | What you need is backup. RAID is not backup and is not for
             | most home/personal users. I learnt that the hard way. Now
             | my NAS uses simple volumes only; after all, I really don't
             | have many things on it I cannot lose. If it's something
             | really important, I have multiple copies on different
             | drives, and some offline cold backup. So now if any of my
             | NAS drives is about to fail, I can just copy out the data
             | and replace the drive, instead of spending weeks trying to
             | rebuild the RAID and ending with a total loss as multiple
             | drives fail in a row. The funny thing is that, since moving
             | to the simple-volumes approach, I haven't had a drive with
             | even a bad sector.
        
               | nolok wrote:
                | Oh, I have backups myself. But the parent is more or less
                | talking about a 71TiB NAS for residential usage and being
                | able to ignore the bit rot; in that context such a person
                | probably wouldn't have backups.
                | 
                | Personally I have long since moved out of raid 5/6 into
                | raid 1 or 10 with versioned backups; at some level of
                | data, raid 5/6 just isn't cutting it anymore in case
                | anything goes slightly wrong.
        
         | louwrentius wrote:
          | Most people don't run ZFS on their laptop or desktop, so
          | let's not pretend it's such a huge deal.
        
           | BSDobelix wrote:
            | Most people run Windows on their laptop (without ReFS), and
           | many people use paid data restore services if something
           | "important" gets missing/corrupt.
           | 
           | >let's not pretend it's such a huge deal
           | 
            | Depends on the importance of your data, right?
        
             | louwrentius wrote:
             | I bet even 99.9% of HN visitors don't run ZFS on their
             | laptop/desktop. Basically we all take this risk except for
             | a few dedicated nerds.
             | 
              | Everything has a price, and people like to have their
              | files uncorrupted, but not at any cost.
        
               | rabf wrote:
                | I find this thinking difficult to reconcile. When I set
                | up my workstation it usually takes me half a day to sort
                | out an encrypted, mirrored rootfs with zfsbootmenu +
                | Linux, but after that it's all set for the next decade.
                | A small price for the peace of mind it affords.
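                | 
                | The pool itself is only a couple of commands; a rough
                | sketch (device names are placeholders, and zfsbootmenu
                | has its own setup steps on top of this):
                | 
                | zpool create -o ashift=12 \
                |     -O encryption=on -O keyformat=passphrase \
                |     -O compression=lz4 -O mountpoint=none \
                |     zroot mirror /dev/nvme0n1p3 /dev/nvme1n1p3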
        
         | mytailorisrich wrote:
         | I think this is over the top for standard residential usage.
         | 
          | Make sure you have good continuous backups and perhaps RAID 1
          | on your file server (if you have one) to save effort in case
          | a disk fails, and you are more than covered.
        
         | rollcat wrote:
         | > we should stick to filesystems with data checksums such as
         | ZFS, as well as ECC memory.
         | 
         | While I don't disagree with this statement, consider the
         | reality:
         | 
         | - APFS has metadata checksums, but no data checksums. WTF
         | Apple?
         | 
         | - Very few Linux distributions ship zfs.ko (&spl.ko); those
         | that do, theoretically face a legal risk (any kernel
         | contributor could sue them for breaching the GPL); rebuilding
         | the driver from source is awkward (even with e.g. DKMS), pulls
         | more power, takes time, and may randomly leave your system
         | unbootable (YES it happened to me once).
         | 
         | - Linux itself explicitly treats ZFS as unsupported; loading
         | the module taints the kernel.
         | 
         | - FreeBSD is great, and is actually making great progress
         | catching up with Linux on the desktop. Still, it is a catch-up
         | game. I also don't want to install a system that needs to
         | install another guest system to actually run the programs I
         | need.
         | 
         | - There are no practical alternatives to ZFS that even come
         | close; sibling comment complains about btrfs data loss. I never
         | had the guts to try btrfs in production after all the horror
         | stories I've heard over the decade+.
         | 
         | - ECC memory on laptops is practically unheard of, save for a
         | couple niche Thinkpad models; and comes with large premiums on
         | desktops.
         | 
         | What are the _practical_ choices for people who _do not want
         | to_ cosplay as sysadmins?
        
           | louwrentius wrote:
           | I feel that on HN people tend to be a bit pedantic about
           | topics like data integrity, and in business settings I
           | actually agree with them.
           | 
           | But for residential use, risks are just different and as you
           | point out, you have no options except to only use a desktop
           | workstation with ECC. People like/need laptops so that's not
            | realistic for most people. "Just run Linux/FreeBSD with
            | ZFS" isn't reasonable advice to me.
           | 
           | What I feel most strongly about is that it's all about
            | circumstances, context and risk evaluation. And I see so
            | many blanket absolutist statements that don't think about
            | the reality of life and people's circumstances.
        
           | nubinetwork wrote:
           | > - Very few Linux distributions ship zfs.ko (&spl.ko); those
           | that do, theoretically face a legal risk (any kernel
           | contributor could sue them for breaching the GPL)
           | 
           | > - Linux itself explicitly treats ZFS as unsupported;
           | loading the module taints the kernel.
           | 
            | So modify the ZFS source so it appears as an external GPL
            | module... just don't tell anyone or distribute it...
           | 
           | I can't say much about dracut or having to build the module
           | from source... as a Gentoo user, I do it about once a month
           | without any issues...
        
             | rollcat wrote:
             | > So modify the ZFS source
             | 
             | Way to miss the point
        
               | nubinetwork wrote:
               | Not really... the complaint was over licensing and
               | tainting the kernel... so just tell the kernel it's not a
               | CDDL module... problem solved.
        
               | rollcat wrote:
               | > _What are the practical choices for people who do not
               | want to cosplay as sysadmins?_
               | 
               | The specific complaint is not at all about the kernel
               | identifying itself as tainted, the specific complaint is
               | about the kernel developers' unyielding unwillingness to
               | support any scenario where ZFS is concerned, thus leaving
               | one with even more "sysadmin duties". I want to _use_ my
               | computer, not _serve_ it.
        
           | defrost wrote:
           | Off the shelf commodity home NAS systems with ZFS onboard?
           | 
           | eg: https://www.qnap.com/en-au/operating-system/quts-hero
           | 
           | IIRC (been a while since I messed with QNAP) QuTS hero would
           | be a modded Debian install with ZFS baked in and a web based
            | admin dashboard.
           | 
           | https://old.reddit.com/r/qnap/comments/15b9a0u/qts_or_quts_h.
           | ..
           | 
            | As a rule of thumb (IMHO) steer clear of commodity NAS cloud
            | add-ons; such things attract ransomware hackers like flies to
           | a tip whether it's QNAP, Synology, or InsertVendorHere.
        
           | freeone3000 wrote:
           | "Tainting" the kernel doesn't affect operations, though.
           | You're not allowed to redistribute it with changes -- but
           | you, as an entity, can freely use ZFS and the kernel together
           | without restriction. Linux plus zfs works fine.
        
           | simoncion wrote:
           | > I never had the guts to try btrfs in production after all
           | the horror stories I've heard over the decade+.
           | 
           | I've been running btrfs as the primary filesystem for all of
           | my desktop machines since shortly after the on-disk format
           | stabilized and the extX->btrfs in-place converter appeared
           | [0], and for my home servers for the past ~five years. In the
           | first few years after I started using it on my desktop
           | machines, I had four or five "btrfs shit the bed and trashed
           | some of my data" incidents. I've had zero issues in the past
           | ~ten years.
           | 
           | At $DAYJOB we use btrfs as the filesystem for our CI workers
           | and have been doing so for years. Its snapshot functionality
           | makes creating the containers for CI jobs instantaneous, and
           | we've had zero problems with it.
           | 
           | I can think of a few things that might separate me from the
           | folks who report issues that they've had within the past
           | five-or-ten years:
           | 
           | * I don't use ANY of the built-in btrfs RAID stuff.
           | 
           | * I deploy btrfs ON TOP of LVM2 LVs, rather than using its
           | built-in volume management stuff. [1]
           | 
           | * I WAS going to say "I use ECC RAM", but one of my desktop
           | machines does not and can never have ECC RAM, so this isn't
           | likely a factor.
           | 
           | The BTRFS features I use at home are snapshotting (for
           | coherent point-in-time backups), transparent compression, the
           | built-in CoW features, and the built-in checksumming
           | features.
           | 
           | At work, we use all of those except for compression, and
           | don't use snapshots for backup but for container volume
           | cloning.
           | 
           | [0] If memory serves, this was around the time when the OCZ
           | Vertex LE was hot, hot shit.
           | 
           | [1] This has actually turned out to be a really cool
           | decision, as it has permitted me to do low- or no- downtime
           | disk replacement or repartitioning by moving live data off of
           | local PVs and on to PVs attached via USB or via NBD.
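            | 
            | For the curious, the no-downtime disk swap in [1] is plain
            | LVM; roughly (VG and device names are placeholders):
            | 
            | vgextend vg0 /dev/sdb1       # add the replacement PV
            | pvmove /dev/sda1 /dev/sdb1   # move extents while mounted
            | vgreduce vg0 /dev/sda1       # retire the old disk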
        
             | curt15 wrote:
             | >At $DAYJOB we use btrfs as the filesystem for our CI
             | workers and have been doing so for years. Its snapshot
             | functionality makes creating the containers for CI jobs
             | instantaneous, and we've had zero problems with it.
             | 
             | $DAYJOB == "facebook"?
        
               | simoncion wrote:
               | > $DAYJOB == "facebook"?
               | 
               | Nah. AFAIK, we don't have any Linux kernel gurus on the
               | payroll... we're ordinary users just like most everyone
               | else.
        
       | why_only_15 wrote:
       | I'm confused about optimizing 7 watts as important -- rough
       | numbers, 7 watts is 61 kWh/y. If you assume US-average prices of
       | $0.16/kWh that's about $10/year.
       | 
       | edit: looks like for the netherlands (where he lives) this is
       | more significant -- $0.50/kWh is the average price, so ~$32/year
        
         | yumraj wrote:
          | CA averages ~$0.5/kWh.
          | 
          | Where in the US matters a lot. I think OR is pretty cheap.
        
         | louwrentius wrote:
          | Although outside the scope of the article, I try to keep a
          | small electricity usage footprint.
          | 
          | My lowest is 85 kWh a month in an apartment, but that was
          | because of perfect solar weather.
          | 
          | I average around 130 kWh a month now.
        
       | ztravis wrote:
       | Around 12 years ago I helped design and set up a 48-drive, 9U,
       | ~120TB NAS in the Chenbro RM91250 chassis (still going strong!
       | but plenty of drive failures along the way...). This looks like
       | it's probably the 24-drive/4U entry in the same line (or
       | similar). IIRC the fans were very noisy in their original hot-
       | swappable mounts but replacing them with fixed (screw) mounts
       | made a big difference. I can't tell from the picture if this has
       | hot-swappable fans, though - I think I remember ours having
       | purple plastic hardware.
        
       | hi_hi wrote:
       | I've had the exact same NAS for over 15 years. It's had 5 hard
       | drives replaced, 2 new enclosures and 1 new power supply, but
       | it's still as good as new...
        
       | matheusmoreira wrote:
       | > It's possible to create the same amount of redundant storage
       | space with only 6-8 hard drives with RAIDZ2 (RAID 6) redundancy.
       | 
       | I've given up on striped RAID. Residential use requires easy
       | expandability to keep costs down. Expanding an existing parity
       | stripe RAID setup involves failing every drive and slowly
       | replacing them one by one with bigger capacity drives while the
       | whole array is in a degraded state and incurring heavy I/O load.
       | It's easier and safer to build a new one and move the data over.
       | So you pretty much need to buy the entire thing up front which is
       | expensive.
       | 
       | Btrfs has a flexible allocator which makes expansion easier but
       | btrfs just isn't trustworthy. I spent years waiting for RAID-Z
       | expansion only for it to end up being a suboptimal solution that
       | leaves the array in some kind of split parity state, old data in
       | one format and new data in another format.
       | 
       | It's just _so_ tiresome. Just give up on the  "storage
       | efficiency" nonsense. Make a pool of double or triple mirrors
       | instead and call it a day. It's simpler to set up, easier to
       | understand, more performant, allows heterogeneous pools of drives
       | which lowers risk of systemic failure due to bad batches, gradual
       | expansion is not only possible but actually easy and doesn't take
       | literal weeks to do, avoids loading the entire pool during
       | resilvering in case of failures, and it offers so much redundancy
       | the only way you'll lose data is if your house literally burns
       | down.
       | 
       | https://jrs-s.net/2015/02/06/zfs-you-should-use-mirror-vdevs...
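       | 
       | For reference, a minimal sketch of what that looks like (pool
       | and device names are placeholders):
       | 
       | # pool of mirror vdevs
       | zpool create tank mirror sda sdb mirror sdc sdd
       | # expanding later is just adding another mirror pair
       | zpool add tank mirror sde sdf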
        
         | louwrentius wrote:
          | I dislike that article/advice because it's dishonest,
          | downplaying a limitation of ZFS and advocating that people
          | spend a lot more money, which may well not be necessary at
          | all.
        
           | matheusmoreira wrote:
           | What limitation is it downplaying? I would like to know if
           | there are hidden downsides to the proposed solution.
           | 
            | Compared to a striped RAID setup, this requires a lot less
            | money up front. It's really good for residential use.
        
       | orbital-decay wrote:
       | Do you have a drive rotation schedule?
       | 
       | 24 drives. Same model. Likely the same batch. Similar wear.
       | Imagine most of them failing at the same time, and the rest
       | failing as you're rebuilding it due to the increased load,
       | because they're already almost at the same point.
       | 
       | Reliable storage is tricky.
        
         | winrid wrote:
          | This. I had just two drives in raid 1, and the 2nd drive
          | failed _immediately_ after resilvering a new drive to
          | re-create the array. Very lucky :D
        
         | otras wrote:
         | Reminds me of the HN outage where two SSDs both failed after
         | 40k hours: https://news.ycombinator.com/item?id=32031243
        
           | throwaway48476 wrote:
           | That's a firmware bug, not wear.
        
             | hawk_ wrote:
             | Yes and risk management dictates diversification to
             | mitigate this kind of risk as well.
        
             | tcdent wrote:
             | bug or feature?
        
             | generalizations wrote:
             | For one reason or another, the drives tended to age out at
             | the same time. Firmware bugs are just hardware failures for
             | solid state devices.
        
         | sschueller wrote:
          | Reminds me of the time back in the day when Dell shipped us a
          | server with drives whose serial numbers were consecutive.
         | 
         | Of course both failed at the same time and I spent an all
         | nighter doing a restore.
        
           | jll29 wrote:
            | I ordered my NAS drives on Amazon; to avoid getting the same
            | batch (all consecutive serial numbers) I used amazon.co.uk
           | for one half and amazon.de for the other half of them. One
           | could also stage the orders in time.
        
             | orbital-decay wrote:
             | Yeah, the risk of the rest of the old drives failing under
             | high load while rebuilding/restoring is also very real, so
             | staging is necessary as well.
             | 
             | I don't exactly hoard data by dozens of terabytes, but I
              | rotate my backup drives every few years, with a 2-year
             | difference between them.
        
             | Tempest1981 wrote:
              | Back in the day, I remember driving to different Fry's and
             | Central Computers stores to get a mix of manufacturing
             | dates.
        
         | madduci wrote:
         | That's why you buy different drives from different stores, so
         | you can reduce the chances of getting HDDs from the same
         | batch.
        
           | flemhans wrote:
           | Drive like a maniac to the datacenter to shake 'em up a bit
        
         | louwrentius wrote:
         | I bought the drives in several batches from 2 or 3 different
         | shops.
        
         | londons_explore wrote:
         | Software bugs might cause that (eg. drive fails after exactly 1
         | billion IOPS due to some counter overflowing). But hardware
         | wear probably won't be as consistent.
        
           | lazide wrote:
           | That depends entirely on how good their Q&A and manufacturing
           | quality is - the better it is, the more likely eh?
           | 
           | Especially in an array where it's possible every drive
           | operation will be identical between 2 or 3 different drives.
        
         | Tinned_Tuna wrote:
         | I've seen this happen to a friend. Back in the noughties they
         | built a home NAS similar to the one in the article, using fewer
         | (smaller) drives. It was in RAID5 configuration. It lasted
         | until one drive died and a second followed it during the
         | rebuild. Granted, it wasn't using ZFS, there was no regular
         | scrubbing, 00s drive failure rates were probably different, and
         | they didn't power it down when not using it. The point is the
         | correlated failure, not the precise cause.
         | 
         | Usual disclaimers, n=1, rando on the internet, etc.
        
           | layer8 wrote:
           | This is the reason why I would always use RAID 6. A second
           | drive failing during rebuild is significantly likely.
        
             | SlightlyLeftPad wrote:
             | You're far better off having two raids, one serving as a
             | daily backup of progressive snapshots that only turns on
             | occasionally to back up and is off the rest of the time.
        
       | tedk-42 wrote:
       | Really a non-article, as it feels like an edge case for usage.
       | 
       | It's not on 24/7.
       | 
       | No mention of I/O metrics or data stored.
       | 
       | For all we know, OP is storing their photos and videos and
       | never actually needs to have 80% of the drives on and
       | connected.
        
         | louwrentius wrote:
         | It's 80% full. I linked to the original article about the
          | system for perf stats (sequential).
        
       | fulafel wrote:
       | Regular reminder: RAID (and ZFS) don't replace backups. It's an
       | availability solution to reduce downtime in event of disk
       | failure. Many things can go wrong with your files and filesystem
       | besides disk failure, eg user error, userspace software/script
       | bugs, driver or FS or hardware bugs, ransomware, etc)
       | 
       | The article mentions backups near the end saying eg "most of the
       | data is not important" and the "most important" data is backed
       | up. Feeling lucky I guess.
        
         | louwrentius wrote:
          | I'm well aware of the risks, I just accept them.
         | 
          | You shouldn't ever do what I do if you really care about your
         | data.
        
         | lonjil wrote:
         | ZFS can help you with backups and data integrity beyond what
         | RAID provides, though. For example, I back up to another
         | machine using zfs's snapshot sending feature. Fast and
         | convenient. I scrub my machine and the backup machine every
         | week, so if any data has become damaged beyond repair on my
         | machine, I know pretty quickly. Same with the backup machine.
         | And because of the regular integrity checking on my machine,
         | it's very unlikely that I accidentally back up damaged data.
         | And finally, frequent snapshots are a great way to recover from
         | software and some user errors.
         | 
         | Of course, there are still dangers, but ZFS without backup is a
         | big improvement over RAID, and ZFS with backups is a big
         | improvement over most backup strategies.
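          | 
          | The send/receive part is essentially a one-liner; a sketch
          | (dataset, snapshot and host names are placeholders):
          | 
          | zfs snapshot tank/data@2024-09-13
          | zfs send -i @2024-09-06 tank/data@2024-09-13 | \
          |     ssh backupbox zfs receive -u backup/data
          | zpool scrub tank   # weekly, on both machines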
        
       | yread wrote:
       | Nowadays you could almost fit all that on a single 61TB SSD and
       | not bother with 24 disks
        
         | tmikaeld wrote:
          | And lose all of it when it fails.
        
       | ffsm8 wrote:
       | > _Losing the system due to power shenanigans is a risk I
       | accept._
       | 
        | There is another (very rare) failure a UPS protects against, and
       | that's imbalance in the electricity.
       | 
       | You can get a spike (up or down, both can be destructive) if
       | there is construction in your area and something happens with the
       | electricity, or lightning hits a pylon close enough to your
       | house.
       | 
       | First job I worked at had multiple servers die like that, roughly
        | 10 yrs ago. It's the only time I've ever heard of such an
        | issue, however.
       | 
        | To my understanding, a UPS protects from such spikes as well,
        | as it will die before letting your servers get damaged.
        
         | Gud wrote:
         | Electronics is absolutely sensitive to this.
         | 
         | Please use filters.
        
           | bboygravity wrote:
           | Filters won't help against prolonged periods of higher/lower
           | voltages though.
        
             | Gud wrote:
              | Voltages should be normalised before they hit the server's
              | PSU.
        
             | dist-epoch wrote:
              | But computer equipment uses switched-mode power supplies,
              | which don't care about voltage, as long as there is enough
              | power.
        
         | int0x29 wrote:
         | Isn't this what a surge protector is for?
        
           | bayindirh wrote:
           | Yes. In most cases, assuming you live in a 220V country, a
           | surge protector will absorb the upwards spike, and the
           | voltage range (a universal PSU can go as low as 107V) will
           | handle the brownout voltage dip.
        
           | Kerb_ wrote:
           | Pretty sure surge protectors are less effective against dips
           | than they are spikes
        
           | acstapleton wrote:
           | Nothing is really going to protect you from a direct
           | lightning strike. Lightning strikes are on the order of
           | millions of volts and thousands of amps. It will arc between
           | circuits that are close enough and it will raise the ground
           | voltage by thousands of volts too. You basically need a
           | lighting rod buried deep into the earth to prevent it hitting
           | your house directly and then you're still probably going to
           | deal with fried electronics (but your house will survive).
           | Surge protectors are for faulty power supplies and much
           | milder transient events on the grid and maybe a lightning
           | strike a mile or so away.
        
             | Wowfunhappy wrote:
             | Would a UPS protect against that either, though?
        
               | Kirby64 wrote:
               | No. Current will find a way. Lightning will destroy
               | things you didn't even think would be possible to
               | destroy.
        
               | Wowfunhappy wrote:
               | So I'm still left with int0x29's original question:
               | "Isn't this [an electricity spike that a UPS could
               | protect against] what a surge protector is for?"
        
         | danw1979 wrote:
         | I've had firsthand experience of a lightning strike hitting
         | some gear that I maintained...
         | 
         | My parents' house got hit right on the TV antenna, which was
         | connected via coax down to the booster/splitter unit in the
         | comms cupboard ... then somehow it got onto the nearby network patch
         | panel and fried every wired ethernet controller attached to the
         | network, including those built into switch ports, APs, etc. In
         | the network switch, the current destroyed the device's power
         | supply too, as it was trying to get to ground I guess.
         | 
         | Still a bit of a mystery how it got from the coax to the cat5.
         | Maybe a close parallel run the electricians put in somewhere?
         | 
         | Total network refit required, but thankfully there were no
         | wired computers on site... I can imagine storage devices
         | wouldn't have fared very well.
        
           | nuancebydefault wrote:
           | Well, this is another order of magnitude than 'spikes on the
           | net'. The electrical field is so intense that current will
           | easily cross large air gaps.
        
         | manmal wrote:
         | We've had such spikes in an old apartment we were living in. I
         | had no servers back then, but LED lamps annoyingly failed every
         | few weeks. It was an old building from the 60s and our own
         | apartment had some iffy quick fixes in the installation.
        
         | JonChesterfield wrote:
         | Lightning took out a modem and some nearby hardware here about
         | a week ago. Residential. The distribution of dead vs damaged vs
         | nominally unharmed hardware points very directly at the copper
         | wire carrying vdsl. Modem was connected via ethernet to
         | everything else.
         | 
         | I think the proper fix for that is probably to convert to
         | optical, run along a fibre for a bit, then convert back. It
         | seems likely that electricity will take a different route in
         | preference to the glass. That turns out to be
         | disproportionately annoying to spec (not a networking guy, gave
         | up after an hour trying to distinguish products) so I've put a
         | wifi bridge between the vdsl modem and everything else.
         | Hopefully that's the failure mode contained for the next storm.
         | 
         | Mainly posting because I have a ZFS array that was wired to the
         | same modem as everything else. It seems to have survived the
         | experience but that seems like luck.
        
         | louwrentius wrote:
         | True, this is also what I mean by power shenanigans.
         | 
         | My server is off most of the time, disconnected. But even if
         | it wasn't, I just accept the risk.
        
           | ragebol wrote:
           | Assuming you live in the Netherlands judging just by name:
           | our power grid is pretty damn reliable with little
           | shenanigans. I'd take that risk indeed.
        
             | louwrentius wrote:
             | Yes, I'm in NL, indeed our grid is very reliable.
        
         | deltarholamda wrote:
         | This depends very much on the type of UPS. Big, high dollar
         | UPSes will convert the AC to DC and back to AC, which gives
         | amazing pure sine wave power.
         | 
         | The $99 850VA APC you get from Office Depot does not do this.
         | It switches from AC to battery very quickly, but it doesn't
         | really do power conditioning.
         | 
         | If you can afford the good ones, they genuinely improve
         | reliability of your hardware over the long term. Clean power is
         | great.
        
       | sneak wrote:
       | My home NAS is about 200TB, runs 24/7, is very loud and power
       | inefficient, does a full scrub every Sunday, and also hasn't had
       | any drive failures. It's only been 4 or 5 years, however.
        
       | tie-in wrote:
       | We've been using a multi-TB PostgreSQL database on ZFS for quite
       | a few years in production and have encountered zero problems so
       | far, including no bit flips. In case anyone is interested, our
       | experience is documented here:
       | 
       | https://lackofimagination.org/2022/04/our-experience-with-po...
        
       | lifeisstillgood wrote:
       | My takeaway is that there is a difference between residential and
       | industrial usage, just as there is a difference between
       | residential car ownership and 24/7 taxi / industrial use
       | 
       | And that no matter how amazing the industrial revolution has
       | been, we can build reliability at the residential level but not
       | the industrial level.
       | 
       | And certainly at the price points.
       | 
       | The whole "At FAANG scale" is a misnomer - we aren't supposed to
       | use residential quality (possibly the only quality) at that scale
       | - maybe we are supposed to park our cars in our garages and drive
       | them on a Sunday
       | 
       | Maybe we should keep our servers at home, just like we keep our
       | insurance documents and our notebooks
        
         | bofadeez wrote:
         | I might be interested in buying storage at 1/10 of the price if
         | the only tradeoff was a 5 minute wait to power on a hard drive.
        
       | tobiasbischoff wrote:
       | Let me tell you, powering these drives on and off is far more
       | dangerous than just keeping them running. 10 years is well within
       | the MTBF of these enterprise drives. (I worked for 10 years as an
       | enterprise storage technician; I saw a lot of sh*.)
        
       | manuel_w wrote:
       | Discussions on checksumming filesystems usually revolve around
       | ZFS and BTRFS, but does anyone have any experience with bcachefs?
       | It has been upstreamed into the Linux kernel, I learned, and is
       | supposed to have full checksumming. The author also seems to take
       | filesystem
       | responsibility seriously.
       | 
       | Is anyone using it around here?
       | 
       | https://bcachefs.org/
        
         | olavgg wrote:
         | It is marked experimental, and since it was merged into the
         | kernel there have been a few major issues that have since been
         | resolved. I wouldn't risk production data on it, but for a home
         | lab it could be fine. But you need to ask yourself, how much
         | time are you willing to spend if something should go wrong? I
         | have also been running ZFS for 15+ years, and I've seen a lot
         | of crap because of bad hardware. But with good enterprise
         | hardware it has been working flawlessly.
        
         | clan wrote:
         | That was a decision Linus regretted[1]. There has been some
         | recent discussion about this here on Hacker News[2].
         | 
         | [1] https://linuxiac.com/torvalds-expresses-regret-over-
         | merging-...
         | 
         | [2] https://news.ycombinator.com/item?id=41407768
        
           | Ygg2 wrote:
           | Context: Linus regrets it because bcachefs doesn't have the
           | same commitment to stability as Linux.
           | 
           | Kent wants to fix a bug with a large PR.
           | 
           | Linus doesn't want to merge and review a PR that touches so
           | many non-bcachefs things.
           | 
           | They're both right in a way. Kent wants bcachefs to be
           | stable/work well, Linus wants Linux to be stable.
        
             | teekert wrote:
             | Edit: replied to wrong person. I agree with you.
             | 
             | Kent from bcachefs was just late in the cycle, somewhere in
             | rc5. That was indeed too late for such a huge push of new
             | code touching so many things.
             | 
             | There is some tension but there is no drama and implying so
             | is annoying.
             | 
             | Bcachefs is going places, I think I'd already choose it
             | over btrfs atm.
        
           | homebrewer wrote:
           | As usual, the top comments in that submission are very
           | biased. I think HN should sort comments in a random order in
           | every polarizing discussion. Anyone reading this, do yourself
           | a favor and dig through both links, or ignore the parent's
           | comment altogether.
           | 
           | Linus "regretted" it in the sense "it was a bit too early
           | because bcachefs is moving at such a fast speed", and not in
           | the sense "we got a second btrfs that eats your data for
           | lunch".
           | 
           | Please provide context and/or a short, human-friendly
           | explanation, because I'm pretty sure most readers won't go
           | further than your comment and will remember it as "Linus
           | regrets merging bcachefs", helping spread FUD for years down
           | the line.
        
             | clan wrote:
             | Well. Point taken. You have an important core of truth to
             | your argument about polarization.
             | 
             | But...
             | 
             | Strongly disagree.
             | 
             | I think that is a very unfair reading of what I wrote. I
             | feel that you might have a bias which shows but that would
             | be the same class of ad hominem as you have just displayed.
             | That is why I chose to react even though it might be wise
             | to let sleeping dogs lie. We should minimize polarization
             | but not to a degree where we cannot have civilized
             | disagreement. You are then doing exactly what you preach
             | not to do. Is that then FUD with FUD on top? Two wrongs
             | make a right?
             | 
             | I was reacting to the implicit approval in mentioning that
             | it had been upstreamed into the kernel. That was the reason
             | for the first link. Regrets were clearly expressed.
             | 
             | Another HN trope is rehashing the same discussions over and
             | over again. That was the reason for the second link. I
             | would like to avoid yet another discussion on a topic which
             | was put into light less than 14 days ago. Putting that more
             | bluntly would have been impolite and polarizing. Yet here I
             | am.
             | 
             | The sad part is that my point got through to you loud and
             | clear. Sad because rather than simply dismissing as
             | polarizing that would have been a great opener for a
             | discussion. Especially in the context of ZFS and
             | durability.
             | 
             | You wrote:
             | 
             | > Linus "regretted" it in the sense "it was a bit too early
             | because bcachefs is moving at such a fast speed", and not
             | in the sense "we got a second btrfs that eats your data for
             | lunch".
             | 
             | If you allow me a little lighthearted response. The first
             | thing which comes to mind was the "They're the same
             | picture" meme[1] from The Office. Some like to move quickly
             | and break things. That is a reasonable point of view. But
             | context matters. For long term data storage I am much more
             | conservative. So while you might disagree; to me it is the
             | exact same picture.
             | 
             | Hence I very much object to what I feel is an ad hominem
             | attack because your own worldview was not reflected
             | suitably in my response. It is fair critique that you feel
             | it is FUD. I do however find it warranted for a filesystem
             | which is marked experimental. It might be the bee's knees
             | but in my mind it is not ready for mainstream use. Yet.
             | 
             | That is an important perspective for the OP to have. If the
             | OP just wants to play around, all is good. If the OP does
             | not mind moving quickly and breaking things, fine. But for
             | production use? Not there yet. Not in my world.
             | 
             | Telling people to ignore my comment because you know people
             | cannot be bothered to actually read the links? And then
             | lecturing me that people might take the wrong spin on it?
             | Please!
             | 
             | [1] https://knowyourmeme.com/memes/theyre-the-same-picture
        
             | Novosell wrote:
             | You're saying this like the takeaway of "Linus regrets
             | merging bcachefs" is unfair when the literal quote from
             | Linus is "[...] I'm starting to regret merging bcachefs."
             | And earlier he says "Nobody sane uses bcachefs and expects
             | it to be stable[...]".
             | 
             | I don't understand how you can read Linus' response and
             | think "Linus regrets merging bcachefs" is an unfair
             | assessment.
        
           | CooCooCaCha wrote:
           | After reading the email chain I have to say my enthusiasm for
           | bcachefs has diminished significantly. I had no idea Kent was
           | _that_ stubborn and seems to have little respect for Linus or
           | his rules.
        
         | eru wrote:
         | I'm using it. It's been ok so far, but you should have all your
         | data backed up anyway, just in case.
         | 
         | I'm trying a combination where I have an SSD (of about 2TiB) in
         | front of a big hard drive (about 8 TiB) and using the SSD as a
         | cache.
        
         | rollcat wrote:
         | Can't comment on bcachefs (I think it's still early), but I've
         | been running with bcache in production on one "canary" machine
         | for years, and it's been rock-solid.
        
         | ffsm8 wrote:
         | I tried it out on my homelab server right after the merge into
         | the Linux kernel.
         | 
         | Took roughly one week for the whole raid to stop mounting
         | because of the journal (8hdd, 2 ssd write cache, 2 nvme read
         | cache).
         | 
         | The author responded on Reddit within a day, I tried his fix,
         | (which meant compiling the Linux kernel and booting from that),
         | but his fix didn't resolve the issue. He sadly didn't respond
         | after that, so I wiped and switched back to a plain mdadm
         | RAID after a few days of waiting.
         | 
         | I had everything important backed up, obviously (though I did
         | lose some unimportant data), but it did remind me that bleeding
         | edge is indeed ... Unstable
         | 
         | The setup process and features are fantastic, however: simply
         | being able to add a disk and flag it as a read/write cache feels
         | great. I'm certain I'll give it another try in a few years,
         | after it has had some more time in the oven.
        
           | iforgotpassword wrote:
           | New filesystems seem to have a chicken-and-egg problem,
           | really. It's not like switching from Nvidia's proprietary
           | drivers to nouveau and then back if it turns out they don't
           | work that well. Switching filesystems, especially in larger
           | raid setups where you desperately need more testing and real
           | world usage feedback, is pretty involved, and even if you
           | have everything backed up it's pretty time consuming
           | restoring everything should things go haywire.
           | 
           | And even if you have the time and patience to be one of these
           | early adopters, debugging any issues encountered might also
           | be difficult, as ideally you want to give the devs full
           | access to your filesystem for debugging and attempted fixes,
           | which is obviously not always feasible.
           | 
           | So anything beyond the most trivial setups and usage patterns
           | gets a minuscule amount of testing.
           | 
           | In an ideal world, you'd nail your FS design first try, make
           | no mistakes during implementation and call it a day. I'd like
           | to live in an ideal world.
        
             | mdaniel wrote:
             | > In an ideal world, you'd nail your FS design first try,
             | make no mistakes during implementation and call it a day
             | 
             | Crypto implementations and FS implementations strike me as
             | the ideal audience for actually investing the mental energy
             | in the healthy ecosystem we have of modeling and
             | correctness verification systems
             | 
             | Now, I readily admit that I could be talking out of my ass,
             | given that I've not tried to use those verification systems
             | in anger, as I am not in the crypto (or FS) authoring space
             | but AWS uses formal verification for their ... fork? ... of
             | BoringSSL et al https://github.com/awslabs/aws-lc-
             | verification#aws-libcrypto...
        
               | orbital-decay wrote:
               | A major chunk of storage reliability is all these weird
               | and unexpected failure modes and edge cases which are not
               | possible to prepare for, let alone write fixed specs for.
               | Software correctness assumes the underlying system
               | behaves correctly and stays fixed, which is not the case.
               | You can't trust the hardware and the systems are too
               | diverse - this is the worst case for formal verification.
        
         | DistractionRect wrote:
         | I'm optimistic about it, but probably won't switch over my home
         | lab for a while. I've had quirks with my (now legacy) zsys +
         | zfs on root for Ubuntu, but since it's a common config//widely
         | used for years it's pretty easy to find support.
         | 
         | I probably won't use bcachefs until a similar level of
         | adoption/community support exists.
        
       | wazoox wrote:
       | I currently support many NAS servers in the 50 TB - 2 PB range,
       | many of them 10 or 12 years old, and up to 15 for some. Most of
       | them still run with their original power supplies, motherboards
       | and most of their original (HGST -- now WD -- UltraStar) drives,
       | though of course a few drives have failed on some of them (but
       | not all).
       | 
       | 2, 4 and 8 TB HGST UltraStar disks are particularly reliable.
       | All of my desktop PCs currently host mirrors of 2009-vintage 2 TB
       | drives that I got when they were put out of service. I have heaps
       | of spare, good 2 TB drives (and a few hundred still running in
       | production after all these years).
       | 
       | For some reason, 14 TB drives seem to have a much higher failure
       | rate than helium drives of all sizes. On a fleet of only about 40
       | 14 TB drives, I had more failures than on a fleet of over 1000 12
       | and 16 TB drives.
        
       | n3storm wrote:
       | After 20 years using ext3 and ext4, I only lost data when effing
       | around with parted -- and I did.
        
         | rollcat wrote:
         | There's an infinite number of ways you can lose data. Here's
         | one of my recent stories:
         | https://www.rollc.at/posts/2022-05-02-disaster-recovery/
         | 
         | Just among my "personal" stuff, over the last 12 years I've
         | completely lost 4 hard drives due to age/physical failure. ZFS
         | made it a non-event twice, and aided with recovery once (once
         | I'd dd'd what was left of one drive, zvol snapshots made
         | "risky" experimentation cheap and easy).
        
       | Tepix wrote:
       | Having 24 drives probably offers some performance advantages, but
       | if you don't require them, having a 6-bay NAS with 18TB disks
       | instead would offer a ton of advantages in terms of power usage,
       | noise, space required, cost and reliability.
        
         | louwrentius wrote:
         | I agree, that's what I would do today if I wanted the same
         | amount of space, as I stated in the article.
        
         | foepys wrote:
         | I'd want more redundancy in that case. With such large HDDs, a
         | ZFS resilver could kill another disk and then you would lose
         | your data.
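
           A rough illustration of that risk, assuming the 6 x 18 TB
           RAID-Z2 layout suggested above and the commonly quoted
           unrecoverable-read-error spec of about 1 per 10^14 bits for
           consumer drives: if a second disk dies mid-resilver, the four
           survivors have to read back without a single error.

               # Expected unrecoverable read errors (UREs) when a RAID-Z2
               # rebuild has already lost a second disk, leaving 4 x 18 TB
               # that must read back perfectly.
               ure_per_bit = 1e-14               # typical consumer spec
               bits_to_read = 4 * 18e12 * 8      # four surviving 18 TB disks
               print(bits_to_read * ure_per_bit)   # ~5.8 expected UREs
               print(bits_to_read * 1e-15)         # ~0.58 at enterprise spec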
        
         | prepend wrote:
         | 18TB drives didn't exist back when this setup was designed.
         | 
         | Of course a 3 bay with 128TB drives would also be superior, but
         | this comment only makes sense in a few years.
        
       | prepend wrote:
       | I wish he had talked more about his movie collection. I'm
       | interested in the methods of selecting initial items as well as
       | ones that survive in the collection for 10+ years.
        
         | not_the_fda wrote:
         | My server isn't nearly as big as his, but my collection is
         | mostly Criterion https://www.criterion.com/closet-picks
        
         | lvl155 wrote:
         | Off topic but why do people run Plex and movies locally in
         | 2024? Is "iTunes" that bad?
        
           | prepend wrote:
           | I don't know about other people but I run Plex because it
           | lets me run my home movie collection through multiple clients
           | (Apple TV, browser, phones, etc). iTunes works great with
           | content bought from Apple, but is useless when playing your
           | own media files from other sources.
           | 
           | I just want every Disney movie available so my kids can watch
           | without bothering me.
        
       | bevenhall wrote:
       | I wouldn't call a single user computer a server if you can "Turn
       | the server off when you're not using it." Not really a challenge
       | or achievement.
        
       | lawaaiig wrote:
       | Regarding the intermittent power cutoffs during boot, it should
       | be noted that the drives pull power from the 5V rail on startup:
       | comparable drives typically draw up to 1.2A. Combined with the
       | maximum load of 25A on the 5V rail (Seasonic Platinum 860W), it's
       | likely you'll experience power failures during boot if staggered
       | spinup is not used.
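
         A quick back-of-the-envelope using those figures, and assuming the
         24-drive chassis discussed elsewhere in the thread spins everything
         up at once:

             # 5V rail budget at spin-up without staggered start.
             drives = 24
             amps_per_drive = 1.2     # startup draw per drive, quoted above
             rail_limit_amps = 25     # 5V rail limit of the Seasonic 860W
             demand = drives * amps_per_drive
             print(demand, "A needed vs", rail_limit_amps, "A available")
             # 28.8 A > 25 A, so unstaggered spin-up can brown out the rail.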
        
       | 383toast wrote:
       | Can someone explain why a single geolocated node makes sense for
       | data storage? If there's a house fire for example, wouldn't all
       | the data be lost?
        
         | Fastjur wrote:
         | I'm guessing that the 71 TiB is mostly used for media, as in,
         | plex/jellyfin media, which is sad to lose but not
         | unrecoverable. How would one ever store that much personal
         | data? I hope they have an off-site backup for the all-important
         | unrecoverable data like family photos and whatnot.
        
           | leptons wrote:
           | I have about 80TB (my wife's data and mine) backed up
           | to LTO5 tape. It's pretty cheap to get refurb tape drives on
           | ebay. I pay about $5.00/TB for tape storage, not including
           | the ~$200 for the LTO drive and an HBA card, so it was pretty
           | economical.
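
             A rough cost check using the figures above, plus LTO-5's 1.5 TB
             native capacity per tape (hardware compression would reduce the
             tape count):

                 import math

                 # Approximate cost of backing up 80 TB to used LTO-5 tape.
                 data_tb = 80
                 tape_native_tb = 1.5      # LTO-5 native capacity per tape
                 cost_per_tb = 5.00        # quoted tape cost, $/TB
                 drive_and_hba = 200       # refurb drive + HBA card, $
                 tapes = math.ceil(data_tb / tape_native_tb)
                 total = data_tb * cost_per_tb + drive_and_hba
                 print(tapes, "tapes, roughly $", total)
                 # ~54 tapes and about $600 all-in for the first full copy.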
        
             | Fastjur wrote:
             | Wow that is surprisingly cheap actually.
        
               | leptons wrote:
               | I get email alerts from ebay for anything related to
               | LTO-5, and I only buy the cheap used tapes. They are
               | still fine, most of them are very low use, the tapes
               | actually have a chip in them that stores usage data like
               | a car's odometer, so you can know how much a tape has
               | been used. So far I trust tapes more than I'd trust a
               | refurb hard drive for backups. And I also really like
               | that the tapes have a write-protect notch on them, so
               | once I write my backup data, there's no risk of having a
               | tape accidentally get erased, unlike if I plugged in a
               | hard drive and maybe there's some ransomware virus that
                | automatically fucks with any hard drive that gets
               | plugged in. It's just one less thing to worry about.
        
       | Saris wrote:
       | I'm curious: what's your use case for 71TB of data where you can
       | also shut it down most of the time?
       | 
       | My NAS is basically constantly in use, between video footage
       | being dumped and then pulled for editing, uploading and editing
       | photos, keeping my devices in sync, media streaming in the
       | evening, and backups from my other devices at night..
        
       | drzzhan wrote:
       | It's always nice to know that people can store their data for so
       | long. In my research lab, we still only use separate external
       | HDDs due to budget reasons. Last year 4 (out of 8) drives failed
       | and we lost the data. I guess we mainly work with public data, so
       | it is not a big deal. But it is a dream of mine to do research
       | without such worries. I do keep backups for my own stuff, though
       | I'm the only one in my lab who does.
        
       ___________________________________________________________________
       (page generated 2024-09-14 23:01 UTC)