[HN Gopher] ZFS fans, rejoice - RAIDz expansion will be a thing ...
___________________________________________________________________
ZFS fans, rejoice - RAIDz expansion will be a thing soon
Author : rodrigo975
Score : 173 points
Date : 2021-06-17 07:33 UTC (1 day ago)
(HTM) web link (arstechnica.com)
(TXT) w3m dump (arstechnica.com)
| milofeynman wrote:
| My resizing consists of buying 8 more hard drives that are 2x the
| previous 8 and moving data over every few years (:
| garmaine wrote:
| FYI you don't have to move data. You can just replace each disk
| one at a time, and after the last replacement you magically
| have a bigger pool.
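|
| (A rough sketch with made-up pool/device names; the autoexpand
| property is what lets the pool grow after the last swap:)
|
|   zpool set autoexpand=on tank
|   # repeat for each disk, waiting for each resilver to finish:
|   zpool replace tank sda sdh
|   zpool status tank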
| curtis3389 wrote:
| Does anyone know if this also means a draid can be expanded?
| bearjaws wrote:
| Just upgraded my home NAS, had to swap all 8 drives, took 7
| days... Not to mention it doubled the size of the array; I would
| have been much happier with an incremental increase.
| dsr_ wrote:
| With RAID10, one could swap out 2 drives to get a size
| increase.
|
| With two 4-disk vdevs, one could swap out 4 drives for a size
| increase.
|
| So I'm assuming you have a single 8 disk vdev, and no spare
| places to put disks.
| xanaxagoras wrote:
| I did that once, and the experience was a big part of why I use
| unraid now.
| louwrentius wrote:
| > Data newly written to the ten-disk RAIDz2 has a nominal storage
| efficiency of 80 percent--eight of every ten sectors are data--
| but the old expanded data is still written in six-wide stripes,
| so it still has the old 67 percent storage efficiency.
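|
| (For an n-wide RAIDz2 stripe, two sectors per stripe are parity, so
| the nominal efficiency is (n-2)/n: 8/10 = 80% for the new ten-wide
| layout, 4/6 ~= 67% for the old six-wide one.)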
|
| This makes this feature quite 'meh'. The whole goal is capacity
| expansion and you won't be able to use the new capacity unless
| you rewrite all existing data, as I understand it.
|
| This feature is mostly relevant for home enthusiasts, and I don't
| think it really delivers the behavior this user group wants and
| needs.
|
| > Undergoing a live reshaping can be pretty painful, especially
| on nearly full arrays; it's entirely possible that such a task
| might require a week or more, with array performance limited to a
| quarter or less of normal the entire time.
|
| Not an issue for home users, as they often don't have large
| workloads, so this process stays convenient even if it were to
| take two days.
| uniqueuid wrote:
| The article is a great example of all the somewhat surprising
| peculiarities in ZFS. For example, the conversion will keep the
| stripe width and block size, meaning your throughput of existing
| data won't improve. So it's not quite a full re-balance.
|
| Other fun things are the flexible block sizes and their relation
| to the size you're writing and compression ... Chris Siebenmann
| has written quite a bit about it (https://utcc.utoronto.ca/~cks/s
| pace/blog/solaris/ZFSLogicalV...).
|
| One thing I'm particularly interested in is seeing whether this new
| patch offers a way to decrease fragmentation on existing, heavily
| loaded pools (allocation behavior changes if they are too full, and
| this patch will for the first time allow us to avoid building a
| completely new pool).
|
| [edit] The PR is here: https://github.com/openzfs/zfs/pull/12225
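|
| (Skimming the PR, expansion appears to be kicked off by attaching an
| extra disk to an existing raidz vdev, roughly along the lines of
|
|   zpool attach tank raidz2-0 sdh
|
| but check the PR itself for the exact syntax and caveats.)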
|
| I also recommend reading the discussions in the ZFS repository -
| they are quite interesting and reveal a lot of the reasoning
| behind the filesystem. Recommended even to people who don't write
| filesystems for a living.
| chungy wrote:
| > The article is a great example of all the somewhat surprising
| peculiarities in ZFS. For example, the conversion will keep the
| stripe width and block size, meaning your throughput of
| existing data won't improve. So it's not quite a full re-
| balance.
|
| This is generally in-line with other ZFS operations. For
| example, changing compression policies will not rewrite
| existing data and only new data is affected.
|
| It simplifies some code paths and keeps performance good no
| matter what. You don't get a surprising reduction in
| performance.
| [deleted]
| nwmcsween wrote:
| I'm starting to get concerned about the ZFS issue list; there are
| a ton of gotchas hiding in OpenZFS that can cause data loss:
|
| * Swap on ZVOL (data loss)
|
| * Hard-locking when removing the ZIL (this has caused data loss for us)
| nimbius wrote:
| this might sound like a troll comment but it's coming from someone
| with almost zero experience with RAID. What is the purpose of ZFS
| in 2021 if we have hardware RAID and Linux software RAID? BTRFS
| does RAID too. Why would people choose ZFS in 2021 if Oracle and
| the open source community have two competing ZFS implementations?
| Are they interoperable?
| rektide wrote:
| No matter what happens, people will seemingly forever declare
| BTRFS is not as stable and not as safe. There's a status page
| that details what BTRFS thinks of itself[1], and I doubt any of
| the many people docking BTRFS have read or know or care what
| that page says. There is one issue still being worked out to
| completion, a "write hole" problem, involving two separate
| failures, an unplanned/power-loss shut-down, followed by a
| second disk failure, which can result in some data being
| lost[2] in RAID5/6 scenarios.
|
| Other than that one extreme double-failure scenario being
| worked out, BTRFS has proven remarkably stable for a while now.
| A decade ago it wasn't nearly as bulletproof, but today the
| situation is much different. Personally, it feels to
| me like there is a persistent & vocal small group of people who
| seemingly either have some agenda that makes them not wish to
| consider BTRFS, or they are unwilling to review & reconsider
| how things might have changed in the last decade. Not to
| belabor the point but it's quite frustrating, and it feels a
| bit odd that BTRFS is such a persistent target of slander &
| assault. Few other file systems seem to face anywhere near as
| much criticism, never so out of hand/casually, and honestly, in
| the end, it just seems like there's some contingent of ZFS folks
| with some strange need to make themselves feel better by
| putting others down.
|
| One big sign of trust: Fedora 35 Cloud looks likely to switch
| to BTRFS as default[3], following Fedora 33 desktop last year
| making the move. A number of big names use BTRFS, including
| Facebook. I have yet to see any hyperscalers interested in ZFS.
|
| I'm excited to see ZFS start to get some competent
| expandability. Expanding ZFS used to be a nightmare. I'll
| continue running BTRFS for now, but I'm excited to see file
| systems flourish. Things I wouldn't do? Hardware RAID.
| Controllers are persnickety weird devices, each with their own
| invisible sets of constraints & specific firmware issues. If at
| all possible, I'd prefer the kernel figure out how to make
| effective use out of multiple disks. BTRFS, and now it seems
| ZFS perhaps too, do a magical job of making that easy,
| effective, & fast, in a safe way.
|
| Edit: the current widely-adopted write hole fix is to use RAID1
| or RAID1c3 or RAID1c4 (3 copy RAID1, 4 copy RAID1) for meta-
| data, RAID5/6 for data.
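|
| For example (my sketch, not from the status page), a new filesystem
| with that layout would look something like:
|
|   mkfs.btrfs -d raid5 -m raid1c3 /dev/sda /dev/sdb /dev/sdc
|
| raid1c3 needs a reasonably recent kernel (5.5 or newer).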
|
| [1] https://btrfs.wiki.kernel.org/index.php/Status
|
| [2] https://btrfs.wiki.kernel.org/index.php/RAID56
|
| [3]
| https://www.phoronix.com/scan.php?page=news_item&px=Fedora-C...
| Datagenerator wrote:
| Netflix has been using ZFS in production for many years now.
| Unnamed research companies are using ZFS to move PBs of data.
| NetApp is FreeBSD based and was at the forefront of what we
| now call ZFS. I'm totally biased, having designed many
| production-critical systems with ZFS at its core in one way or
| another. The power of ZFS's send and receive functionality is
| tremendous, to say the least; it beats any file-based
| synchronizing method.
| webmobdev wrote:
| One guess I can make for the "hate" BTRFS gets is probably
| because everyone loves their data and doesn't expect to
| "fight" with a file system to get access to it.
|
| E.g. Sailfish OS is perhaps the only mobile OS I know that
| uses / used BTRFS in _production_ (and they adopted it nearly
| 6-7 years ago!). And some of its users have had issues with
| BTRFS in the earlier versions - https://together.jolla.com/qu
| estions/scope:all/sort:activity... ... in fact, I too
| remember that once or twice, we had to manually run the btrfs
| balancer before doing an OS update. For Sailfish OS on Tablet
| Jolla even experimented with LVM and ext4, and perhaps even
| considered dropping BTRFS. (I don't know what it uses for
| newer versions of Sailfish OS now - I think it allows the
| user to choose between BTRFS or LVM / EXT4).
|
| Most users consider a file system (be it ZFS or BTRFS) to be
| a really low-level system software with which they only wish
| to interact transparently (even I got anxious when I had to
| run btrfs balancer on Sailfish OS the first time worrying
| what would happen if there was not enough free space to do
| the operation and hoping I wouldn't lose my data). Even on
| older systems, everybody got frustrated over the need to run a
| defragmenter.
|
| Perhaps because of improper expectations or configurations,
| some of the early adopters of BTRFS got burnt with it after
| possibly even losing their precious data. It's hard to forget
| that kind of experience and thus perhaps the "continuing
| hate" you see for BTRFS - a PR issue that BTRFS' proponents
| need to fix.
|
| (It's interesting to see the progress BTRFS has made. Thanks
| to your post, I may consider it for future Linux
| installations over EXT4. Except for the hands-on tinkering it
| required once or twice, I remember it as being rock-solid on
| my Sailfish mobile.)
| chasil wrote:
| Suse uses btrfs in production for the root filesystem, and
| they have done so for years.
| sz4kerto wrote:
| I don't want to be trolling either, but a simple Google search
| gives you really detailed answers. Or just look at Wikipedia:
| https://en.wikipedia.org/wiki/ZFS
|
| Some highlights: hierarchical checksumming, CoW snapshots,
| deduplication, more efficient rebuilds, extremely configurable,
| tiered storage, various caching strategies, etc.
| magicalhippo wrote:
| > What is the purpose of ZFS in 2021 if we have hardware RAID
| and linux software RAID?
|
| Others have touched on the main points, I just wanted to stress
| that an important distinction between ZFS and hardware RAID and
| linux software RAID (by which I assume you mean MD) is that the
| latter two present themselves as block devices. One has to put
| a file system on top to make use of them.
|
| In contrast, ZFS does away with this traditional split, and
| provides a filesystem as well as support for a virtual block
| device. By unifying the full stack from the filesystem down to
| the actual devices, it can be smarter and more resilient.
|
| The first few minutes of this[1] presentation do a good job
| of explaining why ZFS was built this way and how it improves on
| the traditional RAID solutions.
|
| [1]: https://www.youtube.com/watch?v=MsY-BafQgj4
| wyager wrote:
| ZFS RAID is the best RAID implementation in many respects.
| Hardware RAID is bad at actually fixing errors on disk (as
| opposed to just transparently correcting) and surfacing errors
| to the user.
|
| BTRFS is frequently not considered stable enough for production
| usage.
|
| ZFS has dozens of useful features besides RAID. Transparent
| compression, instant atomic snapshots, incremental snapshot
| sync, instant cloning of file systems, etc etc.
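|
| (Illustrative only, with a made-up pool name:
|
|   zfs set compression=lz4 tank/data           # transparent compression
|   zfs snapshot tank/data@monday               # instant atomic snapshot
|   zfs send -i @sunday tank/data@monday > incr # incremental snapshot sync
|   zfs clone tank/data@monday tank/experiment  # instant writable clone
|
| all of these are effectively instant regardless of dataset size,
| except send, which scales with the amount of changed data.)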
|
| Yes, different ZFS implementations are mostly compatible in my
| experience, and they should become totally compatible as
| everyone moves to OpenZFS. FreeBSD 13 and Linux currently have
| ZFS feature parity I believe.
| tehbeard wrote:
| I can't speak with much experience, but what I have gleaned is:
|
| - You generally want to avoid hardware RAID; if the card dies
| you'll likely need to source a compatible replacement vs.
| grabbing another SATA/SAS expander and reconstructing the
| array.
|
| - ZFS handles the stack all the way from drives to filesystem,
| allowing them to work together (i.e. filesystem usage info can
| better dictate what gets moved around tiered storage, or inform
| smarter RAID recovery).
| LambdaComplex wrote:
| My understanding is that hardware RAID is mainly a thing in
| the Windows world, because apparently its software RAID
| implementation is garbage
| nickik wrote:
| > What is the purpose of ZFS in 2021 if we have hardware RAID
|
| Hardware RAID is actually older than ZFS-style software RAID.
| ZFS was specifically designed to fix the issues with hardware
| RAID.
|
| The problem with hardware RAID is that it has no idea what's
| going on on top of it, and even worse, it's mostly a bunch of
| closed-source firmware from a vendor. And they cost money.
|
| You can find lots of terrible stories about those.
|
| ZFS is open-source and battle tested.
|
| > linux software RAID
|
| Not sure what you are referring to.
|
| > BTRFS does RAID too.
|
| BTRFS basically copied many of the features from ZFS. BTRFS
| has a history of being far less stable; ZFS is far more battle
| tested. They say it's stable now, but they have said that many
| times before. It ate my data twice, so I haven't followed the
| project since. A file system, in my opinion, gets exactly one
| chance with me.
|
| They each have some features the other doesn't, but broadly
| speaking they are similar technology.
|
| The new bcacheFS is also coming up and adding some interesting
| features.
|
| > Why would people choose ZFS in 2021 if both Oracle and Open
| Source users have 2 competing ZFS?
|
| Not sure what that has to do with anything. Oracle is an evil
| company; they tried to take all these great open source
| technologies away from people and the community fought against
| it. Most of the ZFS team left after the acquisition.
|
| The Open-Source version is arguably better, and has far more of
| the original designers working on it. The two code bases have
| diverged a lot since then.
|
| At the end of the day ZFS is incredibly battle tested, works
| incredibly well at what it does, and has had an incredible
| reputation for stability basically since it came out. The
| question, in my opinion, is why not ZFS rather than why ZFS.
| _tom_ wrote:
| > ZFS is far more battle tested. They say it's stable now, but
| they have said that many times before. It ate my data twice
|
| Did you mean "it ate my data" to apply to ZFS? Or did you
| mean BTRFS?
| znpy wrote:
| It was probably BTRFS.
|
| I never fell for the BTRFS meme but many friends of mine
| did, and many of them ended up with a corrupted filesystem
| (and lost data).
| garmaine wrote:
| He was referring to mdadm RAID.
| Quekid5 wrote:
| > hardware RAID
|
| That's just the worst of all worlds: Usually proprietary _and_
| you get the extreme aversion to improvement (or any change
| really) of hardware vendors.
|
| This ZFS change is going to come, and it may end up being
| complex for users to adopt... but it's happening. At the
| risk of being hyperbolic: something like this would never be
| possible with a HW RAID system unless it had explicitly been
| designed for it from the start.
|
| Also: ZFS does much more than any hardware RAID ever did.
| boomboomsubban wrote:
| They are not interoperable but they're barely competing as
| Solaris is dead. Does Oracle Linux even offer Oracle ZFS? I
| assume they stick to btrfs considering they are the original
| developers.
|
| RAID does not feature the data protection offered by a copy on
| write filesystem, and OpenZFS is the most stable and portable
| option.
| nix23 wrote:
| It's pretty easy: having lots of experience with HW RAID and
| SW RAID, software is the way to go because:
|
| 1. Do you trust firmware? I don't; I can tell you stories about
| freaking-out SANs... never had that with Solaris or FreeBSD and
| ZFS.
|
| 2. Why have an additional abstraction layer: HW RAID caching
| vs FS caching, no transparency for error correction, no smart
| RAID rebuild, etc.
|
| The list can go on and on, but HW RAID is a thing of the past
| (exceptions are specialized SANs etc.)
| usefulcat wrote:
| The last time I had to use HW raid it was horrible. The
| software for managing the RAID array was a poorly documented,
| difficult to use proprietary blob. I used it for years and the
| experience never improved. And this is a thing where if you
| make a mistake you can destroy the very data that you've gone
| to such lengths to protect. Having switched to ZFS several
| years ago, I lack the words to express how much I don't miss
| having to deal with that.
| nickik wrote:
| I prefer just to have mirrors, but it's cool that it's slowly
| coming; some people seem to really want this feature.
|
| ZFS has been amazing to me, I have zero complaints.
|
| I just wish it hadn't taken so long to come to root-on-Linux.
| Even today you have to do a lot of work unless you want to use
| the new support in Ubuntu.
|
| This license snafu is so terrible: open-source licenses
| excluding each other. Crazy. The world would have been a better
| place if Linux had incorporated ZFS long ago. (And no, we don't
| need yet another legal discussion; my point is just that it's
| sad.)
| 1980phipsi wrote:
| This will be very useful!
|
| TIL FreeNAS is now TrueNAS.
| znpy wrote:
| Actually there's more!
|
| A new version of TrueNAS is in the works; it's called TrueNAS
| SCALE and it's going to be Linux-based (no more FreeBSD).
|
| I'm frankly happy: TrueNAS is great as a NAS operating system,
| but I really want to run containers where my storage is, and
| having to run a VM adds really unnecessary overhead (plus, it's
| another machine to manage).
| d33lio wrote:
| I'll believe it when I see it. Why anyone uses BTRFS (UnRaid or
| any other form of software RAID that _isn't_ ZFS) is still
| beyond me. At least when we're not talking SSDs ;)
|
| ZFS is incredible, curious to mess around with these new
| features!
| mixedCase wrote:
| I just put two 8TB drives into btrfs because it's a home
| server, I can't provision things up front. One day I may put a
| third 8TB drive and turn this RAID1 into RAID5. btrfs lets me
| do that, zfs doesn't, simple as.
|
| One day I may switch the whole thing to bcachefs, which I've
| donated to and am looking forward to. For the moment, btrfs
| will have to do.
|
| EDIT: downvoted by... the filesystem brigade?
| edgyquant wrote:
| There is a large group of people who really dislike BTRFS. I
| think they were probably burned by it at some point, but I've
| never had trouble and I've been using it since it became the
| default on Fedora.
| nix23 wrote:
| >RAID5
|
| I wish you lots of fun with that on btrfs :)
|
| Edit:
|
| https://btrfs.wiki.kernel.org/index.php/Status
|
| RAID56 Unstable n/a write hole still exists
|
| > treated as if I'm storing business data or precious
| memories without backups, guess I'm just dumb
|
| No you're not, but don't use unstable features in a filesystem
| mixedCase wrote:
| Well, that's the idea! This is a low-I/O media server where
| all the important stuff (<5G of photos) has 2+ redundancy (once
| remotely, and on every workstation I sync), with the rest of
| the data being able to crash and burn without much
| repercussion.
|
| The whole point of me using RAID1 (and maybe later RAID5)
| is that if a disk goes bust, odds are I can still watch a
| movie from it until I can get another disk. What's more, if
| I ever fill the RAID1 and I don't feel like breaking the
| piggy bank for another disk, I can go JBOD as far as my
| usecase is concerned.
|
| But hey, if the orange website tells me all servers are
| supposed to be treated as if I'm storing business data or
| precious memories without backups, guess I'm just dumb. On
| that note: donations welcome, each 8TB disk costs close to
| 500 USD here in Uruguay, so if anyone's first world opinion
| can buy me a couple so I can use the Right Filesystem(tm),
| I'd appreciate it!
| jbverschoor wrote:
| Licensing. Similarly, otherwise it would've been included in
| macOS a long time ago (as the default fs according to some..)
| tw04 wrote:
| The reason it didn't end up in macOS is because NetApp sued
| Sun for patent infringement. Apple wanted nothing to do with
| that lawsuit and quickly abandoned the project.
|
| As others have stated, dtrace has the exact same license and
| has been in MacOS for years.
| jen20 wrote:
| The licensing is nothing to do with it on OSX - indeed DTrace
| (also under the CDDL) has been shipping in it for years.
| bsder wrote:
| I do believe that the license was fine for macOS but when
| Oracle bought Sun that killed it cold.
|
| Jobs _never_ liked anybody other than himself holding all the
| cards. Having Ellison and Oracle holding the keys to ZFS was
| just never going to fly.
| spullara wrote:
| I had ZFS on a Mac from Apple for a short amount of time
| during one of the betas :( I think TimeMachine was going to
| be based on it but they pulled out.
| codetrotter wrote:
| FYI there is a third-party effort for making OpenZFS
| usable on macOS.
|
| https://openzfsonosx.org/
|
| I used it for a while, but unfortunately, since there are not
| many people working on it and they are not working on it full
| time, it can take a good while from when a new version of
| macOS is released until OpenZFS is usable with that version.
| This was certainly the case a while ago, and it's why I
| stopped using OpenZFS on macOS and went back to only using ZFS
| on FreeBSD and Linux. So on my Mac computers I only use APFS.
| qaq wrote:
| Jobs and Ellison were really close friends
| jamiek88 wrote:
| And also cold-hearted, clear-eyed businessmen unlikely to
| allow friendship to affect their corporations.
|
| I'd love to be a fly on the wall for some of those
| conversations.
| tw04 wrote:
| That makes absolutely no sense. Jobs and Ellison were best
| friends. Oracle acquiring Sun would have made it MORE
| attractive, not less.
|
| https://www.cnet.com/news/larry-ellison-talks-about-his-
| best...
| ghaff wrote:
| It's a combination of the license and the fact that it's
| Oracle, of all entities, that owns the copyright. Perhaps
| either one by itself wouldn't be a dealbreaker but the
| combination is. And, of course, Oracle could have changed
| the license at any time after buying Sun.
|
| (Of course, Jobs may have just decided he didn't want to
| depend on someone else for the MacOS filesystem in any
| case.)
|
| ADDED: And as others noted, there were also some storage
| patent-related issues with Sun. So just a lot of potential
| complications.
| ghaff wrote:
| And it's arguably even a bigger issue on Linux distros.
| mnd999 wrote:
| It's a moderate pain on Linux and then only really that if
| you're running on something bleeding-edge like Arch.
| Otherwise it's just a kernel module like any other.
| ghaff wrote:
| But it doesn't ship with either Red Hat or SUSE distros,
| which is an issue for supported commercial use.
| justaguy88 wrote:
| What's Oracle's play here? Do they somehow make money out of
| ZFS, which makes them reluctant to re-license it?
| sneak wrote:
| Is there a CLA for OpenZFS/ZoL? I don't believe there is,
| so I don't think Oracle can unilaterally relicense it.
| Dylan16807 wrote:
| > why anyone uses BTRFS (UnRaid or any other form of software
| raid that isn't ZFS) is still beyond me.
|
| BTRFS can do after-the-fact deduplication (with much better
| performance than ZFS dedup) and copy-on-write files. And you
| can turn snapshots into editable file systems.
| eptcyka wrote:
| I've had 3 catastrophic BTRFS failures. In two cases, the
| root filesystem just ran out of space and there was no way to
| repair the partition. Last time, the partition was just
| rendered unmountable after a reboot. All data was lost. No
| such thing has ever happened with ZFS for me.
| Dylan16807 wrote:
| I've had some annoying failures too. But I wasn't listing
| pros and cons, I was explaining that there _are_ some very
| notable features that ZFS lacks.
| eptcyka wrote:
| That's fair. However, when listing notable features for
| the sake of comparing software, I think it's important to
| also list other characteristics of a given piece of
| software. If we were to compare software by feature sets
| alone, one might argue that Windows has the most
| features, so Windows must be the best OS.
| tux1968 wrote:
| A recent Fedora install here came with a new default of
| BTRFS use rather than ext4. So I'm curious about your
| experience: were any of those catastrophic failures recent?
| Do you know of any patches entering the kernel that purport
| to fix the issues you experienced?
| benlivengood wrote:
| I think cloning a zfs snapshot into a writeable filesystem
| matches at least the functionality of btrfs writeable
| snapshots, but I could be ignorant about some use-cases.
| Dylan16807 wrote:
| Let's say you want to clear out part of a snapshot of
| /home, but keep the rest.
|
| So you clone it and delete some files. All good so far, but
| the snapshot is still wasting space and needs to be
| deleted.
|
| But to make this happen, your clone has to stop being copy-
| on-write. All the data that exists in both /home and the
| clone will now be duplicated.
|
| And you could say "plan ahead more", but even if you split
| up your drive into many filesystems, now you have the
| problem that you can't move files between these different
| directories without making extra copies.
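|
| (The workflow being described, roughly, with made-up names:
|
|   zfs snapshot tank/home@old
|   zfs clone tank/home@old tank/home-pruned
|   rm -rf /tank/home-pruned/junk
|   # tank/home@old still pins the deleted blocks; freeing them means
|   # getting rid of the snapshot the clone depends on.
|
| On btrfs the equivalent is just a writable snapshot, with no origin
| snapshot to keep around.)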
| auxym wrote:
| RAM?
|
| Every time I looked into setting up a FreeNAS box, every
| hardware guide insisted that ungodly amounts of
| absolutely-has-to-be-ECC RAM were essential, and I just gave up
| at that point.
| colechristensen wrote:
| ZFS likes RAM and uses it to get better performance (and
| don't think about using dedup without huge ram), but you
| don't need it and can change the defaults.
|
| ECC tends to attract zealots chasing a perfect, error-free
| existence, which ECC tends towards but doesn't deliver; it just
| reduces errors. I personally don't mind a tiny amount of bit
| rot (ZFS will prevent most of this) or rebooting my storage
| machine now and then.
|
| You can run ZFS/freenas on a crappy old machine and you'll be
| just fine as long as you aren't hosting storage for dozens of
| people and you aren't a digital archivist trying to keep
| everything for centuries.
|
| Real advice (a rough command sketch for a few of these follows
| the list):
|
| * Mirrored vdevs perform way better than raidz, I don't think
| the storage gain is worth it until you have dozens of drives
|
| * Dedup isn't worth it
|
| * Enable lz4 compression everywhere
|
| * Have a hot spare
|
| * You can increase performance by adding a vdev set and by
| adding RAM
|
| * Use drives with the same capacity
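|
| Roughly, for a few of those (pool and device names made up):
|
|   zpool create tank mirror sda sdb mirror sdc sdd  # mirrored vdevs
|   zpool add tank spare sde                         # hot spare
|   zfs set compression=lz4 tank                     # lz4 everywhere
|   zpool add tank mirror sdf sdg                    # add a vdev later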
| InvaderFizz wrote:
| > Dedup isn't worth it
|
| To add to that, ZFS dedup is a lie and you should forget
| its existence unless you have a very specific scenario of
| being a SAN with a massive amount of RAM, and even then,
| you had better be damn sure.
|
| I really wish ZFS had either an option to store the Dedup
| Table on an NVMe device like Optane, or to do an offline
| deduplication job.
| hpfr wrote:
| Does rebooting help with soft errors in non-ECC RAM? I
| would have thought bit flips would be transient in nature,
| but I'm not really familiar.
| AdrianB1 wrote:
| Running ZFS (FreeNAS/TrueNAS) on two home-made NAS devices
| for years and years, I can say it is rock solid without ever
| using ECC RAM, due to lack of choices. I can bet there were
| many soft errors in all these years, but so far I have never
| had problems that could not be recovered. The biggest issue
| ever was destroying the boot USB storage within months, but
| that was mostly solved later: I moved to fixed drives as the
| boot drive, and later to virtualization for the boot disk and
| OS, so the problem completely went away.
| livueta wrote:
| > Enable lz4 compression everywhere
|
| Is the perf penalty low enough now that it just doesn't
| matter? I've always disabled compression on datasets I know
| are going to store only high-entropy data, like encoded
| video, that has a poor compression ratio.
|
| I second the hot spare recommendation many times over. It
| can save your bacon.
| simcop2387 wrote:
| It's generally the other way around actually, aside from
| storing already highly compressed data (e.g. video). The
| compression from lz4 will get you better effective
| performance because of the lower amount of I/O that has to be
| done, both in throughput and latency on ZFS. This is because
| your CPU can usually do lz4 at multiple GB/s per core,
| compared to the couple hundred MB/s you might get from your
| spinning rust disks.
| livueta wrote:
| Neat! Makes sense.
| n8ta wrote:
| The freenas hardware requirements themselves say "8 GB RAM
| (ECC recommended but not required)"
|
| https://www.freenas.org/hardware-requirements/
|
| I myself use freenas with 16GB of non-ECC ram.
|
| Of course it is possible to have a bit flip in memory that is
| then dutifully stored incorrectly by ZFS to disk, but this
| was a possibility without ZFS as well.
|
| I've actually been waiting for this feature since I first
| set up my pool. It seemed theoretically possible; we were just
| waiting for an implementation.
| dsr_ wrote:
| Neither quantity nor ECC is essential.
|
| ZFS defaults to assuming it is the primary reason for your
| box to exist, but it only takes two lines to define more
| reasonable RAM usage: zfs_arc_min and zfs_arc_max. On a NAS
| type server, I would think setting the max to half of your
| RAM is reasonable. Maybe 3/4 if you never do anything except
| storage.
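|
| On Linux that looks something like this (values are in bytes; the
| numbers here are just an example for a 16 GB box):
|
|   # /etc/modprobe.d/zfs.conf
|   options zfs zfs_arc_max=8589934592   # 8 GiB
|   options zfs zfs_arc_min=1073741824   # 1 GiB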
|
| ECC is not recommended because ZFS has some kind of special
| vulnerability without it; ECC is recommended because ZFS has
| taken care of all the more likely chances of undetectable
| corruption, so that's the next step.
| fpoling wrote:
| It is not that simple regarding ECC. Since ZFS uses more
| memory, the probability of hitting a memory error is simply
| higher with it.
| amarshall wrote:
| But it doesn't really use more memory. The ARC gives the
| impression of high memory usage because it's different
| than the OS page cache and usually called out explicitly
| and not ignored in many monitoring tools like the OS
| cache is. Linux--without ZFS--will happily consume nearly
| all RAM with _any_ filesystem if enough data is read and
| written.
| dsr_ wrote:
| This is correct. Any filesystem using the kernel's
| filesystem cache will do this, too.
|
| For a long running, non-idle system, a good rule of thumb
| is that all RAM not being actively used is being used by
| evictable caching.
| zerd wrote:
| A colleague who was used to other UNIXes was
| transitioning to Linux for a database. He saw in free
| that "used" was at more than 90%, so he added more
| RAM. But to his surprise it was still using 90%! He kept
| adding RAM. I told him that he had to subtract the buffer
| and cached values (this was before free had the Available
| column).
| mark-wagner wrote:
| Before the Available column there was the -/+
| buffers/cache line that provided the same information.
| Maybe it was too confusing.
|              total       used       free     shared    buffers     cached
| Mem:      12286456   11715372     571084          0      81912    6545228
| -/+ buffers/cache:    5088232    7198224
| Swap:     24571408      54528   24516880
| ahofmann wrote:
| Others have said good things (ECC is good by itself, has not
| much to do with ZFS) and it is actually quite easy to check
| if you need much RAM for ZFS. Start a (Linux) VM with a few
| hundred megabytes of RAM and run ZFS an on it. Of course, it
| will not be as performant as having a lot of RAM. But it will
| not crash, or hang or be unusable in one way or another.
|
| Sources: - https://www.reddit.com/r/DataHoarder/comments/3s7v
| rd/so_you_... - https://www.reddit.com/r/homelab/comments/8s6
| r2r/what_exactl... - My own tests with around 8 TB ZFS data
| in a Linux vm with 256 MB RAM.
| IgorPartola wrote:
| Heh so you have that backwards. All RAM should be ECC if you
| care about what's stored in it. It's not a ZFS requirement,
| it's just that ZFS specifically cares about data integrity so
| it advises you to use ECC RAM. But it's not like any other
| file system is immune from random RAM corruption: it's not,
| it just won't tell you about it.
| handrous wrote:
| The "you need at least 32GB of memory and it _has to be_ ECC,
| or don't even bother trying to use ZFS" crowd has done some
| serious harm to ZFS adoption. Sure, that's what you need if
| you want _excellent_ data integrity guarantees and to use
| _all_ of ZFS ' advanced features. If you're fine with merely
| way-better-than-most-other-filesystems data integrity
| guarantees and using only _most_ of ZFS ' advanced features,
| you don't need those.
| tombert wrote:
| I really don't know where the "You gotta have ECC RAM!"
| thing started. I've been running a ZFS RAID on Nvidia
| Jetson Nanos for years now and haven't had any issues at
| all with data integrity.
|
| I don't see why ZFS would be more prone to data integrity
| issues spawning from a lack of ECC than any other
| filesystem.
| kurlberg wrote:
| Years ago I saw it at:
|
| https://www.truenas.com/community/threads/ecc-vs-non-ecc-
| ram...
|
| (the gist of the scary story is that faulty ram while
| scrubbing might kill "everything".) However, in the end
| ECC appears to NOT be so important, e.g., see
|
| https://news.ycombinator.com/item?id=23687895
| radiowave wrote:
| Relevant quote from one of ZFS's primary designers, Matt
| Ahrens: "There's nothing special about ZFS that
| requires/encourages the use of ECC RAM more so than any
| other filesystem. ... I would simply say: if you love
| your data, use ECC RAM. Additionally, use a filesystem
| that checksums your data, such as ZFS."
| tombert wrote:
| Yeah, I remember reading that a few years ago.
|
| If I were running a server farm or something, then yeah,
| I'd probably use ECC memory, but I think if you're
| running a home server, then the argument that ZFS
| necessitates ECC more than Ext4 or Btrfs or XFS or
| whatever doesn't really seem to be accurate.
| oarsinsync wrote:
| > the argument that ZFS necessitates ECC more than Ext4
| or Btrfs or XFS or whatever doesn't really seem to be
| accurate
|
| Agreed.
|
| > If I were running a server farm or something, then
| yeah, I'd probably use ECC memory, but I think if you're
| running a home server
|
| Then you should still use ECC RAM, regardless of what
| filesystem you're using.
|
| No, really. ECC matters
| (https://news.ycombinator.com/item?id=25622322)
| generally.
| tombert wrote:
| Fair enough, though AFAIK none of the SBC systems out
| there have ECC, and I generally use SBCs due to the low
| power consumption.
| simcop2387 wrote:
| You really only end up needing that if you're also going to do
| live deduplication of large amounts of data. Very few people
| actually need that; just using compression with lz4 or zstd,
| depending on your needs, will suffice for just about everyone
| and perform better.
|
| The ECC argument is probably about a 50/50 kind of thing. You
| can get away without it, and ZFS will do its best to detect
| and prevent issues, but if the data was flipped before it was
| given to ZFS then there's nothing anyone can do. You might get
| some false positives when reading data back if you have some
| flaky RAM, but as long as you have parity or redundancy on the
| disks, things should still get read correctly even if a false
| problem is detected. That might mean you want to run a scrub
| (essentially ZFS's version of fsck) more often to look for
| potential issues, but it shouldn't fundamentally be a big
| deal.
|
| If you want 24/7 highly available storage that won't blip out
| occasionally, you'll probably really want the ECC RAM; but if
| you're fine with having to reboot occasionally, or tell it to
| repair problems it thinks were there (but weren't, because the
| disk was fine and the RAM wasn't), then you should be fine.
|
| The extra checksums and data that ZFS keeps for all this can
| make it really robust even on bad hardware. I had a BIOS
| update cause some massive PCIe bus issues that I didn't notice
| for a while, and ZFS kept all my data in good condition even
| though writes were sometimes just never happening because of
| ASPM causing issues with my controller card.
| UI_at_80x24 wrote:
| As always, it depends on your use-case.
|
| I have several file servers that all use ZFS exclusively, and
| 10x that number of servers using ZFS as the system FS.
|
| Rule of thumb that I like: 1GB RAM/TB of storage. This seems
| to give me the best bang-for-our-buck.
|
| For a small (under 20) number of office users, doing general
| 'office' stuff, using Samba, it's overkill.
|
| For large media shares with heavy editor access, and heavy
| strains on the network, it's a minimum.
|
| Depends on what the server is serving.
|
| DeDUP is a different story. The RAM is used to store the
| frequently accessed data. If you are using DeDUP you fill the
| motherboard with as much RAM as will fit. NO EXCEPTIONS! This
| may have been the line of thinking that scared you away from
| it.
|
| I have a 100TB server that is just used for writing data to
| and is never read from (sequential file back-ups before it's
| moved to "long term storage"). It has 8GB of RAM, and is
| barely touched.
|
| I also have a 20TB server with 2TB of RAM, that keeps the RAM
| maxed out with DeDUP usage.
|
| ECC: It's insurance, and it's worth it.
| the8472 wrote:
| btrfs does have some advantages over zfs:
|
| - no data duplicated between page cache and arc
| - no upgrade problems on rolling distros
| - balance allows restructuring the array
| - offline dedup, no need for huge dedup tables
| - ability to turn off checksumming for specific files
| - O_DIRECT support
| - reflink copy
| - fiemap
| - easy to resize
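|
| A couple of those as commands (illustrative; duperemove is a
| separate tool, not part of btrfs-progs):
|
|   btrfs balance start -dconvert=raid5 -mconvert=raid1 /mnt  # restructure
|   btrfs filesystem resize max /mnt                          # easy resize
|   duperemove -dr /mnt                                       # offline dedup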
| chasil wrote:
| - defragmentation
| nix23 wrote:
| But the main job of an fs is to preserve your files... btrfs
| can't even deliver on that most important point.
| edgyquant wrote:
| I see this a lot but have never had problems with BTRFS and
| I've used it both on my larger disks (2+tb) and my root
| (250gb ssd) across multiple computers for the last four
| years.
| the8472 wrote:
| The checksumming helps to spot faulty hardware; that's a step
| above most other filesystems, and often above SMART info too.
| akvadrako wrote:
| Checksums don't help against bugs. You are much less
| likely to lose your whole disk with ext4 or ZFS than
| BTRFS.
| baaym wrote:
| And even included in the kernel
| donmcronald wrote:
| BTRFS was useful for me. When those (RAID5) parity patches got
| rejected many, many years ago for non-technical reasons like
| not matching a business case/goal or similar, it changed my
| view of open source.
|
| That was the day I realized that some open source participants
| and supporters are interested in having open source projects
| that are good enough to act as a barrier to entry, but not good
| enough to compete with their commercial offerings.
|
| Judge the world from that perspective for a while and it can
| help to explain why so much open source feels 80% done and
| never gets the last 20% of the polish needed to make it great.
| imiric wrote:
| Simplicity. There's a lot of complexity in ZFS I'd rather not
| depend on, and because it does so many things it's a big
| investment and liability to switch to.
|
| While I understand why it would be useful in a corporate
| setting, for personal use I've found the combination of
| LUKS+LVM+SnapRAID to work well and don't see the benefit of
| switching to ZFS. Two of those are core Linux features, and
| SnapRAID has been rock solid, though thankfully I haven't
| tested its recovery process, but it seems straightforward from
| the documentation. Sure I don't have the real-time error
| correction of ZFS and other fancy features, but most of those
| aren't requirements for a personal NAS.
| [deleted]
| nix23 wrote:
| > LUKS+LVM+SnapRAID
|
| + your fs
|
| Yeah that sounds like a lot less complexity
| imiric wrote:
| ZFS has all of these features and more. If I don't need
| those extra features by definition it's a less complex
| system.
|
| Using composable tools is also better from a maintenance
| standpoint. If tomorrow SnapRAID stops working, I can
| replace just that component with something else without
| affecting the rest of the system.
| TimWolla wrote:
| > If tomorrow SnapRAID stops working, I can replace just
| that component with something else without affecting the
| rest of the system.
|
| Can you actually? If some layer of that storage stack
| stops working then you can no longer access your existing
| data, because all these layers need to work correctly to
| correctly reassemble the data read from disk.
| imiric wrote:
| It's a hypothetical scenario :) In reality if there's a
| project shutdown there would be enough time to migrate to
| a different setup. Of course it would be annoying to do,
| but at least it's possible. With a system like ZFS I'm
| risking having to change the filesystem, volume manager,
| storage array, encryption and whatever other feature I
| depended on. It's a lot to buy into.
| nix23 wrote:
| Since all those tools are from different devs, the system
| gets more complex. But hey, if you really think that ZFS is
| too complex to hold 55 petabytes because it has too many
| potential bugs, you should tell them:
|
| https://computing.llnl.gov/projects/zfs-lustre
| imiric wrote:
| Thankfully I don't have to manage 55 petabytes of data,
| but good luck to them.
|
| Did you miss the part where I mentioned "for personal
| use"?
|
| > Since all those tools are from different dev's the
| system gets more complex.
|
| I fail to see the connection there. Whether software is
| developed by a single entity or multiple developers has
| no relation to how complex the end user system will be.
|
| But many small tools focused on just the functionality I
| need allows me to build a simpler system overall.
| funcDropShadow wrote:
| > Whether software is developed by a single entity or
| multiple developers has no relation to how complex the
| end user system will be.
|
| The first part of this sentence is probably true, as far
| as I see, but the complexity of a system perceived by the
| user depends primarily on the "surface" of the system.
| That surface includes the UI, the documentation and
| important concepts you have to understand for effective
| usage of the system. And in that regard, ZFS wins hands
| down against LUKS + LVM + SnapRaid + your FS of choice.
| Some questions a user of that LVM stack has to answer
| aren't even asked of a ZFS user, e.g. the question of how to
| split the space between volumes or how to change the size
| of volumes.
| j1elo wrote:
| What about if you were just starting today, with 0 knowledge
| about basically anything related to storage and how to do it
| right?
|
| That's my case, I'm learning before setting up a cheap home
| lab and a NAS, and I'm wondering if biting into ZFS is just
| the best option that I have given today's ecosystem.
| imiric wrote:
| I would still go with a collection of composable tools
| rather than something monolithic as ZFS, and to avoid the
| learning curve. But again, for personal use. If you're
| planning to use ZFS in a professional setting it might be
| good to experiment with it at home.
| j1elo wrote:
| As mentioned in the sibling comment, one thing I like is
| having systems that don't require me to supervise them, fix
| things, etc. In part that's why I've always been a user of
| ext4: it just works.
|
| But I've recently found bit rot in some of my data files,
| and now that I happen to be learning how to build a NAS, I
| want to make the jump to an FS that helps me with that.
|
| Could you mention which tools you would use to replace
| ZFS? Think of checksumming, snapshotting, and to a lesser
| degree, replication/RAID.
| throw0101a wrote:
| > _That 's my case, I'm learning before setting up a cheap
| home lab and a NAS, and I'm wondering if biting into ZFS is
| just the best option that I have given today's ecosystem._
|
| ZFS is the simplest stack that you can learn IMHO. But if
| you want to learn all the moving parts of an operating
| system for (e.g.) professional development, then the more
| complex stack may be more useful.
|
| If you want to create a mirrored pair of disks in ZFS, you
| do: _sudo zpool create mydata mirror /dev/sda /dev/sdb_
|
| In the old school fashion, you first partition with _gdisk_
| , then you use _mdadm_ to create the mirroring, then
| (optionally) LVM to create volume management, then _mkfs_.
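|
| Something like the following, after partitioning with _gdisk_
| (a sketch; device names and sizes are made up):
|
|   mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
|   pvcreate /dev/md0
|   vgcreate vg0 /dev/md0
|   lvcreate -L 500G -n data vg0
|   mkfs.ext4 /dev/vg0/data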
| cogman10 wrote:
| I dove into ZFS for my home lab as a relative novice.
|
| It's not terrible, but there are a few new concepts to come
| to grips with. Once you have them down, it's not terrible.
|
| If you don't plan on raiding, IMO, ZFS is overkill. The
| check-summing is nice, but you can get that from other
| filesystems.
|
| Maintenance is fairly straight forward. I've even done a
| disk swap without too much fuss.
|
| The biggest issue I had was that setting up RAIDz on root with
| Ubuntu was a PITA (at the time at least, March of this year).
| I ended up switching over to Debian instead. Once set up,
| things have been pretty smooth.
| j1elo wrote:
| A few things I like about it, from what I've read so far:
|
| * Checksumming
|
| * As you mention, easy maintenance
|
| * Snapshots and how useful they are for backups
|
| In the end what I value is stuff that works reliably,
| doesn't get in the way, and requires minimal supervision.
| And in the particular case of an FS, I'd like to adopt one
| that helps avoid bit rot in my data.
|
| Could you drop some names that you would consider as good
| alternatives of ZFS?
| gregmac wrote:
| For my big media volume, which had existed for around 10 years,
| I use snapraid.
|
| Because of several things:
|
| * I can mix disk sizes
|
| * I can add new disks over time as needed
|
| * If something dies, up to the entire server, I can just stick
| any data disk in another system and read it
|
| I didn't want to become a zfs expert (and the learning curve
| seems steep!), and I didn't want to spend thousands of dollars
| on new gear (dedicated NAS box and a bunch of matched-size
| disks).
|
| I repurposed my old workstation into a server, spent a few
| hours getting it set up, and it works. I've had two disks fail
| (one data, one parity) and recovered from both. Every time
| I've added a new disk, it's been 50-100% larger than my
| existing disks.
|
| I've also migrated the entire setup to a new system (newer old
| retired workstation), running proxmox, and was pleasantly
| surprised it only took about an hour to get that volume back up
| (incidentally, that server runs zfs as well.. I just don't use
| it for my large media storage volume).
| joshstrange wrote:
| UnRaid and Synology user here and I completely agree with all
| your points. The knowledge that at worst I will lose the data
| on just 1 disk (or 2 if another fails during a rebuild) is very
| calming. If not for UnRaid there is no way I could manage the
| size of the media volume I maintain (from a time, energy, and
| money perspective). I mean if you know ZFS well and trust
| yourself then more power to you but UnRaid and friends fill a
| real gap.
| atmosx wrote:
| The learning curve of ZFS compared to every alternative out
| there is significantly lower IMO. The interface is easier and
| the guides online are great.
|
| There are drawbacks, like the one discussed here, but as a
| Linux user who doesn't want to mess with the FS and uses ZFS
| for the backup server, the experience has been great so far.
___________________________________________________________________
(page generated 2021-06-18 23:00 UTC)