[HN Gopher] Raid-Z Expansion Feature for ZFS Goes Live
___________________________________________________________________
Raid-Z Expansion Feature for ZFS Goes Live
Author : chungy
Score : 67 points
Date : 2022-02-08 09:16 UTC (13 hours ago)
(HTM) web link (freebsdfoundation.org)
(TXT) w3m dump (freebsdfoundation.org)
| aniou wrote:
| I'm sorry to say it, but this article is not entirely accurate
| - the illustration "how does traditional raid 4/5/6 do it?"
| shows ONLY RAID 4. There is a big difference between RAID 4 and
| RAID 5/6, and the former was abandoned years (decades?) ago in
| favor of RAID 5 and - later - 6.
|
| Of course, it gives "better publicity" for RAID-Z, but that is
| a marketing trick rather than engineering.
|
| See https://en.wikipedia.org/wiki/Standard_RAID_levels
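|
| For reference, a toy sketch of the parity placement difference
| (my own illustration in Python, not taken from the article;
| simplified single-parity layouts):
|
|   # Toy illustration: RAID-4 keeps all parity on one dedicated disk,
|   # RAID-5 rotates the parity block from stripe to stripe.
|   def raid4_stripe(stripe_no, ndisks=4):
|       # the last disk is always the parity disk
|       return ["P" if d == ndisks - 1 else f"D{stripe_no * (ndisks - 1) + d}"
|               for d in range(ndisks)]
|
|   def raid5_stripe(stripe_no, ndisks=4):
|       # the parity position rotates with the stripe number
|       parity = (ndisks - 1 - stripe_no) % ndisks
|       row, n = [], 0
|       for d in range(ndisks):
|           if d == parity:
|               row.append("P")
|           else:
|               row.append(f"D{stripe_no * (ndisks - 1) + n}")
|               n += 1
|       return row
|
|   for s in range(3):
|       print("RAID-4:", raid4_stripe(s), " RAID-5:", raid5_stripe(s))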
| mrighele wrote:
| Note that the article talks about the way the array is
| expanded, not how the specific level works.
|
| In other words, what they are saying is that the traditional
| way to expand an array is essentially to rewrite the whole
| array from scratch. So if the old array has three stripes,
| [1,2,3,p1], [4,5,6,p2] and [7,8,9,p3] (with p1, p2 and p3 being
| the parity blocks), the new array will have stripes
| [1,2,3,4,p1'], [5,6,7,8,p2'] and [9,x,x,x,p3'], i.e. it not
| only has to move the blocks around, but also to recompute
| essentially all the parity blocks.
|
| _IF_ I understand the ZFS approach correctly, the existing
| blocks are not restructured but only reshuffled, so the new
| layout will logically still be [1,2,3,p1], [4,5,6,p2] and
| [7,8,9,p3], just distributed across five disks as [1,2,3,p1,4],
| [5,6,p2,7,8] and [9,p3,x,x,x].
|
| It seems that this means less work while expanding, but some
| space is lost unless one manually copies the old data to a new
| place.
|
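| A toy sketch of the two behaviours in Python (my own
| illustration of the layouts above, not how ZFS actually
| implements the reflow):
|
|   # Toy model: expanding a 4-wide single-parity array to 5 disks.
|   old = [["1", "2", "3", "p1"], ["4", "5", "6", "p2"], ["7", "8", "9", "p3"]]
|
|   def traditional_expand(stripes, new_width):
|       # Restripe from scratch: gather the data blocks, drop the old
|       # parity, and recompute parity for every new, wider stripe.
|       data = [b for s in stripes for b in s if not b.startswith("p")]
|       out, per = [], new_width - 1
|       for i in range(0, len(data), per):
|           chunk = data[i:i + per]
|           chunk += ["x"] * (per - len(chunk))        # padding
|           out.append(chunk + [f"p{len(out) + 1}'"])  # recomputed parity
|       return out
|
|   def raidz_expand(stripes, new_width):
|       # Reflow: keep the existing stripes (data + parity) intact and
|       # just repack the same block sequence across the wider row.
|       flat = [b for s in stripes for b in s]
|       rows = [flat[i:i + new_width] for i in range(0, len(flat), new_width)]
|       rows[-1] += ["x"] * (new_width - len(rows[-1]))
|       return rows
|
|   print(traditional_expand(old, 5))  # [1,2,3,4,p1'] [5,6,7,8,p2'] [9,x,x,x,p3']
|   print(raidz_expand(old, 5))        # [1,2,3,p1,4] [5,6,p2,7,8] [9,p3,x,x,x]
|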
| _IF_ I got it right, I am not sure who the intended audience
| for this feature is: enterprise users will probably not use
| it, and power users would probably benefit from getting all
| the space they could get from the extra disk.
| genpfault wrote:
| So is this feature FreeBSD-only? Or will it be integrated into
| OpenZFS at some point?
| chungy wrote:
| It is an experimental feature on FreeBSD 14-CURRENT. It will be
| merged into OpenZFS eventually (and maybe backported to FreeBSD
| 13-STABLE and whatever new point releases happen).
| mnd999 wrote:
| Since the Linux crowd moved in, ZFS development seems to have
| gone from stability to feature, feature, feature. I'm starting
| to get a bit concerned that this isn't going to end well. I
| really hope I'm wrong.
| nightfly wrote:
| These feature feature features are ones people have been asking
| for _years_.
| dsr_ wrote:
| This feature is being developed in FreeBSD, and will become
| part of the general ZFSonLinux set.
| p_l wrote:
| Nothing really changed in development. The ZFSonLinux team was
| actually one of the more conservative ones in terms of data
| safety; what changed is that a bunch of things that had been in
| the works for a _really_ long time happened to reach maturity
| at the same time.
|
| If you want "feature chasing", FreeBSD ZFS TRIM is the
| ur-example. I've read that code end to end... and I'll leave it
| at that.
| mrighele wrote:
| For those who like videos more than text, there is a YouTube
| video from last year [1] that explains the feature (unless it
| has changed since, but that does not seem to be the case).
|
| One downside I see with this approach, if I understand it
| correctly, is that data already present on disk will not take
| advantage of the extra disk per stripe. For example, if I have
| a raidz of 4 disks (so 25% of the space is "wasted") and add
| another disk, new data will be distributed across 5 disks (so
| 20% of the space is "wasted"), but the old data will keep using
| stripes of 4 blocks; it will just be reshuffled between the
| disks. Do I understand it correctly?
|
| [1] https://www.youtube.com/watch?v=yF2KgQGmUic
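|
| Back-of-the-envelope sketch of that overhead difference (my own
| simplification, assuming raidz1 and ignoring padding and
| allocation details):
|
|   # Rough parity overhead for a raidz1 stripe of a given width.
|   # Old data written 4-wide keeps ~25% overhead after expansion;
|   # only newly written (or rewritten) data gets the 5-wide ~20%.
|   def parity_overhead(stripe_width, nparity=1):
|       return nparity / stripe_width
|
|   print(f"4-wide stripes: {parity_overhead(4):.0%} parity overhead")  # 25%
|   print(f"5-wide stripes: {parity_overhead(5):.0%} parity overhead")  # 20%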
| vanillax wrote:
| ZFS is great until it's not. When you lose a zpool or vdev it's
| unrecoverable; it's pretty crappy when it happens. Check out
| how Linus Tech Tips lost everything.
| https://www.youtube.com/watch?v=Npu7jkJk5nM
| 7steps2much wrote:
| To be fair, they didn't really understand how ZFS works and
| failed to set up bitrot detection.
|
| A "clean" setup would include those, as well as either a
| messaging system or a regular checkup on how your FS is doing.
| Youden wrote:
| You can say this about literally any storage system.
| Unrecoverable failures can always happen, that's why you keep
| backups.
|
| ZFS redundancy features aren't there to eliminate the need for
| backups, they're there to reduce the chance of downtime.
| reincarnate0x14 wrote:
| This is great; there has been demand for this since forever.
| Enterprise-y people generally didn't care much, but homelab/SMB
| users end up dealing with it a lot more than might be naively
| imagined.
|
| Always reminds me of when NetApp used to do their arrays in
| RAID-4 because it made expansion super-fast: just add a new
| zeroed disk, and only the new disk's blocks plus the parity
| drive had to be updated on writes. It used to blow our Netware
| admin's mind, as almost nobody else ever used RAID-4 -- I had
| it as an interview question along with "what is virtual memory"
| because you'd get interesting answers :)
| uniqueuid wrote:
| This is great, but an important and little-known caveat is that
| a raidz vdev is limited to roughly the IOPS of one disk. So a
| grown raidz will at some point have lots of throughput but
| still suffer on small, random reads and writes. At that point,
| it is better to grow the pool with an additional, separate
| raidz vdev.
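|
| As a rough mental model (my own simplification; real numbers
| depend on recordsize, caching, access pattern, etc.):
|
|   # Crude model: small random-read IOPS scale with the number of
|   # top-level vdevs, not with the number of disks inside a raidz.
|   def pool_random_iops(vdev_count, per_disk_iops=150):
|       # each raidz vdev behaves roughly like one disk for small random I/O
|       return vdev_count * per_disk_iops
|
|   print(pool_random_iops(1))  # one wide raidz:           ~150 IOPS
|   print(pool_random_iops(2))  # two separate raidz vdevs: ~300 IOPS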
| Osiris wrote:
| So you set up two raidz vdevs and then stripe across them for
| increased performance?
| FullyFunctional wrote:
| I certainly wanted this. I even heckled Bill Moore about it.
| Having gone through expansion the old way (replacing each drive
| one at a time with a larger one), this looks a lot simpler.
| Unfortunately it appears not to work with simple mirrors and
| stripes (~ RAID10), so it will make no difference for me.
| (Drives are cheap but performance is not -> RAID10.)
| chungy wrote:
| Simple mirrors and stripes could always be expanded (and
| reduced, too). RAID-Z has been special.
| [deleted]
| arwineap wrote:
| This looks different from the old way.
|
| The old way (as you referenced) was to replace each disk, one
| by one, with a larger one.
|
| If I'm understanding this right, and please correct me, this
| feature will allow you to add a 5th disk to a 4-disk raidz.
|
| And if I'm right about that, then this feature wouldn't really
| make sense for RAID10 anyway.
| FullyFunctional wrote:
| I love ZFS, but this is something that just works in btrfs;
| mirror just means all blocks live in two physical locations.
| You can certainly do that even with an odd number of drives.
| ZFS, however, is more rigid and doesn't allow blocks to flow
| around like this, nor does it do dynamic defragmentation.
| deagle50 wrote:
| Would you use raid 5/6 in btrfs?
| NavinF wrote:
| The btrfs raid5/6 write hole is still around, if anyone's
| wondering. Though it was only recently that btrfs started
| warning users that it would eat their data:
| https://www.phoronix.com/scan.php?page=news_item&px=Btrfs-Wa...
| NavinF wrote:
| What are you on about? The submission is about raidz.
|
| Adding drives to a mirror has worked in zfs since
| prehistoric times. "zpool attach test_pool sda sdc" will
| mirror sda to sdc. If sda was already mirrored with sdb,
| you now have a triple-mirror with sda, sdb, and sdc.
| 2OEH8eoCRo0 wrote:
| ZFS is a fad.
| sleepycatgirl wrote:
| Nah, ZFS is a pretty comfy FS: it has lots of nice features, it
| is reasonably fast, and it is stable. And as far as I know, it
| has been in use for a fairly long time.
| ggm wrote:
| Lots of people have wanted this for ages. I managed to cope
| with replacing spindles and resizing into the new space (larger
| spindles), but being able to add more discrete devices and get
| more parity coverage and more space (I may be incorrectly
| assuming you get better redundancy as well) is great.
| rincebrain wrote:
| This trick cannot be used to turn an N-disk raidzP into an [any
| number]-disk raidzP+1, as far as I understand.
| [deleted]
___________________________________________________________________
(page generated 2022-02-08 23:00 UTC)