https://github.com/openzfs/zfs/pull/15022
openzfs / zfs
raidz expansion feature #15022 (Open)
@don-brady wants to merge 14 commits into openzfs:master from don-brady:raidz-expansion (+5,385 -581, 59 files changed)

@don-brady commented Jun 29, 2023 (edited)

Motivation and Context

This feature allows disks to be added one at a time to a RAID-Z group, expanding its capacity incrementally. This feature is especially useful for small pools (typically with only one RAID-Z group), where there isn't sufficient hardware to add capacity by adding a whole new RAID-Z group (typically doubling the number of disks).

For additional context as well as a design overview, see Matt Ahrens' talk at the 2021 FreeBSD Developer Summit (video) (slides), and a news article from Ars Technica.

Description

Initiating expansion

A new device (disk) can be attached to an existing RAIDZ vdev by running `zpool attach POOL raidzP-N NEW_DEVICE`, e.g. `zpool attach tank raidz2-0 sda`. The new device will become part of the RAIDZ group. A raidz expansion will be initiated, and the new device will contribute additional space to the RAIDZ group once the expansion completes.

The `feature@raidz_expansion` on-disk feature flag must be `enabled` to initiate an expansion, and it remains `active` for the life of the pool. In other words, pools with expanded RAIDZ vdevs cannot be imported by older releases of the ZFS software.

During expansion

The expansion entails reading all allocated space from the existing disks in the RAIDZ group and rewriting it across the new, larger set of disks in the RAIDZ group (including the newly added device).

The expansion progress can be monitored with `zpool status`.

Data redundancy is maintained during (and after) the expansion. If a disk fails while the expansion is in progress, the expansion pauses until the health of the RAIDZ vdev is restored (e.g. by replacing the failed disk and waiting for reconstruction to complete).

The pool remains accessible during expansion. Following a reboot or export/import, the expansion resumes where it left off.

After expansion

When the expansion completes, the additional space is available for use and is reflected in the `available` zfs property (as seen in `zfs list`, `df`, etc.).

Expansion does not change the number of failures that can be tolerated without data loss (e.g. a RAIDZ2 is still a RAIDZ2 even after expansion).

A RAIDZ vdev can be expanded multiple times.

After the expansion completes, old blocks remain with their old data-to-parity ratio (e.g. a 5-wide RAIDZ2 has 3 data to 2 parity), but are distributed among the larger set of disks. New blocks will be written with the new data-to-parity ratio (e.g. a 5-wide RAIDZ2 which has been expanded once to 6-wide has 4 data to 2 parity). However, the RAIDZ vdev's "assumed parity ratio" does not change, so slightly less space than expected may be reported for newly written blocks by `zfs list`, `df`, `ls -s`, and similar tools.
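The data-to-parity arithmetic above can be made concrete with a small model. The sketch below is a hedged illustration, not OpenZFS code: the helper names `data_fraction` and `reported_data_sectors` are hypothetical, and the model ignores RAID-Z padding/skip sectors and the exact way ZFS derives a vdev's space-accounting ratio. It only shows why a fixed "assumed parity ratio" of 3/5 under-reports a newly written 4+2 block on a 5-wide RAIDZ2 that has been expanded to 6-wide.

```python
from fractions import Fraction


def data_fraction(width, parity):
    """Fraction of a full RAID-Z row that holds data (padding and rounding ignored)."""
    if not 0 < parity < width:
        raise ValueError("parity must be between 1 and width - 1")
    return Fraction(width - parity, width)


def reported_data_sectors(allocated, assumed_ratio):
    """Simplified model: tools scale allocated sectors by the vdev's fixed assumed ratio."""
    return allocated * assumed_ratio


# Hypothetical example: a 5-wide RAIDZ2 expanded once to 6-wide.
assumed = data_fraction(5, 2)    # 3/5, fixed when the vdev was created
new_ratio = data_fraction(6, 2)  # 2/3, i.e. 4 data : 2 parity for new blocks

# Old block: one full 5-wide row (3 data + 2 parity sectors).
old_reported = reported_data_sectors(5, assumed)  # 3 sectors, matching its 3 data sectors
# New block: one full 6-wide row (4 data + 2 parity sectors).
new_reported = reported_data_sectors(6, assumed)  # 18/5 = 3.6 sectors for 4 data sectors

print(f"assumed={assumed} new={new_ratio} old_block={old_reported} new_block={new_reported}")
# prints: assumed=3/5 new=2/3 old_block=3 new_block=18/5
```

Under these assumptions, an old 3+2 block is reported exactly, while a new 4+2 block allocates 6 sectors but is reported as 6 × 3/5 = 3.6 sectors, about 10% less than the 4 data sectors it actually holds. Redundancy is unchanged in both cases: each row still carries 2 parity sectors.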
Manpage changes

zpool-attach.8:

NAME
     zpool-attach -- attach new device to existing ZFS vdev

SYNOPSIS
     zpool attach [-fsw] [-o property=value] pool device new_device

DESCRIPTION
     Attaches new_device to the existing device. The behavior differs depending on whether the existing device is a RAIDZ device or a mirror/plain device.

     If the existing device is a mirror or plain device ...

     If the existing device is a RAIDZ device (e.g. specified as "raidz2-0"), the new device will become part of that RAIDZ group. A "raidz expansion" will be initiated, and the new device will contribute additional space to the RAIDZ group once the expansion completes.

     The expansion entails reading all allocated space from the existing disks in the RAIDZ group and rewriting it across the new, larger set of disks in the RAIDZ group (including the newly added device). Its progress can be monitored with zpool status.

     Data redundancy is maintained during and after the expansion. If a disk fails while the expansion is in progress, the expansion pauses until the health of the RAIDZ vdev is restored (e.g. by replacing the failed disk and waiting for reconstruction to complete).

     Expansion does not change the number of failures that can be tolerated without data loss (e.g. a RAIDZ2 is still a RAIDZ2 even after expansion). A RAIDZ vdev can be expanded multiple times.

     After the expansion completes, old blocks remain with their old data-to-parity ratio (e.g. a 5-wide RAIDZ2 has 3 data to 2 parity), but are distributed among the larger set of disks. New blocks will be written with the new data-to-parity ratio (e.g. a 5-wide RAIDZ2 which has been expanded once to 6-wide has 4 data to 2 parity). However, the RAIDZ vdev's "assumed parity ratio" does not change, so slightly less space than expected may be reported for newly written blocks by zfs list, df, ls -s, and similar tools.
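The manpage's statement that old blocks keep their data-to-parity ratio but are "distributed among the larger set of disks" can be pictured with a toy model. The sketch below assumes, purely for illustration, that the vdev's allocated space is a flat sequence of sectors laid out row by row across the disks, so reflow moves logical sector `s` from `divmod(s, old_width)` to `divmod(s, old_width + 1)`. The function names are hypothetical, and this is neither the on-disk format nor the algorithm in `vdev_raidz.c`. It also checks one property of this model: copying sectors in increasing order never overwrites a sector that has not been copied yet.

```python
def location(sector, width):
    """(row, disk) of a logical sector when sectors are laid out row by row."""
    return divmod(sector, width)


def reflow_plan(num_sectors, old_width):
    """List of (sector, old (row, disk), new (row, disk)) after adding one disk."""
    new_width = old_width + 1
    return [(s, location(s, old_width), location(s, new_width))
            for s in range(num_sectors)]


def copy_order_is_safe(num_sectors, old_width):
    """True if copying in increasing sector order never clobbers an uncopied sector."""
    new_width = old_width + 1
    # Map each physical slot of the widened grid (row * new_width + disk)
    # to the sector that occupies it in the old layout.
    occupant = {}
    for s in range(num_sectors):
        row, disk = location(s, old_width)
        occupant[row * new_width + disk] = s
    for s in range(num_sectors):
        row, disk = location(s, new_width)
        victim = occupant.get(row * new_width + disk)
        # Writing sector s here is only safe if the current occupant was already copied.
        if victim is not None and victim > s:
            return False
    return True


for s, old, new in reflow_plan(8, 5):
    print(f"sector {s}: disk {old[1]} row {old[0]} -> disk {new[1]} row {new[0]}")
print(copy_order_is_safe(10_000, 5))  # True for this model
```

In this model a sector's old slot lies `s // old_width` slots past its new one, so sectors in the first row do not move at all and the distance grows only slowly. Early writes therefore land right next to data that was only just copied; that is one intuition for why the start of a reflow needs special handling, and the big theory statement excerpt quoted in the review thread further down mentions an initial scratch-based reflow. This sketch does not model that phase.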
Status

Matt Ahrens' original pull request (#12225) has been rebased here onto the current master branch and updated to incorporate recent code cleanups in the OpenZFS codebase.

This feature is believed to be complete. However, like all PRs, it is subject to change as part of the code review process. Since this PR includes on-disk changes, it shouldn't be used on production systems before it is integrated into the OpenZFS codebase.

Tasks that still need to be done before integration:

* [*] Additional code cleanup in ztest code
* [*] zloop changes to drive coverage of this feature
* [ ] Address test failures in ztest runs
* [*] Document the high-level design in a "big theory statement" comment
* [*] Remove verbose logging
* [*] Detection of MBR partitions using reserved boot area (FreeBSD BTX boot loader)
* [ ] Address any performance concerns

Acknowledgments

Thank you to the FreeBSD Foundation for commissioning this work in 2017 and continuing to sponsor it well past the original time estimates!

Thank you to iXsystems for sponsoring the final push to land this feature into OpenZFS.

Thanks also to contributors @FedorUporovVstack, @stuartmaybee, @thorsteneb, and @Fmstrat for portions of the implementation.

Sponsored-by: The FreeBSD Foundation
Sponsored-by: iXsystems, Inc.
Sponsored-by: vStack
Contributions-by: Stuart Maybee <stuart.maybee@comcast.net>
Contributions-by: Fedor Uporov <fuporov.vstack@gmail.com>
Contributions-by: Thorsten Behrens <tbehrens@outlook.com>
Contributions-by: Fmstrat <nospam@nowsci.com>
Contributions-by: Don Brady <dev.fs.zfs@gmail.com>

How Has This Been Tested?

Tests added to the ZFS Test Suite (functional/raidz) and ztest, in addition to manual testing.
Types of changes

* [ ] Bug fix (non-breaking change which fixes an issue)
* [*] New feature (non-breaking change which adds functionality)
* [ ] Performance enhancement (non-breaking change which improves efficiency)
* [ ] Code cleanup (non-breaking change which makes code smaller or more readable)
* [ ] Breaking change (fix or feature that would cause existing functionality to change)
* [ ] Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
* [ ] Documentation (a change to man pages or other documentation)

Checklist:

* [*] I have updated the documentation accordingly.
* [*] I have read the contributing document.
* [*] I have added tests to cover my changes.
* [ ] I have run the ZFS Test Suite with this change applied.
* [ ] All commit messages are properly formatted and contain Signed-off-by.

Pull Request Comments

Please limit comments here to code review/feedback and testing questions/results. For generic discussions about RAID-Z, or discussions on future enhancements to RAIDZ expansion, please use the OpenZFS discussions.

@don-brady added the labels Type: Feature (Feature request or new feature) and Status: Code Review Needed (Ready for review and testing) on Jun 29, 2023
@don-brady assigned mmaybee on Jun 29, 2023
@don-brady requested review from behlendorf, ahrens and mmaybee on June 29, 2023 15:21
@ahrens mentioned this pull request on Jun 29, 2023: RAIDZ Expansion feature #12225 (Closed)

@Evernow commented Jun 29, 2023:
Thank you for your and @ahrens' work! Hopefully this gets merged soon!

@emaste commented Jun 29, 2023:
A small comment, one of the trailer lines is inconsistent (space vs dash):
Sponsored-by: The FreeBSD Foundation
Sponsored by: iXsystems, Inc.

@mmayer commented Jul 4, 2023:
Thank you so much for continuing this work. And thank you to iXsystems for sponsoring it.

88efba2 raidz expansion feature (@ahrens, @don-brady)
Sponsored-by: The FreeBSD Foundation
Sponsored-by: iXsystems, Inc.
Authored-by: Matthew Ahrens
Contributions-by: Fedor Uporov
Contributions-by: Stuart Maybee
Contributions-by: Thorsten Behrens
Contributions-by: Fmstrat
Contributions-by: Don Brady
Signed-off-by: Don Brady

@don-brady force-pushed the raidz-expansion branch from 546bd92 to 88efba2 on July 6, 2023 18:40

@don-brady added 2 commits July 6, 2023 14:00:

b2e296c zloop and functional/raidz changes
Add support to zloop to drive testing raidz expansion. In the ZTS test, make sure to reset RAIDZ_EXPAND_MAX_OFFSET_PAUSE.
Signed-off-by: Don Brady

9140f44 Major cleanup for ztest raidz expansion test
- The test no longer takes a guessed value. It stops at 25, 50 and 75 percent of the reflow amount.
- Simplified the way we fill up the pool in ztest_rzx_thread.
- Refactored ztest_raidz_expand_run and ztest_run to remove duplicate code.
Signed-off-by: Don Brady

@EvanCarroll commented Jul 17, 2023:
We are all cheering for you. May god bless all of those who contribute to this patch and may their children live long and healthy forever, without death. May this patch swiftly be tested and committed so the heavens may finally open and rain bliss down upon us. Amen.

@don-brady and others added 9 commits July 18, 2023 10:36:

dc8975f Check raidz_expansion feature before attaching
Signed-off-by: Don Brady
Requires-builders: arch,style,centos7,centos8,fedora38,freebsd13,coverage

24f1812 Add test for disk attach with feature disabled
Also add ztest -X to zloop mix
Signed-off-by: Don Brady

dc45090 Remove some chatty diagnostic zfs_dbgmsg logging
Signed-off-by: Don Brady

fcaf638 Address a ztest race between vdev offline and vdev attach
Signed-off-by: Don Brady

df9565e Address check abi failures
Signed-off-by: Don Brady

d4bab2c Fix ENOSPC failure in ztest expansion test
Signed-off-by: Don Brady

1cd666b ztest: Fix scratch object verification (@fuporovvStack, @don-brady)
Make scratch object verification logic more robust to decrease the number of verification assertions triggering. Remove reflow pause from verification logic and add additional scratch object states. Implement verification based on these new scratch states.
Signed-off-by: Don Brady

355b473 Add error handling to raidz_reflow_scratch_sync()
Also remove more diagnostic logging
Signed-off-by: Don Brady

f91d1c2 Cleanup raidz expand pause variable usage
Signed-off-by: Don Brady

@shivabohemian commented Aug 9, 2023:
Thank you so much. BTW, When will we merge this pr?
@behlendorf @ahrens @mmaybee

@abjugard commented Aug 10, 2023:
> BTW, When will we merge this pr?
When it's good and ready. You can't rush art.

@EvanCarroll commented Aug 10, 2023:
When this patch is finally done, the pope himself will consent to painting it over that other crap in the Sistine Chapel.

@shivabohemian commented Aug 11, 2023:
Got it. I don't mean to rush and just want to know if it's on the plan~

@KaeTuuN commented Aug 11, 2023:
This PR is followed by many people, so please: STOP FLOATING IT WITH USELESS COMMENTS! That would be really awesome!
If you have a question to the Code or want to help, fine. In every other case: Do not post!
Sorry for the harsh wording, but it's really annoying me...

172c1a8 Add a "big theory statement" comment to code (@don-brady)
Signed-off-by: Don Brady

@D0han reviewed Aug 14, 2023, module/zfs/vdev_raidz.c (outdated):
     * context. The design also allows for fast discovery of what data to copy.
     *
     * The VDEV metaslabs are processed, one at a time, to copy the block data to
     * have it flow across all the disks. The metasab is disabled for allocations

  @D0han (Aug 14, 2023): Typo in metasab
  @don-brady (Aug 15, 2023): Thanks. Fixed.

@D0han reviewed Aug 14, 2023, module/zfs/vdev_raidz.c (outdated):
     *
     * == Reflow Progress Updates ==
     * After the initial scratch-based reflow, the expansion process works
     * similarly to device removal. We create a new open context thread whichi

  @D0han (Aug 14, 2023): Typo in whichi
  @don-brady (Aug 15, 2023): Thanks. Fixed.

05fbc6e Detect BTX boot loader in the zfs reserved boot section (@don-brady)
Signed-off-by: Don Brady

Reviewers: @D0han left review comments; awaiting requested reviews from @behlendorf, @ahrens and @mmaybee. At least 1 approving review is required to merge this pull request.
Assignees: @mmaybee
Labels: Status: Code Review Needed (Ready for review and testing), Type: Feature (Feature request or new feature)
Projects: None yet
Milestone: No milestone
12 participants: @don-brady @Evernow @emaste @mmayer @EvanCarroll @shivabohemian @abjugard @KaeTuuN @D0han @mmaybee @ahrens @fuporovvStack