[HN Gopher] ZFSBootMenu - A boot loader to manage ZFS boot envir...
___________________________________________________________________
ZFSBootMenu - A boot loader to manage ZFS boot environments for
Linux
Author : nixcraft
Score : 136 points
Date : 2022-11-12 09:50 UTC (13 hours ago)
(HTM) web link (zfsbootmenu.org)
(TXT) w3m dump (zfsbootmenu.org)
| nailer wrote:
| I mean you can't maintain ZFS normally, and people have been
| trying to make zfs happen for what... two decades now?
| magicalhippo wrote:
| > people have been trying to make zfs happen for what... two
| decades now?
|
| When players like AWS provide ZFS as one of four alternatives
| in their "filesystem as a service"[1], I'd say we're beyond
| "trying to make zfs happen".
|
| Not to mention the PB worth of data others[2] rely on ZFS to
| keep safe.
|
| [1]: https://aws.amazon.com/fsx/
|
| [2]: https://openzfs.org/wiki/Companies
| phaer wrote:
| What do you mean "have been trying to make zfs happen"? ZFS is
| used in production in many places.
| knaekhoved wrote:
| When you say "normally", do you mean "badly, with fsck"?
|
| ZFS is growing incredibly quickly in popularity, and the only
| reasons it's not the dominant filesystem already are that A)
| it took Linux a long time to add support, and only dedicated
| appliance vendors had the will and ability to move to FreeBSD,
| and B) macOS was going to switch to ZFS in the late '00s, but
| they got scared off by Oracle's legal shenanigans, which seems
| to no longer be a relevant factor.
| WastingMyTime89 wrote:
| > A) it took Linux a long time to add support
|
| Linux has no support for ZFS. This is an out-of-tree patch
| set and therefore a no-go for most, including myself.
|
| ZFS intentionally has a terrible license and is owned by
| Oracle. People are free to do what they want, but I wish all
| the time wasted on it could have been put into something more
| interesting.
| erk__ wrote:
| As with most things in the Linux world, it depends on who
| builds your upstream. ZFS is in the kernel distributed by
| Ubuntu, which is one of the largest distributions.
|
| Linux is, unlike e.g. FreeBSD, not a monolithic operating
| system but only a kernel, so in my opinion it is not really
| right to say that it has no support.
| PlutoIsAPlanet wrote:
| > but only a kernel so it is in my opinion not really
| right to say that it has no support.
|
| The kernel has no native support for ZFS.
|
| Ubuntu may ship with ZFS, but that's one distro.
| Meanwhile, RHEL etc won't even touch it.
|
| On Linux, XFS dominates and will likely continue to dominate
| the server world; meanwhile, btrfs will slowly replace ext4
| on the desktop side of things. Android/embedded have always
| used their own different filesystems, so it's irrelevant
| there.
| throw0101c wrote:
| > _The kernel has no native support for ZFS._
|
| Neither does it have accelerated Nvidia card support that
| could be used for things like HPC/AI/ML. Yet I'm
| administering an entire cluster of Ubuntu machines with such
| cards just fine.
|
| We generally use the "nvidia-driver-NNN-server" package.
|
| If you want to live ideologically pure, no one is going to
| stop you, but some of us need to get work done.
| ghaff wrote:
| CDDL isn't an especially terrible license in isolation
| (it's basically Mozilla's) but it is generally considered
| incompatible with the GPL, which, depending upon which set of
| 20-year-old memories from ex-Sun employees you're inclined to
| believe, was more or less a deliberately nefarious state of
| affairs.
|
| Oracle owns most/all of the copyrights and Canonical was
| willing to take a calculated risk after, presumably, some
| back-channel discussions. But those companies with
| something to actually lose from a lawsuit with Oracle or
| organizations with strong free software principles aren't
| going anywhere close. Oracle has had a long time to change
| the license if they actually cared to.
|
| Personally, I find it unfortunate that all the effort that
| has gone into ZFS as essentially a hobbyist copy-on-write
| filesystem didn't go into btrfs instead.
| kobalsky wrote:
| > I find it unfortunate that all the effort that has gone
| into ZFS as essentially a hobbyist copy-on-write
| filesystem didn't go into btrfs instead.
|
| don't some BSDs and Linux share the same code base for
| ZFS?
|
| Last I heard, FreeBSD switched to ZFS on Linux as its
| upstream a few years ago, before it was merged into OpenZFS.
|
| IMO calling it a hobbyist fs is a bit unfair.
| yakak wrote:
| Being called hobbyist software derisively by the Linux
| community is like being knighted, I assume.
| cmeacham98 wrote:
| While the CDDL isn't a terrible license in a vacuum, it
| is (according to its author) _intentionally_ incompatible
| with the GPL (https://en.wikipedia.org/wiki/Common_Develo
| pment_and_Distrib...).
|
| This is, in my opinion, the most important part. It's not
| some unhappy accident that there are significant legal
| issues with ZFS and GPL-licenced Linux - that is
| (allegedly) by design.
| ghaff wrote:
| As that section says, there is (at least for public
| consumption) disagreement among then-Sun employees as to
| what the intent and beliefs were at the time.
|
| I know all those folks to greater or lesser degrees and
| Sun was a client of mine as an analyst. There were
| certainly a lot of conflicting motivations and concerns
| concerning Solaris and Linux.
| throw0101c wrote:
| > _ZFS intentionally has a terrible license_
|
| The folks on FreeBSD didn't / don't seem to think so.
| Neither does Apple (who pulled in DTrace, which has the
| exact same license).
|
| > _and is owned by Oracle._
|
| The OpenZFS folks don't seem to think so.
| boomboomsubban wrote:
>The OpenZFS folks don't seem to think so.
|
| The OpenZFS people are fully aware that Oracle owns ZFS,
| that's why they forked the last free copy and made
| OpenZFS. A small nitpick.
| jacob019 wrote:
| I guess the trolls are out today. Thought I was on Reddit for a
| minute.
| hnlmorg wrote:
| I don't understand why you're being snarky. ZFS has been
| hugely successful since its release, and continues to be
| successful even now. The reason why it's popular is precisely
| because it's easy to maintain.
| nailer wrote:
| Not sure why you'd consider criticism of ZFS to be snark.
| Running any filesystem outside the mainline kernel is a bunch
| of extra effort and I'm sure as a ZFS user you'd know that.
| I'm not sure how ZFS could be "hugely successful" after two
| decades and still not in the Linux kernel.
| kobalsky wrote:
| > Running any filesystem outside the mainline kernel is a
| bunch of extra effort and I'm sure as a ZFS user you'd know
| that
|
| zfs-dkms makes usage simple, since OpenZFS is backwards
| compatible down to 3.x kernels. No need to mix and match ZFS
| with kernel versions anymore.
|
| The only drawback is that you may not get to use the latest
| kernel until the OpenZFS maintainers give it the thumbs up
| (no 6.x-compatible release yet), but that's not "a bunch of
| extra effort".
|
| And that's only a problem if you want to be on the bleeding
| edge. LTS kernel users wouldn't know about it.
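|
| For illustration, the DKMS route on a Debian-family distro is
| roughly this sketch (package names are Debian's, from the
| contrib repo; other distros differ):
|
|     # zfs-dkms rebuilds the module for each installed kernel
|     apt install zfs-dkms zfsutils-linux
|     modprobe zfs
|     zpool status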
|
| > I'm not sure how ZFS could be "hugely successful" after
| two decades and still not in the Linux kernel.
|
| "There is no way I can merge any of the ZFS efforts until I
| get an official letter from Oracle that is signed by their
| main legal counsel or preferably by Larry Ellison himself
| that says that yes, it's OK to do so and treat the end
| result as GPL'd" -Linus Torvalds
|
| sounds like a legal issue more than a technical one.
| hnlmorg wrote:
| Linux users are constantly using drivers outside of the
| mainline kernel. Whether it's graphics cards, radio drivers
| (things have gotten better in that regard, but Bluetooth
| support is still terrible) or FUSE file systems.
|
| The difference with ZFS is that the code is kernel-ready,
| but there are just some licensing worries (understandable
| ones) that stop it from being mainlined.
|
| I've been running ZFS on Ubuntu Server for several years
| now and frankly ZFS is the only part of that entire system
| that doesn't suck (in my opinion). I'd switch back to
| FreeBSD in a heartbeat if I didn't need Docker support but
| credit where credit is due, Ubuntu's ZFS support has been
| really good.
|
| Edit: just to add, I've got nothing against anyone who does
| enjoy Ubuntu Server. It's just not a Linux distro I
| personally have much fondness for.
| vermaden wrote:
| I am still waiting for ANY Linux distro with an installer
| that would allow you to install Linux with root on ZFS and
| with ZFSBootMenu (or any other ZFS Boot Environments tool) ...
| szanni wrote:
| I would love to see that too, but believe this is rather
| unlikely for any major Linux distro. Why? Because there is no
| guarantee the required kernel symbols will stay available.
|
| This has, for example, happened with the linux-rt branch,
| which decided to change the license of some of the exported
| kernel symbols to GPL-only, which prevents the ZFS module
| from compiling.
|
| As far as I can tell, the kernel developers make sure to not
| break user space but no such guarantees are given for the
| kernel modules. Having the driver for your root file system
| possibly not compile on the next kernel update seems like a
| nightmare to support for any distribution.
| yjftsjthsd-h wrote:
| > I would love to see that too, but believe this is rather
| unlikely for any major Linux distro. Why? Because there is no
| guarantee the required kernel symbols will stay available.
|
| Er, Ubuntu already supports ZFS root out of the box in the
| default installer; why would ZBM be any harder to support
| than that?
| ghoward wrote:
| I'm in the process of building a NixOS-like distro. I'm 90%
| certain mine will use these.
| ninefathom wrote:
| I'm glad to see interest in this functionality taking off in
| Linux-land. I think there are one or two other projects with
| similar goals (i.e. implementing BE selection on Linux) and it
| might be time for me to do a side-by-side.
|
| The lack of this capability on Linux has long puzzled me.
| Solaris actually implemented a very early incarnation of this
| ability (called "live upgrades" at the time, after its
| original use case) back in the early '00s, in Solaris 8, and
| on top of UFS no less, if I recall correctly. It evolved over
| the next decade, first adding ZFS into the mix, then finally
| morphing from the early "live upgrade" stuff into the full
| "boot environment" concept around 2010 with Solaris 11.
| FreeBSD implemented it around 2012, in the early days of
| their ZFS work. More than a decade ago. That puts Linux at
| least ten years behind the curve here, and arguably closer to
| twenty.
|
| I'm a fan of using the right tool for the right job, and jumping
| freely between Solaris (or OpenIndiana nowadays), Linux, and
| FreeBSD for any given deployment is par for the course. Until
| now, all other things being equal, FreeBSD or Solaris would often
| win out if minimizing downtime* was a much higher priority than
| ease of replacing admins. Assuming that BE support in Linux
| matures quickly, that calculus has now swung strongly in Linux's
| favor.
|
| *Re: minimizing downtime, if somebody is puzzled as to what I
| mean, think of the last time that you had a Linux installation
| fail to come back up to full operation after a borked round of
| package upgrades. It's not often, but it does happen
| occasionally. Now imagine that the time you spent getting back up
| and working, whatever it might have been, was reliably less than
| sixty seconds. Now imagine it's 2am, you're not even fully awake
| following a panicked phone call from the operations night shift,
| and your job hangs in the balance. Makes quite a difference.
| nortonham wrote:
| what do you use OpenIndiana for?
|
| >The lack of this capability on Linux has long puzzled me.
|
| Agreed. I recently tried out OI "hipster" and the way boot
| environments are integrated into caja (the file manager) with
| Time Slider was so smooth, it got me thinking why something
| like it isn't more popular in Linux.
| ninefathom wrote:
| > what do you use OpenIndiana for?
|
| I find that it's a good fit for quite a few things, but if
| you're looking for a specific example: clustered Java
| application stacks, like ELK or Hadoop.
|
| Zones, crossbow networking, SMF, and ZFS w/ BEs all working
| seamlessly together is a fantastic combination for easy-
| button admin of low- or zero-downtime clustered applications.
| ploxiln wrote:
| (without having read the article) after-update filesystem
| rollback sounds like what SUSE has offered for about 10 years:
| https://www.suse.com/c/introduction-system-rollbacks-btrfs-a...
|
| There just hasn't been much demand for it. There are a bunch of
| other mechanisms used instead, like redundant systems and
| gradual rollouts, working with full system images (or container
| images) instead, etc.
|
| For personal-ish systems, things are reliable enough, and if
| there is a problem you can't just stop updating, you'll need to
| fix it soon anyway. I've been updating a debian install on my
| home fileserver for 3 major debian releases, 5+ years ...
| esjeon wrote:
| I tried this alongside Void Linux, but I found I don't really
| need it.
|
| TBH, this is really cool. I liked that I was able to choose
| snapshots for booting - a very good recovery option. The
| interface is well polished, and comes with fzf for quick
| searching. It's a true dream for distro hoppers, since ZFS
| works like thin-provisioned partitions (though distro options
| are limited due to ZFS). Pretty cool in and out.
|
| But it turned out to be complete overkill for me. Firstly, I
| stopped dual-booting like a decade ago; I run everything else
| in VMs. Secondly, the host system these days hardly breaks.
| Lots of things work out of the box, unlike in the old days,
| and service settings can be isolated in containers. Thirdly,
| my host environment can be recreated within an hour,
| including the download time and a few tweaks, as long as
| `/home` is backed up. So I don't worry much about the root
| partition.
|
| I wonder how this is working for others.
| E39M5S62 wrote:
| I use ZFSBootMenu to boot a single distribution on each of my
| systems. While it certainly can help with booting multiple
| different environments, the real value-add for me is that my
| entire OS is contained on a single filesystem. There's no
| longer a need to make an entirely separate boot pool to work
| around GRUB's extremely limited ZFS support.
|
| Because ZBM (can) use the kernel and ZFS userland+modules
| from your own system, it's never really behind what your OS
| is running. Additionally, since we import the pool read-only
| by default, new breaking features/pool flags in ZFS typically
| aren't a problem. It's only when you try to import a pool
| read-write that ZFS will have issues, so we detect that, warn
| you, and then prevent it from happening.
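|
| (For reference, a read-only import is just an import-time
| property; a sketch with a placeholder pool name:
|
|     zpool import -o readonly=on zroot
|
| which is why no on-disk state can change underneath the menu.)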
|
| Since it also ships as an EFI executable that can import/boot
| any pool, it's really easy to make recovery media. Just throw
| the EFI on a USB drive with an ESP and name it BOOTX64.EFI and
| most modern firmware will use it in the absence of any other
| working boot entries.
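|
| A rough sketch of building such a recovery stick (the device
| node and EFI file name are placeholders; check the device
| twice before running mkfs):
|
|     mkfs.vfat -F 32 /dev/sdX1     # format the stick as a FAT32 ESP
|     mount /dev/sdX1 /mnt
|     mkdir -p /mnt/EFI/BOOT        # default removable boot path
|     cp zfsbootmenu-release.EFI /mnt/EFI/BOOT/BOOTX64.EFI
|     umount /mnt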
| seized wrote:
| Boot environments are one of those magic features that, once
| you've used them, are hard to give up.
|
| My NAS has long been on OpenIndiana. Boot environments mean zero
| risk OS upgrades. At one point I could have gone back to a 4 year
| old OS version and booted it with no data loss.
|
| You can create one at any time, so it brings an even better
| take on VM snapshots to the physical machine world. Hacking
| on something and want a fallback? "beadm create beforehacking"
| and you're safe.
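|
| For anyone who hasn't tried it, the workflow is roughly this
| (BE names are just examples):
|
|     beadm create beforehacking    # checkpoint the current OS
|     beadm list                    # show all boot environments
|     beadm activate beforehacking  # roll back on next reboot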
| nortonham wrote:
| how do you like using OI day to day on your NAS? Any pitfalls
| or things to be aware of?
| willis936 wrote:
| I've never used a boot environment. Is there a way to use a
| boot environment to have a ZFS-backed Windows install?
| infogulch wrote:
| You may be interested in recent (~the last month)
| developments in adding Windows support to OpenZFS:
| https://github.com/openzfs/zfs/pull/14034
| [deleted]
| Teknoman117 wrote:
| I've been kinda doing a similar thing with my Gentoo installation
| on btrfs.
|
| The btrfs subvolumes are structured like this:
|
| <root subvolume>/$(hostname)/${environment}/@volume (e.g. @root,
| @home)
|
| snapshots look like this:
|
| <root subvolume>/$(hostname)/${environment}/volume_$(date -u
| +%Y-%m-%d_%H-%M-00)
|
| I have a few scripts "make-snapshots", "backup-snapshots",
| "update-shell", and "update-commit". make-snapshots creates
| readonly snapshots of my system, backup-snapshots does
| incremental backups of those to my NAS, update-shell creates a
| writable snapshot of @root as @root-update and drops you into a
| chroot environment. You can then run all the portage commands you
| want without fear of borking your current environment. Upon exit
| it checks whatever the /usr/src/linux symlink points to, copies
| the associated vmlinuz and initramfs images to the EFI partition,
| and creates/updates a boot entry in rEFInd. You can then boot
| either into your previous version or the update version. Once
| you're satisfied that your new environment works, you run
| "update-commit" which deletes the @root subvolume and replaces it
| with your current @root-update subvolume.
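|
| A minimal sketch of the make-snapshots step under the layout
| above (the mountpoint and environment name are placeholders;
| error handling omitted):
|
|     #!/bin/sh
|     # create read-only, timestamped snapshots of each volume
|     BASE="/mnt/pool/$(hostname)/default"
|     STAMP="$(date -u +%Y-%m-%d_%H-%M-00)"
|     for vol in root home; do
|         btrfs subvolume snapshot -r \
|             "$BASE/@$vol" "$BASE/${vol}_$STAMP"
|     done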
|
| A change I've been considering is to drop the concept of having a
| @root subvolume at all. Current implementation requires two
| reboots: one to get from @root to @root-update, where (if it's
| good) you delete @root and make a writable snapshot of @root-
| update as @root. The second reboot is to get onto (the new)
| @root. An alternative might be to include the date/version in the
| name of the writable snapshots as well. "committing" the update
| would just mean setting the current booted subvolume as the
| "head". Future snapshots/updates will be made from that
| subvolume. No need for a reboot because you're currently on it
| with everything mounted correctly. Any writable subvolumes older
| than "head" would be cleaned up upon booting "head".
|
| Could even go a step further and add something to my initramfs
| where if you try to boot a version where the writable subvolume
| has been deleted, it would make a temporary writable subvolume
| for it from the snapshot.
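|
| (Recreating a writable subvolume from a read-only snapshot is
| just a snapshot without -r; with placeholder names following
| the layout above:
|
|     btrfs subvolume snapshot \
|         "$BASE/root_2022-11-12_00-00-00" "$BASE/@root"
|
| so the initramfs hook would only need that one command.)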
| kkfx wrote:
| So... after a decade, GNU/Linux has something similar to
| beadm integrated with the boot process...
|
| When we talk about the sorry state of REAL tech evolution,
| this and many other features should be counted...
| dazzawazza wrote:
| Always good to see Linux being inspired by FreeBSD.
| 1letterunixname wrote:
| hnlmorg wrote:
| Choice is good if it offers something different, which this
| does because it is more than just a boot menu with ZFS support.
|
| Anyway, since when has Linux been averse to choice? Multiple
| different init daemons, window managers, desktop environments,
| cron daemons, MTAs, scripting languages, shells, etc. Even the
| way you set up a networking interface can differ wildly.
| yyyk wrote:
| GRUB(any number) is horrible and should be entirely replaced.
| fjdiccf wrote:
| Pretty much every single person using Linux or writing open
| source is doing it specifically because the choices they had
| were not adequate.
|
| Don't like choice? Go back to Mac, spare us your hot takes.
| sirn wrote:
| The problem is, ZFS support in GRUB2 hasn't been great,
| partly due to CDDL/GPL licensing incompatibility requiring
| lots of ZFS internals to be re-implemented in GRUB. This
| resulted in issues such as grub-probe being unable to detect
| ZFS pools due to unsupported ZFS features[1] (including
| native ZFS encryption, which is a deal-breaker for many).
|
| ZBM took another approach. It provides a small initramfs
| image that is built on the host machine via standard methods
| such as dracut or mkinitcpio. This image provides an
| interface for decrypting/mounting ZFS filesystems using the
| very same ZFS kernel module and tools installed on the host.
| After the filesystem is mounted, it then kexecs into the
| host's kernel.
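|
| Conceptually, the kexec handoff looks something like this
| (paths and the dataset name are illustrative, not ZBM's exact
| invocation):
|
|     # stage the kernel/initramfs found on the chosen BE
|     kexec -l /be/vmlinuz --initrd=/be/initramfs.img \
|         --command-line="root=zfs:zroot/ROOT/default"
|     kexec -e    # jump into it, replacing the running kernel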
|
| This also means ZBM doesn't completely replace GRUB2 or
| syslinux. Instead, it relies on an intermediate bootloader
| (including EFI bootloaders such as rEFInd/gummiboot) to load
| ZBM itself. (Though ZBM itself only has built-in hooks for
| syslinux and gummiboot.)
|
| Being an initramfs gives it the extra benefit of providing
| interesting mechanisms during boot, e.g. an SSH server for
| entering an encryption key on a headless server[2], the
| ability to discover, manage, and boot from ZFS snapshots, etc.
|
| (No affiliation; just a very happy user.)
|
| [1]: https://savannah.gnu.org/bugs/?58555
|
| [2]: https://github.com/zbm-dev/zfsbootmenu/wiki/Remote-Access-
| to...
| matja wrote:
| Wouldn't kexec with a mounted filesystem lose the filesystem
| state because the kernel heap+stack is overwritten? I think
| ZBM copies the kernel/initramfs from the ZFS dataset
| (presumably to tmpfs), unmounts/exports, then kexec's, and
| the new initramfs imports/mounts the pool/dataset as usual?
| sirn wrote:
| My understanding is that during the boot process using
| ZBM's initramfs:
|
| 1. ZBM prompts for the encryption passphrase, decrypts the
| filesystem, locates the kernel/initramfs on the ZFS datasets,
| then displays the boot menu
|
| 2. ZBM kexecs into the chosen kernel/initramfs on the
| filesystem while appending root=zfs:... to the kernel
| parameters
|
| 3. The target kernel decrypts the filesystem[^], mounts the
| root ZFS again, and boots into the final system
|
| [^]: In this case, ZBM requires the encryption key to be
| placed in the target initramfs (not ZBM's) for the target
| kernel to load (the dataset needs to be decrypted again,
| since kernel state is disregarded). This initramfs is located
| inside the encrypted filesystem itself, only accessible after
| the initial decryption/mount by ZBM in step 1, so the only
| way to obtain this key is to already have access to the
| encrypted filesystem in the first place.
| E39M5S62 wrote:
| That's exactly right. We also append spl.spl_hostid to
| the command line, to work around any possible hostid
| mismatches inside the boot environment.
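|
| So the effective command line handed to the BE's kernel might
| look roughly like this (dataset and hostid values made up):
|
|     root=zfs:zroot/ROOT/default spl.spl_hostid=0x00bab10c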
| sirn wrote:
| Thank you for such a great tool. I've recently migrated from
| one server to another server on a different continent via
| `zfs send | zfs recv` (using hrmpf), and `generate-zbm`
| inside the chroot was all I needed to get it working again.
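|
| For context, that kind of migration is roughly (pool and
| snapshot names are placeholders):
|
|     zfs snapshot -r zroot@migrate
|     # -R sends the whole dataset tree; -F forces the rollback
|     zfs send -R zroot@migrate | ssh newhost zfs recv -F zroot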
| E39M5S62 wrote:
| Since the pool itself is imported read-only by default,
| there's no state to keep. We don't even need to export the
| pool; no txgs can be generated and left in a pending state.
|
| If a pool is switched to read-write so that the default
| kernel can be set, a snapshot cloned to a new BE, etc., we
| check for that and then export the pool just before kexec.
|
| Once kexec is done, your BE's kernel and initramfs
| essentially start fresh and act as if it's a fresh boot.
___________________________________________________________________
(page generated 2022-11-12 23:00 UTC)