[HN Gopher] Implement unprivileged chroot
___________________________________________________________________
Implement unprivileged chroot
Author : 0mp
Score : 209 points
Date : 2021-07-22 10:14 UTC (12 hours ago)
(HTM) web link (cgit.freebsd.org)
(TXT) w3m dump (cgit.freebsd.org)
| EdSchouten wrote:
| FreeBSD already supported something like this effectively, but in
| what is, in my opinion, a better way.
|
| You can call cap_enter(), which disables open(), unlink(),
| mkdir(), etc. entirely. You can, however, still use openat(),
| unlinkat(), mkdirat() with relative paths that expand to a
| location underneath a directory file descriptor. This achieves
| the same thing, except that you can now have as many chroots as
| you want. Not just one.
|
| Unfortunately, the idea never caught on, because virtually no
| software on UNIX uses the *at() functions. Also: the non-*at()
| functions are still available as symbols, meaning that you can't
| perform simple compile-time checks to ensure that your application
| works properly when this form of sandboxing is enabled. Turns out
| that off-the-shelf software (e.g., libraries) ends up misbehaving
| in unpredictable ways if you disable ~50% of the POSIX API.
|
| It's a shame, because this feature effectively requires you to
| treat the file system in an object oriented/dependency injected
| way. Pretty good from a reusability/testability perspective.
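|
| A minimal sketch of that descriptor-relative style, written in
| Python for brevity and assuming a POSIX platform where os.open(),
| os.mkdir() and os.unlink() accept dir_fd (they then map to
| openat(), mkdirat() and unlinkat()). The helper and file names are
| hypothetical, and nothing here calls cap_enter(); it only shows
| code that holds several "roots" at once and never needs an
| absolute path after the directories are opened.

```python
import os
import tempfile


def write_note(root_fd, name, text):
    """Create or overwrite `name` relative to the directory descriptor."""
    fd = os.open(name, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600,
                 dir_fd=root_fd)
    with os.fdopen(fd, "w") as f:
        f.write(text)


def read_note(root_fd, name):
    """Read `name` relative to the directory descriptor."""
    fd = os.open(name, os.O_RDONLY, dir_fd=root_fd)
    with os.fdopen(fd) as f:
        return f.read()


def two_roots_demo():
    # Two independent "roots", each represented only by a descriptor.
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        fd_a = os.open(a, os.O_RDONLY | os.O_DIRECTORY)
        fd_b = os.open(b, os.O_RDONLY | os.O_DIRECTORY)
        try:
            os.mkdir("cache", dir_fd=fd_a)
            write_note(fd_a, "cache/note.txt", "from root a")
            write_note(fd_b, "note.txt", "from root b")
            results = [read_note(fd_a, "cache/note.txt"),
                       read_note(fd_b, "note.txt")]
            # Clean up through the same descriptors.
            os.unlink("cache/note.txt", dir_fd=fd_a)
            os.rmdir("cache", dir_fd=fd_a)
            os.unlink("note.txt", dir_fd=fd_b)
            return results
        finally:
            os.close(fd_a)
            os.close(fd_b)
```

| Passing the descriptor in as an argument is the "dependency
| injected" part: a test can hand the same functions a temporary
| directory instead of a real one.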
| toast0 wrote:
| Capabilities mode is useful, but it's very difficult to apply
| to programs that don't fit the model.
|
| If you need to make network connections, you have to do that
| before entering capabilities mode, because there is no
| capability to allow it later. You can work through a proxy
| program, but adding that complexity doesn't seem worthwhile to
| me unless your program to be sandboxed is very complex.
|
| I haven't worked with OpenBSD's pledge, but the idea of being
| able to end use of specific dangerous things seems more widely
| applicable.
| jerf wrote:
| One of my minor disappointments with Go, considering the time
| it came out and the UNIX heritage that it descended from, was
| that it didn't prioritize the *at() functions. It's difficult,
| if not virtually impossible, to write secure code with the
| "traditional" path-based system because every time you do one
| thing, then some other thing to a path that has some sort of
| security implication, you've written a TOCTOU problem if
| somebody can wedge between those two things to change some
| critical aspect of the file.
|
| It's hard for me to blame programmers for not using these
| functions more when hardly any language properly exposes them.
| But since nobody exposes them, nobody's aware they should use
| them.... chicken & egg strike again.
| donio wrote:
| os.Open has been using openat since 2015.
|
| https://github.com/golang/go/commit/e7a7352e527ca275a2b66cc3.
| ..
| jerf wrote:
| I was unclear. See my other cousin reply; you can't use it
| yourself to have a directory handle and securely open files
| in that directory. You can only open things by path.
| tines wrote:
| But openat, for example, is still path-based; it just changes
| the directory that the path is relative to. If you give it an
| absolute path, it will open it, and I didn't see any reason
| in the man page why you couldn't just pass in a bunch of
| ../../ as the usual exploits do. Maybe you're referring to
| another category of bugs?
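|
| The behaviour described here can be checked with a short sketch,
| assuming a POSIX system and a process that is not in capability
| mode. Python's os.open(..., dir_fd=...) is used as a stand-in for
| openat(); the directory and file names are made up for the
| example.

```python
import os
import tempfile


def plain_openat_escapes():
    with tempfile.TemporaryDirectory() as tmp:
        jail = os.path.join(tmp, "jail")
        os.mkdir(jail)
        outside = os.path.join(tmp, "outside.txt")
        with open(outside, "w") as f:
            f.write("outside the directory")

        dir_fd = os.open(jail, os.O_RDONLY | os.O_DIRECTORY)
        try:
            # "../" walks out of the directory the descriptor refers to.
            fd = os.open("../outside.txt", os.O_RDONLY, dir_fd=dir_fd)
            with os.fdopen(fd) as f:
                via_dotdot = f.read()
            # An absolute path makes the descriptor irrelevant.
            fd = os.open(outside, os.O_RDONLY, dir_fd=dir_fd)
            with os.fdopen(fd) as f:
                via_absolute = f.read()
        finally:
            os.close(dir_fd)
        return via_dotdot, via_absolute
```

| Both opens succeed, so plain openat() is not a confinement
| mechanism by itself. The capability mode described at the top of
| the thread is what restricts lookups to locations underneath the
| descriptor.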
| karatinversion wrote:
| He was - TOCTOU has its own wiki page [1]. These can be
| nastier, because they don't require the attacker to be able
| to submit strings or file names.
|
| [1] https://en.wikipedia.org/wiki/Time-of-check_to_time-of-use
| tines wrote:
| I guess I'm not sure how you would use open() that would
| expose a TOCTOU bug that openat() wouldn't. Can you give
| an example?
| pmahoney wrote:
| For what it's worth, Linux 5.6 introduced openat2 [1] which
| accepts some additional flags controlling path resolution.
|
| For example, RESOLVE_IN_ROOT "is as though the calling
| process had used chroot(2) to (temporarily) modify its root
| directory (to the directory referred to by dirfd)".
|
| [1] https://man7.org/linux/man-pages/man2/openat2.2.html
| jerf wrote:
| Sorry, I was unclear. Too much context in my head from the
| times I've jousted with this and I forgot to contextualize
| properly. (Which is ironic since part of my complaint is
| precisely that too few people know this stuff.) That family
| of functions allows you to open things based on handles
| more easily. So you can open a directory, and while holding
| on to the handle for that directory, know that you are
| still in that directory, even potentially open files in
| that directory and then, once you do that, know that you
| have a file in that directory (or, atomically, don't).
|
| It's the difference between
|
|     dirHandle = open("some path");
|     fileInDir = openat(dirHandle, "some file");
|
| versus
|
|     dir = open("some path")
|     // examine the directory, then
|     fileInDir = open("some path/some file");
|
| In the second case, between those two lines, you can have
| something else jump in and modify or remove or repermission
| or whatever the "some file". It has never been the largest
| security issue, but it's been a running undercurrent of
| security issues for decades.
|
| In the first case, you have atomically-safe operations; you
| either get the directory or don't, then either get the file
| handle or don't, etc, and once you have the handle nobody
| else can take it from you, even if they rename the file
| under you, etc. It means that if you are writing logic like
| "if the file is setuid, do this", there's no way for an
| external process to wedge in between the two things.
|
| In other words, you ought to be able to not just read from
| a file handle, but also open relative to the handle
| directly, and do all those other things. Any API that
| operates in terms of paths is pretty much intrinsically
| open to TOCTOU, because any time you "check" a path vs.
| "use" the path, which is fairly common, you have a window
| of opportunity for lossage. I'm not sure I've yet seen a
| non-C way of doing this built into a standard library.
|
| Also... before you jump in with some "what ifs", no, these
| functions don't magically make your code more secure. You
| still have to use them correctly and it's still pretty easy
| to mistakenly let path-based logic slip in accidentally
| even so. It doesn't make insecure code secure; it makes
| guaranteed insecure (in security-sensitive contexts,
| obviously a lot of time this isn't a security issue) code
| _possible_ to write securely.
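|
| A small sketch of the "nobody else can take it from you" point,
| assuming a POSIX system and using Python's dir_fd support as a
| stand-in for openat(). The rename in the middle plays the role of
| another process interfering; all names are hypothetical.

```python
import os
import tempfile


def handle_survives_rename():
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "some path")
        os.mkdir(original)
        with open(os.path.join(original, "some file"), "w") as f:
            f.write("expected contents")

        dir_fd = os.open(original, os.O_RDONLY | os.O_DIRECTORY)
        try:
            # Simulate another process swapping the directory out from under us.
            os.rename(original, os.path.join(tmp, "moved away"))
            os.mkdir(original)
            with open(os.path.join(original, "some file"), "w") as f:
                f.write("attacker contents")

            # A path-based open now resolves to the replacement directory...
            with open(os.path.join(original, "some file")) as f:
                by_path = f.read()
            # ...while the handle still refers to the directory we opened.
            fd = os.open("some file", os.O_RDONLY, dir_fd=dir_fd)
            with os.fdopen(fd) as f:
                by_handle = f.read()
        finally:
            os.close(dir_fd)
        return by_path, by_handle
```

| The path-based read returns the replacement file, while the
| handle-based read returns the file from the directory that was
| originally opened.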
| tines wrote:
| Makes sense, but I think that you only gain safety when
| you are checking attributes of the directories leading to
| the file, but not when you are checking the file itself.
| For example, you said
|
| > In the second case, between those two lines, you can
| have something else jump in and modify or remove or
| repermission or whatever the "some file".
|
| Modifying/removing/repermissioning "some file" is still
| possible even with openat() if you do it between the time
| you open("some path") and openat("some file"). There is
| still a race condition there in either case if you are
| examining the contents of the directory (e.g. "stat"ing
| the file and then calling openat). You can also
| modify/repermission "some path" as well. The only thing
| openat() protects you from is removing/replacing "some
| path" (not "some file") and I agree that that is valuable
| for security purposes.
| jerf wrote:
| 'Modifying/removing/repermissioning "some file" is still
| possible even with openat() if you do it between the time
| you open("some path") and openat("some file").'
|
| This is part of what I was trying to head off with my
| parenthetical. You still have to use it correctly to do
| secure things. But at least it's _possible_. This kind of
| security is basically impossible with pure path-based
| APIs. Plus, as mentioned elsewhere, there are some
| additional flags you can use for even more security that
| you can't get out of an API that is "open(filename)",
| simply because that API is mathematically incapable of
| carrying such flags (assuming you don't start trying to
| encode them in the filename itself, but that way lies
| madness).
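|
| One way to "use it correctly", sketched in Python under the
| assumption of a POSIX system: open first, then inspect the opened
| descriptor with fstat() instead of stat()ing the name beforehand.
| The helper name open_if_not_setuid and the policy it enforces are
| hypothetical, chosen to match the setuid example above.

```python
import os
import stat
import tempfile


def open_if_not_setuid(dir_fd, name):
    """Open `name` under dir_fd, then check the file that was actually opened."""
    # O_NOFOLLOW refuses a symlink in the final path component.
    fd = os.open(name, os.O_RDONLY | os.O_NOFOLLOW, dir_fd=dir_fd)
    try:
        # fstat() describes the file we hold, not whatever the name points to now.
        st = os.fstat(fd)
        if not stat.S_ISREG(st.st_mode) or st.st_mode & stat.S_ISUID:
            raise PermissionError(f"refusing to use {name!r}")
    except BaseException:
        os.close(fd)
        raise
    return fd


def check_after_open_demo():
    with tempfile.TemporaryDirectory() as tmp:
        for name, mode in (("plain", 0o644), ("suid", 0o4755)):
            path = os.path.join(tmp, name)
            with open(path, "w") as f:
                f.write(name)
            os.chmod(path, mode)
        dir_fd = os.open(tmp, os.O_RDONLY | os.O_DIRECTORY)
        try:
            fd = open_if_not_setuid(dir_fd, "plain")
            with os.fdopen(fd) as f:
                accepted = f.read()
            try:
                os.close(open_if_not_setuid(dir_fd, "suid"))
                rejected = False
            except PermissionError:
                rejected = True
        finally:
            os.close(dir_fd)
        return accepted, rejected
```

| Because the check runs on the descriptor, renaming or replacing
| the name between the open and the check cannot change which file
| is examined.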
| donio wrote:
| It's doable when you need it, something like:
|
|     filefd, err := syscall.Openat(int(dir.Fd()), filename, os.O_RDONLY, 0)
|     if err != nil {
|         return err
|     }
|     // Wrap the raw descriptor for use with library functions.
|     file := os.NewFile(uintptr(filefd), filename)
| catlifeonmars wrote:
| I'm confused. How would using *at() APIs prevent race
| conditions?
| silon42 wrote:
| You would need to standardize passing of current root as a file
| handle, I think? Probably will break some software...
| phicoh wrote:
| The problem is that many libraries need access to configuration
| files or other stuff that comes with the library.
|
| So if you start with a system that has some form of persistent
| objects, then very quickly a root namespace object is created
| to solve those library issues.
|
| And then you are mostly back to a Unix root directory.
| wahern wrote:
| cap_enter can be invoked _after_ library initialization.
| Libraries can open the files and directories they need during
| initialization.
|
| A single jailed root is where you end up when you take the
| route of putting software into sandboxes for which they
| weren't designed, because now you need to _emulate_ a
| traditional environment.
|
| pledge and unveil are a middle ground, albeit closer to
| Capsicum, in that they're much more accommodating of existing
| software patterns. But they do still require application
| refactoring. OpenBSD has refactored their _entire_ userland
| codebase this way. That typically involves identifying the
| necessary resources a program needs and either shifting their
| acquisition to before privilege dropping (i.e. early in
| main), or arranging so that they're subsequently accessible
| (e.g. using unveil).
|
| It's a shame Linux never merged the Capsicum patches. While
| pledge and unveil are more convenient from a developer
| perspective, they can't easily be adopted in a standardized
| way by other operating systems, like Linux. Capsicum was the
| closest thing we could have gotten to a standardized
| sandboxing model in the POSIX universe. If it became widely
| available ( _cough_ Linux), I believe a large chunk of
| software, especially critical network-facing software, would
| slowly migrate; and an ecosystem of idioms, patterns, and
| libraries would evolve to increasingly smooth the transition.
|
| What's doubly shameful is that Capsicum is architecturally
| extremely simple. In principle it would be easy for any POSIX
| system to adopt. The APIs are trivial, and Linux is already
| nearly there now that it has process descriptors and an
| openat that can prevent parent directory traversal. Most of
| the leg work is in blocking access, after cap_enter has been
| invoked, to non-standard interfaces and syscalls that expose
| resources.
| GoblinSlayer wrote:
| Why not treat open(path) as openat(AT_FDCWD,path)?
| wizzwizz4 wrote:
| Because cap_enter() blocks that too.
| c0l0 wrote:
| FWIW (and iirc), with programs using recent-ish glibc, you will
| never see a call to open() in the wild unless the program takes
| special care to bypass the implicit libc wrapper. glibc will
| transparently convert these calls to openat() under its own
| hood. I do notice that this probably doesn't do you any good on
| FreeBSD, though :)
| markjdb wrote:
| This is mostly true on FreeBSD as well. The real problem is
| that capability mode also disallows openat(AT_FDCWD) - there
| has to be an explicit directory descriptor.
| aduitsis wrote:
| Mildly off-topic note, the parent is the author of CloudABI
| (https://github.com/NuxiNL/cloudlibc), which was (in my
| opinion) a truly brilliant approach to running untrusted code
| in a FreeBSD system.
| HPsquared wrote:
| In Linux there's "PRoot" - used by Termux on Android to provide
| userspace chroot-like functionality (can run Debian, for
| instance).
|
| https://proot-me.github.io/
| thenoblesunfish wrote:
| For those, like me, lacking context, what are the implications of
| this?
| jsiepkes wrote:
| You can for example run a build in a chroot as an unprivileged
| user.
| tyingq wrote:
| chroot existed, but could only be run as the root user. It was
| that way to prevent things like this (old actual exploit for
| Ultrix):
|
|     $ mkdir /tmp/etc
|     $ echo root::0:0::/:/bin/sh > /tmp/etc/passwd
|     $ mkdir /tmp/bin
|     $ cp /bin/sh /tmp/bin/sh
|     $ cp /bin/chmod /tmp/bin/chmod
|     $ chroot /tmp /bin/login
|     # whoami
|     root
|     # chmod 4700 /bin/sh
|
| now, log out of the chroot and use your newly minted setuid shell
|
| Since they now have the "NO_NEW_PRIVS" protection, they can let
| regular users safely use chroot.
| phicoh wrote:
| The key feature of chroot is that you can provide a process
| with a completely different filesystem view. You can leave
| stuff out that exists in the standard view, or change things.
| Change the contents of system directories.
|
| The problem with traditional chroot is that you can typically
| import setuid applications in this new space which can get
| confused, for example by a new /etc/passwd file. For this
| reason, chroot can be used only by root.
|
| The advantage of such a NO_NEW_PRIVS flag is that this kind of
| abuse of setuid applications is not possible.
|
| This should make it safe to allow ordinary users to use chroot.
| codetrotter wrote:
| chroot is a system call that assigns a limited view of the file
| system to a process. In particular it makes it so that the
| specific directory will appear as the top level directory to
| the process.
|
| Some people like to run for example FTP servers in a chroot so
| that users have access only to a specific directory and its
| subdirectories, rather than being able to browse other files on
| the system.
|
| FreeBSD also has a technology called jails which is what you'd
| rather use for containerization.
|
| Anyway, previously you had to be root (the Unix admin user) in
| order to use chroot. FreeBSD now implementing unprivileged
| chroot means that regular users are able to run processes in
| chroot as well.
|
| So for example if you were a regular user on a system, you can
| now create a sub directory in your home directory and run an
| FTP daemon chrooted to that directory and bound to an
| unprivileged port, and then you can give someone else FTP
| access to that directory without them being able to see the
| other files in your home directory, keeping your private data
| private from them.
| [deleted]
| krylon wrote:
| The commit message does NOT indicate when this will be available
| to mere mortals like myself.
|
| Can someone enlighten me if this will be part of FreeBSD 14, or
| if there is a chance it will become available earlier, perhaps
| with FreeBSD 13.1?
|
| EDIT: The commit message does NOT indicate etc. Silly me.
| 0mp wrote:
| The commit message does not mention any MFC timeline [1] so
| this feature is not planned to be merged back into existing
| stable branches. In other words, the first release with this
| feature is going to be FreeBSD 14.0-RELEASE.
|
| [1]: Also, you may look for the commit hash
| (a40cf4175c90142442d0c6515f6c83956336699) at
| https://mfc.kernelnomicon.org/ to see the back-porting status.
| swills wrote:
| This feature should be in the weekly snapshot pretty soon:
|
| https://download.freebsd.org/ftp/snapshots/ISO-IMAGES/14.0/
| stabbles wrote:
| On many Linux distros you can already do this with user
| namespaces:
|
|     $ mkdir rootfs
|     $ docker export $(docker create ubuntu:20.04) | tar -C rootfs -xf -
|     $ unshare -r chroot rootfs bash
|     # ls
|     bin dev home ...
|
| Very often when you use chroot you also want unprivileged mounts,
| in particular overlay mounts if you don't want to mutate the
| underlying rootfs. You can do that with mount namespaces:
| `unshare -rm`, but you need Linux kernel 5.13 (or a distro with a
| patched kernel like Ubuntu) to allow unprivileged overlayfs.
| rkeene2 wrote:
| As an alternative, one can also use User Mode Linux (UML) to
| implement a pretty fancy chroot (and fakeroot).
|
| It can do a few things userns can't, like load kernel modules.
| I've had to use this to deal with bugs in BtrFS before.
| gunapologist99 wrote:
| I love UML. I used to use it all the time. Is it still
| developed? It was really a pretty slick system and very easy
| to work with.
|
| http://user-mode-linux.sourceforge.net/
|
| (another good site: https://wiki.archlinux.org/title/User-mode_Linux )
| dividuum wrote:
| An alternative to unshare is also bubblewrap
| (https://github.com/containers/bubblewrap) which also sets up a
| new namespace. You can build up your own new filesystem by
| binding existing paths into the new root and then run a process
| within it:
|
|     $ mkdir -p root/bin
|     $ cp /bin/busybox root/bin/
|     $ bwrap --bind root / /bin/busybox sh
|     BusyBox v1.27.2 (Ubuntu 1:1.27.2-2ubuntu3.3) built-in shell (ash)
|     Enter 'help' for a list of built-in commands.
|
|     / $ ls -l /
|     total 0
|     drwxrwxr-x    2 1000     1000            60 Jul 22 11:07 bin
| gigatexal wrote:
| Interesting. Going to check out Bubblewrap
| Cloudef wrote:
| I used bubblewrap to do lightweight containers on top of
| arch + pacman. Basically you could install packages on
| overlays of the host and do whatever there without affecting
| the host fs. It was pretty nice.
| stabbles wrote:
| So how does this work? Can you mount / as the lower layer
| of the overlayfs? Doesn't that create a weird recursion
| because the mountpoint is a path inside /?
| Cloudef wrote:
| I used unionfs first to combine the sandbox and / where
| host / is read-only. Then simply bubblewrap into it. I
| also mounted / to /host if for some reason you wanted to
| access host fs from inside the sandbox.
| geofft wrote:
| I wish Linux would do this. Patches are available:
| https://lwn.net/Articles/849125/
|
| Yes, you can do this on Linux with a user namespace, but a user
| namespace changes the view of user accounts. You have to map
| every usable UID inside the namespace to a UID you control
| outside the namespace. At best, you can map a range of UIDs you
| control to "real" users (root, 1000, etc.) inside the namespace,
| but they won't be real users outside the namespace. If you're on
| a multi-user system, seeing other people's files as owned by
| "nobody" is confusing.
|
| It should be enough to use NO_NEW_PRIVS mode, meaning setuid
| transitions are not allowed. Then it doesn't matter what user IDs
| you see inside the chroot.
|
| In fact, back when Linux introduced the NO_NEW_PRIVS flag (almost
| a decade ago!), this was one of the motivating use cases.
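|
| For readers who have not used the flag: a minimal, Linux-only
| sketch of setting and reading it through prctl(2), called via
| ctypes from Python. The wrapper names are hypothetical; the two
| constants are the values from <linux/prctl.h>. FreeBSD exposes
| its flag through a different interface, which this sketch does
| not cover.

```python
import ctypes
import os

# Values from <linux/prctl.h>.
PR_SET_NO_NEW_PRIVS = 38
PR_GET_NO_NEW_PRIVS = 39

# Loading with None gives access to the libc symbols of the running process.
_libc = ctypes.CDLL(None, use_errno=True)


def _prctl(option, arg2=0):
    """Call prctl(option, arg2, 0, 0, 0) and raise OSError on failure."""
    result = _libc.prctl(ctypes.c_int(option), ctypes.c_ulong(arg2),
                         ctypes.c_ulong(0), ctypes.c_ulong(0),
                         ctypes.c_ulong(0))
    if result < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return result


def set_no_new_privs():
    """One-way switch: inherited across fork and execve, cannot be unset."""
    _prctl(PR_SET_NO_NEW_PRIVS, 1)


def get_no_new_privs():
    """Return 1 if the flag is set for the calling thread, else 0."""
    return _prctl(PR_GET_NO_NEW_PRIVS)
```

| Once the flag is set, execve() no longer grants privileges through
| setuid or setgid bits, which is why a confusing /etc/passwd inside
| a user-built chroot stops being dangerous.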
| marcodiego wrote:
| *BSD have been quite innovative recently. The pledge and unveil
| syscalls, although achievable by other means on linux, are very
| simple and effective for what they do. I don't know of a way on
| Linux to use a system installed in a directory without being
| root; even if it were possible, I'd still need root to
| mount --bind some dirs, but it's definitely something I'd like
| to do.
|
| I don't think containers should be needed for that.
| lima wrote:
| "containers" are just a combination of multiple kernel
| features, one of which does precisely that (user namespaces).
| pjmlp wrote:
| And were known as vaults on HP-UX 11, back in 2000.
| jerf wrote:
| Arguably, the issue with these features isn't their
| existence, since it's not even that hard to add them to a
| kernel, relative to the general difficulty of adding
| things to a kernel. The problem has been the
| need for mass awareness and desire for the feature, and
| that's what's taken multiple decades to emerge. It does no
| good for a kernel to have a security feature that only a
| vanishing fraction of developers care about and use.
|
| (And I say "vanishing fraction" relative to the pool of
| developers as a whole; even if a particular subcommunity
| uses it extensively that doesn't make it a pervasive
| request. I can name subcommunities with all sorts of exotic
| interests that have not penetrated the mainstream yet, like
| the capabilities-based security community. Someday, when
| that emerges, we'll all point back to E as a pioneer, but
| in the meantime, effectively nobody wants it right now.)
| wil421 wrote:
| Sounds like Jails in FreeBSD. Wikipedia says they were
| added in 1999.
| hestefisk wrote:
| And Zones on Solaris :) phk was the original author of
| Jails; he wrote an excellent paper called "Defying the
| omnipotent root", which I can highly recommend.
| amarshall wrote:
| *Confining the omnipotent root
| hestefisk wrote:
| Yes indeed! Brain fart, apologies.
| hestefisk wrote:
| And LPARs on System z :)
| tyingq wrote:
| Both LPARs and z/VM look more like hypervisors to me.
| Things like containers and chroot probably don't make
| much sense in the mainframe world since they already had
| granular facilities to limit access to networks, data
| sets, etc.
| indigodaddy wrote:
| Aren't LPARs quite a lot different in nature than zones
| and jails though?
| queuebert wrote:
| And VMs on IBM VM/370.
| geofft wrote:
| On Linux, you can do
|
|     unshare --user --mount --map-root-user chroot /path/to/whatever
|
| and if you need to bind-mount some directories, you can do that
| before the chroot, e.g.,
|
|     $ unshare --user --mount --map-root-user
|     # mount --bind /proc /path/to/whatever/proc
|     # mount --bind /sys /path/to/whatever/sys
|     # chroot /path/to/whatever
|
|
| without being root. (This requires a sysctl to be enabled for
| unprivileged user namespaces, which is on by default in the
| kernel.org tree and I think all major distro kernels have it on
| now. The feature has been in the upstream kernel since 2013.)
|
| If you want to do this at scale, a handy tool is bwrap(1) from
| https://github.com/containers/bubblewrap . (The README talks
| about how bwrap is a setuid program to prevent the need for
| that sysctl, but it also works great as a non-setuid program
| when that sysctl is enabled, and its value is it has a bunch of
| handy command-line flags for this sort of thing. We use it
| extensively at my workplace in non-setuid mode for things that
| don't quite need containers but need to see alternative root
| directories etc.)
___________________________________________________________________
(page generated 2021-07-22 23:01 UTC)