[HN Gopher] Implement unprivileged chroot
       ___________________________________________________________________
        
       Implement unprivileged chroot
        
       Author : 0mp
       Score  : 209 points
       Date   : 2021-07-22 10:14 UTC (12 hours ago)
        
 (HTM) web link (cgit.freebsd.org)
 (TXT) w3m dump (cgit.freebsd.org)
        
       | EdSchouten wrote:
       | FreeBSD already supported something that is effectively like
       | this, but in what is, in my opinion, a better way.
       | 
       | You can call cap_enter(), which disables open(), unlink(),
       | mkdir(), etc. entirely. You can, however, still use openat(),
       | unlinkat(), mkdirat() with relative paths that expand to a
       | location underneath a directory file descriptor. This achieves
       | the same thing, except that you can now have as many chroots as
       | you want. Not just one.
       | 
       | Unfortunately, the idea never caught on, because virtually no
       | software on UNIX uses the *at() functions. Also: the non-*at()
       | functions are still available as symbols, meaning that you can't
       | perform simple compile-time checks to ensure that your application
       | works properly when this form of sandboxing is enabled. Turns out
       | that off-the-shelf software (e.g., libraries) ends up misbehaving
       | in unpredictable ways if you disable ~50% of the POSIX API.
       | 
       | It's a shame, because this feature effectively requires you to
       | treat the file system in an object oriented/dependency injected
       | way. Pretty good from a reusability/testability perspective.
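       | 
       | A minimal sketch of that descriptor-relative style, in Python
       | because its os module forwards dir_fd= to the *at() calls. The
       | helper names (open_dir, write_under, read_under) are made up
       | for illustration, and the sketch does not call cap_enter(), so
       | nothing here actually enforces the sandbox.

```python
import os
import tempfile


def open_dir(path):
    """Open a directory and return a descriptor usable as dir_fd."""
    return os.open(path, os.O_RDONLY | os.O_DIRECTORY)


def write_under(dir_fd, name, data):
    """Create or truncate `name` relative to dir_fd (openat under the hood)."""
    fd = os.open(name, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600,
                 dir_fd=dir_fd)
    try:
        os.write(fd, data)
    finally:
        os.close(fd)


def read_under(dir_fd, name):
    """Read a small file relative to dir_fd."""
    fd = os.open(name, os.O_RDONLY, dir_fd=dir_fd)
    try:
        return os.read(fd, 65536)
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Two independent "roots", each reached only through its descriptor.
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        root_a, root_b = open_dir(a), open_dir(b)
        try:
            os.mkdir("conf", dir_fd=root_a)            # mkdirat
            write_under(root_a, "conf/app.ini", b"a")  # openat
            write_under(root_b, "app.ini", b"b")
            print(read_under(root_a, "conf/app.ini"),
                  read_under(root_b, "app.ini"))
            os.unlink("app.ini", dir_fd=root_b)        # unlinkat
        finally:
            os.close(root_a)
            os.close(root_b)
```

       | Passing the descriptors around instead of path strings is the
       | dependency-injection part: a function can only touch the
       | directories it was handed.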
        
         | toast0 wrote:
         | Capabilities mode is useful, but it's very difficult to apply
         | to programs that don't fit the model.
         | 
         | If you need to make network connections, you have to do that
         | before entering capabilities mode, because there is no
         | capability to allow it later. You can work through a proxy
         | program, but adding that complexity doesn't seem worthwhile to
         | me unless your program to be sandboxed is very complex.
         | 
         | I haven't worked with OpenBSD's pledge, but the idea of being
         | able to end use of specific dangerous things seems more widely
         | applicable.
        
         | jerf wrote:
         | One of my minor disappointments with Go, considering the time
         | it came out and the UNIX heritage that it descended from, was
         | that it didn't prioritize the *at() functions. It's difficult,
         | if not virtually impossible, to write secure code with the
         | "traditional" path-based system because every time you do one
         | thing, then some other thing to a path that has some sort of
         | security implication, you've written a TOCTOU problem if
         | somebody can wedge between those two things to change some
         | critical aspect of the file.
         | 
         | It's hard for me to blame programmers for not using these
         | functions more when hardly any language properly exposes them.
         | But since nobody exposes them, nobody's aware they should use
         | them.... chicken & egg strike again.
        
           | donio wrote:
           | os.Open has been using openat since 2015.
           | 
           | https://github.com/golang/go/commit/e7a7352e527ca275a2b66cc3...
        
             | jerf wrote:
             | I was unclear. See my other cousin reply; you can't use it
             | yourself to have a directory handle and securely open files
             | in that directory. You can only open things by path.
        
           | tines wrote:
           | But openat, for example, is still path-based; it just changes
           | the directory that the path is relative to. If you give it an
           | absolute path, it will open it, and I didn't see any reason
           | in the man page why you couldn't just pass in a bunch of
           | ../../ as the usual exploits do. Maybe you're referring to
           | another category of bugs?
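           | 
           | A small probe of that behaviour, as a sketch: outside
           | capability mode, a plain openat() neither stops "../" nor
           | pays attention to the directory descriptor when the path
           | is absolute. The helper can_open and the jail/secret
           | layout are invented for the demonstration.

```python
import os
import tempfile


def can_open(dir_fd, path):
    """Return True if `path` can be opened read-only relative to dir_fd."""
    try:
        fd = os.open(path, os.O_RDONLY, dir_fd=dir_fd)
    except OSError:
        return False
    os.close(fd)
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        jail = os.path.join(base, "jail")
        os.mkdir(jail)
        secret = os.path.join(base, "secret")
        with open(secret, "w") as f:
            f.write("outside the jail")

        dir_fd = os.open(jail, os.O_RDONLY | os.O_DIRECTORY)
        try:
            # Relative path that climbs out of the directory.
            print("../secret reachable:", can_open(dir_fd, "../secret"))
            # Absolute path: dir_fd is ignored entirely.
            print("absolute path reachable:", can_open(dir_fd, secret))
        finally:
            os.close(dir_fd)
```

           | On an ordinary system both probes succeed, so the
           | confinement has to come from something else, such as
           | capability mode or extra resolution flags.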
        
             | karatinversion wrote:
             | He was - TOCTOU has its own wiki page [1]. These can be
             | nastier, because they don't require the attacker to be able
             | to submit strings or file names.
             | 
             | [1] https://en.wikipedia.org/wiki/Time-of-check_to_time-of-use
        
               | tines wrote:
               | I guess I'm not sure how you would use open() that would
               | expose a TOCTOU bug that openat() wouldn't. Can you give
               | an example?
        
             | pmahoney wrote:
             | For what it's worth, Linux 5.6 introduced openat2 [1] which
             | accepts some additional flags controlling path resolution.
             | 
             | For example, RESOLVE_IN_ROOT "is as though the calling
             | process had used chroot(2) to (temporarily) modify its root
             | directory (to the directory referred to by dirfd)".
             | 
             | [1] https://man7.org/linux/man-pages/man2/openat2.2.html
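             | 
             | A rough sketch of calling it from Python through
             | ctypes, since the standard library has no wrapper that
             | I know of. Assumptions: Linux on x86-64 or arm64, where
             | the syscall number is 437; RESOLVE_IN_ROOT is 0x10 as
             | in <linux/openat2.h>; open_in_root is a made-up helper
             | name. Older kernels or strict seccomp profiles will
             | refuse the call.

```python
import ctypes
import os
import tempfile

SYS_OPENAT2 = 437        # assumed: x86-64 / arm64 syscall number
RESOLVE_IN_ROOT = 0x10   # from <linux/openat2.h>


class OpenHow(ctypes.Structure):
    """Mirror of the kernel's struct open_how."""
    _fields_ = [
        ("flags", ctypes.c_uint64),
        ("mode", ctypes.c_uint64),
        ("resolve", ctypes.c_uint64),
    ]


_libc = ctypes.CDLL(None, use_errno=True)
_libc.syscall.restype = ctypes.c_long
_libc.syscall.argtypes = [
    ctypes.c_long, ctypes.c_int, ctypes.c_char_p,
    ctypes.POINTER(OpenHow), ctypes.c_size_t,
]


def open_in_root(dir_fd, path, flags=os.O_RDONLY):
    """Open `path` with dir_fd treated as the root directory for resolution."""
    how = OpenHow(flags=flags | os.O_CLOEXEC, mode=0, resolve=RESOLVE_IN_ROOT)
    fd = _libc.syscall(SYS_OPENAT2, dir_fd, os.fsencode(path),
                       ctypes.byref(how), ctypes.sizeof(how))
    if fd < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), path)
    return fd


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "etc"))
        with open(os.path.join(root, "etc", "hostname"), "w") as f:
            f.write("sandbox\n")
        dir_fd = os.open(root, os.O_RDONLY | os.O_DIRECTORY)
        try:
            # Absolute path and "..": both stay inside dir_fd.
            fd = open_in_root(dir_fd, "/../../etc/hostname")
            print(os.read(fd, 64))
            os.close(fd)
        except OSError as exc:
            print("openat2 unavailable or refused:", exc)
        finally:
            os.close(dir_fd)
```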
        
             | jerf wrote:
             | Sorry, I was unclear. Too much context in my head from the
             | times I've jousted with this and I forgot to contextualize
             | properly. (Which is ironic since part of my complaint is
             | precisely that too few people know this stuff.) That family
             | of functions allows you to open things based on handles
             | more easily. So you can open a directory, and while holding
             | on to the handle for that directory, know that you are
             | still in that directory, even potentially open files in
             | that directory and then, once you do that, know that you
             | have a file in that directory (or, atomically, don't).
             | 
             | It's the difference between
             | 
             |     dirHandle = open("some path");
             |     fileInDir = openat(dirHandle, "some file");
             | 
             | versus
             | 
             |     dir = open("some path")
             |     // examine the directory, then
             |     fileInDir = open("some path/some file");
             | 
             | In the second case, between those two lines, you can have
             | something else jump in and modify or remove or repermission
             | or whatever the "some file". It has never been the largest
             | security issue, but it's been a running undercurrent of
             | security issues for decades.
             | 
             | In the first case, you have atomically-safe operations; you
             | either get the directory or don't, then either get the file
             | handle or don't, etc, and once you have the handle nobody
             | else can take it from you, even if they rename the file
             | under you, etc. It means that if you are writing logic like
             | "if the file is setuid, do this", there's no way for an
             | external process to wedge in between the two things.
             | 
             | In other words, you ought to be able to not just read from
             | a file handle, but also open relative to the handle
             | directly, and do all those other things. Any API that
             | operates in terms of paths is pretty much intrinsically
             | open to TOCTOU, because any time you "check" a path vs.
             | "use" the path, which is fairly common, you have a window
             | of opportunity for lossage. I'm not sure I've yet seen a
             | non-C way of doing this built into a standard library.
             | 
             | Also... before you jump in with some "what ifs", no, these
             | functions don't magically make your code more secure. You
             | still have to use them correctly and it's still pretty easy
             | to mistakenly let path-based logic slip in accidentally
             | even so. It doesn't make insecure code secure; it makes
             | guaranteed insecure (in security-sensitive contexts,
             | obviously a lot of time this isn't a security issue) code
             | _possible_ to write securely.
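             | 
             | To make the handle-versus-path difference concrete,
             | here is a sketch in Python (os.open with dir_fd= maps
             | to openat). The function swap_and_read and the "spool"
             | and "job" names are invented; the rename in the middle
             | stands in for whatever another process might do
             | between the check and the use.

```python
import os
import tempfile


def swap_and_read(base):
    """Return (bytes read via the held handle, bytes read via the path)."""
    old = os.path.join(base, "spool")
    new = os.path.join(base, "spool.moved")
    os.mkdir(old)
    with open(os.path.join(old, "job"), "wb") as f:
        f.write(b"payload")

    # "Check" step: we examine the directory and keep a handle to it.
    dir_fd = os.open(old, os.O_RDONLY | os.O_DIRECTORY)
    try:
        # Simulated interference: the directory is moved away and a
        # look-alike is put in its place under the same name.
        os.rename(old, new)
        os.mkdir(old)
        with open(os.path.join(old, "job"), "wb") as f:
            f.write(b"attacker")

        # Handle-relative open: still the directory we examined.
        fd = os.open("job", os.O_RDONLY, dir_fd=dir_fd)
        try:
            via_handle = os.read(fd, 64)
        finally:
            os.close(fd)

        # Path-based open: the name is resolved again from scratch.
        with open(os.path.join(old, "job"), "rb") as f:
            via_path = f.read()
        return via_handle, via_path
    finally:
        os.close(dir_fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        print(swap_and_read(base))
```

             | The handle keeps pointing at the original directory,
             | while the path lookup lands in the replacement.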
        
               | tines wrote:
               | Makes sense, but I think that you only gain safety when
               | you are checking attributes of the directories leading to
               | the file, but not when you are checking the file itself.
               | For example, you said
               | 
               | > In the second case, between those two lines, you can
               | have something else jump in and modify or remove or
               | repermission or whatever the "some file".
               | 
               | Modifying/removing/repermissioning "some file" is still
               | possible even with openat() if you do it between the time
               | you open("some path") and openat("some file"). There is
               | still a race condition there in either case if you are
               | examining the contents of the directory (e.g. "stat"ing
               | the file and then calling openat). You can also
               | modify/repermission "some path" as well. The only thing
               | openat() protects you from is removing/replacing "some
               | path" (not "some file") and I agree that that is valuable
               | for security purposes.
        
               | jerf wrote:
               | 'Modifying/removing/repermissioning "some file" is still
               | possible even with openat() if you do it between the time
               | you open("some path") and openat("some file").'
               | 
               | This is part of what I was trying to head off with my
               | parenthetical. You still have to use it correctly to do
               | secure things. But at least it's _possible_. This kind of
               | security is basically impossible with pure path-based
               | APIs. Plus, as mentioned elsewhere, there are some
               | additional flags you can use for even more security that
               | you can't get out of an API that is "open(filename)",
               | simply because that API is mathematically incapable of
               | carrying such flags (assuming you don't start trying to
               | encode them in the filename itself, but that way lies
               | madness).
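               | 
               | One way to "use it correctly", sketched in Python:
               | open first, then check the attributes of the
               | descriptor you actually got with fstat(), rather
               | than stat()ing the name beforehand. The function
               | name open_regular_file and the particular policy
               | (regular file, not setuid/setgid) are illustrative
               | assumptions.

```python
import os
import stat


def open_regular_file(dir_fd, name):
    """Open `name` under dir_fd, then validate the object that was opened."""
    # O_NOFOLLOW: fail if the final path component is a symlink.
    # O_NONBLOCK: do not hang if the name turns out to be a FIFO.
    flags = os.O_RDONLY | os.O_NOFOLLOW | os.O_NONBLOCK
    fd = os.open(name, flags, dir_fd=dir_fd)
    try:
        st = os.fstat(fd)  # inspects the open file, not the path
        if not stat.S_ISREG(st.st_mode):
            raise PermissionError(f"{name!r} is not a regular file")
        if st.st_mode & (stat.S_ISUID | stat.S_ISGID):
            raise PermissionError(f"{name!r} is setuid/setgid")
        return fd
    except BaseException:
        os.close(fd)
        raise
```

               | Because the check runs on the descriptor, renaming
               | or replacing the name afterwards cannot change what
               | was checked.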
        
               | donio wrote:
               | It's doable when you need it, something like:
               | 
               |     filefd, err := syscall.Openat(int(dir.Fd()), filename, os.O_RDONLY, 0)
               |     file := os.NewFile(uintptr(filefd), filename) // for use with library functions
        
           | catlifeonmars wrote:
           | I'm confused. How would using *at() APIs prevent race
           | conditions?
        
         | silon42 wrote:
         | You would need to standardize passing of current root as a file
         | handle, I think? Probably will break some software...
        
         | phicoh wrote:
         | The problem is that many libraries need access to configuration
         | files or other stuff that comes with the library.
         | 
         | So if you start with a system that has some form of persistent
         | objects, then very quickly a root namespace object is created
         | to solve those library issues.
         | 
         | And then you are mostly back to a Unix root directory.
        
           | wahern wrote:
           | cap_enter can be invoked _after_ library initialization.
           | Libraries can open the files and directories they need during
           | initialization.
           | 
           | A single jailed root is where you end up when you take the
           | route of putting software into sandboxes for which they
           | weren't designed, because now you need to _emulate_ a
           | traditional environment.
           | 
           | pledge and unveil are a middle ground, albeit closer to
           | Capsicum, in that they're much more accommodating of existing
           | software patterns. But they do still require application
           | refactoring. OpenBSD has refactored their _entire_ userland
           | codebase this way. That typically involves identifying the
           | necessary resources a program needs and either shifting their
           | acquisition to before privilege dropping (i.e. early in
           | main), or arranging so that they're subsequently accessible
           | (e.g. using unveil).
           | 
           | It's a shame Linux never merged the Capsicum patches. While
           | pledge and unveil are more convenient from a developer
           | perspective, they can't easily be adopted in a standardized
           | way by other operating systems, like Linux. Capsicum was the
           | closest thing we could have gotten to a standardized
           | sandboxing model in the POSIX universe. If it became widely
           | available (_cough_ Linux), I believe a large chunk of
           | software, especially critical network-facing software, would
           | slowly migrate; and an ecosystem of idioms, patterns, and
           | libraries would evolve to increasingly smooth the transition.
           | 
           | What's doubly shameful is that Capsicum is architecturally
           | extremely simple. In principle it would be easy for any POSIX
           | system to adopt. The APIs are trivial, and Linux is already
           | nearly there now that it has process descriptors and an
           | openat that can prevent parent directory traversal. Most of
           | the leg work is in blocking access, after cap_enter has been
           | invoked, to non-standard interfaces and syscalls that expose
           | resources.
        
         | GoblinSlayer wrote:
         | Why not treat open(path) as openat(AT_FDCWD,path)?
        
           | wizzwizz4 wrote:
           | Because cap_enter() blocks that too.
        
         | c0l0 wrote:
         | FWIW (and iirc), with programs using recent-ish glibc, you will
         | never see a call to open() in the wild unless the program takes
         | special care to bypass the implicit libc wrapper. glibc will
         | transparently convert these calls to openat() under its own
         | hood. I do notice that this probably doesn't do you any good on
         | FreeBSD, though :)
        
           | markjdb wrote:
           | This is mostly true on FreeBSD as well. The real problem is
           | that capability mode also disallows openat(AT_FDCWD) - there
           | has to be an explicit directory descriptor.
        
         | aduitsis wrote:
         | Mildly off-topic note, the parent is the author of CloudABI
         | (https://github.com/NuxiNL/cloudlibc), which was (in my
         | opinion) a truly brilliant approach to running untrusted code
         | in a FreeBSD system.
        
       | HPsquared wrote:
       | In Linux there's "PRoot" - used by Termux on Android to provide
       | userspace chroot-like functionality (can run Debian, for
       | instance).
       | 
       | https://proot-me.github.io/
        
       | thenoblesunfish wrote:
       | For those, like me, lacking context, what are the implications of
       | this?
        
         | jsiepkes wrote:
         | You can for example run a build in a chroot as an unprivileged
         | user.
        
         | tyingq wrote:
         | chroot existed, but could only be run as the root user. It was
         | that way to prevent things like this (old actual exploit for
         | Ultrix):
         | 
         |     $ mkdir /tmp/etc
         |     $ echo root::0:0::/:/bin/sh > /tmp/etc/passwd
         |     $ mkdir /tmp/bin
         |     $ cp /bin/sh /tmp/bin/sh
         |     $ cp /bin/chmod /tmp/bin/chmod
         |     $ chroot /tmp /bin/login
         |     # whoami
         |     root
         |     # chmod 4700 /bin/sh
         |     now, log out of the chroot and use your newly minted setuid shell
         | 
         | Since they now have the "NO_NEW_PRIVS" protection, they can let
         | regular users safely use chroot.
        
         | phicoh wrote:
         | The key feature of chroot is that you can provide a process
         | with a completely different filesystem view. You can leave
         | stuff out that exists in the standard view, or change things.
         | Change the contents of system directories.
         | 
         | The problem with traditional chroot is that you can typically
         | import setuid applications in this new space which can get
         | confused, for example by a new /etc/passwd file. For this
         | reason, chroot can be used only by root.
         | 
         | The advantage of such a NO_NEW_PRIVS flag is that this kind of
         | abuse of setuid applications is not possible.
         | 
         | This should make it safe to allow ordinary users to use chroot.
        
         | codetrotter wrote:
         | chroot is a system call that assigns a limited view of the file
         | system to a process. In particular it makes it so that the
         | specific directory will appear as the top level directory to
         | the process.
         | 
         | Some people like to run for example FTP servers in a chroot so
         | that users have access only to a specific directory and its
         | subdirectories, rather than being able to browse other files on
         | the system.
         | 
         | FreeBSD also has a technology called jails which is what you'd
         | rather use for containerization.
         | 
         | Anyway, previously you had to be root (the Unix admin user) in
         | order to use chroot. FreeBSD now implementing unprivileged
         | chroot means that regular users are able to run processes in
         | chroot as well.
         | 
         | So for example if you were a regular user on a system, you can
         | now create a sub directory in your home directory and run an
         | FTP daemon chrooted to that directory and bound to an
         | unprivileged port, and then you can give someone else FTP
         | access to that directory without them being able to see the
         | other files in your home directory, keeping your private data
         | private from them.
        
       | [deleted]
        
       | krylon wrote:
       | The commit message does NOT indicate when this will be available
       | to mere mortals like myself.
       | 
       | Can someone enlighten me if this will be part of FreeBSD 14, or
       | if there is a chance it will become available earlier, perhaps
       | with FreeBSD 13.1?
       | 
       | EDIT: The commit message does NOT indicate etc. Silly me.
        
         | 0mp wrote:
         | The commit message does not mention any MFC timeline [1] so
         | this feature is not planned to be merged back into existing
         | stable branches. In other words, the first release with this
         | feature is going to be FreeBSD 14.0-RELEASE.
         | 
         | [1]: Also, you may look for the commit hash
         | (a40cf4175c90142442d0c6515f6c83956336699) at
         | https://mfc.kernelnomicon.org/ to see the back-porting status.
        
           | swills wrote:
           | This feature should be in the weekly snapshot pretty soon:
           | 
           | https://download.freebsd.org/ftp/snapshots/ISO-IMAGES/14.0/
        
       | stabbles wrote:
       | On many Linux distros you can already do this with user
       | namespaces:
       | 
       |     $ mkdir rootfs
       |     $ docker export $(docker create ubuntu:20.04) | tar -C rootfs -xf -
       |     $ unshare -r chroot rootfs bash
       |     # ls
       |     bin   dev  home ...
       | 
       | Very often when you use chroot you also want unprivileged mounts,
       | in particular overlay mounts if you don't want to mutate the
       | underlying rootfs. You can do that with mount namespaces:
       | `unshare -rm`, but you need Linux kernel 5.13 (or a distro with a
       | patched kernel like Ubuntu) to allow unprivileged overlayfs.
        
         | rkeene2 wrote:
         | As an alternative, one can also use User Mode Linux (UML) to
         | implement a pretty fancy chroot (and fakeroot).
         | 
         | It can do a few things userns can't, like load kernel modules.
         | I've had to use this to deal with bugs in BtrFS before.
        
           | gunapologist99 wrote:
           | I love UML. I used to use it all the time. Is it still
           | developed? It was really a pretty slick system and very easy
           | to work with.
           | 
           | http://user-mode-linux.sourceforge.net/
           | 
           | (another good site:
           | https://wiki.archlinux.org/title/User-mode_Linux )
        
         | dividuum wrote:
         | An alternative to unshare is also bubblewrap
         | (https://github.com/containers/bubblewrap) which also sets up a
         | new namespace. You can build up your own new filesystem by
         | binding existing paths into the new root and then run a process
         | within it:
         | 
         |     $ mkdir -p root/bin
         |     $ cp /bin/busybox root/bin/
         |     $ bwrap --bind root / /bin/busybox sh
         | 
         |     BusyBox v1.27.2 (Ubuntu 1:1.27.2-2ubuntu3.3) built-in shell (ash)
         |     Enter 'help' for a list of built-in commands.
         | 
         |     / $ ls -l /
         |     total 0
         |     drwxrwxr-x    2 1000     1000            60 Jul 22 11:07 bin
        
           | gigatexal wrote:
           | Interesting. Going to check out Bubblewrap
        
           | Cloudef wrote:
           | I used bubblewrap to do lightweight containers on top of
           | arch + pacman. Basically you could install packages on
           | overlays of the host and do whatever there without affecting
           | the host fs. It was pretty nice.
        
             | stabbles wrote:
             | So how does this work? Can you mount / as the lower layer
             | of the overlayfs? Doesn't that create a weird recursion
             | because the mountpoint is a path inside /?
        
               | Cloudef wrote:
               | I used unionfs first to combine the sandbox and / where
               | host / is read-only. Then simply bubblewrap into it. I
               | also mounted / to /host if for some reason you wanted to
               | access host fs from inside the sandbox.
        
       | geofft wrote:
       | I wish Linux would do this. Patches are available:
       | https://lwn.net/Articles/849125/
       | 
       | Yes, you can do this on Linux with a user namespace, but a user
       | namespace changes the view of user accounts. You have to map
       | every usable UID inside the namespace to a UID you control
       | outside the namespace. At best, you can map a range of UIDs you
       | control to "real" users (root, 1000, etc.) inside the namespace,
       | but they won't be real users outside the namespace. If you're on
       | a multi-user system, seeing other people's files as owned by
       | "nobody" is confusing.
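       | 
       | The mapping can be illustrated with a small pure-Python model
       | of /proc/<pid>/uid_map, whose lines read "inside-start
       | outside-start count". This is only a sketch: the function
       | names are made up, 65534 is the usual default overflow UID
       | (displayed as "nobody"), and the real UID of 1000 is assumed.

```python
OVERFLOW_UID = 65534  # default /proc/sys/kernel/overflowuid ("nobody")


def parse_uid_map(text):
    """Parse uid_map text into (inside_start, outside_start, count) tuples."""
    ranges = []
    for line in text.splitlines():
        if line.strip():
            inside, outside, count = (int(x) for x in line.split())
            ranges.append((inside, outside, count))
    return ranges


def uid_seen_inside(ranges, outside_uid):
    """Translate a parent-namespace UID to the UID seen inside the namespace."""
    for inside, outside, count in ranges:
        if outside <= outside_uid < outside + count:
            return inside + (outside_uid - outside)
    return OVERFLOW_UID  # unmapped IDs collapse to the overflow UID


if __name__ == "__main__":
    # The single-entry map `unshare -r` writes for a user with UID 1000.
    ranges = parse_uid_map("         0       1000          1\n")
    for uid in (1000, 1001, 0):
        print(uid, "->", uid_seen_inside(ranges, uid))
    # prints:
    # 1000 -> 0
    # 1001 -> 65534
    # 0 -> 65534
```

       | Every other account on the machine, including the real root,
       | falls through to the overflow UID, which is the "nobody"
       | effect described above.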
       | 
       | It should be enough to use NO_NEW_PRIVS mode, meaning setuid
       | transitions are not allowed. Then it doesn't matter what user IDs
       | you see inside the chroot.
       | 
       | In fact, back when Linux introduced the NO_NEW_PRIVS flag (almost
       | a decade ago!), this was one of the motivating use cases.
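       | 
       | For reference, a sketch of setting that flag on Linux from
       | Python via prctl(2) and ctypes. The option numbers 38 and 39
       | are PR_SET_NO_NEW_PRIVS and PR_GET_NO_NEW_PRIVS from
       | <linux/prctl.h>; the helper names are invented, and the flag
       | is set in a forked child because it cannot be cleared again.

```python
import ctypes
import os

PR_SET_NO_NEW_PRIVS = 38  # from <linux/prctl.h>
PR_GET_NO_NEW_PRIVS = 39

_libc = ctypes.CDLL(None, use_errno=True)
_libc.prctl.restype = ctypes.c_int
_libc.prctl.argtypes = [ctypes.c_int] + [ctypes.c_ulong] * 4


def set_no_new_privs():
    """Irreversibly forbid gaining privileges through execve (setuid etc.)."""
    if _libc.prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def get_no_new_privs():
    """Return 1 if the flag is set for the calling thread, else 0."""
    return _libc.prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0)


def child_flag_after_set():
    """Fork, set the flag in the child, and report what the child reads back."""
    pid = os.fork()
    if pid == 0:
        try:
            set_no_new_privs()
            os._exit(get_no_new_privs())
        except BaseException:
            os._exit(255)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)


if __name__ == "__main__":
    print("no_new_privs in child:", child_flag_after_set())
```

       | The flag is inherited across fork and execve, so anything the
       | child runs afterwards gets no benefit from setuid bits.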
        
       | marcodiego wrote:
       | *BSD have been quite innovative recently. The pledge and unveil
       | syscalls, although achievable by other means on linux, are very
       | simple and effective for what they do. I don't know a way on
       | linux to use a system on a directory without being root; even if
       | possible I'd still need root to mount --bind some dirs, but
       | definitely something I'd like to do.
       | 
       | I don't think containers should be needed for that.
        
         | lima wrote:
         | "containers" are just a combination of multiple kernel
         | features, one of which does precisely that (user namespaces).
        
           | pjmlp wrote:
           | And were known as vaults on HP-UX 11, back in 2000.
        
             | jerf wrote:
             | Arguably, the issue with these features isn't their
             | existence, since it's not even that hard to add them to a
             | kernel, relative to the generalized difficulty of adding
             | things to a kernel in general. The problem has been the
             | need for mass awareness and desire for the feature, and
             | that's what's taken multiple decades to emerge. It does no
             | good for a kernel to have a security feature that only a
             | vanishing fraction of developers care about and use.
             | 
             | (And I say "vanishing fraction" relative to the pool of
             | developers as a whole; even if a particular subcommunity
             | uses it extensively that doesn't make it a pervasive
             | request. I can name subcommunities with all sorts of exotic
             | interests that have not penetrated the mainstream yet, like
             | the capabilities-based security community. Someday, when
             | that emerges, we'll all point back to E as a pioneer, but
             | in the meantime, effectively nobody wants it right now.)
        
             | wil421 wrote:
             | Sounds like Jails in FreeBSD. Wikipedia says they were
             | added in 1999.
        
               | hestefisk wrote:
               | And Zones on Solaris :) phk was the original author of
               | Jails; he wrote an excellent paper called "Defying the
               | omnipotent root", which I can highly recommend.
        
               | amarshall wrote:
               | *Confining the omnipotent root
        
               | hestefisk wrote:
               | Yes indeed! Brain fart, apologies.
        
               | hestefisk wrote:
               | And LPARs on System z :)
        
               | tyingq wrote:
               | Both LPARS and z/VM look more like hypervisors to me.
               | Things like containers and chroot probably don't make
               | much sense in the mainframe world since they already had
               | granular facilities to limit access to networks, data
               | sets, etc.
        
               | indigodaddy wrote:
               | Aren't lpars quite a lot different in nature than zones
               | and jails though?
        
               | queuebert wrote:
               | And VMs on IBM VM/370.
        
         | geofft wrote:
         | On Linux, you can do
         | 
         |     unshare --user --mount --map-root-user chroot /path/to/whatever
         | 
         | and if you need to bind-mount some directories, you can do that
         | before the chroot, e.g.,
         | 
         |     $ unshare --user --mount --map-root-user
         |     # mount --bind /proc /path/to/whatever/proc
         |     # mount --bind /sys /path/to/whatever/sys
         |     # chroot /path/to/whatever
         | 
         | without being root. (This requires a sysctl to be enabled for
         | unprivileged user namespaces, which is on by default in the
         | kernel.org tree and I think all major distro kernels have it on
         | now. The feature has been in the upstream kernel since 2013.)
         | 
         | If you want to do this at scale, a handy tool is bwrap(1) from
         | https://github.com/containers/bubblewrap . (The README talks
         | about how bwrap is a setuid program to prevent the need for
         | that sysctl, but it also works great as a non-setuid program
         | when that sysctl is enabled, and its value is it has a bunch of
         | handy command-line flags for this sort of thing. We use it
         | extensively at my workplace in non-setuid mode for things that
         | don't quite need containers but need to see alternative root
         | directories etc.)
        
       ___________________________________________________________________
       (page generated 2021-07-22 23:01 UTC)