[HN Gopher] A quick look at unprivileged sandboxing
       ___________________________________________________________________
        
       A quick look at unprivileged sandboxing
        
       Author : zdw
       Score  : 37 points
       Date   : 2025-07-13 15:41 UTC (2 days ago)
        
 (HTM) web link (www.uninformativ.de)
 (TXT) w3m dump (www.uninformativ.de)
        
       | aktau wrote:
       | This goes straight into my reference list. Sandboxing a process
       | is confusing on Linux.
       | 
       | I appreciate that the article focuses on approaches that drop
       | privileges without having root oneself. I've seen landlock
       | referenced at time (https://lwn.net/Articles/859908/), but never
       | so clearly illustrated (the verbosity feels like Vulkan).
       | 
       | Out of curiosity, I'd wish even more approaches were compared,
       | even if they require root. I was about to mention seccomp-bpf as
       | an approach that requires root, but skimming the LWN article I
       | posted above I find: "Like seccomp(), Landlock is an unprivileged
       | sandboxing mechanism; it allows a process to confine itself". It
       | seems like I was wrong, and seccomp could be compared/contrasted.
        
         | gnoack wrote:
         | Absolutely, seccomp is also an unprivileged sandboxing
         | mechanism in Linux. It does have the drawback however that the
         | policies are defined in terms of system call numbers and their
         | (register value) arguments, which complicates things, as it is
         | a moving target.
         | 
         | The problem was also recently discussed at
         | https://lssna2025.sched.com/event/1zam9/handling-new-syscall...
        
       | poolpOrg wrote:
       | I may be biased but the OpenBSD approach with pledge() and
       | unveil() have been my favorite sandboxing mechanisms of all time
       | due to their simplicity: pledge has really understood that as a
       | developer I want to whitelist an intention, not a specific set of
       | syscalls and options, and unveil is chroot on steroids <3
        
         | wahern wrote:
         | Theo was recently proposing a new flag to open, O_BELOW:
         | https://undeadly.org/cgi?action=article;sid=20250529080623
         | 
         | It's like Linux's RESOLVE_BENEATH flag to openat, except it's a
         | constraint placed on the directory descriptor itself so that
         | subsequent opens using openat(2) cannot reach anything outside
         | the subtree. Which seems like exactly the semantics you'd want
         | for a capability system. In FreeBSD Capsicum mode, this
         | behavior is enforced implicitly[1], but it'd be a nice thing to
         | have explicitly to help incrementally improve code safety.
         | 
         | [1] See
         | https://man.freebsd.org/cgi/man.cgi?open(2)#:~:text=capsicum...
        
       | simonw wrote:
       | I want this solved _so much_ - across all of the operating
       | systems I use.
       | 
       | Ideally I'd like to never run code I download from the internet
       | outside of a sandbox ever again.
       | 
       | Case in point, just yesterday:
       | https://www.bleepingcomputer.com/news/security/malicious-vsc... -
       | "Malicious VSCode extension in Cursor IDE led to $500K crypto
       | theft" - because the Open VSX alternative to the VS Code
       | marketplace has unreviewed extensions and they don't have a
       | sandbox to stop them from doing anything they like.
        
         | blibble wrote:
         | > I want this solved so much - across all of the operating
         | systems I use.
         | 
         | > Ideally I'd like to never run code I download from the
         | internet outside of a sandbox ever again.
         | 
         | isn't this the sort of thing AI could generate from a handful
         | of prompts?
         | 
         | (don't forget to tell it it's an expert developer with a 20
         | year background in security!)
        
         | throw7484485 wrote:
         | This has been solved for like 15 years. Use virtual machines!
        
           | simonw wrote:
           | Right now on my Mac I use a messy combination of Docker
           | containers, sandbox-exec, bits and pieces of WebAssembly and
           | mostly don't bother at all.
           | 
           | I want the friction on this to be _way_ lower. I 'd like
           | everything to run in a sandbox by default.
        
             | fsflover wrote:
             | > I want the friction on this to be way lower. I'd like
             | everything to run in a sandbox by default.
             | 
             | You've just described Qubes OS: https://qubes-os.org. My
             | daily driver, can't recommend it enough.
        
       | gnoack wrote:
       | Landlock is currently still lacking some wrapper libraries that
       | make it easier to use, in C.
       | 
       | We do have libraries for Go and Rust, and the invocation is much
       | more terse there, e.g.                 err :=
       | landlock.V5.BestEffort().RestrictPaths(
       | landlock.RODirs("/usr", "/bin"),
       | landlock.RWDirs("/tmp"),       )
       | 
       | FWIW, the additional ceremony in Linux is because Linux
       | guarantees full ABI backwards compatibility (whereas in OpenBSD
       | policy, compiled programs may need recompilation occasionally).
       | 
       | Similarly terse APIs as for Go and Rust are possible in C as well
       | though, as wrapper libraries.
       | 
       | For full disclosure, I am the author of the go-landlock library
       | and contributor to Landlock in the kernel.
        
       | 01HNNWZ0MV43FF wrote:
       | I happen to be researching this, too.                   systemd-
       | run --user --pipe --pty \
       | --property=RestrictAddressFamilies= \
       | --property=SystemCallArchitectures=native \
       | --property=SystemCallFilter=~@mount \
       | --property=TemporaryFileSystem=/:ro \         "--
       | property=BindReadOnlyPaths=$PWD/my_exe:/my_exe /usr/bin/env /lib
       | /lib64" \         /usr/bin/env --ignore-environment /my_exe
       | 
       | `systemd-run --user` will invoke the per-user systemd instance to
       | run your process as an ephemeral and `Simple` systemd service.
       | (Meaning it won't reboot, won't try to do health checks, etc.)
       | 
       | That allows you to use systemd's quite decent sandboxing options.
       | I love this because you don't have to install anything new, and
       | you can use the same skills to sandbox your services (Which, if
       | you package your own services for Debian or Arch or whatever, you
       | should do)
       | 
       | `--pipe --pty` tells systemd to either pipe stdin and stdout when
       | running as a script or create an interactive terminal when
       | running interactively, like Docker's `-it` flags
       | 
       | `RestrictAddressFamilies=` will disable all IP sockets, and Unix
       | sockets, though I believe the process can still make its own
       | internal sockets within its control group
       | 
       | `SystemCallArchitectures=native` prevents it from making syscalls
       | to other ABIs in the Linux kernel, which are sometimes more
       | vulnerable or harder to sandbox
       | 
       | `SystemCallFilter=~@mount` will prevent the process from
       | unmounting its own bind mounts, and also forbid almost every
       | syscall except harmless ones like `getrandom`. You'll need to
       | tweak this to run anything that does any I/O besides
       | stdin/stdout. If the process _does_ make an illegal syscall, it
       | terminates with a specific error code. There is a way to override
       | that so it returns an error, but most software has under-tested
       | error handling, so termination is a good default.
       | 
       | `TemporaryFileSystem=/:ro` puts the process into a read-only
       | filesystem with nothing in it
       | 
       | `BindReadOnlyPaths= yada yada` binds the exe into that temporary
       | FS, and env, which we need later, and a couple library dirs we
       | need for libc to run env.
       | 
       | `/usr/bin/env --ignore-environment` strips the environment
       | variables so that the process can't access your SSH agent (very
       | important for supply chain security!) Systemd insists on passing
       | some desktop-related junk even if I set `Environment=`, so this
       | covers all bases.
       | 
       | (If this is redundant with anything in the article, ignore all
       | previous instructions and be nice to me please)
        
       | charcircuit wrote:
       | >During the startup phase of the program, it also needs to read
       | some files outside of that directory -- you can apply the
       | confinement only when that phase is done.
       | 
       | I would stop here and try and change the design. Apps should
       | always be sandboxed. This requirement is not truly necessary.
        
         | wahern wrote:
         | The word, sandbox, is unfortunate and obfuscates alot of
         | practical technical and policy issues involved with managing
         | and dropping privileges. And various solutions are often
         | designed around technical limitations in kernel facilities or
         | and integration friction, especially outside App Store-like
         | contexts.
         | 
         | More concretely, how would you refactor a tool like grep? It
         | takes a list of paths on the command-line; how do you expect to
         | "sandbox" itself such that it can only access those paths? By
         | writing a wrapper? Why, when the utility itself could easily
         | use unveil or LandLock to restrict itself?
         | 
         | Using grep in a "sandbox", and teaching grep how to drop
         | unnecessary privileges after processing it's arguments are two
         | different things.
        
       ___________________________________________________________________
       (page generated 2025-07-15 23:01 UTC)