[HN Gopher] Systemd service sandboxing and security hardening (2...
___________________________________________________________________
Systemd service sandboxing and security hardening (2020)
Author : capableweb
Score : 242 points
Date : 2022-01-18 10:31 UTC (1 day ago)
(HTM) web link (www.ctrl.blog)
(TXT) w3m dump (www.ctrl.blog)
| the8472 wrote:
| Alas, no whitelisting option. A service should start in an empty
| filesystem root without network access - and if we had something
| as convenient as pledge() also without any allowed syscalls - and
| then you could only add what is needed.
|
| firejail does this a bit better but it also started out with a
| blacklist approach and it's more geared towards desktop
| application use, not system services.
| Icathian wrote:
| One of my favorite podcasts, Risky Business[0] regularly plugs
| Airlock[1]. They seem like they might be the one out front, at
| least as a paid service.
|
| [0] https://risky.biz/netcasts/risky-business/
| [1] https://www.airlockdigital.com/
| 5e92cb50239222b wrote:
| What's the problem with firejail? Start with an empty profile,
| blacklist everything, and whitelist only the stuff you need. It
| works just fine for server applications, and unlike systemd
| isolation flags you can set up a proper separate firewall with
| the `netfilter` option.
| the8472 wrote:
| > blacklist everything,
|
| That isn't a whitelist approach.
| goodpoint wrote:
| That is exactly an allowlist approach.
| Someone wrote:
| > blacklist everything, and whitelist only the stuff you
| need
|
| That _is_ a whitelist approach.
| Seirdy wrote:
| Firejail has had multiple sandbox escape vulns in the past.
| Firejail is an SUID executable in which sandbox escapes can
| lead to privilege escalation. In contrast, Systemd allows you
| to run services as unprivileged users, and even create users
| on demand.
|
| Systemd also supports firewalling: it supports IP address
| allow/deny policies, ports, etc. For more advanced firewall
| policies you're probably better off using an actual firewall
| daemon like firewalld or ufw.
| Someone wrote:
| pledge is excellent, but it protects programmers against
| writing security bugs that have large impact, it doesn't
| protect you against the software they write. It's those
| programmers who restrict what their tools can do, and who
| decide when to throw the switch to enable those restrictions.
|
| If you trust those programmers, it's indeed way more convenient
| than other tools, if only because it removes the need for
| configuring things twice. For example, instead of configuring
| your web server to serve files from _/foo/bar/_ _and_ telling
| SELinux that your web server is allowed to read from
| _/foo/bar_, you only configure the web server, and it will tell
| the OS "I shouldn't read from anything but _/foo/bar_,
| starting ... now".
|
| You'll have to trust the web server to do that, though.
| the8472 wrote:
| That's what it is intended for. But pledge has nice
| properties beyond that which are also useful for external
| sandboxing. Such as defining easy to understand syscall
| groups maintained by the kernel as new syscalls are
| introduced. If Linux had that we could, for example, grant
| stdio+rpath and not worry about the kernel introducing
| preadv3 and programs compiled with it breaking or getting
| suboptimal performance when isolated. It would also
| automatically apply to equivalent io_uring implementations,
| blocking the equivalent SQEs too.
| ape4 wrote:
| Apache needs to start as `root` but then drops to a
| non-privileged user. systemd's `User=<user>` can't really express
| that. Perhaps an option that says a unit needs to be root until
| the first fork when it has to be a specified user.
| `ForkUser=apache`
| staticassertion wrote:
| This is one of the main problems with "whole program
| sandboxes". Many times a program only needs permissions right
| at the start and then never again. From the outside though
| there's no way to signal "OK, I'm done, lock me down" for most
| sandboxing systems.
|
| One approach that _may_ work with systemd is to have two
| processes. One would be a broker, running as root. It would
| grab a port, for example. The other process would be spawned by
| the broker as a limited service and inherit that port from the
| parent, with no permissions of its own to open it, only to
| inherit.
|
| IDK how to express that in systemd-land though. At that point
| you might be better off just writing the code to sandbox things
| yourself.
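
A minimal Python sketch of the broker idea described above: a parent binds a listening socket and hands only that descriptor to a child process. The names `spawn_with_listener` and `CHILD` are made up for illustration. The sketch binds an unprivileged port and skips the privilege drop so that it runs without root; a real broker would presumably start as root and drop privileges for the child (for example with the `user=` argument of `subprocess.run`, available in Python 3.9+).

```python
import socket
import subprocess
import sys

# Hypothetical child program: it never calls bind(), it only wraps the
# descriptor number it was given and reports the port it is attached to.
CHILD = (
    "import socket, sys\n"
    "s = socket.socket(fileno=int(sys.argv[1]))\n"
    "print(s.getsockname()[1])\n"
)


def spawn_with_listener(port=0):
    """Bind in the parent, then let a child inherit the listening socket."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("0.0.0.0", port))  # port 0 asks the kernel for any free port
    listener.listen()
    bound_port = listener.getsockname()[1]
    fd = listener.fileno()
    # pass_fds keeps this one descriptor open in the child under the same number.
    result = subprocess.run(
        [sys.executable, "-c", CHILD, str(fd)],
        pass_fds=(fd,),
        capture_output=True,
        text=True,
        check=True,
    )
    listener.close()
    return bound_port, int(result.stdout)
```

The child ends up with a usable listening socket without ever needing permission to bind one itself, which is the "only to inherit" property from the comment.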
| candiddevmike wrote:
| It only needs root to bind to privileged ports I believe.
| You should be able to use a non-root user and give it
| CAP_NET_BIND_SERVICE:
|
|     [Service]
|     AmbientCapabilities=CAP_NET_BIND_SERVICE
| ape4 wrote:
| Cool! But then I suppose the forked processes could then bind
| to a low numbered port - something they can't do now. So
| Apache would have to make sure to revoke that capability when
| forking.
| 5e92cb50239222b wrote:
| You could combine it with something like this:
|     SocketBindDeny=any
|     SocketBindAllow=tcp:80
|     SocketBindAllow=tcp:443
|
| These ports should be denied by the kernel because they're
| already taken by httpd, and all others will be denied by the BPF
| filters installed by systemd.
|
| It feels like plugging holes in a dam, but that's what you
| do with popular operating systems.
| 5e92cb50239222b wrote:
| I don't know about httpd specifically, but many applications
| want root only to be able to bind to a privileged port (like
| :80). This can be circumvented in one of a few ways:
|
| 1. add this to .service:
|     AmbientCapabilities=CAP_NET_BIND_SERVICE
|
| 2. or listen on :8080 and use NAT:
|     iptables -t nat -I OUTPUT -p tcp -o lo --dport 80 -j REDIRECT --to-ports 8080
|
| 3. or make the port unprivileged:
|     sysctl -w net.ipv4.ip_unprivileged_port_start=80
|
| It may work for httpd too, I haven't tested it.
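
To make option 3 concrete, here is a small Python sketch that reads the sysctl and reports whether a given port would still need CAP_NET_BIND_SERVICE. The function names are hypothetical, and the fallback of 1024 assumes the kernel default applies when the sysctl file cannot be read.

```python
from pathlib import Path

SYSCTL_PATH = Path("/proc/sys/net/ipv4/ip_unprivileged_port_start")
DEFAULT_START = 1024  # kernel default when the sysctl has not been changed


def unprivileged_port_start(path=SYSCTL_PATH):
    """Return the first port that can be bound without extra privileges."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        # File missing or unreadable: assume the default.
        return DEFAULT_START


def needs_bind_capability(port, start=None):
    """True if binding `port` requires CAP_NET_BIND_SERVICE (or root)."""
    if start is None:
        start = unprivileged_port_start()
    # Port 0 means "pick any free port" and never needs the capability.
    return 0 < port < start
```

With the default threshold, `needs_bind_capability(80, start=1024)` is true; after the `sysctl -w` command above, the threshold is 80 and the same check is false.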
| Un1corn wrote:
| The correct systemd solution would be to create a socket unit,
| but your solutions work without modifying the service code.
| growse wrote:
| I think this requires support from the service, no?
|
| Not everything that wants to open up a port seems to
| support socket activation. I tried with 6tunnel and
| couldn't get it to work.
| Spivak wrote:
| I can't find any officially supported way for Apache or Nginx
| to do inetd/systemd socket activation, but it certainly would
| be nice.
| marcosdumay wrote:
| Apache also uses the start user to read stuff like TLS
| private keys, that its normal user does not have access to.
| ape4 wrote:
| And I think it's common for the log files to be in
| /var/log/httpd owned by root but I suppose they could be
| moved and chown-ed.
| eliaspro wrote:
| Using systemd's LogDirectory= directive will fully take
| care of ensuring the required directory is present and
| permissions match the defined User=/Group= of the unit.
| VTimofeenko wrote:
| It's possible to remove the root requirement for this
| through systemd's credentials mechanisms:
|
| https://www.freedesktop.org/software/systemd/man/systemd.exe...
| eliaspro wrote:
| Many applications don't need to bind the port themselves but
| will happily accept one passed to them during process
| invocation.
|
| This lets systemd manage ports using socket units, which also
| stay up and buffer requests while a service restarts, and allow
| service activation on demand (on incoming requests) or
| per-connection service instances, e.g. for better isolation of
| sshd per connection/user.
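
For a service that wants to accept a passed-in port, the receiving side is small. The Python sketch below follows the environment-variable convention documented for `sd_listen_fds(3)`: `LISTEN_PID` names the intended process, `LISTEN_FDS` gives the count, and descriptors start at 3. The function names are hypothetical, and the sketch ignores `LISTEN_FDNAMES`.

```python
import os
import socket

SD_LISTEN_FDS_START = 3  # first inherited descriptor in the sd_listen_fds convention


def inherited_fds(environ=None, pid=None):
    """Return the descriptor numbers handed over by the service manager."""
    environ = os.environ if environ is None else environ
    pid = os.getpid() if pid is None else pid
    # LISTEN_PID guards against the variables leaking into an unrelated child.
    if environ.get("LISTEN_PID") != str(pid):
        return []
    try:
        count = int(environ.get("LISTEN_FDS", "0"))
    except ValueError:
        return []
    if count < 0:
        return []
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + count))


def listening_sockets():
    """Wrap each inherited descriptor; family and type are detected from the fd."""
    return [socket.socket(fileno=fd) for fd in inherited_fds()]
```

A service would call `listening_sockets()` at startup and fall back to binding its own port only when the list is empty, so the same binary works with and without a socket unit.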
| a-dub wrote:
| can you limit outbound network access to specified
| masks/ports/devices on a per-service level?
| 5e92cb50239222b wrote:
| This is a pretty lax policy IMHO, you can go much farther. These
| days I usually start with this, it's much more strict:
|
| https://news.ycombinator.com/item?id=29976096
|
| Or simply follow whatever `systemd-analyze security` recommends,
| just make sure you run it on a system with recent systemd.
| westurner wrote:
| Which distro has the best out-of-the-box output for:
|     systemd-analyze security
|
| Is there a tool like `audit2allow` for systemd units?
| selinux/python/audit2allow/audit2allow:
| https://github.com/SELinuxProject/selinux/blob/master/python...
|
| https://stopdisablingselinux.com/
| goodpoint wrote:
| Debian does a lot of sandboxing.
| aidenn0 wrote:
| To the point where it breaks logind on NIS setups...
| 5e92cb50239222b wrote:
| > Which distro has the best out-of-the-box output
|
| I haven't seen any difference between distributions with the
| same systemd version. Anything with a recent one should do
| fine. More recent than RHEL8, mind you (which is on systemd
| 239): for example, a syscall allow/deny analysis is buggy
| there and asks you to enable some protections, and then
| disable them. The same unit is analyzed correctly on my
| desktop with v250 (I use the popular rolling release
| distribution).
|
| I haven't seen anything like audit2allow. It's probably not
| especially necessary because of the difference in
| philosophies: SELinux is deny by default, while in systemd
| you're playing whack-a-mole anyway, and are expected to add
| directives one by one until the application stops working.
| Unit logs usually make it obvious if something was denied.
| Arnavion wrote:
| The usual way I've seen (and do myself) is to just let the
| process be killed and have its coredump taken, then
| `coredumpctl gdb $process_name -A '-ex "print $rax" -ex "quit"'`
| to get the syscall number, then check
| `systemd-analyze syscall-filter` for whether I want to allow just
| that one syscall or the whole group it's in.
| growse wrote:
| > The usual way I've seen (and do myself) is to just let
| the process be killed and have its coredump taken, then
| `coredumpctl gdb $process_name -A '-ex "print $rax" -ex "quit"'`
| to get the syscall number, then check
| `systemd-analyze syscall-filter` for whether I want to allow just
| that one syscall or the whole group it's in.
|
| Another approach would be to set SystemCallLog= to be the
| opposite of SystemCallFilter= (negate each group with ~)
| and then you'll see the call (and caller) in the journal.
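
As a rough illustration of that inversion, the Python sketch below derives a `SystemCallLog=` line from a `SystemCallFilter=` value. It relies on the documented rule that a leading `~` inverts the whole list. The helper names are hypothetical, and the sketch assumes a single-line filter without per-call errno suffixes such as `name:EPERM`.

```python
def invert_syscall_list(value):
    """Toggle the leading '~' of a SystemCallFilter=-style list."""
    value = value.strip()
    if not value:
        raise ValueError("empty system call list")
    if value.startswith("~"):
        # Deny-list becomes a plain list of the same entries.
        return value[1:].strip()
    # Allow-list becomes "everything except these entries".
    return "~" + value


def log_directive_for(filter_value):
    """Build the SystemCallLog= line that mirrors a given filter value."""
    return "SystemCallLog=" + invert_syscall_list(filter_value)
```

For a unit with `SystemCallFilter=@system-service`, this yields `SystemCallLog=~@system-service`, meaning every call outside the allowed group is logged.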
| d2wa wrote:
| This is a getting started/101 introduction; it also talks about
| and recommends systemd-analyze security. There's a link to part
| two at the bottom of the article that goes deeper into things.
| DyslexicAtheist wrote:
| any system that starts security by blacklisting instead of
| whitelisting tends to be doomed by upcoming changes.
| egberts1 wrote:
| Who's gonna write THE holy grail of analyzers for executables, to
| determine which Linux capabilities, cgroups, and syscalls are
| just being referenced?
|
| Caveat: it has to dig into ALL the linked libraries as well.
| kenniskrag wrote:
| if you want to test these settings I can recommend
| `sudo systemd-run -p "DynamicUser=yes" -p "ProtectSystem=yes" -p "ProtectHome=yes" --shell`
| but be in a readable directory like /tmp or you'll receive an error.
| 5e92cb50239222b wrote:
| This is a very handy command in day-to-day work, actually. For
| example, I use it to limit the total amount of memory available to
| an application, including page cache:
|     $ systemd-run --user --scope --property=MemoryHigh=1G qbittorrent
|
| It works just as you'd expect -- if qbittorrent's working set
| goes above 1024 MiB, it pushes the least recently used page out
| of the page cache. Doesn't really have any effects on upload or
| download speeds, while helping to keep more useful data in
| memory.
|
| Many isolation flags are not available in `systemd-run --user`,
| though, so if you'd like to have some protection you either
| have to combine `sudo systemd-run` with `su -c`, or wrap the
| command in firejail.
|
| https://github.com/netblue30/firejail/
| wmanley wrote:
| I have a bash alias for `make` and `ninja` to do something
| similar. Just having all the spawned processes in a cgroup
| helps with system interactivity while building. This works
| because the kernel will then schedule the whole build as a
| single unit against the other work on the system, rather than
| scheduling each process that the build spawns against every
| other process that I'm running.
| t0astbread wrote:
| Interesting, a few months ago I tried using systemd-run to
| implement unprivileged memory limits for a process and I'm
| pretty sure it didn't work with the user manager. Is this a
| recent addition? (I'm not sure what version of systemd I had
| at the time.)
| pram wrote:
| Ooh, is this a good way to sandbox execs like ImageMagick or
| stuff like that?
| 5e92cb50239222b wrote:
| Use firejail, it's a "one click" solution with prepackaged
| profiles.
|
| https://github.com/netblue30/firejail/
|
| It uses the same kernel knobs as systemd does, but is more
| user-friendly and has more features.
|
| I use it for every application that handles data received
| from other machines: books, images, documents, whatever.
| YorickPeterse wrote:
| You can also use Bubblewrap, but getting it up and running
| requires a lot more fiddling around. For example, this is
| what I use to isolate Zoom from the rest of my system:
| https://gitlab.com/yorickpeterse/dotfiles/-/blob/0a0492c78b6...
|
| In my case I'm using Bubblewrap because Firejail was only
| used for Zoom, and this felt a bit of a waste considering
| Bubblewrap was already installed.
| max002 wrote:
| Great article :) thank you!
| HowardStark wrote:
| Is there any advice for working with older systemd versions?
| Right off the bat, systemd 237 is out because there is no
| security feature for that version of systemd-analyze.
| 5e92cb50239222b wrote:
| Use the same config you'd use for the latest systemd version.
| It will ignore flags it doesn't know (and warn you in unit
| logs).
| bloopernova wrote:
| Not meant to be a snarky comment, but a serious question: how
| does this differ from SELinux?
| cpuguy83 wrote:
| They are completely different things, and where available
| should be used together.
|
| SELinux is a policy system where policy is enforced via labels.
|
| Labels are applied to processes which classify what the process
| is.
|
| Labels are applied to files, defining which classification of
| process can access the file.
|
| The application of labels happens automatically based on
| policy. Such policy would include the location of the file or
| the label of the parent process.
|
| As an example, the default policy for httpd would prevent httpd
| from accessing /etc/passwd even though the process is running
| as (or can be) the root user. I believe you could also do
| interesting things like prevent httpd from opening a socket on
| a non-standard port if you wanted to.
|
| SELinux is very powerful but complicated. Ideally you use this
| with distro packages which should have policies already
| configured for you.
|
| Critically it is not one vs the other. Use both if you have it.
| tyingq wrote:
| It seems to be using mostly the linux capabilities:
| https://man7.org/linux/man-pages/man7/capabilities.7.html
|
| So the overlap choice seems to be more around SELinux versus
| Capabilities. Where SELinux is more fine-grained and tunable,
| but more complicated also.
| aseipp wrote:
| It's not just Linux capabilities; on their own Linux
| capabilities actually suck majorly and are very limited (AKA
| "crapabilities"). But systemd also makes extensive usage of
| cgroups and namespacing facilities to back it up e.g.
| preventing runaway memory/CPU quotas and stopping
| applications from accessing paths they shouldn't, restricting
| network access, stuff like that. Some of this overlaps with
| SELinux (e.g. restricting file access) but the mechanism is
| fairly different.
|
| The overlap/comparison between capabilities, systemd's
| features, and SELinux features isn't really well defined in
| any meaningful way IMO. It's really like 5 different features
| being used in various ways.
| PeterWhittaker wrote:
| I'm curious what you mean by SELinux features not being
| well-defined? While poorly documented, they are
| extraordinarily precisely defined, allowing fine-grained
| control of pretty much everything, all enforced by the
| kernel with no workarounds, at least in enforcing mode.
| staticassertion wrote:
| It's vastly simpler, for one thing. SELinux is basically a
| weird DSL/programming language for describing system
| interactions whereas systemd is providing a very basic
| interface for common restrictions.
|
| I would pretty much never ask a human being to write SELinux
| policies unless that was explicitly part of their job whereas I
| can pretty much point any developer to what systemd is
| providing and they'll be able to work with it.
| chasil wrote:
| SELinux is designed as "mandatory access control," meaning that
| it is not normally disabled.
|
| The normal filesystem permissions of read/write/execute for
| user/group/other are among those known as "discretionary access
| controls," meaning that they can be relaxed.
|
| The systemd unit security options are discretionary, at the
| control of the administrator.
| t0astbread wrote:
| Is SELinux not also in the administrator's control?
| candiddevmike wrote:
| These days, systemd is better/easier to sandbox _services_ than
| SELinux. SELinux/AppArmor is still the best way to protect
| individual GUI and user apps (anything not run from systemd
| basically).
| mbakke wrote:
| I don't have much experience with SELinux, but at least in my
| org the base policy is to run anything started interactively
| by the user (or root) in _unconfined_t_, i.e. with
| protections disabled.
|
| That is, the same command that gets denied by SELinux through
| systemd will run fine (and unprotected) when started from a
| shell.
|
| Do you write your own policies for individual end-user
| programs?
| p_l wrote:
| Easier, maybe. Better, nope. The breadth and detail available
| just don't compare, and not in the way where systemd can even
| touch the scope available to SElinux
| candiddevmike wrote:
| Can you expand on that? In my opinion, systemd has far more
| controls for process security over SELinux (networking,
| cgroups, nspawn sandboxing, etc).
| p_l wrote:
| Out of those, the only things that aren't covered by
| SELinux are things that would be expected to be set by
| wrapper/launcher process (modifying namespaces - which
| covers nspawn and setting cgroups). Everything else, i.e.
| actual run-time access decisions, is more fine grained
| and controllable through SELinux, including level of
| access control like whether a program can listen on a
| socket or bind a socket, while still permitting it to
| connect.
| mst wrote:
| SElinux is more capable in theory but so much less
| usable/discoverable in practice that I suspect anybody who
| isn't truly dedicated to doing SElinux right will end up
| averaging better security via the systemd route.
|
| (and I say this based on both observation and personal
| experience, I have some stuff to harden later this year and
| I'm really hoping I'll be able to involve somebody who
| -has- that level of SElinux knowledge but plan B is almost
| certainly going to be 'mst does his best with the unit
| configs')
| PeterWhittaker wrote:
| As someone who does a fair amount of SELinux
| professionally, I'd mostly agree with this: getting
| started can be daunting, so one could likely get far more
| value from a short time focusing on systemd security.
|
| But if one can spare the time, SELinux can secure
| everything, not just systemd services.
|
| It all depends on the threat vectors one faces.
| p_l wrote:
| That's why I won't even try to suggest SELinux is
| _easier_. It's definitely easier to apply _some_
| sandboxing through systemd, but it's pretty coarse
| grained and mostly seems to hit some relatively easy wins
| involving capabilities dropping and stuff that is often
| hidden deep inside PAM. Good start, but I wouldn't call
| it "better" ultimately.
| kaba0 wrote:
| Why not use both? They are not mutually exclusive.
| candiddevmike wrote:
| Why would you use SELinux along with systemd? Systemd can
| do filesystem permissions declaratively vs SELinux having
| to label the files individually, e.g.:
|
|     [Service]
|     ProtectSystem=strict
|     ReadWritePaths=/some/path
|     ReadOnlyPaths=/some/otherpath
|     InaccessiblePaths=/etc
| PeterWhittaker wrote:
| One can write extraordinarily short FC files using regexp
| to apply specific SELinux labels as desired, and control
| access to those labels with only a few rules.
|
| Unlike systemd, they then apply to everything.
| loudtieblahblah wrote:
| Does your box still touch local dns before connecting to VPN? No?
|
| Then anything with systemd and security can stuff it.
| getcrunk wrote:
| What?
___________________________________________________________________
(page generated 2022-01-19 23:00 UTC)