[HN Gopher] File Permissions: A painful side of Docker (2019)
___________________________________________________________________
File Permissions: A painful side of Docker (2019)
Author : zdw
Score : 104 points
Date : 2021-05-28 04:04 UTC (3 days ago)
(HTM) web link (blog.gougousis.net)
(TXT) w3m dump (blog.gougousis.net)
| addingnumbers wrote:
| This doesn't even really seem like a problem that docker
| introduced. All these problems have been encountered by anyone
| running an NFS server, or a dozen other ways you can have systems
| with disparate uid/gid mappings using a shared or removable file
| system
| mschuster91 wrote:
| > First of all, security issues may rise in a production system.
| If a container is compromised and the container is executed as
| root (uid = 0), then the intruder has access to any file of the
| host filesystem that has been loaded to the container filesystem
| through a mount. The owner UID of files that belong to the host
| root will be 0 in the container. So, they will be accessible to
| the intruder.
|
| Use supervisord to coordinate the processes inside your Docker
| container, as easy as that. Bonus point, you don't need to
| wrangle with properly handling "docker stop"/ctrl+c.
| throayobviousl wrote:
| Isn't this a bit of an anti-pattern? There really are very few
| situations in which you should be mounting things in production.
| Apache/PHP/etc is definitely not one of those situations.
| q3k wrote:
| I would absolutely say it's a production antipattern to run a
| container with access to some already existing host files
| belonging to some other user.
|
| However, this is something that's basically unavoidable if
| you're attempting to use OCI/Docker for dev where you access a
| developer's source code checkout from a container running a
| standardized language runtime. And that's what a lot of people
| use OCI/Docker for...
| dividedbyzero wrote:
| Couldn't you run into this issue when mounting device files?
| I believe doing that for accessing external hardware or
| sensors is not all that uncommon.
| q3k wrote:
| Sure, that's one of the cases when this might be needed in
| prod (although in the parent post I meant only access to
| honest-to-god data files, not things like bindmounting
| /dev).
|
| In practice bindmount smell can also be somewhat alleviated
| by using things like k8s device plugins to request things
| at a higher level ('I want GPU access' vs. 'please
| bindmount /dev/drm... and use the proper modes'). It's
| still effectively a bindmount, but some extra security
| precautions can be made to ensure exclusive access and that
| no arbitrary mounts from the host are permitted. And things
| like k8s device plugins can also poke at file modes and
| other namespace magic at runtime so that the end user never
| has to worry about things like UID/GID and chardev modes.
| That IMO prevents the smell associated with random host
| bindmounts.
| dividedbyzero wrote:
| I wasn't aware of k8s device plugins, that seems like it
| would help with that, if k8s is an option. Thanks for the
| pointer!
| q3k wrote:
| You're welcome :).
|
| They're also very easy to write, so if you ever happen to
| run k8s and need to give workloads access to some
| odd/custom host hardware, implementing a proper plugin
| for it is quite painless and gives much better guarantees
| than plain bindmounts.
| nicce wrote:
| Every additional mount can be considered an extra design failure
| in terms of security, or just laziness. They all increase the
| attack surface. Even though containers are not designed for
| isolation, every mount and volume is one step closer to breaking
| that isolation. Of course, the total risk depends on where you
| are mounting from.
| gourlaysama wrote:
| There is a new mount syscall in Linux 5.12, see "ID remapping in
| mounts" [1], that should help with all the permission madness,
| eventually.
|
| It allows different mounts to expose the same content with
| different ownership, and in general to map permissions IDs
| between mounts in any way we like.
|
| systemd-homed will use that to abstract over the uids and gids of
| portable home directories, for example.
|
| [1]: https://lwn.net/Articles/837566/
| xorcist wrote:
| Using uid 0 in containers is asking for trouble. Any privileged
| resources (such as low ports) can be mapped in without messing
| with capabilities so there should be no need for it.
| hda111 wrote:
| The port mapping is done by the container engine, not the
| container. Using low ports is allowed if the engine runs as
| root. Moreover I think it's acceptable to use uid 0 inside a
| rootless container like podman since it's by default only
| mapped to the user running it.
| 0xbadcafebee wrote:
| AWS Fargate won't let you remap ports. Whatever the container
| exposes, that's the port it's going to listen on. To work
| around this and other problems, I ended up making fat
| containers that start as root, and add entrypoints that can
| either run a process as root (to listen on low ports) or sudo
| to a user to drop perms before starting a process (to listen
| on high ports).
|
| There's also weird junk you sometimes need to do in order to
| capture file handles depending on how a container engine is
| running the container, which you need to do before you fork
| or drop privs. But it took me years to finally run into that
| use case, most people will never need to do this.
| choeger wrote:
| No one mentions how podman solves that problem with user id
| mapping?
| hda111 wrote:
| I mentioned this in another comment here.
| dividedbyzero wrote:
| Would you mind elaborating?
| choeger wrote:
| I don't have the time to write elaborate comments right now,
| but see here:
|
| http://docs.podman.io/en/latest/markdown/podman-run.1.html
|
| Especially the "userns" option with the "keep-id" value.
| FooBarWidget wrote:
| I blogged about this same problem a month ago.
|
| "Docker and the host filesystem owner matching problem":
| https://www.joyfulbikeshedding.com/blog/2021-03-15-docker-an...
|
| In my blog post I lay out 2 solution strategies, how one might go
| about implementing them, and caveats to watch out for.
|
| 1. Matching the container's UID/GID with the host's UID/GID.
|
| 2. Remounting the host path in the container using BindFS.
| 988747 wrote:
| I think all those problems disappear when you run containers with
| proper orchestration tools, such as Kubernetes.
|
| And not only that, I think that examples given in the article
| ("Assume that your Apache/PHP container is mounting the host's
| /home/alexandros/myapp/ application directory to the container's
| /var/www/html directory.") are in fact anti-patterns. If your
| container depends on a specific file being available at a specific
| location on the host then you're doing it wrong. The only place
| where that makes sense is in a developer's local environment. In
| shared environments you want something like a Kubernetes ConfigMap
| to contain config files, and dedicated persistent volumes for
| everything else.
| 0xbadcafebee wrote:
| The orchestration tool does not provide any additional
| functionality to fix this problem, it's up to the container
| execution environment, and today's container execution
| environments have no way (that I am aware of) to natively map
| file permissions outside of the container.
|
| It could be I just haven't dug enough into the kernel
| internals, maybe there is a transparent permissions remapping
| thing. But something would absolutely have to map permissions.
| Otherwise there is no way to use filesystem ownership between
| execution environments without them using conflicting UID/GIDs,
| to say nothing of changing the file perms.
| hda111 wrote:
| In Podman this is a solved problem: podman run --userns=keep-id
| Honiix wrote:
| also `podman unshare` is really helpful
| voidfunc wrote:
| I've always solved this by just having a proxy script that creates
| a user when the container starts with the right UID/GID then
| executes the given command.
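|
| A minimal sketch of such a proxy script, written in Python for
| illustration. The HOST_UID / HOST_GID variable names and the "app"
| user name are assumptions, not something from this thread, and it
| assumes the image ships shadow-utils (groupadd/useradd) and that
| the container starts as root.

```python
import os
import subprocess
import sys


def parse_ids(environ):
    """Read the target IDs from HOST_UID / HOST_GID (hypothetical names)."""
    uid = int(environ.get("HOST_UID", "1000"))
    gid = int(environ.get("HOST_GID", str(uid)))
    return uid, gid


def user_setup_commands(uid, gid, name="app"):
    """Return the groupadd/useradd invocations that create a matching user."""
    return [
        ["groupadd", "--gid", str(gid), name],
        ["useradd", "--uid", str(uid), "--gid", str(gid), "--create-home", name],
    ]


def main(argv, name="app"):
    """Create the user, drop root privileges, then exec the given command."""
    uid, gid = parse_ids(os.environ)
    for cmd in user_setup_commands(uid, gid, name):
        subprocess.run(cmd, check=True)
    os.environ["HOME"] = f"/home/{name}"
    # Drop supplementary groups and the group ID before the user ID;
    # after setuid() the process no longer has permission to change them.
    os.setgroups([gid])
    os.setgid(gid)
    os.setuid(uid)
    os.execvp(argv[0], argv)


# Only act when a command was actually passed to the entrypoint.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1:])
```

| The container would then be started with something like
| -e HOST_UID=$(id -u) -e HOST_GID=$(id -g), with this script as the
| image's entrypoint.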
| dandarie wrote:
| > The problem with this approach is that is not portable. What if
| I am developing using more than one computers where in each
| computer my user has different ID?
|
| Make the build script use local $USERID and $GROUPID as args
| during the build process.
|
| In docker-compose.yml (or, if using docker directly, using
| --build-arg):
|
|     build:
|       context: ./build
|       args:
|         USERID: ${USERID}
|         GROUPID: ${GROUPID}
|
| So you're passing the local uid and gid as variables to the build
| process.(1)
|
| In build/Dockerfile:
|
|     FROM image:tag
|     WORKDIR "/application"
|     ARG USERID
|     ARG GROUPID
|     RUN if [ ${USERID:-0} -ne 0 ] && [ ${GROUPID:-0} -ne 0 ]; then userdel -f www-data ;fi \
|      && if getent group ${GROUPID} ; then groupdel www-data; fi \
|      && groupadd -g ${GROUPID} www-data && useradd -m -l -u ${USERID} -g www-data www-data -s /bin/bash \
|
| (1) $USERID and $GROUPID might not be available as environment
| variables on your system. To make them available, place this in
| .bashrc:
|
|     export USERID=$(id -u)
|     export GROUPID=$(id -g)
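|
| A hedged sketch of a docker-compose wrapper that fills in USERID
| and GROUPID itself, so nothing has to go into .bashrc. It is
| written in Python for illustration and assumes the binary is
| called docker-compose and is on PATH.

```python
import os
import subprocess
import sys


def compose_env(base_env=None):
    """Return an environment with USERID/GROUPID set to the caller's IDs.

    Values already present in the environment are left untouched.
    """
    env = dict(os.environ if base_env is None else base_env)
    env.setdefault("USERID", str(os.getuid()))
    env.setdefault("GROUPID", str(os.getgid()))
    return env


def main(argv):
    """Forward all arguments to docker-compose with the extra variables."""
    return subprocess.call(["docker-compose", *argv], env=compose_env())


# Only run docker-compose when arguments were passed to the wrapper.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

| Saved as, say, compose.py (a made-up name), it would be called as
| python3 compose.py build or python3 compose.py up -d.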
| q3k wrote:
| But that doesn't solve the problem, just works around it:
|
| 1. Images are still pre-baked with a given UID/GID pair, so you
| can't distribute them as something universal and reusable.
|
| 2. This requires workarounds / extra steps on a local
| workstation, so it doesn't work for everyone unless they follow
| a given project's unique quirks setup.
|
| Shell/compose duct tape like this doesn't make for a great
| experience, this really should be solved by upstream projects
| themselves as it's an extremely common issue when attempting to
| use Docker.
| dandarie wrote:
| 1. Nope, they are not pre-baked. They are built at runtime
| from env vars on each machine. 2. One step, setting up two
| vars. They can be set by a build script. Lots of things have
| build scripts way more complicated than this.
|
| The only tedious thing is you have to adapt this for every
| image type you run.
| momothereal wrote:
| If you have to build it on each machine, I would not
| consider that easily/universally distributable. One of the
| key points of Docker is you can build once (in your CI or
| someone else's) and run it on any machine. I think that was
| GP's point.
| woodrowbarlow wrote:
| but that _requires_ you to build-at-runtime, which is
| sometimes not the best way to deploy a docker app. if you
| have one app that you want to run on many nodes, you'll
| want to set up a docker registry and have the nodes pull
| pre-built images.
| dandarie wrote:
| Of course, but really only build once on every machine.
| The subsequent starts use the cached build, even after
| reboot.
|
| In fact, docker-compose up -d takes care of the build
| thing by itself. It's a five second tradeoff for the
| lifetime of the application.
| lukeck wrote:
| For anyone that uses immutable infrastructure, where a server's
| configuration is never changed once built and subsequent
| deployments result in replacement with entirely new VMs,
| building once per machine still happens every time there
| is a deployment. You don't ever reboot these machines.
|
| In environments where vulnerability scanning of docker
| images used is important, running anything in production
| that isn't stored in a docker registry kind of breaks
| things.
|
| This approach also won't work with container
| orchestrators like Kubernetes, ECS, Lambda, CloudRun,
| etc.
|
| Where I can see a docker build of a small layer that just
| sets file perms potentially being useful is for
| container-based dev environments run on laptops and
| workstations.
| oauea wrote:
| Sure, great, let me just rebuild all my docker images on
| every single machine they run on thereby completely
| defeating the point of having images in the first place.
| dandarie wrote:
| You start from a base image of your choice. You only
| build the user replacement part.
|
| You run docker-compose build ONCE and you're set. On my
| machine, it takes five seconds.
|
| Heck, you can even run docker-compose build every time you
| start the application, it will use the cached build and
| take less than one second.
|
| ---
|
| Correction: the docker-compose up -d takes care of the
| build process the first time it runs.
|
| Literally, it takes longer to complain about the issue than to
| build the image ONCE.
| q3k wrote:
| > The only tedious thing is you have to adapt this for
| every image type you run.
|
| The tedious thing is that this escalates into complexity
| whenever you have to deal with K developers using M
| projects developed by N teams each using a different way to
| handle this:
|
| Do I need to set USERID for project foo, or UID? Does it
| default to 1000 or the author's UID? Oh, someone has a
| problem with our project, did they remember to set
| COMPANY_USERID in their bashrc? Oh, wait, they're using
| zsh, how do you do that there? Oh, but they followed this
| other project's readme and that set COMPANY_USERID but not
| COMPANY_GROUPID...
|
| Docker is supposed to simplify this by unification and a
| limited API surface, and applying hacks like this on top
| kind of kills that whole premise.
| dandarie wrote:
| > Do I need to set USERID for project foo, or UID? Does
| it default to 1000 or the author's UID? Oh, someone has a
| problem with our project, did they remember to set
| COMPANY_USERID in their bashrc? Oh, wait, they're using
| zsh, how do you do that there? Oh, but they followed this
| other project's readme and that set COMPANY_USERID but
| not COMPANY_GROUPID...
|
| You set it to the output of id -u and id -g. It's two
| lines. There are definitely lots of things more complex
| when dealing with docker than this.
|
| You provide the team with a script containing those two
| lines and a docker-compose wrapper and you're set.
|
| Of course it would have been better not to have to care
| about these things, but hey, at least you're not
| installing and configuring 4-5 services to bootstrap an
| application.
| rad_gruchalski wrote:
| It's a feature for a multi-tenant deployment if you use user
| remaps. Maybe you only allow specific tenant containers with
| tenant specific uid/gid.
| VLM wrote:
| > more than one computers where in each computer my user has
| different ID
|
| Decades of network filesystem users have had many solutions to
| that.
| Joker_vD wrote:
| I can think of basically two solutions:
|
| 1) pass user/group names around and resolve them at the
| destination to UID/GID; 2) ignore them entirely; assign
| ownership of all newly created files to the currently
| authenticated user (if authorized).
|
| Are there other ones?
| fiddlerwoaroof wrote:
| 3) treat a machine-id/user-id pair as the "real userid" 4)
| add a remote->local userid mapping feature to your
| filesystem.
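|
| Option 4 can be sketched as a range map. The (inside, outside,
| length) triple below is the same shape as the entries in a Linux
| user namespace's /proc/<pid>/uid_map; the class and function names
| are made up for illustration.

```python
from typing import NamedTuple, Optional, Sequence


class IdRange(NamedTuple):
    inside: int   # first ID as seen on the remote / container side
    outside: int  # first ID on the local / host side
    length: int   # number of consecutive IDs covered by this range


def map_id(ranges: Sequence[IdRange], inside_id: int) -> Optional[int]:
    """Translate a remote ID to a local one, or None if it is unmapped."""
    for r in ranges:
        if r.inside <= inside_id < r.inside + r.length:
            return r.outside + (inside_id - r.inside)
    return None


# Example map: remote root becomes local user 1000, and remote IDs
# 1..65536 land in a subordinate range starting at 100000.
EXAMPLE_MAP = [IdRange(0, 1000, 1), IdRange(1, 100000, 65536)]
```

| With EXAMPLE_MAP, map_id(EXAMPLE_MAP, 0) gives 1000 and
| map_id(EXAMPLE_MAP, 33) gives 100032; an ID outside every range
| gives None, which a filesystem would typically present as an
| overflow or "nobody" owner.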
| encryptluks2 wrote:
| Containers are ideally meant for a single service. The best way
| I've found is to just pass the `--user` flag to `docker run`
| and have the service run as whatever user it is that you want.
| The only challenge is that you need to make sure that the
| volume mounts are already created on the host with the correct
| permissions.
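|
| A small sketch of that approach, in Python for illustration: create
| the host directory first (so it is owned by the invoking user), then
| build a docker run command line with --user set to the same
| uid:gid. The function names, image name and paths are placeholders.

```python
import os


def prepare_host_dir(host_dir):
    """Create the bind-mount source up front so the caller owns it."""
    os.makedirs(host_dir, exist_ok=True)
    return os.path.abspath(host_dir)


def docker_run_args(image, host_dir, container_dir, command=(), uid=None, gid=None):
    """Build a `docker run` argv that runs as the given (or current) user."""
    uid = os.getuid() if uid is None else uid
    gid = os.getgid() if gid is None else gid
    return [
        "docker", "run", "--rm",
        "--user", f"{uid}:{gid}",
        "--volume", f"{host_dir}:{container_dir}",
        image, *command,
    ]
```

| The resulting list can be handed to subprocess.run(). If the
| directory is missing, Docker creates it itself when the container
| starts, typically owned by root, which is the situation this avoids.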
| dandarie wrote:
| That runs the container as a given user, but doesn't prevent
| the container running some processes as a different internal
| user.
| professor_v wrote:
| Within docker-compose.yml I use
|
|     services:
|       foo:
|         image: foo/bar:6.9
|         user: ${UID:-1000}:${UID:-1000}
|
| On Linux with Bash it runs with your current user, and on most
| other platforms it runs with id 1000, which is set up as the
| default user in the Dockerfile. This is no problem on macOS or
| Windows because of the way Docker Desktop uses VMs.
|
| ZSH or other shells don't necessarily set $UID, so if you're
| running Linux, not id 1000 and not running Bash you might need
| a little .env file with `UID=1001` in it to make it work. And
| then the user is still nameless in the container. This is kind
| of rare and I only use it for dev containers where most
| relevant files (and permissions) are bind-mounted from the
| host, so it hasn't really been a problem in practice.
|
| Remaps would be cleaner but I find it too much work to explain
| for normal developers just wanting to use a dev container.
| dandarie wrote:
| From my experience, UID is not always available to
| docker-compose.yml because it isn't exported (at least in bash).
|
| See more here: https://stackoverflow.com/a/50900530/15428104
|
|     $ declare -p UID
|     declare -ir UID="1000"
|
| The -x option is missing.
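|
| To make the consequence concrete, here is a rough Python emulation
| of how ${UID:-1000} resolves from a process environment. It
| deliberately ignores any .env file, and the function names are made
| up. Because bash keeps UID as an unexported shell variable, a child
| process sees no UID at all and the default wins.

```python
import os


def resolve_with_default(environ, name, default):
    """Emulate ${NAME:-default}: use default when NAME is unset or empty."""
    value = environ.get(name)
    return value if value else default


def compose_user(environ=None):
    """Value that `user: ${UID:-1000}:${UID:-1000}` would resolve to."""
    environ = os.environ if environ is None else environ
    uid = resolve_with_default(environ, "UID", "1000")
    return f"{uid}:{uid}"
```

| compose_user({}) yields "1000:1000" regardless of who is logged in,
| while compose_user({"UID": "1001"}) yields "1001:1001", which is
| what the .env file workaround achieves.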
| StavrosK wrote:
| This has been a major Docker pain point, and not many people
| know about this trick. I didn't know you could have the
| variables in the Compose file directly, does that really work?
|
| Our approach so far was to add yet another layer (a script to
| pass uid/gid to Compose), but if we don't need the script that
| would be fantastic.
|
| EDIT: Ah, I just saw the bashrc wrinkle you mention. Yeah,
| that's why we had the script, and it's a damn shame Docker
| can't do this natively. It has been a _major_ hassle.
| nickjj wrote:
| > I didn't know you could have the variables in the Compose
| file directly, does that really work?
|
| Yep, it's because the build args get read in from a .env file
| by default and then from there Docker Compose sends those
| build args to Docker when it builds the image.
|
| This was one of the topics from my talk at DockerCon last
| week (creating a production ready Docker Compose set up). The
| video and 6,000 word blog post for it will be coming out
| tomorrow. Both things will be added to the talk's reference
| links at
| https://github.com/nickjj/dockercon21-docker-best-practices.
| StavrosK wrote:
| That's interesting, thanks! My shell sets the USER variable
| (but no USERID or GROUPID), which might be good enough for
| all our developers, but probably not reliable enough for a
| general audience.
| nickjj wrote:
| Honestly in practice everything tends to work fine
| without any hacks or extra scripts.
|
| I run all of my containers as a non-root user and create
| the user in the image with its default values of
| 1000:1000 for the uid:gid. I haven't bothered to expose
| the uid:gid as build arguments because it's pretty much
| never an issue in development or production.
|
| With a uid:gid of 1000:1000 built into the image any bind
| mounted files end up being correctly owned by the Docker
| host's user under the following conditions:
|
| - Docker Desktop on macOS
|
| - Docker Desktop on Windows using WSL 1
|
| - Docker Desktop on Windows using WSL 2 and native Linux
| (as long as your dev box's user is set to 1000:1000)
|
| IMO it's really rare that your dev box's user wouldn't be
| 1000:1000 on native Linux or WSL 2.
|
| In production you also have full control over the uid:gid
| of your deploy user.
|
| The only time where it kind of stinks is CI, but it's
| super easy to get around this by simply not using volumes
| in CI.
|
| I have a bunch of examples of this pattern at:
| - https://github.com/nickjj/docker-flask-example
| - https://github.com/nickjj/docker-django-example
| - https://github.com/nickjj/docker-rails-example
| - https://github.com/nickjj/docker-phoenix-example
| - https://github.com/nickjj/docker-node-example
| - https://github.com/oleksandra-holovina/docker-play-example
| q3k wrote:
| > IMO it's really rare that your dev box's user wouldn't
| be 1000:1000 on native Linux or WSL 2.
|
| Any company-wide (GNU/)Linux deployment that uses LDAP or
| some other centralized user directory will not have devs
| with UID/GID 1000:1000. Hope is not a strategy.
| nickjj wrote:
| > Any company-wide (GNU/)Linux deployment that uses
| LDAP...
|
| You can go the extra mile and turn the UID:GID into build
| args like the original parent and you're good to go. No
| hacks necessary, and since it's all self contained into a
| .env file there's nothing extra you need to run since
| you're likely using an .env file already for other vars.
|
| Alternatively you could do this:
| https://news.ycombinator.com/item?id=27344491
|
| In either case you can solve the problem without too much
| effort.
| q3k wrote:
| > You can go the extra mile and turn the UID:GID into
| build args like the original parent and you're good to
| go.
|
| That doesn't help you if you're attempting to use pre-
| built/existing Docker images that are not built
| internally and make the assumption that "1000:1000 is
| good enough". You then not only have to hack around
| Docker limitations, but also around someone else's broken
| assumption.
| nickjj wrote:
| > That doesn't help you if you're attempting to use pre-
| built/existing Docker images that are not built
| internally
|
| Most pre-built images that I've come across don't require
| bind mounts to function.
|
| Images like PostgreSQL aren't affected by this because
| you can use a named volume, and most pre-built
| applications that are shipped as images tend to store
| their state in a database and don't require bind mounts
| to function.
| dilatedmind wrote:
| maybe i did something weird last time i installed ubuntu,
| but my user is 1001:1002 and the default ubuntu user is
| 1000:1001
| 1_player wrote:
| IIRC on Arch, unless you create your own group, you're
| part of the users group, with GID 100
| mschuster91 wrote:
| > IMO it's really rare that your dev box's user wouldn't
| be 1000:1000 on native Linux or WSL 2.
|
| Any major company using LDAP/AD or other forms of
| centralized user management won't be able to make that
| guarantee.
|
| > In production you also have full control over the
| uid:gid of your deploy user.
|
| If you're running in an un-managed environment, yes -
| managed hosting of any kind generally doesn't provide
| these guarantees.
| staticassertion wrote:
| It makes sense that mounting a volume requires understanding a
| user mapping tbh. I think the answer is twofold:
|
| a) Many problems solvable with a volume can be solved with a
| bind-mount, cache-mount, etc [0].
|
| b) In the event that you actually need to map in a user-file,
| wrap the docker command in a script that manages the logic. At
| this point you're writing a system tool that's doing things
| outside of the context of a container - it's not really docker's
| fault that it doesn't try to make this trivial.
|
| [0] https://vsupalov.com/buildkit-cache-mount-dockerfile/
| viraptor wrote:
| > If this user is the "root", then these files will not be
| accessible from web server or the CGI server, except if the
| server is running as root
|
| Wait, what? Why not install the immutable files as root and let
| them be readable to everyone?
| prpl wrote:
| This is something CharlieCloud was built around for HPC and
| something podman can work around. User namespaces and fuse-
| overlayfs are the building blocks to fix this
| tacone wrote:
| Shameless plug: a boilerplate where I had to solve UID
| permissions, running as non-root user, publishing files to
| another container, mounting fs as read only, and hot reloading in
| dev environment.
|
| It's still pretty much a proof of concept and it relies on docker
| compose but perhaps some of you may find it useful as a starting
| point: https://github.com/tacone/loki
| epage wrote:
| Recently ran into this. So far I've landed on `setfacl`
|
| - `--user` didn't work for me because there were root permissions
| in my image
|
| - I didn't dig into why `userns-remap` didn't work
|
| - I didn't give https://github.com/boxboat/fixuid a try yet
|
| Some notes from my experience:
|
|     setfacl -dm "u:alexandros:rw" ~/alpine
|
| should be
|
|     setfacl -R -dm "u:alexandros:rwx" ~/alpine
|
| In case:
|
| - `-R`: There is existing content in `~/alpine` you want made
| available
|
| - `x`: You want your container to be able to create directories
|
| However, you can still run into problems if
|
| - Your container copies data from outside your bind-mount to
| inside. It sort-of worked except somehow the mask was `r--`,
| making things lose write access.
|
| - Your container moves data from outside your bind-mount to
| inside. This fully preserves the permissions.
|
| I ended up creating a `.keep` file in the bind mount and doing a
| `cp --attributes-only --preserve=mode,ownership,xattr .keep
| <target>`
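|
| For scripts that can't shell out to cp, a partial Python sketch of
| the same idea: copy the mode and ownership of a reference file onto
| a target. Unlike the cp call above it does not copy xattrs, so ACLs
| are not carried over. The function names are made up.

```python
import os
import stat
import tempfile


def copy_mode_and_owner(reference, target):
    """Apply the reference file's permission bits and owner to target."""
    ref = os.stat(reference)
    os.chmod(target, stat.S_IMODE(ref.st_mode))
    try:
        os.chown(target, ref.st_uid, ref.st_gid)
    except PermissionError:
        # Changing the owner usually needs root; the mode was still applied.
        pass


def demo():
    """Show the effect on two throwaway files; returns the new mode."""
    with tempfile.TemporaryDirectory() as d:
        keep = os.path.join(d, ".keep")
        new = os.path.join(d, "copied.txt")
        for path in (keep, new):
            open(path, "w").close()
        os.chmod(keep, 0o664)
        os.chmod(new, 0o400)
        copy_mode_and_owner(keep, new)
        return stat.S_IMODE(os.stat(new).st_mode)
```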
| hasheddan wrote:
| Related to this post, a recent runc version included a change
| that inadvertently made a number of images built on the
| distroless base image difficult to use:
| https://danielmangum.com/posts/runc-chdir-to-cwd/
___________________________________________________________________
(page generated 2021-05-31 23:01 UTC)