[HN Gopher] Optimizing Docker image size and why it matters
___________________________________________________________________
Optimizing Docker image size and why it matters
Author : swazzy
Score : 113 points
Date : 2022-01-06 19:13 UTC (3 hours ago)
(HTM) web link (contains.dev)
(TXT) w3m dump (contains.dev)
| bravetraveler wrote:
| The article doesn't seem to do much... in the 'why'. I'm
| inundated with _how_, though.
|
| I've been on both sides of this argument, and I really think it's
| a case-by-case thing.
|
| A highly compliant environment? As minimal as possible. A
| hobbyist/developer that wants to debug? Go as big of an image as
| you want.
|
| It shouldn't be an expensive operation to update your image base
| and deploy a new one, regardless of size.
|
| Network/resource constraints (should) be becoming less of an
| issue. In a lot of cases, a local registry cache is all you need.
|
| I worry partly about how much time is spent on this quest, or
| secondary effects.
|
| Has the situation with name resolution been dealt with in musl?
|
| For example, something like /etc/hosts overrides not taking
| proper precedence (or working at all). To be sure, that's not a
| great thing to use - but it _does_, and leads to a lot of head
| scratching.
| 3np wrote:
| I mean on one hand, yeah, but comparing Debian (124 MB) with
| Ubuntu (73 MB) shows that with some effort you can eat your
| cake and have it too.
| yjftsjthsd-h wrote:
| > A highly compliant environment? As minimal as possible. A
| hobbyist/developer that wants to debug? Go as big of an image
| as you want.
|
| Hah, I go the other way; at work hardware is cheap and the
| company wants me to ship yesterday, so sure I'll ship the big
| image now and hope to optimize later. At home, I'm on a slow
| internet connection and old hardware and I have no deadlines,
| so I'm going to carefully cut down what I pull and what I
| build.
| no_wizard wrote:
| I like this article, and there is a ton of nuance in the image
| and how you should choose the appropriate one. I also like how
| they cover only copying the files you actually need, particularly
| with things like vendor or node_modules, you might be better off
| just doing a volume mount instead of copying it over to the
| entire image.
|
| The only thing they didn't seem to cover is consider your target.
| My general policy is dev images are almost always going to be
| whatever lets me do one of the following:
|
| - Easily install the tool I need
|
| - All things being equal, if multiple base OSes satisfy the
| above, I go with Alpine, because it's the smallest
|
| One thing I've noticed is simple purpose built images are faster,
| even when there are a lot of them (big docker-compose user myself
| for this reason) rather than stuffing a lot of services inside of
| a single container or even "fewer" containers
|
| EDIT: spelling, nuisance -> nuance
| Sebb767 wrote:
| > I also like how they cover only copying the files you
| actually need, particularly with things like vendor or
| node_modules, you might be better off just doing a volume mount
| instead of copying it over to the entire image.
|
| I'd highly suggest not to do that. If you do this, you directly
| throw away reproducibility, since you can't simply revert back
| to an older image if something stops working - you need to also
| check the node_modules directory. You also can't simply run old
| images or be sure that you have the same setup on your local
| machine as in production, since you also need to copy the
| state. Not to mention problems that might appear when your
| servers have differing versions of the folder or the headaches
| when needing to upgrade it together with your image.
|
| Reducing your image size _is_ important, but this way you'll
| lose a lot of what Docker actually offers. It might make sense
| in some specific cases, but you should be very aware of the
| drawbacks.
| dvtrn wrote:
| _I like this article, and there is a ton of nuisance in the
| image and how you should choose the appropriate one._
|
| By chance, did you mean _nuance_? Because while I can agree
| you can quickly get into some messy weeds optimizing an
| image... hearing someone call it a "nuisance" made me chuckle
| this afternoon.
| no_wizard wrote:
| I did! Edited for clarification, though it definitely can be
| both!
| somehnacct3757 wrote:
| The analyzer product this post is content marketing for looks
| interesting, but I would want to run it locally rather than
| connect my image repo to it.
|
| Am I being paranoid? Is it reasonable to connect my images to a
| random third party service like this?
| adamgordonbell wrote:
| You might not need to care about image size at all if your image
| can be packaged as stargz.
|
| stargz is a gamechanger for startup time.
|
| kubernetes and podman support it, and docker support is likely
| coming. It lazy loads the filesystem on start-up, making network
| requests for things as needed and therefore can often start up
| large images very fast.
|
| Take a look at the startup graph here:
|
| https://github.com/containerd/stargz-snapshotter
| yjftsjthsd-h wrote:
| > 1. Pick an appropriate base image
|
| Starting with: Use the ones that are supposed to be small. Ubuntu
| does this by default, I think, but debian:stable-slim is 30 MB
| (down from the non-slim 52MB), node has slim and alpine tags,
| etc. If you want to do more intensive changes that's fine, but
| start with the nearly-zero-effort one first.
|
| EDIT: Also, where is the author getting these numbers? They've
| got a chart that shows Debian at 124MB, but just clicking that
| link lands you at a page listing it at 52MB.
| alanwreath wrote:
| I always feel helpless with Python containers - it seems there
| aren't many savings ever eked out of multi-stage builds and
| other strategies that are typically suggested. Docker container
| size really has made compiled languages more attractive to me.
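Multi-stage builds can still help Python images when dependencies involve a compiler toolchain (C extensions): build wheels in a full image, then install them into a slim one. A rough sketch only — `requirements.txt` and `main.py` are hypothetical placeholders for your project:

```dockerfile
# Build stage: has gcc and headers, which never reach the final image
FROM python:3.10 AS build
WORKDIR /app
COPY requirements.txt .
# Pre-build wheels so the runtime stage needs no compiler
RUN pip wheel --no-cache-dir --wheel-dir /wheels -r requirements.txt

# Runtime stage: slim base, wheels only
FROM python:3.10-slim
WORKDIR /app
COPY --from=build /wheels /wheels
RUN pip install --no-cache-dir /wheels/* && rm -rf /wheels
COPY . .
CMD ["python", "main.py"]
```

The savings here come from dropping the build toolchain, not from shrinking site-packages itself, which is why pure-Python dependency trees often see little benefit.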
| bingohbangoh wrote:
| For my two cents, if your image requires anything not vanilla,
| you may be better off stomaching the larger Ubuntu image.
|
| Lots of edge cases around specific libraries come up that you
| don't expect. I spent hours tearing my hair out trying to get
| Selenium and python working on an alpine image that worked out-
| of-the-box on the Ubuntu image.
| aledalgrande wrote:
| I would rather install the needed libraries myself and not have
| to deal with tons of security fixes of libraries I don't use.
| CJefferson wrote:
| Do libraries just sitting there on disk do any damage?
|
| Also, are you going to update those libraries as soon as a
| security issue arises? Debian/Ubuntu and friends have teams
| dedicated to that type of thing.
| postalrat wrote:
| Can they be used somehow? Then perhaps.
|
| Depending where you work you might also need to pass some
| sort of imaging scan that will look at the versions of
| everything installed.
| erik_seaberg wrote:
| That's rolling your own distro. We could do that but it's not
| really our job. It also prevents the libraries from being
| shared between images, unless you build one base layer and
| use it for everything in your org (third parties won't).
| curiousgal wrote:
| I mean honestly if you're _that_ paranoid then you shouldn 't
| be using Docker in the first place.
| aledalgrande wrote:
| What does docker have to do with patching security fixes?
| If you have an EC2 box it's going to be the same. I don't
| consider that paranoid.
| pas wrote:
| musl DNS stub resolver is "broken" unfortunately (it doesn't
| do TCP, which is a problem usually when you want to deploy
| something into a highly dynamic DNS-configured environment,
| eg. k8s)
| coredog64 wrote:
| Once you start adding stuff, I think Alpine gets worse. For
| example, there's a libgmp issue that's in the latest Alpine
| versions since November. It's fixed upstream but hasn't been
| pulled into Alpine.
| FinalBriefing wrote:
| I generally agree.
|
| I start all my projects based on Alpine (alpine-node, for
| example). I'll sometimes need to install a few libraries like
| ImageMagick, but if that list starts to grow, I'll just use
| Ubuntu.
| qbasic_forever wrote:
| There's some more to consider with the latest buildkit frontend
| for docker, check it out here:
| https://hub.docker.com/r/docker/dockerfile
|
| In particular, cache mounts (RUN --mount=type=cache) can help
| with the package manager cache size issue, and heredocs are a
| game-changer for inline scripts. Forget doing all that &&
| nonsense, write clean multiline run commands:
|
|     RUN <<EOF
|     apt-get update
|     apt-get install -y foo bar baz
|     EOF
|
| All of this works right now in plain old desktop docker you have
| installed right now, you just need to use the buildx command
| (buildkit engine) and reference the docker labs buildkit frontend
| image above. Unfortunately it's barely mentioned in docs or
| anywhere else other than their blog right now.
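Combining the two features the comment above mentions might look like the sketch below. The `# syntax=` line selects the labs frontend; the package names are placeholders:

```dockerfile
# syntax=docker/dockerfile:1.3-labs
FROM ubuntu:20.04
# Cache mounts keep apt's download/state directories out of the image
# and persist them between builds; the heredoc replaces && chains.
RUN --mount=type=cache,target=/var/cache/apt \
    --mount=type=cache,target=/var/lib/apt \
    <<EOF
apt-get update
apt-get install -y curl ca-certificates
EOF
```

Build with `docker buildx build .` (or `DOCKER_BUILDKIT=1 docker build .`) so the BuildKit engine reads the syntax directive.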
| no_wizard wrote:
| Somewhat tangentially related to the topic of this post: does
| anyone know any good tech for keeping an image "warm". For
| instance, I like to spin up separate containers for my tests vs
| development so they can be "one single process" focused, but it
| is not always practical (due to system resources on my local dev
| machine) to just keep my test runner in "watch" mode, so I spin
| it down and have to spin it back up, and there's always some
| delay - even when cached. Is there a way to keep this "hot" but
| not run a process as a result? I generally try to do watch mode
| for tests, but with webdev I've got a lot of file watchers
| running, and this can cause a lot of overhead with my containers
| (on macOS, for what it's worth)
|
| Is there anything one can do to help this issue?
| pas wrote:
| You could launch the container itself with sleep. (docker run
| --entrypoint /bin/sh [image] sleep inf) Then start the dev
| watch thing with 'docker exec', and when you don't need it
| anymore you can kill it. (Eg. via htop)
|
| With uwsgi you can control which file to watch. I usually just
| set it to watch the index.py so when I want to restart it, I
| just switch to that and save the file.
|
| Similarly you could do this with "entr"
| https://github.com/eradman/entr
| PhilippGille wrote:
| > keeping an image "warm"
|
| Do you mean container? So you'd like to have your long running
| dev container, and a separate test container that keeps running
| but you only use it every now and then, right? Because you
| neither want to include the test stuff in your dev container,
| nor use file watchers for the tests?
|
| Then while I don't know your exact environment and flow, could
| you start the container with `docker run ... sh -c "while true;
| do sleep 1; done"` to "keep it warm" and then `docker exec ...`
| to run the tests?
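Concretely, the idle-container flow both replies describe might look like this (the image and container names are invented for illustration):

```shell
# Start a long-lived but idle test container
docker run -d --name test-runner myapp-test:latest sleep infinity

# Run the test suite inside it on demand; startup cost is just the exec
docker exec -it test-runner npm test

# Tear it down when finished
docker rm -f test-runner
```

The container stays "warm" (image layers unpacked, filesystem mounted) while consuming almost no CPU, since `sleep` is the only process until you exec in.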
| 2OEH8eoCRo0 wrote:
| I also liked this one:
|
| https://fedoramagazine.org/build-smaller-containers/
|
| I don't avoid large images because of their size, I avoid them
| because it's an indicator that I'm packaging much more than is
| necessary. If I package a lot more than is necessary then perhaps
| I do not understand my dependencies well enough or my container
| is doing too much.
| nodesocket wrote:
| A very common mistake I see (though not related to image size
| per se) when running Node apps is to do CMD ["npm", "run",
| "start"]. This is, first, memory-wasteful, as npm runs as the
| parent process and forks node to run the main script. The
| bigger problem is that the npm process does not forward signals
| to its child, so SIGINT and SIGTERM are not passed from npm to
| node, which means your server may not be gracefully closing
| connections.
| Ramiro wrote:
| I never really thought about this, it's a good point. What do
| you suggest be used instead of ["npm", "run", "start"]?
| nicholasjarnold wrote:
| This is a great use case for tini[0]. Try this, after
| installing the tini binary to /sbin:
|
|     ENTRYPOINT ["/sbin/tini", "--"]
|     CMD ["node", "/path/to/main/process.js"]
|
| [0]: https://github.com/krallin/tini
|
| edit: formatting, sorry.
| remram wrote:
| I think this is built into docker now:
| https://docs.docker.com/engine/reference/run/#specify-an-
| ini...
|
| If you use Kubernetes then you have to add tini for now
| (https://github.com/kubernetes/kubernetes/issues/84210)
| bravetraveler wrote:
| I'm not a Node/NPM person, but I imagine they had in mind the
| equivalent of whatever is expected from npm. I expect some
| nodejs command to invoke the service directly
|
| Edit: Consequently this should make the container logs a bit
| more useful, beyond better signal handling/respect
| davidjfelix wrote:
| ["node", "/path/to/your/entrypoint.js"]
| pineconewarrior wrote:
| I assume it'd be better to execute index.js directly with
| node
| j1elo wrote:
| Node.js has both a Best Practices [0] and a tutorial [1] that
| instruct to use _CMD ["node", "main.js"]_. In short: do not
| run NPM as the main process; instead, run Node directly.
|
| This way, the Node process itself will run as PID 1 of the
| container (instead of just being a child process of NPM).
|
| The same can be found in other collections of best practices
| such as [2].
|
| What I do is a bit more complex: an entrypoint.sh which ends up
| running:
|
|     exec node main.js "$*"
|
| Docs then tell users to use _docker run --init_; this flag
| will tell Docker to use the Tini minimal init system as PID 1,
| which handles system signals appropriately.
|
| [0]: https://github.com/nodejs/docker-
| node/blob/main/docs/BestPra...
|
| [1]: https://nodejs.org/en/docs/guides/nodejs-docker-webapp/
|
| [2]: https://dev.to/nodepractices/docker-best-practices-with-
| node...
|
| Edit: corrected the part about using --init for proper handling
| of signals.
| [deleted]
| miyuru wrote:
| There are other base images, from Google, that are smaller than
| the usual base images and come in handy when deploying
| applications that run as a single binary.
|
| > Distroless images are very small. The smallest distroless
| image, gcr.io/distroless/static-debian11, is around 2 MiB. That's
| about 50% of the size of alpine (~5 MiB), and less than 2% of the
| size of debian (124 MiB).
|
| https://github.com/GoogleContainerTools/distroless
| yjftsjthsd-h wrote:
| So the percentage makes it look impressive, but... you're
| saving no more than 5MB. Don't get me wrong, I like smaller
| images, but I feel like "smaller than Alpine" is getting into
| -funroll-loops territory of over-optimizing.
| ImJasonH wrote:
| It got removed from the README at some point, but the smallest
| distroless image, gcr.io/distroless/static is 786KB compressed
| -- 1/3 the size of this image of shipping containers[0], and
| small enough to fit on a 3.5" floppy disk.
|
| 0: https://unsplash.com/photos/bukjsECgmeU
| Ramiro wrote:
| Distroless images are tiny, but sometimes the fact that they
| don't have anything on them other than the application binary
| makes them harder to interact with, especially when
| troubleshooting or profiling. We recently moved a lot of our
| stuff back to vanilla Debian for this reason. We figured that
| the extra 100MB wouldn't make that big of a difference when
| pulling for our Kubernetes clusters. YMMV.
| jrockway wrote:
| I've found myself exec-ing into containers a lot less often
| recently. Kubernetes has ephemeral containers for debugging.
| This is of limited use to me; the problem is usually lower
| level (container engine or networking malfunctioning) or
| higher level (app is broke, and there is no command "fix-app"
| included in Debian). For the problems that are lower level,
| it's simplest to resolve by just ssh-ing to the node (great
| for a targeted tcpdump). For the problems that are higher
| level, it's easier to just integrate things into your app (I
| would die without net/http/pprof in Go apps, for example).
|
| I was an early adopter of distroless, though, so I'm probably
| just used to not having a shell in the container. If you use
| it everyday I'm sure it must be helpful in some way. My
| philosophy is as soon as you start having a shell on your
| cattle, it becomes a pet, though. Easy to leave one-off fixes
| around that are auto-reverted when you reschedule your
| deployment or whatever. This has never happened to me but I
| do worry about it. I'd also say that if you are uncomfortable
| about how "exec" lets people do anything in a container,
| you'd probably be even more uncomfortable giving them root on
| the node itself. And of course it's very easy to break things
| at that level as well.
| gravypod wrote:
| There are some tools that allow you to copy debug tools into
| a container when needed. I think all that needs to be in the
| container is tar, and it runs `kubectl exec ... tar` in the
| container. This allows you to get in when needed but still
| keep your production attack surface low.
|
| Either way, as long as all your containers share the same base
| layer it doesn't really matter, since the layers will be
| deduplicated.
| theptip wrote:
| I believe "Ephemeral containers" are intended to resolve
| this issue; you can attach a "debug container" to your pod
| with a shell and other tools.
|
| https://kubernetes.io/docs/concepts/workloads/pods/ephemera
| l...
|
| Still beta, I haven't tried it yet myself. Looks
| interesting though.
| theptip wrote:
| Also if you are running k8s, and use the same base image for
| your app containers, you amortize this cost as you only need
| to pull the base layers once per node. So in practice you
| won't pull that 100mb many times.
|
| (This benefit compounds the more frequently you rebuild your
| app containers.)
| PaulKeeble wrote:
| Base images like alpine/debian/ubuntu get used by a lot of
| third party containers too so if you have multiple
| containers running on the same device they may in practice
| be very small until the base image gets an upgrade.
| erik_seaberg wrote:
| This. The article talks about
|
| > Each layer in your image might have a leaner version
| that is sufficient for your needs.
|
| when reusing a huge layer is cheaper than choosing a
| small layer that is _not_ reused.
| yjftsjthsd-h wrote:
| Doesn't that only work if you used the _exact_ same base?
| If I build 2 images from debian:11 but one of them used
| debian:11 last month and one uses debian:11 today, I
| thought they end up not sharing a base layer, because
| they're resolving debian:11 to different hashes and
| actually using the base image by exact image ID.
| podge wrote:
| I found this to be an issue as well, but there are a few ways
| around this for when you need to debug something. The most
| useful approach I found was to launch a new container from a
| standard image (like Ubuntu) which shares the same process
| namespace, for example:
|
|     docker run --rm -it --pid=container:distroless-app \
|         ubuntu:20.04
|
| You can then see processes in the 'distroless-app' container
| from the new container, and then you can install as many
| debugging tools as you like without affecting the original
| container.
|
| Alternatively distroless have debug images you could use as a
| base instead which are probably still smaller than many other
| base images:
|
| https://github.com/GoogleContainerTools/distroless#debug-
| ima...
| staticassertion wrote:
| The way I imagine this is best solved is by keeping a
| compressed set of tools on your host and then mounting those
| tools into a volume for your container.
|
| So if you have N containers on a host you only end up with
| one set of tooling across all of them, and it's compressed
| until you need it.
|
| You can decouple your test tooling from your
| images/containers, which has a number of benefits. One that's
| perhaps understated is reducing attacker capabilities in the
| container.
|
| With log4j some of the payloads were essentially just calling
| out to various binaries on Linux. If you don't have those
| they die instantly.
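One way to sketch that host-side toolbox with plain docker (all names here are invented for illustration):

```shell
# Toolbox directory on the host; statically linked tools (e.g. busybox)
# work best, since the container may lack matching shared libraries.
mkdir -p /opt/debug-tools
cp /bin/busybox /opt/debug-tools/

# Mount it read-only into the container only while debugging
docker run -d --name app -v /opt/debug-tools:/debug:ro myimage:latest
docker exec -it app /debug/busybox ps
```

Between debugging sessions the tools live only on the host, so the image itself ships none of the binaries an attacker could call out to.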
| tonymet wrote:
| This app is great for discovering waste
|
| https://github.com/wagoodman/dive
|
| I've found 100MB fonts and other waste.
|
| All the tips are good, but until you actually inspect your
| images, you won't know why they are so bloated.
| Twirrim wrote:
| Every now and then I break out dive and take a look at
| container images. Almost without fail I'll find something we
| can improve.
|
| The UX is great for the tool, gives me absolutely everything I
| need to see, in such a clear fashion, and with virtually no
| learning curve at all for using it.
| jasonpeacock wrote:
| A common mistake that's not covered in this article is the need
| to perform your add & remove operations in the same RUN command.
| Doing them separately creates two separate layers which inflates
| the image size.
|
| This creates two image layers - the first layer has all the added
| foo, including any intermediate artifacts. Then the second layer
| removes the intermediate artifacts, but that's saved as a diff
| against the previous layer:
|
|     RUN ./install-foo
|     RUN ./cleanup-foo
|
| Instead, you need to do them in the same RUN command:
|
|     RUN ./install-foo && ./cleanup-foo
|
| This creates a single layer which has only the foo artifacts you
| need.
|
| This is why the official Dockerfile best practices show[1] the
| apt cache being cleaned up in the same RUN command:
|
|     RUN apt-get update && apt-get install -y \
|         package-bar \
|         package-baz \
|         package-foo \
|         && rm -rf /var/lib/apt/lists/*
|
| [1] https://docs.docker.com/develop/develop-
| images/dockerfile_be...
| gavinray wrote:
| You can use "--squash" to remove all intermediate layers
|
| https://docs.docker.com/engine/reference/commandline/build/#...
|
| The downside of trying to jam all of your commands into a
| gigantic single RUN invocation is that if it isn't correct/you
| need to troubleshoot it, you can wind up waiting 10-20 minutes
| between each single line change just waiting for your build to
| finish.
|
| You lose all the layer caching benefits and it has to re-do the
| entire build.
|
| Just a heads up for anyone that's not suffered through this
| before.
| yjftsjthsd-h wrote:
| But then you end up with just one layer, so you lose out on
| any caching and sharing you might have gotten. Whether this
| matters is of course _very_ context dependent, but there are
| times when it 'll cost you space.
| imglorp wrote:
| This is huge, thanks for the lead. Others should note it's
| still experimental and your build command may fail with
|
| > "--squash" is only supported on a Docker daemon with
| experimental features enabled
|
| Up til now, our biggest improvement was with "FROM SCRATCH".
| selfup wrote:
| Good to know. `FROM scratch` is such a breath of fresh air
| for compiled apps. No need for Alpine if I just need to run
| a binary!
| jrockway wrote:
| Do keep in mind that you might want a set of trusted TLS
| certificates and the timezone database. Both will be
| annoying runtime errors when you don't trust
| https://api.example.com or try to return a time to a user
| in their preferred time zone. Distroless includes these.
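Putting those two caveats together, a `FROM scratch` image for a compiled binary might copy the trust store and timezone database out of the build stage. A sketch for a hypothetical Go service (the package path and binary name are placeholders):

```dockerfile
FROM golang:1.17 AS build
WORKDIR /src
COPY . .
# Static binary, so the final image needs no libc
RUN CGO_ENABLED=0 go build -o /app .

FROM scratch
# TLS trust store and timezone database, per the caveats above
COPY --from=build /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=build /usr/share/zoneinfo /usr/share/zoneinfo
COPY --from=build /app /app
ENTRYPOINT ["/app"]
```

Without the two COPY lines, outbound HTTPS calls fail certificate verification and `time.LoadLocation` lookups error out at runtime, which is exactly the class of late-breaking failure the parent warns about.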
| gavinray wrote:
| No problem.
|
| > Others should note it's still experimental and your build
| command may fail with
|
| You might try _"docker buildx build"_ to use the BuildKit
| client -- squash isn't experimental in that one, I believe =)
|
| https://docs.docker.com/engine/reference/commandline/buildx
| _...
| selfup wrote:
| Had no idea about squash. Using cached layers can really save
| time, especially when you already have OS deps/project deps
| installed. Thanks!
| [deleted]
| qbasic_forever wrote:
| You don't have to do this anymore, the buildkit frontend for
| docker has a new feature that supports multiline heredoc
| strings for commands: https://www.docker.com/blog/introduction-
| to-heredocs-in-dock... It's a game changer but unfortunately
| barely mentioned anywhere.
| epberry wrote:
| If you click 'Pricing' on the main site an error occurs just FYI.
___________________________________________________________________
(page generated 2022-01-06 23:00 UTC)