[HN Gopher] Minify your container
___________________________________________________________________
Minify your container
Author : JordanTenn
Score : 113 points
Date : 2022-08-03 17:42 UTC (5 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| CameronNemo wrote:
| Is this an official Docker project? How is this not trademark
| infringement?
| bigpod wrote:
| docker-slim started at a Docker hackathon in 2015, and the company
| behind Slim.AI has an extension for Docker Desktop in its
| marketplace.
| gtirloni wrote:
| It doesn't seem to be associated with Docker Inc.
|
| Docker's trademark guidelines say that products, services and
| technology that are not their own shouldn't use the Docker name
| so it seems it's a matter of time before they get a nice letter
| from some lawyer.
| JordanTenn wrote:
| The product is a loose partner of Docker. They are an
| official Docker Desktop Extension. DockerSlim is an open
| source tool that led to the creation of Slim.AI, and now
| Slim.AI (and therefore DockerSlim) is a loose partner.
|
| https://hub.docker.com/extensions/slimdotai/dd-ext
| gtirloni wrote:
| What's a "loose" partner?
|
| Does Docker Inc promise to overlook trademark infringements
| if you're a "loose" partner?
| CameronNemo wrote:
| I'm not cheering for that, but I also think impersonation is
| dishonest.
| alberth wrote:
| Off topic: wish there was a slim variant of FreeBSD.
|
| Seems like all past attempts have stalled and/or are dependent
| upon FreeBSD creating a standard for what's in a minimal
| userspace.
| cperciva wrote:
| Once pkgbase lands we'll probably see more progress there.
| alberth wrote:
| Given that it's been in the works for what seems like a
| decade, do you think pkgbase will be finalized anytime soon?
|
| Just curious (please don't take my comments as being
| negative)
|
| https://wiki.freebsd.org/PkgBase
| cperciva wrote:
| You're quite right, and to be honest at this point I've
| given up trying to keep track of where it's at. It's
| definitely something I'd like to see completed, but I've
| been too busy with FreeBSD/EC2 and speeding up the boot
| process to spend time looking at pkgbase too.
| CameronNemo wrote:
| Even further off topic but perhaps relevant: Chimera Linux,
| which consists of the FreeBSD user land ported to Linux. I
| wonder if q66 has OCI images published...
|
| https://chimera-linux.org/
| sockmeistr wrote:
| Isn't this incredibly dangerous? I know everyone likes to pretend
| they have perfect code coverage, but just ripping stuff out that
| wasn't called during 'probing' feels like the perfect way to make
| rare code paths even more dangerous.
| OMGWTF wrote:
| kkrieger (https://www.pouet.net/prod.php?which=12036) is an
| impressive 3D shooter in only 96 KiloBytes. As one of their
| optimization techniques they recorded all code paths and
| discarded unused parts. At least in the first version this was
| why you could only use CursorDown in the menus and CursorUp did
| not work.
| lapser wrote:
| If you have a good pipeline to prod, should be okay. You should
| hopefully have plenty of automated tests to ensure it doesn't
| get to prod if there are errors.
| axelthegerman wrote:
| _should be okay_ is definitely not enough for me to ship
| things to production.
|
| And while I do have automated tests, they might sometimes
| stub system calls as I'm mostly testing my code to keep
| things stable and fast.
|
| I'd rather explicitly declare my dependencies and use the
| same container for development, test and production to feel
| much more confident that it includes actually everything
| that's needed.
| bigpod wrote:
| With a good pipeline and knowledge about your app you should
| be able to ensure it works without much of a problem.
| Volundr wrote:
| I think a "good pipeline to prod" with sufficient automated
| tests to ensure nothing is broken is the exception not the
| rule. Even in places that think/say they have a "good
| pipeline to prod". It's something that takes a shocking
| amount of engineering effort to do well, and tons of
| discipline to maintain.
| EddySchauHai wrote:
| Hire a test engineer to manage all of that - it's a full
| time job but an important one!
| killingtime74 wrote:
| If you have good integration tests is it still a problem?
| fwip wrote:
| It depends on how comprehensive they are, and how important
| it is that your container operates correctly.
|
| For example, even the best integration tests (for small/mid-
| size companies) don't always include tests that exercise
| weird paths around dates/times - leap years, leap seconds,
| daylight savings time, etc. We often trust that our datetime
| library or code will handle these for us, but what if the
| configuration is stored in a file that isn't accessed during
| your integration tests?
|
| Best case scenario is you hit the error-path soon in
| production and your code either crashes or does something
| correct-enough with a fallback path, but a worse scenario is
| you start losing critical information and don't realize
| it/fix it until it's gone on for a while.
| mplewis wrote:
| In a non-trivial app, you can never guarantee that your "good
| integration tests" cover every edge case. If you could, we
| wouldn't have outages in production.
| mplewis wrote:
| docker-slim is incredibly dangerous and should never be used
| for a production app.
| nicce wrote:
| I guess the question is: dangerous in which way? It might lead
| to a crash for sure, but is that crash controlled? If it is,
| then it is just a crash. Stability vs. minimal attack surface.
|
| But I agree, this is just bandaid for lazy bois. Better use
| Bazel etc. for distroless builds
| mplewis wrote:
| This is dangerous in that it strips assets, resources, and
| files from your app without understanding how they are
| used.
|
| If you forget a critical code path when you build using
| Docker-Slim, and a resource file is not used, that resource
| will be stripped. The feature which depends on it will be
| broken in production.
| bigpod wrote:
| I would disagree. I use it in production apps; I configured it
| and it works. If you do it blindly, things sometimes break,
| but if you configure it, it will work.
| mplewis wrote:
| There is no guarantee that a blind code shaker will leave
| in everything important while stripping out everything that
| isn't. How could it possibly know?
|
| If Docker-Slim is working for you in production apps, you
| are either getting lucky or your app is trivial enough to
| lack unseen code paths.
| bigpod wrote:
| That's why you should test. If there is stuff that needs to be
| included but isn't, and you know it won't work, fail the test and
| add --include-path to your docker-slim command to ensure it is
| added.
| CubsFan1060 wrote:
| I guess it only seems dangerous to me if you blindly follow
| its recommendations. Feels like it could generate a list of
| "things you may want to consider", that you'd then be able to
| use to take a look at your container.
| bigpod wrote:
| It sometimes doesn't work, sure, but that's why we have tests. I
| minify all my containers nowadays and in most cases it works. In
| those where it doesn't, I figured out the pattern of when and why
| for my apps, and I use include flags to ensure things remain
| inside.
| jzelinskie wrote:
| Even prior to docker-slim there were tools like Quay.io that
| "did the right thing" by squashing images to just the contents
| of the final image layer.
|
| The best thing you can do is use minimal images and multi-stage
| builds. This should help you immensely to reduce your attack
| surface and do standard software bill of materials, too.
| fwip wrote:
| The quay.io squashing optimization is a lot safer though,
| right, as it doesn't remove anything that should be visible
| to the container?
|
| I agree that the multi-stage builds are the best option, but
| it can be hard to know if you've included everything that is
| required or if you've accidentally excluded something that is
| important in rare cases.
| rockemsockem wrote:
| I've had great success with reducing image size by running
| docker-show-context (https://github.com/pwaller/docker-show-context)
| and eliminating big and unnecessary files that it
| reports. This seems to go just a bit further than that with what
| seems like more complexity. I got timeouts when following their
| instructions to run it on two different containers, one of which
| is just a very simple web server.
| CameronNemo wrote:
| This is interesting for optimizing build time. But I think it
| works a bit differently from docker-slim, which is focused on the
| final resulting image size.
|
| Dive is a good tool for the latter IME.
| https://github.com/wagoodman/dive
|
| It doesn't do the work for you, but it does single out the big
| layers in your image.
| siddontang wrote:
| We build our binary first with one image as the builder image,
| then use `COPY` to copy the binary from the builder to the final
| executable image, like alpine.
|
| An example Dockerfile looks like:
|
|     FROM golang:1.18.1-alpine as builder
|     # RUN apk add, wget, etc, and build the binary
|
|     # or FROM scratch
|     FROM alpine
|     COPY --from=builder builder/binary /binary
|     ENTRYPOINT ["/binary"]
| U1F984 wrote:
| For Go you can use FROM scratch and save a couple more
| megabytes.
| lrvick wrote:
| This works on any language. I only use scratch in prod. Even
| for nodejs or python... compile a static interpreter binary
| and truck on.
|
| Dev tools like bash, ls, grep, etc, have no place in
| production and only increase attack surface.
| fwip wrote:
| Out of curiosity, what does alpine provide for your container
| that you need? (I assume otherwise you'd be using `FROM
| scratch`.)
| maccard wrote:
| I use wget from it for health checks [0]
|
| [0] https://stackoverflow.com/questions/47722898/how-to-do-a-
| doc...
| siddontang wrote:
| yes, `FROM scratch` may be better most of the time. I have just
| used `alpine` for many years, and have not tried `scratch`
| before.
| jollyllama wrote:
| Wow, I remember when not including debug symbols was a slim
| image.
| jewayne wrote:
| I have a minor in math, and I don't know what "shrinking by 30X"
| means. To me, decreases always start from 100%. So I think we are
| talking about a ~97% decrease in size?
| AtNightWeCode wrote:
| For compression or similar ratio is used.
| JordanTenn wrote:
| Thanks for this note. I'm part of the DockerSlim and Slim.AI
| ecosystem. Will take this feedback and rework the way we phrase
| things. Thank you!
| reilly3000 wrote:
| it means... a lot!
| rr888 wrote:
| Thanks I hate this, but seems to be everywhere now. "This
| product is now 3 times cheaper!", WTF. They still haven't got
| to percentages yet, like 200% off!!.
| bigpod wrote:
| It's more about matching how people talk: "smaller by 200%"
| isn't as understandable as "30 times smaller".
| SomeBoolshit wrote:
| Neither of those makes any sense.
| bigpod wrote:
| This is not valid, but it's what people say: 30 times smaller.
| jewayne wrote:
| I know and it drives me crazy. "Bigger" and "smaller" express
| _differences_, not fractions or multiples.
| fb03 wrote:
| I don't have a minor in math, and I instinctively thought
| "shrinking by 30x" means 1/30 of size.
| jonas21 wrote:
| Yeah. "growing" = numerator. "shrinking" = denominator.
|
| It's nice because they're inverses - if you shrink by 30x,
| then grow by 30x, you're back where you started, whereas a
| 97% decrease in size followed by a 97% increase in size
| leaves you at ~6% of the original size.
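The arithmetic in the comment above can be checked with a minimal sketch. The helper names and the starting size of 100 are hypothetical, chosen only for illustration; "shrink by 30x" is read here as division by 30, which is the interpretation this comment argues for, not one the whole thread agrees on.

```python
def shrink_by_factor(size, factor):
    """'Shrinking by Nx' read as dividing by N."""
    return size / factor


def grow_by_factor(size, factor):
    """'Growing by Nx' read as multiplying by N."""
    return size * factor


def change_by_percent(size, percent):
    """Apply a signed percentage change (negative means a decrease)."""
    return size * (1 + percent / 100)


original = 100.0  # hypothetical starting size, arbitrary units

# Factors are inverses: shrink by 30x, then grow by 30x.
round_trip_factor = grow_by_factor(shrink_by_factor(original, 30), 30)

# Percentages are not: a 97% decrease followed by a 97% increase.
round_trip_percent = change_by_percent(change_by_percent(original, -97), 97)

# The percentage decrease that exactly matches "1/30 of the size".
percent_equivalent = (1 - 1 / 30) * 100

print(f"factor round trip:  {round_trip_factor:.2f}")
print(f"percent round trip: {round_trip_percent:.2f}")
print(f"1/30 as a decrease: {percent_equivalent:.1f}%")
```

Under this reading, the factor round trip returns to the starting size (up to floating-point rounding), while the percentage round trip lands near 6% of it.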
| jewayne wrote:
| I think if you want to say "1/30th of the size", you should
| say that. Growth is usually measured as a difference. For
| example, a 200% increase means the value has tripled.
| bigpod wrote:
| essentially yes
| Karellen wrote:
| But "30x" is just another way of saying "3000%". Or,
| "3000%" is just another way of saying "30x". "Shrinking by
| 30x" means the same thing as "shrinking by 3000%".
| OJFord wrote:
| Do you know what 'two times smaller' means?
| jewayne wrote:
| No I don't. That's my point. To me, the number that's two
| times smaller than x is -x. (x-2x)
| OJFord wrote:
| I'm no mathematician and it wouldn't often be my choice of
| phrasing, but it seems unambiguous and clear to me; your
| definition is much stranger/less intuitive to me.
|
| We're clear on 'x is two times larger than y', right?
| x = 2*y
|
| An equivalent statement is 'y is two times smaller than x',
| but it conveys a construction more like:
| y = x/2
|
| Which, since we're speaking English sentences, might change
| the emphasis/implication.
| Fnoord wrote:
| > I have a minor in math, and I don't know what "shrinking by
| 30X" means.
|
| X is input
|
| Y is output
|
| X / 30 = Y
|
| All you need to know, no minor required, as it's taught in
| elementary school (age 11/12 or so?).
| Karellen wrote:
| So, shrinking by 2x means dividing by 2?
|
| But... doesn't shrinking by 1/2 also mean dividing by 2?
|
| Therefore, 1/2 == 2x ??
|
| I feel like my elementary school math is letting me down
| somewhere.
| inopinatus wrote:
| If you try pronouncing that aloud as "...by a factor of 30"
| it'll seem less ungrammatical.
| hkgjjgjfjfjfjf wrote:
| sequoia wrote:
| Here's my less magical, more manual post on the subject of
| reducing docker image sizes:
| https://sequoia.makes.software/reducing-docker-image-size-pa...
| aejnsn wrote:
| > "Find SSL Certs"
|
| So we're promoting secrets being saved within a container image
| artifact? Ummmm?
| frenchman99 wrote:
| The cert is usually the public key. The private key is usually
| named key. So it doesn't promote secrets being saved within a
| container as far as I can see.
| Karellen wrote:
| Wait, I thought the cert was the CA's signature of the public
| key.
| kodah wrote:
| This kind of looks like a tool that does the reverse of what
| scratch does. Instead of _only_ including the binary and any
| dynamically linked dependencies, it tries to figure out a minimum
| set of dependencies based on access.
|
| In practice, I'm curious how error prone the result is.
| bigpod wrote:
| it is error prone somewhat but it has flags to allow you to
| fine tune what gets added back in. great thing is you can work
| with any base image and language including those that wont work
| with scratch
| saidinesh5 wrote:
| Why would someone want to use this instead of say base images
| made specifically for containers? like alpine for eg.?
|
| And for languages like golang (in their examples) - why/how would
| anyone get such huge container images in the first place? Doesn't
| go give a neat statically linked binary?
| xtracto wrote:
| Right, Go binaries would be able to go with a "scratch" image,
| which only contains the kernel.
| piperswe wrote:
| They don't even contain a kernel - Docker containers use the
| host kernel. Container runtimes based on VMs like
| Firecracker's firecracker-containerd typically supply the
| kernel themselves.
| FridgeSeal wrote:
| Do you happen to know how the scratch containers differ
| from Google's distroless containers?
|
| I've been using them (distroless) with great success for my
| Rust applications.
| bigpod wrote:
| You can use whatever base image you want, let's say ubuntu:latest
| (I don't like alpine). Normally base images tend to include a
| lot of stuff that doesn't have any place in a container. Think:
| why do I need a tool for ext4 management inside a container? It
| makes no sense for production, so throw it out. That's what
| docker-slim does: it gets rid of vulnerabilities in programs that
| are not used by your program by simply getting rid of them.
| saidinesh5 wrote:
| Ubuntu-minimal (https://canonical.com/blog/minimal-ubuntu-released)
| doesn't have any of those binaries though.
|
| And that's also why you have multistage docker builds. To
| make sure your production container doesn't have all the
| unneeded files from your development container.
| https://docs.docker.com/develop/develop-images/multistage-
| bu... .
| bigpod wrote:
| This removes far more than a multistage Docker build ever
| would. Do you need bash, dash, passwd, or the many other
| binaries and files that are in the image by default? No, you
| don't. The only way to do anything similar to what docker-slim
| does is with a scratch image, which doesn't work if you don't
| copy in everything you need.
| saidinesh5 wrote:
| The problem is not about removing though. The problem is
| what/who guarantees that nothing broke after all these
| files are removed? Especially in obscure code paths in
| nested dependencies?
|
| With something like alpine linux/ubuntu minimal, you
| trust the package maintainers to make sure that if you
| use python in your docker image it would work like it
| worked for them. Out here, it just says "Yes (it is
| safe)! Either way, you should test your Docker images.".
|
| As a bad example, if a library used by your application
| uses a different "theme" requiring different files at
| night and different files during the day, you might still
| say "it worked during my tests" but things definitely
| broke and the only thing you can blame is this
| overzealous tool.
|
| That bad example was from back when i was trying to make
| AppImages for an application we used. At first all we did
| was recursively collect all the libraries reported by
| ldd. Then it turned out some libraries were only being
| dlopen'ed by other libraries under specific circumstances
| and we missed them. So we manually added those libraries.
| Then it turned out that we missed the config files and
| other resources used by those libraries. Eventually we
| shipped all the files belonging to all the distro
| packages used by the libraries we used and left it at
| that.
| bigpod wrote:
| Your tests and your application knowledge should.
|
| In some cases I essentially ensure my whole app remains by
| using --include-path flags, so that I get a removal of, you
| know, the things that I absolutely don't need.
| rcoveson wrote:
| Still seems kind of silly. If you base everything on
| ubuntu minimal, you'll only have the one copy of that
| base image, which is a fraction of the size of the
| `docker` and `dockerd` binaries added together. No server
| running docker will have a problem keeping one or two
| versions of ubuntu minimal on it.
|
| But if you go around "minifying" all your applications
| independently, you won't have that shared base layer. One
| application needs `sh` and another doesn't? Now you get
| two entire base layers, one with it and one without.
| Sure, each image's total size will be less, but the size
| of all your different images added up will be greater
| because you killed the sharing.
|
| If for some reason the 29 megs of ubuntu minimal (or even
| fewer for alpine) are a problem (which they aren't on
| your server that already has over a hundred megs of
| `docker` binaries), then the right solution is to better
| control layer _sharing_. Ensure that you _don't_ have
| different base layers between your applications. And then
| --strictly for kicks and giggles--you could minify that
| base layer to the minimal set of what _all_ your images
| require. To save a 51K `passwd` binary (woohoo!).
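The layer-sharing argument above can be sketched with a toy model in which a host stores each distinct layer once. Everything here is an illustrative assumption: the layer names, the app count, and the 5 MB and 10 MB sizes are hypothetical, and only the 29 MB base figure comes from the comment. Real storage drivers and real minified images will differ.

```python
def total_disk_usage(images):
    """Sum the size of every distinct layer across all images.

    `images` maps an image name to a list of (layer_id, size_mb) pairs.
    A layer that appears in several images is counted only once.
    """
    unique_layers = {}
    for layers in images.values():
        for layer_id, size_mb in layers:
            unique_layers[layer_id] = size_mb
    return sum(unique_layers.values())


APP_COUNT = 10         # hypothetical number of applications on one host
SHARED_BASE_MB = 29    # base image size quoted in the comment above
MINIFIED_BASE_MB = 5   # hypothetical size of a per-app minified base
APP_LAYER_MB = 10      # hypothetical size of each application's own layer

# Every app builds on the same base layer.
shared = {
    f"app{i}": [("ubuntu-minimal", SHARED_BASE_MB), (f"app{i}-code", APP_LAYER_MB)]
    for i in range(APP_COUNT)
}

# Every app is minified independently, so no base layer is shared.
minified = {
    f"app{i}": [(f"app{i}-minified-base", MINIFIED_BASE_MB), (f"app{i}-code", APP_LAYER_MB)]
    for i in range(APP_COUNT)
}

shared_total = total_disk_usage(shared)
minified_total = total_disk_usage(minified)
print(f"shared base:   {shared_total} MB")
print(f"minified each: {minified_total} MB")
```

With these made-up numbers the shared base wins once the app count times the minified base size exceeds the shared base size; with fewer images or a smaller minified base the comparison flips.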
| bigpod wrote:
| One question: is it possible in any way that passwd, or any
| other binary you don't need that stays in the image, has a
| security vulnerability that could cause trouble on the host if
| someone got into the container one way or another (most likely
| through your app)?
|
| Hint: yes it is, and that could be a problem, a giant one.
| ufmace wrote:
| Good question. Alpine is already small enough that it seems a
| little odd to go to elaborate measures to reduce image size
| further. Seems better to me to start with a minimalist image
| and only add what you need to make your app work than to start
| with a huge image with everything, install your app, and rely
| on something like this to find only the things you don't need
| to remove and not make any mistakes.
| mplewis wrote:
| docker-slim is the _wrong_ solution for container optimization.
| You can't just have a program rm -rf files that it didn't think
| were in use. What if you missed a code path?
|
| https://twitter.com/ariadneconill/status/1506482425458798593
| fwip wrote:
| I know your question was rhetorical, but probably depends on
| how mission-critical the code path is (and the consequences of
| hitting those missing files).
|
| If you're running a website and the removed dependency is
| related to a feature that is uncommon enough that it isn't covered
| by your automated tests, maybe .1% of your users experience a
| broken page.
|
| If you're running critical infrastructure and the removed
| dependency has to do with leap-second handling, maybe eight
| months from now, everything crashes and you lose millions of
| dollars.
| kylequest wrote:
| Please explain how it's wrong without simply saying you prefer
| other solutions, which was the case with Ariadne :) Dead code
| elimination is a common construct in software engineering.
| There's nothing magical about apps and their dependencies. They
| are relatively straightforward to identify and for the web
| apps with static assets there are helpers to help you ensure
| you got everything.
| bigpod wrote:
| Use --include-path to ensure it's in.
| hnarn wrote:
| Why did you link a tweet to someone saying they "feel like
| [they] should do a blog about docker-slim at some point"? What
| does this contribute?
| mplewis wrote:
| This is a Twitter thread that continues below at the
| following tweet:
|
| https://twitter.com/ariadneconill/status/1506483943352250371
|
| Sorry - I forgot that the Twitter UI doesn't always lend
| itself to proper threading.
| jmercan wrote:
| Personally I feel like shrinking images by guessing unused parts
| is a good way to have an image explode in your face randomly
| in the future. (Probes and heuristics missing critical but rarely
| used parts and more) Also wouldn't it hurt reproducibility?
| Temporary runtime monitoring doesn't exactly sound like a
| deterministic metric.
|
| A containerizable project probably has its requirements known and
| well-specified? I think building on top of a base with a smaller
| unused surface is a better idea than using analysis that might
| backfire. These days I am using apko + melange for my personal
| images and they are super neat.
| davidtpate wrote:
| Some form of tree-shaking type of thing would probably be quite
| handy for images, but yeah I'm a bit wary here as well. First
| thought would be what happens when it hits Out-of-Memory, DNS
| timeout, or loses network connectivity or another edge case
| that totally happens in Production.
|
| Removing those code paths would not be a good thing, but I
| guess if you build your apps right you could just have your
| container orchestration system recover by replacing the Pod.
| game-of-throws wrote:
| I wouldn't want anything killing pods every time there's a
| network timeout. That sounds like a quick way to turn a tiny
| problem into a huge problem.
| HowardStark wrote:
| Is there an equivalent tool for a normal running Linux system?
| OJFord wrote:
| Not the same, but potentially similar in intent, I use aconfmgr
| to track what's installed, changes to configuration files, and
| any potential changes/whole files left behind by some quick
| test or since uninstalled software.
|
| (Also, even primarily to me, but less relevantly, it's great
| for gitting configuration & its reasons, and syncing across
| machines.)
| tbabej wrote:
| For workloads where the image size was critical, I have achieved
| a similar result with using strace to collect the required files
| and then limiting the image to only those files in the build
| process.
|
| It's a neat approach, but it ultimately brings a non-negligible
| amount of uncertainty, as you can never be 100% sure your test set
| of inputs did not miss a particular edge case that requires a file
| to be present in the container that no other input does.
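A minimal sketch of the strace-based idea described above, assuming the trace has already been captured as text. The regular expression, the function name, and the sample log lines are hypothetical illustrations, not the commenter's actual tooling; the sketch only looks at successful open/openat calls and ignores exec, stat, and other ways a file can be required.

```python
import re

# Matches lines such as:
#   openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
# and captures the path plus the numeric return value.
OPEN_CALL = re.compile(
    r'\b(?:open|openat)\((?:AT_FDCWD, )?"([^"]+)".*\)\s+=\s+(-?\d+)'
)


def files_opened(strace_lines):
    """Return the sorted set of paths that were opened successfully."""
    paths = set()
    for line in strace_lines:
        match = OPEN_CALL.search(line)
        # A negative return value means the open failed (e.g. ENOENT).
        if match and int(match.group(2)) >= 0:
            paths.add(match.group(1))
    return sorted(paths)


# Hypothetical trace excerpt, invented for illustration.
sample_log = [
    'openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3',
    'openat(AT_FDCWD, "/usr/share/zoneinfo/UTC", O_RDONLY) = -1 ENOENT (No such file or directory)',
    '1234 openat(AT_FDCWD, "/app/config.yml", O_RDONLY) = 4',
    'read(3, "...", 4096) = 120',
]

required = files_opened(sample_log)
print(required)
```

The list is only as complete as the inputs that were traced, which is exactly the uncertainty the comment describes: a file touched only on an untested path never shows up.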
| bigpod wrote:
| Yes, that tends to be the problem with docker-slim as well. That
| is why it includes flags like --include-path, with which you can
| easily achieve such fixes.
|
| Personally I highly recommend it, as it works in most cases and
| gets rid of those vulnerabilities that come with things like bash
| or passwd that you don't need in prod apps.
| viraptor wrote:
| Like others here, I wasn't very happy about / trusting automatic
| coverage, so I made this instead
| https://github.com/viraptor/cruftspy
|
| Instead of going extreme with coverage analysis, it shows places
| that can be manually cleaned during the build process. Maybe
| someone will find it useful. Smaller space gains, but gives more
| confidence.
| bigpod wrote:
| I have docker-slim in CI/CD, completely automated. It seems to not
| have a problem, as I have configured it per pipeline. Maybe check
| out the examples for docker-slim.
___________________________________________________________________
(page generated 2022-08-03 23:00 UTC)