[HN Gopher] A lesson in dockerizing shell scripts
       ___________________________________________________________________
        
       A lesson in dockerizing shell scripts
        
       Author : bhupesh
       Score  : 150 points
       Date   : 2024-02-03 14:10 UTC (8 hours ago)
        
 (HTM) web link (bhupesh.me)
 (TXT) w3m dump (bhupesh.me)
        
       | gbN025tt2Z1E2E4 wrote:
       | I can appreciate the work to shrink the image, but copying the
       | various standardized CLI tools and related library files into the
       | image versus installing them with APK can introduce _many_
       | compatibility challenges down the road as new base Alpine
       | versions are released which can be difficult to detect if they
        | don't immediately generate total build errors. Using static
        | binary versions of the various CLI tools would be a better
        | approach here, though that inevitably means larger binaries to
        | begin with, again ballooning the Docker image size. All for a
        | minimal gain of 14MB overall, it is not worth it for a
        | production build unless you're working in the most minimal of
        | embedded OS environments, which the inclusion of fzf -and-
        | findutils would already seem to negate, since there is so much
        | duplication in functionality between the two tools already.
       | 
        | Overall this approach results in an image so fragile that I
        | would never use the resulting product in a high-priority
        | production environment, or even in my local dev environment: I
        | want to code in it, not fix numerous compatibility issues in
        | my tools, all to save 14MB of space.
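The tradeoff the parent describes can be sketched roughly like this (package names and the stage name are illustrative, not taken from the article):

```dockerfile
# Approach 1: let apk resolve the tools and their libraries together.
# Survives base-image bumps, but pulls in the full packages.
FROM alpine:3.18
RUN apk add --no-cache bash git fzf findutils

# Approach 2 (the article's): hand-copy binaries and .so files out of
# a build stage. Smaller, but silently coupled to this exact Alpine
# release's library versions and paths.
# COPY --from=ugit-ops /usr/bin/fzf /usr/bin/
# COPY --from=ugit-ops /usr/lib/libpcre* /usr/lib/
```

The first variant trades ~14MB for images that keep building correctly when `alpine:3.18` is replaced by a newer tag.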
        
         | bhupesh wrote:
         | Author here
         | 
         | > copying the various standardized CLI tools and related
         | library files into the image versus installing them with APK
         | can introduce _many_ compatibility challenges down the road as
         | new base Alpine versions are released which can be difficult to
         | detect if they don't immediately generate total build errors
         | 
          | Maybe I'm missing some context here. Are you saying that the
          | default location of these binaries (the ones that get copied
          | directly) can change? Or is it that the shared libraries get
          | updated and the tools depending on those libraries will
          | eventually break?
        
           | mrweasel wrote:
           | > so you are saying that the default location of these
           | binaries can change
           | 
           | They could, Debian is in the process of unifying the bin
           | directories, see: https://wiki.debian.org/UsrMerge
           | 
           | Realistically it's not much of an issue.
           | 
            | Given that you start out with a 31.4 MB image, I honestly
            | don't think the introduced complexity in your build is
            | worth it. It's a good lesson for people who don't know
            | about build images and ship an entire build pipeline in
            | their Docker image, but for a bash script and a <50 MB
            | image the complexity is a bit weird.
        
             | bhupesh wrote:
              | Oh, wasn't aware of UsrMerge, thanks for sharing.
        
           | rst wrote:
           | Can't necessarily speak for the author, but here's one thing
           | that can happen:
           | 
           | If the underlying system has a newer version of git than the
           | one freeze-dried into your container, repositories managed
           | there by native-git might be in a new format which container-
           | git can't handle. (There might be some new, spiffier way of
           | handling packs, for instance, or they might have finally
           | managed to upgrade the hash function.) And similar issues
           | potentially arise for everything else you're packaging.
        
           | gbN025tt2Z1E2E4 wrote:
           | COPY --from=ugit-ops /usr/lib/libpcre* /usr/lib/
           | 
           | COPY --from=ugit-ops /usr/lib/libreadline* /usr/lib/
           | 
           | COPY --from=ugit-ops /lib/libc.musl-* /lib/
           | 
           | COPY --from=ugit-ops /lib/ld-musl-* /lib/
           | 
            | No, what I'm saying is that you're blanket-copying entirely
            | different versions of common library files into operating
            | system lib folders, as shown above. That can break OS lib
            | symlinks and/or wholly overwrite the OS lib files for the
            | _current_ versions used in Alpine, whether they exist now
            | or arrive in a future release, potentially destroying OS
            | lib dependencies, and it can also overwrite versions
            | Alpine itself ships later, all to get your statically
            | copied versions of the various CLI tools your shell script
            | needs to work. The same goes for copying bash, tr, git,
            | and other binaries into OS bin folders. No No NO!
           | 
            | That is _insanely_ shortsighted. There's a safe way to do
            | that, and then there is the way you did it. If you want to
            | learn to do it right and are dead set against static
            | binary versions of those tools for the sake of file size,
            | look at how Exodus does it, so that you don't destroy OS
            | bin folders and library dependency files in the process of
            | making a binary movable from one OS to another.
           | 
           | Exodus: https://github.com/intoli/exodus
           | 
           | This is why I'm saying your resulting docker image is
           | incredibly fragile and something I would never depend on
           | long-term as it's almost guaranteed to crash and burn as
           | Alpine OS upgrades OS bins and lib dependency files in the
            | future. That it works now, in this version, is an
            | aberration at best; in reality, there are probably things
            | broken in Alpine OS that you aren't even aware of, because
            | you may not be using the functionality you broke _yet_.
           | 
           | OS package managers handle dependencies for a reason.
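One way to avoid the clobbering described above, sketched here as an assumption about what a "safe way" could look like (the stage name and paths are illustrative): keep the copied tools and their libraries in a private prefix instead of the OS's own directories.

```dockerfile
# Copy the vendored binaries and libraries into an isolated prefix
# rather than on top of /usr/lib and /bin, so nothing that Alpine
# owns (now or in a future release) gets overwritten.
COPY --from=ugit-ops /usr/bin/fzf /opt/ugit/bin/
COPY --from=ugit-ops /usr/lib/libpcre* /opt/ugit/lib/
COPY --from=ugit-ops /usr/lib/libreadline* /opt/ugit/lib/

# Resolve the private copies first; the OS's own files stay intact.
ENV PATH="/opt/ugit/bin:${PATH}" \
    LD_LIBRARY_PATH="/opt/ugit/lib"
```

This is roughly the property Exodus provides: relocated binaries live under their own root and never shadow system files.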
        
             | parhamn wrote:
             | > That is _insanely_ shortsighted.
             | 
              | Relax. I wouldn't recommend OP's approach either, but
              | you're not particularly right here either.
             | 
             | Exodus clearly states:
             | 
             | > Exodus is a tool that makes it easy to successfully
             | relocate Linux ELF binaries from one system to another...
             | Server-oriented distributions tend to have more limited and
             | outdated packages than desktop distributions, so it's
             | fairly common that one might have a piece of software
             | installed on their laptop that they can't easily install on
             | a remote machine.
             | 
             | Exodus is specifically designed for moving between
             | different systems.
             | 
              | He is largely moving between the same base image. In the
              | article the base layer is `alpine:3.18` and the target
              | image is `alpine:3.18`, and in the latter part of the
              | article `scratch` (little to zero conflict surface). One
              | would assume those two would stay coupled.
             | 
              | There are other technical merits to not doing what he's
              | doing, but you haven't listed any; you've just dismissed
              | his work. I'd venture that if you actually knew what
              | you're talking about, you'd have better things to add to
              | this conversation than "OS package managers handle
              | dependencies for a reason."
              | 
              | Perhaps next time give some feedback that would help the
              | writer get closer to a well-working Exodus-like
              | solution. It's Hacker News; "don't roll your own"
              | discouragement should be frowned upon.
        
               | gbN025tt2Z1E2E4 wrote:
                | We see it differently. Exodus is as useful in this
                | capacity as in any other, similar base OS image or
                | not, for preventing overwriting.
        
               | TJSomething wrote:
               | Overwriting what? The destination's a completely empty
               | root.
        
       | benreesman wrote:
       | Or just write a clean specification and get a docker image close
       | to optimal, and if it's not, you can prove cryptographically if
       | by some chance you beat the defaults:
       | 
       | https://xeiaso.net/blog/i-was-wrong-about-nix-2020-02-10/
       | 
       | I've got plenty of gripes with nixlang, but being worse than
       | Dockerfile-lang isn't one of them.
        
         | Cu3PO42 wrote:
         | Yes, you can use Nix to get extremely small Docker images. I
         | have personally used it to that effect, but it's not a magic
          | bullet. In this specific case, it even gives pretty bad
          | results. I have written the simplest possible Nix derivation
          | for ugit, and the resulting Docker image is 158MB gzipped. I
          | haven't fully explored why that is, but it's much worse than
          | even the first effort from the OP.
        
       | politelemon wrote:
       | Thanks for sharing this. I like what the author did, they pursued
       | a goal and kept working at it, until they found a balancing
       | point.
       | 
        | I think my experience in similar pursuits would have led me to
        | stop very early on - 31.4 MB is already pretty good, to be
        | fair. Looking at the amount of potential maintenance required
        | in the future (for example, if the original ugit tool starts
        | to need more dependencies, which then have to be wrangled and
        | inspected) makes me think the size I didn't reduce is worth
        | the tradeoff, since the dependencies can be managed with
        | package managers without having to think too much, and, as the
        | author says, Linux is pretty awesome about these things
        | already.
        
         | SOLAR_FIELDS wrote:
          | It always depends on your use case, but yeah, in the world
          | of Docker images 30 MB often feels like nothing, because
          | gigabyte-plus sizes are not at all out of the norm. To some
          | extent it's a design flaw of the way images and layers work,
          | but the tooling doesn't seem to discourage the ballooning
          | either.
        
         | bhupesh wrote:
         | Hey author here
         | 
          | True, 31.4 MB is definitely a fine stopping point. But the
          | nerd inside me kicked in and wanted to know what "exactly"
          | is required to run ugit. It was a fun experience.
        
       | osigurdson wrote:
       | > or maybe ends up sponsoring...
       | 
       | Sponsorship for a 500 line shell script. Wow!
        
         | Cu3PO42 wrote:
          | They aren't asking for sponsorship on the tool they created.
          | They expressed that they do not have an interest in
          | investing even more work to rewrite it in Rust, Go, or what
          | have you, unless someone paid them to do it. And I think
          | that is completely fair!
         | 
          | If someone has no inherent interest in doing something, is
          | not otherwise obligated to do it, and it is not done as a
          | favor to friends or something, paying that person to do the
          | job anyway is a very accepted practice in our society.
          | Almost all of our employers pay us to do things we might
          | otherwise not do.
        
           | osigurdson wrote:
           | There is nothing wrong with extracting as much value as
           | possible from a small effort. It just seems highly unlikely
           | anyone would sponsor it so the request seems somewhat
           | ridiculous.
           | 
            | alias lsa='ls -a'
           | 
           | Sponsor me!
        
             | raziel2p wrote:
             | Is someone currently, for free, doing the work the author
             | is suggesting sponsorship for? If not, it's not ridiculous.
        
               | osigurdson wrote:
               | Will you sponsor the maintenance of my alias command
               | above then? I don't think anyone else is maintaining such
               | a command. Or, is it ridiculous?
        
       | SOLAR_FIELDS wrote:
       | Dive is a great tool for debugging this. I like image reduction
       | work just because it gives me a chance to play with Dive:
       | https://github.com/wagoodman/dive
       | 
        | One easy piece of low-hanging fruit I see a LOT for ballooning
        | image sizes is people including the kitchen-sink SDK/CLI for
        | their cloud provider (like AWS or GCP), when they really only
        | need 1/100th of it. The full versions of both of those tools
        | are several hundred MB each.
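For reference, a typical inspection pass with dive might look like this (the image name is a placeholder):

```shell
# Where did the megabytes go? Layer sizes first, then a per-layer
# file tree with wasted-space analysis. Image name is a placeholder.
docker history --format 'table {{.CreatedBy}}\t{{.Size}}' my/image:latest

# Interactive, per-layer exploration of the filesystem:
dive my/image:latest

# Or gate a CI build on image efficiency (flags per dive's docs):
dive --ci --lowestEfficiency=0.9 my/image:latest
```

The `--ci` mode is handy for catching a kitchen-sink SDK sneaking into an image before it ships.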
        
         | bloopernova wrote:
         | Do you have a link to a recommended guide to slimming down the
         | cloud provider tools?
        
         | bhupesh wrote:
          | Can vouch for dive; the final system tree in the post was
          | generated by dive (I should have acknowledged it, my bad).
        
       | tuananh wrote:
        | ugh, i would hate to maintain this dockerfile. i actually
        | don't mind a 34MB docker image vs a 17MB image like this
        
       | mhitza wrote:
       | I didn't see it in the final tree listing, but I would expect the
       | fzf.tar.gz to linger around after extraction as it was never
       | removed. If that is so, should help squeeze a few more bytes out
       | of the final image.
        
         | tuananh wrote:
         | it's multi-stage build. they only copy fzf bin to the final
         | image (scratch)
        
       | c0l0 wrote:
        | [...]
        | COPY --from=ugit-ops /usr/bin/tr /usr/bin/tr
        | COPY --from=ugit-ops /bin/bash /bin/
        | COPY --from=ugit-ops /bin/sh /bin/
        | 
        | # copy lib files
        | COPY --from=ugit-ops /usr/lib/libncursesw.so.6 /usr/lib/
        | COPY --from=ugit-ops /usr/lib/libncursesw.so.6.4 /usr/lib/
        | COPY --from=ugit-ops /usr/lib/libpcre* /usr/lib/
        | COPY --from=ugit-ops /usr/lib/libreadline* /usr/lib/
        | [...]
       | 
       | For me, insane sh*t like this proves that those who do not learn
       | from distribution and package management infrastructure
       | engineering history are condemned to reinvent it, poorly.
        
         | bhupesh wrote:
         | Hey author here.
         | 
         | I understand that you might have some context about package
         | managers that I am missing. Would genuinely like some resources
         | about your comment or maybe a bit of explanation.
         | 
         | Thanks
        
           | c0l0 wrote:
           | Hey there Bhupesh - apologies for the snark! I was just
           | venting some of the frustration I feel every day with modern
           | "devops" tooling ;)
           | 
            | I am in a bit of a rush right now (which is why I try my
            | absolute best to keep my procrastinating on HN to the
            | absolute minimum, I swear! ;)), but I will try to share
            | some insight later (potentially as a comment on your
            | blog).
        
             | bhupesh wrote:
             | Thanks, appreciate the help!
        
             | codethief wrote:
             | I'd be interested in this, too, so I'd be grateful if you
             | could notify us here, wherever you end up posting your
             | comment!
        
           | tiziano88 wrote:
           | It may be worth looking at Nix if you haven't already
        
           | gbN025tt2Z1E2E4 wrote:
           | I explained a bit here in my reply to your other comment:
           | 
           | https://news.ycombinator.com/item?id=39243450
        
       | chasil wrote:
       | I have been able to run ksh93 in an nspawn container under
       | systemd in a tiny fraction of what is presented here.
       | 
       | I did this by tracking the output of the ldd command and moving
       | only needed libraries into the container.
       | 
       | Why is docker so big?
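The ldd-based approach described above can be sketched like this; it is a rough sketch, and ldd's output format differs slightly between glibc and musl:

```shell
# copy_deps: copy a binary plus the shared libraries it links
# against into a target root, preserving their paths.
set -eu

copy_deps() {
  bin="$1"; root="$2"
  mkdir -p "$root$(dirname "$bin")"
  cp "$bin" "$root$bin"
  # Pick the resolved paths out of ldd's output: either
  # "name => /path (addr)" or a bare "/path (addr)" line
  # for the dynamic loader itself.
  ldd "$bin" | awk '$2 == "=>" && $3 ~ /^\// { print $3 }
                    $1 ~ /^\// { print $1 }' |
  while read -r lib; do
    mkdir -p "$root$(dirname "$lib")"
    cp "$lib" "$root$lib"
  done
}

copy_deps /bin/sh /tmp/minroot
ls -R /tmp/minroot
```

The resulting tree contains only the shell and the handful of libraries it actually loads, which is why an nspawn container built this way can be a tiny fraction of a stock image.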
        
       | renewiltord wrote:
       | How does removing the shebang save two megabytes? Seems like a
       | lot. Is it the env binary?
        
         | bhupesh wrote:
          | Yes, the size of env is close to 2 MB. I may be wrong here,
          | though; something seems off.
          | 
          | I wasn't able to dig deep enough into why that was the case,
          | considering the "env" utility was coming from busybox,
          | which on copy averages close to 900 KB.
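A quick way to check what is actually behind env in a given image, sketched as generic shell (on busybox-based images it is usually a symlink into the busybox multi-call binary):

```shell
# Locate env, show its on-disk size (following symlinks), and see
# what file it ultimately resolves to.
envpath="$(command -v env)"
ls -lL "$envpath"
readlink -f "$envpath"
```

If `readlink -f` points at busybox, the "2 MB" likely belongs to the whole multi-call binary rather than env alone.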
        
       | zdw wrote:
       | Are "Random shell scripts from the internet" categorically worse
       | than "random docker images from the internet"?
       | 
       | With the shell script, you can literally read it in an editor to
       | make sure it isn't doing anything that weird. A single pass
       | through shellcheck would likely tell you if it's doing anything
       | that is too weird/wrong in terms of structure.
       | 
       | Auditing a docker container is way more difficult/complex.
       | 
        | "Dockerize all the things", especially in cases where the
        | prereqs aren't too weird, seems like it wastes space and is
        | also harder to maintain - if any of the included components
        | gets a security patch, it's rebuild-the-container time...
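The audit path being argued for here is short; roughly (the URL and filenames are placeholders):

```shell
# Fetch, read, lint, and only then run.
curl -fsSL https://example.com/ugit.sh -o ugit.sh
less ugit.sh          # actually read it in an editor/pager
shellcheck ugit.sh    # flag structural/quoting footguns
sh ugit.sh
```

There is no equivalent four-line pass for a container image: you would have to inspect the Dockerfile, its base image, and everything it pulls in.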
        
         | galleywest200 wrote:
         | Reading the Dockerfile should tell you what was done to create
         | the image. If you have trust issues around the "base" images
         | such as Debian or Fedora that is a different set of inquiries.
         | 
          | As for patching, you can tell your Dockerfile to always pull
          | the latest versions of the items you are most concerned
          | about. At that point, rebuilding the container is as simple
          | as deleting it with "docker container stop <id> && docker
          | container rm <id>" and then running your docker-compose
          | command again.
        
           | zdw wrote:
           | Does anyone read/diff the build commands every time they get
           | a new `latest` docker image?
           | 
           | There would already be implicit trust in whatever the local
           | OS's package manager laid down, and trying to add another set
           | of hard to audit binaries on top is not really an
           | improvement.
        
         | photonthug wrote:
         | > Are "Random shell scripts from the internet" categorically
         | worse than "random docker images from the internet"?
         | 
         | Yes, because inspection aside, at least with a docker
         | invocation you can specify the volumes
        
           | zdw wrote:
           | Does anyone in practical invocation specify the volumes?
           | 
           | Or would they wrap it in _yet another shell script that calls
           | docker with a set of options_ , or a compose file, etc?
           | 
           | This quickly turns into complexity stacked on complexity...
        
             | yjftsjthsd-h wrote:
             | > Does anyone in practical invocation specify the volumes?
             | 
             | First: yes, I have run docker with -v recently.
             | 
             | Second:
             | 
             | > Or would they wrap it in yet another shell script that
             | calls docker with a set of options, or a compose file, etc?
             | 
             | > This quickly turns into complexity stacked on
             | complexity...
             | 
              | I agree that it can get out of hand, but a Dockerfile, a
              | compose file, and whatever is going inside the container
              | can be an entirely reasonable set of files to have, so
              | long as you stick with that and are reasonable about
              | what goes in each. Or, to put it differently, I think
              | it's okay because they actually are a separation of
              | concerns.
        
             | msm_ wrote:
              | Yes, I run:
              | 
              | sudo docker run -it -v "$(pwd):$(pwd)" my_dev_image
              | 
              | many times every day, to create a development
              | environment in CWD. my_dev_image is a Debian-based image
              | with common developer utilities (pip, npm, common
              | packages) installed. I don't feel comfortable installing
              | random packages from the internet on my host machine, so
              | I use docker for everything.
        
           | nopurpose wrote:
           | https://github.com/containers/bubblewrap allows specifying
           | volumes for scripts too
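For comparison, a bubblewrap invocation that gives a plain script the same "only what you mount" property might look like this (flags per bwrap's documentation; the script name is a placeholder):

```shell
# Read-only system directories, a private /tmp, and only the
# current project directory writable; no other host files are
# visible to the script.
bwrap --ro-bind /usr /usr \
      --symlink usr/lib /lib \
      --symlink usr/lib64 /lib64 \
      --symlink usr/bin /bin \
      --proc /proc --dev /dev --tmpfs /tmp \
      --bind "$PWD" "$PWD" --chdir "$PWD" \
      --unshare-all \
      sh ./ugit.sh
```

That is the volume-scoping argument from upthread, but without pulling an image at all.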
        
         | amcpu wrote:
         | The dive utility helps tremendously for exploring the
         | filesystem contents of a container image. Combine that with the
         | output of `docker inspect` to look at the metadata and you
         | should be able to have a good understanding of what it will do
         | when running as a container.
        
           | zdw wrote:
           | Evaluating the whole contents of a filesystem is
           | significantly more complex than evaluating one shell script.
        
         | sigotirandolas wrote:
         | A script running in a container is mostly isolated from the
         | host by default, so it can't just upload whatever SSH keys /
         | Bitcoin wallets / other stuff you have lying around or add some
         | payload on your ~/.bashrc unless you explicit share those files
         | with the container.
        
           | zdw wrote:
           | Yes, I understand https://xkcd.com/1200/ as well.
           | 
            | Running _anything_ without understanding what it does is
            | more dangerous than trying to understand it before
            | running it.
           | 
           | I'm arguing for _less complexity and easier auditing_ ,
           | instead of a series of complex layers that each add to a
           | security story, but make the overall result much harder to
           | audit.
        
             | eropple wrote:
             | To move directionally in the way you describe, you probably
             | have to make the user experience of running scripts of any
             | kind _much weirder_. macOS does this to some extent by
             | prompting via GUI if something tries to access data
             | directories on your system (though it confuses iTerm2 for
             | "anything iTerm2 runs" and that sucks), but I think people
             | would have a lot more problems with trying to do that in a
             | server shell.
             | 
             | To that end, Linux namespacing is probably a better way to
             | constrain the blast radius for most people. That's not to
             | say it should be an _either-or_ , but in the absence of a
             | _both-and_ because the userland is not set up for
             | sufficient policing, I think Docker containers are a pretty
             | clearly better solution.
        
           | ReleaseCandidat wrote:
           | This is true, but we are talking about running this script on
           | some codebase (or whatever you want to "git undo"). I mean "I
           | don't trust this script, but let's run it on our source code"
           | sounds a bit weird.
        
             | sigotirandolas wrote:
             | I agree, in this case it's hard to defend against a rogue
             | script or container image, as you need to give it read-
             | write access to your source code, so it could add a
             | malicious payload to your source code or install a Git hook
             | to break out of the container into your host or get some
             | malicious source code onto your company's Git server.
             | 
              | There are measures that could defend against this (run
              | all your development tools inside containers, and
              | mandatory PRs with reviews), but they are probably more
              | than most developers are willing to do security-wise.
             | 
             | There are a lot of scenarios where I think security through
             | isolation/containerization makes a lot of sense (e.g. for
             | code analysis tools, end-user applications like video
             | games, browsers, etc.) but not too much for this particular
             | one.
        
         | swozey wrote:
          | If you want an example of how little importance most
          | ops/infra teams place on vetting OCI images, I have a great
          | one. I used to work on low-level k8s multitenant networking
          | stuff, think CDNs. Most of them use something like multus to
          | split up vfio paths between tenants. Think chopping your NIC
          | into 24 private channels, where each channel is one
          | customer. The ENTIRE path has to be private: the container
          | starts and claims that network path on the physical NIC,
          | and no network packet can ever be accessed by another
          | channel, server, or container. I was alpha-testing multus,
          | which controls the network pathing that every customer
          | would take ingress and egress out of a cluster, and put up
          | some test containers on Dockerhub.
         | 
         | Multus sits at the demarc line between the container and the
         | NIC channel. I'm not saying it's possible or ever been done but
         | if I were going to set up a traffic mirror somewhere it'd
         | logically have to be there or after the NIC..
         | 
         | I wrote it 5 years ago. I have no idea what version of multus
         | it's running but even today it's getting pulls, last pull 19
         | days ago. Overall pulls over 5 years is over 10k.
         | 
         | These containers would spin up every time a container starts on
         | k8s that attaches an ovf interface. So, it's pretty much
         | guaranteed that this is in use somewhere in someones scaling
         | infra. I don't know if I SHOULD delete the image and
         | potentially take down someones infra or just let them keep
         | chugging at it. I'm not paying for dockerhub.
         | 
         | https://hub.docker.com/repository/docker/swozey/multus/gener...
         | 
         | edit: Looks like it's installing the latest multus package so
         | not AS terrible but .. multus is not something to play loose
         | with versioning..
         | 
         | Also I really wish Dockerhub gave you more stats/analytics. It
         | really means nothing in the end but I'm curious. They don't
         | even tell you the number beyond 10k, it just says 10k+
         | downloads.
         | 
         | https://github.com/k8snetworkplumbingwg/multus-cni
        
           | buffet_overflow wrote:
           | Something like this would show up in perimeter
           | network/firewall logs correct? But if someone was mirroring
           | traffic to the same cloud provider you deploy in, it would be
           | less obvious to find out _which_ set of cloud IPs aren't
           | actually your own.
        
             | tryauuum wrote:
             | assuming you have both perimeter logs and a system which
             | notifies a human if something is weird in logs.
             | 
             | Do big clouds have a solution for this? I don't usually use
             | GCP / AWS so I don't know what they have
        
         | beeboobaa wrote:
         | > Auditing a docker container is way more difficult/complex.
         | 
         | I assume you mean auditing docker images. In which case, sure.
         | That's why you grab their dockerfile and build it yourself.
         | 
         | Though using dive[1] it's pretty easy to inspect docker images
         | too, as long as they extend a base image you trust.
         | 
         | [1] https://github.com/wagoodman/dive
        
           | iforgotpassword wrote:
           | > That's why you grab their dockerfile and build it yourself.
           | 
            | Then you still didn't audit anything. What you need to do
            | is inspect the Dockerfile, follow everything it pulls in
            | and audit that, and finally audit the script itself that
            | the whole container gets built for in the first place.
            | Whereas when you just download the script and run that
            | directly, you only need to do the last step.
        
             | beeboobaa wrote:
             | All of that is the same as a shell script, yes. A
             | dockerfile is essentially just a glorified shell script
             | installing dependencies, which you'd otherwise just be
             | doing yourself.
        
           | agumonkey wrote:
           | oh dang, dive is really a nice tool, per layer diff and/or
           | accumulated changes .. really nice
        
         | 2OEH8eoCRo0 wrote:
          | I never use containers from the web unless they're created
          | by the company or developer themselves. If they don't
          | produce one, then I build my own.
        
       | zilti wrote:
       | Whenever I think there can't be any worse of a "use case" to
       | dockerize something, someone comes along and proves me wrong...
       | 
       | For the last goddamn time: Docker is not a package manager!
        
       | codethief wrote:
       | > In the Alpine ecosystem, it is generally not advised to pin
       | minimum versions of packages.
       | 
        | I think it would be more accurate to say: in the Alpine
        | ecosystem, it is generally not advised to pin versions of
        | packages _at all_. Actually, this is not so much a
        | recommendation as it is a statement of impossibility: You
        | can't pin package versions (without your Docker builds
        | starting to fail in a week or two), period. In other words:
        | Don't use Alpine if you want reproducible (easily cacheable)
        | Docker builds.
       | 
       | I had to learn this the hard way:
       | 
       | - There is no way to pin the apk package sources ("cache"), like
       | you can on Debian (snapshot.debian.org) and Ubuntu
       | (snapshot.ubuntu.com). The package cache tarball that apk
       | downloads will disappear from pkgs.alpinelinux.org again in a few
       | weeks.
       | 
       | - Even if you managed to pin the sources (e.g. by committing the
       | tarball to git as opposed to pinning its URL), or if you decided
       | to pin the package versions individually, package versions that
       | are up-to-date today will likely disappear from
       | pkgs.alpinelinux.org in a few weeks.
       | 
       | - Many images that build upon Alpine (e.g. nginx) don't pin the
       | base image's patch version, so you get another source of entropy
       | in your builds from that alone.
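The failure mode looks roughly like this (the version string is hypothetical, purely for illustration):

```dockerfile
# Works on the day you write it...
FROM alpine:3.18
RUN apk add --no-cache git=2.40.1-r0  # hypothetical version string

# ...then starts failing a few weeks later with an apk error such as
# "unable to select packages", because the stable repository index
# only keeps the latest build of each package, and the pinned
# version has been replaced.
```

Loosening to `git=~2.40` doesn't fully help either, since the patch builds themselves disappear from the index.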
       | 
       | Personally, I'm very excited about snapshot images like
       | https://hub.docker.com/r/debian/snapshot where all package
       | versions and the package sources are pinned. All I, as the
       | downstream consumer, will have to do in order to stay up-to-date
       | (and patch upstream vulnerabilities) is bump the snapshot date
       | string on a regular basis.
       | 
       | Unfortunately, the images don't seem quite ready for consumption
       | yet (they are only published once a month) but see the discussion
       | on https://github.com/docker-library/official-images/issues/160...
       | for a promising step in this direction.
        
         | bhupesh wrote:
         | > I think it would be more accurate to say, in the Alpine
         | ecosystem, it is generally not advised to pin versions of
         | packages at all. Actually, this is not so much a recommendation
         | as it is a statement of impossibility: You can't pin package
         | versions (without your Docker builds starting to fail in a week
         | or two), period. In other words: Don't use Alpine if you want
         | reproducible (easily cacheable) Docker builds.
         | 
         | Agreed, should have been clear with my sentiment there. Thanks
         | for stating this :)
         | 
         | > Personally, I'm very excited about snapshot images like
         | https://hub.docker.com/r/debian/snapshot where all package
         | versions and the package sources are pinned. All I, as the
         | downstream consumer, will have to do in order to stay up-to-
         | date (and patch upstream vulnerabilities) is bump the snapshot
         | date string on a regular basis.
         | 
         | This is really helpful, thanks for sharing. Looks like it will
         | be a good change, fingers crossed.
        
       | Cu3PO42 wrote:
       | While I likely would not have made the same tradeoffs, I do
       | relate to the desire to get the image as small as reasonably
       | possible and commend the efforts. Going to "FROM scratch" is
       | likely going to get you one of the best results possible before
       | you start patching the application and switching out components.
       | 
       | I find it mildly ironic, however, that bundling the dependencies
       | of a shell script is - in some ways - the exact opposite of
       | saving space, even if it is likely to make running your script
       | more convenient.
       | 
       | Unfortunately, I don't have a great alternative to offer. The
       | obvious approach is to either let the users handle dependencies
       | (which you can also do with ugit) or write package definitions
       | for every major distribution. And if I were the author, I
       | wouldn't want to do that for a small side project either.
        
         | yjftsjthsd-h wrote:
         | > Unfortunately, I don't have a great alternative to offer. The
         | obvious approach is to either let the users handle dependencies
         | (which you can also do with ugit) or write package definitions
         | for every major distribution. And if I were the author, I
         | wouldn't want to do that for a small side project either.
         | 
         | Well... There's nix. Complete packaging system, fully
         | deterministic results, lots of features, huge number of
         | existing packages to draw from, works on your choice of Linux
         | distro as well as Darwin and WSL. All at the tiny cost of a
         | little bit of your sanity and being its own very deep rabbit
         | hole.
        
           | Cu3PO42 wrote:
           | I do love Nix, and I think many more people should use it,
           | but I don't really consider that a good alternative in the
           | context of my original comment.
           | 
           | I'd argue writing a Nix derivation isn't that different from
           | writing a package definition for any one Linux distribution.
           | It solves the distribution problem for people who use that
           | particular distribution/tool, not everyone. Now, Nix can be
           | installed on any distribution, but if I was going for
           | widespread adoption, I might point to Nix being a solution,
           | but I probably wouldn't advertise it as the main one.
        
       | k__ wrote:
       | When FirecrackerOS?!
       | 
       | Fly.io, deliver us.
        
       | codethief wrote:
       | Does anyone here have experience using Nix to build minimal
       | Docker images? How well does it work, and how does it compare to
       | the author's approach of manually copying shared libraries into a
       | scratch image?
        
         | SirensOfTitan wrote:
         | It works quite well and you can get very minimal docker images
         | using nix with very few tricks compared to this.
         | 
         | ...with that, building those Nix images on a Mac is still a
         | bit rough: there are official docs and work on getting a
         | builder VM set up, but it remains rough around the edges.
        
         | codethief wrote:
         | Responding to myself: I see that someone else here in this
         | thread commented on Nix:
         | https://news.ycombinator.com/item?id=39241768
        
       | ilaksh wrote:
       | How would I use this? Say I just made a bad commit in my
       | terminal. How would I run this container to fix it? The container
       | doesn't have my working directory does it? Or is that the idea,
       | to mount a volume with the working for or something?
       | 
       | In that case, maybe it could be helpful, but to make it
       | convenient, don't I need a script that stays in my main system
       | and invokes the docker run command for me?
       | 
       | So if you do that and just give me a one liner install command to
       | copy paste then I guess this actually makes sense. A small docker
       | container could eliminate a lot of potential gotchas with trying
       | to install dependencies in arbitrary environments.
       | 
       | Except it's a bash script. I guess it would make more sense to
       | get rid of the dependency on fzf or something nonstandard. Then
       | they can just install your bash script.
       | 
       | For cases where you have more dependencies that really can't be
       | eliminated then this would make more sense to me.
       | 
       | Why does it need fzf? Is it intended to run the container
       | interactively?
        
         | bhupesh wrote:
         | > How would I use this? Say I just made a bad commit in my
         | terminal. How would I run this container to fix it? The
         | container doesn't have my working directory does it? Or is that
         | the idea, to mount a volume with the working dir or something?
         | 
         | You can refer to usage guidelines on dockerhub
         | https://hub.docker.com/r/bhupeshimself/ugit
         | 
         | > So if you do that and just give me a one liner install
         | command to copy paste then I guess this actually makes sense. A
         | small docker container could eliminate a lot of potential
         | gotchas with trying to install dependencies in arbitrary
         | environments.
         | 
         | Yes, that was also an internal motivation behind doing this.
         | 
         | > Why does it need fzf? Is it intended to run the container
         | interactively?
         | 
         | fzf is required by ugit (the script) itself. I didn't want
         | to rely on CLI arguments to give users the ability to undo a
         | command per matching git command. Adding a fuzzy-search
         | utility makes it easier for people to search what they can
         | undo about "git tag", for example.
        
         | otteromkram wrote:
         | It's not that hard to undo a git commit.
         | 
         | I don't see what value the author's side project is bringing
         | other than adding complexity to a simple task (or, more likely,
         | bolstering their resume).
        
       | kjkjadksj wrote:
       | What's wrong with make, or dare I even suggest a package
       | manager like conda? I get that a half dozen dependencies can be
       | specified in tools like Docker, but it's just another way to do
       | the same old task that's been solved a dozen ways for decades.
       | We are sharing a shell script here. It seems crazy to me to run
       | an entire redundant file system to share a couple-hundred-line
       | bash script. Plus now users need Docker skills as well as
       | command-line skills to install and run this tooling. There are
       | corners of the command-line user/programmer world that have
       | thankfully not been polluted by Docker yet, so it's not nearly
       | as widespread a tool as setting up environments for bash
       | scripts using some older ways.
        
         | swozey wrote:
         | I think you're seeing this from the perspective of someone who
         | runs a container for development and not someone who has to run
         | a development container at hyperscale.
         | 
         | We can't pass around bash scripts anymore. Every system has to
         | be fungible, reproducible en masse and as agnostic to the
         | underlying technology it's on as possible.
        
           | kjkjadksj wrote:
           | You aren't writing machine code that can run on anything
           | though, you have this docker dependency in order to run the
           | container. It's just trading one dependency for another
           | because docker is in style these days. I don't think
           | deploying bash scripts at scale was some insurmountable
           | challenge before docker showed up.
        
             | swozey wrote:
             | We don't have a "docker dependency" - we run OCI
             | containers. You're equating Docker, which is a tooling
             | ecosystem, with containers.
             | 
             | Containers have been around for a LONG time, Solaris,
             | jails, cgroups, etc are all _built-in_ to the kernels we
             | use today.
             | 
             | You don't need to use docker.
             | 
             | The idea is fungible services, whether it's literally just
             | a container that starts with a go binary I can quickly
             | scale 1000s of COMPLETELY independent processes and
             | ORCHESTRATE THEM over thousands of clusters from one
             | centralized system.
             | 
             | If I need to shift 1000s of that one go binary to
             | US-WEST-1 because US-EAST-1 is down, I can automate it
             | or run one command based on a Kubernetes tag label and
             | shift traffic.
             | 
             | These are just a few of the massive benefits we get with
             | containers.
             | 
             | I can deploy an ENTIRE datacenter with a YAML file. My
             | ENTIRE company's infrastructure MTTR (mean time to
             | recovery)
             | from a total outage, starting from a github repo is less
             | than 35 minutes and we're a billion dollar company and 80%
             | of that time is starting load balancers and clusters. The
             | only NOT agnostic hardware stuff in any of this are the
             | load balancers and network related things as each provider
             | has its own apis, IAM/Policies, etc that are completely
             | unique between providers/datacenters. Nothing cares about
             | what ram, distro, cpu or anything else is being used, we
             | can deploy anywhere ARM or x86.
             | 
             | Without containers I would need a $150k F5 load balancer to
             | distribute load between a ton of $30k Dell PowerEdges (and
             | I'd need this x1000's).
             | 
             | I've been in Infrastructure for 15+ years at massive scale,
             | webhosts, cdns, I do NOT want to go back to not using
             | containers ever. None of my team writes any non container
             | code or infra. The FIRST thing we do in every single repo
             | is make a dockerfile and docker-compose.yml to easily work
             | on things and every single server any company has in the
             | last decade of my SRE career we've migrated to containers
             | and never once regretted it.
        
       | swozey wrote:
       | I've been writing containers for 10+ years, and these last few
       | years I've started using supervisord as PID 1 to manage
       | multiple processes inside the container, for various things
       | that CAN'T function as disparate microservices in the event
       | that one fails/is updated/etc.
       | 
       | And man I love it. It's totally against the twelve-factor
       | microservice rules and should NOT be done in most cases, but
       | when it comes to
       | troubleshooting, I can exec into a container anywhere and
       | restart services, because supervisord sits there monitoring
       | for the service (say mysql) to exit and will immediately
       | restart it. And
       | because supervisor is pid1 as long as that never dies your
       | container doesn't die. You get the benefit of the
       | containerization and servers without the pain of both, like
       | having to re-image/snapshot a server once you've thoroughly
       | broken it enough vs restarting a container. I can sit there for
       | hours editing .conf files trying to get something to work without
       | ever touching my dockerfile/scripts or restarting a container.
       | 
       | I don't have to make some changes, update the
       | entrypoint/dockerfile, push build out, get new image, deploy
       | image, exec in..
       | 
       | I can sit there and restart mysql, postgres, redis, zookeeper, as
       | much as I want until I figure out what I need done in one go and
       | then update my scripts/dockerfiles THEN prepare the actual
       | production infra where it is split into microservices for
       | reliability and scaling, etc.
       | 
       | I've written a ton of these for our QA teams so they can hop into
       | one container and test/break/qa/upgrade/downgrade everything
       | super quick. Doesn't give you FULL e2e, but it's not like we'd
       | stop doing the tests we already do now.
       | 
       | I mention this because it was something I did once, a long
       | long time ago, but completely forgot was something you could
       | do until I recently went that route, and it really does have
       | some useful scenarios.
       | 
       | https://gdevillele.github.io/engine/admin/using_supervisord/
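       | 
       | The shape of that setup is roughly this (a hypothetical
       | sketch, not the linked guide's exact files; stack.conf would
       | list each service with autorestart=true):

```dockerfile
# Hypothetical sketch: supervisord runs in the foreground as PID 1,
# so killing or restarting an individual service (mysql, redis, ...)
# inside the container never tears down the container itself.
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y supervisor mysql-server redis-server
COPY supervisord.conf /etc/supervisor/conf.d/stack.conf
# -n keeps supervisord in the foreground as the container's PID 1
CMD ["supervisord", "-n"]
```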
       | 
       | I'm also really tired of super tiny containers that are absolute
       | nightmares to troubleshoot when you need to. I work on prod infra
       | so I need to get something online immediately when a fire is
       | happening and having to call debug containers or manually install
       | packages to troubleshoot things is such a show stopper. I know
       | they're "attack vectors" but I have a vetted list of aliases,
       | bash profiles and troubleshooting tools like jq mtr etc that are
       | installed in every non-scratch container. My containers are all
       | standardized and have the exact same tools, logs, paths, etc. so
       | that everyone hopping into one knows what they can do.
       | 
       | If you're migrating your architecture to ARM64 those containers
       | spin up SO fast that the extra 150-200mb of packages to have a
       | sane system to work on when you have a fire burning under you is
       | worth it. For some scale the cross datacenter/cluster/region
       | image replication would be problematic but you SHOULD have a
       | container caching proxy in front of EVERY cluster anyway. Or at
       | least at the datacenter/rack. It could be a container ON your
       | clusters, with its storage volume on a single Ceph cluster, etc.
        
       | adrianmonk wrote:
       | > _The use of env is considered a good practice when writing
       | shell scripts, used to tell the OS which shell interpreter to use
       | to run the script_
       | 
       | When using a shebang line, the reason for 'env' is actually
       | something different.
       | 
       | You can just leave out 'env' and do a shebang with 'bash'
       | directly like this:                   #! /usr/bin/bash
       | 
       | But the problem with that is portability. On different systems,
       | the correct path may be /bin/bash or /usr/bin/bash. Or more
       | unusual places like /usr/local/bin/bash. On old Solaris systems
       | that came with ksh, bash might be somewhere under /opt with all
       | the other optional software.
       | 
       | But 'env' is at /usr/bin/env on most systems, and it will search
       | $PATH to find bash for you, wherever it is.
       | 
       | If you're defining a Docker container, presumably you know
       | exactly where bash is going to be, so you can just put that path
       | on the shebang line.
       | 
       | TLDR: You don't have to have a shebang, but you can have a
       | shebang at no cost because _your_ shebang doesn't need an env.
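       | 
       | A quick way to see env's $PATH search in action (assuming bash
       | is installed somewhere on $PATH):

```shell
# Write a script whose shebang resolves bash via env, then run it.
# env searches $PATH at run time, so the script works whether bash
# lives in /bin, /usr/bin, or /usr/local/bin.
cat > portable.sh <<'EOF'
#!/usr/bin/env bash
echo "found bash at: $(command -v bash)"
EOF
chmod +x portable.sh
./portable.sh
```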
        
       | hitpointdrew wrote:
       | Dockerizing a shell script????
       | 
       | Unless your tool is converted to a service, how would anyone ever
       | use this? Do you expect them to run their project inside of your
       | container?
       | 
       | This is very bizarre.
        
         | oftenwrong wrote:
         | It's quite typical. You `docker run`, and specify the options
         | to mount the work tree of the project into the container.
        
       | avgcorrection wrote:
       | > Yeah, I know, I know. REWRITE IT IN GO/RUST/MAGICLANG. The
       | script is now more than 500+ lines of bash.
       | 
       | These screeds get more and more random.
       | 
       | The standard advice was always to just not let a program in Bash
       | get beyond X lines. Then move to a real programming language.
       | Like Python (est. 1991).
        
       | citruscomputing wrote:
       | This is neat :)
       | 
       | I love going and making containers smaller and faster to build.
       | 
       | I don't know if it's useful for alpine, but adding a
       | --mount=type=cache argument to the RUN command that `apk add`s
       | might shave a few seconds off rebuilds. Probably not worth it, in
       | your case, unless you're invalidating the cached layer often
       | (adding or removing deps, intentionally building without layer
       | caching to ensure you have the latest packages).
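       | 
       | Something like this fragment (hypothetical; note that apk's
       | cache dir has to actually be used, hence the /etc/apk/cache
       | symlink, and --no-cache is dropped):

```dockerfile
# syntax=docker/dockerfile:1
FROM alpine:3.19
# BuildKit persists /var/cache/apk across builds; the symlink tells
# apk to keep downloaded packages there instead of discarding them.
RUN --mount=type=cache,target=/var/cache/apk \
    ln -s /var/cache/apk /etc/apk/cache \
 && apk add bash fzf
```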
       | 
       | Hadolint is another tool worth checking out if you like spending
       | time messing with Dockerfiles:
       | https://github.com/hadolint/hadolint
        
       | nunez wrote:
       | I love reducing Docker images to their smallest forms. It's great
       | for security (minimizes the bill of materials and makes it easier
       | to update at-risk libraries and such), makes developers really
       | think about what their application absolutely needs to do what it
       | needs to do (again, great for security), and greatly improves
       | startup performance (because they are smaller).
       | 
       | We can definitely go smaller than 20MB and six layers.
       | 
       | Here's a solution that compresses everything into a single 8.7MB
       | layer using tar and an intermediate staging stage:
       | https://gist.github.com/carlosonunez/b6af15062661bf9dfcb8688...
       | 
       | Remember, every layer needs to be pulled individually and Docker
       | will only pull a handful of layers at a time. Having everything
       | in a single layer takes advantage of TCP scaling windows to
       | receive the file as quickly as the pipe can send it (and you can
       | receive it) and requires only one TCP session handshake instead
       | of _n_ of them. This is important when working within low-
       | bandwidth or flappy networks.
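       | 
       | One variant of the staging trick looks roughly like this (a
       | hypothetical sketch; the gist compresses with tar, but the
       | effect is the same: a single final layer):

```dockerfile
# Hypothetical sketch of the single-layer pattern: assemble the full
# root filesystem in a staging stage, then copy it into scratch with
# a single COPY, which yields exactly one layer.
FROM alpine:3.19 AS staging
RUN apk add --no-cache bash fzf \
 && mkdir /staging \
 && cp -a /bin /lib /usr /staging/
COPY ugit /staging/usr/local/bin/ugit

FROM scratch
# the image's only layer
COPY --from=staging /staging/ /
ENTRYPOINT ["/usr/local/bin/ugit"]
```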
       | 
       | That said, in a real-world scenario where I care about
       | readability and maintainability, I'd either write this in Go with
       | gzip-tar compression in the middle (a single statically-
       | compiled binary for the win!) or I'd just use Busybox (~5MB
       | base image)
       | and copy what's missing into it since that base image ships with
       | libc.
        
       ___________________________________________________________________
       (page generated 2024-02-03 23:00 UTC)