[HN Gopher] Docker in Production: A History of Failure (2016)
       ___________________________________________________________________
        
       Docker in Production: A History of Failure (2016)
        
       Author : flanfly
       Score  : 85 points
       Date   : 2021-07-27 15:21 UTC (7 hours ago)
        
 (HTM) web link (thehftguy.com)
 (TXT) w3m dump (thehftguy.com)
        
       | debarshri wrote:
        | It was the year 2016: Kubernetes did not have Jobs, CronJobs or
        | StatefulSets. Pods would get stuck in the Terminating or
        | ContainerCreating state. Networking in Kubernetes was wonky. AWS
        | did not have support for EKS. It used to be painful.
        | 
        | It is the year 2021: thousands of new startups around Kubernetes,
        | more features, more resource types. Pods still get stuck in the
        | Terminating or ContainerCreating state. It's still pretty
        | painful.
        
         | HelloNurse wrote:
         | > Networking in kubernetes was wonky.
         | 
         | Can you elaborate? Does Kubernetes add some thrills to the
         | relatively simple Docker network configuration?
        
           | krab wrote:
            | Yes, Kubernetes (actually its add-ons) provides a virtual
            | network that unifies communication within the cluster, so you
            | don't need to care which node your service runs on.
        
           | jrockway wrote:
           | You can make it as complicated as you want it to be; part of
           | setting up a cluster is picking the networking system
           | ("CNI"). Cloud providers often have their own IPAM (i.e. on
           | Amazon, you get this:
           | https://docs.aws.amazon.com/eks/latest/userguide/pod-
            | network... -- each Pod gets an IP from your VPC, resulting in
            | weird limits like 17 pods per instance, because that's how
            | many IP addresses that particular instance type can have
            | across EC2).
        
           | kaidon wrote:
           | Please take a seat and let the joys of k8s networking
           | overwhelm your senses:
           | 
           | https://kubernetes.io/docs/concepts/cluster-
           | administration/n...
           | 
           | And yes... Kubernetes network configuration is on a whole
           | different level from docker networking.
        
             | theptip wrote:
             | To be fair, multi-node networking of any sort is on a
             | different level than single-host docker networking.
             | 
             | If you ever tried to use Docker Swarm to network multiple
             | nodes, god help you.
             | 
              | Also worth noting that almost all users of K8s don't
              | actually need to operate a cluster; the hosted offerings
              | handle all of that for you. You just need to understand the
              | Service object, and maybe Ingress if you're trying to do
              | some more advanced cert management or API gateway stuff.
              | 
              | It's a common meme around here to point in horror at the
              | complexity that is abstracted away under the K8s cluster
              | API and claim that k8s is really hard to use. I think
              | that's mostly misguided; the hosted offerings like GKE
              | really do a good job of hiding all that complexity from
              | you.
             | 
             | Honestly I think that it's defensible to say that the k8s
             | networking model is in most cases _simpler_ than what you'd
             | end up configuring in AWS / GCP to route traffic from the
             | internet to multiple VM nodes.
        
               | kazen44 wrote:
               | > Honestly I think that it's defensible to say that the
               | k8s networking model is in most cases _simpler_ than what
               | you'd end up configuring in AWS / GCP to route traffic
               | from the internet to multiple VM nodes.
               | 
               | How is routing from the internet to multiple servers a
               | problem?
               | 
                | Usually you have one of these setups:
                | 
                | - You run a load balancer that distributes traffic across
                | your nodes. (This load balancer could even be distributed
                | thanks to BGP.)
                | 
                | - You run your own firewall or have a managed one, in
                | which case you either announce your IP prefix yourself or
                | it is announced for you by your uplink provider.
                | 
                | - You run an anycast setup (for example, for globally
                | distributed DNS) and announce the same prefix from
                | multiple places across the globe. Routing in the DFZ does
                | the rest for you.
                | 
                | Stretched L2 across the globe/internet is also possible
                | (although not very performant), either by doing IPsec
                | tunneling or by buying/setting up L2VPN services (either
                | MPLS or VXLAN based).
        
               | theptip wrote:
               | I didn't say it was a problem. My claim was just that
               | it's easier in GKE than in GCE/EC2.
               | 
               | I only mentioned multi-node because exposing a single VM
               | to the internet is trivial -- just give it a public IP --
               | and thus is not an apples-to-apples comparison with the
               | multi-node load balancing that you get from the entry-
               | level k8s configuration of Service > Pod < Deployment.
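                | 
                | For illustration, the entry-level setup I mean is roughly
                | this (a minimal sketch; the "web" name and nginx image
                | are placeholders):
                | 
                |     # a Deployment keeps N replicas of the pod running
                |     kubectl create deployment web --image=nginx
                |     kubectl scale deployment web --replicas=3
                | 
                |     # a Service of type LoadBalancer exposes them to the
                |     # internet; the cloud provider provisions the LB
                |     kubectl expose deployment web --port=80 --type=LoadBalancer
                | 
                |     # wait for the external IP to show up
                |     kubectl get service web --watch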
        
           | kazen44 wrote:
           | the largest issue with kubernetes networking seems to be the
           | lack of integration with modern datacenter networking
           | technology.
           | 
            | Things like VXLAN-EVPN are supported on paper, but are
            | nowhere near mature compared to offerings from normal
            | networking vendors. Heck, even the BGP support inside
            | Kubernetes is lacking, which is a great shame because it
            | creates a barrier between pods and the physical world.
            | (Getting a VXLAN VTEP mapped to a Kubernetes node is a major
            | PITA, for instance.)
           | 
           | Most major cloud providers seem to have fixed this by
           | building even more overlay networks (with the included
           | inefficiencies).
        
         | dehrmann wrote:
         | Around 2015, I was at Spotify, and we were using a container
          | orchestrator built in-house named Helios. They didn't build it
          | because Kubernetes wasn't invented there; they built it because
          | Kubernetes didn't exist yet.
        
           | lacion wrote:
            | Kubernetes was released in 2014; in late 2015 I already had a
            | cluster marked as a release candidate that was put into
            | production in early 2016.
            | 
            | So I'm guessing here that Spotify wrote their own because
            | they had a specific requirement?
            | 
            | Nomad was also in its early days in 2015.
        
             | dehrmann wrote:
             | I'm pretty sure the work started before Kubernetes was
              | released, and even then, it wasn't clear it was going to
              | be the de facto orchestrator.
        
         | jrockway wrote:
         | I haven't seen these failure modes in 2021. We do managed
         | clusters at work and have created around 100,000 of them, and
         | basically all the pods we intend to start start, and all the
         | pods we intend to kill die -- even with autoscaling that
         | provisions new instance types. Our biggest failure mode is TLS
         | certificates failing to provision through Let's Encrypt, but
         | that has nothing to do with Kubernetes (that is a layer above;
         | what we _run_ in Kubernetes).
         | 
         | EKS continues to be painful. It has gotten better over the
         | years, but it is a chore compared to GKE. I like to imagine
         | that Jeff Bezos walked into someone's office, said "you're all
         | fired if I don't have Kubersomethings in two weeks", and that's
         | what they launched.
        
           | cpach wrote:
           | You have probably thought about this already but I must admit
           | I'm curious: If you're on AWS, can you not use Certificate
           | Manager instead of Let's Encrypt?
        
             | jjoonathan wrote:
             | Certificate Manager pushes you (shoves you, really) in the
             | direction of using AWS managed services. They make
             | certificate installation/rotation really easy for their own
             | services and unnecessarily difficult for any that you
             | implement yourself.
             | 
             | (This may have changed in the last year or two, but it was
             | certainly this way when I tried it.)
        
               | bashinator wrote:
               | In my experience, it's difficult-to-impossible to use
               | AWS' certificate management and LB termination in
               | conjunction with Envoy-based networking like Istio or
               | Ambassador.
        
               | jjoonathan wrote:
               | Yeah, the AWS LB has more issues than that, too. I'm
               | pretty sure it's just nginx under the hood but they won't
               | tweak the simplest parameters for you, even if you make a
               | colossal stink, even if your company spends seven figures
                | a year. I wonder if it isn't a decade-old duct-tape-and-
                | baling-wire solution that shares the same config across
                | literally every customer or something. Rolling our own
                | was almost a relief -- the pile of awkward workarounds
                | had grown pretty high by the time we bit the bullet.
        
             | jrockway wrote:
             | I'm actually not on AWS, just used EKS extensively at my
             | last job (and we still manually test our software against
             | it).
             | 
             | AWS burned me hard with forgetting to auto-renew certs at
             | my last job. It just stopped working, the deadline passed,
             | and only a support ticket and manual hacking on their side
             | could make it work. cert-manager has been significantly
             | more reliable and at least transparent. The mistake we make
             | right now is asking for certificates on demand in the
             | critical path of running our app -- but since we control
             | the domain name, we could easily have a pool of domain
             | names and certificates ready to go. Our mistake is having
             | not done that yet.
        
           | shaicoleman wrote:
           | What are the EKS pain points?
        
             | jrockway wrote:
             | The biggest pain point is having to manually use
             | cloudformation to create node pools. This is especially
             | irritating when you just need to roll the Linux version on
             | nodes -- takes half a day to do right. In GKE, it's just a
             | button in the UI (or better, an easy-to-use API), and you
             | can schedule maintenance windows for security updates
             | (which are typically zero downtime anyway, assuming you
             | have the right PodDisruptionBudgets). I think AWS fixed
             | that. I remember when I used it, they said they had some
             | new tool that would handle that, but you had to re-create
             | the cluster from scratch. This was a couple years ago, and
             | is probably decent nowadays.
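              | 
              | For reference, the GKE flow I'm thinking of is roughly this
              | (cluster and pool names are placeholders):
              | 
              |     # roll the nodes of one pool to the cluster's version
              |     gcloud container clusters upgrade my-cluster \
              |         --node-pool=default-pool
              | 
              |     # set a daily maintenance window for automatic updates
              |     gcloud container clusters update my-cluster \
              |         --maintenance-window=03:00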
             | 
             | There are other warts, like certain storage classes being
             | unavailable by default (gp3), the whole ENI thing for Pod
             | IPs, the supported version being way out of date, etc. EKS
             | has always felt like "minimum viable product" to me -- they
             | really want you to use their proprietary stuff like
             | ECS/Fargate, CloudFormation, etc. If you're already on AWS
             | and want Kubernetes, it's just what you need. If you could
             | pick any cloud provider for mainly Kubernetes, it wouldn't
             | be my first choice.
             | 
             | Having used EKS, GKE, and DOKS, I definitely prefer GKE.
             | GKE is very feature-rich, and the API for managing clusters
             | works well. The nodes are also cheaper than AWS. (I use
             | DOKS for my personal stuff and I haven't had any problems,
             | and it is free, but it's missing features like regional
             | clusters that you probably want for things you make money
             | off of.)
        
               | bashinator wrote:
               | For what it's worth, there's an off-the-shelf terraform
               | module for EKS that is far simpler to use than AWS'
               | cloudformation tooling, which does allow you to pass in a
               | custom AMI and multiple nodegroup configurations as input
               | parameters.
               | 
               | https://registry.terraform.io/modules/terraform-aws-
               | modules/...
        
           | debarshri wrote:
            | This year, I have seen those issues popping up in
            | StatefulSets a lot. I realised somebody on the team was force
            | deleting the pods. It is actually well documented:
            | 
            | https://kubernetes.io/docs/tasks/run-application/force-
            | delet...
            | 
            | I have also seen a few scenarios where patching a StatefulSet
            | screws up the volume mount. It is sometimes not evident where
            | the error is, for instance whether it is in the CSI driver or
            | in the scheduler, unless you deep-dive into the issue.
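            | 
            | For context, the force delete in question is the documented
            | escape hatch (pod name is a placeholder):
            | 
            |     # skips graceful termination; for a StatefulSet this
            |     # risks two pods with the same identity running at once
            |     kubectl delete pod web-0 --grace-period=0 --force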
        
         | CSDude wrote:
          | We spawn thousands of pods per day for jobs and they never get
          | stuck; that was not a problem in 2018 either. Not sure what it
          | is you are doing that causes this.
        
         | halfmatthalfcat wrote:
         | I have used Kubernetes extensively over the past couple years
         | and have never seen pods stuck in terminating or creating state
         | that didn't have to do with errors in container creation (your
         | Dockerfile/bootstrapping is messed up) or issues with
         | healthchecks.
        
       | _joel wrote:
        | In 2016 I started at a company that had no build procedures and
        | deployed to a variety of Linux versions, developed on Windows. It
        | was a nightmare for administration: no automation, no monitoring.
        | I implemented containers, and most of the process was getting the
        | developers on board -- having technical sessions with them to
        | understand what they needed and easing them into the plan so they
        | felt enfranchised. Doing this vastly increased productivity:
        | devs could take off-the-shelf compose files that were written for
        | common projects (it was a GIS shop), which meant they could
        | concentrate on delivering code. It helped no end.
        | 
        | Sure, there are issues (albeit a lot fewer as time progressed)
        | with docker, but for what it gained in productivity and
        | developers' sanity, it was very welcome.
        
       | zz865 wrote:
        | Our big project has moved from physical servers to Openshift.
        | It's taken a lot of work, much more than expected. The best thing
        | is that developers like it on their resume, which is a bigger
        | benefit than you'd think, as we've kept some good people on the
        | team. For users I see zero benefit. The CI pipeline is just more
        | complicated and probably slower.
        | 
        | Cost-wise it was cheaper for a while, but now RedHat are bumping
        | up licensing costs, so I think it's about the same cost now.
       | 
       | Overall it seems like a waste of time, but has been interesting.
        
         | mixermachine wrote:
         | Moving from classic servers to containers you get:
         | 
         | - Builds with fixed dependencies that never change. Rollback is
         | easy
         | 
         | - Easy deployment of a prod environment on a local machine
         | 
         | - Fast deployment
         | 
         | - Easy automation (use version X with config Y)
         | 
          | With Kubernetes (or other derivatives like Openshift) you get:
         | 
         | - Auto scaling
         | 
         | - Fail over
         | 
         | - Better resource usage if multiple environments are executed
         | 
         | - Abstraction of infrastructure
         | 
         | - Zero downtime deployment (biggest point for my company, we
         | deploy >3 times per week)
         | 
         | There are applications that do not need Kubernetes or even
         | containers, but is this list really nothing oO?
         | 
          | I can imagine that if you use Kubernetes just like a classic
          | cluster it could seem like unnecessary added complexity, but
          | you gain a lot of things.
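          | 
          | The zero-downtime point, for example, is just the default
          | rolling-update behaviour of a Deployment. A minimal sketch
          | (names and image are placeholders):
          | 
          |     # old pods are only removed once new ones pass their
          |     # readiness checks
          |     kubectl set image deployment/shop shop=registry.example.com/shop:1.2.3
          |     kubectl rollout status deployment/shop
          | 
          |     # and if something looks wrong:
          |     kubectl rollout undo deployment/shop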
        
           | TheDong wrote:
            | Each of those benefits is something I had before using
            | containers or Kubernetes, and it was simpler.
           | 
           | > Builds with fixed dependencies that never change. Rollback
           | is easy
           | 
           | Any good build system already did this, such as Bazel, or a
           | Gemfile.lock. We'd just snapshot AMIs to keep OS dependencies
           | fixed... which is what Docker images effectively do. If you
           | re-docker-build the same Dockerfile, it's not like you get
           | the same result of "apt-get install libxml" the next time
           | either.
           | 
           | > Easy deployment of a prod environment on a local machine
           | 
           | How containers are deployed varies wildly between prod and
           | the local machine. All the things that were hard before are
           | still hard. Things like secrets and external dependencies
           | still usually vary.
           | 
           | If prod is a kubernetes environment, getting a suitable k8s
           | environment setup locally sucks, especially since it will
           | probably have a different ingress controller, load balancer
           | setup, storage classes available, resource requests, etc. If
            | prod is kubernetes and local is docker-compose, creating a
            | second way to run the stack honestly seems like just as much
            | work as just using a bash script + "npm start" or
            | "bundle exec rails server" or whatever.
           | 
           | Either way, it's not really a prod environment. It's hard to
           | run identical-to-prod environments locally, and those
           | problems are related to secrets and clouds and such, not due
           | to the lack of containers, in my experience.
           | 
           | > Fast deployment
           | 
           | In my experience, containers haven't sped up deployment.
           | Let's say you use ubuntu for your host and container's OS.
           | Before containers, this meant you had to download one version
           | of libssl ever, and that was it. If there was an update to
           | libz, that didn't require a new download of libssl. After
           | containers, if you build your container for app1 last week,
           | and your container for app2 today, the "FROM ubuntu" likely
           | resolves to a different image. Both your apps now have
           | different "ubuntu" layers, which probably have the same
           | version of libssl, but deduplication of downloads only
           | happens if the whole layer is identical.
           | 
           | In essence, we went from downloading 1 copy of libssl (for
           | the host OS only) to 3 copies (host OS + 2 containers w/
           | different ubuntu bases), and there's no deduplication.
           | 
           | That by itself seems like it has to be slower since there's
           | an inherent increase in network bandwidth that has to happen.
           | Even if you have a shared base image, you're at least
           | doubling the downloads of libssl since before you could use
           | the host's copy only.
           | 
           | All the items you listed under k8s are things I had before
           | it, excluding "Abstraction of infrastructure". Frankly, if
           | you have a well-made load balancer, it's hard not to have
           | zero-downtime deployments and auto-scaling.
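            | 
            | To make the layer point concrete, you can check what each
            | image was actually built on, and pin it (a sketch; the image
            | names are placeholders):
            | 
            |     # compare the base layers of two app images; if they
            |     # differ, a node downloads both "ubuntu" copies
            |     docker history app1:latest
            |     docker history app2:latest
            | 
            |     # one mitigation: pin the base by digest in both
            |     # Dockerfiles (FROM ubuntu@sha256:<digest>) so the shared
            |     # layers stay byte-identical
            |     docker images --digests ubuntu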
        
             | mixermachine wrote:
             | > We'd just snapshot AMIs to keep OS dependencies fixed
             | 
             | This is a good solution, but I would not call it easier.
             | 
              | Using a docker container feels like installing an app on my
              | smartphone. I choose the version and it will always work
              | like I built it at date X, without an additional system. It
              | works for every programming language with every dependency
              | out of the box: Python, Java, JavaScript, Go, OCaml, C, ...
             | 
             | > How containers are deployed varies wildly between prod
             | and the local machine
             | 
             | I just brought a product of my company to Kubernetes.
             | 
             | Run helm upgrade --install . -f dev-values.yaml for dev
             | 
             | Run helm upgrade --install . -f prod-values.yaml for prod
             | (of course you need the secrets there. Jenkins has them).
             | 
             | My laptop does run an environment with all components of
             | the prod env. Something like email and sap services are of
             | course mocked, but everything else? All on my machine. Why
             | not?
             | 
             | I can spin up a new test environment for customers with new
             | settings on the same day.
             | 
             | > Both your apps now have different "ubuntu" layers
             | 
              | We use a base image that does not change that often. Even
              | if it does: no problem, the registry is connected via 1000
              | MBit/s and zero-downtime deployment does its magic, so I
              | don't even notice if it takes one or two minutes.
             | 
             | Another thing: my node (or VM) libs and the libs of my
             | software should not be connected in any way (at least for
             | me). I want to patch my nodes and my software
             | independently. Different software should also not be bound
             | to libs of another software.
             | 
             | > All the items you listed under k8s are things I had
             | before it
             | 
             | - How do you easily scale up? Including starting new
             | machines and spinning down machines that are no longer
             | needed
             | 
             | - How are multiple software parts executed on one host?
             | 
             | - How do you do fail over?
             | 
             | I know that everything can be done without Kubernetes. With
             | enough time and money one can create large systems that do
             | this.
             | 
             | I spun up a new Kubernetes cluster and ported our product
              | (already containerized) onto the cluster in about three
              | months.
             | 
             | Really: I also love the classic dev ops and have a proxmox
             | server at home, but Kubernetes just solves many problems at
             | once in a short time.
        
               | TheDong wrote:
               | > How do you easily scale up? Including starting new
               | machines and spinning down machines that are no longer
               | needed
               | 
               | AWS autoscaling groups + cloudwatch for adding and
               | removing machines + checking them into load balancers is
               | something that has worked for longer than K8s has been a
               | thing.
               | 
               | > How are multiple software parts executed on one host?
               | 
               | systemd units, or for more resource hungry things,
               | multiple autoscaling groups.
               | 
               | The overhead of running the kubelet on each host + etcd
               | cluster + apiserver means that I still end up with fewer
               | hosts if I just run each component on every single host
               | vs scaling different deployments independently.
               | 
               | It is true that kubernetes might be more resource
               | efficient in some combination of nodes and software, but
               | at under 10 servers, I've always found the overhead of
               | the etcd cluster + apiserver + kubelet to dwarf any
               | savings from not just running 10 copies of my software.
               | 
               | > How do you do fail over?
               | 
               | The AWS-managed load balancer can fail over based on
               | health checks failing, metrics, or I can add/remove
               | servers from it manually. You can also do DNS health
               | checks, or add a layer of haproxy/nginx/whatever if you
               | want.
               | 
               | It's not like k8s has some magic ability to fail over
               | under the hood. It's just using k8s service objects
               | (probably LoadBalancer type), which does the same thing.
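                | 
                | For the record, the pre-k8s setup I mean is roughly this
                | (names, subnets and ARNs are placeholders):
                | 
                |     # a group of instances registered in an LB target
                |     # group; unhealthy instances are replaced automatically
                |     aws autoscaling create-auto-scaling-group \
                |         --auto-scaling-group-name web \
                |         --launch-template LaunchTemplateName=web \
                |         --min-size 2 --max-size 10 \
                |         --target-group-arns "$TARGET_GROUP_ARN" \
                |         --vpc-zone-identifier "subnet-aaaa,subnet-bbbb"
                | 
                |     # scale in/out on average CPU utilisation
                |     aws autoscaling put-scaling-policy \
                |         --auto-scaling-group-name web --policy-name cpu50 \
                |         --policy-type TargetTrackingScaling \
                |         --target-tracking-configuration \
                |         '{"PredefinedMetricSpecification":
                |            {"PredefinedMetricType":"ASGAverageCPUUtilization"},
                |           "TargetValue":50.0}'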
        
               | secondcoming wrote:
                | Correct. We use pretty much the same setup on GCP.
                | All scaling is automatic. When we deploy new code we just
                | run a Jenkins job that creates an image from custom
                | Debian packages, push that to GCP, and it rolls out
                | automatically to all our DCs.
        
           | zz865 wrote:
           | Yeah we have fixed usage so scaling or easy failover is not
           | something we need.
        
             | mixermachine wrote:
              | Then I can understand you well. Kubernetes then just
              | provides zero-downtime deployment and additional
              | complexity. When you already have something like a
              | deployment window (say, 2 am to 3 am), ZDT does not matter
              | either.
        
           | jstimpfle wrote:
           | My gut feel is that Docker is part of a trend of decreasing
           | software quality.
           | 
            | When someone writes "fixed dependencies" I read "developers
            | can more easily add more bloat before the house of cards
            | tumbles". That happens, for example, when the "fixed
            | dependencies" are upgraded.
           | 
           | I am miserable having to touch all this junk. I feel a
           | project is right when I can just git clone it (a few
           | megabytes of data at most) and am left with a self contained
           | repo that was written with minimal dependencies (optimally
           | stored in-tree), and that can be easily built in seconds with
           | a simple shell script on any reasonably modern system.
           | 
           | The bare bones way takes a good amount of initial work, but
           | mostly it's a learning experience. Once one understands a few
           | principles of writing portable software, I'm sure it saves a
           | huge amount of time compared to adding all these shells of
           | junk.
           | 
           | --
           | 
            | Oh yeah, I have zero experience with integrating with
            | Kubernetes or whatever. I've been a small-time user of
            | Jenkins and CircleCI (involuntarily), and when I don't have
            | to set it up and it actually works, it's alright and can help
            | where the developer maybe lacks a bit of discipline (build
            | all targets, run all the tests).
           | 
           |  _But_ , I doubt these technologies are a replacement for an
            | ergonomic build environment (with a simple Python build
            | script or even a crude Makefile). Is incremental building a
            | thing on any of these CI pipelines? Because one thing I want
            | is builds that are really, really fast, and it's already way
            | too much overhead if I have to go through a git commit to
            | check this stuff. Don't even think about requiring a full
            | rebuild or Docker image build just to get some quick feedback
            | on a code change.
        
           | [deleted]
        
         | geerlingguy wrote:
         | OpenShift is about 10x more complex than basic Docker /
         | containers, and probably 2-4x more complex than plain old
         | Kubernetes.
         | 
         | I've seen more success from organizations running smaller K3s
         | or K8s clusters (if they need the orchestration) or just
         | running small apps via Docker/Docker Compose separately, using
         | a CI system (even as simple as GitHub Actions) to manage
         | deployments.
        
           | zz865 wrote:
            | Yeah, it's also a problem that our org has infrastructure
            | teams that manage the Openshift clusters, and they are
            | under-resourced so they don't help or often can't figure out
            | how to fix problems. Linux sysadmins know what they're doing,
            | as the core infrastructure has been mostly the same for the
            | last few decades.
        
       | theamk wrote:
       | Back in 2016 during the original discussion of this article,
       | amount said it very well in [0]:
       | 
       | "If you hit this many problems with any given tech, I would
       | suggest you should be looking for outside help from someone that
       | has experience in the area."
       | 
       | - Yes, "clean old images" was not implemented back then. His hack
       | is not that bad, and one can filter out in-use images if they
       | want to pretty easily. Anyway, docker does have "docker image
       | prune" now.
       | 
       | - Storage driver history discussion is entirely incorrect. No,
       | docker did not invent overlayfs nor overlayfs2. There was a whole
       | big drama of aufs not mainlining, but it was mostly in context of
       | live cd's, not docker.
       | 
        | But the big missing thing is: you should not store important data
        | inside docker containers; Docker is designed to work with
        | transient containers. If you have a database, or a high-
        | performance data store, you use volumes, and those _bypass_
        | docker storage drivers completely.
        | 
        | - The database story is completely crazy... judging by their
        | comments, they decided to store the database data in the docker
        | container for some reason and got all the expected problems
        | (unable to recover, hard to migrate, etc.). It is not clear why
        | they didn't put the database data on a volume; there is a 2016
        | StackOverflow question discussing it [1].
       | 
       | Also, "Docker is locking away [...] files through its abstraction
       | [...] It prevents from doing any sort of recovery if something
       | goes wrong." Really? I did recovery with docker, the files are
       | under /var/lib/docker in the directory named with guid, a simple
       | "find" command can locate them.
       | 
        | - By default, Docker uses Linux networking and yes, the
        | configuration is complex, so it adds overhead. That's why there
        | is the --net=host option (which has been there for a long time),
        | which just bypasses all of that.
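        | 
        | For what it's worth, the modern equivalents of those points are
        | mostly one-liners (a sketch; file and service names are
        | placeholders):
        | 
        |     # "clean old images" finally exists
        |     docker image prune -a
        | 
        |     # recovery: locate a container's files on the host
        |     find /var/lib/docker -name 'myfile.conf' 2>/dev/null
        | 
        |     # skip the bridge/NAT layer entirely
        |     docker run -d --net=host some-latency-sensitive-service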
       | 
       | [0] https://news.ycombinator.com/item?id=12872636
       | 
       | [1] https://stackoverflow.com/questions/40167245/how-to-
       | persist-...
        
       | KronisLV wrote:
       | The article seems to mention problems with AUFS, overlay and
       | possibly overlay2 as well.
       | 
        | However, one of the things that I haven't quite understood is why
        | people use Docker volumes that much in the first place, or even
        | think that they need to use additional volume plugins in most
        | deployments.
       | 
       | If it's a relatively simple deployment, that has some persistent
       | data and it's clear on which nodes the containers could be
       | scheduled (either by label or by hostname), what would prevent
       | someone from just using bind mounts (
       | https://docs.docker.com/storage/bind-mounts/ )?
       | 
       | And if you need to store it on a separate machine, why not just
       | use NFS on the host OS to mount the directory which you will bind
       | mount? Or, alternatively, why not just use GlusterFS or Ceph for
       | that sort of stuff, instead of making Docker attempt to manage
       | it?
       | 
        | For example, Docker Swarm fails to launch containers if the bind
        | mount path doesn't exist, but that bit can also be addressed by
        | creating the necessary directory structure with something like
        | Ansible - and then you not only stop worrying about volumes and
        | the risk of them ever becoming corrupt, but you also have the
        | ability to inspect the contents of the container storage on the
        | actual host. Say, if there are some configuration files that need
        | altering (seeing as not all of the containerized software out
        | there follows 12 Factor principles for environment configuration
        | either), or you just want to do some granular backups of the data
        | that you've stored.
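        | 
        | Concretely, the bind mount approach is just something like this
        | (paths and image name are placeholders):
        | 
        |     # the directory lives on the host, or on an NFS/GlusterFS
        |     # mount that the host OS already manages
        |     mkdir -p /srv/myapp/data
        |     docker run -d \
        |         --mount type=bind,source=/srv/myapp/data,target=/data \
        |         myapp:latest
        | 
        | and the files stay inspectable and easy to back up with plain
        | cp, rsync or tar.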
        
       | pbecotte wrote:
       | Even in 2016, I had been running production services in Docker
        | successfully. It's interesting to me that they see the problem
        | "Docker isn't designed to store data" without also seeing the
        | solution "the docker copy-on-write filesystem isn't designed to
        | be written to in production -- but volume mounts are". I hadn't
        | seen docker crashing hosts (still haven't), but I'm guessing that
        | was caused by using the storage drivers.
       | 
       | The complaints about their development practices are valid (and
       | haven't really improved), but even then the technology worked
       | well so long as you understood its limitations.
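        | 
        | For anyone who hasn't seen the distinction, it's a one-line
        | difference (a sketch; image and volume names are placeholders):
        | 
        |     # writes land in the copy-on-write layer and are lost when
        |     # the container is removed -- fine for scratch data, bad for
        |     # a database
        |     docker run -d postgres:13
        | 
        |     # writes land on a named volume, outside the storage driver,
        |     # and survive container replacement
        |     docker volume create pgdata
        |     docker run -d -v pgdata:/var/lib/postgresql/data postgres:13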
        
       | rubyist5eva wrote:
       | Podman and Kubernetes are like a match made in heaven. Docker was
       | a good first try for most people, but there is so much better
       | technology that exists now.
        
       | mianos wrote:
       | It seems a lot has not changed:
       | 
        | "Docker gradually exhausts disk space on BTRFS", opened by ghost
        | on 23 Oct 2016, still open:
        | https://github.com/moby/moby/issues/27653
        | 
        | There are comments from this week showing it still happens.
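        | 
        | If you hit it, the usual first aid (which reclaims space but
        | doesn't fix the underlying btrfs driver behaviour) is:
        | 
        |     docker system df       # see what is eating the disk
        |     docker system prune    # drop stopped containers, dangling
        |                            # images, unused networks, build cache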
        
       | clipradiowallet wrote:
        | I know this article is from 2016... but my feelings about it (the
       | article) are unchanged. Some people do not like new things, and
       | they will blog about it in some form or fashion. Maybe their
       | reasoning is valid, maybe it's not - it doesn't matter.
        | Meanwhile... businesses have paid, and continue to pay, top $$$
        | for people that will help them do these things. If you want to
        | collect this $$$, get on board.
        | 
        | In a few years, the things businesses want to pay $$$ for will
       | change. New blog articles about "this new stuff is bad!" will
        | appear, and new job postings paying above-market $$$ will appear
        | also. You can either rail on about the bad (or good) changes and
        | how it's just everything-old-is-new-again... or you can get with
        | the program and get paid. In another few years, rinse and
        | repeat.
        
         | [deleted]
        
         | jjnoakes wrote:
         | It's all anecdotal. For example I know many folks who make $$$
         | doing the boring old thing because it is reliable, it gets
         | results quickly with low risk, the engineers know the tech
         | inside and out, and not many other folks want to work with
         | "boring tech".
        
           | cpach wrote:
           | Just out of curiosity, what are some examples of boring tech
           | in this case?
        
             | isoskeles wrote:
             | Makes me think of stuff like managing WordPress although
             | I'm not sure if that's an example they had in mind.
        
             | aprdm wrote:
             | Django, Rails, Postgres...
        
               | benburleson wrote:
               | PHP, MySQL
        
             | dijit wrote:
             | Perl, Postgres, Java, Solaris
        
               | cpach wrote:
               | Hm... Which companies still use Solaris (or Illumos)? I
               | don't hear about it very often these days.
        
               | NexRebular wrote:
                | We swapped most of our Linux and VMware platforms to
                | Triton and SmartOS and have been loving it ever since.
                | Obviously there's still a need to run Linux in bhyve due
                | to some specific software (e.g. docker), but generally
                | services are on either lx-zones or smartmachines. It just
                | works.
        
               | dijit wrote:
                | I am pretty sure my IT department is transitioning to
                | Nexenta, which is illumos-based.
                | 
                | The company before last was using some Solaris+Nexenta.
                | 
                | Samsung has an enormous SmartOS deployment that would
                | rival all of Azure (based on what I learned about Azure).
        
               | kazen44 wrote:
                | SmartOS is a great operating system. Kind of a shame that
                | kind of thinking hasn't caught on yet.
        
       | ChrisArchitect wrote:
       | Anything new since this?
       | 
       | A history of re-submitted, previously discussed posts:
       | 
       | https://news.ycombinator.com/item?id=12872304
        
       | manishsharan wrote:
        | This is a blog post from 2016. However, if we switch to more
        | recent times, my experience with AWS ECS and Fargate has been
        | fairly boring. There was a learning curve to get it to work with
        | CloudFormation, VPCs, IAM and load balancers.
        
         | esotericimpl wrote:
          | Agreed, I don't see why ECS and Fargate aren't used everywhere.
          | 
          | It takes a bit of a learning curve to understand how tasks,
          | task definitions, clusters, and services all fit together, but
          | once you do, it's pretty straightforward.
          | 
          | I've run ECS in production for over 5 years and can count on
          | one hand the times we had any issue related to docker or
          | availability; all were caused by code updates we didn't test
          | properly.
        
       | jwildeboer wrote:
       | Guy who claims to run systems in the hft space, responsible for
       | millions of trades with high values, can't be bothered to
       | actually pay for support, relies on community and blames everyone
       | but himself for being left alone with his mess. Not sorry.
        
       | user5994461 wrote:
       | Original author from 5 years ago. Surprised to see this here 5
       | years later.
       | 
        | Docker really used to crash a lot back in the day, mostly due to
       | buggy storage drivers. If you were on Debian or CentOS it's very
       | likely that you experienced crashes (though a lot of developers
       | didn't care or didn't understand the reasons the system went
       | unresponsive).
       | 
       | There was notably a new version of Debian (with a newer kernel)
       | published the year after my experience. It's a lot more stable
       | now.
       | 
        | My experience is that by 2018-2019, Docker had mostly vanished as
        | a buzzword; people were only talking about Kubernetes and looking
        | for Kubernetes experience.
       | 
        | edit: at that time Docker didn't have a way to clear
        | images/containers; it was added after the article and follow-up
        | articles. I will never know if it was a coincidence, but I like
        | to think there is a link. I think writing the article was worth
        | it if only for this reason.
        
       | mberning wrote:
       | Docker is the chaos monkey incarnate.
        
       | stevebmark wrote:
       | > Docker is meant to be stateless. Containers have no permanent
       | disk storage, whatever happens is ephemeral and is gone when the
       | container stops.
       | 
       | It's interesting that this misconception made it into a clearly
       | knowledgeable article. Containers have state on the writeable
       | layer that is persisted between container stops and starts.
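        | 
        | Easy to verify, since a stop is not a removal (container name is
        | arbitrary):
        | 
        |     docker run --name demo ubuntu sh -c 'echo hello > /state'
        |     docker start demo    # same writable layer as before
        |     docker diff demo     # shows the added /state file
        |     docker rm demo       # only now is that state gone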
        
         | plainnoodles wrote:
         | But I think most people and tools consider "containers" to be
         | volatile storage, like RAM. Non-volatile storage would be
         | volumes.
         | 
         | Honestly I think there is a lot to be said for making the
         | writable layer of a container read-only. It makes sure that
         | things like logging, if you care about them, go somewhere safe,
          | or if you don't, get turned off explicitly. It also prevents
          | gotchas like "oops, wrote important data to /var/lib/notavolume
         | when I meant to write to /var/lib/therightvolume" that show up
         | at the worst times.
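          | 
          | Docker has a flag for exactly that, for what it's worth (a
          | sketch; volume and image names are placeholders):
          | 
          |     # the root filesystem becomes read-only; anything that must
          |     # be written has to go to an explicit volume or tmpfs
          |     docker run --read-only --tmpfs /tmp \
          |         -v logs:/var/log myapp:latest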
        
         | beermonster wrote:
          | I'm not sure it's a misconception. That's how they're intended
          | to be used. Cattle, not pets. If you don't get used to treating
          | them as throwaway, you can end up accidentally relying on some
          | state. As you say, the top layer is read/write, but that
          | doesn't mean you should be relying on what you write there.
          | Quite the opposite - that state should be somewhere else unless
          | you can afford to lose it.
          | 
          | I usually start mine with --rm so they're removed on shutdown.
         | 
         | I've seen people apply security updates via 'apt update; apt
         | upgrade' within a running container. Guess what happens when
         | that container is eventually destroyed?
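          | 
          | i.e. something like this (image names/tags are placeholders):
          | 
          |     # the writable layer is deleted when the container exits
          |     docker run --rm -it ubuntu:20.04 bash
          | 
          |     # security updates belong in the image build, not in a
          |     # running container
          |     docker build -t myapp:2021-07-27 .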
        
       | bfrog wrote:
        | Heh, I just ran into an issue the other day with a coworker where
        | a patched Ubuntu kernel auto-updated and broke everything.
        | Yep, it's a sand castle.
        
       | crummybowley wrote:
       | The issue is not docker, the issue is you treat your servers like
       | pets.
       | 
        | Folks need to start building systems that destroy everything and
        | re-image fresh. Any other way, you are just setting yourself up
        | for failure.
        
         | esotericimpl wrote:
         | except the Database, don't re-image the database.
        
       | Theodores wrote:
       | Symfony console broke Magento 2 today. Same story.
        
       | belter wrote:
       | Posted many times before but this is the only one with comments:
       | 
       | https://news.ycombinator.com/item?id=12872304
        
       ___________________________________________________________________
       (page generated 2021-07-27 23:01 UTC)