[HN Gopher] A Pipeline Made of Airbags
___________________________________________________________________
A Pipeline Made of Airbags
Author : packetlost
Score : 184 points
Date : 2024-09-05 14:11 UTC (4 days ago)
(HTM) web link (ferd.ca)
(TXT) w3m dump (ferd.ca)
| swiftcoder wrote:
| It's a real shame that we are steadily losing all the lessons of
| Erlang/Smalltalk/Lisp machines.
| nine_k wrote:
| What are the specific lessons worth preserving, but being lost?
|
| (I assume that "keep an image, it's too costly to rebuild
| everything from version-controlled sources" is not such a
| lesson.)
| igouy wrote:
| Yes specific would be better.
|
| Of course "keep an image" and "version-controlled sources"
| are not mutually exclusive.
|
| https://www.google.com/books/edition/Mastering_ENVY_Develope.
| ..
| swiftcoder wrote:
| The biggest common lesson is being able to
| inspect/interrogate/modify the running system. Debugging
| distributed system failures purely based on logs/metrics
| output is not a particularly pleasant job, and most immutable
| software stacks don't offer a lot more than that.
|
| However, for Erlang specifically, the lesson is pushing
| statelessness as far down the system as you possibly can.
| Stateless immutable containers that we can kill at will are
| great - but what if we could do the same thing at the request
| handler level?
| dools wrote:
| Ha, I recently wrote a system that does more or less the exact
| same thing for pushing updates to IoT devices. I can tell the
| system to update particular nodes to a given git commit, then I
| can roll it out to a handful of devices, then I can say "update
| all of them" but wait 30 seconds in between each update and so
| on.
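|
| (A minimal sketch of that kind of staged rollout loop, in
| Python; update_node and healthy are stand-ins for whatever the
| real system exposes, not its actual API.)
|
|     import time
|
|     def rollout(nodes, commit, update_node, healthy,
|                 canary_count=3, delay_s=30):
|         # Pin a few canary devices to the target commit first.
|         canaries, rest = nodes[:canary_count], nodes[canary_count:]
|         for node in canaries:
|             update_node(node, commit)
|         if not all(healthy(n) for n in canaries):
|             raise RuntimeError("canary failed, aborting rollout")
|         # Then "update all of them", waiting between each device.
|         for node in rest:
|             update_node(node, commit)
|             time.sleep(delay_s)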
| jamesblonde wrote:
| Joe would be turning in his grave if he knew where the industry
| is right now on the k8s love-bomb.
| p_l wrote:
| The real issue is that the languages most people use do not
| reliably support another approach.
|
| k8s is not the issue. Worse Is Better languages and runtimes
| are.
| Sebb767 wrote:
| The big thing about immutable infrastructure is that it is
| reproducible. I've seen both worlds and I do appreciate the
| simplicity and quickness of the upgrade solution presented in the
| post. The problem with this manual approach is that it is quite
| easy to end up with a few undocumented fixes/upgrades/changes to
| your pet server and suddenly upgrading or even just rebooting the
| servers/app becomes something scary.
|
| Now, for immutable infrastructure, you have a whole different set
| of problems. All your changes are nicely logged in git, but to
| deploy you need to rebuild containers and roll them out over a
| cluster. To do this smoothly, the cluster also needs to have some
| kind of high availability setup, making everything quite complex
| and, in the end, you wasted minutes to hours of compute for
| something that a pet setup can do in a few seconds. But you can
| be sure that a server going down or a reboot are completely safe
| operations.
|
| What works for you really depends on your situation (team size,
| importance of the app, etc.), but both approaches do have their
| uses and reducing the immutable infra approach to "people run k8s
| because it's hip" misses the point.
| swiftcoder wrote:
| > undocumented fixes/upgrades/changes to your pet server and
| suddenly upgrading or even just rebooting the servers/app
| becomes something scary.
|
| You can mostly prevent this by mandating that fresh nodes come
| up regularly. Have a management process that keeps a rolling
| window of ~5% of your fleet in connection-drain, and replaces
| the nodes as soon as they hit low-digits of connections.
|
| Whole fleet is replaced every ~3 weeks, you learn about any
| deployment/startup failures within one day of new code landing
| in trunk, minimal disruption to client connections.
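|
| (A sketch of what that management loop can look like; the
| fleet/node operations here are stand-ins for whatever your
| orchestration layer actually provides.)
|
|     import time
|
|     DRAIN_FRACTION = 0.05   # keep ~5% of the fleet in connection-drain
|     LOW_CONNECTIONS = 5     # replace once a node is down to low digits
|
|     def rotate_forever(fleet, poll_s=60):
|         # `fleet` is assumed to expose nodes(), start_drain(node),
|         # connection_count(node) and replace(node); stand-ins only.
|         while True:
|             nodes = fleet.nodes()
|             draining = [n for n in nodes if n.draining]
|             # Top up the drain window to ~5% of the fleet, oldest first.
|             budget = max(1, int(len(nodes) * DRAIN_FRACTION)) - len(draining)
|             fresh = sorted((n for n in nodes if not n.draining),
|                            key=lambda n: n.launched_at)
|             for node in fresh[:max(0, budget)]:
|                 fleet.start_drain(node)   # stop routing new connections
|             # Replace nodes whose connection count hit low digits.
|             for node in draining:
|                 if fleet.connection_count(node) < LOW_CONNECTIONS:
|                     fleet.replace(node)   # terminate, launch a fresh node
|             time.sleep(poll_s)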
| turtlebits wrote:
| This doesn't prevent anything, it just schedules possible
| breakages because your infra isn't 100% immutable.
|
| IME, this doesn't work because companies won't implement/will
| deprioritize any infra changes that impact the development
| cycle.
| swiftcoder wrote:
| It's not so different from dropping your PR into any
| other automated-CI/CD-all-the-way-to-prod pipeline.
|
| Albeit maybe a little easier to justify to management that
| you are dropping everything to fix the breakage when your
| CI/CD pipeline stops.
| packetlost wrote:
| > Now, for immutable infrastructure, you have a whole different
| set of problems. All your changes are nicely logged in git, but
| to deploy you need to rebuild containers and roll them out over
| a cluster.
|
| The real issue is that it effectively forces externalizing
| nearly all state. On the surface, this seems like it's just a
| good thing, but if you think about the limitations and
| complexity it creates, it starts seeming less unquestionably
| good. Sometimes that complexity is warranted, but very
| frequently it is not.
|
| That being said, I think modifying code in a running system
| without pretty strict procedures/controls around it is...
| dangerous. More than a couple of times I've seen hotfixes get
| dropped/forgotten because they only existed on the running
| system and not in source control.
| toast0 wrote:
| > The big thing about immutable infrastructure is that it is
| reproducible. I've seen both worlds and I do appreciate the
| simplicity and quickness of the upgrade solution presented in
| the post. The problem with this manual approach is that it is
| quite easy to end up with a few undocumented
| fixes/upgrades/changes to your pet server and suddenly
| upgrading or even just rebooting the servers/app becomes
| something scary.
|
| The point is not really automation vs manual. Hot loading is
| amenable to automation too. The point is really that when you
| replace immutable servers that hold state with another set,
| there's a lengthy process to migrate that state. If you can
| mutate the
| servers, you save a lot of wall clock time, a lot of server cpu
| time, and a lot of client cpu time.
|
| I deal with this issue at my current job. I used to work in
| Erlang and it took a couple minutes to push most changes to
| production. Once I was ready to move to production, it was less
| than 30 minutes to prepare, push, load, verify and move on with
| my life. I could push follow ups right away, or wrap up several
| issues, one at a time, in a single day. Coming from PHP was
| pretty similar, with caveats about careful replacement of files
| (to avoid serving half a PHP file) and PHP caching.
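|
| (The usual trick for that careful replacement, sketched in
| Python rather than whatever their setup actually used: write
| the new file next to the old one and rename it over the
| original; a rename within one filesystem is atomic, so a
| request sees either the old file or the new one, never half of
| each.)
|
|     import os, tempfile
|
|     def replace_file_atomically(path, new_contents):
|         # Write bytes to a temp file in the same directory, then
|         # rename over the original; os.replace() is atomic on
|         # POSIX, so readers never see a partially written file.
|         fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
|         try:
|             with os.fdopen(fd, "wb") as f:
|                 f.write(new_contents)
|                 f.flush()
|                 os.fsync(f.fileno())
|             os.replace(tmp_path, path)
|         except Exception:
|             os.unlink(tmp_path)
|             raise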
|
| Now I work with Rust, terraform, and GCP; it takes about 12
| minutes for CI to build production builds, it takes terraform
| at least 15 minutes to build a new production version
| deployment, and several more minutes for it to actually finish
| coming up, only then can I _start_ to move traffic, and the
| traffic takes a long time to fully move, so I have to come back
| the next day to tear down the old version. I won't typically
| push a follow up right away, because then I've got three
| versions running. I can't push multiple times a day. If I'm
| working many small issues, everything has to be batched into
| one release, or I'll be spending way too much of my time doing
| deploys, and the deployment process will be holding back
| progress.
| fiddlerwoaroof wrote:
| The funny thing here is that BEAM is "immutable infrastructure as a
| programming language environment" which, to me, is strictly
| superior to the current disjunction between "infrastructure
| configuration" and "application code".
|
| Erlang defaults to pure code and every actor is like a little
| microservice with good tooling for coordination. There are
| mutable aspects like a distributed database, but nothing all
| that different from the mutable state that exists in every
| "immutable infrastructure" deployment I've seen.
| wpietri wrote:
| What a lovely and well-written piece. I think the dev vs ops
| divide has caused so many problems like this. We just write
| systems differently when we have to run things versus when it
| gets thrown over the wall to other people to deal with.
|
| Maybe that sounds like I'm blaming developers, but I'm not. I
| think this is rooted in management theories of work. They
| optimize for simple top-down understanding, not cross-functional
| collaboration. If people are rewarded for keeping to an over-
| optimistic managerial plan (or keeping up "velocity"), then
| they're mostly going to throw things over the wall.
| from-nibly wrote:
| I get all of these complaints. Why do I also have to be an
| infrastructure engineer? And why is my infrastructure not bespoke
| enough to do this weird thing I want to do? Why can't I use 5
| different languages at this 30 person company?
|
| The thing about immutable infrastructure is that it's
| straightforward. There are a set of assumptions others can make
| about your app.
|
| Immutable infrastructure is boring. Deployments are uncreative.
| That's a good thing.
|
| Repeat after me, "my creative energy should be spent on my
| customers"
| ActionHank wrote:
| I think there's more to it than that.
|
| You are correct for 90% of the cases, but this also kills
| innovation.
| from-nibly wrote:
| If you want to do infrastructure innovation you are more
| than welcome to. There are lots of engineers dedicated to it.
| It's also not that hard to go from software engineer to
| infrastructure engineer, thus bringing your experience and
| unique perspective. But working at an SMB or startup (the 90%)
| doesn't justify innovation for innovation's sake. 1 acre of
| corn doesn't justify inventing the combine.
| andiareso wrote:
| I love that last line. That's the best analogy I've heard.
| marcosdumay wrote:
| The way to enable fast ops evolution is by creating a small
| bubble with either a mutable facade or the immutability
| restrictions disabled, and go innovate there. Once you are
| ready, you can port the changes to the overall environment.
|
| And the way to do the thing the article complains about is
| with partial deployments.
|
| Both of those are much better behaved in large-scale ops
| than their small-scale counterparts. K8s kinda "supports" both
| of them, but like almost everything in k8s, it's more work,
| and there are many foot-guns.
| jtbayly wrote:
| How is downtime beneficial to the customer?
| dvdkon wrote:
| Nobody wants downtime, but it's easy to spend too much effort
| on avoiding it, taking time from actually important
| development. Plenty of customers don't mind occasional
| downtime, and it can mean the system is simpler and they get
| features faster.
| fifilura wrote:
| I have been there! Duly upvoted!
|
| Too much power to architects worsens the situation: they have
| the formal responsibility to keep downtime low, but they are
| also tasked with finding technical solutions rather than with
| sometimes technically mundane product improvements.
|
| Also in the worst case, this solution becomes so cool that
| it attracts the best developers internally, away from
| building products.
| phkahler wrote:
| >> Repeat after me, "my creative energy should be spent on my
| customers"
|
| I agree with you. But from the blog:
|
| >> "Product requirements were changed to play with the adopted
| tech."
|
| That's when things may have gone too far.
| schmidtleonard wrote:
| It's "weird" to want low downtime?
|
| The general nastiness of updates is one of the largest customer
| friction points in many systems, but creative energy should be
| directed away from fixing it?
|
| Gross.
| roland35 wrote:
| I think like many things in engineering, it depends?
|
| I'm sure most applications in life benefit from accepting a
| little downtime in order to simplify development. But there
| are certainly scenarios where we can use some "high quality
| engineering" to make downtime as low as possible.
| aziaziazi wrote:
| Don't forget startup innovation culture: everything has to be
| disrupted. Encourage it with tax exemptions for "innovative"
| jobs and you'll have cohorts of engineers reinventing wheels
| from infra to UX in a glorified innovative modern "industry".
| srpablo wrote:
| > Repeat after me, "my creative energy should be spent on my
| customers"
|
| "I should save my energy, so I won't exercise."
|
| "I should save money, so I won't deploy it towards
| investments."
|
| I don't think "creativity" is a zero-sum, finite resource; I
| think it's possible to generate more by spending it
| intelligently. And he pointed out how moving towards immutable
| infrastructure, while more "standard," directly hurt customers
| (the engineering team lost deployment velocity and
| functionality), so it's especially weird to end your comment
| the way you did.
|
| To say "immutable infrastructure is just more straightforward"
| so definitively, from the limited information you have, is just
| you stating your biases. The stateful system he describes the
| company moving away from may also have been pretty
| "straightforward" and "boring," just with different fixed
| points. Beauty in the eye of the beholder and all that.
| thih9 wrote:
| Would a middle ground be possible? E.g. by default use stateless
| containers, but for certain stacks or popular app frameworks
| support automated stateful deploys?
| mononcqc wrote:
| Two years after writing A Pipeline Made of Airbags, I ended up
| prototyping a minimal way to do hot code loading from
| kubernetes instances by using generic images and using a
| sidecar to load pre-built software releases from a manifest in
| a way that worked both for cold restarts and for hot code
| loading: https://ferd.ca/my-favorite-erlang-container.html
|
| It's more or less as close to a middle-ground as I could
| imagine at the time.
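|
| (Very roughly, and only as a sketch of the general shape rather
| than the exact implementation in the article: a sidecar polls a
| release manifest and fetches pre-built releases. The URL, keys,
| and apply_release callback below are all made up.)
|
|     import json, time, urllib.request
|
|     MANIFEST_URL = "https://releases.example.com/manifest.json"
|
|     def watch_manifest(apply_release, current=None, poll_s=30):
|         # apply_release(version, path) is whatever unpacks the
|         # release and triggers a cold restart or a hot load.
|         while True:
|             with urllib.request.urlopen(MANIFEST_URL) as resp:
|                 manifest = json.load(resp)
|             wanted = manifest["version"]
|             if wanted != current:
|                 path = "/releases/%s.tar.gz" % wanted
|                 urllib.request.urlretrieve(manifest["tarball_url"], path)
|                 apply_release(wanted, path)
|                 current = wanted
|             time.sleep(poll_s)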
| specialist wrote:
| Exactly right. OC hits all the points. Fine granularity of
| failure, (re)warming caches, faster iteration by reducing cost of
| changes, etc.
|
| I too lived this. Albeit with "poor man's Erlang" (aka Java). Our
| customers were hospitals, ERs, etc. Our stack could not go down.
| And it had to be correct. So sometimes that means manual human
| intervention.
|
| There's another critical distinction, missed by the "whole
| freaking docker-meets-kubernetes" herd:
|
| Our deployed systems were "pets". Whereas k8s is meant for
| "cattle".
|
| Different tools for different use cases.
| turtlebits wrote:
| A well crafted app is great. It is also complex and generally
| only maintainable/supportable by those who built it.
| kmoser wrote:
| If the original devs wrote good documentation then pretty much
| anybody can maintain it easily.
| michaelteter wrote:
| A truly well crafted app requires very little maintenance or
| support, and that maintenance/support has already been thought
| through and made easy to learn and do.
|
| These things are possible, and they fit economically somewhere
| in the 3-5 year maturity of a system. Years 1-3 are usually
| necessarily focused on features and releases, but far too many
| orgs just stop at that point and aren't willing to invest that
| extra year or two in work that will save time and money for
| many years to follow.
|
| I believe this resistance is due to the short-sightedness of
| buyouts/IPOs or simply leadership churn.
| anothername12 wrote:
| K8s is a Google thing for Google problems. It's just not needed
| for most software delivery.
|
| Edit: meanwhile I'm waiting 45 minutes on average for my most
| recent single-line-change PR to roll out to the k8s cluster at
| $massive_company, which totally does this like everyone else.
| fragmede wrote:
| Google doesn't use Kubernetes internally. Kubernetes is a
| simplified version of Borg. If it's taking 45 minutes to deploy
| a change, that's on your company's platform team, not Google.
___________________________________________________________________
(page generated 2024-09-09 23:01 UTC)