[HN Gopher] Elixir/Erlang Hot Swapping Code (2016)
       ___________________________________________________________________
        
       Elixir/Erlang Hot Swapping Code (2016)
        
       Author : justinludwig
       Score  : 213 points
       Date   : 2024-12-12 23:16 UTC
        
 (HTM) web link (kennyballou.com)
 (TXT) w3m dump (kennyballou.com)
        
       | omertoast wrote:
       | I'm so sick of this DevOps bullshit. I wonder if there is an
       | alternative language where you can hot swap code and do all the
       | black magic stuff while keeping the reliability and performance
       | of something like Rust.
        
         | saurik wrote:
         | I am shocked at the idea of anyone implying that Erlang is
         | "unreliable"... its entire reason for existence was to step up
         | the game on reliability.
        
           | zanderwohl wrote:
           | I agree. I don't know much about Erlang but what I've heard
           | seems to indicate it's used for high-uptime systems that
           | handle errors well.
        
             | Muromec wrote:
             | I suspect the causality is reversed. When you have a
             | well-designed telecom system, then something shaped like
             | Erlang happens to be a good tool to implement it. The tool
             | then keeps you committed to the design choices you made by
             | being restrictive enough.
        
           | Muromec wrote:
           | I've heard the claim that some 50% of mobile traffic is
           | handled by Erlang. Somehow the other 50% seems to be doing
           | just fine (apart from the usual shitshow on the inside that
           | software is, everywhere, all the time).
        
             | simoncion wrote:
             | > I heard the quote that some 50% of the mobile traffic is
             | handled by erlang.
             | 
             | Given that you can implement OTP in any language (albeit
             | with varying degrees of difficulty), that's not surprising.
             | 
             | The thing to remember is that Erlang was first used in
             | production in like _1986_. Nearly forty years is more than
             | enough time for the biggest good ideas in Erlang to
             | percolate out into non-BEAM systems.
        
           | LAC-Tech wrote:
           | As someone who is now in the Rust world and very, very
           | sympathetic to the Erlang world... you both probably mean
           | completely different things when you say "reliable". The
           | contexts are just worlds apart.
        
           | dcsommer wrote:
           | GP seems to be implying that hot swapping, not Erlang, is
           | unreliable. From my experience using it in Erlang, I heartily
           | agree that it is fraught. Inconsistent state across nodes
           | is much harder to reason about. When you _must_ ensure
           | consistency, hot swapping is reckless, especially as org size
           | and product complexity increase.
           | 
           | Leave hot loading to local/development environments, not
           | production deploys.
           | 
           | Loading configs on the fly can also have some of this risk,
           | but it is much easier to reason about typically.
        
         | sergiotapia wrote:
         | we had this wonderful thing in PHP where you would just save a
         | .php file and bada bing it was LIVE.
         | 
         | what happened? :D
        
           | thanksgiving wrote:
           | They took away our access after one too many outages :yay:
        
             | Muromec wrote:
             | Sounds like what happened with hot upgrade privileges in
             | some erlang shops too.
        
               | thanksgiving wrote:
               | I've worked at a smaller project where it worked just
               | fine. The key I think is to have the project be small
               | enough to be able to fit in my head.
               | 
               | That and have an identical test server. I used to make
               | changes locally, test them locally, then make the same
               | change on test, have someone else look at it, get an LGTM,
               | and do the same thing on the production machine. It
               | sounds like a lot of steps but it is pretty
               | straightforward.
               | 
               | Sadly, it probably doesn't work with bigger teams or more
               | complicated projects.
        
               | Muromec wrote:
               | A lot of things work on a small project where everybody
               | knows what they are doing and can keep all the
               | dependencies and data structures in their heads: logging
               | in to the production server, editing a PHP file right
               | where Apache is looking at it, and fixing stuff with SQL
               | commands in the production database to address customer
               | complaints.
               | 
               | The big question is what everyone else should be doing
               | that survives contact with unevenly distributed amounts
               | of technical expertise and of fucks given about the
               | result.
        
           | cardanome wrote:
           | We still have that and it is awesome. PHP is better than
           | ever.
           | 
           | In serious emergencies I sometimes even end up quickly SSHing
           | into a prod server and changing the file directly. Which is
           | kind of horrifying, but hey, the customer is happy it got fixed
           | immediately and I get to relax and take my time to write a
           | proper fix. Beats sweating while watching the pipeline build
           | and asking around for people to approve my merge request.
        
         | Muromec wrote:
         | The black magic comes at the cost of not having one streamlined
         | procedure to release stuff.
         | 
         | And to make the black magic work you have to engage with it.
         | Most of the time people don't even bother to write a proper
         | import.meta.hot.accept thingy in JavaScript. Developers simply
         | hate chores, which is evident from their unwillingness to write
         | proper unit tests (despite knowing that tests work), or from
         | writing just enough to let the coverage cop pass the build.
         | 
         | A dedicated small team running something like WhatsApp? Sure,
         | look into the arcane and let it look back at you (although high
         | insight makes one more susceptible to madness, you know). But
         | most of the time you will do a better job with PHP in a stupid
         | restartable box behind seven load-balancing proxies.
        
           | yetihehe wrote:
           | > The black magic comes at the cost of not having one
           | streamlined procedure to release stuff.
           | 
           | You can also have a streamlined procedure to release stuff.
           | Most changes in my erlang based system consist of "push to
           | staging branch, click to deploy and test, pull to master,
           | click deploy button". Can't be simpler than that. Most
           | changes in such systems are also pretty simple. When you need
           | to add something big, typically not many things are dependent
           | on that, so deploy is also pretty simple.
           | 
           | > But most of the time you will do better job with PHP in a
           | stupid restartable box behind seven load balancing proxies.
           | 
           | Yeah, we're talking about more complicated things here. If you
           | have something simple, you don't need to use Erlang;
           | `python -m http.server` will be even simpler than your PHP in
           | a stupid restartable box, because you don't need a special
           | box, just one small command.
        
             | Muromec wrote:
             | Do you do 100% of deployments using hot reload? If yes,
             | maybe you should share the recipe with everybody else,
             | since the consensus seems to recommend the opposite.
             | 
             | At the very least you will have a different procedure to
             | upgrade the erlang itself, right?
             | 
             | >If you have something simple, you don't need to use erlang
             | 
             | I think that on a spectrum of difficult things there is an
             | area between hosting a static file on an RPi at home and
             | running a massively distributed system full of long-running
             | stateful processes.
        
               | yetihehe wrote:
               | > Do you do 100% of deployments using hot reload?
               | 
               | About 99%. We need to restart servers maybe once a year.
               | Maybe next year we will finally migrate from Erlang 21 to
               | the latest version. Most "stop everything" deployments take
               | at most 1s of downtime. This month, for example, we needed
               | to upgrade the PostgreSQL database by switching it over to
               | a new machine. Zero downtime there would have taken a
               | little longer to get consistency right, but we could spare
               | a second to make it a much simpler task on the database
               | side (it was a restart of the database module; the rest of
               | the Erlang server was unaffected and clients were not
               | disconnected). Otherwise, most deployments are not visible
               | as disconnections, and we have a lot of long-running
               | connections.
               | 
               | > At the very least you will have a different procedure
               | to upgrade the erlang itself, right?
               | 
               | Yes.
               | 
               | > I think on a spectrum of difficult things there is an
               | area between hosting static file on rpi at home and
               | running massivele distributed system full of long running
               | stateful processes.
               | 
               | Is PHP good for both? I think PHP is NOT good for
               | long-running stateful processes, but I haven't used it in
               | 10 years. And it probably isn't needed for static files.
        
               | Muromec wrote:
               | Sounds like you are doing cool stuff the cool way, which
               | can't be said about everyone.
               | 
               | >Is PHP good for both? I think PHP is NOT good for long
               | running stateful processes, but I didn't use it in
               | 10 years. And it probably is not needed for static files.
               | 
               | No, of course it isn't. I haven't touched it with a long
               | pole for even longer and don't even want to. And I
               | would not argue for doing what you are doing in anything
               | that isn't running on BEAM/OTP.
               | 
               | The point I'm trying to make is -- for most of the web
               | stuff, making a transactional response from a short-lived
               | HTTP handler is good enough and you can do it even in PHP
               | (which is not praise for PHP as a great tool I enjoy
               | using, but the opposite). It would not be the most optimal
               | solution by any metric, nor the most elegant or
               | sophisticated either, but it's survivable; it's the
               | lowest bidder.
        
         | lamuswawir wrote:
         | Erlang is built for reliability. They're chasing nine nines.
         | Everything about the BEAM is built to emphasize that, the
         | design choices, the documentation, the recommended practices.
         | 
         | Erlang is not very fast, but that's not what it was built for.
        
           | Muromec wrote:
           | Is it really BEAM or just OTP? Sure, BEAM gives you
           | processes, network-transparent send, immutable structures and
           | the linking/monitoring thingy on top, but is that what makes
           | it good to shoot for nines?
           | 
           | I suspect the aura of mysticism around yet another JIT VM is
           | not that warranted.
        
             | jerf wrote:
             | It is reasonable to conceive of Erlang as encompassing OTP.
             | Perhaps somewhere in the world there is some developer out
             | there hot on Erlang but just hates OTP and doesn't use it,
             | but they must be fairly frustrated at how hard it is to
             | keep OTP out of their code base if they ever need any
             | libraries.
             | 
             | Restarting is arguably the definitive thing that makes
             | Erlang stack the 9s out past what most languages and
             | runtimes can achieve... the thing is, it's more complicated
             | to use in practice than a web page like this makes it look,
             | and it's beyond what most products need. Few applications
             | _need_ the fifth or sixth or seventh nine, and it gets to
             | the point that you can't have it anyhow because your
             | Erlang cluster, no matter how well distributed, itself
             | probably doesn't have 99.99999 availability, and your users
             | probably don't have 99.99999 availability on their own
             | network connection.
             | 
             | It's not impossibly complicated, but it is the sort of
             | thing where, if you want to use the feature, you need to
             | have it sort of constantly in mind as you write the rest of
             | your system, and it's a lot easier even in Erlang to just
             | design the system to take entire nodes down and bring them
             | back up, if not the entire cluster down, rather than fuss
             | with hot reloads. I wish Erlang advocates would be more
             | upfront about pitching this as an interesting niche
             | feature, but not really a reason to consider Erlang. Unless
             | you _absolutely need it_, in which case it can indeed be
             | the thing that puts it on the short list of choices... but
             | as evidenced by the vast, vast majority of software and
             | systems not being on Erlang and managing to get along,
             | there aren't really that many things that _need_ it.
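jerf's point that extra nines stop mattering once something else in the path is less available can be made concrete with a little arithmetic. This is a minimal sketch, assuming components in series with independent failures (so end-to-end availability is the product of the parts); the figures and the names `end_to_end` and `downtime_seconds_per_year` are illustrative and do not come from the thread.

```python
# Toy arithmetic for serial availability. Assumption: a request needs every
# component up at once, and failures are independent, so the end-to-end
# availability is the product of the individual availabilities.
from math import prod

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def end_to_end(availabilities):
    """Availability of components that must all be up at the same time."""
    return prod(availabilities)


def downtime_seconds_per_year(availability):
    """Expected yearly downtime implied by an availability figure."""
    return (1 - availability) * SECONDS_PER_YEAR


# Illustrative figures only: a seven-nines cluster reached over a
# user's three-nines network connection.
cluster = 0.9999999
user_network = 0.999
total = end_to_end([cluster, user_network])

print(f"end-to-end availability: {total:.7f}")
print(f"cluster downtime: {downtime_seconds_per_year(cluster):.1f} s/yr")
print(f"end-to-end downtime: {downtime_seconds_per_year(total) / 3600:.1f} h/yr")
```

With these assumed figures the cluster accounts for roughly 3 seconds of downtime a year, while the end-to-end figure comes to nearly 9 hours, almost all of it from the user's connection.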
        
         | gf000 wrote:
         | It's not well known, but the JVM has very good hot reload
         | support, and is a very reliable and performant platform.
        
       | amelius wrote:
       | Does this hot swapping also work for closures?
        
         | Muromec wrote:
         | Erlang doesn't have closures, because Erlang doesn't have
         | variables. The compiler simply desugars it to a partially
         | applied function referenced by its name (yes, those inline
         | functions in fact have names).
         | 
         | If you have something_function, then the first inline function
         | used in it will be -something_function/1-fun-0-, with zero
         | being the index and the captured variable being another
         | argument. Now if you change the host function to have more
         | inlines _before_ it, the indexing will drift.
         | 
         | So I would expect the body of the inline function to still be
         | resolved from the old version of the module, but I didn't
         | actually try.
         | 
         | Source: I did run erlc -S at least once.
         | 
         | Add: now that I think of it, will a call to a local function
         | from the old version of the module ever escape into the new one
         | without first returning to gen_server and letting it call the
         | new version? Another comment says that calls within the module
         | never do, so the assumption was correct.
        
           | bitwalker wrote:
           | Erlang absolutely has closures, you are mistaken. What you
           | are referring to are "function captures", which bind a
           | function reference as a value, and there is no environment to
           | close over with those. However, you can define closures which,
           | as you'd expect, can close over bindings in the environment
           | in which the closure is defined.
           | 
           | The interaction between hot reloads and function captures in
           | general is a bit subtle, particularly when it comes to how a
           | function is captured. A fully qualified function capture is
           | reloaded normally, but a capture using just a local name
           | refers to the version of the module at the time it was
           | captured, but is force upgraded after two consecutive hot
           | upgrades, as only two versions of a module are allowed to
           | exist at the same time. For this reason, you have to be
           | careful about how you capture functions, depending on the
           | semantics you want.
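The difference between a fully qualified capture and a local capture can be sketched with a toy Python model. This is an illustration under assumptions, not how BEAM works internally: a dict stands in for the code server, `qualified_capture` mimics `fun Mod:Fun/Arity` by looking the function up again on every call, and `local_capture` mimics a local capture by pinning whichever version was loaded at capture time. All names here are hypothetical.

```python
# Toy model, not BEAM internals: a "code server" mapping module names to the
# currently loaded function table. All names are hypothetical.
registry = {}


def load_module(name, functions):
    """Hot-load a new version of a module by replacing its function table."""
    registry[name] = dict(functions)


def qualified_capture(module, fun):
    """Like fun Mod:Fun/Arity: look the function up again on every call."""
    return lambda *args: registry[module][fun](*args)


def local_capture(module, fun):
    """Like a local capture: pin whichever version is loaded right now."""
    return registry[module][fun]


load_module("greeter", {"greet": lambda who: f"v1 hello {who}"})
remote = qualified_capture("greeter", "greet")
local = local_capture("greeter", "greet")

load_module("greeter", {"greet": lambda who: f"v2 hello {who}"})
print(remote("joe"))  # v2 hello joe: resolved against the newest version
print(local("joe"))   # v1 hello joe: still the version it captured
```

The model deliberately ignores the two-version limit; that part is what the correction below is about.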
        
             | toast0 wrote:
             | > but is force upgraded after two consecutive hot upgrades,
             | as only two versions of a module are allowed to exist at
             | the same time.
             | 
             | Force upgraded is maybe misleading. When a module is loaded
             | for the 3rd time, any processes that still have the first
             | version in their stack are killed. That may result in a
             | supervisor restarting them with new code, if they're
             | supervised.
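toast0's correction can be modeled in the same toy style. This is a minimal sketch, assuming a single module, processes tracked only by which version they are executing, and the purge happening as part of loading the third version, as described above. `ToyCodeServer` and its methods are hypothetical and do not correspond to the real `code` module API.

```python
class ToyCodeServer:
    """Toy model of one module with at most two loaded versions.

    Loading a third version kills every process still running the oldest
    one, following the behaviour described in the comment above.
    """

    def __init__(self):
        self.current = None
        self.old = None
        self.processes = {}  # pid -> version the process is executing

    def spawn(self, pid):
        self.processes[pid] = self.current

    def qualified_call(self, pid):
        # A fully qualified call (Mod:loop()) moves the process to the newest code.
        self.processes[pid] = self.current

    def load(self, version):
        killed = []
        if self.old is not None:
            # Purge the oldest version: processes still running it are killed.
            killed = [pid for pid, v in self.processes.items() if v == self.old]
            for pid in killed:
                del self.processes[pid]
        self.old, self.current = self.current, version
        return killed


server = ToyCodeServer()
server.load("v1")
server.spawn("a")
server.spawn("b")
server.load("v2")           # v1 becomes "old"; a and b keep running it
server.qualified_call("a")  # a hops to v2; b never makes such a call
print(server.load("v3"))    # ['b']: only b was still on v1
print(server.processes)     # {'a': 'v2'}
```

In a real system `b` might then be restarted on the new code by its supervisor, if it has one.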
        
               | bitwalker wrote:
               | Ah right, good point - I was trying to remember the exact
               | behavior, but couldn't recall if an error is raised (and
               | when), or if the underlying module is just replaced and
               | "jesus take the wheel" after that.
        
             | Muromec wrote:
             | What does it look like? I was talking about this thing:
             | 
             |     Val = 1, SumFun = fun(X) -> X + Val end, SumFun(2).
             | 
             | It looks like you define an arity-1 function that captures
             | Val, while in fact you define an arity-2 function and bind 1
             | as its first argument. Since you can't redefine Val anyway,
             | it's as good as a closure, but technically it doesn't
             | capture the environment.
             | 
             | Maybe I'm mistaken and there is another way to express it?
        
               | bitwalker wrote:
               | The example you've given here does not work the way you
               | think it does. I would agree however that the mechanics
               | of closure environments is simpler in Erlang due to the
               | fact that values are immutable, as opposed to closures in
               | other languages where mutability must be accounted for.
               | 
               | I would also note that, for the example you've given, the
               | compiler _could_ constant-fold the whole thing away, but
               | for the sake of argument, let's assume that `Val` is an
               | argument to the current function in which `SumFun` is
               | defined, and so the compiler cannot reason about the
               | actual value that was bound.
               | 
               | The closure will be constructed at the point it is
               | captured, using the `make_fun` BIF, with a given number
               | of free var slots (in this case, 1 for the capture of
               | `Val`). `Val` is written to the slot in the closure
               | environment at this time as well. See the implementation
               | of the BIF
               | [here](https://github.com/erlang/otp/blob/6cefa05a2a977864150908feb...)
               | if you are curious.
               | 
               | At runtime, when the closure is executed, the underlying
               | function receives the closure environment, from which it
               | loads any free vars. In my own Erlang compiler, the
               | closure environment was given via pointer, as the first
               | argument to the function, and then instructions were
               | emitted to load free variables relative to that pointer.
               | I believe BEAM does the same thing; it may differ in the
               | the specific details, but conceptually that is how it
               | works.
               | 
               | The compiler obviously must generate a new free function
               | definition for closures with free variables (hence the
               | name of the function you see in the interactive shell, or
               | in debug output). The captured MFA of the closure is this
               | generated function. The runtime distinguishes between the
               | two types of closures (function captures vs actual
               | closures) based on the metadata of the func value itself.
               | 
               | Like I mentioned near the top, it's worth bearing in mind
               | that the compiler can also do quite a bit of
               | simplification and optimization during compilation to
               | BEAM - so there may be cases where you end up with a
               | function capture instead of a closure, because the
               | compiler was able to remove the need for the free
               | variable in cases like your example, but I can't recall
               | what erlc specifically does and does not do in that
               | regard.
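The closure construction bitwalker describes can be sketched as a conceptual model in Python. The compiler lifts the fun body into a top-level function, and the fun value pairs a reference to it with an environment whose slots are filled when the fun is created. The `Fun` class and the function names are hypothetical, and BEAM's actual representation differs in the details.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Fun:
    """Conceptual closure value: a lifted top-level function plus its env."""
    code: Callable[..., Any]  # the compiler-generated (lifted) function
    env: tuple                # free-variable slots, filled when the fun is made

    def __call__(self, *args):
        # The lifted function receives the environment alongside the arguments.
        return self.code(self.env, *args)


def sum_fun_lifted(env, x):
    """Stand-in for the generated body of fun(X) -> X + Val end."""
    (val,) = env  # load the free variable Val from slot 0
    return x + val


def make_sum_fun(val):
    """Roughly what building the fun does: copy Val into the env now."""
    return Fun(sum_fun_lifted, (val,))


sum_fun = make_sum_fun(1)
print(sum_fun(2))  # 3
```

Because the environment is filled at creation time, this is also what the reply calls partial application; with immutable bindings the two readings behave the same, which both commenters acknowledge.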
        
               | Muromec wrote:
               | > let's assume that `Val` is an argument to the current
               | function in which `SumFun` is defined, and so the
               | compiler cannot reason about the actual value that was
               | bound.
               | 
               | That was exactly the case I was talking about, because
               | otherwise there is no need to even make an arity-2
               | function. If the value is known at compile time, the
               | constant is embedded into the body of the inlined function.
               | 
               | >At runtime, when the closure is executed, the underlying
               | function receives the closure environment, from which it
               | loads any free vars.
               | 
               | To my understanding, no it doesn't, as the value is
               | resolved when the function pointer is created, not when
               | the underlying function executes, which the code you
               | linked shows too. I know it uses "env" as a structure
               | field, but it's partial application, not an actual
               | closure with access to the parent scope. Consider two
               | counter examples in Python:
               | 
               |     from functools import partial
               | 
               |     early = []
               |     for x in range(1, 10):
               |         # what Erlang does: x is bound when the fun is made
               |         early.append(partial(lambda y: y * 2, x))
               | 
               |     late = []
               |     for x in range(1, 10):
               |         # an actual closure: every lambda returns 18, because
               |         # x is looked up in the parent scope at call time
               |         late.append(lambda: x * 2)
               | 
               | But then again, it doesn't matter since variables are
               | assigned only once.
               | 
               | >Like I mentioned near the top, it's worth bearing in
               | mind that the compiler can also do quite a bit of
               | simplification and optimization during compilation to
               | BEAM - so there may be cases where you end up with a
               | function capture instead of a closure, because the
               | compiler was able to remove the need for the free
               | variable in cases like your example, but I can't recall
               | what erlc specifically does and does not do in that
               | regard.
               | 
               | I was looking into it a week ago, and erlc does what I
               | described when it can't figure out the constant at
               | compile time.
               | 
               | add: While we're at it, BEAM doesn't even know about
               | variables, only values and registers, so it has
               | nothing to capture anyway.
        
       | slt2021 wrote:
         | Hot reloading of code is nothing new nowadays, but people use it
         | only locally during development, for a REPL-like development
         | style.
         | 
         | In actual production, people prefer to operate at the container
         | level + traffic management, and don't touch anything deeper than
         | the container.
        
         | foota wrote:
         | Amusingly, this sort of reminds me of the story of a person
         | who joins a new company only to discover that their programming
         | framework is intricately linked to their version control
         | system.
        
         | diath wrote:
         | > in actual production, people prefer to operate at the
         | container level + traffic management, and don't touch anything
         | deeper than the container
         | 
         | How do you think video games like World of Warcraft or Path of
         | Exile deploy restartless hotfixes to millions of concurrent
         | players without killing instances? I don't think it's a matter
         | of "prefer to", it's a matter of "can we completely disrupt the
         | service for users and potentially lose some of the state"? Even
         | if that disruption lasts a mere millisecond, in some contexts
         | it's not acceptable.
        
           | Muromec wrote:
           | Games do in fact have downtimes on major releases and you
           | have to restart the client too before connecting.
        
             | diath wrote:
             | For major patches/backend changes that require recompiling
             | - yes, for gameplay tweaks/hotfixes - no, hot reloading is
             | preferable where possible.
        
           | qudat wrote:
           | WoW restarts every week. Not sure that's better than zero
           | downtime deployments
        
             | diath wrote:
             | That's just how it works when your backend is a hybrid
             | software that utilizes a low-level compiled programming
             | language and a high-level language that runs in its own VM.
             | You can use the latter for gameplay features, and can
             | hotfix on the go, and then for core changes you have to
             | restart, which is also why WoW will hotfix the latter on
             | the go, usually every day on an expansion launch, whereas
             | they defer the bulk of backend changes for the next weekly
             | restart without continuously disrupting the game for
             | players.
        
           | AnotherGoodName wrote:
           | That's a very big assumption that they do code hotpatching.
           | 
           | It would seem far more likely that they separate the stateful
           | (database) and stateless (game logic) layers, and just spin up
           | a new instance of the stateless server layer behind a
           | reverse proxy and spin down the old instance. It's basically
           | how all websites update without downtime.
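The rolling swap described above can be sketched as a toy proxy that sends new requests to the newest backend and lets the old one drain before it is spun down. This illustration rests on simplifying assumptions (a single active backend, no session state) and does not claim that any particular game works this way; as the reply points out, restoring in-game state is the hard part, and this sketch leaves it out. `ToyProxy` and the backend names are hypothetical.

```python
class ToyProxy:
    """Routes new requests to one active backend and tracks in-flight work."""

    def __init__(self, backend):
        self.active = backend
        self.in_flight = {backend: 0}

    def begin_request(self):
        backend = self.active
        self.in_flight[backend] += 1
        return backend

    def end_request(self, backend):
        self.in_flight[backend] -= 1

    def deploy(self, new_backend):
        """Point new traffic at new_backend; the old one starts draining."""
        self.in_flight.setdefault(new_backend, 0)
        old, self.active = self.active, new_backend
        return old

    def can_stop(self, backend):
        """Safe to spin down once it gets no new traffic and nothing is in flight."""
        return backend != self.active and self.in_flight[backend] == 0


proxy = ToyProxy("game-logic-v1")
req = proxy.begin_request()          # lands on v1
old = proxy.deploy("game-logic-v2")  # new requests now go to v2
print(proxy.begin_request())         # game-logic-v2
print(proxy.can_stop(old))           # False: req is still in flight on v1
proxy.end_request(req)
print(proxy.can_stop(old))           # True: v1 has drained
```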
        
             | diath wrote:
             | A website that just proxies to another server does not need
             | to do much to restore the previous state to make it look
             | seamless to a user, the client will just perform another
             | GET request that triggers a few SELECT queries, it's far
             | more complex in the context of a video game.
        
           | Thaxll wrote:
           | Most of those hotfixes are data-driven, as in database
           | updates. The game server just reloads the data; the binary
           | itself is not touched.
           | 
           | I've never seen a game where they hot reload code inside the
           | gameserver itself, it's usually a downtime or rolling
           | updates.
        
             | diath wrote:
             | > Most of those hot fixes are data driven as in database
             | updates. Gameserver just reload the data, the binary itself
             | is not touched.
             | 
             | And since the data from the disk/database (whether it's a
             | Lua table, XML structure, JSON object, or a query) is then
             | represented as a low-level data structure, that's
             | essentially what hot reloading is - you deserialize the new
             | data and hot-swap the pointers in the simplest terms.
             | 
             | >I've never seen a game where they hot reload code inside
             | the gameserver itself, it's usually a downtime or rolling
             | updates.
             | 
             | In World of Warcraft, you will literally have bosses
             | despawn mid-fight and spawn again with new stats or you
             | will see their health values update mid-fight, all without
             | the players getting interrupted, their spell state getting
             | desynced, or spawned items in the instance disappearing.
             | This can be observed with the release of every single new
             | raid on live streams as Blizzard employees are watching the
             | world first attempts and tweaking/tuning the fights as they
             | happen.
             | 
             | EDIT: Here's such an example. For the majority of the fight
             | the extra tank could keep a spawned monster away from the
             | boss; then, mid-fight, the monster suddenly started
             | one-shotting the tank, without disrupting the instance.
             | This was Blizzard's way of addressing a cheese strat, to
             | force the players to do the fight as designed:
             | https://www.youtube.com/watch?v=7gMm60BXAjU
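The "deserialize the new data and hot-swap the pointers" approach diath describes can be sketched as follows. This is a minimal sketch, assuming the tuning data arrives as JSON; `HotTable` and the keys `boss_hp` and `add_damage` are hypothetical. The new payload is parsed completely before a single reference is swapped, so a reader holding a snapshot keeps seeing consistent old values, and a malformed payload leaves the old table in place.

```python
import json


class HotTable:
    """Holds tuning data that can be replaced while the server keeps running."""

    def __init__(self, raw: str):
        self._table = json.loads(raw)

    def snapshot(self) -> dict:
        # Callers keep one dict for a whole operation, so they never see
        # half of the old values mixed with half of the new ones.
        return self._table

    def reload(self, raw: str) -> None:
        new_table = json.loads(raw)  # parse fully first; bad input raises here
        self._table = new_table      # then publish by swapping a single reference


tuning = HotTable('{"boss_hp": 1000000, "add_damage": 5000}')
during_pull = tuning.snapshot()
tuning.reload('{"boss_hp": 900000, "add_damage": 250000}')
print(during_pull["boss_hp"])           # 1000000: the old snapshot is untouched
print(tuning.snapshot()["add_damage"])  # 250000: new reads see the hotfix
```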
        
               | Thaxll wrote:
               | Yes but again it's not hot swapping code as in Erlang,
               | the C++ code is unchanged, they just change some xml
               | somewhere.
               | 
               | By your definition every CRUD app has hot reloading
               | capabilities.
        
               | diath wrote:
               | > Yes but again it's not hot swapping code as in Erlang,
               | the C++ code is unchanged, they just change some xml
               | somewhere.
               | 
               | Right, not on the C++ side, but on the Lua side that WoW
               | uses - you load the new gameplay code that pulls the new
               | data, and override the globals with new functions.
        
               | dgfitz wrote:
               | Why does it matter the language? C++ built in the tooling
               | to allow hot swapping, no?
        
               | Thaxll wrote:
               | C++ because 99% of the major games are built in that
               | language.
        
             | tomjakubowski wrote:
             | LPMUDs ran almost entirely on hot reloadable code written
             | in a quirky language called LPC, which later inspired the
             | Pike language.
             | 
             | I believe that only the "driver" code, which handles system
             | calls and hosts the LPC interpreter and is written in C,
             | couldn't be hot reloaded; everything else running in the
             | game could be reloaded without restarting the server.
             | 
             | I'd guess in the modern day, there would be some games
             | where Lua scripts can be hot-reloaded like any other data,
             | from a database or object store.
        
               | cess11 wrote:
               | It's a rather fun language and programming environment,
               | I'd recommend playing around with it over doing AoC.
        
           | swat535 wrote:
           | In addition to what most people said, many other game servers
           | just simply announce upcoming maintenance work and take the
           | services offline until the patches are deployed.
           | 
           | This way they can properly test everything and roll back any
           | potential fixes if required. Even banking systems regularly
           | go down for maintenance.
        
         | AlphaWeaver wrote:
         | People may "prefer" simply replacing containers, but as some
         | siblings mention, some applications might require more
         | reliability guarantees.
         | 
         | Erlang was originally designed for implementing telephony
         | protocols, where interrupted phone calls were not an acceptable
         | side effect of application updates.
        
           | AnotherGoodName wrote:
           | FWIW as soon as you start using containers you should be able
           | to handle those containers spinning up/down. Pretty much the
           | whole point of containers. At which point you don't need to
           | bother with code hot swapping since you already have a
           | mechanism for newer containers to spin up while older ones
           | spin down.
           | 
           | The sibling post "that's how they update without downtime" is
           | super naive. It is absolutely not how they do it.
        
             | Muromec wrote:
             | That's kinda what erlang does, just on a different level.
             | Your docker and your load balancer are both inside your
             | app.
        
               | simoncion wrote:
               | If we were to wedge how Erlang does hot code swapping into a
               | container metaphor, then to get what Erlang does, you'd
               | need to have a container per function call.
               | 
               | Given that it would be absurdly wasteful to use OS
               | processes in containers to clone Erlang's code reload
               | system, AnotherGoodName might take ten minutes to watch
               | Erlang: The Movie to get a better sense of the
               | capabilities of that system. The movie is available from
               | many places, including archive.org.
        
               | Muromec wrote:
               | >If we to wedge how Erlang does hot code swapping into a
               | container metaphor, then to get what Erlang does, you'd
               | need to have a container per function call.
               | 
               | You have a container that responds to HTTP requests
               | sitting behind a load balancer, then you spawn a new
               | container and tell the load balancer to redirect calls to
               | the new one. From the point of view of whoever is calling
               | the load balancer, you have hot swapping. You may even
               | separate containers into logical groups and call it a
               | microservices architecture. Or you can define a process
               | as something that has a qualified name and a mailbox and
               | sends messages to other processes.
               | 
               | Now reasonable people may disagree about what's wasteful,
               | but the market seems to tolerate places where adding a
               | checkbox to a form is a half-year process involving
               | five different departments and the market can't be wrong.
        
         | aeturnum wrote:
         | I work at a company that deploys Elixir/Erlang and while we do
         | /prefer/ to push a fully tested build in a new container,
         | sometimes things get nasty and we need to console in and re-
         | define a module in production. It's not a "best practice" but
         | it stems the bleeding while the best practice is going through
         | its test suite.
        
         | simoncion wrote:
         | > in actual production, people prefer to operate at the
         | container level + traffic management, and dont touch anything
         | deeper than the container
         | 
         | Fred Hebert (and many of the folks he has worked with) do not
         | operate that way: <https://ferd.ca/a-pipeline-made-of-airbags.html>
         | 
         | One nice quote (out of many) from the article:
         | 
         | > The thing that stateless containers and kubernetes do is
         | handle that base case of "when a thing is wrong, replace it and
         | get back to a good state." The thing it does not easily let you
         | do is "and then start iterating to get better and better at not
         | losing all your state and recuperating fast".
         | 
         | (And if one wants to argue with that quote, please read the
         | entire essay first. There's important context that's relevant
         | to fully understanding Hebert's opinion here.)
        
         | toast0 wrote:
         | > in actual production, people prefer to operate at the
         | container level + traffic management, and dont touch anything
         | deeper than the container
         | 
         | I mean, this seems to be "best practices" these days, but I
         | certainly don't prefer it. At least the orchestration I use is
         | amazingly slow. And cold loading changes is terrible for long
         | running processes... this makes deployment a major chore.
         | 
         | It's less terrible if you're just doing mostly stateless web
         | stuff, but that's not my world.
         | 
         | In the time it takes to run terraform plan, I could have pushed
         | Erlang code to all my machines, loaded it, and (usually)
         | confirmed I fixed what I wanted to fix.
         | 
         | Low cost of deploy means you can do more updates which means
         | they can be smaller which makes them easier to review.
        
       | anonymousDan wrote:
       | In a distributed setup I imagine there could be cases where you
       | want to atomically hot upgrade multiple VMs at the same time. Is
       | this common in practice and if so are there recommended
       | patterns/techniques for doing it?
        
         | Muromec wrote:
         | There can't be anything atomic in a distributed system. You
         | can't even atomically hot upgrade it on a single VM anyway --
         | you instead load the new version of the module and let
         | dispatcher know to route new calls into it, the same as you
         | would do with a load balancer and a bunch of load bearing
         | docker hosts, just _inside_ your app.
        
           | knome wrote:
           | Erlang has a code_change callback in OTP that allows a
           | gen_server to update its current state and start using new
           | code. No connections need be broken with clients, no long
           | running processes need be stopped. Just updated in place.
           | 
           | It's not just a routing change.
           | 
           | https://www.erlang.org/docs/24/man/gen_server
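Here is a rough Python analogy of this idea. It is not Erlang and not OTP's actual API, and `CounterV1`, `CounterV2` and `Server` are hypothetical names. A long-lived "process" keeps its state across an upgrade. Rather than restarting, a `code_change`-style hook reshapes that state for the new code:

```python
class CounterV1:
    """Callback 'module', version 1: state is a plain int."""

    @staticmethod
    def init():
        return 0

    @staticmethod
    def handle_call(msg, state):
        if msg == "incr":
            state += 1
        return state, state  # (reply, new_state)


class CounterV2:
    """Callback 'module', version 2: state becomes a dict with a call counter."""

    @staticmethod
    def code_change(old_vsn, state):
        # Migrate v1's int state into the v2 shape without losing the value.
        if old_vsn == 1:
            return {"count": state, "calls": 0}
        return state

    @staticmethod
    def handle_call(msg, state):
        state = dict(state, calls=state["calls"] + 1)
        if msg == "incr":
            state["count"] += 1
        return state["count"], state


class Server:
    """Holds state across upgrades; it is never restarted."""

    def __init__(self, module, vsn):
        self.module, self.vsn = module, vsn
        self.state = module.init()

    def call(self, msg):
        reply, self.state = self.module.handle_call(msg, self.state)
        return reply

    def upgrade(self, new_module, new_vsn):
        # Swap the code and migrate the state in place, like a code_change hook.
        self.state = new_module.code_change(self.vsn, self.state)
        self.module, self.vsn = new_module, new_vsn
```

In a real OTP release upgrade, the release handler suspends the process, loads the new module and invokes `code_change/3` for you. This sketch only shows why the state survives the swap.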
        
             | Muromec wrote:
             | It's a routing change in a sense that gen_server is routing
             | function calls to the new module definition. I know about
             | gen_server and code_change; the point was that it's
             | conceptually the same mechanism, just on a different level
             | of abstraction.
        
         | AlphaWeaver wrote:
         | Erlang does have a mechanism that allows a module to control
         | when it moves from the "old version" to the "new version" of
         | its own code. Calls to the module with the fully qualified name
         | (e.g. `module:function()`) will invoke the "new code" once it's
         | loaded, but calls within that module using only function names
         | (just `function()`) will continue to invoke the "old code".
         | 
         | If the portion of the app you were hot upgrading was an OTP
         | process like a GenServer, you could theoretically wait for some
         | sort of atomic coordination signal before making that fully
         | qualified function call once the new code has loaded.
         | 
         | We use hot code reloading at my work, but haven't had a reason
         | to atomically sync the reload. Most of the time it's a tmux
         | session with `synchronize-panes` and that suffices. If your
         | application can handle upgrades within a module smoothly, it's
         | rare to have a need for some sort of cluster-level coordination
         | of a code change, at least one that's atomic.
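The distinction between fully qualified and local calls can be imitated in Python. This is an analogy only, not how the BEAM implements it, and all names here are hypothetical. A dict stands in for the code server. `remote_call` plays the role of `module:function()` by looking up the latest loaded version at call time. A plain call inside a version's closure stays bound to that version:

```python
# Hypothetical "code server": maps a module name to its currently loaded version.
code_server = {}


def load(name, namespace):
    """Load (or reload) a module version; later lookups see the new code."""
    code_server[name] = namespace


def remote_call(name, fun, *args):
    """Like Erlang's mod:fun(...): resolved against the latest loaded version."""
    return code_server[name][fun](*args)


def make_v1():
    def greet():
        return "hello from v1"

    def loop_step():
        # Local call, like plain greet() inside the module: bound to this
        # version's function object, so it keeps running v1 code.
        local = greet()
        # Fully qualified call, like greeter:greet(): looked up at call time.
        qualified = remote_call("greeter", "greet")
        return local, qualified

    return {"greet": greet, "loop_step": loop_step}


def make_v2():
    def greet():
        return "hello from v2"

    def loop_step():
        return greet(), remote_call("greeter", "greet")

    return {"greet": greet, "loop_step": loop_step}
```

A long-running loop that grabbed v1's `loop_step` before the reload returns `("hello from v1", "hello from v2")` afterwards. Its local call still hits the old code, while its qualified call picks up the new code.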
        
         | toast0 wrote:
         | I mean, yes, there's cases where you want that. But there's no
         | mechanism for it, because you would have to stop the world, do
         | the load, and then resume.
         | 
         | Even within a single VM, hot loading doesn't stop the world;
         | during the load some schedulers will switch before others.
         | Although there are guarantees that mean when a process runs new
         | code and sends a message to another local process, that process
         | will have the new code available when it reads the message. (It
         | _may_ still be running the old code, depending on how it's
         | called, though.)
         | 
         | Dealing with multiple versions active is part of life in most
         | distributed systems though. You can architect it away in some
         | systems, but that usually involves having downtime in
         | maintenance windows.
         | 
         | A typical pattern is making progressive updates, where if you
         | want to change a request, first you deploy a server that can
         | handle old and new requests, then you deploy the client that
         | sends the new request, then you can deploy a server that no
         | longer accepts old requests.
         | 
         | For new replies, if the new reply comes with a new request,
         | that works like above... a client that sent a new request must
         | handle the new reply. Otherwise, update the client to handle
         | either type of reply, then update the server to send the new
         | reply, finally remove handling of the old reply in the clients.
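Below is a Python sketch of the three-step request migration. The handler names are hypothetical, and so is the change itself: a request's `name` field is being split into `first_name`/`last_name`.

```python
def handle_request_v1(req):
    # Original server: only understands {"name": ...}.
    return {"greeting": "hi " + req["name"]}


def handle_request_transitional(req):
    # Step 1: deploy a server that accepts both the old and the new shape.
    if "first_name" in req:
        return {"greeting": "hi " + req["first_name"] + " " + req["last_name"]}
    return handle_request_v1(req)


def new_client_request():
    # Step 2: clients switch to sending the new request shape.
    return {"first_name": "Ada", "last_name": "Lovelace"}


def handle_request_v2(req):
    # Step 3: once no client sends the old shape, drop support for it.
    if "first_name" not in req:
        raise ValueError("old request format no longer supported")
    return {"greeting": "hi " + req["first_name"] + " " + req["last_name"]}
```

Every deploy order here is safe. A new-format client only ever talks to a server that already understands it.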
         | 
         | It gets a bit harder if your team dynamics mean one
         | person/group doesn't control both sides... Then you need stats
         | to tell you when all the clients have switched.
         | 
         | Sometimes you do need more of a point in time switch. If it
         | needs to be pretty good, you can just set a config through a
         | dist 'broadcast'. If it needs to be better than that, you can
         | have the servers and clients change behavior after a specific
         | time... but make sure you understand the realities of clock
         | synchronization and think about what to do for requests in
         | flight. If that's not good enough, you can drop or buffer
         | requests for a little bit before your target time, make sure
         | there are no in progress requests, then resume processing
         | requests with the new version.
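Here is a minimal sketch of that last option: a scheduled cutover that buffers requests shortly before the target time. It assumes every node agrees on `cutover` (times in seconds). It ignores clock skew and in-flight request tracking, which are the hard parts warned about above. `version_for` and `Switcher` are hypothetical names.

```python
def version_for(now, cutover, drain_window):
    """Decide how to treat a request arriving at local time `now`.

    Before the drain window: old behavior.
    Inside the window: hold the request so in-flight work can finish.
    At or after the cutover: new behavior.
    """
    if now >= cutover:
        return "new"
    if now >= cutover - drain_window:
        return "buffer"
    return "old"


class Switcher:
    def __init__(self, cutover, drain_window):
        self.cutover, self.drain_window = cutover, drain_window
        self.buffered = []

    def submit(self, now, req):
        """Return the (version, request) pairs processed by this call."""
        mode = version_for(now, self.cutover, self.drain_window)
        if mode == "buffer":
            self.buffered.append(req)
            return []
        handled = []
        if mode == "new":
            # Flush requests held during the window using the new version.
            handled += [("new", r) for r in self.buffered]
            self.buffered.clear()
        handled.append((mode, req))
        return handled
```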
        
       | behnamoh wrote:
       | Lisp has had this feature since day 1. But Lisp-like langs like
       | Clojure, Racket, etc. don't have it. This is one of the
       | fundamental features of Common Lisp and I don't know why most
       | other Lisp-wannabes don't implement it.
        
         | lamuswawir wrote:
         | Came here to say this. In Lisp, you can just compile a
         | function, or load a file and it just works. It's not even sold
         | as a hot feature, not the way Erlang sells it. It's just a
         | feature.
         | 
         | I manage a few websites written in Lisp, and updating them is
         | as simple as push code, recompile and it works.
        
           | davidw wrote:
           | But what if the system is running and the new function takes
           | different arguments or something? What if there is data
           | loaded in the system, what happens to it?
           | 
           | Simply loading new code is easy, ensuring the whole system
           | works seems to require a bit more effort.
        
             | fiddlerwoaroof wrote:
             | Common Lisp has a bunch of features designed to enable
             | migrating the system, e.g. update-instance-for-redefined-class
             | ( https://www.lispworks.com/documentation/HyperSpec/Body/f_upd... )
             | lets you write code to update instance data between class
             | versions when a class definition is reloaded.
             | 
             | It turns out, though, that making hot-code reloading work
             | well is mainly a question of how you design your system:
             | designing for hot code reloading isn't all that hard for
             | 90% of cases once you figure out the relevant techniques.
        
             | leprechaun1066 wrote:
             | We do this in q/kdb+ systems often for patches. An
             | important thing about these languages is that this kind of
             | workflow is part of the core for solving problems. So when
             | you are building a system one of the aspects of its design
             | will always allow for this update method. Then when you
             | push a patch you both know the impact of the change
             | (because you've tested the exact same steps in a
             | dev/QA/UAT/Beta environment) and the work required to do it
             | safely.
             | 
             | Major releases do go through a full shutdown and release
             | cycle though.
        
           | osmano807 wrote:
           | Do those sites use something like Phoenix LiveView, or is it
           | something ad hoc like a simple SSR template engine? It would be
           | nice to have something to handle migrations in the client
           | side code to match the server side API.
        
         | fiddlerwoaroof wrote:
         | Clojure has it for a large percentage of functionality: things
         | like https://github.com/clojure-emacs/cider depend on it.
         | However, this mostly stays in dev-time and isn't used much for
         | releases. Which I find a bit funny because Clojure's
         | functional, data-driven philosophy is great for enabling
         | painless hot-code updates.
        
         | chamomeal wrote:
         | Can't you do something like this with clojure?
         | 
         | I don't understand the particulars, but one selling point of
         | biff is it's got built-in support for updating things directly
         | in prod via the REPL.
         | 
         | There's a fun interview with the biff guy on the podcast "the
         | REPL". He talks about how much fun it is to develop directly on
         | the prod server, and how horrified people are by it lol.
         | 
         | https://biffweb.com/
         | 
         | https://www.therepl.net/episodes/48/
        
       | Volundr wrote:
       | It's worth noting that distillery is deprecated in favor of mix
       | releases, which don't support relups out of the box, and
       | specifically warn against them due to the complexity involved in
       | writing code to support them correctly.
       | 
       | It's a cool feature that's no doubt amazing for applications that
       | need it, but it brings a fair amount of complexity vs other
       | deployment strategies.
        
         | superdisk wrote:
         | Yeah, note that this article is from 2016. I distinctly
         | remember during that time that these hot-swap deployments were
         | all the rage in the Elixir community, and then fell out of
         | fashion with time.
        
         | thibaut_barrere wrote:
         | Good point. Someone shared this in case someone wonders:
         | 
         | https://elixirforum.com/t/how-to-tweak-mix-release-to-work-w...
         | 
         | > I've spent some time understanding how to do hot code
         | reloading with releases built using mix release, and here I'd
         | like to detail the steps needed, in hopes that it will help
         | someone.
        
       | alberth wrote:
       | (2016)
        
         | dang wrote:
         | Where do you see that? I couldn't find it.
        
           | gnabgib wrote:
           | It's in the URL :D But yeah, the page doesn't make it clear
           | (and some of the embedded JS has a 2020 date suggesting it's
           | received updates).
           | 
           | In the RSS feed too: Wed, 07 Dec 2016
           | https://kennyballou.com/index.xml
        
             | dang wrote:
             | Hidden in plain view! Ok, let's put 2016 above, on the
             | assumption that the edits since then haven't been too
             | major.
        
       | hauxir wrote:
       | At kosmi.io we use elixir hot swapping for every small
       | patch/bugfix on the backend. This allows us to deploy updates
       | multiple times a day with 0 disruption.
       | 
       | Allows the clients to remain connected and be none the wiser that
       | there was an update at all.
       | 
       | For larger updates we just do hard restarts when in-memory data
       | structures or the supervision tree change.
        
         | deathtrader666 wrote:
         | Would love to know more how you go about it.
        
           | hauxir wrote:
           | It's a little hacky but I'll try to explain:
           | 
           | * The server runs in a docker container which has an ssh
           | server installed and running in the background. The reason
           | for SSH is simply because that's what edeliver/distillery
           | uses.
           | 
           | * The CI (a local GitHub runner) runs in a docker container as
           | well which handles building and deploying the updated
           | releases when merged on master.
           | 
           | * We use edeliver to deploy the hot upgrades/releases from
           | the CI container to the server container. This happens
           | automatically unless stopped which we do for larger merges
           | where a restart is needed.
           | 
           | * The whole deployment process is done in a bash script which
           | uses the git hash for versioning, edeliver for deploying and
           | in the end it runs the database migrations.
           | 
           | I'm not going to say it's perfect but it's allowed us to move
           | pretty damn fast.
        
       | GCUMstlyHarmls wrote:
       | This is a talk about a large scale, resilient elixir/erlang
       | deployment in healthcare.
       | 
       | Specifically they talk about running with no down time using hot
       | code reloading here: https://youtu.be/pQ0CvjAJXz4?t=2667 but the
       | whole talk is quite interesting regarding availability.
       | 
       | Warning: the video is quite quiet.
        
       | benzible wrote:
       | "hot deploys on fly.io to a planet-wide cluster, in 3 seconds.":
       | https://x.com/chris_mccord/status/1785678249424461897
        
       | jongjong wrote:
       | Forcing all clients to reload their code at the same time sounds
       | like a bad idea. Allowing different clients to run different
       | incompatible versions of the code at the same time also sounds
       | like a bad idea.
       | 
       | APIs are like database engines; they should rarely change. Making
       | it easy to change them is an anti-pattern.
       | 
       | Engineers don't build bridges with replaceable pillars or
       | skyscrapers with replaceable foundations. When aerospace
       | engineers tried building a plane with replaceable engines, we got
       | Boeing 737 Max...
        
         | tzmudzin wrote:
         | Engine replacement happens on airplanes fairly frequently. You
         | don't want to scrap an airplane because of a single damaged
         | turbine blade, or even keep it on the ground for longer.
         | 
         | https://jalopnik.com/how-airlines-decide-to-replace-jet-engi....
        
       | apex_sloth wrote:
       | I used to work for a company that wanted zero downtime through
       | Erlang's hot code reload feature. While it absolutely works, it
       | requires immense effort and extra code to handle state upgrades
       | and downgrades.
        
       | modernerd wrote:
       | Live updating a drone running Erlang in 10ms while it was flying
       | with no application restart and no loss of state impressed me
       | when I saw it in 2021:
       | 
       | https://www.youtube.com/watch?v=XQS9SECCp1I
       | 
       | But I almost never hear Erlang/Elixir/Gleam folks talk about this
       | benefit of the Erlang VM now, even though it seems fairly unique
       | and interesting. Has the community moved away from it? Is it just
       | not that useful?
        
         | cess11 wrote:
         | A lot of the GenServer-information floating around explains
         | code_change/3, no? That's commonly what you want, a way to
         | handle state propagation when process code is updating in a
         | running system.
         | 
         | Most people are probably running some web services or something
         | and might as well shift machines in and out of a cluster or can
         | wait for old processes to disband on their own, because the new
         | code is backwards compatible with the one in already running
         | processes, and so on.
         | 
         | It can also be relatively hard to do without causing damage to
         | the system. Those who need and can manage it probably don't
         | need it marketed.
        
           | cess11 wrote:
           | Someone put a reply and then deleted it while I wrote a
           | response, and it irks me that it might have been a waste so
           | here's the gist of it:
           | 
           | "Is it just that people are more comfortable with blue-green
           | deploys, or are blue-green deploys actually better?"
           | 
           | It depends. If you can do a blue-green shift where you
           | gradually add 'fresh' servers/VMs/processes and drain the
           | old, that's likely to be most convenient and robust in many
           | organisations. On the other hand, if you rely on long running
           | processes in a way where changing their PIDs breaks the
           | system, then you pretty much need to update them with this
           | kind of hot patching.
           | 
           | "Does Erlang offer any features to minimize damage here?"
           | 
           | The BEAM allows a lot of things in this area, on pretty much
           | every level of abstraction. If you know what you're doing and
           | you've designed your system to fit well into the provided
           | mechanisms the platform provides a lot of support for hot
           | patching without sacrificing robustness and uptime. But it's
           | like another layer of possible bugs and risks, it's not just
           | your usual network and application logic that might cause a
           | failure, your handling of updates might itself be a source of
           | catastrophe.
           | 
           | In practice you need to think long and hard about how to
           | deploy, and test thoroughly under very production-like
           | conditions. It helps that you can know for sure what
           | production looks like at any given time, the BEAM VM can tell
           | you exactly what processes it runs, what the application and
           | supervisor trees look like, hardware resource consumption and
           | so on. You can use this information to stage fairly realistic
           | tests with regards to load and whatnot, so if your update for
           | example has an effect on performance and unexpected
           | bottlenecks show up you might catch it before it reaches your
           | users.
           | 
           | And as anyone can tell you who has updated a profitable, non-
           | trivial production system directly, like a lot of PHP devs of
           | ye olden times, it takes a rather strong stomach even when it
           | works out fine. When it doesn't, you get scars that might
           | never fade.
        
           | Muromec wrote:
           | This is also a reply to that deleted comment, because I had
           | to type it all and also got to go outside and have my
           | European 2 hour long lunch break while doing it.
           | 
           | If you have any kind of state in gen_server and the state or
           | assumptions of it have changed, you need to write that
           | code_change thingy that migrates the state _both_ ways
           | between two specific versions. If by some chance this
           | function is bugged, then the process is killed (which is
           | okay), so you need to nail down the supervision tree to make
           | things restartable and also not get into restart loops.
           | Remember writing database migrations for django or whatever
           | ORM of the day? Now do that, but for memory structures you
           | have.
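Here is what a two-way migration might look like, as a Python sketch with a hypothetical state layout. It borrows gen_server's convention: `code_change/3` receives the old version on upgrade and a `{down, Vsn}` tuple on downgrade, modeled here as `("down", 1)`.

```python
def code_change(old_vsn, state):
    """Migrate state between v1 and v2 in either direction.

    v1 state (hypothetical): a (user, score) tuple.
    v2 state (hypothetical): {"user": ..., "score": ..., "level": ...}.
    """
    if old_vsn == 1:  # upgrade 1 -> 2: add a derived field
        user, score = state
        return {"user": user, "score": score, "level": score // 100}
    if old_vsn == ("down", 1):  # downgrade 2 -> 1: drop the derived field
        return (state["user"], state["score"])
    # An unhandled version is a bug; in Erlang the process would crash here.
    raise ValueError(f"no migration from {old_vsn!r}")
```

As with database migrations, the useful check is the round trip. Upgrading and then downgrading should give back the original state.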
           | 
           | Now, while the function is running it can't be updated of
           | course, so you need gen_server to call you back from the
           | outside of the module. If you like to save function
           | references instead of saving process references in your
           | state, you need to figure out which version you will be
           | actually calling.
           | 
           | If you change the arity of your record, then the old record
           | no longer matches your patterns.
           | 
           | Since updates are not atomic, you will have two versions of
           | the code running at the same time, potentially sending
           | messages that old/new stuff does not expect, and both old and
           | new code should not bug out. And if they do bug out, you have
           | been smart enough to figure out how to recover and actually
           | test that.
           | 
           | Then there is this: if somehow something from version V-2
           | is still running after the update to V-1 and you start
           | updating to the latest V, then things happen.
           | 
           | You can deal with all that of course and erlang gives you
           | tools and recipes to make it work. Sometimes you have to
           | make it work, because restarting and losing state is not an
           | option. Also it's probably fun to deal with complex things.
           | 
           | Or you could just do the stupid thing that is good enough
           | and let it crash and restart instead of figuring out ten
           | different things that could go wrong. Or take a 15-minute
           | maintenance window while your users are all sleeping (yes,
           | not everybody is doing critical infra that runs 24/7 like
           | discord group with game memes). Or just do blue-green and
           | sidestep it all completely.
        
         | thibaut_barrere wrote:
         | A lot of web apps are just well-enough served with a blue-green
         | deployment model. It is less risky.
         | 
         | But if you really need it, it's really great to have that
         | option (e.g. very long running systems which are split in
         | front/back etc), and it can be used in creative ways too (like
         | the Drone example).
         | 
         | Here is a lightning talk I gave about how to use hot-reload for
         | music / MIDI interactions:
         | https://www.youtube.com/watch?v=Z8sGQM6kLvo
        
           | modernerd wrote:
           | Great talk, thanks, nice to see other creative uses. Great
           | idea to add LiveView and SVGs for the keyboard UI.
           | 
           | "...thanks to hot reloading, which -- for once -- is
           | useful..."
           | 
           | That seems to sum up the sentiment that hot swapping in
           | Erlang has uses but they're generally not aligned with what
           | Erlang is typically employed for. It seems like it would be
           | great for tight game dev loop feedback and iteration too, for
           | example, but that's not a traditional use of Erlang either.
        
             | thibaut_barrere wrote:
             | > That seems to sum up the sentiment that hot swapping in
             | Erlang has uses but they're generally not aligned with what
             | Erlang is typically employed for
             | 
             | Actually, I think it is much more common in original Erlang
             | scenarios (including "non-web") where high availability is
             | a useful pre-requisite.
             | 
             | It is in my experience less common in Elixir, which is
             | often more web-oriented (although not exclusively).
        
           | epiccoleman wrote:
           | Extremely cool, thanks for sharing!
        
         | chefandy wrote:
         | Huh, really? I feel like I see Elixir folks sing the praises of
         | the BEAM pretty regularly. Specifically the OTP supervisor stuff
         | for fault-resistant server deployments. I haven't looked
         | specifically for that though recently so maybe people are
         | taking it for granted?
        
       | melvinroest wrote:
       | Is this like a similar feature in Smalltalk/Pharo and Lisp?
        
         | igouy wrote:
         | Yes, the basics are there in Smalltalk and there's more support
         | built into Erlang.
         | 
         | Also:
         | 
         | "Live program changes in the Dart VM"
         | 
         | https://github.com/dart-lang/sdk/blob/main/docs/Hot-reload.m...
         | 
         | "Live reloading for your ESP32"
         | 
         | https://github.com/toitlang/jaguar
        
       | gregors wrote:
       | The Big Elixir 2018 - Desmond Bowe - Hot Upgrade Are Not Scary
       | 
       | https://www.youtube.com/watch?v=IeUF48vSxwI
        
       | epiccoleman wrote:
       | I wonder if this kind of thing could be used to make the Elixir
       | REPL a bit more LISPy. I like iex a good deal, but I often find
       | myself wishing I could just easily eval some code or expression
       | in the editor and have it make its way into the REPL context.
       | (yes, I know you can `r` on a module, but that's pretty clunky
       | compared to something like CIDER).
        
       | dszoboszlay wrote:
       | Hot code upgrades on the BEAM are awesome, but they're not a
       | piece of cake. If you're also interested in the challenges of
       | making them production safe, I gave a talk about this topic on
       | CodeBEAM Sto earlier this year:
       | 
       | https://youtu.be/epORYuUKvZ0?si=gkVBgrX2VpBFQAk5
       | 
       | OP talks in the summary about the importance of understanding the
       | process. It's very much true, but you need to understand not only
       | the process your tooling provides, but also what's going on in
       | the background and what hasn't been taken care of for you by your
       | tools. I'm afraid these things are rarely understood about hot
       | upgrades, even by experienced Erlang engineers.
        
       | robocat wrote:
       | Great discussion 23 days ago on hot code loading:
       | 
       | https://news.ycombinator.com/item?id=42187761
        
       ___________________________________________________________________
       (page generated 2024-12-13 23:01 UTC)