[HN Gopher] Elixir/Erlang Hot Swapping Code (2016)
___________________________________________________________________
Elixir/Erlang Hot Swapping Code (2016)
Author : justinludwig
Score : 213 points
Date : 2024-12-12 23:16 UTC (23 hours ago)
(HTM) web link (kennyballou.com)
(TXT) w3m dump (kennyballou.com)
| omertoast wrote:
| I'm so sick of this DevOps bullshit. I wonder if there is an
| alternative language in which you can hot swap code and do all
| the black magic stuff while keeping reliability and performance
| like Rust's.
| saurik wrote:
| I am shocked at the idea of anyone implying that Erlang is
| "unreliable"... its entire reason for existence was to step up
| the game on reliability.
| zanderwohl wrote:
| I agree. I don't know much about Erlang but what I've heard
| seems to indicate it's used for high-uptime systems that
| handle errors well.
| Muromec wrote:
| I suspect the causality is reversed. When you have a
| well-designed telecom system, then something shaped like Erlang
| happens to be a good tool to create in order to implement it.
| The tool then keeps you committed to the design choices you
| made by being restrictive enough.
| Muromec wrote:
| I heard the quote that some 50% of the mobile traffic is
| handled by erlang. Somehow the other 50% seems to be doing
| just fine (except the usual shitshow on the inside that
| software is everywhere all the time).
| simoncion wrote:
| > I heard the quote that some 50% of the mobile traffic is
| handled by erlang.
|
| Given that you can implement OTP in any language (albeit
| with varying degrees of difficulty), that's not surprising.
|
| The thing to remember is that Erlang was first used in
| production in like _1986_. Nearly forty years is more than
| enough time for the biggest good ideas in Erlang to
| percolate out into non-BEAM systems.
| LAC-Tech wrote:
| As someone who is now in the rust world and very very
| sympathetic to the Erlang world... you both probably mean
| completely different things when you say "reliable". The
| contexts are just worlds apart.
| dcsommer wrote:
| GP seems to be implying that hot swapping, not Erlang, is
| unreliable. From my experience using it in Erlang, I heartily
| agree that it is fraught. Inconsistent state across nodes
| is much harder to reason about. When you _must_ ensure
| consistency, hot swapping is reckless, especially as org size
| and product complexity increase.
|
| Leave hot loading to local/development environments, not
| production deploys.
|
| Loading configs on the fly can also have some of this risk,
| but it is much easier to reason about typically.
| sergiotapia wrote:
| we had this wonderful thing in PHP where you would just save a
| .php file and bada bing it was LIVE.
|
| what happened? :D
| thanksgiving wrote:
| They took away our access after one too many outages :yay:
| Muromec wrote:
| Sounds like what happened with hot upgrade privileges in
| some erlang shops too.
| thanksgiving wrote:
| I've worked at a smaller project where it worked just
| fine. The key I think is to have the project be small
| enough to be able to fit in my head.
|
| That and have an identical test server. I used to make
| changes locally, test it locally, then make the same
| change on test, have someone else look at it, get an LGTM,
| and do the same thing on the production machine. It
| sounds like a lot of steps but it is pretty
| straightforward.
|
| Sadly, it probably doesn't work with bigger teams or more
| complicated projects.
| Muromec wrote:
| A lot of things work on a small project where everybody
| knows what they are doing and can keep all the
| dependencies and data structures in their heads: logging
| in to the production server, editing a PHP file right where
| Apache is looking at it, and fixing stuff with SQL
| commands in the production database to address customer
| complaints.
|
| The big question is what everyone else should be doing
| that survives contact with unevenly distributed amounts
| of technical expertise and amounts of fucks given about
| the result.
| cardanome wrote:
| We still have that and it is awesome. PHP is better than
| ever.
|
| In serious emergencies I even sometimes end up quickly SSH-ing
| into a prod server and changing the file directly. Which is
| kind of horrifying but hey customer is happy it got fixed
| immediately and I get to relax and take my time to write a
| proper fix. Beats sweating and watching the pipeline build
| and asking around for people to approve my merge request.
| Muromec wrote:
| The black magic comes at the cost of not having one streamlined
| procedure to release stuff.
|
| And to make the black magic work you have to engage with it.
| Most of the time people don't even bother to write a proper
| import.meta.hot.accept thingy in javascript. Developers simply
| hate chores, which is evident from their unwillingness to write
| proper unit tests (despite knowing that tests work), or from
| writing just enough to let the coverage cop pass and ship the
| build.
|
| A dedicated small team running something like WhatsApp? Sure,
| look into the arcane and let it look back at you (although high
| insight makes one more susceptible to madness you know). But
| most of the time you will do a better job with PHP in a stupid
| restartable box behind seven load balancing proxies.
| yetihehe wrote:
| > The black magic comes at the cost of not having one
| streamlined procedure to release stuff.
|
| You can also have a streamlined procedure to release stuff.
| Most changes in my erlang based system consist of "push to
| staging branch, click to deploy and test, pull to master,
| click deploy button". Can't be simpler than that. Most
| changes in such systems are also pretty simple. When you need
| to add something big, typically not many things are dependent
| on that, so deploy is also pretty simple.
|
| > But most of the time you will do a better job with PHP in a
| stupid restartable box behind seven load balancing proxies.
|
| Yeah, we are talking about more complicated things here. If you
| have something simple, you don't need to use erlang, `python
| -m http.server` will be even simpler than your php in stupid
| restartable box, because you don't need a special box, just
| one small command.
| Muromec wrote:
| Do you do 100% of deployments using hot reload? If yes,
| maybe you should share the recipe with everybody else,
| since consensus seems to recommend the opposite.
|
| At the very least you will have a different procedure to
| upgrade the erlang itself, right?
|
| >If you have something simple, you don't need to use erlang
|
| I think on a spectrum of difficult things there is an area
| between hosting static file on rpi at home and running
| massively distributed system full of long running stateful
| processes.
| yetihehe wrote:
| > Do you do 100% of deployments using hot reload?
|
| About 99%. We need to restart servers maybe once a year.
| Maybe next year we will finally migrate from erlang 21 to
| latest. Most "stopping everything" deployments take max
| 1s of downtime. This month, for example, we needed to upgrade
| the postgresql database by switching it over to a new machine.
| Zero downtime would have taken a little longer, to get the
| consistency right, but we could spare a second to make
| it a much simpler task on the database side (it was a restart
| of the database module; the rest of the erlang server was
| unaffected and clients were not disconnected). Otherwise, most
| deployments are not visible as disconnections; we have a
| lot of long-running connections.
|
| > At the very least you will have a different procedure
| to upgrade the erlang itself, right?
|
| Yes.
|
| > I think on a spectrum of difficult things there is an
| area between hosting static file on rpi at home and
| running massively distributed system full of long running
| stateful processes.
|
| Is PHP good for both? I think PHP is NOT good for long
| running stateful processes, but I didn't use it in
| 10 years. And it probably is not needed for static files.
| Muromec wrote:
| Sounds like you are doing cool stuff the cool way, which
| can't be said about everyone.
|
| >Is PHP good for both? I think PHP is NOT good for long
| running stateful processes, but I didn't use it in
| 10 years. And it probably is not needed for static files.
|
| No, of course it isn't. I haven't touched it with a long
| pole for even more years and don't even want to. And I
| would not argue for doing what you are doing in anything that
| isn't running on BEAM/OTP.
|
| The point I'm trying to make is -- for most of the web
| stuff, making a transactional response from a short-lived
| http handler is good enough and you can do it even in PHP
| (which is not praise for PHP as a great tool I enjoy
| using, but the opposite). It would not be the most optimal
| solution by any metric, nor the most elegant or
| sophisticated, but it's survivable, it's the
| lowest bidder.
| lamuswawir wrote:
| Erlang is built for reliability. They're chasing nine nines.
| Everything about the BEAM is built to emphasize that, the
| design choices, the documentation, the recommended practices.
|
| Erlang is not very fast, but that's not what it was built for.
| Muromec wrote:
| Is it really beam or just otp? Sure, beam gives you
| processes, network-transparent send, immutable structures and
| the linking-monitoring thingy on top, but is that what makes it
| good to shoot for nines?
|
| I suspect the aura of mysticism around yet another jit vm is
| not that warranted
| jerf wrote:
| It is reasonable to conceive of Erlang as encompassing OTP.
| Perhaps somewhere in the world there is some developer out
| there hot on Erlang but just hates OTP and doesn't use it,
| but they must be fairly frustrated at how hard it is to
| keep OTP out of their code base if they ever need any
| libraries.
|
| Restarting is arguably the definitive thing that makes
| Erlang stack the 9s out past what most languages and
| runtimes can achieve... the thing is, it's more complicated
| to use in practice than a web page like this makes it look,
| and it's beyond what most products need. Few applications
| _need_ the fifth or sixth or seventh nine, and it gets to
| the point that you can't have it anyhow because your
| Erlang cluster, no matter how well distributed, itself
| probably doesn't have 99.99999% availability, and your users
| probably don't have 99.99999% availability on their own
| network connection.
|
| It's not impossibly complicated, but it is the sort of
| thing where, if you want to use the feature, you need to
| have it sort of constantly in mind as you write the rest of
| your system, and it's a lot easier even in Erlang to just
| design the system to take entire nodes down and bring them
| back up, if not the entire cluster down, rather than fuss
| with hot reloads. I wish Erlang advocates would be more
| upfront about pitching this as an interesting niche
| feature, but not really a reason to consider Erlang. Unless
| you _absolutely need it_, in which case it can indeed be
| the thing that puts it on the short list of choices... but
| as evidenced by the vast, vast majority of software and
| systems not being on Erlang and managing to get along,
| there aren't really that many things that _need_ it.
| gf000 wrote:
| It's not well known, but the JVM has very good hot reload
| support, and is a very reliable and performant platform.
| amelius wrote:
| Does this hot swapping also work for closures?
| Muromec wrote:
| Erlang doesn't have closures, because erlang doesn't have
| variables. The compiler simply desugars it to a partially applied
| function referenced by its name (yes, those inline functions
| in fact have names).
|
| If you have something_function, then the first inline function
| used in it will be -something_function/1-fun-0- with zero being
| the index and the captured variable being another argument. Now
| if you change the host function to have more inlines _before_
| it, the indexing will drift.
|
| So I would expect the body of inline function will still be
| resolved from the old version of the module, but I didn't
| actually try.
|
| Source: I did run erlc -S at least once.
|
| Add: now thinking of it, will the call to a local function from
| the old version of the module ever escape into the new one
| without first returning back to gen_server and letting it call
| the new version? Another comment says that calls within the
| module never do, so the assumption was correct.
| bitwalker wrote:
| Erlang absolutely has closures, you are mistaken. What you
| are referring to are "function captures", which bind a
| function reference as a value, and there is no environment to
| close over with those. However, you can define closures which
| as you'd expect, can close over bindings in the environment
| in which the closure is defined.
|
| The interaction between hot reloads and function captures in
| general is a bit subtle, particularly when it comes to how a
| function is captured. A fully qualified function capture is
| reloaded normally, but a capture using just a local name
| refers to the version of the module at the time it was
| captured, but is force upgraded after two consecutive hot
| upgrades, as only two versions of a module are allowed to
| exist at the same time. For this reason, you have to be
| careful about how you capture functions, depending on the
| semantics you want.
| toast0 wrote:
| > but is force upgraded after two consecutive hot upgrades,
| as only two versions of a module are allowed to exist at
| the same time.
|
| Force upgraded is maybe misleading. When a module is loaded
| for the 3rd time, any processes that still have the first
| version in their stack are killed. That may result in a
| supervisor restarting them with new code, if they're
| supervised.
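|
| A minimal Python sketch of the two-version rule described above.
| It is a toy model, not how BEAM is implemented: `CodeServer`,
| `Proc`, `load`, `spawn` and the version labels are hypothetical
| names made up for illustration.

```python
class Proc:
    """Toy process that remembers which code version it is running."""

    def __init__(self, name, version):
        self.name = name
        self.version = version
        self.alive = True


class CodeServer:
    """Toy model: keeps at most two versions ("current" and "old") of a module."""

    def __init__(self):
        self.current = None
        self.old = None
        self.procs = []

    def spawn(self, name):
        proc = Proc(name, self.current)
        self.procs.append(proc)
        return proc

    def load(self, version):
        # The "old" slot must be free before another version is loaded,
        # so anything still running the old version is killed first.
        if self.old is not None:
            for proc in self.procs:
                if proc.alive and proc.version == self.old:
                    proc.alive = False
        self.old = self.current
        self.current = version


server = CodeServer()
server.load("v1")
lingering = server.spawn("lingering")   # never leaves v1
server.load("v2")                       # v1 becomes "old"; lingering survives
upgraded = server.spawn("upgraded")     # starts on v2
server.load("v3")                       # v1 is purged; lingering is killed
print(lingering.alive, upgraded.alive)
```

| In this model the process stuck on the first version dies on the
| third load, while the one on the second version keeps running as
| "old" code. Restarting the dead process would be a supervisor's
| job, which the sketch leaves out.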
| bitwalker wrote:
| Ah right, good point - I was trying to remember the exact
| behavior, but couldn't recall if an error is raised (and
| when), or if the underlying module is just replaced and
| "jesus take the wheel" after that.
| Muromec wrote:
| What does it look like? I was talking about this thing:
|
|     Val = 1, SumFun = fun(X) -> X + Val end, SumFun(2).
|
| It looks like you define an arity 1 function that captures
| Val, while in fact you define an arity 2 function and bind 1
| as the first argument. Since you can't redefine Val anyway,
| it's as good as a closure, but technically it doesn't
| capture the environment.
|
| Maybe I'm mistaken and there is another way to express it?
| bitwalker wrote:
| The example you've given here does not work the way you
| think it does. I would agree however that the mechanics
| of closure environments is simpler in Erlang due to the
| fact that values are immutable, as opposed to closures in
| other languages where mutability must be accounted for.
|
| I would also note that, for the example you've given, the
| compiler _could_ constant-fold the whole thing away, but
| for the sake of argument, let's assume that `Val` is an
| argument to the current function in which `SumFun` is
| defined, and so the compiler cannot reason about the
| actual value that was bound.
|
| The closure will be constructed at the point it is
| captured, using the `make_fun` BIF, with a given number
| of free var slots (in this case, 1 for the capture of
| `Val`). `Val` is written to the slot in the closure
| environment at this time as well. See the implementation
| of the BIF [here](https://github.com/erlang/otp/blob/6cef
| a05a2a977864150908feb...) if you are curious.
|
| At runtime, when the closure is executed, the underlying
| function receives the closure environment, from which it
| loads any free vars. In my own Erlang compiler, the
| closure environment was given via pointer, as the first
| argument to the function, and then instructions were
| emitted to load free variables relative to that pointer.
| I believe BEAM does the same thing; it may differ in
| the specific details, but conceptually that is how it
| works.
|
| The compiler obviously must generate a new free function
| definition for closures with free variables (hence the
| name of the function you see in the interactive shell, or
| in debug output). The captured MFA of the closure is this
| generated function. The runtime distinguishes between the
| two types of closures (function captures vs actual
| closures) based on the metadata of the func value itself.
|
| Like I mentioned near the top, it's worth bearing in mind
| that the compiler can also do quite a bit of
| simplification and optimization during compilation to
| BEAM - so there may be cases where you end up with a
| function capture instead of a closure, because the
| compiler was able to remove the need for the free
| variable in cases like your example, but I can't recall
| what erlc specifically does and does not do in that
| regard.
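|
| The mechanism described above (a generated function plus an
| environment with free-variable slots, passed as the first
| argument) can be sketched in Python. This illustrates the general
| closure-conversion idea only, not BEAM's actual data layout;
| `Closure`, `sum_fun_lifted` and `make_sum_fun` are hypothetical
| names.

```python
class Closure:
    """A generated top-level function paired with its captured values."""

    def __init__(self, code, env):
        self.code = code        # plain function that takes env as first argument
        self.env = tuple(env)   # free variables, copied when the closure is built

    def __call__(self, *args):
        return self.code(self.env, *args)


def sum_fun_lifted(env, x):
    # Generated body for `fun(X) -> X + Val end`: Val is loaded from slot 0.
    val = env[0]
    return x + val


def make_sum_fun(val):
    # Construct the closure at the point of capture, filling one free-var slot.
    return Closure(sum_fun_lifted, [val])


sum_fun = make_sum_fun(1)
print(sum_fun(2))  # 3
```

| Because the captured value is copied into the environment when
| the closure is built and can never be rebound, this is
| observably the same as partially applying an arity 2 function.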
| Muromec wrote:
| > let's assume that `Val` is an argument to the current
| function in which `SumFun` is defined, and so the
| compiler cannot reason about the actual value that was
| bound.
|
| That was exactly the case I was talking about, because
| otherwise there is no need to even make an arity 2 function.
| If the value is known at compile time, the constant is
| embedded into the body of the inlined function.
|
| >At runtime, when the closure is executed, the underlying
| function receives the closure environment, from which it
| loads any free vars.
|
| To my understanding, no it doesn't, as the value is
| resolved when the function pointer is created, not when
| the underlying function executes, which the code you
| linked shows too. I know it uses the "env" as a structure
| field, but it's partial application, not the actual
| closure which has access to parent scope. Consider two
| counterexamples in python:
|
|     from functools import partial
|
|     ret = []
|     for x in range(1, 10):
|         # that's what erlang does: x is bound at creation time
|         ret.append(partial(lambda y: y * 2, x))
|     for x in range(1, 10):
|         # that's an actual closure, as all lambdas will return 18
|         # because x is captured from the parent context
|         ret.append(lambda: x * 2)
|
| But then again, it doesn't matter since variables are
| assigned only once.
|
| >Like I mentioned near the top, it's worth bearing in
| mind that the compiler can also do quite a bit of
| simplification and optimization during compilation to
| BEAM - so there may be cases where you end up with a
| function capture instead of a closure, because the
| compiler was able to remove the need for the free
| variable in cases like your example, but I can't recall
| what erlc specifically does and does not do in that
| regard.
|
| I was looking into it a week ago, and erlc does what I
| described when it can't figure out the constant at
| compile time.
|
| add: If we are at it, BEAM doesn't even know about
| variables, only values and registers anyway, so it has
| nothing to capture anyway.
| slt2021 wrote:
| hot reload of code is nothing new nowadays, but people use it
| only locally during development for REPL like development style.
|
| in actual production, people prefer to operate at the container
| level + traffic management, and don't touch anything deeper than
| the container
| foota wrote:
| Amusingly, this sort of reminds me of the story of a person
| who joins a new company only to discover that their programming
| framework is intricately linked to their version control
| system.
| diath wrote:
| > in actual production, people prefer to operate at the
| container level + traffic management, and don't touch anything
| deeper than the container
|
| How do you think video games like World of Warcraft or Path of
| Exile deploy restartless hotfixes to millions of concurrent
| players without killing instances? I don't think it's a matter
| of "prefer to", it's a matter of "can we completely disrupt the
| service for users and potentially lose some of the state"? Even
| if that disruption lasts a mere millisecond, in some contexts
| it's not acceptable.
| Muromec wrote:
| Games do in fact have downtimes on major releases and you
| have to restart the client too before connecting.
| diath wrote:
| For major patches/backend changes that require recompiling
| - yes, for gameplay tweaks/hotfixes - no, hot reloading is
| preferable where possible.
| qudat wrote:
| WoW restarts every week. Not sure that's better than zero
| downtime deployments
| diath wrote:
| That's just how it works when your backend is a hybrid
| software that utilizes a low-level compiled programming
| language and a high-level language that runs in its own VM.
| You can use the latter for gameplay features, and can
| hotfix on the go, and then for core changes you have to
| restart, which is also why WoW will hotfix the latter on
| the go, usually every day on an expansion launch, whereas
| they defer the bulk of backend changes for the next weekly
| restart without continuously disrupting the game for
| players.
| AnotherGoodName wrote:
| That's a very big assumption that they do code hotpatching.
|
| It would seem far more likely they separate the stateful
| (database) and stateless layers (game logic) and they just
| spin up a new instance of the stateless server layer behind a
| reverse proxy and spin down the old instance. It's basically
| how all websites update without down time.
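|
| A minimal sketch of the pattern described above, assuming all
| durable state lives outside the instances. `Instance`, `Proxy`,
| `switch_to` and the dict standing in for the database are
| hypothetical; no real proxy or game server is modelled.

```python
class Instance:
    """Stateless game-logic server; all durable state lives in the shared store."""

    def __init__(self, version, store):
        self.version = version
        self.store = store

    def handle(self, player):
        self.store[player] = self.store.get(player, 0) + 1
        return f"{self.version}: {player} has made {self.store[player]} requests"


class Proxy:
    """Reverse proxy that forwards every request to the active instance."""

    def __init__(self, instance):
        self.active = instance

    def switch_to(self, instance):
        old, self.active = self.active, instance
        return old  # caller can spin this one down once it has drained

    def handle(self, player):
        return self.active.handle(player)


store = {}  # stands in for the database
proxy = Proxy(Instance("v1", store))
first = proxy.handle("alice")
retired = proxy.switch_to(Instance("v2", store))
second = proxy.handle("alice")
print(first)
print(second)
```

| The request count carries over from v1 to v2 only because it is
| kept in the shared store. Anything an instance held purely in
| memory would be lost at the switch.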
| diath wrote:
| A website that just proxies to another server does not need
| to do much to restore the previous state to make it look
| seamless to a user, the client will just perform another
| GET request that triggers a few SELECT queries, it's far
| more complex in the context of a video game.
| Thaxll wrote:
| Most of those hot fixes are data driven, as in database
| updates. The game server just reloads the data; the binary
| itself is not touched.
|
| I've never seen a game where they hot reload code inside the
| gameserver itself, it's usually a downtime or rolling
| updates.
| diath wrote:
| > Most of those hot fixes are data driven, as in database
| updates. The game server just reloads the data; the binary
| itself is not touched.
|
| And since the data from the disk/database (whether it's a
| Lua table, XML structure, JSON object, or a query) is then
| represented as a low-level data structure, that's
| essentially what hot reloading is - you deserialize the new
| data and hot-swap the pointers in the simplest terms.
|
| >I've never seen a game where they hot reload code inside
| the gameserver itself, it's usually a downtime or rolling
| updates.
|
| In World of Warcraft, you will literally have bosses
| despawn mid-fight and spawn again with new stats or you
| will see their health values update mid-fight, all without
| the players getting interrupted, their spell state getting
| desynced, or spawned items in the instance disappearing.
| This can be observed with the release of every single new
| raid on live streams as Blizzard employees are watching the
| world first attempts and tweaking/tuning the fights as they
| happen.
|
| EDIT: Here's such an example, for the majority of the fight
| the extra tank could keep a spawned monster away from the
| boss, then mid-fight, the monster suddenly started one-
| shotting the tank, without the disruption of the instance,
| this was Blizzard's way of addressing a cheese strat to
| force the players to do the fight as designed:
| https://www.youtube.com/watch?v=7gMm60BXAjU
| Thaxll wrote:
| Yes but again it's not hot swapping code as in Erlang,
| the C++ code is unchanged, they just change some xml
| somewhere.
|
| By your definition every CRUD app has hot reloading
| capabilities.
| diath wrote:
| > Yes but again it's not hot swapping code as in Erlang,
| the C++ code is unchanged, they just change some xml
| somewhere.
|
| Right, not on the C++ side, but on the Lua side that WoW
| uses - you load the new gameplay code that pulls the new
| data, and override the globals with new functions.
| dgfitz wrote:
| Why does the language matter? C++ built in the tooling
| to allow hot swapping, no?
| Thaxll wrote:
| C++ because 99% of the major games are built in that
| language.
| tomjakubowski wrote:
| LPMUDs ran almost entirely on hot reloadable code written
| in a quirky language called LPC, which later inspired the
| Pike language.
|
| I believe that only the "driver" code, which handles system
| calls and hosts the LPC interpreter and is written in C,
| couldn't be hot reloaded; everything else running in the
| game could be reloaded without restarting the server.
|
| I'd guess in the modern day, there would be some games
| where Lua scripts can be hot-reloaded like any other data,
| from a database or object store.
| cess11 wrote:
| It's a rather fun language and programming environment,
| I'd recommend playing around with it over doing AoC.
| swat535 wrote:
| In addition to what most people said, many other game servers
| just simply announce upcoming maintenance work and take the
| services offline until the patches are deployed.
|
| This way they can properly test everything and roll back any
| potential fixes if required. Even banking systems regularly
| go down for maintenance.
| AlphaWeaver wrote:
| People may "prefer" simply replacing containers, but as some
| siblings mention, some applications might require more
| reliability guarantees.
|
| Erlang was originally designed for implementing telephony
| protocols, where interrupted phone calls were not an acceptable
| side effect of application updates.
| AnotherGoodName wrote:
| FWIW as soon as you start using containers you should be able
| to handle those containers spinning up/down. Pretty much the
| whole point of containers. At which point you don't need to
| bother with code hot swapping since you already have a
| mechanism for newer containers to spin up while older ones
| spin down.
|
| The sibling post "that's how they update without downtime" is
| super naive. It is absolutely not how they do it.
| Muromec wrote:
| That's kinda what erlang does, just on a different level.
| Your docker and your load balancer are both inside your
| app.
| simoncion wrote:
| If we were to wedge how Erlang does hot code swapping into a
| container metaphor, then to get what Erlang does, you'd
| need to have a container per function call.
|
| Given that it would be absurdly wasteful to use OS
| processes in containers to clone Erlang's code reload
| system, AnotherGoodName might take ten minutes to watch
| Erlang: The Movie to get a better sense of the
| capabilities of that system. The movie is available from
| many places, including archive.org.
| Muromec wrote:
| >If we were to wedge how Erlang does hot code swapping into a
| container metaphor, then to get what Erlang does, you'd
| need to have a container per function call.
|
| You have a container that responds to HTTP requests
| sitting behind a load balancer, then you spawn a new
| container and tell the load balancer to redirect calls to the
| new one. From the point of view of whoever is calling the
| load balancer you have hot swapping. You may even
| separate containers into logical groups and call it
| a microservices architecture. Or you can define a process
| as something that has a qualified name and a mailbox and
| sends messages to other processes.
|
| Now reasonable people may disagree about what's wasteful,
| but the market seems to tolerate places where adding a
| checkbox to a form is a half a year process involving
| five different departments and the market can't be wrong.
| aeturnum wrote:
| I work at a company that deploys Elixir/Erlang and while we do
| /prefer/ to push a fully tested build in a new container,
| sometimes things get nasty and we need to console in and re-
| define a module in production. It's not a "best practice" but
| it stems the bleeding while the best practice is going through
| its test suite.
| simoncion wrote:
| > in actual production, people prefer to operate at the
| container level + traffic management, and don't touch anything
| deeper than the container
|
| Fred Hebert (and many of the folks he has worked with) do not
| operate that way:
| <https://ferd.ca/a-pipeline-made-of-airbags.html>
|
| One nice quote (out of many) from the article:
|
| > The thing that stateless containers and kubernetes do is
| handle that base case of "when a thing is wrong, replace it and
| get back to a good state." The thing it does not easily let you
| do is "and then start iterating to get better and better at not
| losing all your state and recuperating fast".
|
| (And if one wants to argue with that quote, please read the
| entire essay first. There's important context that's relevant
| to fully understanding Hebert's opinion here.)
| toast0 wrote:
| > in actual production, people prefer to operate at the
| container level + traffic management, and don't touch anything
| deeper than the container
|
| I mean, this seems to be "best practices" these days, but I
| certainly don't prefer it. At least the orchestration I use is
| amazingly slow. And cold loading changes is terrible for long
| running processes... this makes deployment a major chore.
|
| It's less terrible if you're just doing mostly stateless web
| stuff, but that's not my world.
|
| In the time it takes to run terraform plan, I could have pushed
| erlang code to all my machines, loaded it, and (usually)
| confirmed I fixed what I wanted to fix.
|
| Low cost of deploy means you can do more updates which means
| they can be smaller which makes them easier to review.
| anonymousDan wrote:
| In a distributed setup I imagine there could be cases where you
| want to atomically hot upgrade multiple VMs at the same time. Is
| this common in practice and if so are there recommended
| patterns/techniques for doing it?
| Muromec wrote:
| There can't be anything atomic in a distributed system. You
| can't even atomically hot upgrade it on a single VM anyway --
| you instead load the new version of the module and let
| dispatcher know to route new calls into it, the same as you
| would do with a load balancer and a bunch of load bearing
| docker hosts, just _inside_ your app.
| knome wrote:
| Erlang has a code_change callback in OTP that allows a
| gen_server to update its current state and start using new
| code. No connections need be broken with clients, no long
| running processes need be stopped. Just updated in place.
|
| It's not just a routing change.
|
| https://www.erlang.org/docs/24/man/gen_server
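|
| A rough Python analogue of the idea, assuming a server whose
| behaviour lives in a swappable module object. It is simplified:
| the real gen_server callback is code_change/3, which also takes
| the old version and an extra argument. `CounterV1`, `CounterV2`
| and `Server` are hypothetical names.

```python
class CounterV1:
    """Old code: state is a bare integer."""

    @staticmethod
    def handle_call(state, request):
        if request == "incr":
            return state + 1
        return state


class CounterV2:
    """New code: state is a dict that also tracks how many calls were seen."""

    @staticmethod
    def code_change(old_state):
        # Convert the state left behind by CounterV1 into the new shape.
        return {"count": old_state, "calls": 0}

    @staticmethod
    def handle_call(state, request):
        state = dict(state, calls=state["calls"] + 1)
        if request == "incr":
            state["count"] += 1
        return state


class Server:
    """Toy long-running server: its identity and state survive a code swap."""

    def __init__(self, module, state):
        self.module = module
        self.state = state

    def call(self, request):
        self.state = self.module.handle_call(self.state, request)
        return self.state

    def upgrade(self, new_module):
        self.state = new_module.code_change(self.state)
        self.module = new_module


server = Server(CounterV1, 0)
server.call("incr")
server.call("incr")
server.upgrade(CounterV2)      # same server object, new code, migrated state
print(server.call("incr"))
```

| The count accumulated under the old code survives the upgrade,
| and whoever holds a reference to `server` never has to
| reconnect.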
| Muromec wrote:
| It's a routing change in the sense that gen_server is routing
| function calls to the new module definition. I know about
| gen_server and code_change; the point was that it is
| conceptually the same mechanism, just on a different level of
| abstraction.
| AlphaWeaver wrote:
| Erlang does have a mechanism that allows a module to control
| when it moves from the "old version" to the "new version" of
| its own code. Calls to the module with the fully qualified name
| (e.g. `module:function()`) will invoke the "new code" once it's
| loaded, but calls within that module using only function names
| (just `function()`) will continue to invoke the "old code".
|
| If the portion of the app you were hot upgrading was an OTP
| process like a GenServer, you could theoretically wait on some
| sort of atomic coordination mechanism before making that fully
| qualified function call once the new code has loaded.
|
| We use hot code reloading at my work, but haven't had a reason
| to atomically sync the reload. Most of the time it's a tmux
| session with `synchronize-panes` and that suffices. If your
| application can handle upgrades within a module smoothly, it's
| rare to have a need for some sort of cluster-level coordination
| of a code change, at least one that's atomic.
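|
| The difference between fully qualified and local calls can be
| modelled in Python with a registry lookup versus a direct
| reference. This is an analogy only, not Erlang semantics in
| full; `MODULES`, `load`, `remote_call` and the `greeter` module
| are hypothetical.

```python
MODULES = {}  # module name -> dict of functions for the currently loaded version


def load(name, functions):
    """Install a new version of a module in the registry."""
    MODULES[name] = functions


def remote_call(name, function, *args):
    """Like `module:function()`: looked up at call time, so it sees new code."""
    return MODULES[name][function](*args)


def make_v1():
    def greet():
        return "hello from v1"

    def run_local():
        return greet()  # like `function()`: bound to this version's own code

    def run_remote():
        return remote_call("greeter", "greet")

    return {"greet": greet, "run_local": run_local, "run_remote": run_remote}


def make_v2():
    def greet():
        return "hello from v2"

    return {"greet": greet}


load("greeter", make_v1())
old_code = MODULES["greeter"]      # a process still executing the old version
load("greeter", make_v2())         # hot-load the new version

print(old_code["run_local"]())
print(old_code["run_remote"]())
```

| Code still running in the old version keeps seeing its own
| `greet` through the local call and only crosses into the new
| version at the point where it goes through the registry.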
| toast0 wrote:
| I mean, yes, there are cases where you want that. But there's no
| mechanism for it, because you would have to stop the world, do
| the load, and then resume.
|
| Even within a single VM, hot loading doesn't stop the world;
| during the load some schedulers will switch before others.
| There are guarantees, though: when a process runs new
| code and sends a message to another local process, that process
| will have the new code available when it reads the message. (It
| _may_ still be running the old code, depending on how it's
| called, though.)
|
| Dealing with multiple versions active is part of life in most
| distributed systems though. You can architect it away in some
| systems, but that usually involves having downtime in
| maintenance windows.
|
| A typical pattern is making progressive updates, where if you
| want to change a request, first you deploy a server that can
| handle old and new requests, then you deploy the client that
| sends the new request, then you can deploy a server that no
| longer accepts old requests.
|
| For new replies, if the new reply comes with a new request,
| that works like above... a client that sent a new request must
| handle the new reply. Otherwise, update the client to handle
| either type of reply, then update the server to send the new
| reply, finally remove handling of the old reply in the clients.
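|
| The first step of that pattern (a server that accepts both request
| shapes) could be sketched as below. The field names and the
| `version` key are assumptions made up for illustration, not
| anything from a real protocol.

```python
def handle_request(request):
    """Step-1 server: understands both the old and the new request shape."""
    # Assumption: old clients send no "version" field at all.
    version = request.get("version", 1)
    if version == 1:
        # Hypothetical old shape: one "name" string.
        first, _, last = request["name"].partition(" ")
    elif version == 2:
        # Hypothetical new shape: separate name fields.
        first, last = request["first_name"], request["last_name"]
    else:
        return {"status": "error", "reason": "unsupported version"}
    return {"status": "ok", "greeting": f"Hello, {first} {last}".strip()}


old_reply = handle_request({"name": "Joe Armstrong"})
new_reply = handle_request(
    {"version": 2, "first_name": "Joe", "last_name": "Armstrong"}
)
```

| Once stats show no version-1 requests arriving, the `version == 1`
| branch is the part a later server release would delete.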
|
| It gets a bit harder if your team dynamics mean one
| person/group doesn't control both sides... Then you need stats
| to tell you when all the clients have switched.
|
| Sometimes you do need more of a point in time switch. If it
| needs to be pretty good, you can just set a config through a
| dist 'broadcast'. If it needs to be better than that, you can
| have the servers and clients change behavior after a specific
| time... but make sure you understand the realities of clock
| synchronization and think about what to do for requests in
| flight. If that's not good enough, you can drop or buffer
| requests for a little bit before your target time, make sure
| there are no in progress requests, then resume processing
| requests with the new version.
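|
| A minimal sketch of the "change behavior after a specific time"
| variant, assuming every node agrees on a cutover timestamp. The
| `SWITCH_AT` value and the function name are hypothetical, and clock
| skew between nodes is deliberately not modelled.

```python
SWITCH_AT = 1_700_000_000  # hypothetical cutover time, epoch seconds


def protocol_for(request_started_at, switch_at=SWITCH_AT):
    """Pick the protocol version from the time the request started.

    Deciding once, at request start, means a request that is in flight
    across the cutover finishes under a single version.
    """
    return "new" if request_started_at >= switch_at else "old"


before = protocol_for(SWITCH_AT - 1)
at_cutover = protocol_for(SWITCH_AT)
```

| Two nodes whose clocks disagree by a second will disagree about
| requests near the cutover, which is why the stricter option is to
| drop or buffer requests around the target time.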
| behnamoh wrote:
| Lisp has had this feature since day 1. But Lisp-like langs like
| Clojure, Racket, etc. don't have it. This is one of the
| fundamental features of Common Lisp and I don't know why most
| other Lisp-wanna-be's don't implement it.
| lamuswawir wrote:
| Came here to say this. In Lisp, you can just compile a
| function, or load a file and it just works. It's not even sold
| as a hot feature, not the way Erlang sells it. It's just a
| feature.
|
| I manage a few websites written in Lisp, and updating them is
| as simple as push code, recompile and it works.
| davidw wrote:
| But what if the system is running and the new function takes
| different arguments or something? What if there is data
| loaded in the system, what happens to it?
|
| Simply loading new code is easy, ensuring the whole system
| works seems to require a bit more effort.
| fiddlerwoaroof wrote:
| Common Lisp has a bunch of features designed to enable
| migrating the system. For example,
| update-instance-for-redefined-class
| ( https://www.lispworks.com/documentation/HyperSpec/Body/f_upd... )
| lets you write code to update instance data between class
| versions when a class definition is reloaded.
|
| It turns out, though, that making hot-code reloading work
| well is mainly a question of how you design your system:
| designing for hot code reloading isn't all that hard for
| 90% of cases once you figure out the relevant techniques.
| leprechaun1066 wrote:
| We do this in q/kdb+ systems often for patches. An
| important thing about these languages is that this kind of
| workflow is part of the core for solving problems. So when
| you are building a system one of the aspects of its design
| will always allow for this update method. Then when you
| push a patch you both know the impact of the change
| (because you've tested the exact same steps in a
| dev/QA/UAT/Beta environment) and the work required to do it
| safely.
|
| Major releases do go through a full shutdown and release
| cycle though.
| osmano807 wrote:
| Do those sites have something like Phoenix LiveView, or is it
| something ad hoc like a simple SSR template engine? It would be
| nice to have something to handle migrations in the client-side
| code to match the server-side API.
| fiddlerwoaroof wrote:
| Clojure has it for a large percentage of functionality: things
| like https://github.com/clojure-emacs/cider depend on it.
| However, this mostly stays in dev-time and isn't used much for
| releases. Which I find a bit funny because Clojure's
| functional, data-driven philosophy is great for enabling
| painless hot-code updates
| chamomeal wrote:
| Can't you do something like this with clojure?
|
| I don't understand the particulars, but one selling point of
| biff is it's got built-in support for updating things directly
| in prod via the REPL.
|
| There's a fun interview with the biff guy on the podcast "the
| REPL". He talks about how much fun it is to develop directly on
| the prod server, and how horrified people are by it lol.
|
| https://biffweb.com/
|
| https://www.therepl.net/episodes/48/
| Volundr wrote:
| It's worth noting that distillery is deprecated in favor of mix
| releases, which don't support relups out of the box, and
| specifically warn against them due to the complexity involved in
| writing code to support them correctly.
|
| It's a cool feature that's no doubt amazing for applications that
| need it, but it brings a fair amount of complexity vs other
| deployment strategies.
| superdisk wrote:
| Yeah, note that this article is from 2016. I distinctly
| remember during that time that these hot-swap deployments were
| all the rage in the Elixir community, and then fell out of
| fashion with time.
| thibaut_barrere wrote:
| Good point. Someone shared this in case someone wonders:
|
| https://elixirforum.com/t/how-to-tweak-mix-release-to-work-w...
|
| > I've spent some time understanding how to do hot code
| reloading with releases built using mix release, and here I'd
| like to detail the steps needed, in hopes that it will help
| someone.
| alberth wrote:
| (2016)
| dang wrote:
| Where do you see that? I couldn't find it.
| gnabgib wrote:
| It's in the URL :D But yeah, the page doesn't make it clear
| (and some of the embedded JS has a 2020 date suggesting it's
| received updates).
|
| In the RSS feed too: Wed, 07 Dec 2016
| https://kennyballou.com/index.xml
| dang wrote:
| Hidden in plain view! Ok, let's put 2016 above, on the
| assumption that the edits since then haven't been too
| major.
| hauxir wrote:
| At kosmi.io we use elixir hot swapping for every small
| patch/bugfix on the backend. This allows us to deploy updates
| multiple times a day with 0 disruption.
|
| Allows the clients to remain connected and be none the wiser that
| there was an update at all.
|
| For larger updates we just do hard restarts when in-memory data
| structures or supervision tree are changed.
| deathtrader666 wrote:
| Would love to know more how you go about it.
| hauxir wrote:
| It's a little hacky but I'll try to explain:
|
| * The server runs in a docker container which has an ssh
| server installed and running in the background. The reason
| for SSH is simply because that's what edeliver/distillery
| uses.
|
| * The CI (a local GitHub runner) runs in a docker container as
| well which handles building and deploying the updated
| releases when merged on master.
|
| * We use edeliver to deploy the hot upgrades/releases from
| the CI container to the server container. This happens
| automatically unless stopped which we do for larger merges
| where a restart is needed.
|
| * The whole deployment process is done in a bash script which
| uses the git hash for versioning, edeliver for deploying and
| in the end it runs the database migrations.
|
| I'm not going to say it's perfect but it's allowed us to move
| pretty damn fast.
| GCUMstlyHarmls wrote:
| This is a talk about a large scale, resilient elixir/erlang
| deployment in healthcare.
|
| Specifically they talk about running with no down time using hot
| code reloading here: https://youtu.be/pQ0CvjAJXz4?t=2667 but the
| whole talk is quite interesting regarding availability.
|
| Warning: the video is quite quiet.
| benzible wrote:
| "hot deploys on fly.io to a planet-wide cluster, in 3 seconds.":
| https://x.com/chris_mccord/status/1785678249424461897
| jongjong wrote:
| Forcing all clients to reload their code at the same time sounds
| like a bad idea. Allowing different clients to run different
| incompatible versions of the code at the same time also sounds
| like a bad idea.
|
| APIs are like database engines; they should rarely change. Making
| it easy to change them is an anti-pattern.
|
| Engineers don't build bridges with replaceable pillars or
| skyscrapers with replaceable foundations. When aerospace
| engineers tried building a plane with replaceable engines, we got
| Boeing 737 Max...
| tzmudzin wrote:
| Engine replacement happens on airplanes fairly frequently. You
| don't want to scrap an airplane because of a single damaged
| turbine blade, or even keep it on the ground for longer.
|
| https://jalopnik.com/how-airlines-decide-to-replace-jet-engi....
| apex_sloth wrote:
| I used to work for a company that wanted zero downtime through
| Erlang's hot code reload feature. While it absolutely works, it
| requires immense effort and extra code to handle state upgrades
| and downgrades.
| modernerd wrote:
| Live updating a drone running Erlang in 10ms while it was flying
| with no application restart and no loss of state impressed me
| when I saw it in 2021:
|
| https://www.youtube.com/watch?v=XQS9SECCp1I
|
| But I almost never hear Erlang/Elixir/Gleam folks talk about this
| benefit of the Erlang VM now, even though it seems fairly unique
| and interesting. Has the community moved away from it? Is it just
| not that useful?
| cess11 wrote:
| A lot of the GenServer-information floating around explains
| code_change/3, no? That's commonly what you want, a way to
| handle state propagation when process code is updating in a
| running system.
|
| Most people are probably running some web services or something
| and might as well shift machines in and out of a cluster or can
| wait for old processes to disband on their own, because the new
| code is backwards compatible with the one in already running
| processes, and so on.
|
| It can also be relatively hard to do without causing damage to
| the system. Those who need and can manage it probably don't
| need it marketed.
| cess11 wrote:
| Someone put a reply and then deleted it while I wrote a
| response, and it irks me that it might have been a waste so
| here's the gist of it:
|
| "Is it just that people are more comfortable with blue-green
| deploys, or are blue-green deploys actually better?"
|
| It depends. If you can do a blue-green shift where you
| gradually add 'fresh' servers/VMs/processes and drain the
| old, that's likely to be most convenient and robust in many
| organisations. On the other hand, if you rely on long running
| processes in a way where changing their PIDs breaks the
| system, then you pretty much need to update them with this
| kind of hot patching.
|
| "Does Erlang offer any features to minimize damage here?"
|
| The BEAM allows a lot of things in this area, on pretty much
| every level of abstraction. If you know what you're doing and
| you've designed your system to fit well into the provided
| mechanisms the platform provides a lot of support for hot
| patching without sacrificing robustness and uptime. But it's
| like another layer of possible bugs and risks, it's not just
| your usual network and application logic that might cause a
| failure, your handling of updates might itself be a source of
| catastrophe.
|
| In practice you need to think long and hard about how to
| deploy, and test thoroughly under very production-like
| conditions. It helps that you can know for sure what
| production looks like at any given time, the BEAM VM can tell
| you exactly what processes it runs, what the application and
| supervisor trees look like, hardware resource consumption and
| so on. You can use this information to stage fairly realistic
| tests with regards to load and whatnot, so if your update for
| example has an effect on performance and unexpected
| bottlenecks show up you might catch it before it reaches your
| users.
|
| And as anyone can tell you who has updated a profitable, non-
| trivial production system directly, like a lot of PHP devs of
| ye olden times, it takes a rather strong stomach even when it
| works out fine. When it doesn't, you get scars that might
| never fade.
| Muromec wrote:
| This is also a reply to that deleted comment, because I had
| to type it all and also got to go outside and have my
| European 2 hour long lunch break while doing it.
|
| If you have any kind of state in gen_server and the state or
| assumptions of it have changed, you need to write that
| code_change thingy that migrates the state _both_ ways
| between two specific versions. If by some chance this
| function is bugged, then the process is killed (which is
| okay), so you need to nail down the supervision tree to make
| things restartable and also not get into restart loops.
| Remember writing database migrations for django or whatever
| ORM of the day? Now do that, but for memory structures you
| have.
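|
| To make the "both ways" part concrete, here is a Python sketch of
| an upgrade/downgrade pair for in-memory state. In Erlang,
| gen_server's code_change/3 is passed the old version, or
| `{down, Vsn}` on a downgrade; the `direction` argument below stands
| in for that, and the state layout is a made-up example.

```python
def code_change(direction, state):
    """Migrate process state between hypothetical versions 1 and 2.

    v1 state: {"vsn": 1, "count": int}
    v2 state: {"vsn": 2, "count": int, "last_seen": None or a timestamp}
    """
    if direction == "up" and state["vsn"] == 1:
        # Upgrading adds the new field with a default value.
        return {"vsn": 2, "count": state["count"], "last_seen": None}
    if direction == "down" and state["vsn"] == 2:
        # Downgrading drops the field the old code does not know about.
        return {"vsn": 1, "count": state["count"]}
    # Anything else is a migration nobody wrote; fail loudly.
    raise ValueError(f"cannot migrate {direction} from vsn {state['vsn']}")


v1 = {"vsn": 1, "count": 7}
v2 = code_change("up", v1)
round_trip = code_change("down", v2)
```

| The round trip is the cheap thing to test: upgrading and then
| downgrading should give back the state you started with, minus
| whatever the old version cannot represent.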
|
| Now, while the function is running it can't be updated of
| course, so you need gen_server to call you back from the
| outside of the module. If you like to save function
| references instead of saving process references in your
| state, you need to figure out which version you will be
| actually calling.
|
| If you change the arity of your record, then the old record
| no longer matches your patterns.
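|
| Erlang records are tuples tagged with the record name, so adding a
| field changes the tuple size. A rough Python imitation of the
| resulting mismatch, with tuple unpacking standing in for pattern
| matching and a hypothetical `user` record:

```python
def handle_new(state):
    """New code: expects the three-element ('user', name, email) shape."""
    # Unpacking raises ValueError for the old two-element tuple, which is
    # the closest Python analogue to a failed pattern match here.
    tag, name, email = state
    return f"{name} <{email}>"


def upgrade(state):
    """Convert the old ('user', name) shape before new code touches it."""
    if len(state) == 2 and state[0] == "user":
        return (state[0], state[1], None)
    return state


old_state = ("user", "joe")
try:
    handle_new(old_state)
    matched = True
except ValueError:
    matched = False

upgraded = upgrade(old_state)
```

| State that was built by the old code has to pass through something
| like `upgrade` before any new-version pattern sees it.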
|
| Since updates are not atomic, you will have two versions of
| the code running at the same time, potentially sending
| messages that old/new stuff does not expect, and both old and
| new code should not bug out. And if they do bug out, you have
| been smart enough to figure out how to recover and actually
| test that.
|
| Then there is this thing: if somehow something from version
| V-2 is still running after the update to V-1 and you start
| updating to the latest V, then things happen.
|
| You can deal with all that of course and erlang gives you
| tools and recipes to make it work. Sometimes you have to
| make it work, because restarting and losing state is not an
| option. Also it's probably fun to deal with complex things.
|
| Or you could just do the stupid thing that is good enough
| and let it crash and restart instead of figuring out ten
| different things that could go wrong. Or take a 15-minute
| maintenance window while your users are all sleeping (yes,
| not everybody is doing critical infra that runs 24/7 like
| discord group with game memes). Or just do blue-green and
| sidestep it all completely.
| thibaut_barrere wrote:
| A lot of web apps are just well-enough served with a blue-green
| deployment model. It is less risky.
|
| But if you really need it, it's really great to have that
| option (e.g. very long running systems which are split in
| front/back etc), and it can be used in creative ways too (like
| the Drone example).
|
| Here is a lightning talk I gave about how to use hot-reload for
| music / MIDI interactions:
| https://www.youtube.com/watch?v=Z8sGQM6kLvo
| modernerd wrote:
| Great talk, thanks, nice to see other creative uses. Great
| idea to add LiveView and SVGs for the keyboard UI.
|
| "...thanks to hot reloading, which -- for once -- is
| useful..."
|
| That seems to sum up the sentiment that hot swapping in
| Erlang has uses but they're generally not aligned with what
| Erlang is typically employed for. It seems like it would be
| great for tight game dev loop feedback and iteration too, for
| example, but that's not a traditional use of Erlang either.
| thibaut_barrere wrote:
| > That seems to sum up the sentiment that hot swapping in
| Erlang has uses but they're generally not aligned with what
| Erlang is typically employed for
|
| Actually, I think it is much more common in original Erlang
| scenarios (including "non-web") where high availability is
| a useful pre-requisite.
|
| It is in my experience less common in Elixir, which is
| often more web-oriented (although not exclusively).
| epiccoleman wrote:
| Extremely cool, thanks for sharing!
| chefandy wrote:
| Huh, really? I feel like I see Elixir folks sing the praises of
| beam pretty regularly. Specifically the OTP supervisor stuff
| for fault-resistant server deployments. I haven't looked
| specifically for that though recently so maybe people are
| taking it for granted?
| melvinroest wrote:
| Is this like a similar feature in Smalltalk/Pharo and Lisp?
| igouy wrote:
| Yes, the basics are there in Smalltalk and there's more support
| built into Erlang.
|
| Also:
|
| "Live program changes in the Dart VM"
|
| https://github.com/dart-lang/sdk/blob/main/docs/Hot-reload.m...
|
| "Live reloading for your ESP32"
|
| https://github.com/toitlang/jaguar
| gregors wrote:
| The Big Elixir 2018 - Desmond Bowe - Hot Upgrades Are Not Scary
|
| https://www.youtube.com/watch?v=IeUF48vSxwI
| epiccoleman wrote:
| I wonder if this kind of thing could be used to make the Elixir
| REPL a bit more LISPy. I like iex a good deal, but I often find
| myself wishing I could just easily eval some code or expression
| in the editor and have it make its way into the REPL context.
| (yes, I know you can `r` on a module, but that's pretty clunky
| compared to something like CIDER).
| dszoboszlay wrote:
| Hot code upgrades on the BEAM are awesome, but they're not a
| piece of cake. If you're also interested in the challenges of
| making them production safe, I gave a talk about this topic on
| CodeBEAM Sto earlier this year:
|
| https://youtu.be/epORYuUKvZ0?si=gkVBgrX2VpBFQAk5
|
| OP talks in the summary about the importance of understanding the
| process. It's very much true, but you need to understand not only
| the process your tooling provides, but also what's going on in
| the background and what hasn't been taken care of for you by your
| tools. I'm afraid these things are rarely understood about hot
| upgrades, even by experienced Erlang engineers.
| robocat wrote:
| Great discussion 23 days ago on hot code loading:
|
| https://news.ycombinator.com/item?id=42187761
___________________________________________________________________
(page generated 2024-12-13 23:01 UTC)