[HN Gopher] I Use Erlang Hot Code Updates
___________________________________________________________________
I Use Erlang Hot Code Updates
Author : lawik
Score : 70 points
Date : 2024-11-19 20:29 UTC (2 hours ago)
(HTM) web link (underjord.io)
(TXT) w3m dump (underjord.io)
| toast0 wrote:
| > Both have described hot code updates as something that people
| should learn and use. I imagine Whatsapp's initial engineering
| crew would agree. They did pretty well.
|
| Yeah. Hot loading is clearly better than anything else when
| you've got a million clients connected and you want to make a
| code change. Of course, we didn't have any of these fancy
| 'release' tools, we just used GNU Make to rsync the code to prod
| and run erlc. Then you can grab a debug shell and l(module). (we
| did write utilities to see what code was modified, and to provide
| the right incantations so we wouldn't load if it would kill
| processes)
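|
| A minimal sketch of why a load can kill processes, and of the kind
| of check described above. It is a toy Python model, not the
| utilities from the comment: the class and method names are
| hypothetical. It assumes the BEAM rule that a module has at most
| two loaded versions (current and old), so loading a third forces
| the old one out along with any process still running it. In
| Erlang itself, code:soft_purge/1 performs a comparable "only if
| nobody is using it" check.

```python
class ModuleVersions:
    """Toy model: a module has at most two loaded versions, current and old."""

    def __init__(self, initial):
        self.current = initial
        self.old = None
        self.version_by_pid = {}  # pid -> version that process is executing

    def spawn(self, pid):
        # New processes always start on the current version.
        self.version_by_pid[pid] = self.current

    def make_fully_qualified_call(self, pid):
        # Modelled after Erlang: a process moves to the current version
        # the next time it makes a fully qualified (Module:function) call.
        self.version_by_pid[pid] = self.current

    def pids_on_old_code(self):
        # Processes that would be killed if the old version were purged.
        if self.old is None:
            return []
        return sorted(p for p, v in self.version_by_pid.items() if v == self.old)

    def safe_load(self, new_version):
        # Refuse to load rather than kill anything still on the old version.
        if self.pids_on_old_code():
            return False
        self.old, self.current = self.current, new_version
        return True


if __name__ == "__main__":
    m = ModuleVersions("v1")
    m.spawn("a")
    m.spawn("b")
    print(m.safe_load("v2"))      # first update: nothing is on old code yet
    print(m.pids_on_old_code())   # both processes now lag one version behind
    print(m.safe_load("v3"))      # refused until the laggards move on
```

| The design choice is to fail the deploy step instead of purging:
| the operator can retry once the lagging processes have picked up
| the current version.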
| rybosome wrote:
| > Hot loading is clearly better than anything else when you've
| got a million clients connected and you want to make a code
| change.
|
| In the contexts in which I've worked, this was solved by
| issuing a command to the server to enter a lame-duck mode and
| stop accepting new connections, then restarting the process
| with updated code after all existing connections ended.
|
| This worked in our case because connections had a "reasonable"
| TTL; it couldn't have been more than an hour. We could always
| wait it out.
|
| I suppose hot reloading is more necessary when you have
| connections without a set TTL.
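|
| The lame-duck approach can be sketched as a toy Python model. All
| names here are hypothetical, the clock is passed in explicitly so
| the example stays deterministic, and real servers would track
| sockets rather than ids. It only shows the rule: stop accepting,
| wait until every connection's TTL has run out, then restart.

```python
class DrainingServer:
    """Toy model of lame-duck mode: refuse new connections, wait out old ones."""

    def __init__(self, ttl_seconds):
        self.ttl_seconds = ttl_seconds
        self.lame_duck = False
        self.expiry_by_conn = {}  # connection id -> time at which its TTL ends

    def accept(self, conn_id, now):
        # In lame-duck mode new clients are turned away; something in front
        # of the server (assumed, not modelled) sends them elsewhere.
        if self.lame_duck:
            return False
        self.expiry_by_conn[conn_id] = now + self.ttl_seconds
        return True

    def enter_lame_duck(self):
        self.lame_duck = True

    def reap_expired(self, now):
        # Drop every connection whose TTL has elapsed.
        self.expiry_by_conn = {
            c: t for c, t in self.expiry_by_conn.items() if t > now
        }

    def max_wait(self, now):
        # Upper bound on the remaining drain time: the latest expiry.
        return max((t - now for t in self.expiry_by_conn.values()), default=0)

    def safe_to_restart(self, now):
        self.reap_expired(now)
        return self.lame_duck and not self.expiry_by_conn


if __name__ == "__main__":
    server = DrainingServer(ttl_seconds=3600)
    server.accept("a", now=0)
    server.accept("b", now=1000)
    server.enter_lame_duck()
    print(server.accept("c", now=1500))      # refused while draining
    print(server.safe_to_restart(now=3600))  # "b" is still connected
    print(server.safe_to_restart(now=4600))  # everything has expired
```

| The bounded TTL is what makes max_wait finite. Without one, the
| drain has no upper bound, which is the case where hot loading
| looks more attractive.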
| arnon wrote:
| A few years ago, the biggest problem with Erlang's hot code
| updates was getting the files updated on all of the nodes. Has
| this been solved or improved in any way?
| comboy wrote:
| I don't think updating files is the problem. The biggest issue
| with hot code updates seems to be that they can create states
| that cannot be replicated in either release on its own.
| ketralnis wrote:
| This is my experience. About 25% of the time I'd encounter a
| bug that's impossible to reproduce without both versions of
| the code in memory, and end up restarting the node anyway,
| dropping requests in the process. Whereas if I'd architected
| around not having hot code updates, I could have built it in
| a way that never has to drop requests.
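|
| A hypothetical illustration of a state that neither release
| produces on its own, written as a toy Python model. The field
| name and the seconds-to-milliseconds change are invented for the
| example. State built by release 1 is handed to release 2's code
| without a migration step, which is something a cold restart can
| never do.

```python
def v1_init():
    # Release 1 stores the idle timeout in seconds.
    return {"idle_timeout": 30}


def v1_is_idle(state, idle_seconds):
    return idle_seconds >= state["idle_timeout"]


def v2_init():
    # Release 2 stores the same field in milliseconds.
    return {"idle_timeout": 30_000}


def v2_is_idle(state, idle_ms):
    return idle_ms >= state["idle_timeout"]


def v2_migrate(state):
    # The step a hot update needs and a cold restart does not: convert
    # state built by release 1 into the shape release 2 expects.
    return {"idle_timeout": state["idle_timeout"] * 1000}


if __name__ == "__main__":
    old_state = v1_init()  # a long-lived process started under release 1

    # Each release alone agrees that 5 seconds of silence is not idle.
    print(v1_is_idle(old_state, 5))
    print(v2_is_idle(v2_init(), 5_000))

    # Hot-swapped code reading un-migrated state disagrees with both.
    print(v2_is_idle(old_state, 5_000))

    # With the migration applied, release 2 behaves as intended.
    print(v2_is_idle(v2_migrate(old_state), 5_000))
```

| Testing either release from a fresh start would pass. The bug
| exists only on nodes that went through the upgrade, which is why
| it is hard to reproduce.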
| whorleater wrote:
| WhatsApp very long ago used to hot reload across all nodes with
| an ssh script to incrementally deploy during the day.
| rozap wrote:
| I used to work on a pretty big Elixir project that had many
| clients with long-lived connections that ran jobs that weren't
| easily resumable. Our company had a language-agnostic deployment
| strategy based on Docker, etc., which meant we couldn't do hot code
| updates even though they would have saved our customers some
| headache.
|
| Honestly I wish we had had the ability to do both. Sometimes a
| change is so tricky that the argument that "hot code updates are
| complicated and it'll cause more issues than it will solve" is
| very true, and maybe a deploy that forces everyone to reconnect
| is best for that sort of change. But oftentimes we'd deploy some
| mundane thing where you don't have to worry about upgrading state
| in a running gen server or whatever, and it'd be nice to have
| minimal impact.
|
| Obviously that's even more complexity piled onto the system, but
| every time I pushed some minor change and caused a retry that (in
| a perfect world at least...) didn't _need_ to retry, I winced a
| bit.
___________________________________________________________________
(page generated 2024-11-19 23:00 UTC)