hngopher.com

       [HN Gopher] I Use Erlang Hot Code Updates
       ___________________________________________________________________
        
       I Use Erlang Hot Code Updates
        
       Author : lawik
       Score  : 70 points
       Date   : 2024-11-19 20:29 UTC (2 hours ago)
        
 (HTM) web link (underjord.io)
 (TXT) w3m dump (underjord.io)
        
       | toast0 wrote:
       | > Both have described hot code updates as something that people
       | should learn and use. I imagine Whatsapp's initial engineering
       | crew would agree. They did pretty well.
       | 
       | Yeah. Hot loading is clearly better than anything else when
       | you've got a million clients connected and you want to make a
       | code change. Of course, we didn't have any of these fancy
       | 'release' tools; we just used GNU Make to rsync the code to prod
       | and run erlc. Then you can grab a debug shell and l(module). (We
       | did write utilities to see what code was modified, and to provide
       | the right incantations so we wouldn't load a module if doing so
       | would kill processes.)
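Erlang keeps at most two versions of a module in memory, "current" and "old". Loading a newer version pushes the current one into the old slot, so whatever is still in the old slot must be purged first. `l(module)` in the shell does a hard `code:purge/1` followed by `code:load_file/1`, and the hard purge kills any process still executing the old code. `code:soft_purge/1` instead returns `false` when processes linger there, and `code:modified_modules/0` lists modules whose on-disk object code differs from what is loaded. The utilities toast0 mentions are not shown in the thread, so what follows is only a hypothetical Python model of the two-slot rule and of a "refuse rather than kill" load check. `ModuleSlots`, `safe_load`, `finish_switch` and `changed_modules` are illustrative names, not Erlang APIs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ModuleSlots:
    """Hypothetical model of one Erlang module's code slots:
    at most a 'current' and an 'old' version are loaded at once."""
    current: Optional[str] = None
    old: Optional[str] = None
    # pid -> version that process is currently executing
    running: Dict[int, str] = field(default_factory=dict)

    def lingering_in_old(self) -> List[int]:
        """Pids still executing the old version (a soft purge would fail)."""
        if self.old is None:
            return []
        return [pid for pid, vsn in self.running.items() if vsn == self.old]

    def safe_load(self, new_version: str) -> bool:
        """Load only if dropping the old slot kills no process.

        A hard purge would kill the lingering pids instead of refusing."""
        if self.lingering_in_old():
            return False
        self.old, self.current = self.current, new_version
        return True

    def finish_switch(self, pid: int) -> None:
        """A process makes a fully qualified call and moves to current code."""
        self.running[pid] = self.current


def changed_modules(loaded: Dict[str, str], on_disk: Dict[str, str]) -> List[str]:
    """Module names whose on-disk checksum differs from the loaded one."""
    return sorted(m for m, digest in on_disk.items() if loaded.get(m) != digest)
```

For example, with `m = ModuleSlots(current="v1", running={1: "v1"})`, `m.safe_load("v2")` succeeds, but a following `m.safe_load("v3")` returns `False` until process 1 calls `m.finish_switch(1)`.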
        
         | rybosome wrote:
         | > Hot loading is clearly better than anything else when you've
         | got a million clients connected and you want to make a code
         | change.
         | 
         | In the contexts in which I've worked, this was solved by
         | issuing a command to the server to enter a lame-duck mode and
         | stop accepting new connections, then restarting the process
         | with updated code after all existing connections ended.
         | 
         | This worked in our case because connections had a TTL of a
         | "reasonable" length: never more than an hour. We could always
         | wait it out.
         | 
         | I suppose hot reloading is more necessary when you have
         | connections without a set TTL.
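The lame-duck approach rybosome describes can be sketched as a small state machine: once draining starts, new connections are refused, and the process is safe to restart when every existing connection has closed or outlived its TTL. This is a hypothetical Python model (`LameDuckServer` and its methods are illustrative names, and a real server would close sockets rather than dictionary entries); the clock is injectable so a drain can be simulated.

```python
import time
from typing import Callable, Dict, Optional


class LameDuckServer:
    """Hypothetical model of lame-duck draining before a restart."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self.lame_duck = False
        self.connections: Dict[int, float] = {}  # connection id -> start time
        self._next_id = 0

    def accept(self) -> Optional[int]:
        """Accept a connection unless we are draining."""
        if self.lame_duck:
            return None
        cid = self._next_id
        self._next_id += 1
        self.connections[cid] = self.clock()
        return cid

    def close(self, cid: int) -> None:
        self.connections.pop(cid, None)

    def enter_lame_duck(self) -> None:
        """The 'stop accepting new connections' command."""
        self.lame_duck = True

    def ready_to_restart(self) -> bool:
        """True once draining and every connection has closed or hit its TTL."""
        now = self.clock()
        for cid, started in list(self.connections.items()):
            if now - started >= self.ttl:
                self.close(cid)  # TTL reached: the connection must end anyway
        return self.lame_duck and not self.connections
```

The restart itself happens outside this object. The point of the design is that the worst-case wait is bounded by the TTL, which is why it only works when connections have one.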
        
       | arnon wrote:
       | A few years ago, the biggest problem with Erlang's hot code
       | updates was getting the files updated on all of the nodes. Has
       | this been solved or improved in any way?
        
         | comboy wrote:
         | I don't think updating files is the problem. The biggest issue
         | with hot code updates seems to be that they can create states
         | that cannot be replicated in either release on its own.
        
           | ketralnis wrote:
           | This is my experience. About 25% of the time I'd encounter a
           | bug that's impossible to reproduce without both versions of
           | the code in memory, and I'd end up restarting the node anyway,
           | dropping requests in the process. Whereas if I had
           | architected around not having hot code updates, I could have
           | built it in a way that never has to drop requests.
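One common way such mixed-version bugs arise is a long-lived process whose state was built by the old code and is then handled by the new code. OTP's `gen_server` offers a `code_change/3` callback for converting state during a release upgrade, but a plain `l(module)` does not invoke it. The following is a hypothetical Python analogy (all function names and state shapes are invented for illustration): a v2 handler fails on v1-shaped state, a state that neither release produces on its own, and a migration step fixes it.

```python
from typing import Any, Dict, Tuple


# Hypothetical v1 state: a bare tuple (count,)
def handle_v1(state: Tuple[int], msg: str) -> Tuple[int]:
    (count,) = state
    return (count + 1,)


# Hypothetical v2 state: a dict that also remembers the last message
def handle_v2(state: Dict[str, Any], msg: str) -> Dict[str, Any]:
    return {"count": state["count"] + 1, "last": msg}


def code_change_v1_to_v2(state: Any) -> Dict[str, Any]:
    """Plays the role of code_change/3: convert old-shaped state."""
    if isinstance(state, dict):
        return state  # already upgraded
    (count,) = state
    return {"count": count, "last": None}


state = handle_v1((0,), "a")  # built by old code: (1,)
try:
    handle_v2(state, "b")  # new code, old state: raises TypeError
except TypeError as exc:
    print("v2 on v1 state failed:", exc)

state = code_change_v1_to_v2(state)
print(handle_v2(state, "b"))  # {'count': 2, 'last': 'b'}
```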
        
       | whorleater wrote:
       | Very long ago, WhatsApp used to hot reload across all nodes with
       | an ssh script, deploying incrementally during the day.
        
       | rozap wrote:
       | I used to work on a pretty big Elixir project that had many
       | clients with long-lived connections that ran jobs that weren't
       | easily resumable. Our company had a language-agnostic deployment
       | strategy based on Docker, etc., which meant we couldn't do hot
       | code updates even though they would have saved our customers some
       | headache.
       | 
       | Honestly I wish we had had the ability to do both. Sometimes a
       | change is so tricky that the argument that "hot code updates are
       | complicated and it'll cause more issues than it will solve" is
       | very true, and maybe a deploy that forces everyone to reconnect
       | is best for that sort of change. But oftentimes we'd deploy some
       | mundane thing where you don't have to worry about upgrading state
       | in a running GenServer or whatever, and it'd be nice to have
       | minimal impact.
       | 
       | Obviously that's even more complexity piled onto the system, but
       | every time I pushed some minor change and caused a retry that (in
       | a perfect world, at least...) didn't _need_ to happen, I winced a
       | bit.
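Having both options, as rozap wishes, mostly comes down to a per-deploy decision. Below is a hypothetical sketch of such a policy: hot load when no running process needs its state upgraded, and fall back to a restart that forces reconnects otherwise. The `Change` record and its `changes_state_shape` flag are invented here; in practice a reviewer or some tooling would have to set that flag for each module touched.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Change:
    module: str
    # Hypothetical flag: does this change alter the state that running
    # processes (e.g. a GenServer) hold between calls?
    changes_state_shape: bool


def choose_strategy(changes: List[Change]) -> str:
    """Pick a deploy strategy for one batch of changed modules."""
    if not changes:
        return "noop"
    if any(c.changes_state_shape for c in changes):
        # Tricky change: accept the reconnects rather than reason about
        # old and new code sharing state.
        return "restart"
    # Mundane change: swap code in place, nobody has to reconnect.
    return "hot_update"
```

The policy is deliberately conservative: one state-changing module sends the whole batch down the restart path, trading some avoidable reconnects for never having to reason about a partially hot-loaded batch.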
        
       ___________________________________________________________________
       (page generated 2024-11-19 23:00 UTC)