[HN Gopher] Patterns for Building Realtime Features
       ___________________________________________________________________
        
       Patterns for Building Realtime Features
        
       Author : zknill
       Score  : 43 points
       Date   : 2025-02-10 19:42 UTC (3 hours ago)
        
 (HTM) web link (zknill.io)
 (TXT) w3m dump (zknill.io)
        
       | martinsnow wrote:
       | How do you handle deployments of realtime back ends which needs
       | state in memory?
        
         | jakewins wrote:
         | In general you do it by doing a failover behind some variation
         | of a reverse proxy.
         | 
         | If you can start new instances quickly and clients can handle
         | short delays you can do it by just stopping the old deployment
         | and starting the new one, booting off of the snapshotted state
         | from the prior deployment.
         | 
         | If you need "instant" you do it by implementing some form of
         | catchup and then fail over.
         | 
         | It is a lot easier to do this if you have a dedicated component
         | that "does" the failover, rather than having the old and new
         | deployments try to solve it bilaterally. Could just be a script
         | ran by a human, or something like a k8s operator if you do this
         | a lot
        
         | cess11 wrote:
         | On BEAM/OTP you can control how state is handled at code
         | updates. Finicky but you can.
         | 
         | In most other contexts you'd externalise state to a data store
         | like Redis or RDBMS, and spawn one, kill one or do blue-green
         | in the nebula behind your load balancer constellation.
        
       | blixt wrote:
       | What I found building multiplayer editors at scale is that it's
       | very easy to very quickly overcomplicate this. For example, once
       | you get into pub/sub territory, you have a very complex
       | infrastructure to manage, and if you're a smaller team this can
       | slow down your product development a lot.
       | 
       | What I found to work is:
       | 
       | Keep the data you wish multiplayer to operate on atomic. Don't
       | split it out into multiple parallel data blobs that you sometimes
       | want to keep in sync (e.g. if you are doing a multiplayer drawing
       | app that has commenting support, keep comments inline with the
       | drawings, don't add a separate data store). This does increase
       | the size of the blob you have to send to users, but it
       | dramatically decreases complexity. Especially once you inevitably
       | want versioning support.
       | 
       | Start with a simple protocol for updates. This won't be possible
       | for every type of product, but surprisingly often you can do just
       | fine with a JSON patching protocol where each operation patches
       | properties on a giant object which is the atomic data you operate
       | on. There are exceptions to this such as text, where something
       | like CRDTs will help you, but I'd try to avoid the temptation to
       | make your entire data structure a CRDT even though it's
       | theoretically great because this comes with additional complexity
       | and performance cost in practice.
       | 
       | You will inevitably need to deal with getting all clients to
       | agree on the order in which operations are applied. CRDTs solve
       | this perfectly, but again have a high cost. You might actually
       | have an easier time letting a central server increment a number
       | and making sure all clients re-apply all their updates that
       | didn't get assigned the number they expected from the server.
       | Your mileage may vary here.
       | 
       | On that note, just going for a central server instead of trying
       | to go fully distributed is probably the most maintainable way for
       | you to work. This makes it easier to add on things like
       | permissions and honestly most products will end up with a central
       | authority. If you're doing something that is actually local-
       | first, then ignore me.
       | 
       | I found it very useful to deal with large JSON blobs next to a
       | "transaction log", i.e. a list of all operations in the order the
       | server received them (again, I'm assuming a central authority
       | here). Save lines to this log immediately so that if the server
       | crashes you can recover most of the data. This also lets you
       | avoid rebuilding the large JSON blob on the server too often (but
       | clients will need to be able to handle JSON blob + pending
       | updates list on connect, though this follows naturally since
       | other clients may be sending updates while they connect).
       | 
       | The trickiest part is choosing a simple server-side
       | infrastructure. Honestly, if you're not a big company, a single
       | fat server is going to get you very far for a long time. I've
       | asked a lot of people about this, and I've heard many
       | alternatives that are cloud scale, but they have downsides I
       | personally don't like from a product experience perspective
       | (harder to implement features, latency/throughput issues,
       | possibility of data loss, etc.) Durable Objects from Cloudflare
       | do give you the best from both worlds, you get perfect sharding
       | on a per-object (project / whatever unit your users work on)
       | basis.
       | 
       | Anyway, that's my braindump on the subject. The TLDR is: keep it
       | as simple as you can. There are a lot of ways to overcomplicate
       | this. And of course some may claim I am the one overcomplicating
       | things, but I'd love to hear more alternatives that work well at
       | a startup scale.
        
         | jvanderbot wrote:
         | I've worked in robotics for a long time. In robotics nowadays
         | you always end up with a distributed system, where each robot
         | has to have a view of the world, it's mission, etc, and also of
         | each other robot, and also the command and control dashboards
         | do too, etc etc.
         | 
         | Always always always follow parent's advice. Pick one canonical
         | owner for the data, and have everyone query it. Build an
         | estimator at each node that can predict what the robot is doing
         | when you don't have timely data (usually just running a shadow
         | copy of the robot's software), but try to never ever do
         | distributed state.
         | 
         | Even something as simple as a map gets arbitrarily complicated
         | when you're sensing multiple locations. Just push everyone's
         | guesses to a central location and periodically batch update and
         | disseminate updates. You'll be much happier.
        
           | mikhmha wrote:
           | Wow, this sounds like how the AI simulation for my
           | multiplayer game works. Each AI agent has a view of the world
           | and can make local steering decisions to avoid other agents
           | and self preservation. Agents carry out low-level goals that
           | are given to them by squad leaders. A squad leader receives
           | high level "world" objectives from a commander. High-level
           | objectives are broken down into low level objectives
           | distributed among squad units based on their attributes and
           | preferences.
        
       | recroad wrote:
       | It's amazing how much boilerplate stuff you don't have to worry
       | about when you use Phoenix LiveView. I think I'm in love with it.
        
         | pawelduda wrote:
         | Exactly! I was halfway through the article and thought how
         | LiveView is basically equivalent of the "push ops" pattern
         | described but beautifully abstracted away and comes for free,
         | while you (mostly) write dynamic HTML markup. Magic!
        
         | mervz wrote:
         | I have not enjoyed a language and framework like I'm enjoying
         | Elixir and Phoenix! It has become my stack for just about
         | everything.
        
       | jtwaleson wrote:
       | I'm building a simple version with horizontally scalable app
       | servers that each use LISTEN/NOTIFY on the database. The article
       | says this will lead to problems and you'll need PubSub services,
       | but I was hoping LISTEN/NOTIFY would easily scale to hundreds of
       | concurrent users. Please let me know if that won't work ;)
       | 
       | Some context: The use case is a digital whiteboard like Miro and
       | the heaviest realtime functionality will be tracking all of the
       | pointers of all the users updating 5x per second. I'm not
       | expecting thousands/millions of users as I'm planning on running
       | each instance of the software on-prem.
        
       ___________________________________________________________________
       (page generated 2025-02-10 23:00 UTC)