[HN Gopher] Scaling Replicated State Machines with Compartmental...
       ___________________________________________________________________
        
       Scaling Replicated State Machines with Compartmentalization [pdf]
        
       Author : luu
       Score  : 19 points
       Date   : 2021-01-12 07:49 UTC (15 hours ago)
        
 (HTM) web link (mwhittaker.github.io)
 (TXT) w3m dump (mwhittaker.github.io)
        
       | electricshampo1 wrote:
        | I read through this and it's unclear what the point is. Most
        | real-world RSM deployments (Spanner, DynamoDB, etc.) are
        | composed of many separate RSMs, one for each shard/partition.
        | The requisite performance is readily provided by the aggregate
        | performance of these essentially independent RSMs. I don't see
        | any real situations where adding hardware to make a single RSM
        | faster makes sense. Any ideas?
        
         | mjb wrote:
         | Sharding is a great way to scale, but comes with costs. One
         | cost is that you need a separate transaction layer "on top of"
         | the shards if you want to offer transactions. If you want to
          | offer strong isolation (such as snapshot isolation or
          | serializability), good performance, and fault tolerance,
          | that transaction layer
         | can be quite complex. Scaling a single RSM is much cleaner
         | conceptually, and (depending on the model) can offer very
         | strong semantics (strict serializability would be typical).
         | Having a single system image (rather than trying to simulate
         | one through distributed transaction protocols) also removes
         | complexity in other areas, such as making it easier to do large
         | read queries MVCC-style.
         | 
         | I don't think that being able to scale an RSM up makes anything
         | possible that wasn't possible before, but it is convenient in
         | all kinds of ways.
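
A minimal sketch of the single-system-image point above, assuming a toy key-value store (all names hypothetical, not from the paper): every replica applies the same totally ordered command log to a deterministic apply function, so any replica that replays the log converges to the same state.

```python
class KVStateMachine:
    """Deterministic key-value state machine: same log in, same state out."""

    def __init__(self):
        self.state = {}

    def apply(self, command):
        op, key, *rest = command
        if op == "put":
            self.state[key] = rest[0]
        elif op == "delete":
            self.state.pop(key, None)


def replay(log):
    # Replaying the same totally ordered log on any replica yields the
    # same state -- the "single system image" property.
    sm = KVStateMachine()
    for command in log:
        sm.apply(command)
    return sm.state


log = [("put", "a", 1), ("put", "b", 2), ("delete", "a")]
final_state = replay(log)  # -> {"b": 2} on every replica
```

One log whose every prefix is a consistent state is also what makes MVCC-style large reads comparatively easy, as the comment notes.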
        
           | electricshampo1 wrote:
            | I agree with basically everything you said. However, this
            | would only apply in relatively narrow cases where your
            | workload can be supported by a single RSM. There is an
            | absolute throughput limit on sequencing from a single
            | leader, even after you apply the tricks mentioned in the
            | paper. Geo optimizations (US users' data stored in the US,
            | EU users' data stored in the EU, etc.) for latency while
            | maintaining a single database view also become challenging
            | (impossible?) without sharding. Fault tolerance is also
            | slightly improved with many shards, because a leader
            | failure or slowdown exposes only a small fraction of your
            | database to transient downtime (re-election, catch-up) or
            | slowness.
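
The sharded alternative being described can be sketched like this (hypothetical names, not any particular system's API): each shard is an independent RSM whose leader sequences only its own commands, so aggregate ordering throughput grows with the number of shards, at the cost of giving up a single global order.

```python
import hashlib

NUM_SHARDS = 4


def shard_for(key: str) -> int:
    # Stable hash so every client routes a given key to the same shard.
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


class ShardLeader:
    """Per-shard leader: sequences commands for its own shard only."""

    def __init__(self):
        self.next_seq = 0
        self.log = []

    def sequence(self, command):
        seq = self.next_seq
        self.next_seq += 1
        self.log.append((seq, command))
        return seq


leaders = [ShardLeader() for _ in range(NUM_SHARDS)]


def submit(key, command):
    # Route to the key's shard; shards never coordinate with each other.
    shard = shard_for(key)
    return shard, leaders[shard].sequence(command)
```

Single-key commands never cross shards, which is exactly why a separate transaction layer becomes necessary the moment one operation spans two shards.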
        
         | hodgesrm wrote:
         | Same here. One of the main benefits of sharding is that it
         | creates independent scopes for transactions of all kinds, not
         | just those managed through Paxos. It's a natural way to
         | parallelize everything from transaction processing to backup.
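
As an illustration of that per-shard parallelism (a hypothetical sketch, not any real system's backup API): because each shard is an independent scope, maintenance work such as backup can run on every shard concurrently with no cross-shard coordination.

```python
from concurrent.futures import ThreadPoolExecutor


def backup_shard(shard_id, shard_state):
    # Stand-in for snapshotting one shard's state to durable storage.
    return shard_id, dict(shard_state)


def backup_all(shards):
    # One independent task per shard; no cross-shard locks are taken.
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        futures = [pool.submit(backup_shard, i, s)
                   for i, s in enumerate(shards)]
        return dict(f.result() for f in futures)


shards = [{"a": 1}, {"b": 2}, {"c": 3}]
snapshots = backup_all(shards)  # each shard snapshotted independently
```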
        
       ___________________________________________________________________
       (page generated 2021-01-12 23:01 UTC)