[HN Gopher] ScyllaDB is Moving to a New Replication Algorithm: T...
       ___________________________________________________________________
        
       ScyllaDB is Moving to a New Replication Algorithm: Tablets
        
       Author : carpintech
       Score  : 91 points
       Date   : 2023-07-11 14:06 UTC (8 hours ago)
        
 (HTM) web link (www.scylladb.com)
 (TXT) w3m dump (www.scylladb.com)
        
       | dmagda wrote:
       | This is the right move. Sounds similar to YugabyteDB that
       | distributes data by splitting tables into tablets. The cluster
       | monitors the cluster size (nodes count) and table size (data
       | volume) and re-shards data automatically by splitting large
       | tablets or adding new ones:
       | https://docs.yugabyte.com/preview/architecture/docdb-shardin...
        
         | dangoodmanUT wrote:
         | "can i copy your homework" "yeah but change it a bit" (compared
         | to the other comment that looks nearly identical, looks lik
         | yugabyte sent some people in here)
        
       | jzelinskie wrote:
       | This sounds a lot like ranges in CockroachDB. Anyone familiar
       | with the deep details to highlight the differences?
        
         | Nican wrote:
         | I thought of the same thing, so I am trying to find information
         | on the documentation about things that CockroachDB does very
         | well:
         | 
         | 1. Consistent backups/transactions. When a backup is made, is
         | that a single point in time, or best-effort by individual
         | tablet. For example, backing up an inventory and orders table,
         | the backup could have an older version of inventory, where
         | orders have already been completed for some of them. It looks
         | like Scylla backsup per node, so it could mean that data might
         | have a slight time offset from one another.
         | 
         | 2. Replicate reads. Like CockroachDB, it looks like Scylla will
         | redirect the read to the range lead, but CRDB also provides the
         | option to get stale reads from a non-lead. This is usually good
         | for cross-region databases where reading off the lead can be
         | big increases in latency.
         | 
         | I do not have a lot of time, but I am having a hard time
         | finding much information about the architecture on Scylla's
         | documentation. My personal guess is that Scylla optimized their
         | code for performance, and less worry about data integrity.
        
           | acevedoorafael wrote:
           | > My personal guess is that Scylla optimized their code for
           | performance, and less worry about data integrity.
           | 
           | Definitely, but they are implementing Raft-based transactions
           | which provide higher consistency. That should enable a higher
           | variety of use cases.
        
       | collinc777 wrote:
       | This sounds like partitions in DynamoDB
        
       | bsdnoob wrote:
       | Tablets remind me of vitess
        
       | bbss wrote:
       | Very similar to how BigTable[1] works under the hood which was
       | built ~20 years ago.
       | 
       | [1]
       | https://static.googleusercontent.com/media/research.google.c...
        
         | jeffbee wrote:
         | The load shifting part is similar to the way BigTable splits,
         | merges, and assigns tablets. But the rest of it is not related,
         | because BigTable does not try to offer mutation consistency
         | across replicas. If you write to one replica of a BigTable,
         | your mutation may be read at some other replica, after an
         | undefined delay. Applications that need stronger consistency
         | features must layer their own replication scheme atop BigTable
         | (such as Megastore).
         | 
         | What this post is describing for replication seems more
         | comparable to Spanner.
        
           | axiak wrote:
           | I don't understand this comment. Bigtable requires that each
           | tablet is only assigned to one tablet server at a time,
           | enforced in Chubby. There's no risk of inconsistent reads. Of
           | course this means that there can be downtime when a tablet
           | server goes down, until a replacement tablet server is ready
           | to serve requests.
        
             | jeffbee wrote:
             | Right, the contrast I was trying to draw was between what
             | they depict, where multiple nodes are holding a replica of
             | the tablet and performing synchronous replication between
             | themselves, and what BigTable would do, which is to have
             | the entire table copied elsewhere, with mutation log
             | shipping. What they are doing is more analogous to how
             | Spanner would do replication.
        
       | carpintech wrote:
       | Moving from Vnode-based replication to tablets to dynamically
       | distribute data across the cluster
        
       | magden wrote:
       | This is the right move for Scylla. Overall, looks similar to
       | YugabyteDB that distirbutes data by sharding tables into tablets
       | as well. The cluster monitors the cluster size (number of nodes)
       | and the size of each tablet (data volume), and adds new tablets
       | or re-shards large ones automatically:https://docs.yugabyte.com/p
       | review/architecture/docdb-shardin...
        
         | dangoodmanUT wrote:
         | "can i copy your homework" "yeah but change it a bit" (compared
         | to the other comment that looks nearly identical, looks lik
         | yugabyte sent some people in here)
        
       | geenat wrote:
       | Would be nice if the deployment story became a bit more like
       | CockroachDB too.
        
         | eatonphil wrote:
         | How so? Mind saying more?
        
           | aeyes wrote:
           | I suppose that they are saying that CockroachDB is a single
           | binary which you just drop on a machine and you are good to
           | go. For ScyllaDB you need to install Java, Python and several
           | ScyllaDB related packages.
        
             | hobofan wrote:
             | Java? I thought the whole raison d'etre for ScyllaDB was
             | "Cassandra without Java"?
        
               | taywrobel wrote:
               | The server implementation is, but administering it still
               | requires the Java based Cassandra tooling like nodetool
               | and cqlsh
        
               | heipei wrote:
               | cqlsh is written in Python. Which doesn't mean it's less
               | of a pain in the ass ;)
        
               | taywrobel wrote:
               | Sorry, that was phrased poorly; was in reference to the
               | parent comment's "For ScyllaDB you need to install Java,
               | Python and several ScyllaDB related packages".
               | 
               | Just meant to say it does have tooling which requires
               | other languages/environment specifics.
        
             | tracker1 wrote:
             | I think you may be confusing ScyllaDB with Cassandra.
        
             | heipei wrote:
             | The Docker images for ScyllaDB work perfectly fine and ship
             | with all administrative tools included.
        
       ___________________________________________________________________
       (page generated 2023-07-11 23:01 UTC)