[HN Gopher] ScyllaDB is Moving to a New Replication Algorithm: T...
___________________________________________________________________
ScyllaDB is Moving to a New Replication Algorithm: Tablets
Author : carpintech
Score : 91 points
Date : 2023-07-11 14:06 UTC (8 hours ago)
(HTM) web link (www.scylladb.com)
(TXT) w3m dump (www.scylladb.com)
| dmagda wrote:
| This is the right move. Sounds similar to YugabyteDB that
| distributes data by splitting tables into tablets. The cluster
| monitors the cluster size (node count) and table size (data
| volume) and re-shards data automatically by splitting large
| tablets or adding new ones:
| https://docs.yugabyte.com/preview/architecture/docdb-shardin...
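The split-on-size behavior described above can be sketched as a toy model. All names and the threshold here are invented for illustration; this is not YugabyteDB's or ScyllaDB's actual code.

```python
# Toy sketch of size-based tablet splitting (invented names and
# threshold, not the actual YugabyteDB or ScyllaDB implementation).

MAX_TABLET_BYTES = 1 * 1024 ** 3  # hypothetical 1 GiB split threshold

class Tablet:
    def __init__(self, start_key, end_key, size_bytes):
        self.start_key = start_key
        self.end_key = end_key
        self.size_bytes = size_bytes

def maybe_split(tablet, midpoint_key):
    """Split a tablet at midpoint_key once it grows past the threshold."""
    if tablet.size_bytes <= MAX_TABLET_BYTES:
        return [tablet]
    left = Tablet(tablet.start_key, midpoint_key, tablet.size_bytes // 2)
    right = Tablet(midpoint_key, tablet.end_key,
                   tablet.size_bytes - left.size_bytes)
    return [left, right]
```

In a real system the midpoint would come from the tablet's key distribution and the split would be coordinated cluster-wide; the point is only that rebalancing is driven by per-tablet size rather than by a fixed token ring.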
| dangoodmanUT wrote:
| "can i copy your homework" "yeah but change it a bit" (compared
| to the other comment that looks nearly identical, looks like
| yugabyte sent some people in here)
| jzelinskie wrote:
| This sounds a lot like ranges in CockroachDB. Anyone familiar
| with the deep details to highlight the differences?
| Nican wrote:
| I thought of the same thing, so I am trying to find information
| on the documentation about things that CockroachDB does very
| well:
|
| 1. Consistent backups/transactions. When a backup is made, is
| that a single point in time, or best-effort per individual
| tablet? For example, when backing up an inventory and an orders
| table, the backup could contain an older version of inventory
| while orders already shows some of them as completed. It looks
| like Scylla backs up per node, so data from different nodes
| might have a slight time offset from one another.
|
| 2. Replica reads. Like CockroachDB, it looks like Scylla will
| redirect the read to the range leader, but CRDB also provides
| the option to serve stale reads from a non-leader. This is
| usually valuable for cross-region databases, where reading from
| the leader can add significant latency.
|
| I do not have a lot of time, but I am having a hard time
| finding much information about the architecture in Scylla's
| documentation. My personal guess is that Scylla optimized their
| code for performance and worried less about data integrity.
| acevedoorafael wrote:
| > My personal guess is that Scylla optimized their code for
| performance and worried less about data integrity.
|
| Definitely, but they are implementing Raft-based transactions,
| which provide stronger consistency. That should enable a wider
| variety of use cases.
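The stale-read trade-off raised in point 2 above can be illustrated with a toy read router. The staleness bound and the routing logic are invented for the sketch; they are not CRDB's or Scylla's actual behavior.

```python
import time

# Toy read router: serve from a nearby follower if its applied
# timestamp is within a staleness bound, otherwise fall back to the
# (possibly remote) leader. Invented structure, for illustration only.

MAX_STALENESS_S = 5.0  # hypothetical bound on acceptable staleness

def route_read(leader, followers, now=None):
    """Return the name of the replica that should serve this read."""
    now = time.time() if now is None else now
    for f in followers:
        if now - f["applied_at"] <= MAX_STALENESS_S:
            return f["name"]  # cheap local read, possibly stale
    return leader["name"]     # fresh, but may cross regions
```

The appeal for cross-region deployments is exactly the one described above: a follower in the reader's region answers in local round-trip time, at the cost of reads that may lag the leader by up to the bound.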
| collinc777 wrote:
| This sounds like partitions in DynamoDB
| bsdnoob wrote:
| Tablets remind me of vitess
| bbss wrote:
| Very similar to how BigTable[1] works under the hood which was
| built ~20 years ago.
|
| [1]
| https://static.googleusercontent.com/media/research.google.c...
| jeffbee wrote:
| The load shifting part is similar to the way BigTable splits,
| merges, and assigns tablets. But the rest of it is not related,
| because BigTable does not try to offer mutation consistency
| across replicas. If you write to one replica of a BigTable,
| your mutation may be read at some other replica, after an
| undefined delay. Applications that need stronger consistency
| features must layer their own replication scheme atop BigTable
| (such as Megastore).
|
| What this post is describing for replication seems more
| comparable to Spanner.
| axiak wrote:
| I don't understand this comment. Bigtable requires that each
| tablet is only assigned to one tablet server at a time,
| enforced in Chubby. There's no risk of inconsistent reads. Of
| course this means that there can be downtime when a tablet
| server goes down, until a replacement tablet server is ready
| to serve requests.
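The single-assignment invariant described above (each tablet served by at most one tablet server, enforced through a lock service) can be sketched with a toy in-process stand-in. Chubby itself is a distributed lock service; the class and method names here are invented for illustration.

```python
import threading

# Toy stand-in for a lock service enforcing that each tablet is
# served by at most one server at a time. Invented sketch; Chubby is
# a distributed lock service, not an in-process dict.

class LockService:
    def __init__(self):
        self._owners = {}          # tablet_id -> server_id
        self._mu = threading.Lock()

    def acquire(self, tablet_id, server_id):
        """Return True iff server_id now exclusively owns tablet_id."""
        with self._mu:
            if tablet_id in self._owners:
                return False       # someone else already serves it
            self._owners[tablet_id] = server_id
            return True

    def release(self, tablet_id, server_id):
        """Give up ownership (e.g. on shutdown or lease expiry)."""
        with self._mu:
            if self._owners.get(tablet_id) == server_id:
                del self._owners[tablet_id]
```

Because a tablet has no second serving copy, there is no replica to read inconsistently from, but also no replica to fail over to instantly, which is the downtime trade-off noted above.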
| jeffbee wrote:
| Right, the contrast I was trying to draw was between what
| they depict, where multiple nodes are holding a replica of
| the tablet and performing synchronous replication between
| themselves, and what BigTable would do, which is to have
| the entire table copied elsewhere, with mutation log
| shipping. What they are doing is more analogous to how
| Spanner would do replication.
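The contrast drawn above, synchronous replication among a tablet's replicas versus asynchronous log shipping of a whole table, can be sketched as a toy quorum write. This is an invented illustration, not Scylla's Raft code.

```python
# Toy synchronous quorum write: the write succeeds only once a
# majority of replicas acknowledge it. Invented sketch, for
# illustration only.

class Replica:
    def __init__(self, up=True):
        self.up = up
        self.log = []

    def apply(self, value):
        """Append to the local log; fail if the replica is down."""
        if self.up:
            self.log.append(value)
            return True
        return False

def quorum_write(replicas, value):
    """Succeed iff a strict majority of replicas ack the write."""
    acks = sum(1 for r in replicas if r.apply(value))
    return acks >= len(replicas) // 2 + 1
```

With asynchronous log shipping, by contrast, the write would be acknowledged after reaching one copy, and the others would catch up after an undefined delay, which is exactly the BigTable behavior described earlier in the thread.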
| carpintech wrote:
| Moving from Vnode-based replication to tablets to dynamically
| distribute data across the cluster
| magden wrote:
| This is the right move for Scylla. Overall, looks similar to
| YugabyteDB, which likewise distributes data by sharding tables
| into tablets. The cluster monitors the cluster size (number of
| nodes) and the size of each tablet (data volume), and adds new
| tablets or re-shards large ones automatically:
| https://docs.yugabyte.com/preview/architecture/docdb-shardin...
| dangoodmanUT wrote:
| "can i copy your homework" "yeah but change it a bit" (compared
| to the other comment that looks nearly identical, looks like
| yugabyte sent some people in here)
| geenat wrote:
| Would be nice if the deployment story became a bit more like
| CockroachDB too.
| eatonphil wrote:
| How so? Mind saying more?
| aeyes wrote:
| I suppose that they are saying that CockroachDB is a single
| binary which you just drop on a machine and you are good to
| go. For ScyllaDB you need to install Java, Python and several
| ScyllaDB related packages.
| hobofan wrote:
| Java? I thought the whole raison d'etre for ScyllaDB was
| "Cassandra without Java"?
| taywrobel wrote:
| The server implementation is, but administering it still
| requires the Java based Cassandra tooling like nodetool
| and cqlsh
| heipei wrote:
| cqlsh is written in Python. Which doesn't mean it's less
| of a pain in the ass ;)
| taywrobel wrote:
| Sorry, that was phrased poorly; was in reference to the
| parent comment's "For ScyllaDB you need to install Java,
| Python and several ScyllaDB related packages".
|
| Just meant to say it does have tooling which requires
| other languages/environment specifics.
| tracker1 wrote:
| I think you may be confusing ScyllaDB with Cassandra.
| heipei wrote:
| The Docker images for ScyllaDB work perfectly fine and ship
| with all administrative tools included.
___________________________________________________________________
(page generated 2023-07-11 23:01 UTC)