[HN Gopher] Benchmark results: Cassandra 4.0, 3.11, Scylla 4.4 [...
       ___________________________________________________________________
        
       Benchmark results: Cassandra 4.0, 3.11, Scylla 4.4 [video]
        
       Author : PeterCorless
       Score  : 62 points
       Date   : 2021-09-27 15:57 UTC (7 hours ago)
        
 (HTM) web link (www.youtube.com)
 (TXT) w3m dump (www.youtube.com)
        
       | kreetx wrote:
       | To someone not versed in NoSQL: what are Cassandra and Scylla
       | like to use? Would you use it instead of say, Mongo, when you
       | have much more data?
        
         | dagnabbit wrote:
         | Column stores are particularly good for handling large
         | analytics workloads.
         | 
         | As the name suggests, the defining feature of a column store is
         | that instead of storing each row of data sequentially, they
         | store the _column_ sequentially, so it's very very fast to grab
         | a column of data, and quite slow to get a row (the opposite of
         | a typical RDBMS).
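A minimal sketch of the row-versus-column trade-off described above. It is a toy in-memory illustration, not any database's actual on-disk format; the sample rows and the helper names `to_columns` and `get_row` are made up for the example.

```python
# Row-oriented layout: each record is stored together.
rows = [
    {"id": 1, "region": "EU", "amount": 10.0},
    {"id": 2, "region": "US", "amount": 25.5},
    {"id": 3, "region": "EU", "amount": 4.5},
]


def to_columns(rows):
    """Pivot a list of row dicts into one contiguous list per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def get_row(columns, index):
    """Rebuild one row; this has to touch every column list."""
    return {name: values[index] for name, values in columns.items()}


columns = to_columns(rows)

# Aggregating one column only reads that column's list.
total = sum(columns["amount"])

# Fetching a single row is the expensive direction in this layout.
second_row = get_row(columns, 1)
```

Summing `amount` walks a single list, while `get_row` has to index into all of them, which is the asymmetry the comment points at.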
        
         | dreyfan wrote:
         | They're for teams that still refuse a proper RDBMS but desire
         | an Apache-level of infrastructure complexity.
        
           | VirusNewbie wrote:
           | You remind me of some ex coworkers who thought everything
           | could be done on RDBMS systems and then when they had to
           | export large amounts of data topped out at 3500 tps.
           | 
           | Meanwhile I'm over here using Scylla and 3 nodes can give me
           | roughly 3.5M TPS for data extraction.
           | 
           | RDBMS are great for certain use cases but they _do not scale_
           | and if you think they do you don't really know what scale
           | means.
           | 
           | Sure you can cite things like Github using MySql, but I'll
           | also point out they've invested thousands upon thousands of
           | man hours customizing layers on top of MySql to support their
           | sharding which also, oops, had a bug that gave them almost
           | half a day of downtime at one point. This is not a good use
           | of someone's time, GH definitely made the wrong decision.
           | Don't steer other people into making similar mistakes
           | especially given it seems you have a lack of experience on
           | designing systems at scale.
        
             | [deleted]
        
           | zinclozenge wrote:
           | I love this comment, but presumably there's a need for
           | databases that are well suited for write heavy workloads.
           | Also there's no great story for starting out with an RDBMS
           | and then scaling out. Right now if you're with MySQL you have
           | some options to scale out with read replicas, and then
           | eventually replicate to a TiDB cluster. PostgreSQL I guess
           | there's cockroachdb, but it's kind of meh.
        
             | skyde wrote:
             | I believe Cloudflare's usage of PostgreSQL is "write heavy",
             | and it is an RDBMS that scaled out very well for them.
             | 
             | They are using CitusDB, but it's still an RDBMS and supports
             | most PostgreSQL queries without the need to rewrite your
             | queries or change your schema.
        
               | doliveira wrote:
               | They moved to Clickhouse recently, though:
               | https://blog.cloudflare.com/http-analytics-for-6m-requests-p...
        
               | [deleted]
        
               | zinclozenge wrote:
               | That's true, I forgot about Citus.
        
               | skyde wrote:
               | In fact, YouTube for a long time was running on MySQL with
               | a tiny service called "Vitess" on top of it to handle
               | sharding/partitioning automatically.
               | 
               | I don't know if they are still using Vitess or migrated to
               | Google's other relational database, "Spanner".
        
               | jeffbee wrote:
               | YouTube switched some Vitess workloads over to Procella.
               | http://www.vldb.org/pvldb/vol12/p2022-chattopadhyay.pdf
               | 
               | In general I've noticed that people outside Google assume
               | that Spanner is much more popular than it really is.
        
           | doliveira wrote:
           | There are requirements other than low-volume
           | transactional data and CRUDs...
        
             | dreyfan wrote:
             | I'm guessing your usage of a RDBMS hasn't extended beyond
             | an ORM library.
        
               | doliveira wrote:
               | I'm obviously being hyperbolic, not sure about you. Point
               | remains that scaling RDBMSs is not exactly trivial and it
               | does look like most companies eventually give up.
        
               | dreyfan wrote:
               | Eh, there are numerous options today (vitess, citus,
               | clickhouse, aurora, redshift, bigquery, snowflake).
               | Clickhouse outperforms Scylla by a wide margin at a
               | fraction of the cost and complexity.
               | 
               | Any of the above makes your data accessible via standard
               | SQL. You can hire and utilize proficient data analysts,
               | not people writing esoteric queries in whatever flavor of
               | NoSQL, while actually solving business needs.
        
               | doliveira wrote:
               | Not sure what you're talking about then? It seems you're
               | just talking about the syntax used to query (which I
               | agree SQL is a no-brainer) and not really about the "R"
               | in "RDBMS"
        
               | [deleted]
        
               | dreyfan wrote:
               | You haven't made a point yourself, so honestly, I don't
               | fucking know what we're talking about either. You came in
               | with a hyperbolic comment that echoes the stance from the
               | NoSQL community I've seen for the past decade. The one
               | that basically says "We have zero fucking clue how to
               | model our data so we're just going to jam everything into
               | a bunch of loosely connected JSON documents and spend the
               | next decade adding basic SQL features from the 70s".
        
               | doliveira wrote:
               | I'm not a native speaker but I dare say the point itself
               | about RDBMS was pretty clear. I don't really feel like
               | continuing this argument either.
        
         | jandrewrogers wrote:
         | Cassandra, and by extension Scylla, are commonly used for
         | workloads you probably wouldn't use MongoDB for at any scale.
         | The primary strength is the ability to reliably scale-out the
         | landing of live data. It supports writing more complex data
         | models than logging systems but is slower and more limited for
         | querying complex data models than e.g. an RDBMS.
         | 
         | Cassandra is commonly used as a sink for complex service
         | telemetry e.g. mobile data from the carrier perspective.
        
         | jjirsa wrote:
         | SQL-like query pattern on top of bigtable-like distribution.
         | 
         | e.g. you're willing to model your reads such that you don't need
         | joins, and your sorting/scanning is based on the idea that you
         | cluster data together, sorted in order you'll read it, into
         | something called a "partition", which is how data is found
         | within very large clusters.
        
         | orthecreedence wrote:
         | I didn't understand Cassandra/Scylla until I realized that it's
         | basically a k/v store that lets you append data to existing
         | keys. If your storage needs fit that domain, then I can vouch
         | for Scylla being amazing to work with.
        
           | fennecfoxen wrote:
           | To put it in slightly more detail, it's a two-level data
           | structure with two keys, a Dict[SortedList[Any]], to use
           | Python type syntax. The top level key is also the sharding
           | key.
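The two-level structure described above can be sketched with the standard library alone. This is a toy model under stated assumptions: `TwoLevelStore`, `append`, and `range` are hypothetical names for illustration, and nothing here reflects how Cassandra or Scylla actually store data.

```python
import bisect
from collections import defaultdict


class TwoLevelStore:
    """Toy Dict[SortedList[Any]]: partition key -> entries sorted by a second key."""

    def __init__(self):
        # Parallel lists per partition: sorted second-level keys and their values.
        self._keys = defaultdict(list)
        self._values = defaultdict(list)

    def append(self, partition_key, sort_key, value):
        """Add a value under an existing (or new) partition, keeping sort order."""
        keys = self._keys[partition_key]
        index = bisect.bisect_right(keys, sort_key)
        keys.insert(index, sort_key)
        self._values[partition_key].insert(index, value)

    def range(self, partition_key, low, high):
        """Return values in one partition whose sort key is in [low, high]."""
        keys = self._keys.get(partition_key, [])
        start = bisect.bisect_left(keys, low)
        end = bisect.bisect_right(keys, high)
        return self._values.get(partition_key, [])[start:end]


store = TwoLevelStore()
store.append("user:1", 3, "third event")
store.append("user:1", 1, "first event")
store.append("user:1", 2, "second event")
recent = store.range("user:1", 2, 3)
```

The top-level key picks the partition (and, in a cluster, the shard); the second key only orders data inside it, so range reads stay within one partition.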
        
         | seabrookmx wrote:
         | In a way, yes. Cassandra's clustering is more flexible. But the
         | "price" you pay is that every table has a partition key. This
         | requires more data modeling and thought upfront than your
         | typical "JSON in, JSON out" document store like MongoDB or
         | firebase.
        
           | fennecfoxen wrote:
           | The important thing to understand from this is how it changes
           | the way your application handles reads and writes. In a normal
           | SQL database, you'd try hard to normalize your data, write an
           | update once, do joins, and add an index for things you
           | frequently read by, so the query planner can come up with
           | something reasonable. When that reaches its scalability
           | limit, you'll grudgingly consider various denormalization
           | strategies, copying some subset of your data somewhere so
           | that it's faster to query.
           | 
           | If you're using Cassandra and you're also not a fool, it's
           | _probably_ because you actually need a lot of scale, so you'll
           | generally do the denormalization up front and you will
           | denormalize everything. If you're writing an event, you will
           | write it several places. For example, suppose you're WalMart
           | recording sales. You might write to: store transactions by
           | store/year/day/hour (the "master" record insofar as you have
           | one), user transactions by user/year/month, product purchases
           | by manufacturer/year/day/hour... When you write the
           | transaction, you write to all of these locations.
           | 
           | Each of these "by X" keys is a shard. Each can be located on
           | a different set of ~3 machines (the number is configurable).
           | Querying involves getting a copy of the ring topology,
           | computing which integer shard-ID the key maps to, figuring
           | out which machines in the ring own that integer, and then
           | asking the machine for a whole bucketful of data, which
           | should be a superset of what you're actually looking for. For
           | something like a user's transactions you'll want to have
           | basically everything there at once, so loading the "order
           | history" page for the past month might be a single query that
           | just returns a report: no joins at all, very fast, super
           | scalable. Other lookup strategies might ask for a range
           | within that bucket (the data within the bucket can be ordered
           | by a single key; often this is a timestamp or time-based
           | UUID). Anything that isn't a simple query of a few buckets
           | like this has to be a map-reduce job and will be slow.
           | 
           | All of this is pain. You should generally not invite pain
           | into your organization. However, if pain has already found
           | you, something like this may be the least painful option.
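A rough sketch of the write fan-out and ring lookup described in this comment. Everything named here is an assumption for illustration: the table names, the key formats, the `Ring` class, and the use of MD5 as a stand-in hash (Cassandra's default partitioner is Murmur3, and real clusters assign many tokens per node rather than one).

```python
import bisect
import hashlib
from datetime import datetime


def token_for(key):
    """Map a string key to an integer position on the ring (MD5 is a stand-in)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class Ring:
    """Toy ring with one token per node."""

    def __init__(self, nodes, replication_factor=3):
        self.replication_factor = replication_factor
        self._entries = sorted((token_for(node), node) for node in nodes)
        self._tokens = [token for token, _ in self._entries]

    def replicas_for(self, partition_key):
        """Walk clockwise from the key's token and collect the owning nodes."""
        start = bisect.bisect_left(self._tokens, token_for(partition_key))
        count = min(self.replication_factor, len(self._entries))
        size = len(self._entries)
        return [self._entries[(start + i) % size][1] for i in range(count)]


def partition_keys_for(sale):
    """Derive one partition key per denormalized 'by X' table for a single sale."""
    ts = sale["timestamp"]
    by_hour = ts.strftime("%Y/%j/%H")   # year / day-of-year / hour
    by_month = ts.strftime("%Y/%m")     # year / month
    return {
        "transactions_by_store": sale["store_id"] + "/" + by_hour,
        "transactions_by_user": sale["user_id"] + "/" + by_month,
        "purchases_by_manufacturer": sale["manufacturer"] + "/" + by_hour,
    }


ring = Ring(["node-a", "node-b", "node-c", "node-d", "node-e"])
sale = {
    "store_id": "store-42",
    "user_id": "user-7",
    "manufacturer": "acme",
    "timestamp": datetime(2021, 9, 27, 15, 57),
}
keys = partition_keys_for(sale)
# The same event is written once per table, each copy landing on its own replicas.
placement = {table: ring.replicas_for(key) for table, key in keys.items()}
```

Reading a user's month of orders then means hashing one key and asking one replica set, which is why the read path needs no joins.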
        
             | jjirsa wrote:
             | > then asking the machine for a whole bucketful of data,
             | which should be a superset of what you're actually looking
             | for. For something like a user's transactions you'll want
             | to have basically everything there at once, so loading the
             | "order history" page for the past month might be a single
             | query that just returns a report: no joins at all, very
             | fast, super scalable.
             | 
             | Not really a superset, cassandra (and scylla, and bigtable,
             | all of which are basically copying bigtable's model) each
             | try very hard not to read any extra data at all, and can
             | often return approximately the exact data requested, modulo
             | serialization data which is usually fitting in ~compression
             | chunk size (64k) + checksum.
             | 
             | > If you're using Cassandra and you're also not a fool,
             | it's probably because you actually need a lot of scale
             | 
             | Cassandra also gives you very literally the most control
             | over CAP tradeoffs of any database in the industry.
             | 
             | If you have 100 machines per DC in each of 10 dcs, what
             | happens when one machine is offline? one rack? one dc? 2
             | dcs separated from 8? 6 dcs separated from 4? There's no
             | single answer in cassandra (depends on replication factor,
             | consistency of writes, consistency of reads, all of which
             | are tunable, with 2 of those being tunable PER QUERY), the
             | CAP tradeoffs are yours and yours alone. That flexibility
             | is powerful for power users (it's also confusing for
             | novices, which is unfortunate).
             | 
             | But to your first point, yes, the point is scale. The lack
             | of opinions and deliberate functionality are designed to
             | enable it to scale to thousands of hosts, potentially
             | petabytes of data, trivially accessible in a single SQL-
             | like CQL query, with realistic read latency < 1ms mean/avg
             | and < 5ms p99 for a tuned workload where you know what
             | you're doing. A lot of users will never need a database
             | that can do a million reads per second across a thousand
             | machines reaching p50 1ms on 2 petabytes of data, but
             | Cassandra can do that, and you don't have to build a whole
             | sharding layer on top of mysql/postgres/redis or even
             | install Scylla to get there.
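The tunable trade-off described above comes down to simple arithmetic over the replication factor and the number of replicas that must acknowledge each write and read. A minimal sketch, with hypothetical helper names and no driver or cluster involved:

```python
def quorum(replication_factor):
    """Smallest majority of replicas."""
    return replication_factor // 2 + 1


def reads_see_latest_write(replication_factor, write_acks, read_acks):
    """True when every read set must overlap every write set (W + R > RF)."""
    return write_acks + read_acks > replication_factor


def tolerated_replica_failures(replication_factor, required_acks):
    """How many replicas can be down while the operation still succeeds."""
    return replication_factor - required_acks


rf = 3
q = quorum(rf)

# Quorum writes plus quorum reads overlap; one replica may be down for either.
strong = reads_see_latest_write(rf, q, q)
quorum_failures = tolerated_replica_failures(rf, q)

# Single-replica writes and reads stay available longer but may return stale data.
weak = reads_see_latest_write(rf, 1, 1)
one_failures = tolerated_replica_failures(rf, 1)
```

Because the acknowledgement counts can differ per query, the same table can serve both a strict read and a fast, possibly stale one.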
        
       | mixedCase wrote:
       | Threadly reminder that Scylla is licensed under the Affero GPL
       | and depending on your needs may not qualify as a competitor to
       | Apache Cassandra.
        
       | staticassertion wrote:
       | https://www.scylladb.com/2021/08/24/apache-cassandra-4-0-vs-...
       | 
       | I suspect this talk is a follow up to this blog post, which may
       | be preferable for many.
        
       | saberience wrote:
       | Can someone summarise these in a more digestible format than a
       | Youtube video?
        
         | wolf550e wrote:
         | This is the summary: https://youtu.be/hrA-Exd_qI4?t=3221
        
         | staticassertion wrote:
         | https://www.scylladb.com/2021/08/24/apache-cassandra-4-0-vs-...
         | 
         | I suspect this talk is based on this post.
        
         | spullara wrote:
         | ScyllaDB is faster than Cassandra by a lot.
        
           | willvarfar wrote:
           | Haven't watched the video either, so asking perhaps a naive
           | question:
           | 
           | Why is it faster? Is it algorithmic, or some neat trick, or
           | just a much more efficient implementation?
           | 
           | And are they closely equivalent? Would you use one or the
           | other for the same thing, or do they make different CAP
           | promises?
        
             | jandrewrogers wrote:
             | The use of C++ enables a couple architectural optimizations
             | that would be difficult or impractical in Java, aside from
             | C++ generally producing significantly faster runtimes. The
             | important difference is that C++ is a systems language --
             | you could replace C++ with Rust, Zig, etc -- whereas Java
             | is not.
             | 
             | Scylla has a highly optimized low-level I/O path that
             | largely bypasses high-level OS APIs and kernel services
             | that Cassandra and other open source databases tend to use.
             | This will typically generate an integer factor improvement
             | in I/O performance if implemented well and makes a big
             | difference for the kinds of write-heavy workloads Cassandra
             | was built for. It requires taking strict control of low-
             | level memory access and behavior, which (for better and
             | worse) is the default case in C++. This code is
             | intrinsically non-portable.
             | 
             | Additionally, there are some important classes of
             | throughput optimization that are incompatible with garbage
             | collection. In principle you can abuse Java to effect these
             | optimizations but it is _much_ easier to implement these
             | optimizations in languages that don't have a garbage
             | collector. If absolute performance is the objective, like
             | Scylla, it is easier to do the implementation in a language
             | that won't fight your intent every step of the way.
             | 
             | tl;dr: The performance isn't so much that it is written in
             | C++ but that C++ makes critical optimizations relatively
             | straightforward and economic to implement.
        
               | gunnarmorling wrote:
               | > there are some important classes of throughput
               | optimization that are incompatible with garbage
               | collection
               | 
               | Can you share some more details about these
               | optimizations? I.e. what they are and why GC tends to go
               | against them?
        
               | _benedict wrote:
               | In part it's also simply a demonstration of different
               | priorities. Scylla's USP is performance, so a lot of
               | elbow grease is spent there. The Apache Cassandra
               | community is focused primarily on operating at scale, as
               | that's its USP.
               | 
               | Performance is adequate for Cassandra, so the community
               | has (for several years) primarily focused elsewhere. It
               | will be a priority again in future, but in the meantime
               | with many huge scale users out there the community has
               | focused on guaranteeing correctness and stability at
               | scale. For example, the Harry[1] toolkit for validating
               | huge databases, and an adversarial cluster simulator[2]
               | for exposing distributed and other complex bugs. Also a
               | huge amount of behind-the-scenes work that isn't so easy
               | to call out.
               | 
               | The community is now focusing on expanding the utility of
               | the database for these use cases. For example the
               | recently proposed enhancement to bring state-of-the-art
               | general purpose transactions[3] to Apache Cassandra.
               | 
               | [1] https://github.com/apache/cassandra-harry
               | 
               | [2] https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-10...
               | 
               | [3] https://cwiki.apache.org/confluence/download/attachments/188...
               | 
               | [edit] disclaimer: I'm an Apache Cassandra contributor
               | involved with some of the above work.
        
               | PeterCorless wrote:
               | The only clarification I want to make here is that
               | Cassandra is focused on _horizontal_ scalability, which
               | Scylla can match. However, Cassandra isn't (now, or yet)
               | focused on vertical scalability, such as using it in
               | current instances that have dozens of vCPUs. This was
               | shown in the "4 vs. 40" test of Scylla vs. Cassandra,
               | where 4 boxes of a vertically scaled Scylla (72 vCPU
               | each; 288 vCPUs total) were able to perform the same as or
               | better than 40 boxes of Cassandra (with 16 vCPU; 640 vCPU
               | total).
               | 
               | Definitely newer JVMs are improving things such as
               | latency, and there are now some that are NUMA-aware, but
               | in a JVM you are literally straitjacketed from seeing
               | the raw hardware you are running on. And that will impact
               | to greater or lesser degrees your ability to take
               | advantage of it.
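The vCPU totals quoted for the "4 vs. 40" test can be checked directly. The node counts and per-node vCPU figures come from the comment; the ratio is derived from them and says nothing about cost or throughput on its own.

```python
# Figures as quoted in the comment above.
scylla_nodes, scylla_vcpus_per_node = 4, 72
cassandra_nodes, cassandra_vcpus_per_node = 40, 16

scylla_vcpus = scylla_nodes * scylla_vcpus_per_node
cassandra_vcpus = cassandra_nodes * cassandra_vcpus_per_node

# How many times more vCPUs the Cassandra cluster used in that test.
vcpu_ratio = cassandra_vcpus / scylla_vcpus

print(scylla_vcpus, cassandra_vcpus, round(vcpu_ratio, 2))
```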
        
             | kreetx wrote:
             | It appears to be a C++ rewrite of Cassandra (written in
             | Java).
        
               | jeffbee wrote:
               | Similar vibe to redpanda vs. kafka: take a reasonable
               | idea and reimplement it so it isn't terrible.
               | https://vectorized.io/blog/fast-and-safe/
        
               | dieters wrote:
               | Yup, and redpanda happens to use Scylla's Seastar
               | framework as well
        
             | hiyer wrote:
             | > Why is it faster? Is it algorithmic, or some neat trick,
             | or just a much more efficient implementation?
             | 
             | Scylla is written in C++ (versus Java for Cassandra) and
             | uses the high-performance Seastar[0] framework.
             | 
             | > And are they closely equivalent? Would you use one or the
             | other for the same thing, or do they make different CAP
             | promises?
             | 
             | Scylla claims to be a drop-in replacement for Cassandra.
             | 
             | 0. http://seastar.io/
        
               | nivertech wrote:
               | BTW, there is a word play here:
               | 
               | seastar -> sea star -> C* -> Cassandra ;)
        
       | jakearmitage wrote:
       | Seastar Framework seems to be the star behind all this. Anyone
       | using that for other purposes?
        
         | jjirsa wrote:
         | Vectorized/RedPanda for Kafka compatible service, at least.
         | 
         | RedPanda seems to be a better starting point to be honest - a
         | lot of the market share for Cassandra doesn't actually need the
         | extra tps of Scylla. Some customers do, but many many many do
         | not.
        
           | enedil wrote:
           | Scylla is not only about throughput. One of the values of
           | Scylla is its reduced latencies or ease of maintenance.
           | Overview of reasons:
           | https://www.scylladb.com/scylla-vs-cassandra
           | 
           | Disclosure: I'm an engineer at ScyllaDB.
        
             | jjirsa wrote:
             | The "reduced latencies" really reduces down to performance,
             | where tps was a first order proxy for performance.
             | 
             | The ease of maintenance, similarly, is sold as easier due
             | to reduced node count, which is perhaps an extension of
             | performance but probably misunderstands (or ignores) that
             | most people running large cassandra clusters have tooling
             | that parallelizes most maintenance anyway, so the reduction
             | in effort is sorta not that important in real life (if
             | anything, having more machines gives you better blast
             | radius behavior, consolidation onto fewer exposes you to
             | larger percentages of loss/failure when there's inevitably
             | a problem with the fewer, larger machines).
             | 
             | The real comparison, though, is missing in that link,
             | because the real comparison is not performance. It's
             | license. Nobody is running AGPL in prod unless they have
             | zero IP worth protecting, so it's ultimately comparing OSS
             | to proprietary.
             | 
             | (And similar disclosure: cassandra committer)
        
         | PeterCorless wrote:
         | As mentioned, Redpanda, which is a Seastar-based Kafka.
         | 
         | Also, Red Hat's Ceph's replacement, "Crimson"
         | 
         | https://docs.ceph.com/en/latest/dev/crimson/crimson/
        
       ___________________________________________________________________
       (page generated 2021-09-27 23:01 UTC)