[HN Gopher] Benchmark results: Cassandra 4.0, 3.11, Scylla 4.4 [...
___________________________________________________________________
Benchmark results: Cassandra 4.0, 3.11, Scylla 4.4 [video]
Author : PeterCorless
Score : 62 points
Date : 2021-09-27 15:57 UTC (7 hours ago)
(HTM) web link (www.youtube.com)
| kreetx wrote:
| To someone not versed in NoSQL: what are Cassandra and Scylla
| like to use? Would you use it instead of say, Mongo, when you
| have much more data?
| dagnabbit wrote:
| Column stores are particularly good for handling large
| analytics workloads.
|
| As the name suggests, the defining feature of a column store is
| that instead of storing each row of data sequentially, they
| store the _column_ sequentially, so it's very very fast to grab
| a column of data, and quite slow to get a row (the opposite of
| a typical RDBMS).
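A toy Python sketch of the two layouts described above. It illustrates column-oriented storage in general, not Cassandra's or Scylla's actual on-disk format. The table and its fields are hypothetical.

```python
# Hypothetical three-row table, used only to illustrate the two layouts.
rows = [
    {"user_id": 1, "country": "DE", "spend": 120},
    {"user_id": 2, "country": "US", "spend": 80},
    {"user_id": 3, "country": "DE", "spend": 45},
]

# Row-oriented: each record's fields are stored next to each other.
row_store = [(r["user_id"], r["country"], r["spend"]) for r in rows]

# Column-oriented: each column's values are stored next to each other.
column_store = {name: [r[name] for r in rows] for name in rows[0]}

# Aggregating one column reads a single contiguous list...
total_spend = sum(column_store["spend"])

# ...while rebuilding one full record has to visit every column.
second_record = {name: values[1] for name, values in column_store.items()}

print(total_spend)    # 245
print(second_record)  # {'user_id': 2, 'country': 'US', 'spend': 80}
```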
| dreyfan wrote:
| They're for teams that still refuse a proper RDBMS but desire
| an Apache-level of infrastructure complexity.
| VirusNewbie wrote:
| You remind me of some ex coworkers who thought everything
| could be done on RDBMS systems and then when they had to
| export large amounts of data topped out at 3500 tps.
|
| Meanwhile I'm over here using Scylla and 3 nodes can give me
| roughly 3.5M TPS for data extraction.
|
| RDBMS are great for certain use cases, but they _do not scale_,
| and if you think they do, you don't really know what scale
| means.
|
| Sure, you can cite things like GitHub using MySQL, but I'll
| also point out they've invested thousands upon thousands of
| man-hours customizing layers on top of MySQL to support their
| sharding, which also, oops, had a bug that gave them almost
| half a day of downtime at one point. This is not a good use
| of someone's time; GH definitely made the wrong decision.
| Don't steer other people into making similar mistakes,
| especially given that you seem to lack experience
| designing systems at scale.
| [deleted]
| zinclozenge wrote:
| I love this comment, but presumably there's a need for
| databases that are well suited for write-heavy workloads.
| Also, there's no great story for starting out with an RDBMS
| and then scaling out. Right now, if you're on MySQL, you have
| some options to scale out with read replicas, and then
| eventually replicate to a TiDB cluster. For PostgreSQL, I guess
| there's CockroachDB, but it's kind of meh.
| skyde wrote:
| I believe Cloudflare's usage of PostgreSQL is "write-heavy",
| and it's an RDBMS that scaled out very well for them.
|
| They are using CitusDB, but it's still an RDBMS and supports
| most PostgreSQL SQL queries without the need to rewrite your
| queries or change your schema.
| doliveira wrote:
| They moved to Clickhouse recently, though:
| https://blog.cloudflare.com/http-analytics-
| for-6m-requests-p...
| [deleted]
| zinclozenge wrote:
| That's true, I forgot about Citus.
| skyde wrote:
| In fact, YouTube ran on MySQL for a long time, with
| a tiny service called "Vitess" on top of it to handle
| sharding/partitioning automatically.
|
| I don't know if they are still using Vitess or have migrated to
| Google's other relational database, "Spanner".
| jeffbee wrote:
| YouTube switched some Vitess workloads over to Procella.
| http://www.vldb.org/pvldb/vol12/p2022-chattopadhyay.pdf
|
| In general I've noticed that people outside Google assume
| that Spanner is much more popular than it really is.
| doliveira wrote:
| There are requirements other than low-volume
| transactional data and CRUD apps...
| dreyfan wrote:
| I'm guessing your usage of a RDBMS hasn't extended beyond
| an ORM library.
| doliveira wrote:
| I'm obviously being hyperbolic, not sure about you. Point
| remains that scaling RDBMSs is not exactly trivial and it
| does look like most companies eventually give up.
| dreyfan wrote:
| Eh, there are numerous options today (vitess, citus,
| clickhouse, aurora, redshift, bigquery, snowflake).
| Clickhouse outperforms Scylla by a wide margin at a
| fraction of the cost and complexity.
|
| Any of the above makes your data accessible via standard
| SQL. You can hire and utilize proficient data analysts,
| not people writing esoteric queries in whatever flavor of
| NoSQL, while actually solving business needs.
| doliveira wrote:
| Not sure what you're talking about then? It seems you're
| just talking about the syntax used to query (which I
| agree SQL is a no-brainer) and not really about the "R"
| in "RDBMS"
| [deleted]
| dreyfan wrote:
| You haven't made a point yourself, so honestly, I don't
| fucking know what we're talking about either. You came in
| with a hyperbolic comment that echoes the stance from the
| NoSQL community I've seen for the past decade. The one
| that basically says "We have zero fucking clue how to
| model our data, so we're just going to jam everything into
| a bunch of loosely connected JSON documents and spend the
| next decade adding basic SQL features from the 70s".
| doliveira wrote:
| I'm not a native speaker but I dare say the point itself
| about RDBMS was pretty clear. I don't really feel like
| continuing this argument either.
| jandrewrogers wrote:
| Cassandra, and by extension Scylla, are commonly used for
| workloads you probably wouldn't use MongoDB for at any scale.
| The primary strength is the ability to reliably scale-out the
| landing of live data. It supports writing more complex data
| models than logging systems but is slower and more limited for
| querying complex data models than e.g. an RDBMS.
|
| Cassandra is commonly used as a sink for complex service
| telemetry e.g. mobile data from the carrier perspective.
| jjirsa wrote:
| SQL-like query pattern on top of bigtable-like distribution.
|
| e.g. you're willing to model your reads such that you don't need
| joins, and your sorting/scanning is based on the idea that you
| cluster data together, sorted in order you'll read it, into
| something called a "partition", which is how data is found
| within very large clusters.
| orthecreedence wrote:
| I didn't understand Cassandra/Scylla until I realized that it's
| basically a k/v store that lets you append data to existing
| keys. If your storage needs fit that domain, then I can vouch
| for Scylla being amazing to work with.
| fennecfoxen wrote:
| To put it in slightly more detail, it's a two-level data
| structure with two keys, a Dict[SortedList[Any]], to use
| Python type syntax. The top level key is also the sharding
| key.
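A minimal in-memory sketch of the Dict[SortedList[Any]] model described above: an outer dict keyed by partition key, and inside each partition a list kept sorted by clustering key, so a read can return a contiguous range of one partition. The class and method names are hypothetical; this is not how Cassandra or Scylla lay out data.

```python
import bisect
from collections import defaultdict
from typing import Any


class WideTable:
    """Toy Dict[SortedList[Any]]: partition key -> rows sorted by clustering key.

    Hypothetical in-memory sketch; not how Cassandra or Scylla lay out data.
    """

    def __init__(self) -> None:
        # Per partition, two parallel lists kept in clustering-key order.
        self._keys: dict[Any, list[Any]] = defaultdict(list)
        self._vals: dict[Any, list[Any]] = defaultdict(list)

    def insert(self, partition: Any, clustering: Any, value: Any) -> None:
        keys, vals = self._keys[partition], self._vals[partition]
        i = bisect.bisect_left(keys, clustering)
        if i < len(keys) and keys[i] == clustering:
            vals[i] = value  # same full primary key: overwrite (upsert)
        else:
            keys.insert(i, clustering)
            vals.insert(i, value)

    def slice(self, partition: Any, start: Any, end: Any) -> list[tuple[Any, Any]]:
        """Rows of one partition with start <= clustering key < end."""
        keys = self._keys.get(partition, [])
        vals = self._vals.get(partition, [])
        lo = bisect.bisect_left(keys, start)
        hi = bisect.bisect_left(keys, end)
        return list(zip(keys[lo:hi], vals[lo:hi]))


t = WideTable()
t.insert("sensor-1", "2021-09-27T10:00", 20.5)
t.insert("sensor-1", "2021-09-27T09:00", 19.0)
t.insert("sensor-1", "2021-09-27T11:00", 21.0)
t.insert("sensor-2", "2021-09-27T10:00", 5.0)
print(t.slice("sensor-1", "2021-09-27T09:30", "2021-09-27T11:00"))
# [('2021-09-27T10:00', 20.5)]
```

Writes to an existing key overwrite rather than duplicate, mirroring the upsert behaviour of CQL inserts; the ISO timestamp strings sort correctly as plain strings.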
| seabrookmx wrote:
| In a way, yes. Cassandra's clustering is more flexible. But the
| "price" you pay is that every table has a partition key. This
| requires more data modeling and thought upfront than your
| typical "JSON in, JSON out" document store like MongoDB or
| Firebase.
| fennecfoxen wrote:
| The important thing to understand from this is how it changes
| the way your application handles reads and writes. In a normal
| SQL database, you'd try hard to normalize your data, write an
| update once, do joins, and add an index for things you
| frequently read by, so the query planner can come up with
| something reasonable. When that reaches its scalability
| limit, you'll grudgingly consider various denormalization
| strategies, copying some subset of your data somewhere so
| that it's faster to query.
|
| If you're using Cassandra and you're also not a fool, it's
| _probably_ because you actually need a lot of scale, so you'll
| generally do the denormalization up front, and you will
| denormalize everything. If you're writing an event, you will
| write it in several places. For example, suppose you're Walmart
| recording sales. You might write to: store transactions by
| store/year/day/hour (the "master" record, insofar as you have
| one), user transactions by user/year/month, and product purchases
| by manufacturer/year/day/hour. When you write the
| transaction, you write to all of these locations.
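A sketch of the write-everywhere pattern from the Walmart example, using plain dicts to stand in for three query-specific tables. The table names, the `Sale` fields, and the choice of day-of-year for "day" are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Sale:
    sale_id: str
    store_id: str
    user_id: str
    manufacturer: str
    ts: datetime
    amount_cents: int


# One dict per read path; each key plays the role of a partition key.
# Table and field names are hypothetical, loosely following the comment.
sales_by_store_hour = defaultdict(list)         # (store, year, day-of-year, hour)
sales_by_user_month = defaultdict(list)         # (user, year, month)
sales_by_manufacturer_hour = defaultdict(list)  # (manufacturer, year, day-of-year, hour)


def record_sale(s: Sale) -> None:
    """Denormalize up front: write the same event once per query pattern."""
    day = s.ts.timetuple().tm_yday
    sales_by_store_hour[(s.store_id, s.ts.year, day, s.ts.hour)].append(s)
    sales_by_user_month[(s.user_id, s.ts.year, s.ts.month)].append(s)
    sales_by_manufacturer_hour[(s.manufacturer, s.ts.year, day, s.ts.hour)].append(s)


record_sale(Sale("t1", "store-7", "u42", "acme", datetime(2021, 9, 27, 15, 5), 1999))
record_sale(Sale("t2", "store-9", "u42", "globex", datetime(2021, 9, 28, 9, 30), 550))

# The "order history" page is one partition lookup, no joins.
history = sales_by_user_month[("u42", 2021, 9)]
print([s.sale_id for s in history])  # ['t1', 't2']
```

The sketch glosses over what happens when a write lands in some of these tables but not others; a real deployment has to decide how to handle that.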
|
| Each of these "by X" keys is a shard. Each can be located on
| a different set of ~3 machines (the number is configurable).
| Querying involves getting a copy of the ring topology,
| computing which integer shard-ID the key maps to, figuring
| out which machines in the ring own that integer, and then
| asking the machine for a whole bucketful of data, which
| should be a superset of what you're actually looking for. For
| something like a user's transactions you'll want to have
| basically everything there at once, so loading the "order
| history" page for the past month might be a single query that
| just returns a report: no joins at all, very fast, super
| scalable. Other lookup strategies might ask for a range
| within that bucket (the data within the bucket can be ordered
| by a single key; often this is a timestamp or time-based
| UUID). Anything that isn't a simple query of a few buckets
| like this has to be a map-reduce job and will be slow.
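A rough sketch of the lookup steps above: hash the key to an integer token, find the first ring position at or after it, then walk clockwise to collect about 3 distinct replicas. MD5 is a stand-in hash (Cassandra's default partitioner is Murmur3, so real tokens differ), and the node names and vnode count are hypothetical.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a key to a 64-bit ring position.

    Stand-in hash (first 8 bytes of MD5); Cassandra's default partitioner
    uses Murmur3, so real tokens differ.
    """
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class Ring:
    def __init__(self, nodes: list[str], vnodes: int = 8) -> None:
        # Each node claims several ring positions ("virtual nodes").
        points = sorted((token(f"{n}#{v}"), n) for n in nodes for v in range(vnodes))
        self._tokens = [t for t, _ in points]
        self._owners = [n for _, n in points]
        self._node_count = len(set(nodes))

    def replicas(self, key: str, rf: int = 3) -> list[str]:
        """Find the first ring position at or after the key's token, then walk
        clockwise collecting rf distinct nodes (SimpleStrategy-style placement)."""
        if rf > self._node_count:
            raise ValueError("replication factor exceeds node count")
        i = bisect.bisect_left(self._tokens, token(key)) % len(self._tokens)
        out: list[str] = []
        while len(out) < rf:
            owner = self._owners[i]
            if owner not in out:
                out.append(owner)
            i = (i + 1) % len(self._tokens)
        return out


ring = Ring(["node-a", "node-b", "node-c", "node-d", "node-e"])
print(ring.replicas("user:42"))  # three distinct node names
```

Because every client computes the same mapping from the same topology, any of them can go straight to the replicas that own a key without a central lookup service.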
|
| All of this is pain. You should generally not invite pain
| into your organization. However, if pain has already found
| you, something like this may be the least painful option.
| jjirsa wrote:
| > then asking the machine for a whole bucketful of data,
| which should be a superset of what you're actually looking
| for. For something like a user's transactions you'll want
| to have basically everything there at once, so loading the
| "order history" page for the past month might be a single
| query that just returns a report: no joins at all, very
| fast, super scalable.
|
| Not really a superset: Cassandra (and Scylla, and Bigtable,
| since the first two basically copy Bigtable's model) each
| try very hard not to read any extra data at all, and can
| often return approximately the exact data requested, modulo
| serialization overhead, which usually fits within about one
| compression chunk (64k) plus checksum.
|
| > If you're using Cassandra and you're also not a fool,
| it's probably because you actually need a lot of scale
|
| Cassandra also gives you very literally the most control
| over CAP tradeoffs of any database in the industry.
|
| If you have 100 machines per DC in each of 10 DCs, what
| happens when one machine is offline? One rack? One DC? 2
| DCs separated from 8? 6 DCs separated from 4? There's no
| single answer in Cassandra (it depends on replication factor,
| consistency of writes, and consistency of reads, all of which
| are tunable, with 2 of those tunable PER QUERY); the
| CAP tradeoffs are yours and yours alone. That flexibility
| is powerful for power users (it's also confusing for
| novices, which is unfortunate).
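The per-query tuning above usually comes down to simple arithmetic. This sketch uses the standard Dynamo-style overlap rule: if the replicas a read waits for (R) plus the replicas a write waited for (W) exceed the replication factor (RF), the two sets must overlap. It covers only a few single-datacenter levels, and it ignores failed or partial writes and timestamp conflicts.

```python
def quorum(rf: int) -> int:
    """Majority of the replicas for one key."""
    return rf // 2 + 1


def replicas_required(level: str, rf: int) -> int:
    # Only a few single-datacenter levels; multi-DC levels such as
    # LOCAL_QUORUM and EACH_QUORUM are left out of this sketch.
    return {"ONE": 1, "TWO": 2, "QUORUM": quorum(rf), "ALL": rf}[level]


def read_sees_latest_write(rf: int, write_level: str, read_level: str) -> bool:
    # If R + W > RF, every read replica set overlaps every write replica set.
    return replicas_required(read_level, rf) + replicas_required(write_level, rf) > rf


def replicas_that_may_be_down(level: str, rf: int) -> int:
    # How many replicas of a key can be unreachable while requests still succeed.
    return rf - replicas_required(level, rf)


for w, r in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
    print(w, r, read_sees_latest_write(3, w, r))
# ONE ONE False
# QUORUM QUORUM True
# ONE ALL True
```

With RF=3, QUORUM reads and writes each tolerate one unreachable replica while still overlapping; ONE/ALL trades write availability for a read that cannot survive any replica being down.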
|
| But to your first point, yes, the point is scale. The lack
| of opinions and the deliberately chosen functionality are meant
| to let it scale to thousands of hosts and potentially
| petabytes of data, trivially accessible in a single SQL-like
| CQL query, with realistic read latency < 1ms mean/avg
| and < 5ms p99 for a tuned workload where you know what
| you're doing. A lot of users will never need a database
| that can do a million reads per second across a thousand
| machines at p50 1ms on 2 petabytes of data, but
| Cassandra can do that, and you don't have to build a whole
| sharding layer on top of MySQL/Postgres/Redis or even
| install Scylla to get there.
| mixedCase wrote:
| Threadly reminder that Scylla is licensed under the Affero GPL
| and depending on your needs may not qualify as a competitor to
| Apache Cassandra.
| staticassertion wrote:
| https://www.scylladb.com/2021/08/24/apache-cassandra-4-0-vs-...
|
| I suspect this talk is a follow up to this blog post, which may
| be preferable for many.
| saberience wrote:
| Can someone summarise these in a more digestible format than a
| Youtube video?
| wolf550e wrote:
| This is the summary: https://youtu.be/hrA-Exd_qI4?t=3221
| staticassertion wrote:
| https://www.scylladb.com/2021/08/24/apache-cassandra-4-0-vs-...
|
| I suspect this talk is based on this post.
| spullara wrote:
| ScyllaDB is faster than Cassandra by a lot.
| willvarfar wrote:
| Haven't watched the video either, so asking perhaps a naive
| question:
|
| Why is it faster? Is it algorithmic, or some neat trick, or
| just a much more efficient implementation?
|
| And are they closely equivalent? Would you use one or the
| other for the same thing, or do they make different CAP
| promises?
| jandrewrogers wrote:
| The use of C++ enables a couple architectural optimizations
| that would be difficult or impractical in Java, aside from
| C++ generally producing significantly faster runtimes. The
| important difference is that C++ is a systems language --
| you could replace C++ with Rust, Zig, etc -- whereas Java
| is not.
|
| Scylla has a highly optimized low-level I/O path that
| largely bypasses high-level OS APIs and kernel services
| that Cassandra and other open source databases tend to use.
| This will typically generate an integer factor improvement
| in I/O performance if implemented well and makes a big
| difference for the kinds of write-heavy workloads Cassandra
| was built for. It requires taking strict control of low-
| level memory access and behavior, which (for better and
| worse) is the default case in C++. This code is
| intrinsically non-portable.
|
| Additionally, there are some important classes of
| throughput optimization that are incompatible with garbage
| collection. In principle you can abuse Java to effect these
| optimizations but it is _much_ easier to implement these
| optimizations in languages that don't have a garbage
| collector. If absolute performance is the objective, like
| Scylla, it is easier to do the implementation in a language
| that won't fight your intent every step of the way.
|
| tl;dr: The performance isn't so much that it is written in
| C++ but that C++ makes critical optimizations relatively
| straightforward and economic to implement.
| gunnarmorling wrote:
| > there are some important classes of throughput
| optimization that are incompatible with garbage
| collection
|
| Can you share some more details about these
| optimizations? I.e. what they are and why GC tends to go
| against them?
| _benedict wrote:
| In part it's also simply a demonstration of different
| priorities. Scylla's USP is performance, so a lot of
| elbow grease is spent there. The Apache Cassandra
| community is focused primarily on operating at scale, as
| that's its USP.
|
| Performance is adequate for Cassandra, so the community
| has (for several years) primarily focused elsewhere. It
| will be a priority again in future, but in the meantime
| with many huge scale users out there the community has
| focused on guaranteeing correctness and stability at
| scale. For example, the Harry[1] toolkit for validating
| huge databases, and an adversarial cluster simulator[2]
| for exposing distributed and other complex bugs. Also a
| huge amount of behind-the-scenes work that isn't so easy
| to call out.
|
| The community is now focusing on expanding the utility of
| the database for these use cases. For example the
| recently proposed enhancement to bring state-of-the-art
| general purpose transactions[3] to Apache Cassandra.
|
| [1] https://github.com/apache/cassandra-harry
|
| [2] https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-10...
|
| [3] https://cwiki.apache.org/confluence/download/attachments/188...
|
| [edit] disclaimer: I'm an Apache Cassandra contributor
| involved with some of the above work.
| PeterCorless wrote:
| The only clarification I want to make here is that
| Cassandra is focused on _horizontal_ scalability, which
| Scylla can match. However, Cassandra isn't (now, or yet)
| focused on vertical scalability, such as running it on
| current instance types that have dozens of vCPUs. This was
| shown in the "4 vs. 40" test of Scylla vs. Cassandra,
| where 4 boxes of vertically scaled Scylla (72 vCPUs
| each; 288 vCPUs total) were able to match or beat
| 40 boxes of Cassandra (16 vCPUs each; 640 vCPUs
| total).
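The totals in the "4 vs. 40" comparison, worked out so the ratio is explicit. The inputs are only the figures quoted in the comment above; nothing here measures performance.

```python
# Figures as quoted in the comment above.
scylla_nodes, scylla_vcpus_per_node = 4, 72
cassandra_nodes, cassandra_vcpus_per_node = 40, 16

scylla_total = scylla_nodes * scylla_vcpus_per_node            # 288
cassandra_total = cassandra_nodes * cassandra_vcpus_per_node   # 640
vcpu_ratio = cassandra_total / scylla_total

print(f"{scylla_total} vs {cassandra_total} vCPUs: "
      f"{vcpu_ratio:.2f}x the vCPUs and {cassandra_nodes // scylla_nodes}x the nodes")
# 288 vs 640 vCPUs: 2.22x the vCPUs and 10x the nodes
```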
|
| Newer JVMs are definitely improving things such as
| latency, and there are now some that are NUMA-aware, but
| in a JVM you are literally straitjacketed from seeing
| the raw hardware you are running on, and that will affect,
| to a greater or lesser degree, your ability to take
| advantage of it.
| kreetx wrote:
| It appears to be a C++ rewrite of Cassandra (written in
| Java).
| jeffbee wrote:
| Similar vibe to redpanda vs. kafka: take a reasonable
| idea and reimplement it so it isn't terrible.
| https://vectorized.io/blog/fast-and-safe/
| dieters wrote:
| Yup, and redpanda happens to use Scylla's Seastar
| framework as well
| hiyer wrote:
| > Why is it faster? Is it algorithmic, or some neat trick,
| or just a much more efficient implementation?
|
| Scylla is written in C++ (versus Java for Cassandra) and
| uses the high-performance Seastar[0] framework.
|
| > And are they closely equivalent? Would you use one or the
| other for the same thing, or do they make different CAP
| promises?
|
| Scylla claims to be a drop-in replacement for Cassandra.
|
| 0. http://seastar.io/
| nivertech wrote:
| BTW, there is a word play here:
|
| seastar -> sea star -> C* -> Cassandra ;)
| jakearmitage wrote:
| Seastar Framework seems to be the star behind all this. Anyone
| using that for other purposes?
| jjirsa wrote:
| Vectorized/RedPanda for Kafka compatible service, at least.
|
| RedPanda seems to be a better starting point to be honest - a
| lot of the market share for Cassandra doesn't actually need the
| extra tps of Scylla. Some customers do, but many many many do
| not.
| enedil wrote:
| Scylla is not only about throughput. Its other selling points
| include reduced latencies and ease of maintenance.
| Overview of reasons: https://www.scylladb.com/scylla-vs-cassandra
|
| Disclosure: I'm an engineer at ScyllaDB.
| jjirsa wrote:
| The "reduced latencies" really reduces down to performance,
| where tps was a first order proxy for performance.
|
| The ease of maintenance, similarly, is sold as easier due
| to reduced node count, which is perhaps an extension of
| performance but probably misunderstands (or ignores) that
| most people running large cassandra clusters have tooling
| that parallelizes most maintenance anyway, so the reduction
| in effort is sorta not that important in real life (if
| anything, having more machines gives you better blast
| radius behavior, consolidation onto fewer exposes you to
| larger percentages of loss/failure when there's inevitably
| a problem with the fewer, larger machines).
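A back-of-the-envelope sketch of the blast-radius argument, reusing the 4-node vs. 40-node cluster sizes mentioned elsewhere in the thread. It assumes equally sized nodes, evenly distributed data, and RF=3, and it ignores that failure rates and recovery times may differ between small and large machines.

```python
def share_of_capacity_lost(nodes: int) -> float:
    # Losing one of N equally sized nodes removes 1/N of cluster capacity.
    return 1 / nodes


def share_of_data_with_a_replica_down(nodes: int, rf: int = 3) -> float:
    # With data spread evenly, each node holds rf/N of all data (capped at 100%).
    return min(1.0, rf / nodes)


for n in (4, 40):
    print(f"{n} nodes: {share_of_capacity_lost(n):.1%} capacity lost, "
          f"{share_of_data_with_a_replica_down(n):.1%} of data under-replicated")
# 4 nodes: 25.0% capacity lost, 75.0% of data under-replicated
# 40 nodes: 2.5% capacity lost, 7.5% of data under-replicated
```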
|
| The real comparison, though, is missing in that link,
| because the real comparison is not performance. It's
| license. Nobody is running AGPL in prod unless they have
| zero IP worth protecting, so it's ultimately comparing OSS
| to proprietary.
|
| (And similar disclosure: cassandra committer)
| PeterCorless wrote:
| As mentioned, Redpanda, which is a Seastar-based Kafka.
|
| Also "Crimson", Red Hat's Seastar-based replacement for the Ceph OSD.
|
| https://docs.ceph.com/en/latest/dev/crimson/crimson/
___________________________________________________________________
(page generated 2021-09-27 23:01 UTC)