[HN Gopher] How discord stores billions of messages (2017)
       ___________________________________________________________________
        
       How discord stores billions of messages (2017)
        
       Author : greymalik
       Score  : 170 points
       Date   : 2022-08-26 12:33 UTC (10 hours ago)
        
 (HTM) web link (discord.com)
 (TXT) w3m dump (discord.com)
        
       | coldblues wrote:
       | Unfortunately, they're not doing a good job at deleting them. If
       | you press the "Delete Account" button, all it does is anonymize
       | your profile and leave all of your messages intact. One of the
       | reasons I avoid using Discord whenever possible.
        
         | hombre_fatal wrote:
         | Why is HN acceptable then?
        
           | byyll wrote:
           | HN is a public forum, not behind a pay or auth wall.
        
             | vxNsr wrote:
             | Just treat it the same way, assume everything you will say
             | on discord will become public. Problem solved.
        
               | unlog wrote:
               | Sorry, but you have the wrong idea. That would be fine
               | for public servers, but not for Private Conversations.
               | It's unacceptable.
               | 
               | edit:typo
        
               | [deleted]
        
               | hombre_fatal wrote:
               | I can't think of a good reason why the presence or
               | effectiveness of a delete button should change what you
               | write online, even in a so-called "private conversation",
               | or whether you use a certain platform.
               | 
               | Delete is useful, but it shouldn't change what you're
               | willing to send online because delete doesn't time travel
               | and unsend the message.
        
               | unlog wrote:
               | I do not consider it "online" when it's a PM. A private
               | message should be private. There are reasons to get rid
               | of the whole thing sometimes.
        
               | pixelat3d wrote:
               | If you are typing anything you desire to stay hidden
               | forever into a completely unencrypted and openly data
               | mined and brokered (read your privacy policy) cloud-based
               | solution then I have a bridge to sell you...
        
               | vel0city wrote:
               | Private conversations aren't really very private on
               | Discord. Attachment URLs are globally visible.
        
               | roflyear wrote:
               | Agreed. People like to make excuses for the poor
               | functionality of HN. We should be able to delete our
               | comments.
        
               | [deleted]
        
             | RockRobotRock wrote:
             | You need to log in to your account, so how is that not an
             | auth wall?
        
               | maccard wrote:
               | You don't need to be logged in to view other people's
               | comments though
        
               | RockRobotRock wrote:
               | that's true
        
       | jpalomaki wrote:
       | This is quite valuable advice: "The original version of Discord
       | was built in just under two months in early 2015. Arguably, one
       | of the best databases for iterating quickly is MongoDB.
       | Everything on Discord was stored in a single MongoDB replica set
       | and this was intentional, but we also planned everything for easy
       | migration to a new database"
       | 
       | Also, the article links to a Twitter blog post that makes a similar
       | point (it's from 2010): "We [Twitter] currently use MySQL to store most
       | of our online data. In the beginning, the data was in one small
       | database instance which in turn became one large database
       | instance and eventually many large database clusters" [1]
       | 
       | [1]
       | https://blog.twitter.com/engineering/en_us/a/2010/announcing...
        
         | avodonosov wrote:
         | The first version of reddit also used a single PostgreSQL
         | instance apparently: https://github.com/reddit-
         | archive/reddit1.0/blob/bb4fbdb5871...
        
           | andrewxdiamond wrote:
           | Reddit is perhaps not the best company to look at for
           | operational excellence. They seem to be barely keeping the
           | ship aloft
        
             | ChuckNorris89 wrote:
             | Modern Reddit for me runs like ass and I have a 2021 4.2GHz
             | 8 core Ryzen CPU. I can't imagine what it's like on older
             | systems where even my work Intel 9th gen was braking the
             | site.
        
               | MH15 wrote:
               | It's incredible how slow their website is. I understand
               | (but am against) their motivation to push everyone to the
               | app on mobile devices, but on desktop where am I going to
               | go? My powerful computers both struggle to render the
               | site.
        
               | andrewxdiamond wrote:
               | old.reddit.com is still alive! RES still works as well.
               | 
               | You can live in a happy little bubble ignoring their
               | current escapades
        
               | ChuckNorris89 wrote:
               | _I don't understand your issue. The website runs fine on
               | our M1 Pro MacBooks we use for development so the problem
               | must be on your end._ /s
        
               | mhh__ wrote:
               | Runs like an old dog there too
        
           | MonkeyMalarky wrote:
           | Seems like sound advice? Starting with DB clusters would be
           | premature optimization, but designing your schema with the
           | expectation that it will be on multiple instances later is
           | sane. Kind of like building a monolith with the expectation
           | that certain components will eventually be moved out into
           | their own services.
        
       | judge2020 wrote:
       | (2017)
       | 
       | As revealed in this blog post[0], they now see 4 billion messages
       | per day.
       | 
       | 0: https://news.ycombinator.com/item?id=32474093
        
         | tmikaeld wrote:
         | And switched to ScyllaDB.
        
         | samlambert wrote:
         | Around ~11k a second. Not small but not insane scale either.
        
           | judge2020 wrote:
           | 4,000,000,000/(24 * 60 * 60) ~= 46,000
           | 
           | And their mps definitely scales based on time-of-day,
           | illustrated by this blog image[0] from this blog[1] (given
           | their CPUs are only really handling messages).
           | 
           | 0: https://assets-global.website-
           | files.com/5f9072399b2640f14d6a...
           | 
           | 1: https://discord.com/blog/why-discord-is-switching-from-go-
           | to...
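
The corrected arithmetic above can be checked directly. This is a minimal sketch, assuming the 4 billion messages per day figure quoted in the thread and a perfectly even spread over the day. As the comment notes, real traffic varies by time of day, so peak rates will be higher than this average. The function and variable names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rate_per_second(messages_per_day: int) -> float:
    """Average messages per second, assuming traffic is spread evenly over the day."""
    return messages_per_day / SECONDS_PER_DAY


daily_messages = 4_000_000_000  # figure quoted in the thread
rate = average_rate_per_second(daily_messages)
print(f"{rate:,.0f} messages/second on average")  # 46,296 messages/second on average
```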
        
           | chrsig wrote:
           | as someone that's worked with a similar workload (edit: in
           | terms of write volume, not persistent chat messages), it's an
           | incredibly annoying volume.
           | 
           | Too high for simple solutions, not big enough for large scale
           | solutions.
        
       | raffraffraff wrote:
       | I haven't used Cassandra for about 3 years, and this is awakening
       | memories. At a previous company I inherited a very badly
       | assembled cluster that was used for a time series database. The
       | guys who built it said (and no, I'm not kidding...) "we don't
       | need to put a TTL on metrics because they're tiny and anyway we
       | can just add more nodes and scale the cluster horizontally
       | forever!". Well, forever was about 2 years, when the physical
       | data center ran out of rack space, and teams abused the metrics
       | system with TBs of bullshit data. That was when they handed the
       | whole metrics system to yours truly. And I discovered two things:
       | 
       | 1. You can't just bulk delete a year of old stale data without
       | breaking it
       | 
       | 2. "Woops, did we really set replication factor to 1?"
       | 
       | Fun.
        
         | vorpalhex wrote:
         | What was the issue with the bulk deletes? Locking the cpu?
        
       | chrsig wrote:
       | > Nothing was surprising, we got exactly what we expected.
       | 
       | Such a satisfying feeling in the engineering world.
       | 
       | > We noticed Cassandra was running 10 second "stop-the-world" GC
       | constantly but we had no idea why.
       | 
       | This makes me very thankful for the work that the Go team has put
       | into the go GC.
       | 
       | > In the scenario that a user edits a message at the same time as
       | another user deletes the same message, we ended up with a row
       | that was missing all the data except the primary key and the text
       | since all Cassandra writes are upserts.
       | 
       | Does Cassandra not offer a mechanism to do a conditional update?
       | I'd expect to be able to submit an upsert that fails if the row
       | isn't present, or has a `deleted = true` field, or something to
       | that effect.
        
         | ryanworl wrote:
         | "Conditional update" requires consensus to be fault tolerant,
         | and regular Cassandra operations don't use consensus. Cassandra
         | has a feature called lightweight transactions which can sort
         | of get you there, and they use Paxos to do it. It unfortunately
         | has a history of bugs that invalidate the guarantees you'd want
         | from this feature, but more modern versions of Cassandra are
         | improving in this regard.
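
The edit/delete race and the conditional write discussed above can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyTable` is a hypothetical single-process, in-memory stand-in, not Cassandra's API, and its `update_if_exists` merely mimics the effect of a conditional write (in CQL, roughly `UPDATE ... IF EXISTS`). In a real cluster that check needs consensus such as Paxos, which is the costly part.

```python
from typing import Dict

Row = Dict[str, object]


class ToyTable:
    """In-memory stand-in for a table whose writes are column-level upserts."""

    def __init__(self) -> None:
        self.rows: Dict[int, Row] = {}

    def upsert(self, key: int, **columns: object) -> None:
        # An upsert never checks whether the row exists: it creates it if needed.
        self.rows.setdefault(key, {"id": key}).update(columns)

    def delete(self, key: int) -> None:
        self.rows.pop(key, None)

    def update_if_exists(self, key: int, **columns: object) -> bool:
        # Conditional write: applied only when the row is present.
        # Trivially safe here because everything runs in one process;
        # a distributed store needs consensus to make this check reliable.
        if key not in self.rows:
            return False
        self.rows[key].update(columns)
        return True


table = ToyTable()
table.upsert(1, author="alice", channel=42, content="hello")

# Race: the delete lands first, then the edit arrives as a plain upsert.
table.delete(1)
table.upsert(1, content="hello (edited)")
print(table.rows[1])  # {'id': 1, 'content': 'hello (edited)'}

# Same race with a conditional write: the edit is rejected instead.
table.delete(1)
applied = table.update_if_exists(1, content="hello (edited)")
print(applied, table.rows.get(1))  # False None
```

The first outcome matches the corrupted row described in the quote: only the primary key and the edited text survive. The second shows why a conditional write avoids it, at the price of coordination.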
        
           | PeterCorless wrote:
           | Yes. ScyllaDB also has LWTs based on Paxos. More here:
           | https://www.scylladb.com/2020/07/15/getting-the-most-out-
           | of-...
           | 
           | [Disclosure: I work for ScyllaDB]
        
         | cmrdporcupine wrote:
         | > This makes me very thankful for the work that the Go team has
         | put into the go GC.
         | 
         | If you read that part of the piece again, the problem with the
         | GC wasn't necessarily with Java's GC implementation but with
         | Cassandra's storage architecture combined with a mass delete,
         | which tombstoned millions of records and then Cassandra had to
         | scan them and pass them over: _"Cassandra had to effectively
         | scan millions of message tombstones (generating garbage faster
         | than the JVM could collect it)"_
         | 
         | Even if you had an incremental GC, you'd be eating CPU and
         | thrashing the cache like crazy doing that. And you may never
         | catch up.
         | 
         | Actually even if you didn't have a GC language, you'd likely
         | still have scan or free operations.
         | 
         | Writing databases is hard. And actually the various JVM's GCs
         | are top-notch and have had hundreds of thousands of
         | engineering hours put into perfecting them for various
         | workloads.
         | 
         | Still, these days I'm glad not to be near it and I work in Rust
         | or C++.
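
The point about tombstones being independent of the garbage collector can be sketched with a toy model. This is a simplified, hypothetical illustration and not how Cassandra stores data: `ToyPartition` keeps deletes as markers, so a read has to step over every marker until compaction removes them. That scanning work exists in any language.

```python
TOMBSTONE = object()  # marker written in place of a deleted value


class ToyPartition:
    """Toy stand-in for one partition: deletes are stored as markers, not removed."""

    def __init__(self):
        self.cells = {}  # key -> value or TOMBSTONE

    def write(self, key, value):
        self.cells[key] = value

    def delete(self, key):
        # A delete does not remove anything: it writes a tombstone.
        self.cells[key] = TOMBSTONE

    def read_latest(self, limit):
        """Return up to `limit` live values, highest key first, plus cells scanned."""
        live, scanned = [], 0
        for key in sorted(self.cells, reverse=True):
            scanned += 1
            if self.cells[key] is not TOMBSTONE:
                live.append(self.cells[key])
                if len(live) == limit:
                    break
        return live, scanned

    def compact(self):
        # Compaction is what finally drops tombstones
        # (real systems wait for a grace period first).
        self.cells = {k: v for k, v in self.cells.items() if v is not TOMBSTONE}


partition = ToyPartition()
for message_id in range(100_000):
    partition.write(message_id, f"msg {message_id}")
for message_id in range(1, 100_000):
    partition.delete(message_id)  # mass delete: all but one message

print(partition.read_latest(limit=1))  # (['msg 0'], 100000)
partition.compact()
print(partition.read_latest(limit=1))  # (['msg 0'], 1)
```

Before compaction, fetching a single live message touches every cell in the partition. Afterwards it touches one.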
        
           | chrsig wrote:
           | I'm not discounting that the GC had an overwhelming amount of
           | work to do.
           | 
           | My comment wasn't intended to convey that the go gc is
           | somehow more efficient, but rather that I'm thankful for the
           | tradeoffs they've made prioritizing latency and
           | responsiveness, and minimizing the STW phase. I can see how I
           | didn't give enough context for it to be interpreted
           | correctly.
           | 
           | It's been my experience that with larger STW pauses, you wind
           | up with other effects -- to the outside observer it's
           | impossible to tell if the service is even available or not.
           | You could argue that if you're thrashing memory that hard,
           | no, it's not available. In general though, it makes for
           | higher variance in latency distributions.
           | 
           | > Writing databases is hard. And actually the various JVM's
           | GCs are top-notch and have had hundreds of thousands of
           | engineering hours put into perfecting them for various
           | workloads.
           | 
           | no doubt on both counts. again, thankful for the design
           | tradeoffs.
           | 
           | > Still, these days I'm glad not to be near it and I work in
           | Rust or C++.
           | 
           | For anything that really needs to be very consistent, or have
           | full control over execution, I don't blame you. Cassandra is a
           | prime example of what happens when you don't have that full
           | control.
        
             | LAC-Tech wrote:
             | I feel like GP is talking about GC at the database level,
             | not the language level.
             | 
             | I.e., something like RocksDB - written in C++ - still has a
             | compaction phase.
        
               | cmrdporcupine wrote:
               | I think they're actually talking about both at once.
               | 
               | Cassandra definitely has compaction (similar LSM storage
               | as RocksDB), and it causes pauses.
               | 
               | It's been over 10 years since I touched Cassandra, but
               | when I did there was often a royal dumpster fire when
               | compaction & JVM GC ganged up to make a big mess.
        
         | pkulak wrote:
         | > This makes me very thankful for the work that the Go team has
         | put into the go GC.
         | 
         | Has the Go team hit on some GC algorithm that has escaped the
         | notice of the entire compiler-creating world since 1995? Not to
         | disparage Go at all, it's got brilliant people working on it,
         | but you could boil the oceans at this point with the effort
         | from the brilliant folks who have been working on JVM garbage
         | collection for the last 20+ years.
        
           | Scaevolus wrote:
           | The Go team has members with decades of experience writing
           | garbage collection algorithms, including on the JVM, so it's
           | not starting from scratch.
           | 
           | Go has value types and stack allocation that allow it to
           | create _far_ less garbage than Java and means its heap
           | objects violate the generational hypothesis, because
           | short-lived objects tend to stay on the stack. This means that
           | the Go GC can comfortably operate with far less GC throughput
           | than typical Java programs.
           | 
           | https://go.dev/blog/ismmkeynote
        
       | oreally wrote:
       | I don't know databases and there's quite a number of them on the
       | market, so posts like these are great.
       | 
       | Sidetrack though, does anyone have a list for pros and cons of
       | each db, with a preference towards low latency? Also how does it
       | compare with say Postgres?
        
         | Transfinity wrote:
         | That depends heavily on the shape of your data, what your
         | workload looks like, what sort of consistency guarantees you
         | need, etc. I recommend Designing Data Intensive Applications
         | for getting a handle on this - it's the book I suggest to SDE
         | IIs who are hungry for the jump to senior. Not a quick read,
         | but well written, and there's not really a shortcut to deep
         | understanding other than deep study.
        
           | oreally wrote:
           | thanks, will look that up
        
           | dominotw wrote:
           | > I suggest to SDE IIs who are hungry for the jump to senior
           | 
           | Any suggestions for seniors who are hungry for a staff title?
           | Is there any value to specializing in tech anymore after
           | senior or is it all 'soft skills' at this point ?
        
             | asdfasgasdgasdg wrote:
             | It is not all any one thing. If you're completely lacking
             | in soft skills, you'll probably have more trouble the
             | higher up you go, and you may get stuck at some point. Hard
             | technical skills become less important the higher you go,
             | but even the CEO of a tech company probably still needs to
             | have some basic understanding of technology. Being good at
             | both things is obviously the best way forward.
        
             | dharmaturtle wrote:
             | I enjoyed this: https://staffeng.com/book
             | 
             | Not a staff though so can't attest to how
             | true/accurate/effective it is.
        
         | PeterCorless wrote:
         | I did a talk about establishing criteria for differences
         | between distributed SQL and NoSQL databases.
         | 
         | https://www.youtube.com/watch?v=nJvnq8ZD_7Q
         | 
         | There are a couple of different dimensions to consider. I
         | didn't focus on this in the talk but firstly you should
         | consider these elements...
         | 
         | * Speed (latency & throughputs)
         | 
         | * Scale (gigabytes? terabytes? petabytes? exabytes?)
         | 
         | * Workload types (OLAP, OLTP, HTAP, data
         | warehouses/lakes/lakehouses)
         | 
         | * Consistency Models ("ACID vs. BASE", eventual vs. strong,
         | etc.)
         | 
         | * Data Model - RDBMS, NoSQL (Key-Value, Wide Column, Document,
         | Graph, Column Store, etc.)
         | 
         | * Query Language (SQL, streaming SQL [Flink], CQL [Cassandra,
         | ScyllaDB, DataStax, CosmosDB, Keyspaces, Yugabyte, etc.],
         | Gremlin/Tinkerpop [JanusGraph, DSE Graph, OrientDB, etc.], and
         | so on)
         | 
         | What I did focus on was more the "distributed system"
         | attributes...
         | 
         | * Clustering & distribution (local clustering, cross-
         | clustering, multidatacenter clustering)
         | 
         | * Node roles, high availability & failover strategies
         | (primary/replica, peer-to-peer leaderless/active-active, load
         | balancing)
         | 
         | * Data replication/sharding strategies (replication factors,
         | consistency levels, manual vs. auto-sharding, topology awareness
         | [rack and datacenter awareness])
         | 
         | I only focused on a handful. There are literally hundreds of
         | databases tracked on DB-engines:
         | 
         | https://db-engines.com/en/ranking
         | 
         | [Disclosure, I work for ScyllaDB]
        
           | jeffbee wrote:
           | None of Bigtable, Spanner, or DynamoDB made your shortlist?
        
             | PeterCorless wrote:
             | For the sake of that talk I focused only on open source you
             | could self-host.
             | 
             | SQL: PostgreSQL, CockroachDB
             | 
             | NoSQL: MongoDB, Redis, ScyllaDB [Cassandra]
             | 
             | There are a lot of points in the DBaaS world to _also_
             | consider, but there are also issues of visibility -- can
             | these issues be known, perceived or understood from a
             | user's POV? Are they documented? Or are some of these
             | considerations obscured and trade secreted? So I had to
             | pare down.
             | 
             | I planned to do a whole _different_ talk on DBaaS systems,
             | and yes, I'd likely shortlist Bigtable, Spanner, DynamoDB,
             | and a few others for that.
             | 
             | Only so much to do in an hour! :)
        
       | hestefisk wrote:
       | Wonder if Postgres would scale to such volume.
        
         | aeyes wrote:
         | Maybe it's possible with Citus.
         | 
         | But I wouldn't try, it's not the right tool for the job.
        
       | paxys wrote:
       | As a point of comparison, Slack uses MySQL (Vitess) -
       | https://slack.engineering/scaling-datastores-at-slack-with-v...
        
       | unlog wrote:
       | Discord doesn't respect privacy, you cannot just get rid of a
       | whole conversation. Users are the product, and they make it so
       | difficult to delete entire convos that it's so obvious it's just
       | valuable to them.
        
         | ridgered4 wrote:
         | I had a bigger issue with requiring a phone number than the
         | lack of ephemeral posting. The latter is something I'm used to
         | at least.
        
         | corndoge wrote:
         | Can say the same thing about IRC and logging...anything online
         | is permanent
        
           | unlog wrote:
           | For public servers I understand, for private messages
           | obviously not.
           | 
           | edit: better wording
        
         | roflyear wrote:
         | You can't on HN either.
        
           | unlog wrote:
           | on HN the convos aren't private.
        
             | magicalhippo wrote:
             | Discord is a closed-source service hosted by a 3rd party on
             | the internet. It's by definition not private either.
        
               | richbell wrote:
               | Hacker News doesn't have "private messages", Discord
               | does.
        
       | seydor wrote:
       | how many are bots? AFAIK bot traffic is higher than human there.
       | Plus they added so many bot APIs that now bots are hacking and
       | spamming users' accounts
        
         | tcfhgj wrote:
         | Bot traffic and human traffic aren't mutually exclusive, bots
         | can be humans:
         | 
         | https://t2bot.io/discord/
        
           | PeterCorless wrote:
           | Elsewhere in social media and gaming many accounts may have
           | both manual and automated activities associated with them.
           | 
           | For example, in a WoW chat, that might be your friend typing
           | "lol," or it could be an add-on putting out an automated raid
           | warning.
           | 
           | Similarly, a post from your Twitter account could be manual, or
           | a scheduled one from Tweetdeck.
           | 
           | Are you now a "bot account?"
           | 
           | Discord does permit bots in its system. But it tries to limit
           | "self-bots" on the platform. It wants humans to be humans and
           | bots to be bots. "Self-bots" in Discord could result in
           | termination. Read more here:
           | 
           | https://support.discord.com/hc/en-
           | us/articles/115002192352-A....
        
         | [deleted]
        
       | Jamie9912 wrote:
       | I believe they use ScyllaDB exclusively now for storing messages
        
       ___________________________________________________________________
       (page generated 2022-08-26 23:01 UTC)