[HN Gopher] How Figma's databases team lived to tell the scale
       ___________________________________________________________________
        
       How Figma's databases team lived to tell the scale
        
       Author : pinser98
       Score  : 334 points
       Date   : 2024-03-14 17:51 UTC (5 hours ago)
        
 (HTM) web link (www.figma.com)
 (TXT) w3m dump (www.figma.com)
        
       | esafak wrote:
       | This is such a familiar story. Company starts with a centralized
       | database, runs into scaling problems, then spends man-years
       | sharding it. Just use a distributed database.
        
         | Closi wrote:
          | This assumes that using a distributed database from the start
          | doesn't carry its own penalties/drawbacks (particularly in
          | 2016).
          | 
          | Particularly considering that, per the Figma note, they
          | consider their data highly relational (i.e. presumably this
          | means lots of joins in queries).
         | 
         | What database would you have chosen?
        
           | esafak wrote:
           | That's a fair response for then, but not today where you're
           | spoiled for choice. Spanner was released as a service in
           | 2017, which isn't far off.
        
             | prisenco wrote:
             | Spanner is quite expensive though.
        
         | jack_riminton wrote:
         | If every startup used every scale-proof method available from
         | the beginning they'd never have the time or resources to build
         | the products to get the customers that would require them to
         | use the scale-proof methods
        
           | esafak wrote:
           | I agree with your general point but there are relational
           | distributed databases, open source and "serverless", that are
           | compatible with MySQL (planetscale, tidb, vitess...) and
           | postgres (citus, cockroachdb...)
           | 
           | edited for examples.
        
             | noja wrote:
             | One reason they don't is unhelpful comments like these:
             | alluding to products without naming them.
        
               | ofrzeta wrote:
               | From the article: "During our evaluation, we explored
               | CockroachDB, TiDB, Spanner, and Vitess". Three of them
               | are open source if I am not mistaken.
        
             | mplanchard wrote:
             | And many of these are substantially more expensive than
             | RDS, or have terrible INSERT performance, or require making
             | the correct decisions really early about what to use as
             | partition keys. There's no free lunch, sadly.
        
             | Closi wrote:
             | I don't think these were mature products when Figma started
             | to be developed in 2012.
        
         | mamcx wrote:
          | There are many kinds of apps for which "distributed" databases
          | are an antipattern...and useless for making them scale.
          | 
          | MOST apps don't need more than "tenant-per-company" and then,
          | maybe, distribution among a small set of servers.
          | 
          | It's just that the kinds of apps that show up here (and have
          | these problems) are niche apps like social networks and complex
          | chat apps.
        
         | dang wrote:
         | " _Please don 't post shallow dismissals, especially of other
         | people's work. A good critical comment teaches us something._"
         | 
         | https://news.ycombinator.com/newsguidelines.html
        
         | battwell wrote:
         | Doesn't seem a crazy way to start a company. RDS will have
         | scaling problems but is very mature and fairly easy to use
         | early on when you're working on your MVP
         | 
         | I've used CRDB early on at a startup. There was some overhead.
         | You don't get all the nice PSQL features
         | 
         | Although it did save us a giant migration later on
        
       | nosefrog wrote:
       | Coming from Google, where Spanner is this magical technology that
       | supports infinite horizontal sharding with transactions and has
       | become the standard storage engine for everything at Google
       | (almost every project not using Spanner was moving to Spanner),
       | I'm curious how Figma evaluated Cloud Spanner. Cloud Spanner does
       | have a postgres translation layer, though I don't know how well
       | it works.
       | 
       | It seems like they've (hopefully only temporarily) given up real
       | transactional support with their horizontal postgres scheme?
        
         | echelon wrote:
         | Why would _anyone_ marry themselves to Google? That sounds like
         | the most boneheaded move possible.
         | 
         | First, you should never be beholden to a single vendor for the
         | most critical technological underpinnings. You're backing
         | yourself into a corner.
         | 
         | But more importantly, Google can't even figure out how to
         | prioritize their cloud efforts. That's not a good partnership
         | to be in for anyone except Google.
         | 
         | I wouldn't care if my solution was 10x worse than Cloud Spanner
         | from a technology perspective. It'd be 1000x better from a
         | strategy perspective.
         | 
         | You can hire engineers to do consistency at scale. It's your
         | core competency and you can't just handwave and outsource that,
         | lest you wind up stuck in a crevasse. Hire smart engineers and
         | do the work yourself. It'll pay off.
        
           | Thaxll wrote:
            | You can't, and that's why Spanner only exists at Google. The
            | tech is that good; it's not found anywhere else.
        
             | echelon wrote:
             | But you don't need it. There are a limitless number of ways
             | to deal with the problems spanner solves. And by choosing
             | your own, you can focus on the subset that matter to your
             | case and not be tied down to Google.
        
         | hipadev23 wrote:
         | The problem is nobody outside Google trusts them to run or
         | operate anything.
         | 
         | Edit: To the Googlers downvoting these comments. Your behavior
         | only reinforces our views.
        
           | szundi wrote:
            | Google means: a good chance it gets discontinued right after
            | you've worked out the bugs and finally have a stable system
        
             | groestl wrote:
             | This is definitely not the case with their core cloud
             | products.
             | 
             | > almost every project not using Spanner was moving to
             | Spanner
             | 
             | This even includes Datastore. Even Datastore moved to
             | Spanner.
        
         | ksb wrote:
         | Ping time from AWS data centers to GCP ones
        
         | umvi wrote:
         | Never a good idea to rely on Google proprietary tech (unless
         | you are Google)... it could be sunset at any time without
         | warning. I use GCP but I try my best to stay Google agnostic
         | (avoid GCP-only offerings, etc) so that I can move to AWS if
         | Google pulls the rug out from under me.
        
           | rockostrich wrote:
           | GCP products have a much better track record than Google
           | consumer products when it comes to support since there are
           | usually enterprise customers with multi-year contracts worth
           | tens, if not hundreds, of millions of dollars using them.
        
             | callalex wrote:
             | That's what AppEngine customers thought.
        
               | weitendorf wrote:
               | I worked on AppEngine (well, Serverless Compute) at
               | Google for over 4 years and left on Friday. Did something
               | happen to AppEngine in the last week?
        
               | callalex wrote:
               | I'm talking about the pricing disaster that happened in
               | 2017/2018 where user prices went up from 10x-100x because
               | Google wanted to kill the product without actually
               | killing it.
        
             | jerrygenser wrote:
             | IoT is one example of a big backbone service that was
             | sunset.
        
               | endisneigh wrote:
               | It had barely any usage though from what I can tell from
               | searching about it.
               | 
               | Not that it's any solace to those affected.
        
             | marcinzm wrote:
             | AI Platform is a recent example that's been deprecated.
        
           | weitendorf wrote:
           | I'm biased having worked on GCP, but I think GCP actually has
           | a very good track record of not sunsetting entire products or
           | removing core functionality. When I worked on AppEngine, I
           | would often see apps written 10+ years ago still chugging
           | along.
           | 
           | It is true though that GCP sometimes sunsets specific product
           | functionality, requiring changes on the customers' part. Some
           | of these are unavoidable (eg committing to apply security
           | patches to Python 2.7 given that the rest of the world is
           | mostly not upstreaming these patches anymore), but not all of
           | them.
           | 
           | I would certainly use Cloud Spanner externally now that I've
           | left the company, and IMO spanner has such a compelling
           | featureset that it's a strong reason to use GCP for
           | greenfield development. The problem though is that it's only
           | available on GCP, and could become expensive at large scale.
        
             | jerrygenser wrote:
             | >It is true though that GCP sometimes sunsets specific
             | product functionality, requiring changes on the customers'
             | part. Some of these are unavoidable (eg committing to apply
             | security patches to Python 2.7 given that the rest of the
             | world is mostly not upstreaming these patches anymore), but
             | not all of them.
             | 
             | A good example is probably IoT. I've heard first hand
             | anecdotes of very difficult migrations off this service.
        
         | shalabhc wrote:
         | Global consistency is expensive, both latency-wise and cost-
         | wise. In reality most apps don't need global serializability
         | across all objects. For instance, you probably don't need
         | serializability across different tenants, organizations,
         | workspaces, etc. Spanner provides serializability across all
         | objects IIUC - so you pay for it whether you need it or not.
         | 
          | The other side of something like Spanner is that the quorum-
          | based latency is often optimized by adding another cache on
          | top, which instantly defeats the original consistency
          | guarantees. The consistency of (spanner+my_cache) is not the
          | same as the consistency of spanner. So if we're back to app-
          | level consistency guarantees anyway, it turns out the "managed"
          | solution is only partial.
         | 
         | Ideally the managed db systems would have flexible consistency,
         | allowing me to configure not just which object sets need
         | consistency but also letting me configure caches with lag
         | tolerance. This would let me choose trade-offs without having
         | to implement consistent caching and other optimization tricks
         | on top of globally consistent/serializable databases.
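          | 
          | A minimal sketch of that trade-off (purely illustrative, not
          | Spanner's or any vendor's actual API): a read-through cache
          | with a configurable lag tolerance, which turns strictly
          | consistent reads into bounded-stale reads for the system as a
          | whole.
          | 
          |     import time
          | 
          |     class BoundedStaleCache:
          |         """Reads may be up to max_lag_s seconds stale."""
          |         def __init__(self, backing_read, max_lag_s):
          |             self.backing_read = backing_read  # consistent read
          |             self.max_lag_s = max_lag_s
          |             self._cache = {}  # key -> (value, fetched_at)
          | 
          |         def get(self, key):
          |             hit = self._cache.get(key)
          |             if hit and time.time() - hit[1] <= self.max_lag_s:
          |                 return hit[0]  # stale, but within tolerance
          |             value = self.backing_read(key)  # hit the database
          |             self._cache[key] = (value, time.time())
          |             return value
          | 
          | Whatever the underlying store guarantees, (store + this cache)
          | only guarantees bounded staleness.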
        
           | eatonphil wrote:
           | See also: "Strict-serializability, but at what cost, for what
           | purpose?"
           | 
           | https://muratbuffalo.blogspot.com/2022/08/strict-
           | serializabi....
        
           | foota wrote:
           | While it doesn't help much with the costs of replication,
           | Spanner can be configured with read only replicas that don't
           | participate in voting for commits, so they don't impact the
           | quorum latency.
           | 
           | Reads can then be done with different consistency
           | requirements, e.g., bounded staleness (which guarantees data
           | less stale than the time bound requested).
           | 
           | See https://cloud.google.com/spanner/docs/reference/rest/v1/T
           | ran... or
           | https://cloud.google.com/spanner/docs/reads#read_types and
           | https://cloud.google.com/spanner/docs/create-manage-
           | configur...
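            | 
            | Roughly what a bounded-staleness read looks like with the
            | Cloud Spanner Python client (a sketch; the instance,
            | database, and table names are made up, and the linked docs
            | are authoritative for the exact API):
            | 
            |     from datetime import timedelta
            |     from google.cloud import spanner
            | 
            |     client = spanner.Client()
            |     db = client.instance("my-instance").database("my-db")
            | 
            |     # Single-use stale read: can be served by a nearby
            |     # replica, guaranteed at most 15s behind latest commit.
            |     staleness = timedelta(seconds=15)
            |     with db.snapshot(max_staleness=staleness) as snap:
            |         rows = snap.execute_sql(
            |             "SELECT id, name FROM users LIMIT 10")
            |         for row in rows:
            |             print(row)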
        
         | opportune wrote:
         | My perspective from working both inside and outside of Google:
         | 
         | The external spanner documentation doesn't seem as good as the
         | internal documentation, in my opinion. Because it's not
         | generally well known outside of G, they ought to do a better
         | job explaining it and its benefits. It truly is magical
         | technology but you have to be a database nerd to see why.
         | 
         | It's also pretty expensive and because you generally need to
         | rewrite your applications to work with it, there is a degree of
         | lockin. So taking on Spanner is a risky proposition - if your
         | prices get hiked or it starts costing more than you want,
         | you'll have to spend even more time and money migrating off it.
         | Spanner's advantages over other DBs (trying to "solve" the CAP
         | theorem) then become a curse, because it's hard to find any
         | other DB that gives you horizontal scaling, ACID, and high
         | availability out of the box, and you might have to solve those
         | problems yourself/redesign the rest of your system.
         | 
         | Personally I would consider using Cloud Spanner, but I wouldn't
         | bet my business on it.
        
         | zenbowman wrote:
         | I wouldn't say "infinite", its still susceptible to read
         | hotspotting; and while fine-grained locking enables generally
         | higher write throughputs, you can still get in a situation
         | where interconnected updates end up being pretty slow.
         | 
         | That said, its way better than anything else I've used in my
         | career.
        
       | emptysea wrote:
       | Seems they've built out a PG version of MySQL's Vitess
       | 
        | Query rewriting seems interesting; having a layer between your DB
        | and your application would also allow various ACL stuff.
        
       | goshx wrote:
       | "Sorry, you've stumbled upon a temporary technical issue. Please
       | try refreshing this page in a moment."
       | 
       | Have they? :) I am excited to read the article when it loads
        
         | nix0n wrote:
         | It worked for me on the first try, but here's an archive link
         | in case it goes down again: https://archive.is/xusR7
        
         | Lucasoato wrote:
         | This might be another case of "death by HN"
        
       | mannyv wrote:
       | Hmm, wonder why they didn't try FoundationDB.
       | 
       | Interesting that they had problems with vacuuming. I always
       | thought that part of Postgres was the worst part. I vaguely
       | remember that you needed twice the space to vacuum successfully,
       | which hopefully has changed in later versions.
       | 
       | An article about why vacuum is needed in pg (as compared to
       | mysql/innodb).
       | 
       | http://rhaas.blogspot.com/2011/02/mysql-vs-postgresql-part-2...
        
         | eatonphil wrote:
         | The article mentions they're trying very hard to stick with
         | Postgres. FoundationDB is great but doesn't even have a SQL
         | access layer, let alone a Postgres SQL access layer. :)
        
         | harikb wrote:
         | Do you know of any performant relational-db layer on top of
          | FoundationDB? It seems their use case would need at least
          | simple joins, which raw FoundationDB lacks.
        
         | yashap wrote:
         | They want pretty full SQL support, so FoundationDB would be
         | out. I'd think "buy" options for them would be more like Citus,
         | Yugabyte, Cockroach or Spanner.
        
       | dakiol wrote:
       | I cannot help but think this all sounds pretty much like a hack
       | (a clever one, though).
       | 
        | We do not handle, let's say, low-level I/O buffering/caching
        | ourselves anymore, right? (at least the folks doing web
        | development/SaaS). We rely instead on the OS APIs, and that's
       | good. I think we are missing something similar but for db
       | sharding. It seems to me that we are still missing some
       | fundamental piece of technology/infrastructure to handle
       | horizontal data sharding.
        
         | simonw wrote:
          | Citus and Vitess are examples of horizontal data sharding
         | technology for PostgreSQL and MySQL respectively.
         | 
         | At Figma's size there are sometimes reasons to roll your own,
         | which I think they explain pretty clearly in the article. They
         | wanted a solution they could incrementally engineer onto their
         | existing stack without doing a full rewrite or lift-and-shift
         | to something else.
        
         | yard2010 wrote:
         | This is Hacker News after all
        
       | bjornsing wrote:
       | One thought that comes up: Wouldn't it be easier to have each
       | customer in their own (logical) database? I mean, you don't need
       | transactions across different customers, right? So you're
       | essentially solving a harder problem than the one you've got.
       | 
        | Not sure postgres (logical) databases would scale that well, but
        | I don't see a principled reason why they couldn't. Has anyone
        | explored this further?
        
         | eatonphil wrote:
         | I imagine it only gets you so far. What do you do about
         | customers like Walmart or Oracle? Hundreds, if not thousands,
         | of users all leaving hundreds of comments on Figma files every
         | day. If you want good latency without giving up strong
         | consistency (which the article says they want) you'll need to
         | keep sharding.
        
           | willsmith72 wrote:
           | A single db can handle that load easily
        
           | jeremyjh wrote:
           | They were running their whole business on one RDS instance 4
           | years ago. Do you think they now have one customer larger
           | than all their customers combined 4 years ago?
        
             | eatonphil wrote:
             | > Figma's database stack has grown almost 100x since 2020
             | 
              | The first sentence of the article seems to suggest it's
              | possible?
        
           | infecto wrote:
           | I bet it gets you further than you imagine. Entirely depends
           | on the backend services and what they touch but in this
           | scenario you would be deploying/scaling that service based on
            | the customer seat size. I suspect that even for large
            | enterprise customers, the users actively touching Figma are
            | not reaching the thousands, but I am happy to be wrong.
           | 
           | After all, Stackoverflow is running off of a handful of
           | machines.
        
         | jakewins wrote:
         | The problem - conceptually - is made much simpler this way; we
         | make use of this at work.
         | 
          | However you will still have shared-resource problems - a rogue
          | query that destroys IOPS in one tenant ends up bringing down
          | all tenants, etc. There are in theory databases that solve
         | this as well, but my experience has been that at that point
         | what you buy into is a bad version of resource sharing - ie
         | what an operating system does - and you're better off using OS
         | mechanisms
         | 
         | In other words: yes, but you still have noisy neighbours, and
         | may be better off running lots of small fully separated DBMSes
         | than a big logically separated one
        
           | hacker_newz wrote:
           | If tenants are on separate databases how would that be an
           | issue?
        
             | jakewins wrote:
              | Separate _logical_ databases, within the same RDBMS, so
              | sharing CPU, disks, RAM, etc.
        
             | sgarland wrote:
             | Because database, in Postgres terms, doesn't mean physical
             | node. It's more akin to a VM than anything else. The term
             | for an installation of Postgres is database cluster, which
             | can contain N databases, each of which can contain M
             | schemas.
        
         | aeyes wrote:
         | I have pondered about this for quite some time and came to the
         | conclusion that it would make schema migrations more difficult
         | to handle. I think Shopify is using an approach which is
         | similar to what you are describing. The advantage is that you
         | don't end up with hot shards because you can move around large
         | customers independently.
         | 
         | In practice there isn't a big difference, they just colocate
         | several customers according to their sharding key in the same
         | logical database.
        
           | jamesfinlayson wrote:
           | I remember Postgres table spaces being used to separate
           | customers at a previous job - I can't remember how migrations
           | were handled (pretty sure they were applied per table space)
           | but I don't think it was a problem (at our scale anyway).
        
           | yard2010 wrote:
           | I worked in a place that had a database for each tenant and
           | the schema migrations were a real pain. Every time everything
           | goes smoothly except these few databases that have an edge
           | case that screws the whole migration.
        
         | N_A_T_E wrote:
         | It sounds like they actually did something like this. Their
         | shard key selection could be customer, project, folder or
         | something in their data model at a reasonably high logical
         | level in their hierarchy.
        
         | infra_dev wrote:
         | Nile is a Serverless Postgres that virtualizes
         | tenants/customers. It is specifically built for SaaS companies
         | similar to Figma https://www.thenile.dev/. I am the CEO of
         | Nile.
        
         | GordonS wrote:
         | I've done something like this before, with each customer
         | getting their own _schema_ within a single Postgres instance.
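          | 
          | A minimal sketch of that pattern (hypothetical table and DSN
          | names, using psycopg2): one schema per customer, with
          | search_path doing the per-request routing.
          | 
          |     import psycopg2
          |     from psycopg2 import sql
          | 
          |     conn = psycopg2.connect("dbname=app")
          | 
          |     def provision_tenant(tenant):
          |         """Create an isolated schema for a new customer."""
          |         schema = sql.Identifier(f"tenant_{tenant}")
          |         with conn, conn.cursor() as cur:
          |             cur.execute(sql.SQL(
          |                 "CREATE SCHEMA IF NOT EXISTS {}").format(schema))
          |             cur.execute(sql.SQL(
          |                 "CREATE TABLE IF NOT EXISTS {}.files ("
          |                 "id bigint PRIMARY KEY, doc jsonb)"
          |             ).format(schema))
          | 
          |     def tenant_cursor(tenant):
          |         """Unqualified names resolve in the tenant's schema."""
          |         cur = conn.cursor()
          |         cur.execute(sql.SQL("SET search_path TO {}").format(
          |             sql.Identifier(f"tenant_{tenant}")))
          |         return cur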
        
         | jerbear4328 wrote:
         | Actually, Apple does this for iCloud! They use FoundationDB[1]
         | to store billions of databases, one for each user (plus shared
         | or global databases).
         | 
         | See: https://read.engineerscodex.com/p/how-apple-built-icloud-
         | to-...
         | 
         | Discussed on HN at the time:
         | https://news.ycombinator.com/item?id=39028672
         | 
         | [1]: https://github.com/apple/foundationdb
         | https://en.wikipedia.org/wiki/FoundationDB
        
           | danpalmer wrote:
           | > store billions of databases
           | 
           | This is sort of true and sort of false. When you think of a
           | "database", if you're thinking of a Postgres database, you're
           | way off the reality of what "database" means here.
           | 
           | FoundationDB has a concept called "layers", and essentially
           | they have created a layer that looks like a separate database
           | on top of a layer that is separately encrypted groups of
           | keys. They don't have billions of FoundationDB clusters or
           | machines, and at the infra level, i.e. the instances of the
           | FoundationDB server software, it's unaware of individual
           | "databases".
           | 
           | A closer analogy would be like having billions of tables, but
           | even that isn't accurate because in relational databases a
           | table is usually a notably more static concept than data in a
           | table. The closest analogy would be that each of the billions
           | of users has a bunch of rows with a user-id field on them,
           | and there's a proxy that filters everything such that you can
           | view the table as if it only had one user's data in it.
           | 
           | To be clear, FoundationDB is awesome and Apple have done some
           | really cool stuff with it, but it's less crazy/impressive
           | than it sounds.
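            | 
            | A toy version of the idea (not Apple's actual code): one
            | shared ordered key space, with a thin "layer" that scopes
            | every key to a tenant so each user appears to have their
            | own database.
            | 
            |     class TenantLayer:
            |         def __init__(self, store, tenant_id):
            |             self.store = store  # stands in for the cluster
            |             self.prefix = f"t/{tenant_id}/"
            | 
            |         def set(self, key, value):
            |             self.store[self.prefix + key] = value
            | 
            |         def get(self, key):
            |             return self.store.get(self.prefix + key)
            | 
            |     shared = {}  # the one real "cluster"
            |     alice = TenantLayer(shared, "alice")
            |     alice.set("photos/1", b"...")
            |     # Looks like Alice's own database, but it's just
            |     # prefixed keys living in shared storage.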
        
             | dexwiz wrote:
             | This sounds like a pretty standard multitenant datastore.
             | Everything has a user/group Id on it, and a logical layer
             | that locks a connection to a specific group.
        
               | danpalmer wrote:
               | Yeah, the advantage or difference here is that these
               | "layers" are a common design pattern with FoundationDB,
               | several ship in FDB by default, and you're encouraged to
               | make more, so the database certainly has better support
               | than just adding a column for TenantID, but still you're
               | right that it's not too out there.
        
         | dhash wrote:
         | I believe physalia [0] explores this concept at production
         | scale quite well.
         | 
         | [0] https://blog.acolyer.org/2020/03/04/millions-of-tiny-
         | databas...
        
         | paxys wrote:
         | This works great until (1) your largest customer outgrows the
         | largest available DB (happens sooner than you'd think for large
         | companies) or (2) you _do_ need transactions across different
         | customers, say to facilitate some kind of sharing. Going all-in
          | on the isolated tenant strategy means that when you hit one of
          | these cases it's a nightmare to unwind and rearchitect your
          | entire DB layer.
        
           | winrid wrote:
           | A figma customer won't exceed the requirements of an
           | i3.metal...
        
           | jddj wrote:
           | Can you give an example of when a single customer has
           | outgrown the largest available DB?
        
           | jasonwatkinspdx wrote:
           | I'd respond by saying (1) is more rare than you're asserting.
           | 
           | There is a huge long tail of companies with datasets that
           | won't fit on a single machine but can be handled by a dozen
           | or so, and where no single customer or service is near the
           | limits of an individual node.
           | 
           | These customers are poorly served at the moment. Most of the
           | easy to implement SaaS options for what they need would be
           | hugely costly vs a small fleet of db servers they administer.
           | Meanwhile, most of the open source options are cargo culting
           | Google or Facebook style architecture, which is a huge burden
           | for a small company. I mean do you really want to run K8S
           | when you have 10 servers in total?
           | 
           | I think there's a lot of interesting stuff happening in this
           | end of the market that's not trying to be a mini Google, like
           | Fly.io.
           | 
           | As for (2), I think a middle ground is supporting cross shard
            | transactions but not joins. This works well enough for VoltDB
            | et al.
        
           | zarzavat wrote:
           | Figma is more or less a desktop application that happens to
           | run in a web browser.
           | 
           | If I use Photoshop to edit a .psd file I don't think "man
           | that psd file should really be stored in a single planet-
           | sized database of all psd files in existence". It's just a
           | file on my computer.
           | 
           | Figma requires a little bit more intermingling of data than
           | Photoshop, it has multiuser support for one, so a pure local
           | storage based approach wouldn't work. But, at its heart it's
           | still based on the document model. When you open a Figma
           | document it's its own isolated little universe, the
           | connections with resources outside the document are limited,
           | and that matches user expectations.
        
         | ummonk wrote:
         | This seems to be an architecture Cloudflare is aiming to
         | support with their SQLite service. One database per customer,
         | each database located in the customer's primary region.
        
           | jasonwatkinspdx wrote:
           | I think there's quite a few people chasing similar ideas,
           | like Azure's Durable Entities.
           | 
           | I've been calling it the Lots of Little Databases model vs
           | the Globe Spanning Gorilla.
           | 
           | Like the Spanner paper points out, even if your distributed
           | database semantically appears like a single giant instance,
           | in practice performance means developers avoid using
           | distributed joins, etc, because these can lead to shuffling
           | very large amounts of intermediate results across the
            | network. So the reality ends up leaking through the illusion
            | of being on a single giant machine, and people end up
           | writing workarounds for distributed joins like async
           | materialization.
           | 
           | If we give up the single machine illusion we get a lot of
           | simplification, at the cost of features devs were unlikely to
           | use anyhow. I see having consistent distributed commit but
           | without cross shard joins as a really interesting
           | alternative.
           | 
           | And besides scalability I like the extra security rope of
           | fine grained partitioning from the start.
           | 
           | I'll write a blog post along these lines if I get anything
           | worthwhile done.
        
         | cavisne wrote:
          | Spanner's interleaved tables seem like a similar solution, i.e.
         | you interleave customer data so it all ends up on the same set
         | of hosts for performance, while still having the ability to
         | create transactions across customers.
        
         | tharkun__ wrote:
         | Yes, we've been doing that at my place basically since the
         | start. Each tenant is a schema in postgres. Works perfectly
         | fine on the one hand, i.e. your tables don't grow to 'infinity'
         | just because you're adding more and more tenants. If there's a
         | particular tenant that has lots of data, only that tenant's
         | indexes and tables grow huge and become slower because of that
         | particular reason etc. If a tenant leaves, you keep the schema
         | around for some time, so they can come back and then at some
         | point you just drop the schema!
         | 
         | It does mean having to upgrade each schema individually, which
         | also makes it both easier and harder. Easier because the tables
         | are smaller, so any schema changes that require things like say
         | a table lock, are locking for a smaller amount of time and
         | won't affect more than the one tenant at a given time. It also
         | means that you can get into an inconsistent state of course,
         | where some of your tenants have all the latest DB upgrades,
         | while it failed on another subset.
         | 
         | At some point Postgres's internal tables become a bit of a
         | "problem", as you want to run as many of these updates in
         | parallel as you can for speed, which could lead to contention
         | on Postgres' administrative side. You'll also still need to
         | shard across multiple actual RDS instances, because you still
         | have many tenants running against a single physical piece of
         | hardware that will show its limitations if too many large or
         | active tenants happen to be on the same shard.
         | 
         | And then you have the problem of keeping a record of which
         | physical RDS instance (i.e. shard) the tenant is on. Your
         | application code will need to look that up (and cache that info
         | for some time ;)) and you have some choice there as well. I.e.
         | do you shard those as well and juggle load balancing as
         | basically a 1:1 mapping to shards or do you have your
         | application layer connect to _all_ database shards and handle
         | _any_ tenants? One is more complicated I would say while the
         | other could run out of connections depending on how you need to
         | scale the application layer and what kind of RDS instance you
         | have.
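          | 
          | For the record-keeping part, a minimal sketch of the lookup
          | (hypothetical shard map and directory function) with the kind
          | of caching mentioned above:
          | 
          |     import time
          | 
          |     SHARD_DSNS = {
          |         0: "host=rds-shard-0 dbname=app",
          |         1: "host=rds-shard-1 dbname=app",
          |     }
          |     _cache = {}  # tenant_id -> (shard_id, cached_at)
          |     CACHE_TTL_S = 300
          | 
          |     def shard_for(tenant_id, lookup_in_directory):
          |         """Resolve which RDS instance holds a tenant's schema."""
          |         hit = _cache.get(tenant_id)
          |         if hit and time.time() - hit[1] < CACHE_TTL_S:
          |             return SHARD_DSNS[hit[0]]
          |         shard_id = lookup_in_directory(tenant_id)
          |         _cache[tenant_id] = (shard_id, time.time())
          |         return SHARD_DSNS[shard_id]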
        
         | icedchai wrote:
         | I worked at a place that did this with MySQL. Every tiny, trial
         | account got their own database. Every huge, actual customer got
         | their own database. Migrations were kinda painful. I would
         | think carefully about doing it this way.
        
         | petervandijck wrote:
         | Ongoing product development with migrations, tweaking indexes
         | etc. becomes really hard. Every small database tweak now has to
         | be deployed over 1000s of databases.
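          | 
          | A sketch of what that deployment loop tends to look like
          | (hypothetical helper names): apply the change per tenant and
          | collect the stragglers, since a partially migrated fleet is
          | the normal failure mode.
          | 
          |     def migrate_all(tenants, apply_migration):
          |         """Run one schema change against every tenant DB."""
          |         failed = []
          |         for tenant in tenants:
          |             try:
          |                 # e.g. ALTER TABLE on that tenant's schema/db
          |                 apply_migration(tenant)
          |             except Exception as exc:
          |                 failed.append((tenant, exc))  # retry later
          |         return failed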
        
       | adastral wrote:
       | I see they don't mention Citus
       | (https://github.com/citusdata/citus), which is already a fairly
       | mature native Postgres extension. From the details given in the
       | article, it sounds like they just reimplemented it.
       | 
       | I wonder if they were unaware of it or disregarded it for a
       | reason --I currently am in a similar situation as the one
       | described in the blog, trying to shard a massive Postgres DB.
        
         | _boffin_ wrote:
         | how "massive" is massive in your case?
        
           | adastral wrote:
           | Around ten heavily-updated (50-400k updated rows/min) tables
           | ranging between 500M and 5B rows, with a couple tables over
           | 40B rows each (5TB each IIRC).
        
           | dijit wrote:
           | I've had CitusDB running across 68 bare metal machines (40
           | vCPU, 768GiB ram, 20TiB of storage each + 40GiB network
           | links) and it ran decently well.
           | 
           | Not sure what your definition of massive is, I think Spanner
           | would easily beat it.
           | 
           | Also, it's very use-case dependent, you can't "just use"
           | Citus for everything, it's not _quite_ as flexible as a bog-
            | standard pgsql install due to the way it does sharding; you
            | have to be a tad more careful with your data model.
        
             | skunkworker wrote:
             | What is your definition of "decently well", and is your
             | primary cluster (without replicas) above 1PB?
        
               | simonw wrote:
               | They said 20TiB * 68, which I think is 1.5PB.
        
         | iamdanieljohns wrote:
          | I would love to see a comparison of the major PostgreSQL
         | services such as Citus, EDB, Crunchy, Neon, and some OSS
         | distributions/packages
        
         | junto wrote:
         | Before clicking on the article I assumed it was Citus, and was
         | surprised when it wasn't.
         | 
         | Maybe because CitusData was bought by Microsoft around the same
         | time, so Microsoft could create "Azure Cosmos DB for Postgres
         | Cluster", yet another one of Microsoft's typical product naming
         | crapshoots.
        
           | victor106 wrote:
           | > yet another one of Microsoft's typical product naming
           | crapshoots.
           | 
           | Well said. I haven't seen any company as terrible as
           | Microsoft at naming things. Anyone know why?
        
             | studmuffin650 wrote:
              | AWS is putting up a good fight
        
             | salynchnew wrote:
             | Naming things is hard.
             | 
             | At a previous employer, I saw several cool-ish open source
             | projects instantly doomed to obscurity by picking a name
             | that either completely duplicated the name of an existing
             | OSS project or were guaranteed to have terrible SEO for
             | another reason.
             | 
             | However, Microsoft seems to have a unique crossover of
             | fragmented business units and centralized marketing. That's
             | why you end up with Azure -> Subproject -> Actual
              | Product/Service word soup. Previously, they did this with
             | the Windows Live brand from 2005-2012, and "Xbox" for a
             | wide range of gaming projects (many of which were on PC).
        
               | thomasjudge wrote:
               | related, Microsoft on Microsoft marketing:
               | 
               | https://www.youtube.com/watch?v=EUXnJraKM3k
        
         | sgarland wrote:
         | I thought of that as well. The only thing I could think of is
         | that they mentioned that they don't want to move off of RDS,
         | and there is 0% chance of Citus coming to AWS since Microsoft
         | bought them.
        
         | jabart wrote:
         | Figma uses AWS RDS, RDS doesn't list citus as a supported
         | extension.
        
           | gen220 wrote:
           | This is my guess of why they didn't use Citus. They weren't
           | interested in the options of (1) going multi-cloud [DB in
           | Azure Cosmos / Backend(s) in AWS] (2) going all-in on Azure
           | [DB in Azure Cosmos / Backend(s) in Azure] (3) self-managing
           | Postgres+Citus in EC2.
           | 
           | It'd be interesting to compare the expected capex of
           | developing this in-house solution + the opex of maintaining
           | it vs the same categories of expected costs for option (3) -
           | because I imagine that's probably the most palatable option.
           | 
           | They also may have pre-paid for dedicated RDS instances for
           | the next X years ( _before_ this horizontal scaling
           | initiative began, to boot), as AWS allows companies to do
           | this at a pretty steep discount rate, which would probably
           | tilt them away from (3).
        
             | sgarland wrote:
             | Especially because Option 3 lets you go waaaay farther on
             | vertical scaling, since you can get native NVMe drives
             | (they mentioned hitting IOPS limits for RDS), more exotic
             | instance classes with far more RAM, and do stuff like ZFS
             | for native compression and snapshots.
        
         | gshulegaard wrote:
         | I have worked on teams that have both sharded and partitioned
         | PostgreSQL ourselves (somewhat like Figma) (Postgres 9.4-ish
         | time frame) as well as those that have utilized Citus. I am a
         | strong proponent of Citus and point colleagues in that
         | direction frequently, but depending on how long ago Figma was
         | considering this path I will say that there were some very
         | interesting limitations to Citus not that long ago.
         | 
         | For example, it was only 2 years ago that Citus allowed the
         | joining of data in "local" tables and data retrieved from
         | distributed tables (https://www.citusdata.com/updates/v11-0).
         | In this major update as well, Citus enabled _any_ node to
         | handle queries, previously all queries (whether or not it was
         | modifying data) had to go through the "coordinator" node in
         | your cluster. This could turn into a pretty significant
         | bottleneck which had ramifications for your cluster
         | administration and choices made about how to shape your data
         | (what goes into local tables, reference tables, or distributed
         | tables).
         | 
         | Again, huge fan of Citus, but it's not a magic bullet that
         | makes it so you no longer have to think about scale when using
         | Postgres. It makes it _much_ easier and adds some killer
         | features that push complexity down the stack such that it is
          | _almost_ completely abstracted from application logic. But you
          | still have to be cognizant of it, sometimes even altering your
          | data model to accommodate.
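          | 
          | For anyone who hasn't seen it, the Citus ergonomics look
          | roughly like this (table names are made up; the functions are
          | Citus's create_distributed_table / create_reference_table):
          | 
          |     import psycopg2
          | 
          |     conn = psycopg2.connect("dbname=app")  # coordinator node
          |     with conn, conn.cursor() as cur:
          |         # Shard the big table across workers by tenant key.
          |         cur.execute(
          |             "SELECT create_distributed_table('files', 'org_id')")
          |         # Replicate small lookup tables to every node so joins
          |         # against them stay local.
          |         cur.execute("SELECT create_reference_table('plans')")
          | 
          | Which tables become distributed vs. reference, and on which
          | key, is exactly the data-model shaping mentioned above.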
        
           | gen220 wrote:
           | You also benefit from the tailwind of the CitusData team
           | making continued improvement to the extension, whereas an in-
           | house system depends on your company's ability to hire and
           | retain people to maintain + improve the in-house system.
           | 
           | It's hard to account for the value of benefits that have yet
           | to accrue, but this kind of analysis, even if you pretty
           | heavily-discount that future value, tilts the ROI in favor of
           | solutions like Citus, IMO. Especially if your time horizon is
           | 5+ or 10+ years out.
           | 
           | Like you said, if they made this decision 3ish years ago, you
           | would have had to be pretty trusting on that future value. A
           | choice, made today, hinges less on that variable.
        
             | wayne-li2 wrote:
             | Huh, I would have thought the opposite. Companies at Figma
             | size are easily able to hire talent to maintain a core part
             | of their engineering stack. On the other hand, they retain
             | no control of Citus decision making. Those tailwinds could
             | easily have been headwinds if they went in a direction that
             | did not suit Figma.
        
         | gregors wrote:
         | Where's the fun in that? I'm not being snarky either. Maybe
         | it's not the best decision business-wise, but I guarantee it
         | was more challenging and more fun. There's something to be said
         | for that.
        
       | TkTech wrote:
       | Am I the only one finding the layout of this blog distracting?
       | Kind of disappointing from a UX company. The images are also
       | massive, the page was 42.21mb!
       | 
       | Good article none the less! Always appreciate when companies like
       | Figma document technical challenges.
        
         | renegade-otter wrote:
          | It seems like someone at Figma decided to use the latest CSS
          | tricks they had just discovered. A changing background color?
         | Come on.
        
           | Cwizard wrote:
           | I thought this was a bug at first. Does anyone actually want
           | this?
        
           | counterfaccola wrote:
           | Just give us white! That purple and yellow-green combo is
           | quite distracting!
        
         | zelphirkalt wrote:
         | Maybe someone designed this castle in Figma.
        
         | ghostly_s wrote:
         | Odd, it looks perfectly bog-standard for me on Firefox and iOS,
         | aside from the lavender bg which I quite like. I even disabled
         | my ad-blocker to see if it made a difference, no changing bg
         | colors for me...
        
       | vinner_roy wrote:
       | Could you use Aurora Limitless for this instead?
       | https://aws.amazon.com/about-aws/whats-new/2023/11/amazon-au...
        
         | bearjaws wrote:
         | I doubt even VC money can afford this service.
         | 
          | Serverless Aurora is incredibly expensive for most workloads. I
          | have yet to find a use case for it in any SaaS product that is
          | used >4 hours a day, and since all my products span at least 3
          | time zones there are at least 12 hours of activity a day.
        
           | brcmthrowaway wrote:
           | What products
        
           | Scubabear68 wrote:
           | We found this out the hard way in a small startup. The per
           | query and I/O expense was through the roof.
        
             | yard2010 wrote:
             | Did it work though? Did you achieve unlimited scaling?
             | Because if so you should compare the price to the price of
             | a team of great minds such as in this article, working for
             | 2 years to get a solution.
             | 
             | I bet it still would be cheaper to pay people over Amazon,
             | but I'm curious about the numbers
        
         | paulddraper wrote:
         | "Limitless" refers to the bill, not just the scale.
        
         | MapleWalnut wrote:
         | Yes, but that wasn't available when they did this migration
        
         | sgarland wrote:
         | At my company, we were given an early talk on Limitless. Never
         | once did the reps mention that it ran on Serverless. Dear lord,
         | that's going to be a hard no from me. Unless they've
         | dramatically changed pricing for Limitless as opposed to normal
         | Serverless, that'll be through the roof.
        
         | vinner_roy wrote:
         | Haha alright I get the picture. Too expensive.
        
       | cynicalsecurity wrote:
        | The article would have benefitted from less of the annoying
        | corporate talk.
        
       | ijustlovemath wrote:
       | > NoSQL databases are another common scalable-by-default solution
       | that companies adopt as they grow. However, we have a very
       | complex relational data model built on top of our current
       | Postgres architecture and NoSQL APIs don't offer this kind of
       | versatility.
       | 
       | As I understand it, NoSQL is for people who need a backend that
       | ingests just about any unstructured data, for teams that may not
       | have a complex relational model worked out/stabilized. Postgres
       | has this in its native jsonb datatype, but they wouldn't need to
       | use that much since it sounds like they already have a good data
       | model. What am I missing here?
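        | 
        | For reference, this is the jsonb escape hatch I mean - a sketch
        | with a made-up table, mixing a schemaless blob with ordinary
        | relational columns:
        | 
        |     import psycopg2
        | 
        |     conn = psycopg2.connect("dbname=app")
        |     with conn, conn.cursor() as cur:
        |         cur.execute(
        |             "CREATE TABLE IF NOT EXISTS events ("
        |             "id bigserial PRIMARY KEY, payload jsonb)")
        |         cur.execute(
        |             "INSERT INTO events (payload) VALUES (%s)",
        |             ['{"kind": "comment", "file": 42}'])
        |         # Query into the unstructured part with ->>
        |         cur.execute(
        |             "SELECT id FROM events WHERE payload->>'kind' = %s",
        |             ["comment"])
        |         print(cur.fetchall())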
        
         | opportune wrote:
         | What is your exact question? To me it makes sense that you'd
         | not want to use NoSQL if you're dealing with data that's
         | already relational, and heavily leveraging features common in
         | relational DBs that may not come out of the box with NoSQL DBs.
         | 
         | They're saying basically that NoSQL DBs solve a lot of
         | horizontal scaling problems but aren't a good fit for their
         | highly relational data, is my understanding. Not that they
         | can't get NoSQL functionality at eg the query level in
         | relational DBs.
        
         | ffsm8 wrote:
          | I think they're equating relational databases with databases
          | with ACID guarantees, as that's basically a full overlap on a
          | Venn diagram.
          | 
          | And we all know that ACID has to go at _some_ scale, even if
         | that scale keeps getting pushed further out as our hardware
         | gets better.
         | 
         | (Same with the relational guarantees that eat performance...
         | But only once you've reached a certain amount of throughput and
         | data)
        
         | suhastech wrote:
         | Using NoSQL might not be the best idea in this case. I've seen
         | it backfire for many companies. They start with NoSQL, but then
         | end up needing relational features as their business grows.
         | This leads to performance issues, redundancy, and data sync
         | problems early on, which shouldn't be happening.
         | 
         | Especially in the early days, NoSQL companies used to market
          | their databases as general-purpose databases that scale easily,
          | but that obviously hasn't always been the case.
         | 
         | I usually recommend starting with a relational database like
         | PostgreSQL. If scaling becomes necessary later on, you can
         | invest in sharding the database. Figma's approach seems
         | reasonable given the tools available at the time.
         | 
         | I've helped small companies switch from NoSQL to SQL because
         | the benefits of NoSQL weren't worth the trade-offs at their
         | stage of growth. In case, anyone is in a similar boat:
         | https://mongotosqlmigration.com/
        
       | jeffbee wrote:
       | Given the list of authors and acknowledgees, what I'd really like
       | to read is the differences between this solution and Dropbox's.
        
         | sp1nozick wrote:
         | horizontal sharding is such a foundational skillset in working
         | with databases - it's essential to database architecture
        
       | HermitX wrote:
        | This is an intriguing article, clearly showing the team's
       | fondness for Postgres. However, Postgres is an OLTP product. I'm
       | curious about what system Figma's data team uses for their data
       | analysis tasks.
        
         | jerrygenser wrote:
         | They mentioned analyzing query logs in Snowflake in this very
         | article. So... at least they use Snowflake for some things?
        
       | cynicalsecurity wrote:
       | They came up with a really over-engineered, over-complicated
       | attempt at splitting data into multiple databases. I'm not sure
       | this was the best idea or even a good idea, but then I also don't
        | understand how they got into a situation where they had only a
        | few months before their current database overflowed and the whole
        | system collapsed.
        
         | dwaltrip wrote:
         | Can you offer insight into what a better approach might have
         | been?
        
           | TheP1000 wrote:
           | As others have mentioned, moving to per tenant databases can
           | really simplify things at scale and doesn't leave a massive
           | amount of engineering complexity and debt in its wake.
           | 
           | I feel sorry for the team managing this 5 years from now.
        
             | cynicalsecurity wrote:
             | Exactly this. But at this point, I don't even want to give
             | them advice, I don't really like their service. I like
             | Lunacy more.
        
             | simonw wrote:
             | Moving to a per-tenant database sounds like even more work
             | to me than moving to shards. Moving to per-tenant means
             | rewriting _everything_ - moving to shards has you rewriting
             | a lot less.
        
               | infra_dev wrote:
                | This is possible to do, but it's a lot of engineering.
                | You can provide the experience of a single DB while each
                | tenant is placed in their own dedicated Postgres compute.
                | This lets the application stay the same while
               | tenants are moved to independent computes (you can even
               | move only a few tenants and leave the rest on a shared
               | Postgres compute).
        
       | hintymad wrote:
       | Given that sharding has become a pretty mature practice, is it
       | still worth considering the NewSQL solutions like CRDB, Yugabyte,
       | and TiDB for the sake of auto sharding, given that these NewSQL
       | databases usually trade throughput and latency for auto-sharding
       | and multi-region support? Another added cost is learning how to
       | operate NewSQL databases, assuming one is already familiar with
       | either MySQL or Postgres.
        
         | emmanueloga_ wrote:
         | In the OP they said they built their own sharding solution
         | because it was risky to move to a different (newsql) solution
         | and they already had sharding expertise with PG.
         | 
         | I think if starting from scratch it makes sense to look at
         | these newsql DBs that are built to horizontally scale from the
         | very beginning.
        
       | Thaxll wrote:
        | How do transactions work when you end up querying different
        | shards?
        
         | blauditore wrote:
         | IIUC they never have db transactions across shards, because
         | customers are basically data silos, which makes sharding quite
         | straightforward.
        
         | simonw wrote:
         | I imagine that's one of the reasons they have "colos" - you can
          | aim to colocate tables that are likely to be part of the same
         | transaction in the same shard by putting them in the same colo
         | group.
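          | 
          | A minimal sketch of how that routing might work (hypothetical
          | DSNs; hash-based placement is an assumption, the article just
          | describes colos): every table in a colo shares the shard key,
          | so one key always maps to one physical database and a plain
          | single-node transaction can span those tables.
          | 
          |     import hashlib
          | 
          |     SHARDS = ["dsn-shard-0", "dsn-shard-1", "dsn-shard-2"]
          | 
          |     def shard_for(shard_key):
          |         digest = hashlib.sha1(shard_key.encode()).hexdigest()
          |         return SHARDS[int(digest, 16) % len(SHARDS)]
          | 
          |     # 'files' and 'comments' live in the same colo keyed by
          |     # org_id, so both sides of this transaction hit one node:
          |     dsn = shard_for("org_123")
          |     # BEGIN; UPDATE files ...; INSERT INTO comments ...; COMMIT;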
        
       | gfodor wrote:
       | Honestly it's depressing that I remember doing this same thing
       | more than a decade ago and we still have to spend developer
       | cycles on this nonsense in 2024.
        
       | tbarbugli wrote:
       | Citus seems incredibly close to what they built, I wonder why
       | they did not use it
        
       | AtlasBarfed wrote:
       | 1 postgres
       | 
       | 2 postgres replicas+sharding
       | 
       | 3 Cassandra / dynamo
       | 
       | If you are at stage 2, you are using bandaids. You need to start
       | architecting a transition of your most heavily used data views to
        | truly scalable databases like Cassandra / DynamoDB. I have not
        | used FoundationDB, but aphyr liked it.
       | 
        | That transition takes time, evaluation, prototyping, and
        | organizational learning at the application-programmer, support-
        | engineer, and management levels.
        
       ___________________________________________________________________
       (page generated 2024-03-14 23:00 UTC)