[HN Gopher] How Figma's databases team lived to tell the scale
___________________________________________________________________
How Figma's databases team lived to tell the scale
Author : pinser98
Score : 334 points
Date : 2024-03-14 17:51 UTC (5 hours ago)
(HTM) web link (www.figma.com)
(TXT) w3m dump (www.figma.com)
| esafak wrote:
| This is such a familiar story. Company starts with a centralized
| database, runs into scaling problems, then spends man-years
| sharding it. Just use a distributed database.
| Closi wrote:
| This assumes that using a distributed database from the start
| doesn't offer its own penalties/drawbacks (particularly in
| 2016).
|
| Particularly considering that, per the Figma note, they consider
| their data highly relational (i.e. presumably this means lots
| of joins in queries).
|
| What database would you have chosen?
| esafak wrote:
| That's a fair response for then, but not today where you're
| spoiled for choice. Spanner was released as a service in
| 2017, which isn't far off.
| prisenco wrote:
| Spanner is quite expensive though.
| jack_riminton wrote:
| If every startup used every scale-proof method available from
| the beginning they'd never have the time or resources to build
| the products to get the customers that would require them to
| use the scale-proof methods
| esafak wrote:
| I agree with your general point but there are relational
| distributed databases, open source and "serverless", that are
| compatible with MySQL (planetscale, tidb, vitess...) and
| postgres (citus, cockroachdb...)
|
| edited for examples.
| noja wrote:
| One reason they don't is unhelpful comments like these:
| alluding to products without naming them.
| ofrzeta wrote:
| From the article: "During our evaluation, we explored
| CockroachDB, TiDB, Spanner, and Vitess". Three of them
| are open source if I am not mistaken.
| mplanchard wrote:
| And many of these are substantially more expensive than
| RDS, or have terrible INSERT performance, or require making
| the correct decisions really early about what to use as
| partition keys. There's no free lunch, sadly.
| Closi wrote:
| I don't think these were mature products when Figma started
| to be developed in 2012.
| mamcx wrote:
| There are many kinds of apps for which using "distributed"
| databases is an antipattern...and useless for making them scale.
|
| MOST apps don't need more than "tenant-per-company" and then,
| maybe, distribution among a small set of servers.
|
| It's just that the kinds of apps that show up here (and have
| these problems) are niche apps like social networks and complex
| chat apps.
| dang wrote:
| " _Please don 't post shallow dismissals, especially of other
| people's work. A good critical comment teaches us something._"
|
| https://news.ycombinator.com/newsguidelines.html
| battwell wrote:
| Doesn't seem a crazy way to start a company. RDS will have
| scaling problems but is very mature and fairly easy to use
| early on when you're working on your MVP
|
| I've used CRDB early on at a startup. There was some overhead.
| You don't get all the nice PSQL features
|
| Although it did save us a giant migration later on
| nosefrog wrote:
| Coming from Google, where Spanner is this magical technology that
| supports infinite horizontal sharding with transactions and has
| become the standard storage engine for everything at Google
| (almost every project not using Spanner was moving to Spanner),
| I'm curious how Figma evaluated Cloud Spanner. Cloud Spanner does
| have a postgres translation layer, though I don't know how well
| it works.
|
| It seems like they've (hopefully only temporarily) given up real
| transactional support with their horizontal postgres scheme?
| echelon wrote:
| Why would _anyone_ marry themselves to Google? That sounds like
| the most boneheaded move possible.
|
| First, you should never be beholden to a single vendor for the
| most critical technological underpinnings. You're backing
| yourself into a corner.
|
| But more importantly, Google can't even figure out how to
| prioritize their cloud efforts. That's not a good partnership
| to be in for anyone except Google.
|
| I wouldn't care if my solution was 10x worse than Cloud Spanner
| from a technology perspective. It'd be 1000x better from a
| strategy perspective.
|
| You can hire engineers to do consistency at scale. It's your
| core competency and you can't just handwave and outsource that,
| lest you wind up stuck in a crevasse. Hire smart engineers and
| do the work yourself. It'll pay off.
| Thaxll wrote:
| You can't; that's why Spanner only exists at Google. That tech
| is that good, and not found anywhere else.
| echelon wrote:
| But you don't need it. There are a limitless number of ways
| to deal with the problems spanner solves. And by choosing
| your own, you can focus on the subset that matter to your
| case and not be tied down to Google.
| hipadev23 wrote:
| The problem is nobody outside Google trusts them to run or
| operate anything.
|
| Edit: To the Googlers downvoting these comments. Your behavior
| only reinforces our views.
| szundi wrote:
| Google means: good chance discontinued after you have worked
| out the bugs and have a stable system at last
| groestl wrote:
| This is definitely not the case with their core cloud
| products.
|
| > almost every project not using Spanner was moving to
| Spanner
|
| This even includes Datastore. Even Datastore moved to
| Spanner.
| ksb wrote:
| Ping time from AWS data centers to GCP ones
| umvi wrote:
| Never a good idea to rely on Google proprietary tech (unless
| you are Google)... it could be sunset at any time without
| warning. I use GCP but I try my best to stay Google agnostic
| (avoid GCP-only offerings, etc) so that I can move to AWS if
| Google pulls the rug out from under me.
| rockostrich wrote:
| GCP products have a much better track record than Google
| consumer products when it comes to support since there are
| usually enterprise customers with multi-year contracts worth
| tens, if not hundreds, of millions of dollars using them.
| callalex wrote:
| That's what AppEngine customers thought.
| weitendorf wrote:
| I worked on AppEngine (well, Serverless Compute) at
| Google for over 4 years and left on Friday. Did something
| happen to AppEngine in the last week?
| callalex wrote:
| I'm talking about the pricing disaster that happened in
| 2017/2018 where user prices went up 10x-100x because
| Google wanted to kill the product without actually
| killing it.
| jerrygenser wrote:
| IoT is one example of a big backbone service that was
| sunset.
| endisneigh wrote:
| It had barely any usage though from what I can tell from
| searching about it.
|
| Not that it's any solace to those affected.
| marcinzm wrote:
| AI Platform is a recent example that's been deprecated.
| weitendorf wrote:
| I'm biased having worked on GCP, but I think GCP actually has
| a very good track record of not sunsetting entire products or
| removing core functionality. When I worked on AppEngine, I
| would often see apps written 10+ years ago still chugging
| along.
|
| It is true though that GCP sometimes sunsets specific product
| functionality, requiring changes on the customers' part. Some
| of these are unavoidable (eg committing to apply security
| patches to Python 2.7 given that the rest of the world is
| mostly not upstreaming these patches anymore), but not all of
| them.
|
| I would certainly use Cloud Spanner externally now that I've
| left the company, and IMO spanner has such a compelling
| featureset that it's a strong reason to use GCP for
| greenfield development. The problem though is that it's only
| available on GCP, and could become expensive at large scale.
| jerrygenser wrote:
| >It is true though that GCP sometimes sunsets specific
| product functionality, requiring changes on the customers'
| part. Some of these are unavoidable (eg committing to apply
| security patches to Python 2.7 given that the rest of the
| world is mostly not upstreaming these patches anymore), but
| not all of them.
|
| A good example is probably IoT. I've heard first hand
| anecdotes of very difficult migrations off this service.
| shalabhc wrote:
| Global consistency is expensive, both latency-wise and cost-
| wise. In reality most apps don't need global serializability
| across all objects. For instance, you probably don't need
| serializability across different tenants, organizations,
| workspaces, etc. Spanner provides serializability across all
| objects IIUC - so you pay for it whether you need it or not.
|
| The other side of something like Spanner is that the quorum-based
| latency is often optimized by adding another cache on top,
| which instantly defeats the original consistency guarantees.
| The consistency of (spanner+my_cache) is not the same as the
| consistency of spanner. So if we're back to app level
| consistency guarantees anyway, turns out the "managed" solution
| is only partial.
|
| Ideally the managed db systems would have flexible consistency,
| allowing me to configure not just which object sets need
| consistency but also letting me configure caches with lag
| tolerance. This would let me choose trade-offs without having
| to implement consistent caching and other optimization tricks
| on top of globally consistent/serializable databases.
| eatonphil wrote:
| See also: "Strict-serializability, but at what cost, for what
| purpose?"
|
| https://muratbuffalo.blogspot.com/2022/08/strict-
| serializabi....
| foota wrote:
| While it doesn't help much with the costs of replication,
| Spanner can be configured with read only replicas that don't
| participate in voting for commits, so they don't impact the
| quorum latency.
|
| Reads can then be done with different consistency
| requirements, e.g., bounded staleness (which guarantees data
| less stale than the time bound requested).
|
| See https://cloud.google.com/spanner/docs/reference/rest/v1/T
| ran... or
| https://cloud.google.com/spanner/docs/reads#read_types and
| https://cloud.google.com/spanner/docs/create-manage-
| configur...
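|
| A minimal sketch of such a bounded-staleness read with the Python
| client for Cloud Spanner (the instance, database, and table names
| here are made up):
|
|     from datetime import timedelta
|     from google.cloud import spanner
|
|     client = spanner.Client()
|     instance = client.instance("my-instance")      # hypothetical names
|     database = instance.database("my-database")
|
|     # Single-use read-only transaction that tolerates up to 15s of
|     # replication lag, so it can be served by a nearby read replica
|     # without waiting on the commit quorum.
|     with database.snapshot(max_staleness=timedelta(seconds=15)) as snap:
|         for row in snap.execute_sql("SELECT id, name FROM users LIMIT 10"):
|             print(row)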
| opportune wrote:
| My perspective from working both inside and outside of Google:
|
| The external spanner documentation doesn't seem as good as the
| internal documentation, in my opinion. Because it's not
| generally well known outside of G, they ought to do a better
| job explaining it and its benefits. It truly is magical
| technology but you have to be a database nerd to see why.
|
| It's also pretty expensive and because you generally need to
| rewrite your applications to work with it, there is a degree of
| lockin. So taking on Spanner is a risky proposition - if your
| prices get hiked or it starts costing more than you want,
| you'll have to spend even more time and money migrating off it.
| Spanner's advantages over other DBs (trying to "solve" the CAP
| theorem) then become a curse, because it's hard to find any
| other DB that gives you horizontal scaling, ACID, and high
| availability out of the box, and you might have to solve those
| problems yourself/redesign the rest of your system.
|
| Personally I would consider using Cloud Spanner, but I wouldn't
| bet my business on it.
| zenbowman wrote:
| I wouldn't say "infinite", it's still susceptible to read
| hotspotting; and while fine-grained locking enables generally
| higher write throughputs, you can still get in a situation
| where interconnected updates end up being pretty slow.
|
| That said, it's way better than anything else I've used in my
| career.
| emptysea wrote:
| Seems they've built out a PG version of MySQL's Vitess
|
| Query rewriting seems interesting; having a layer between your DB
| and your application would also allow various ACL stuff.
| goshx wrote:
| "Sorry, you've stumbled upon a temporary technical issue. Please
| try refreshing this page in a moment."
|
| Have they? :) I am excited to read the article when it loads
| nix0n wrote:
| It worked for me on the first try, but here's an archive link
| in case it goes down again: https://archive.is/xusR7
| Lucasoato wrote:
| This might be another case of "death by HN"
| mannyv wrote:
| Hmm, wonder why they didn't try FoundationDB.
|
| Interesting that they had problems with vacuuming. I always
| thought that part of Postgres was the worst part. I vaguely
| remember that you needed twice the space to vacuum successfully,
| which hopefully has changed in later versions.
|
| An article about why vacuum is needed in pg (as compared to
| mysql/innodb).
|
| http://rhaas.blogspot.com/2011/02/mysql-vs-postgresql-part-2...
| eatonphil wrote:
| The article mentions they're trying very hard to stick with
| Postgres. FoundationDB is great but doesn't even have a SQL
| access layer, let alone a Postgres SQL access layer. :)
| harikb wrote:
| Do you know of any performant relational-db layer on top of
| FoundationDB? It seems their use case would need at least simple
| joins, which raw FoundationDB lacks.
| yashap wrote:
| They want pretty full SQL support, so FoundationDB would be
| out. I'd think "buy" options for them would be more like Citus,
| Yugabyte, Cockroach or Spanner.
| dakiol wrote:
| I cannot help but think this all sounds pretty much like a hack
| (a clever one, though).
|
| We do not handle, let's say low-level I/O buffering/caching, by
| ourselves anymore, right? (at least the folks doing web
| development/saas). We rely instead on the OS APIs, and that's
| good. I think we are missing something similar but for db
| sharding. It seems to me that we are still missing some
| fundamental piece of technology/infrastructure to handle
| horizontal data sharding.
| simonw wrote:
| Citus and Vitess are examples of horizontal data sharding
| technology for PostgreSQL and MySQL respectively.
|
| At Figma's size there are sometimes reasons to roll your own,
| which I think they explain pretty clearly in the article. They
| wanted a solution they could incrementally engineer onto their
| existing stack without doing a full rewrite or lift-and-shift
| to something else.
| yard2010 wrote:
| This is Hacker News after all
| bjornsing wrote:
| One thought that comes up: Wouldn't it be easier to have each
| customer in their own (logical) database? I mean, you don't need
| transactions across different customers, right? So you're
| essentially solving a harder problem than the one you've got.
|
| Not sure postgres (logical) databases would scale that well, but
| I don't see a reason in principle why they couldn't. Has anyone
| explored this further?
| eatonphil wrote:
| I imagine it only gets you so far. What do you do about
| customers like Walmart or Oracle? Hundreds, if not thousands,
| of users all leaving hundreds of comments on Figma files every
| day. If you want good latency without giving up strong
| consistency (which the article says they want) you'll need to
| keep sharding.
| willsmith72 wrote:
| A single db can handle that load easily
| jeremyjh wrote:
| They were running their whole business on one RDS instance 4
| years ago. Do you think they now have one customer larger
| than all their customers combined 4 years ago?
| eatonphil wrote:
| > Figma's database stack has grown almost 100x since 2020
|
| The first sentence from the article seems to suggest it's
| possible?
| infecto wrote:
| I bet it gets you further than you imagine. Entirely depends
| on the backend services and what they touch but in this
| scenario you would be deploying/scaling that service based on
| the customer seat size. I suspect that even for large
| enterprise customers, the users actively touching Figma are
| not reaching the thousands, but I am happy to be wrong.
|
| After all, Stackoverflow is running off of a handful of
| machines.
| jakewins wrote:
| The problem - conceptually - is made much simpler this way; we
| make use of this at work.
|
| However you will still have shared resource problems - some
| rogue query that destroys IOPS in one tenant ends up bringing
| down all tenants, etc. There are in theory databases that solve
| this as well, but my experience has been that at that point
| what you buy into is a bad version of resource sharing - ie
| what an operating system does - and you're better off using OS
| mechanisms
|
| In other words: yes, but you still have noisy neighbours, and
| may be better off running lots of small fully separated DBMSes
| than a big logically separated one
| hacker_newz wrote:
| If tenants are on separate databases how would that be an
| issue?
| jakewins wrote:
| Separate _logical_ databases, within the same RDBMS, so
| sharing CPU, disks, RAM etc
| sgarland wrote:
| Because database, in Postgres terms, doesn't mean physical
| node. It's more akin to a VM than anything else. The term
| for an installation of Postgres is database cluster, which
| can contain N databases, each of which can contain M
| schemas.
| aeyes wrote:
| I have pondered about this for quite some time and came to the
| conclusion that it would make schema migrations more difficult
| to handle. I think Shopify is using an approach which is
| similar to what you are describing. The advantage is that you
| don't end up with hot shards because you can move around large
| customers independently.
|
| In practice there isn't a big difference, they just colocate
| several customers according to their sharding key in the same
| logical database.
| jamesfinlayson wrote:
| I remember Postgres table spaces being used to separate
| customers at a previous job - I can't remember how migrations
| were handled (pretty sure they were applied per table space)
| but I don't think it was a problem (at our scale anyway).
| yard2010 wrote:
| I worked in a place that had a database for each tenant and
| the schema migrations were a real pain. Every time, everything
| goes smoothly except for these few databases that have an edge
| case that screws up the whole migration.
| N_A_T_E wrote:
| It sounds like they actually did something like this. Their
| shard key selection could be customer, project, folder or
| something in their data model at a reasonably high logical
| level in their hierarchy.
| infra_dev wrote:
| Nile is a Serverless Postgres that virtualizes
| tenants/customers. It is specifically built for SaaS companies
| similar to Figma https://www.thenile.dev/. I am the CEO of
| Nile.
| GordonS wrote:
| I've done something like this before, with each customer
| getting their own _schema_ within a single Postgres instance.
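|
| A rough sketch of that pattern with psycopg2, assuming a made-up
| connection string and table:
|
|     import psycopg2
|
|     conn = psycopg2.connect("dbname=app user=app")  # hypothetical DSN
|     conn.autocommit = True
|
|     def provision_tenant(schema: str) -> None:
|         # One schema per customer inside a single Postgres database.
|         with conn.cursor() as cur:
|             cur.execute(f'CREATE SCHEMA IF NOT EXISTS "{schema}"')
|             cur.execute(f'CREATE TABLE IF NOT EXISTS "{schema}".documents '
|                         '(id bigserial PRIMARY KEY, title text)')
|
|     def cursor_for(schema: str):
|         # Scope subsequent queries to that customer's schema.
|         cur = conn.cursor()
|         cur.execute(f'SET search_path TO "{schema}"')
|         return cur
|
|     provision_tenant("acme_corp")
|     cur = cursor_for("acme_corp")
|     cur.execute("INSERT INTO documents (title) VALUES (%s)", ("spec",))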
| jerbear4328 wrote:
| Actually, Apple does this for iCloud! They use FoundationDB[1]
| to store billions of databases, one for each user (plus shared
| or global databases).
|
| See: https://read.engineerscodex.com/p/how-apple-built-icloud-
| to-...
|
| Discussed on HN at the time:
| https://news.ycombinator.com/item?id=39028672
|
| [1]: https://github.com/apple/foundationdb
| https://en.wikipedia.org/wiki/FoundationDB
| danpalmer wrote:
| > store billions of databases
|
| This is sort of true and sort of false. When you think of a
| "database", if you're thinking of a Postgres database, you're
| way off the reality of what "database" means here.
|
| FoundationDB has a concept called "layers", and essentially
| they have created a layer that looks like a separate database
| on top of a layer that is separately encrypted groups of
| keys. They don't have billions of FoundationDB clusters or
| machines, and at the infra level, i.e. the instances of the
| FoundationDB server software, it's unaware of individual
| "databases".
|
| A closer analogy would be like having billions of tables, but
| even that isn't accurate because in relational databases a
| table is usually a notably more static concept than data in a
| table. The closest analogy would be that each of the billions
| of users has a bunch of rows with a user-id field on them,
| and there's a proxy that filters everything such that you can
| view the table as if it only had one user's data in it.
|
| To be clear, FoundationDB is awesome and Apple have done some
| really cool stuff with it, but it's less crazy/impressive
| than it sounds.
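|
| A toy sketch of that idea with the FoundationDB Python bindings:
| each "database" is really just a directory prefix handed out per
| user, while the cluster only ever sees ordered key-value pairs
| (the key layout here is invented):
|
|     import fdb
|
|     fdb.api_version(710)
|     db = fdb.open()  # default cluster file
|
|     @fdb.transactional
|     def put_doc(tr, user_id, doc_id, blob):
|         # The directory layer gives each user an isolated key prefix.
|         space = fdb.directory.create_or_open(tr, ("users", user_id))
|         tr[space.pack((doc_id,))] = blob
|
|     @fdb.transactional
|     def list_docs(tr, user_id):
|         space = fdb.directory.create_or_open(tr, ("users", user_id))
|         return [space.unpack(k)[0] for k, _ in tr[space.range()]]
|
|     put_doc(db, "user-123", "doc-1", b"...")
|     print(list_docs(db, "user-123"))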
| dexwiz wrote:
| This sounds like a pretty standard multitenant datastore.
| Everything has a user/group Id on it, and a logical layer
| that locks a connection to a specific group.
| danpalmer wrote:
| Yeah, the advantage or difference here is that these
| "layers" are a common design pattern with FoundationDB,
| several ship in FDB by default, and you're encouraged to
| make more, so the database certainly has better support
| than just adding a column for TenantID, but still you're
| right that it's not too out there.
| dhash wrote:
| I believe physalia [0] explores this concept at production
| scale quite well.
|
| [0] https://blog.acolyer.org/2020/03/04/millions-of-tiny-
| databas...
| paxys wrote:
| This works great until (1) your largest customer outgrows the
| largest available DB (happens sooner than you'd think for large
| companies) or (2) you _do_ need transactions across different
| customers, say to facilitate some kind of sharing. Going all-in
| on the isolated tenant strategy means when you hit one of these
| cases it's a nightmare to unwind and rearchitect your entire
| DB layer.
| winrid wrote:
| A figma customer won't exceed the requirements of an
| i3.metal...
| jddj wrote:
| Can you give an example of when a single customer has
| outgrown the largest available DB?
| jasonwatkinspdx wrote:
| I'd respond by saying (1) is more rare than you're asserting.
|
| There is a huge long tail of companies with datasets that
| won't fit on a single machine but can be handled by a dozen
| or so, and where no single customer or service is near the
| limits of an individual node.
|
| These customers are poorly served at the moment. Most of the
| easy to implement SaaS options for what they need would be
| hugely costly vs a small fleet of db servers they administer.
| Meanwhile, most of the open source options are cargo culting
| Google or Facebook style architecture, which is a huge burden
| for a small company. I mean do you really want to run K8S
| when you have 10 servers in total?
|
| I think there's a lot of interesting stuff happening in this
| end of the market that's not trying to be a mini Google, like
| Fly.io.
|
| As for (2), I think a middle ground is supporting cross shard
| transactions but not joins. This works well enough for VoltDB
| et al.
| zarzavat wrote:
| Figma is more or less a desktop application that happens to
| run in a web browser.
|
| If I use Photoshop to edit a .psd file I don't think "man
| that psd file should really be stored in a single planet-
| sized database of all psd files in existence". It's just a
| file on my computer.
|
| Figma requires a little bit more intermingling of data than
| Photoshop, it has multiuser support for one, so a pure local
| storage based approach wouldn't work. But, at its heart it's
| still based on the document model. When you open a Figma
| document it's its own isolated little universe, the
| connections with resources outside the document are limited,
| and that matches user expectations.
| ummonk wrote:
| This seems to be an architecture Cloudflare is aiming to
| support with their SQLite service. One database per customer,
| each database located in the customer's primary region.
| jasonwatkinspdx wrote:
| I think there's quite a few people chasing similar ideas,
| like Azure's Durable Entities.
|
| I've been calling it the Lots of Little Databases model vs
| the Globe Spanning Gorilla.
|
| Like the Spanner paper points out, even if your distributed
| database semantically appears like a single giant instance,
| in practice performance means developers avoid using
| distributed joins, etc, because these can lead to shuffling
| very large amounts of intermediate results across the
| network. So the illusion of being on a single giant machine
| ends up leaking through the reality, and people end up
| writing workarounds for distributed joins like async
| materialization.
|
| If we give up the single machine illusion we get a lot of
| simplification, at the cost of features devs were unlikely to
| use anyhow. I see having consistent distributed commit but
| without cross shard joins as a really interesting
| alternative.
|
| And besides scalability I like the extra security rope of
| fine grained partitioning from the start.
|
| I'll write a blog post along these lines if I get anything
| worthwhile done.
| cavisne wrote:
| Spanner's interleaved tables seem like a similar solution, i.e.
| you interleave customer data so it all ends up on the same set
| of hosts for performance, while still having the ability to
| create transactions across customers.
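|
| Roughly, that interleaving looks like the following Spanner DDL
| (table and column names are invented), submitted here through the
| Python client:
|
|     from google.cloud import spanner
|
|     client = spanner.Client()
|     database = client.instance("my-instance").database("my-database")
|
|     # Child rows are stored physically under their parent row, so one
|     # customer's documents live on the same splits as the customer.
|     op = database.update_ddl([
|         """CREATE TABLE Customers (
|              CustomerId INT64 NOT NULL,
|              Name       STRING(MAX),
|            ) PRIMARY KEY (CustomerId)""",
|         """CREATE TABLE Documents (
|              CustomerId INT64 NOT NULL,
|              DocumentId INT64 NOT NULL,
|              Title      STRING(MAX),
|            ) PRIMARY KEY (CustomerId, DocumentId),
|              INTERLEAVE IN PARENT Customers ON DELETE CASCADE""",
|     ])
|     op.result()  # wait for the schema change to finish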
| tharkun__ wrote:
| Yes, we've been doing that at my place basically since the
| start. Each tenant is a schema in postgres. Works perfectly
| fine on the one hand, i.e. your tables don't grow to 'infinity'
| just because you're adding more and more tenants. If there's a
| particular tenant that has lots of data, only that tenant's
| indexes and tables grow huge and become slower because of that
| particular reason etc. If a tenant leaves, you keep the schema
| around for some time, so they can come back and then at some
| point you just drop the schema!
|
| It does mean having to upgrade each schema individually, which
| also makes it both easier and harder. Easier because the tables
| are smaller, so any schema changes that require things like say
| a table lock, are locking for a smaller amount of time and
| won't affect more than the one tenant at a given time. It also
| means that you can get into an inconsistent state of course,
| where some of your tenants have all the latest DB upgrades,
| while it failed on another subset.
|
| At some point Postgres's internal tables become a bit of a
| "problem", as you want to run as many of these updates in
| parallel as you can for speed, which could lead to contention
| on Postgres' administrative side. You'll also still need to
| shard across multiple actual RDS instances, because you still
| have many tenants running against a single physical piece of
| hardware that will show its limitations if too many large or
| active tenants happen to be on the same shard.
|
| And then you have the problem of keeping a record of which
| physical RDS instance (i.e. shard) the tenant is on. Your
| application code will need to look that up (and cache that info
| for some time ;)) and you have some choice there as well. I.e.
| do you shard those as well and juggle load balancing as
| basically a 1:1 mapping to shards or do you have your
| application layer connect to _all_ database shards and handle
| _any_ tenants? One is more complicated I would say while the
| other could run out of connections depending on how you need to
| scale the application layer and what kind of RDS instance you
| have.
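|
| For the migration fan-out described above, a minimal sketch with
| psycopg2 (the tenants table and the DDL are made up):
|
|     import psycopg2
|
|     conn = psycopg2.connect("dbname=app user=app")  # hypothetical DSN
|
|     def migrate_all(ddl: str):
|         # Apply the same change to every tenant schema, one transaction
|         # per schema, and report which tenants failed rather than stopping.
|         with conn, conn.cursor() as cur:
|             cur.execute("SELECT schema_name FROM tenants ORDER BY 1")
|             schemas = [r[0] for r in cur.fetchall()]
|         failed = []
|         for schema in schemas:
|             try:
|                 with conn, conn.cursor() as cur:
|                     cur.execute("SET LOCAL lock_timeout = '5s'")
|                     cur.execute(f'SET LOCAL search_path TO "{schema}"')
|                     cur.execute(ddl)
|             except psycopg2.Error:
|                 failed.append(schema)  # e.g. lock timeout on a huge tenant
|         return failed
|
|     print(migrate_all(
|         "ALTER TABLE documents ADD COLUMN archived_at timestamptz"))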
| icedchai wrote:
| I worked at a place that did this with MySQL. Every tiny, trial
| account got their own database. Every huge, actual customer got
| their own database. Migrations were kinda painful. I would
| think carefully about doing it this way.
| petervandijck wrote:
| Ongoing product development with migrations, tweaking indexes
| etc. becomes really hard. Every small database tweak now has to
| be deployed over 1000s of databases.
| adastral wrote:
| I see they don't mention Citus
| (https://github.com/citusdata/citus), which is already a fairly
| mature native Postgres extension. From the details given in the
| article, it sounds like they just reimplemented it.
|
| I wonder if they were unaware of it or disregarded it for a
| reason --I currently am in a similar situation as the one
| described in the blog, trying to shard a massive Postgres DB.
| _boffin_ wrote:
| how "massive" is massive in your case?
| adastral wrote:
| Around ten heavily-updated (50-400k updated rows/min) tables
| ranging between 500M and 5B rows, with a couple tables over
| 40B rows each (5TB each IIRC).
| dijit wrote:
| I've had CitusDB running across 68 bare metal machines (40
| vCPU, 768GiB ram, 20TiB of storage each + 40GiB network
| links) and it ran decently well.
|
| Not sure what your definition of massive is, I think Spanner
| would easily beat it.
|
| Also, it's very use-case dependent; you can't "just use"
| Citus for everything. It's not _quite_ as flexible as a bog-
| standard pgsql install due to the way it does sharding, and you
| have to be a tad more careful with your data model.
| skunkworker wrote:
| What is your definition of "decently well", and is your
| primary cluster (without replicas) above 1PB?
| simonw wrote:
| They said 20TiB * 68, which I think is 1.5PB.
| iamdanieljohns wrote:
| I would love to see a comparison of the major PostgreSQL
| services such as Citus, EDB, Crunchy, Neon, and some OSS
| distributions/packages
| junto wrote:
| Before clicking on the article I assumed it was Citus, and was
| surprised when it wasn't.
|
| Maybe because CitusData was bought by Microsoft around the same
| time, so Microsoft could create "Azure Cosmos DB for Postgres
| Cluster", yet another one of Microsoft's typical product naming
| crapshoots.
| victor106 wrote:
| > yet another one of Microsoft's typical product naming
| crapshoots.
|
| Well said. I haven't seen any company as terrible as
| Microsoft at naming things. Anyone know why?
| studmuffin650 wrote:
| AWS is putting up good fight
| salynchnew wrote:
| Naming things is hard.
|
| At a previous employer, I saw several cool-ish open source
| projects instantly doomed to obscurity by picking a name
| that either completely duplicated the name of an existing
| OSS project or was guaranteed to have terrible SEO for
| another reason.
|
| However, Microsoft seems to have a unique crossover of
| fragmented business units and centralized marketing. That's
| why you end up with Azure -> Subproject -> Actual
| Product/Service word soup. Previously, they did this with
| the Windows Live brand from 2005-2012, and "Xbox" for a
| wide range of gaming projects (many of which were on PC).
| thomasjudge wrote:
| related, Microsoft on Microsoft marketing:
|
| https://www.youtube.com/watch?v=EUXnJraKM3k
| sgarland wrote:
| I thought of that as well. The only thing I could think of is
| that they mentioned that they don't want to move off of RDS,
| and there is 0% chance of Citus coming to AWS since Microsoft
| bought them.
| jabart wrote:
| Figma uses AWS RDS, and RDS doesn't list Citus as a supported
| extension.
| gen220 wrote:
| This is my guess of why they didn't use Citus. They weren't
| interested in the options of (1) going multi-cloud [DB in
| Azure Cosmos / Backend(s) in AWS] (2) going all-in on Azure
| [DB in Azure Cosmos / Backend(s) in Azure] (3) self-managing
| Postgres+Citus in EC2.
|
| It'd be interesting to compare the expected capex of
| developing this in-house solution + the opex of maintaining
| it vs the same categories of expected costs for option (3) -
| because I imagine that's probably the most palatable option.
|
| They also may have pre-paid for dedicated RDS instances for
| the next X years ( _before_ this horizontal scaling
| initiative began, to boot), as AWS allows companies to do
| this at a pretty steep discount rate, which would probably
| tilt them away from (3).
| sgarland wrote:
| Especially because Option 3 lets you go waaaay farther on
| vertical scaling, since you can get native NVMe drives
| (they mentioned hitting IOPS limits for RDS), more exotic
| instance classes with far more RAM, and do stuff like ZFS
| for native compression and snapshots.
| gshulegaard wrote:
| I have worked on teams that have sharded and partitioned
| PostgreSQL ourselves (somewhat like Figma, in the Postgres 9.4-ish
| time frame) as well as teams that have utilized Citus. I am a
| strong proponent of Citus and point colleagues in that
| direction frequently, but depending on how long ago Figma was
| considering this path I will say that there were some very
| interesting limitations to Citus not that long ago.
|
| For example, it was only 2 years ago that Citus allowed the
| joining of data in "local" tables and data retrieved from
| distributed tables (https://www.citusdata.com/updates/v11-0).
| In this major update as well, Citus enabled _any_ node to
| handle queries, previously all queries (whether or not it was
| modifying data) had to go through the "coordinator" node in
| your cluster. This could turn into a pretty significant
| bottleneck which had ramifications for your cluster
| administration and choices made about how to shape your data
| (what goes into local tables, reference tables, or distributed
| tables).
|
| Again, huge fan of Citus, but it's not a magic bullet that
| makes it so you no longer have to think about scale when using
| Postgres. It makes it _much_ easier and adds some killer
| features that push complexity down the stack such that it is
| _almost_ completely abstracted from application logic. But you
| still have to be cognizant of it, sometimes even altering your
| data model to accommodate.
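|
| For reference, the Citus side of this (run against a coordinator
| node; the table and column names are invented) looks roughly like:
|
|     import psycopg2
|
|     conn = psycopg2.connect("dbname=app host=citus-coordinator")  # hypothetical
|     conn.autocommit = True
|     with conn.cursor() as cur:
|         cur.execute("""
|             CREATE TABLE comments (
|                 org_id  bigint NOT NULL,  -- distribution column / shard key
|                 id      bigserial,
|                 body    text,
|                 PRIMARY KEY (org_id, id)  -- unique keys must include the shard key
|             )""")
|         # Hash-distribute rows across worker shards by org_id.
|         cur.execute("SELECT create_distributed_table('comments', 'org_id')")
|         # Small lookup tables can instead be replicated to every worker.
|         cur.execute("CREATE TABLE plans (id int PRIMARY KEY, name text)")
|         cur.execute("SELECT create_reference_table('plans')")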
| gen220 wrote:
| You also benefit from the tailwind of the CitusData team
| making continued improvement to the extension, whereas an in-
| house system depends on your company's ability to hire and
| retain people to maintain + improve the in-house system.
|
| It's hard to account for the value of benefits that have yet
| to accrue, but this kind of analysis, even if you pretty
| heavily-discount that future value, tilts the ROI in favor of
| solutions like Citus, IMO. Especially if your time horizon is
| 5+ or 10+ years out.
|
| Like you said, if they made this decision 3ish years ago, you
| would have had to be pretty trusting on that future value. A
| choice, made today, hinges less on that variable.
| wayne-li2 wrote:
| Huh, I would have thought the opposite. Companies at Figma
| size are easily able to hire talent to maintain a core part
| of their engineering stack. On the other hand, they retain
| no control of Citus decision making. Those tailwinds could
| easily have been headwinds if they went in a direction that
| did not suit Figma.
| gregors wrote:
| Where's the fun in that? I'm not being snarky either. Maybe
| it's not the best decision business-wise, but I guarantee it
| was more challenging and more fun. There's something to be said
| for that.
| TkTech wrote:
| Am I the only one finding the layout of this blog distracting?
| Kind of disappointing from a UX company. The images are also
| massive, the page was 42.21 MB!
|
| Good article nonetheless! Always appreciate when companies like
| Figma document technical challenges.
| renegade-otter wrote:
| It seems like someone at Figma decided to use the latest CSS
| tricks they just had discovered. Changing background color?
| Come on.
| Cwizard wrote:
| I thought this was a bug at first. Does anyone actually want
| this?
| counterfaccola wrote:
| Just give us white! That purple and yellow-green combo is
| quite distracting!
| zelphirkalt wrote:
| Maybe someone designed this castle in Figma.
| ghostly_s wrote:
| Odd, it looks perfectly bog-standard for me on Firefox and iOS,
| aside from the lavender bg which I quite like. I even disabled
| my ad-blocker to see if it made a difference, no changing bg
| colors for me...
| vinner_roy wrote:
| Could you use Aurora Limitless for this instead?
| https://aws.amazon.com/about-aws/whats-new/2023/11/amazon-au...
| bearjaws wrote:
| I doubt even VC money can afford this service.
|
| Serverless Aurora is incredibly expensive for most workloads. I
| have yet to find a use case for any SaaS product that is used
| >4 hours a day. Since all my products span at least 3 time
| zones there is at least 12 hours of activity a day.
| brcmthrowaway wrote:
| What products
| Scubabear68 wrote:
| We found this out the hard way in a small startup. The per
| query and I/O expense was through the roof.
| yard2010 wrote:
| Did it work though? Did you achieve unlimited scaling?
| Because if so you should compare the price to the price of
| a team of great minds such as in this article, working for
| 2 years to get a solution.
|
| I bet it still would be cheaper to pay people over Amazon,
| but I'm curious about the numbers
| paulddraper wrote:
| "Limitless" refers to the bill, not just the scale.
| MapleWalnut wrote:
| Yes, but that wasn't available when they did this migration
| sgarland wrote:
| At my company, we were given an early talk on Limitless. Never
| once did the reps mention that it ran on Serverless. Dear lord,
| that's going to be a hard no from me. Unless they've
| dramatically changed pricing for Limitless as opposed to normal
| Serverless, that'll be through the roof.
| vinner_roy wrote:
| Haha alright I get the picture. Too expensive.
| cynicalsecurity wrote:
| The article would have benefited from less of the annoying
| corporate talk.
| ijustlovemath wrote:
| > NoSQL databases are another common scalable-by-default solution
| that companies adopt as they grow. However, we have a very
| complex relational data model built on top of our current
| Postgres architecture and NoSQL APIs don't offer this kind of
| versatility.
|
| As I understand it, NoSQL is for people who need a backend that
| ingests just about any unstructured data, for teams that may not
| have a complex relational model worked out/stabilized. Postgres
| has this in its native jsonb datatype, but they wouldn't need to
| use that much since it sounds like they already have a good data
| model. What am I missing here?
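|
| A small sketch of that jsonb escape hatch (throwaway table names,
| psycopg2): relational columns alongside a schemaless payload.
|
|     import json
|     import psycopg2
|
|     conn = psycopg2.connect("dbname=app user=app")  # hypothetical DSN
|     with conn, conn.cursor() as cur:
|         cur.execute("""
|             CREATE TABLE IF NOT EXISTS events (
|                 id      bigserial PRIMARY KEY,
|                 user_id bigint NOT NULL,  -- structured, relational part
|                 payload jsonb NOT NULL    -- unstructured, NoSQL-ish part
|             )""")
|         cur.execute("CREATE INDEX IF NOT EXISTS events_payload_gin "
|                     "ON events USING gin (payload)")
|         cur.execute("INSERT INTO events (user_id, payload) VALUES (%s, %s)",
|                     (42, json.dumps({"type": "export", "format": "png"})))
|         # Containment query; adding "format" needed no schema migration.
|         cur.execute("SELECT id, payload->>'format' FROM events "
|                     "WHERE payload @> %s", (json.dumps({"type": "export"}),))
|         print(cur.fetchall())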
| opportune wrote:
| What is your exact question? To me it makes sense that you'd
| not want to use NoSQL if you're dealing with data that's
| already relational, and heavily leveraging features common in
| relational DBs that may not come out of the box with NoSQL DBs.
|
| They're saying basically that NoSQL DBs solve a lot of
| horizontal scaling problems but aren't a good fit for their
| highly relational data, is my understanding. Not that they
| can't get NoSQL functionality at eg the query level in
| relational DBs.
| ffsm8 wrote:
| I think they're equating relational databases with databases with
| ACID guarantees, as that's basically a full overlap on a Venn
| diagram.
|
| And we all know that ACID has to go at _some_ scale, even if
| that scale keeps getting pushed further out as our hardware
| gets better.
|
| (Same with the relational guarantees that eat performance...
| But only once you've reached a certain amount of throughput and
| data)
| suhastech wrote:
| Using NoSQL might not be the best idea in this case. I've seen
| it backfire for many companies. They start with NoSQL, but then
| end up needing relational features as their business grows.
| This leads to performance issues, redundancy, and data sync
| problems early on, which shouldn't be happening.
|
| Especially in the early days, NoSQL companies used to market
| their databases as general-purpose databases that scale easily,
| but that hasn't always been the case obviously.
|
| I usually recommend starting with a relational database like
| PostgreSQL. If scaling becomes necessary later on, you can
| invest in sharding the database. Figma's approach seems
| reasonable given the tools available at the time.
|
| I've helped small companies switch from NoSQL to SQL because
| the benefits of NoSQL weren't worth the trade-offs at their
| stage of growth. In case anyone is in a similar boat:
| https://mongotosqlmigration.com/
| jeffbee wrote:
| Given the list of authors and acknowledgees, what I'd really like
| to read is the differences between this solution and Dropbox's.
| sp1nozick wrote:
| horizontal sharding is such a foundational skillset in working
| with databases - it's essential to database architecture
| HermitX wrote:
| This is an intriguing article, clearly showing the team's
| fondness for Postgres. However, Postgres is an OLTP product. I'm
| curious about what system Figma's data team uses for their data
| analysis tasks.
| jerrygenser wrote:
| They mentioned analyzing query logs in Snowflake in this very
| article. So... at least they use Snowflake for some things?
| cynicalsecurity wrote:
| They came up with a really over-engineered, over-complicated
| attempt at splitting data into multiple databases. I'm not sure
| this was the best idea or even a good idea, but then I also don't
| understand how they got into a situation where they only had a few
| months before their current database overflowed and the whole
| system collapsed.
| dwaltrip wrote:
| Can you offer insight into what a better approach might have
| been?
| TheP1000 wrote:
| As others have mentioned, moving to per tenant databases can
| really simplify things at scale and doesn't leave a massive
| amount of engineering complexity and debt in its wake.
|
| I feel sorry for the team managing this 5 years from now.
| cynicalsecurity wrote:
| Exactly this. But at this point, I don't even want to give
| them advice, I don't really like their service. I like
| Lunacy more.
| simonw wrote:
| Moving to a per-tenant database sounds like even more work
| to me than moving to shards. Moving to per-tenant means
| rewriting _everything_ - moving to shards has you rewriting
| a lot less.
| infra_dev wrote:
| This is possible to do, but lots of engineering. You can
| provide the experience of a single DB while each tenant
| can be placed in their own dedicated Postgres compute.
| This would help the application to stay the same while
| tenants are moved to independent computes (you can even
| move only a few tenants and leave the rest on a shared
| Postgres compute).
| hintymad wrote:
| Given that sharding has become a pretty mature practice, is it
| still worth considering the NewSQL solutions like CRDB, Yugabyte,
| and TiDB for the sake of auto sharding, given that these NewSQL
| databases usually trade throughput and latency for auto-sharding
| and multi-region support? Another added cost is learning how to
| operate NewSQL databases, assuming one is already familiar with
| either MySQL or Postgres.
| emmanueloga_ wrote:
| In the OP they said they built their own sharding solution
| because it was risky to move to a different (newsql) solution
| and they already had sharding expertise with PG.
|
| I think if starting from scratch it makes sense to look at
| these newsql DBs that are built to horizontally scale from the
| very beginning.
| Thaxll wrote:
| How do transactions work when you end up querying different shards?
| blauditore wrote:
| IIUC they never have db transactions across shards, because
| customers are basically data silos, which makes sharding quite
| straightforward.
| simonw wrote:
| I imagine that's one of the reasons they have "colos" - you can
| aim to collocate tables that are likely to be part of the same
| transaction in the same shard by putting them in the same colo
| group.
| gfodor wrote:
| Honestly it's depressing that I remember doing this same thing
| more than a decade ago and we still have to spend developer
| cycles on this nonsense in 2024.
| tbarbugli wrote:
| Citus seems incredibly close to what they built, I wonder why
| they did not use it
| AtlasBarfed wrote:
| 1 postgres
|
| 2 postgres replicas+sharding
|
| 3 Cassandra / dynamo
|
| If you are at stage 2, you are using bandaids. You need to start
| architecting a transition of your most heavily used data views to
| truly scalable databases like Cassandra / dynamo. I have not used
| foundation, but aphyr liked it.
|
| That transition takes time, evaluation, prototyping, and
| organizational learning at the application-programmer, support-
| engineer, and management levels.
___________________________________________________________________
(page generated 2024-03-14 23:00 UTC)