[HN Gopher] How to Distribute Postgres Geographically
___________________________________________________________________
How to Distribute Postgres Geographically
Author : mebcitto
Score : 106 points
Date : 2024-06-01 04:42 UTC (1 day ago)
(HTM) web link (xata.io)
(TXT) w3m dump (xata.io)
| hazaskull wrote:
| While it is fun to see how to creatively solve such issues, it
| does raise the question of manageability. When sharding data into
| loosely coupled (FDW) silos, it becomes tricky to make
| consistent backups, ensure locking mechanisms work when sharded
| data might sometimes be directly related, handle zone/region
| failures gracefully, prevent hot spots, perform multi-region
| schema changes reliably, etc. I suppose this pattern principally
| only works when the data is in fact not strongly related and the
| silos are quite independent. I wouldn't call that a distributed
| system at all, really. This may be a matter of opinion, of course.
|
| It does give a "When all you have is a hammer..." vibe to me and
| begs the question: why not use a system that's designed for use-
| cases like this and do it reliably and securely? i.e.:
| https://www.cockroachlabs.com/docs/stable/multiregion-overvi...
| (yes, I know full data domiciling requires something even more
| strict but I currently don't know of any system that can
| transparently span the globe and stay performant while not
| sharing any metadata or caching between regions)
| tudorg wrote:
| > It does give a "When all you have is a hammer..." vibe to me
| and begs the question: why not use a system that's designed for
| use-cases like this and do it reliably and securely?
|
| (disclaimer: blog post author)
|
| A reason would be that you want to stick to pure Postgres, for
| example because you want to use Postgres extensions, or prefer
| the liberal Postgres license.
|
| It can also be a matter of performance: distributed
| transactions are necessarily slower, so if you can almost
| always avoid them by connecting to a single node that has
| all the data the transaction needs, that's going to get
| you better performance.
| hazaskull wrote:
| Hi there! Thank you for the post and your work on pgzx!
| Though it depends on the system (CockroachDB can place
| leaders on specific nodes to speed up local queries, it has
| global tables, and otherwise there are always follower reads),
| those are of course valid reasons. Admittedly, if you want to
| keep data "pinned", you're into manual placement rather than
| horizontal scaling, but that's nitpicking and there are pros
| and cons. I do enjoy the freedom of Postgres and am hopeful
| that its virtually prehistoric storage design becomes a
| non-issue thanks to tech such as Neon and OrioleDB. The option
| to decouple storage would provide wonderful flexibility for
| solutions like yours too, I think.
| tudorg wrote:
| Thanks for noticing pgzx! We are working on a project
| building on top of it and going in the direction hinted at
| by this blog post.
|
| I agree that the separation of storage and compute
| complements this nicely. In fact, we take advantage of it
| in the Xata platform which uses Aurora.
| jitl wrote:
| (I work at Notion, one of the larger Notion clones)
|
| We experimented with using partitioned tables and doing the fancy
| postgres_fdw thing when we sharded our tenant-clustered tables in
| 2020. Even with all the Postgres instances in the same region we
| found the approach unwieldy. Routing queries in the application
| gives you a lot more control versus needing to teach Postgres the
| exact logic to route a query, plus do a (dangerous? one-way?
| global?) schema migration whenever you want to change the routing
| logic.
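|
| For anyone who hasn't seen that setup, here's a minimal sketch
| of what partitioning + postgres_fdw sharding looks like (all
| server and table names are made up, only two of the four
| partitions are shown, and each remote table must already exist
| on its shard with a matching definition):
|
|     CREATE EXTENSION IF NOT EXISTS postgres_fdw;
|
|     -- one foreign server (and user mapping) per remote shard
|     CREATE SERVER shard_1 FOREIGN DATA WRAPPER postgres_fdw
|         OPTIONS (host 'shard-1.internal', dbname 'app');
|     CREATE USER MAPPING FOR CURRENT_USER SERVER shard_1
|         OPTIONS (user 'app', password 'secret');
|
|     -- parent table, hash-partitioned on the tenant id
|     CREATE TABLE blocks (
|         tenant_id bigint NOT NULL,
|         id        bigint NOT NULL,
|         body      jsonb
|     ) PARTITION BY HASH (tenant_id);
|
|     -- a partition stored locally
|     CREATE TABLE blocks_p0 PARTITION OF blocks
|         FOR VALUES WITH (MODULUS 4, REMAINDER 0);
|
|     -- a partition that lives on another instance, attached
|     -- through postgres_fdw
|     CREATE FOREIGN TABLE blocks_p1 PARTITION OF blocks
|         FOR VALUES WITH (MODULUS 4, REMAINDER 1)
|         SERVER shard_1;
|
| Queries against "blocks" are then routed by Postgres itself, and
| changing how that routing works means the kind of schema
| migration described above.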
|
| We touched on this briefly in our sharding blog post here:
| https://www.notion.so/blog/sharding-postgres-at-notion
|
| Another reason to avoid complex routing in Postgres is risk. If
| something goes wrong at 3am in a load-bearing cross-Postgres-
| instance query, how easy will it be to mitigate that if the
| routing is happening in Postgres (with whatever advanced
| clustering system: Citus, pgzx, postgres_fdw) versus if the
| routing is happening in the application? For example, if there's
| a network partition between the "global" Postgres instance and
| the us-east-2 cluster? Maybe you're Postgres wizards and know
| how to handle this with a quick schema change or something in
| the psql CLI, but I'd bet more on my team's ability to write
| fault tolerance in the application versus in Postgres internal
| logic.
| tudorg wrote:
| Thanks a lot for commenting and pointing me to the blog post. I
| do think I've seen it before but forgot about it. I've re-read
| it now and it's a great read!
|
| From what I understand you decided to do sharding in the
| application code, and given the current state I think that
| makes total sense and I'd have probably done the same.
|
| Part of my point with the blog post is that there is a built-in
| horizontal sharding solution in vanilla Postgres (partitioning
| + FDW), but it's currently badly lacking when it comes to
| cluster management, schema changes, distributed transactions,
| and more. If we put work into it, perhaps we can tip the
| balance and the next successful Notion clone could choose to
| do it at the DB level.
| Ozzie_osman wrote:
| Thanks for this commentary! I'm at a startup where we are
| preparing to shard Postgres. I'd be curious if you're familiar
| with AWS Limitless, and how you would have approached deciding
| whether to use it vs. the approach in the blog post had it
| existed back in 2021.
| sgarland wrote:
| My biggest concern with Limitless - other than inherent
| performance issues with Aurora - is that according to their
| docs, it's built on Aurora Serverless. IME, Serverless
| anything tends to get absurdly expensive very quickly.
| jitl wrote:
| I'm a solid "no" on any primary datastore thingy that's under
| 5 years of industry-wide production workload experience, and
| would seriously weigh it against something with 10+ years of
| industry use.
|
| In 2019 when I was interviewing at companies for my next
| position I heard from a few places that the original Aurora
| for Postgres lost their data. It seems like the sentiment on
| mainline Aurora has improved a bit, but I would never bet my
| company's future on an AWS preview technology. Better the
| devil you know (and everyone else knows).
| nurple wrote:
| Ever since the war of stored procedures, I've been very reticent
| to put any more logic than absolutely necessary into the
| database. The simple truth of the matter is that the DB server
| codebases are awful, complicated places to develop this kind of
| functionality. I strongly believe these features should be
| handled at higher layers.
|
| At my last job, in fintech, we used application-level shard
| routing with each aggregate root served by a particular RPC
| service (users, accounts, transactions, etc). When one of these
| aggregate services was asked to pull data, it would query a
| central routing service for the server and shard the requested
| data resided in.
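|
| Conceptually, that routing lookup is just something like this
| (hypothetical names, not our actual schema):
|
|     -- one row per tenant, owned by the central routing service
|     CREATE TABLE shard_map (
|         tenant_id  bigint PRIMARY KEY,
|         pg_host    text   NOT NULL,  -- e.g. 'accounts-db-3'
|         shard_name text   NOT NULL   -- e.g. 'accounts_shard_17'
|     );
|
|     -- an aggregate service asking where tenant 42 lives
|     SELECT pg_host, shard_name
|       FROM shard_map
|      WHERE tenant_id = 42;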
|
| Between them and the DB servers we had a cluster of pgbouncer
| instances to handle connection oversubscription and the routing
| of queries around DB instances in the face of failures or
| maintenance.
|
| While this was pretty effective, the work to manage sharded
| Postgres, DDL updates, caching, locating, and balancing data was
| still very complicated and error prone, and was the root of many
| production outages.
|
| I didn't design this system, but after a couple years leading a
| platform migration, which necessitated getting into all this at a
| deep level, I would do it differently. Instead of query routing
| and DB sharding, I would shard the aggregate root services
| themselves. Each "sharded" set of these instances would have a
| simple PG instance (with replica chain) behind it that knows
| nothing about any other set.
|
| At this point, instead of routing being done by the aggregate
| root services locating and querying data from DB shards on huge
| vertically-scaled servers, each set only pulls data from its
| dedicated DB. A routing facade placed in front of the set of
| sets sends requests from consumers to the one holding the
| desired data.
|
| With this architecture, the complexity of sharding and vertically
| scaling at the DB layer, and handling connection oversubscription
| with a query router like PGbouncer, just falls away.
|
| I would keep these sets, and their DBs, rather small to also reap
| the benefits of small datasets. One of the biggest issues we had
| with huge DBs is the time it takes to do anything; as an example,
| restarting replication in a failed chain could take many hours
| because of the huge amount of data, and any hiccups in
| connectivity would quickly overwhelm our ability to catch
| replication up, and we'd have to fall back to zfs send to
| resnap.
|
| A larger number of smaller DBs would not improve the total time
| needed to do something like backups or DDL mutations, but it
| would significantly reduce the time for any particular instance,
| which reduces the overall risk and blast radius of a DB failure.
|
| Another thing I think small DBs can help with is data locality,
| DB automation, and possibly making the instances effectively
| ephemeral. When your dataset is small, bringing an instance up
| from a hot backup can take on the order of a few seconds, and
| could allow you to schedule the root services and their DB on the
| same host.
|
| For geographical distribution, the routing facade can also send
| writes to the set serving that shard in some other region.
| sgarland wrote:
| > Ever since the war of stored procedures, I've been very
| reticent to put any more logic than absolutely necessary into
| the database.
|
| Counterpoint: the fewer round trips to the DB you have to do,
| the faster your app is.
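|
| As a toy example (made-up table, not tied to any real system), a
| single function call can replace what would otherwise be several
| statements issued from the app:
|
|     CREATE FUNCTION transfer(from_id bigint, to_id bigint,
|                              amount numeric)
|     RETURNS void
|     LANGUAGE plpgsql AS $$
|     BEGIN
|         UPDATE accounts SET balance = balance - amount
|          WHERE id = from_id;
|         UPDATE accounts SET balance = balance + amount
|          WHERE id = to_id;
|     END;
|     $$;
|
|     -- one round trip from the application:
|     SELECT transfer(1, 2, 100.00);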
|
| My belief is that as DBAs more or less went away thanks to
| cloud providers and the rise of Full Stack Engineering, the
| quiet part that no one wanted to admit was that RDBMS are
| hideously complicated, and you really need SMEs to use them
| correctly. Thus, the use of stored procedures, triggers,
| advanced functions etc. went away in favor of just treating the
| DB as a dumb store, and doing everything in app logic.
|
| As more and more companies are discovering that actually, you
| do need DB SMEs (I'm a DBRE; demand for this role has
| skyrocketed), my hope is we can push some of the logic back out
| to the DB. I am all for VCS, automated migrations, canary
| instances, and other careful patterns. I'd just like to see
| less treating of the RDBMS as a document store, and more
| embracing of the advantages of the tech already in place.
| nurple wrote:
| I kind of agree with your assessment; it would even speak to
| the popularity of document DBs coming up around the same
| time. However, I would argue that the decline in deep RDBMS
| integrations is exactly because of their excessive
| complexity. Modern software development methods left them
| behind because their change velocity is awful. This is doubly
| apparent when you look at features like the one TFA is talking
| about, exactly because it's extremely complex to stand up,
| manage, and make changes to.
|
| I personally left that behind in the era of ORMs, exactly
| because it let me build systems in high-productivity languages
| and frameworks, but also because the cost-benefit of RDBMS
| integration just wasn't there. The DB is absolutely the
| highest-risk place to put logic in any stack: if something
| goes wrong, _everything_ breaks and it's the most difficult
| place to fix things.
|
| I don't know what the future looks like for DBs, but I think
| we'll continue to see them become even more transparent. Devs
| just want to persist entities, and integrating at the RDBMS
| level is a slog.
| sgarland wrote:
| I'm obviously biased because of my career speciality, but I
| love everything about RDBMS. I love mastering the million
| knobs they have, their weird quirks, and how if you use
| them as designed with a properly normalized schema, your data
| will ALWAYS be exactly what you told it to be, with no
| surprises.
|
| To this end, "devs just want to persist entities" makes me
| both sad and frustrated. Everyone seems to default to "just
| use Postgres," but then don't want to actually use an
| RDBMS. If you want a KV store, then use one. If you want a
| document store, then use one. Don't subject a relational
| database to storing giant JSON blobs with UUIDv4 keys, and
| then complain that it's slow because you've filled the
| memory with bloated indices, and saturated the disk
| bandwidth with massive write amplification.
___________________________________________________________________
(page generated 2024-06-02 23:02 UTC)