[HN Gopher] Making Postgres scale
___________________________________________________________________
Making Postgres scale
Author : levkk
Score : 109 points
Date : 2025-03-14 17:07 UTC (5 hours ago)
(HTM) web link (pgdog.dev)
(TXT) w3m dump (pgdog.dev)
| 999900000999 wrote:
| I'm kind of interested in why we can't make a better database
| with all of our modern technology.
|
| Postgres is a fantastic workhorse, but it was also released in
| the late 80s. Who, who among you will create the database of the
| future... And not lock it behind bizarro licenses which force me
| to use telemetry.
| trescenzi wrote:
| I guess I'd ask why something having been first released in
| the late 80s, or any decade, is a positive or a negative? Some
| things are still used because they solve the problems people
| have. Some things are still used just because of industry
| capture. I'm not honestly sure where I'd put Postgres.
|
| Are there specific things you'd want from a modern database?
| 999900000999 wrote:
| Relating to the article, better scaling. Saying run it on a
| bigger box is a very brute force way to optimize an
| application.
|
| While they come up with some other tricks here, that's
| ultimately what scaling Postgres means.
|
| If I imagine a better database, it would have native support
| for scaling, a Postgres-compatible data layer, as well as
| first-party support for NoSQL (JSONB columns don't cut it,
| since unpredictable behavior tends to occur if you have
| simultaneous writes).
|
| It also needs to have a permissive license
| throwaway7783 wrote:
| Can you please expand on the JSONB unpredictable behavior?
| We are about to embark on a journey to move some of our
| data from MongoDB to postgres (with some JSONB). While we
| don't have significant concurrent writes to a table, would
| be very helpful to understand the issues
| 999900000999 wrote:
| https://github.com/shrinerb/shrine/discussions/665
|
| I've never personally encountered this, but I've seen
| other HN contributors mention it.
| https://news.ycombinator.com/item?id=43189535
|
| From what I can tell, unlike mongo, some postgres queries
| will try to update the entire JSONB data object vs a
| single field. This can lead to race conditions.
| throwaway7783 wrote:
| Ah, thanks. The first link seems something specific to
| Shrine. The bottom line is that concurrent updates to
| different parts of a JSONB value need row-level locking for
| correct behavior in PostgreSQL. This is not an important issue
| for us. Thank you for the pointers
| aprdm wrote:
| Why is it brute force, and why is it bad?
| DoctorOW wrote:
| Have you looked at CockroachDB? PostgreSQL compatibility with
| modern comforts (e.g. easy cloud deployments, horizontal
| scaling, memory safe language)
| rednafi wrote:
| Came here to say this. Cockroach solves the sharding issue by
| adopting consistent hashing-based data distribution, as
| described in the Dynamo paper. However, their cloud solution
| is a bit expensive to get started with.
| I_am_tiberius wrote:
| Does CockroachdB already support ltree?
| rednafi wrote:
| It doesn't. It doesn't support most of the Postgres
| extensions. We got away with it because we weren't doing
| anything beyond vanilla Postgres.
| HighlandSpring wrote:
| There are "better" databases but they're better given some
| particular definition that may not be relevant to your needs.
| If SQL/the relational model and ACID semantics is what you need
| then postgres is simply the best in class. The fact it dates
| back to the 80s is probably an advantage (requirement?) when it
| comes to solving a problem really well
| wmf wrote:
| Anyone who creates a better database is going to want to get
| paid for it which either means DBaaS or those weird licenses.
| thinkingtoilet wrote:
| Postgres 17.4 was released last month. Show some respect to the
| devs.
| remram wrote:
| Never heard of pgdog before. How does it compare to citus?
| MuffinFlavored wrote:
| What are the drawbacks of Citus/why isn't it perfect/what would
| you look for from a competitor/alternative?
| craigkerstiens wrote:
| Citus works really well *if* you have your schema well
| defined and slightly denormalized (meaning you have the shard
| key materialized on every table), and you ensure you're
| always joining on that as part of querying. For a lot of
| existing applications that were not designed with this in
| mind, it can be several months of database and application
| code changes to get things into shape to work with Citus.
|
| If you're designing from scratch and make it work with Citus
| then (specifically for a multi-tenant/SaaS sharded app) it
| can make scaling seem a bit magical.
| caffeinated_me wrote:
| Seems like this is a similar philosophy, but is missing a bunch
| of things the Citus coordinator provides. From the article, I'm
| guessing Citus is better at cross-shard queries, SQL support,
| central management of workers, keeping schemas in sync, and
| keeping small join tables in sync across the fleet, and
| provides a single point of ingestion.
|
| That being said, this does seem to handle replicas better than
| Citus ever really did, and most of the features it's lacking
| aren't relevant for the sort of multitenant use case this blog
| is describing, so it's not a bad tradeoff. This also avoids the
| coordinator as a central point of failure for both outages and
| connection count limitations, but we never really saw those be
| a problem often in practice.
| levkk wrote:
| We certainly have a way to go to support all cross-shard use
| cases, especially complex aggregates (like percentiles). In
| OLTP, where PgDog will focus on first, it's good to have a
| sharding key and a single shard in mind, 99% of the time. The
| 1% will be divided between easy things we already support,
| like sorting, and slightly more complex things like
| aggregates (avg, count, max, min, etc.), which are on the
| roadmap.
|
| For everything else, and until we cover what's left,
| postgres_fdw can be a fallback. It actually works pretty
| well.
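For readers who haven't used the fallback mentioned in that comment: postgres_fdw ships with Postgres and lets one node query another through foreign tables. A rough sketch, with placeholder host, database, and credential names:

```sql
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

-- One foreign server per shard (host/dbname are placeholders).
CREATE SERVER shard_1
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'shard1.internal', dbname 'app');

CREATE USER MAPPING FOR CURRENT_USER
    SERVER shard_1
    OPTIONS (user 'app', password 'secret');

-- Expose the shard's tables locally, then query them like local tables.
CREATE SCHEMA shard_1_tables;
IMPORT FOREIGN SCHEMA public FROM SERVER shard_1 INTO shard_1_tables;
```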
| levkk wrote:
| Great question and we will publish something for this later on.
| TLDR: Citus doesn't provide an easy migration path for existing
| DBs and the managed deployments are pretty much exclusive to
| Azure. A large percentage of production DBs run elsewhere and
| don't have an easy way to shard.
| craigkerstiens wrote:
| Probably as useful is the overview of what pgdog is and the docs.
| From their docs[1]: "PgDog is a sharder, connection pooler and
| load balancer for PostgreSQL. Written in Rust, PgDog is fast,
| reliable and scales databases horizontally without requiring
| changes to application code."
|
| [1] https://docs.pgdog.dev/
| saisrirampur wrote:
| Interesting technology. Similar to Citus but not built as an
| extension. The Citus coordinator, which is a Postgres database
| with the Citus extension, is replaced by a proxy layer written in
| Rust. That might provide more flexibility and velocity
| implementing distributed planning and execution than being tied
| to the extension ecosystem. It would indeed be a journey to catch
| up with Postgres on compatibility, but it's a good start.
| sroussey wrote:
| So like the MySQL proxies of long ago?
|
| There are definitely advantages of not running inside the
| system you wish to orchestrate.
|
| Better keep up with the parser though!
| levkk wrote:
| We use the Postgres parser directly, thanks to the great work
| of pg_query [1].
|
| [1] https://github.com/pganalyze/pg_query.rs
| mindcrash wrote:
| Well, of course it does! :)
|
| Another (battle-tested*) solution is to deploy the (open
| source) Postgres distribution created by Citus (subsidiary of
| Microsoft) on nodes running on Ubuntu, Debian or Red Hat and you
| are pretty much done: https://www.citusdata.com/product/community
|
| Slap good old trusty PgBouncer in front of it if you want/need
| (and you probably do) connection pooling:
| https://www.citusdata.com/blog/2017/05/10/scaling-connection...
|
| *) Citus was purchased by Microsoft more or less solely to
| provide easy scale out on Azure through Cosmos DB for PostgreSQL
| gigatexal wrote:
| Is it really that easy? What are the edge cases?
| levkk wrote:
| It's not. We tried. Plus, it doesn't work on RDS, where most
| production databases are. I think Citus was a great first
| step in the right direction, but it's time to scale the 99%
| of databases that don't already run Citus on Azure.
| mindcrash wrote:
| That's because Amazon wants to do whatever they like
| themselves... you apparently can get stuff to work by
| running your own masters (w/ citus extension) in EC2 backed
| by workers (Postgres RDS) in RDS:
|
| https://www.citusdata.com/blog/2015/07/15/scaling-
| postgres-r... (note that this is an old blog post --
| pg_shard has been succeeded by citus, but the architecture
| diagram still applies)
|
| I say "apparently" because I have no experience dealing
| with large databases on AWS.
|
| Personally, I had no issues with Citus either, both on bare
| metal/VMs and as SaaS on Azure...
| caffeinated_me wrote:
| Depends on your schema, really. The hard part is choosing a
| distribution key to use for sharding- if you've got something
| like tenant ID that's in most of your queries and big tables,
| it's pretty easy, but can be a pain otherwise.
| mindcrash wrote:
| Same pain as with good old (native) partitioning, right? :)
|
| As with partitioning, in my experience something like a
| common key (identifying data sets), tenant id and/or
| partial date (yyyy-mm) work pretty great
| caffeinated_me wrote:
| For a multi-tenant use case, yeah, pretty close to
| thinking about partitioning.
|
| For other use cases, there can be big gains from cross-
| shard queries that you can't really match with
| partitioning, but that's super use case dependent and not
| a guaranteed result.
| rohan_ wrote:
| Couldn't they have just moved to Aurora DSQL and saved all the
| headache?
| levkk wrote:
| We actually found Aurora to be about 3x slower than community
| PG for small queries. That was back then, maybe things are
| better now. Migrating to another database (and Aurora is
| Postgres-compatible, it's not Postgres) is very risky when
| you've been using yours for years and know where the edge cases
| are.
| rednafi wrote:
| This!
|
| Migrations from Postgres to pg-compatible DBs are never as
| smooth as they're advertised to be.
| fmajid wrote:
| Skype open-sourced their architecture way back, using PL/Proxy to
| route calls based on shard. It works, is quite elegant, and
| handled 50% of all international phone calls in the
| noughties. My old
| company used it to provide real-time analytics on about 300M
| mobile devices.
|
| https://wiki.postgresql.org/images/2/28/Moskva_DB_Tools.v3.p...
|
| https://s3.amazonaws.com/apsalar_docs/presentations/Apsalar_...
| Keyframe wrote:
| _Skype has had from the beginning the requirement that all
| database access must be implemented through stored procedures._
|
| That presentation starts with hard violence.
| Tostino wrote:
| If the database team designed a thoughtful API with stored
| procedures, this can actually be a quite nice way to interact
| with a database for specific uses.
|
| Being 100% hard and fast on that rule seems like a bad idea
| though.
| rednafi wrote:
| Another option is going full-scale with CockroachDB. We had a
| Django application backed by PostgreSQL, which we migrated to
| CockroachDB using their official backend.
|
| The data migration was a pain, but it was still less painful than
| manually sharding the data or dealing with 3rd party extensions.
| Since then, we've had a few hiccups with autogenerated migration
| scripts, but overall, the experience has been quite seamless. We
| weren't using any advanced PostgreSQL features, so CockroachDB
| has worked well.
| skunkworker wrote:
| Unless their pricing has changed, it's quite exorbitant when
| you need a lot of data. To the point that one year of
| cockroachdb would cost 5x the cost of the server it was running
| on.
| rednafi wrote:
| This is still true. I wouldn't use Cockroach if it were my
| own business. Also, they don't offer any free version to try
| out the product. All you get is a short trial period and
| that's it.
| CharlesW wrote:
| > _Also, they don't offer any free version to try out the
| product._
|
| The site makes it seem as if I can install CockroachDB on
| Mac, Linux, or Windows and try it out for as long as I
| like. https://www.cockroachlabs.com/docs/v25.1/install-
| cockroachdb... Additionally, they claim CockroachDB Cloud
| is free for use "up to 10 GiB of storage and 50M RUs per
| organization per month".
| rednafi wrote:
| Oh yeah, you can run docker-compose and play with the
| local version as long as you want. But their cloud offerings
| are limited and quite expensive.
| Ocha wrote:
| You can use official binary in production deployment -
| you just need to manage it yourself like you would manage
| Postgres.
| CharlesW wrote:
| I think you're referring to the CockroachDB Cloud DBaaS
| offering vs. CockroachDB itself, correct?
| rednafi wrote:
| Not the parent, but yeah, most likely. Then again, you
| probably don't want to maintain your own fleet of
| Cockroach nodes.
| levkk wrote:
| I'm glad you brought up the migration, because one of the main
| goals behind our project is to automate migrating to a sharded
| deployment. You can think of your DB as the base case
| (num_shards = 1), and PgDog as the recursive solution.
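The base-case framing can be sketched in a few lines: route each row by hashing its sharding key modulo the shard count, so with num_shards = 1 every key lands on the one existing database. Python's built-in hash stands in here for whatever hash function the router actually uses:

```python
def shard_for(key: int, num_shards: int) -> int:
    """Toy router: pick a shard by hashing the sharding key."""
    return hash(key) % num_shards

# With one shard, routing is a no-op: every key maps to shard 0,
# which is exactly the unsharded database.
assert all(shard_for(k, 1) == 0 for k in range(1000))

# With 4 shards, keys spread across shards 0..3.
assert {shard_for(k, 4) for k in range(1000)} == {0, 1, 2, 3}
```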
| rednafi wrote:
| Automatic data transfer would be super cool to get out of the
| box. We had a custom multi-tenancy solution for our
| application that heavily used PostgreSQL schemas. One schema
| for each customer.
|
| It was a pain to get that working with Cockroach since it
| doesn't optimize cross schema queries and suggests one DB per
| customer. This was a deal breaker for us and we had to
| duplicate data to avoid cross schema queries.
|
| Being able to live within Postgres has its advantages.
| traceroute66 wrote:
| > Another option is going full-scale with CockroachDB
|
| Just beware that CockroachDB is not a drop-in replacement for
| PostgreSQL.
|
| Last time I looked it was missing basic stuff. Like stored
| functions. I don't call stored functions an "advanced feature".
| aprdm wrote:
| 99.9% of the companies in the world will never need more than 1
| beefy box running postgres with a replica for a manual failover
| and/or reads.
| frollogaston wrote:
| Availability is trickier than scalability. An async replica can
| lose a little data during a failover, and a synchronous replica
| is safer but slower. A company on some platform might not even
| know which one they're using until it bites them.
| Eikon wrote:
| I run a 100 billion+ rows Postgres database [0], that is around
| 16TB, it's pretty painless!
|
| There are a few tricks that make it run well (PostgreSQL compiled
| with a non-standard block size, ZFS, careful VACUUM planning).
| But nothing too out of the ordinary.
|
| ATM, I insert about 150,000 rows a second, run 40,000
| transactions a second, and read 4 million rows a second.
|
| Isn't "Postgres does not scale" a strawman?
|
| [0] https://www.merklemap.com/
| rednafi wrote:
| Damn, that's a chonky database. Have you written anything about
| the setup? I'd love to know more-- is it running on a single
| machine? How many reader and writer DBs? What does the
| replication look like? What are the machine specs? Is it self-
| hosted or on AWS?
|
| By the way, really cool website.
| Eikon wrote:
| I'll try to get a blog post out soon!
|
| > Damn, that's a chonky database. Have you written anything
| about the setup? I'd love to know more-- is it running on a
| single machine? How many reader and writer DBs? What does the
| replication look like? What are the machine specs? Is it
| self-hosted or on AWS?
|
| It's self-hosted on bare metal, with standby replication,
| normal settings, nothing "weird" there.
|
| 6 NVMe drives in raidz-1, 1024GB of memory, a 96 core AMD
| EPYC cpu.
|
| A single database with no partitioning (I avoid PostgreSQL
| partitioning as it complicates queries and weakens constraint
| enforcement, and IMHO does not provide many benefits outside
| of niche use cases).
|
| > By the way, really cool website.
|
| Thank you!
| stuartjohnson12 wrote:
| > It's self-hosted on bare metal, with standby replication,
| normal settings, nothing "weird" there.
|
| I can build scalable data storage without a flexible
| scalable redundant resilient fault-tolerant available
| distributed containerized serverless microservice cloud-
| native managed k8-orchestrated virtualized load balanced
| auto-scaled multi-region pubsub event-based stateless
| quantum-ready vectorized private cloud center? I won't
| believe it.
| tracker1 wrote:
| +1 as I'm hoping this is sarcastic humor.
| Keyframe wrote:
| from riches to RAG.
| rednafi wrote:
| > I'll try to get a blog post out soon!
|
| Please do.
|
| > It's self-hosted on bare metal, with standby replication,
| normal settings, nothing "weird" there.
|
| 16TB with nothing weird is pretty impressive. Our devops
| team reached for Aurora way before that.
|
| > 6 NVMe drives in raidz-1, 1024GB of memory, a 96-core AMD
| EPYC CPU.
|
| Since you're self-hosted, I'm assuming you aren't on AWS.
| How much is this setup costing you now, if you don't mind
| sharing?
|
| > A single database with no partitioning (I avoid
| PostgreSQL partitioning as it complicates queries and
| weakens constraint enforcement, and IMHO does not provide
| many benefits outside of niche use cases).
|
| Beautiful!
| Eikon wrote:
| > Since you're self-hosted, I'm assuming you aren't on AWS.
| How much is this setup costing you now if you don't mind
| sharing?
|
| About 28K euros of hardware per replica IIRC + colo
| costs.
| rednafi wrote:
| Yearly, 28k Euros I presume.
|
| Damn. I hope you make enough revenue to continue. This is
| pretty impressive.
| Eikon wrote:
| No, one time + ongoing colocation costs.
| tracker1 wrote:
| The CPU itself is around $8-10k for a top-end AMD Epyc,
| $15-20k for the rest of the server, including memory and
| storage is probably about right. There are still $100k+
| servers, but they tend to be AI equipment at this point,
| not the general purpose stuff, which is sub $30-50k now.
| outworlder wrote:
| > 16TB with nothing weird is pretty impressive. Our
| devops team reached for Aurora way before that.
|
| Probably depends on the usage patterns too. Our
| developers commit atrocities in their 'microservices'
| (which are not micro, or services, but that's another
| discussion).
| GordonS wrote:
| I'm also self-hosting Postgres, and the project is getting
| to the point where a standby would be a good idea to ensure
| higher availability.
|
| Did you use any particular guide for setting up
| replication? Also, how do you handle failover/fallback
| to/from standby please?
| Tostino wrote:
| Not OP, but managed a ~1tb Postgres install for years.
| You should use something like pgbackrest or barman to
| help with both replication (replicas can pull WAL from
| your backups when catching up), backups, and failovers.
|
| At least for pgbackrest, set up a spool directory which
| allows async wal push / fetch.
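For reference, the spool setup is a couple of lines of pgBackRest configuration (stanza name and paths here are illustrative, not from the thread):

```ini
# /etc/pgbackrest/pgbackrest.conf
[global]
repo1-path=/var/lib/pgbackrest
# async archiving: WAL is pushed/fetched via a local spool directory
archive-async=y
spool-path=/var/spool/pgbackrest

[main]
pg1-path=/var/lib/postgresql/17/main
```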
| tracker1 wrote:
| > A single database with no partitioning (I avoid
| PostgreSQL partitioning as it complicates queries and
| weakens constraint enforcement, and IMHO does not provide
| many benefits outside of niche use cases).
|
| That's kind of where I'm at now... you can vertically scale
| a server so much now (compared to even a decade ago) that
| there's really no need, IMO, to bring in a lot of complexity
| for databases. Simple read replicas or a hot spare should be
| sufficient for the vast majority of use cases and the
| hardware is way cheaper than a few years ago, relatively
| speaking.
|
| I spent a large part of the past decade and a half using
| and understanding all the no-sql options (including
| sharding with pg) and where they're better or not. At this
| point my advice is start with PG, grow that DB as far as
| real hardware will let you... if you grow to the point you
| need more, then you have the money to deal with your use
| case properly.
|
| So few applications need to serve beyond a few million
| simultaneous users, and if you avoid certain pitfalls, it's
| not that hard. Especially if you're flexible enough to
| leverage
| JSONB and a bit of denormalization for fewer joins, you'll
| go a very, very long way.
| arkh wrote:
| > you can vertically scale a server so much now
|
| And you often don't really need to.
|
| Just last week, for some small application, I was checking
| the performance of some queries against random data on a
| dev setup, which is a dockerized postgres (with no tuning
| at all) in a VM on a basic windows laptop. I inserted
| enough data to represent what could maybe be there in 20
| years (some tables got half a billion rows; it's a small
| internal app). Still no problem chugging along.
|
| It is crazy to compare what you can do with databases on
| modern hardware now with how other software doesn't feel
| like it has benefited as much. Especially on the frontend
| side.
| JohnBooty wrote:
| Your replies are really valuable and informative. Thank you
| so much.
|
| Question - what is your peak utilization % like? How close
| are you to saturating these boxes in terms of CPU etc?
| Eikon wrote:
| I'd say 60-70% overall cpu usage, including database,
| ingest workers, web app and search.
|
| > Your replies are really valuable and informative. Thank
| you so much.
|
| Thank you!
| sa46 wrote:
| I used to run a bunch of Postgres nodes at a similar scale. The
| most painful parts (by far) were restoring to a new node and
| major version upgrades.
|
| Any tricks you used for those parts?
| Eikon wrote:
| > were restoring to a new node
|
| Zfs send / recv or replication.
|
| > I used to run a bunch of Postgres nodes at a similar scale.
| The most painful parts (by far) were restoring to a new node
| and major version upgrades. Any tricks you used for those
| parts?
|
| Replication makes this pretty painless :)
| wg0 wrote:
| So run a replica? Is there more literature on that?
| Eikon wrote:
| > So run a replica?
|
| Basically, yes.
|
| https://www.postgresql.org/docs/current/runtime-config-
| repli...
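For anyone wanting the short version of those docs, a minimal physical-replication setup looks roughly like this (hostnames, user, and paths are placeholders):

```shell
# On the primary (postgresql.conf), the defaults are already close:
#   wal_level = replica
#   max_wal_senders = 10
# plus a pg_hba.conf entry allowing the standby's replication user.

# On the standby: clone the primary; -R writes standby.signal and
# primary_conninfo so the node starts as a streaming replica.
pg_basebackup -h primary.internal -U replicator \
    -D /var/lib/postgresql/17/main -R
```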
| marsavar wrote:
| That's amazing! Could you elaborate more on your VACUUM
| planning?
| Eikon wrote:
| Mostly, be careful with long-running transactions (hours
| long), and modify your autovacuum settings to be as
| aggressive as possible based on your workload. Be careful
| with the freezing threshold too.
|
| I ran into multixact "members" limit exceeded quite a bit
| when starting out :)
| whitepoplar wrote:
| Can you recommend a good rule of thumb for autovacuum
| settings given a large workload like yours?
| Eikon wrote:
| Here's mine:
|
| - autovacuum_max_workers to my number of tables (Only do
| so if you have enough IO capacity and CPU...).
|
| - autovacuum_naptime 10s
|
| - autovacuum_vacuum_cost_delay 1ms
|
| - autovacuum_vacuum_cost_limit 2000
|
| You probably should read
| https://www.postgresql.org/docs/current/routine-
| vacuuming.ht... it's pretty well written and easy to
| parse!
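Collected into postgresql.conf form, the settings above look like this (the worker count is whatever matches your table count and IO headroom; 16 is just a placeholder):

```ini
autovacuum_max_workers = 16         # roughly one per hot table, IO permitting
autovacuum_naptime = 10s
autovacuum_vacuum_cost_delay = 1ms
autovacuum_vacuum_cost_limit = 2000
```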
| ahoka wrote:
| Do you ever need to VACUUM FULL?
| Eikon wrote:
| It's too slow at that scale, pg_squeeze works wonders
| though.
|
| I only "need" that because one of my tables requires batch
| deletion, and I want to reclaim the space. I need to
| refactor that part. Otherwise nothing like that would be
| required.
| nesarkvechnep wrote:
| Because no one else cares to ask, are you running FreeBSD? I
| ask because you use ZFS.
| Eikon wrote:
| Debian!
| nesarkvechnep wrote:
| Damn, a stud! ZFS on Linux.
| Eikon wrote:
| Ahaha :)
| shayonj wrote:
| > I insert about 150,000 rows a second
|
| That's amazing - I would love to know if you have done careful
| data modeling, indexing, etc. that allows you to get to this,
| and what kind of data is being inserted?
| Eikon wrote:
| I am not optimizing too much around insertion speed. I avoid
| GIN, GIST and hash indexes.
|
| The schema is nicely normalized.
|
| I had trouble with hash indexes requiring hundreds of
| gigabytes of memory to rebuild.
|
| Postgres B-Trees are painless and (very) fast.
|
| Eg. querying one table by id (redacted):
| EXPLAIN ANALYZE SELECT * FROM table_name WHERE id = [ID_VALUE];
|
| Index Scan using table_name_pkey on table_name
|   (cost=0.71..2.93 rows=1 width=32)
|   (actual time=0.042..0.042 rows=0 loops=1)
|   Index Cond: (id = '[ID_VALUE]'::bigint)
| Planning Time: 0.056 ms
| Execution Time: 0.052 ms
|
| Here's a zpool iostat 1:
|
| # zpool iostat 1
|        operations        bandwidth
|        read   write      read   write
|        -----  -----      -----  -----
|        148K   183K       1.23G  2.43G
|        151K   180K       1.25G  2.36G
|        151K   177K       1.25G  2.33G
|        148K   153K       1.23G  2.13G
| kelafoja wrote:
| One of the things I find challenging is understanding the
| meaning of the word "scales". It is sometimes used
| differently in different contexts.
|
| Can it be performant in high-load situations? Certainly. Can
| it elastically scale up and down based on demand? As far as
| I'm aware, it cannot.
|
| What I'm most interested in is how operations are handled. For
| example, if it's deployed in a cloud environment and you need
| more CPU and/or memory, you have to eat the downtime to scale
| it up. What if it's deployed to bare metal and it cannot handle
| the increasing load anymore? How costly (in terms of both time
| and money) is it to migrate it to bigger hardware?
| pluto_modadic wrote:
| a database scaling dramatically up and down /under load/ and
| expecting it to perform the same as steady state seems a bit
| weird, vs a single, beefy database with a beefy ingest job
| and a bunch of read only clients searching it?
|
| like you're more likely to encounter two phases (building the
| DB in heavy growth mode, and using the DB in light growth
| heavy read mode).
|
| A business that doesn't quite yet know what size the DB needs
| to be has a frightening RDS bill incoming.
| nine_k wrote:
| When it "scales", it usually means "scales up". A scalable
| solution is such that can withstand a large and increasing
| load, past the usual limitations of obvious solutions.
|
| Being _elastic_ is nice, but not always needed. In most cases
| of database usage, downsizing never happens, nor is it
| expected to: logically, data are only added, and any
| packaging and archiving only exists to keep the size
| manageable.
| MR4D wrote:
| Having some issues with your numbers, but I'm probably just
| missing something...
|
| If you insert 150K rows per second, that's roughly 13 Billion
| rows per day.
|
| So you're inserting 10%+ of your database size every day?
|
| That seems weird to me. Are you pruning somewhere? If not, is
| your database less than a month old? I'm confused.
| Eikon wrote:
| Well, that's why I said "ATM", it's not a sustained rate, all
| the time. And yes, there's a bunch of DELETEs too.
| oa335 wrote:
| Can you please share some tips and tricks for achieving such
| high throughput?
| tnorgaard wrote:
| Super interesting compiling pg, I assume, with same as the zfs
| block size! It was always on our todo to try, but never got
| around to it. If possible, what block size did you end up with?
| Have you tried zfs direct io in 2.3.x, if so, could you share
| any findings? Thanks for sharing - and cool website!
| Eikon wrote:
| I don't think Postgres will be able to benefit from direct
| io? I might be wrong though!
|
| I use Postgres with a 32K BLCKSZ.
|
| I am actually using the default 128K zfs recordsize; in a
| mixed workload, I found overall performance nicer than
| matching at 32K, and compression is way better.
|
| > Thanks for sharing - and cool website!
|
| Thank you!
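For reference, the page size is a compile-time option (the value is in kilobytes, and 32 is the maximum); the ZFS side is a per-dataset property. The dataset name is a placeholder:

```shell
# Build Postgres with 32K pages instead of the default 8K.
./configure --with-blocksize=32
make && make install

# Keep the ZFS dataset at the default 128K recordsize, with compression.
zfs set recordsize=128K compression=lz4 tank/pgdata
```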
| frollogaston wrote:
| "Postgres does not scale" means that you can't just give it
| more machines, which is true. At some point you've got the
| fastest available machine, maybe you've played with the
| settings, and that's it. Then you can embark on the kind of
| manual sharding journey the article describes.
|
| But most of the time, an RDBMS is the right tool for the job
| anyway, you just have to deal with it.
| Eikon wrote:
| > "Postgres does not scale" means that you can't just give it
| more machines, which is true.
|
| Well, it's only true for writes.
| frollogaston wrote:
| It's still true for reads if you need them to be fully
| consistent.
| kristianpaul wrote:
| Logical replication works both ways, that's a good start.
| briankelly wrote:
| People talk about scale frequently as a single dimension (and
| usually volume as it relates to users) but that can be
| oversimplifying for many kinds of applications. For instance, as
| you are thinking about non-trivial partitioning schemes (like if
| there is high coupling between entities of the same kind - as you
| see in graphs) is when you should consider alternatives like the
| Bigtable-inspired DBs, since those are (relatively) more
| batteries included for you.
|
| > It's funny to write this. The Internet contains at least 1 (or
| maybe 2) meaty blog posts about how this is done
|
| It would've been great to link those here. I'm guessing one
| refers to StackOverflow which has/had one of the more famous
| examples of scaled Postgres.
| levkk wrote:
| I was thinking of the Instagram post years ago. And maybe the
| Instacart one.
| briankelly wrote:
| Maybe this one for Instagram? https://instagram-
| engineering.com/sharding-ids-at-instagram-...
| octernion wrote:
| i was reading through this and was going "huh this sounds
| familiar" until i read who wrote it :)
|
| neat piece of tech! excited to try it out.
| fourseventy wrote:
| I run a postgresql db with a few billion rows at about 2TB right
| now. We don't need sharding yet but when we do I was considering
| Citus. Does anyone have experience implementing Citus that could
| comment?
| eximius wrote:
| Something I don't see in the pgdog documentation is how cross-
| shard joins work. Okay, if I do a simple `select * from users
| order by id`, you'll in-memory order the combined results for me.
| But if I have group by and aggregations and such? Will it resolve
| that correctly?
| levkk wrote:
| Aggregates are a work in progress. We're going to implement
| them in this order:
|
| 1. count
|
| 2. max, min, sum
|
| 3. avg (needs a query rewrite to include count)
|
| Eventually, we'll do all of these:
| https://www.postgresql.org/docs/current/functions-
| aggregate..... If you got a specific use case, reach out and
| we'll prioritize.
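The ordering makes sense once you look at how per-shard partials merge: count, max, min, and sum combine directly, while avg has to be rebuilt from sum and count (hence the query rewrite). A toy illustration of the merge step, not PgDog's actual code:

```python
# Each shard returns partial aggregates for the same query.
shard_results = [
    {"count": 3, "sum": 30, "max": 15},
    {"count": 1, "sum": 50, "max": 50},
]

# count, sum, and max merge trivially across shards.
total_count = sum(r["count"] for r in shard_results)   # 4
total_sum = sum(r["sum"] for r in shard_results)       # 80
overall_max = max(r["max"] for r in shard_results)     # 50

# avg must be rebuilt from sum/count; averaging the per-shard
# averages (10 and 50 -> 30) would be wrong.
overall_avg = total_sum / total_count                  # 20.0
assert overall_avg != (30 / 3 + 50 / 1) / 2
```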
| eximius wrote:
| Heh, no chance I can introduce this at work, and it's hard
| to have a personal project requiring it. :)
|
| I think you probably need some documentation to the effect of
| the current state of affairs, as well as prescriptions as to
| how to work around it. _Most_ live workloads, even if the
| total dataset is huge, have a pretty small working set. So
| limiting DB operations to simple fetches and doing any
| complex operations in memory is viable, but should be
| prescribed as the solution, or people will consider its
| omission a fault instead of a choice.
| levkk wrote:
| No worries. It's early days, the code and docs are just a
| few months in the making. I'm happy to keep you updated on
| the progress. If you want, send your contact info to
| hi@pgdog.dev.
|
| - Lev
| akshayshah wrote:
| Apart from being backed by Postgres instead of MySQL, is this
| different from Vitess (and its commercial vendor PlanetScale)?
|
| https://vitess.io/
| levkk wrote:
| The goal for this project is to be analogous to Vitess for
| Postgres.
| Karupan wrote:
| Tangentially related: is there a good guide or setup scripts to
| run self hosted Postgres with backups and secondary standby? Like
| I just want something I can deploy to a VPS/dedicated box for all
| my side projects.
|
| If not is supabase the most painless way to get started?
___________________________________________________________________
(page generated 2025-03-14 23:01 UTC)