[HN Gopher] Making Postgres scale
       ___________________________________________________________________
        
       Making Postgres scale
        
       Author : levkk
       Score  : 109 points
       Date   : 2025-03-14 17:07 UTC (5 hours ago)
        
 (HTM) web link (pgdog.dev)
 (TXT) w3m dump (pgdog.dev)
        
       | 999900000999 wrote:
       | I'm kind of interested in why we can't make a better database
       | with all of our modern technology.
       | 
       | Postgres is a fantastic workhorse, but it was also released in
       | the late 80s. Who, who among you will create the database of the
       | future... And not lock it behind bizarro licenses which force me
       | to use telemetry.
        
         | trescenzi wrote:
          | I guess I'd ask why something having been first released in
          | the late 80s, or any decade, is a positive or a negative? Some
          | things are still used because they solve the problems people
          | have. Some things are still used just because of industry
          | capture. I'm honestly not sure where I'd put Postgres.
         | 
         | Are there specific things you'd want from a modern database?
        
           | 999900000999 wrote:
           | Relating to the article, better scaling. Saying run it on a
           | bigger box is a very brute force way to optimize an
           | application.
           | 
            | While they come up with some other tricks here, that's
            | ultimately what scaling Postgres means.
            | 
            | If I imagine a better database, it would have native support
            | for scaling, a Postgres-compatible data layer, as well as
            | first-party support for NoSQL (JSONB columns don't cut it,
            | since if you have simultaneous writes, unpredictable behavior
            | tends to occur).
            | 
            | It also needs to have a permissive license.
        
             | throwaway7783 wrote:
             | Can you please expand on the JSONB unpredictable behavior?
             | We are about to embark on a journey to move some of our
             | data from MongoDB to postgres (with some JSONB). While we
              | don't have significant concurrent writes to a table, it
              | would be very helpful to understand the issues.
        
               | 999900000999 wrote:
               | https://github.com/shrinerb/shrine/discussions/665
               | 
               | I've never personally encountered this, but I've seen
               | other HN contributors mention it.
               | https://news.ycombinator.com/item?id=43189535
               | 
               | From what I can tell, unlike mongo, some postgres queries
               | will try to update the entire JSONB data object vs a
               | single field. This can lead to race conditions.
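                | 
                | A minimal sketch of the failure mode and the usual
                | fixes (assuming a hypothetical `docs` table with a
                | `data` JSONB column):
                | 
                |     -- Lost update: two writers read the whole object,
                |     -- modify it in app code, and write it back; the
                |     -- last write clobbers the other's changes.
                |     UPDATE docs SET data = '{"a": 1, "b": 2}'
                |     WHERE id = 1;
                | 
                |     -- Safer: mutate a single key atomically in one
                |     -- statement...
                |     UPDATE docs
                |     SET data = jsonb_set(data, '{a}', '1')
                |     WHERE id = 1;
                | 
                |     -- ...or hold a row lock around the whole
                |     -- read-modify-write cycle.
                |     BEGIN;
                |     SELECT data FROM docs WHERE id = 1 FOR UPDATE;
                |     -- modify in application code, then:
                |     UPDATE docs SET data = '{"a": 1, "b": 3}'
                |     WHERE id = 1;
                |     COMMIT;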
        
               | throwaway7783 wrote:
                | Ah, thanks. The first link seems to be something
                | specific to Shrine. The bottom line is that concurrent
                | updates to different parts of a JSONB value need
                | row-level locking for correct behavior in PostgreSQL.
                | This is not an important issue for us. Thank you for the
                | pointers.
        
             | aprdm wrote:
              | Why is it brute force, and why is it bad?
        
         | DoctorOW wrote:
         | Have you looked at CockroachDB? PostgreSQL compatibility with
         | modern comforts (e.g. easy cloud deployments, horizontal
         | scaling, memory safe language)
        
           | rednafi wrote:
           | Came here to say this. Cockroach solves the sharding issue by
           | adopting consistent hashing-based data distribution, as
           | described in the Dynamo paper. However, their cloud solution
           | is a bit expensive to get started with.
        
             | I_am_tiberius wrote:
             | Does CockroachdB already support ltree?
        
               | rednafi wrote:
               | It doesn't. It doesn't support most of the Postgres
               | extensions. We got away with it because we weren't doing
               | anything beyond vanilla Postgres.
        
         | HighlandSpring wrote:
         | There are "better" databases but they're better given some
         | particular definition that may not be relevant to your needs.
          | If SQL/the relational model and ACID semantics are what you
          | need, then postgres is simply the best in class. The fact it
          | dates back to the 80s is probably an advantage (requirement?)
          | when it comes to solving a problem really well.
        
         | wmf wrote:
         | Anyone who creates a better database is going to want to get
         | paid for it which either means DBaaS or those weird licenses.
        
         | thinkingtoilet wrote:
         | Postgres 17.4 was released last month. Show some respect to the
         | devs.
        
       | remram wrote:
       | Never heard of pgdog before. How does it compare to citus?
        
         | MuffinFlavored wrote:
         | What are the drawbacks of Citus/why isn't it perfect/what would
         | you look for from a competitor/alternative?
        
           | craigkerstiens wrote:
            | Citus works really well *if* you have your schema well
            | defined and slightly denormalized (meaning you have the shard
            | key materialized on every table), and you ensure you're
            | always joining on that as part of querying. For a lot of
            | existing applications that were not designed with this in
            | mind, it can be several months of database and application
            | code changes to get things into shape to work with Citus.
            | 
            | If you're designing from scratch and make it work with Citus
            | (specifically for a multi-tenant/SaaS sharded app), then it
            | can make scaling seem a bit magical.
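            | 
            | A rough sketch of that pattern (table names hypothetical;
            | create_distributed_table is the Citus call):
            | 
            |     -- Materialize the shard key (tenant_id) on every
            |     -- table.
            |     SELECT create_distributed_table('accounts', 'tenant_id');
            |     SELECT create_distributed_table('orders', 'tenant_id');
            | 
            |     -- Co-located join: both sides filter and join on the
            |     -- shard key, so the query routes to a single shard.
            |     SELECT *
            |     FROM orders o
            |     JOIN accounts a ON a.tenant_id = o.tenant_id
            |                    AND a.id = o.account_id
            |     WHERE o.tenant_id = 42;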
        
         | caffeinated_me wrote:
         | Seems like this is a similar philosophy, but is missing a bunch
         | of things the Citus coordinator provides. From the article, I'm
          | guessing Citus is better at cross-shard queries, SQL support,
          | central management of workers, keeping schemas in sync,
          | keeping small join tables in sync across the fleet, and
          | providing a single point of ingestion.
         | 
         | That being said, this does seem to handle replicas better than
         | Citus ever really did, and most of the features it's lacking
         | aren't relevant for the sort of multitenant use case this blog
          | is describing, so it's not a bad tradeoff. This also avoids the
          | coordinator as a central point of failure for both outages and
          | connection count limitations, though in practice we rarely saw
          | those become a problem.
        
           | levkk wrote:
           | We certainly have a way to go to support all cross-shard use
            | cases, especially complex aggregates (like percentiles). In
            | OLTP, which PgDog will focus on first, it's good to have a
            | sharding key and a single shard in mind, 99% of the time. The
           | 1% will be divided between easy things we already support,
           | like sorting, and slightly more complex things like
           | aggregates (avg, count, max, min, etc.), which are on the
           | roadmap.
           | 
           | For everything else, and until we cover what's left,
           | postgres_fdw can be a fallback. It actually works pretty
           | well.
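            | 
            | For reference, that fallback looks roughly like this
            | (server, schema and credential names hypothetical):
            | 
            |     CREATE EXTENSION IF NOT EXISTS postgres_fdw;
            | 
            |     CREATE SERVER shard_1
            |         FOREIGN DATA WRAPPER postgres_fdw
            |         OPTIONS (host 'shard1.internal', dbname 'app');
            | 
            |     CREATE USER MAPPING FOR CURRENT_USER SERVER shard_1
            |         OPTIONS (user 'app', password 'secret');
            | 
            |     -- Expose the shard's tables locally; queries against
            |     -- them are pushed down to the remote server.
            |     CREATE SCHEMA shard_1_tables;
            |     IMPORT FOREIGN SCHEMA public FROM SERVER shard_1
            |         INTO shard_1_tables;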
        
         | levkk wrote:
         | Great question and we will publish something for this later on.
         | TLDR: Citus doesn't provide an easy migration path for existing
         | DBs and the managed deployments are pretty much exclusive to
         | Azure. A large percentage of production DBs run elsewhere and
         | don't have an easy way to shard.
        
       | craigkerstiens wrote:
       | Probably as useful is the overview of what pgdog is and the docs.
       | From their docs[1]: "PgDog is a sharder, connection pooler and
       | load balancer for PostgreSQL. Written in Rust, PgDog is fast,
       | reliable and scales databases horizontally without requiring
       | changes to application code."
       | 
       | [1] https://docs.pgdog.dev/
        
       | saisrirampur wrote:
       | Interesting technology. Similar to Citus but not built as an
       | extension. The Citus coordinator, which is a Postgres database
       | with the Citus extension, is replaced by a proxy layer written in
       | Rust. That might provide more flexibility and velocity
       | implementing distributed planning and execution than being tied
       | to the extension ecosystem. It would indeed be a journey to catch
       | up with Postgres on compatibility, but it's a good start.
        
         | sroussey wrote:
         | So like the MySQL proxies of long ago?
         | 
         | There are definitely advantages of not running inside the
         | system you wish to orchestrate.
         | 
         | Better keep up with the parser though!
        
           | levkk wrote:
           | We use the Postgres parser directly, thanks to the great work
           | of pg_query [1].
           | 
           | [1] https://github.com/pganalyze/pg_query.rs
        
       | mindcrash wrote:
        | Well, of course it does! :)
       | 
       | Another (battle tested * ) solution is to deploy the (open
       | source) Postgres distribution created by Citus (subsidiary of
       | Microsoft) on nodes running on Ubuntu, Debian or Red Hat and you
       | are pretty much done: https://www.citusdata.com/product/community
       | 
        | Slap good old trusty PgBouncer in front of it if you want/need
        | (and you probably do) connection pooling:
       | https://www.citusdata.com/blog/2017/05/10/scaling-connection...
       | 
       | *) Citus was purchased by Microsoft more or less solely to
       | provide easy scale out on Azure through Cosmos DB for PostgreSQL
        
         | gigatexal wrote:
         | Is it really that easy? What are the edge cases?
        
           | levkk wrote:
            | It's not. We tried. Plus, it doesn't work on RDS, where
            | most production databases are. I think Citus was a great
            | first step in the right direction, but it's time to scale
            | the 99% of databases that don't already run on Citus on
            | Azure.
        
             | mindcrash wrote:
              | That's because Amazon wants to do whatever they like
              | themselves... you apparently can get stuff to work by
              | running your own masters (w/ the citus extension) on EC2,
              | backed by workers (Postgres) in RDS:
              | 
              | https://www.citusdata.com/blog/2015/07/15/scaling-
              | postgres-r... (note that this is an old blog post --
              | pg_shard has been succeeded by citus, but the architecture
              | diagram still applies)
              | 
              | I say "apparently" because I have no experience dealing
              | with large databases on AWS.
              | 
              | Personally, I've had no issues with Citus either, both on
              | bare metal/VMs and as SaaS on Azure...
        
           | caffeinated_me wrote:
           | Depends on your schema, really. The hard part is choosing a
            | distribution key to use for sharding -- if you've got
            | something like tenant ID that's in most of your queries and
            | big tables, it's pretty easy, but it can be a pain
            | otherwise.
        
             | mindcrash wrote:
             | Same pain as with good old (native) partitioning, right? :)
             | 
             | As with partitioning, in my experience something like a
             | common key (identifying data sets), tenant id and/or
             | partial date (yyyy-mm) work pretty great
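              | 
              | With native partitioning, the same kind of key looks
              | like this (a sketch; table and ranges hypothetical):
              | 
              |     CREATE TABLE events (
              |         tenant_id  bigint NOT NULL,
              |         created_at date   NOT NULL,
              |         payload    jsonb
              |     ) PARTITION BY RANGE (created_at);
              | 
              |     CREATE TABLE events_2025_03 PARTITION OF events
              |         FOR VALUES FROM ('2025-03-01') TO ('2025-04-01');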
        
               | caffeinated_me wrote:
               | For a multi-tenant use case, yeah, pretty close to
               | thinking about partitioning.
               | 
               | For other use cases, there can be big gains from cross-
               | shard queries that you can't really match with
               | partitioning, but that's super use case dependent and not
               | a guaranteed result.
        
       | rohan_ wrote:
       | Couldn't they have just moved to Aurora DSQL and saved all the
       | headache?
        
         | levkk wrote:
         | We actually found Aurora to be about 3x slower than community
         | PG for small queries. That was back then, maybe things are
         | better now. Migrating to another database (and Aurora is
         | Postgres-compatible, it's not Postgres) is very risky when
         | you've been using yours for years and know where the edge cases
         | are.
        
           | rednafi wrote:
           | This!
           | 
            | Migrations from Postgres to pg-compatible DBs are never as
            | smooth as they're advertised to be.
        
       | fmajid wrote:
       | Skype open-sourced their architecture way back, using PL/Proxy to
       | route calls based on shard. It works, is quite elegant, handled
       | 50% of all international phone calls in the noughties. My old
       | company used it to provide real-time analytics on about 300M
       | mobile devices.
       | 
       | https://wiki.postgresql.org/images/2/28/Moskva_DB_Tools.v3.p...
       | 
       | https://s3.amazonaws.com/apsalar_docs/presentations/Apsalar_...
        
         | Keyframe wrote:
         | _Skype has had from the beginning the requirement that all
         | database access must be implemented through stored procedures._
         | 
         | That presentation starts with hard violence.
        
           | Tostino wrote:
            | If the database team designed a thoughtful API with stored
            | procedures, this can actually be quite a nice way to
            | interact with a database for specific uses.
           | 
           | Being 100% hard and fast on that rule seems like a bad idea
           | though.
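            | 
            | The flavor of such an API, as a sketch (names
            | hypothetical):
            | 
            |     -- Callers never touch the tables directly; they get
            |     -- a stable, versionable interface instead.
            |     CREATE FUNCTION get_user_balance(p_user_id bigint)
            |     RETURNS numeric
            |     LANGUAGE sql STABLE
            |     AS $$
            |         SELECT balance FROM accounts
            |         WHERE user_id = p_user_id;
            |     $$;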
        
       | rednafi wrote:
       | Another option is going full-scale with CockroachDB. We had a
       | Django application backed by PostgreSQL, which we migrated to
       | CockroachDB using their official backend.
       | 
       | The data migration was a pain, but it was still less painful than
       | manually sharding the data or dealing with 3rd party extensions.
       | Since then, we've had a few hiccups with autogenerated migration
       | scripts, but overall, the experience has been quite seamless. We
       | weren't using any advanced PostgreSQL features, so CockroachDB
       | has worked well.
        
         | skunkworker wrote:
          | Unless their pricing has changed, it's quite exorbitant when
          | you need a lot of data -- to the point that one year of
          | CockroachDB would cost 5x the cost of the server it was
          | running on.
        
           | rednafi wrote:
           | This is still true. I wouldn't use Cockroach if it were my
           | own business. Also, they don't offer any free version to try
           | out the product. All you get is a short trial period and
           | that's it.
        
             | CharlesW wrote:
             | > _Also, they don't offer any free version to try out the
             | product._
             | 
             | The site makes it seems as if I can install CockroachDB on
             | Mac, Linux, or Windows and try it out for as long as I
             | like. https://www.cockroachlabs.com/docs/v25.1/install-
             | cockroachdb... Additionally, they claim CockroachDB Cloud
             | is free for use "up to 10 GiB of storage and 50M RUs per
             | organization per month".
        
               | rednafi wrote:
               | Oh yeah, you can run docker-compose and play with the
                | local version as long as you want. But their cloud
                | offerings are limited and quite expensive.
        
               | Ocha wrote:
                | You can use the official binary in a production
                | deployment - you just need to manage it yourself, like
                | you would manage Postgres.
        
           | CharlesW wrote:
           | I think you're referring to the CockroachDB Cloud DBaaS
           | offering vs. CockroachDB itself, correct?
        
             | rednafi wrote:
              | Not the parent, but yeah, most likely. Then again, you
              | probably don't want to maintain your own Cockroach fleet
              | deployment.
        
         | levkk wrote:
         | I'm glad you brought up the migration, because one of the main
         | goals behind our project is to automate migrating to a sharded
         | deployment. You can think of your DB as the base case
         | (num_shards = 1), and PgDog as the recursive solution.
        
           | rednafi wrote:
           | Automatic data transfer would be super cool to get out of the
           | box. We had a custom multi-tenancy solution for our
           | application that heavily used PostgreSQL schemas. One schema
           | for each customer.
           | 
            | It was a pain to get that working with Cockroach, since it
            | doesn't optimize cross-schema queries and suggests one DB
            | per customer. This was a deal breaker for us, and we had to
            | duplicate data to avoid cross-schema queries.
           | 
           | Being able to live within Postgres has its advantages.
        
         | traceroute66 wrote:
         | > Another option is going full-scale with CockroachDB
         | 
         | Just beware that CockroachDB is not a drop-in replacement for
         | PostgreSQL.
         | 
         | Last time I looked it was missing basic stuff. Like stored
         | functions. I don't call stored functions an "advanced feature".
        
       | aprdm wrote:
       | 99.9% of the companies in the world will never need more than 1
       | beefy box running postgres with a replica for a manual failover
       | and/or reads.
        
         | frollogaston wrote:
         | Availability is trickier than scalability. An async replica can
         | lose a little data during a failover, and a synchronous replica
         | is safer but slower. A company on some platform might not even
         | know which one they're using until it bites them.
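          | 
          | The relevant knob, for reference (standby name hypothetical):
          | 
          |     -- Replication is async by default; this makes commits
          |     -- wait for the named standby to confirm, trading
          |     -- latency for durability.
          |     ALTER SYSTEM SET synchronous_standby_names =
          |         'FIRST 1 (standby1)';
          |     SELECT pg_reload_conf();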
        
       | Eikon wrote:
        | I run a 100 billion+ row Postgres database [0] that is around
        | 16TB, and it's pretty painless!
       | 
       | There are a few tricks that make it run well (PostgreSQL compiled
       | with a non-standard block size, ZFS, careful VACUUM planning).
       | But nothing too out of the ordinary.
       | 
       | ATM, I insert about 150,000 rows a second, run 40,000
       | transactions a second, and read 4 million rows a second.
       | 
       | Isn't "Postgres does not scale" a strawman?
       | 
       | [0] https://www.merklemap.com/
        
         | rednafi wrote:
         | Damn, that's a chonky database. Have you written anything about
         | the setup? I'd love to know more-- is it running on a single
         | machine? How many reader and writer DBs? What does the
         | replication look like? What are the machine specs? Is it self-
         | hosted or on AWS?
         | 
         | By the way, really cool website.
        
           | Eikon wrote:
           | I'll try to get a blog post out soon!
           | 
           | > Damn, that's a chonky database. Have you written anything
           | about the setup? I'd love to know more-- is it running on a
           | single machine? How many reader and writer DBs? What does the
           | replication look like? What are the machine specs? Is it
           | self-hosted or on AWS?
           | 
           | It's self-hosted on bare metal, with standby replication,
           | normal settings, nothing "weird" there.
           | 
            | 6 NVMe drives in raidz-1, 1024GB of memory, a 96-core AMD
            | EPYC CPU.
           | 
            | A single database with no partitioning (I avoid PostgreSQL
            | partitioning as it complicates queries and weakens
            | constraint enforcement, and IMHO does not provide much
            | benefit outside of niche use-cases).
           | 
           | > By the way, really cool website.
           | 
           | Thank you!
        
             | stuartjohnson12 wrote:
             | > It's self-hosted on bare metal, with standby replication,
             | normal settings, nothing "weird" there.
             | 
             | I can build scalable data storage without a flexible
             | scalable redundant resilient fault-tolerant available
             | distributed containerized serverless microservice cloud-
             | native managed k8-orchestrated virtualized load balanced
             | auto-scaled multi-region pubsub event-based stateless
             | quantum-ready vectorized private cloud center? I won't
             | believe it.
        
               | tracker1 wrote:
               | +1 as I'm hoping this is sarcastic humor.
        
               | Keyframe wrote:
               | from riches to RAG.
        
             | rednafi wrote:
             | > I'll try to get a blog post out soon!
             | 
             | Please do.
             | 
             | > It's self-hosted on bare metal, with standby replication,
             | normal settings, nothing "weird" there.
             | 
              | 16TB with nothing weird is pretty impressive. Our devops
              | team reached for Aurora way before that.
              | 
              | > 6 NVMe drives in raidz-1, 1024GB of memory, a 96-core
              | AMD EPYC CPU.
              | 
              | Since you're self-hosted, I'm assuming you aren't on AWS.
              | How much is this setup costing you, if you don't mind
              | sharing?
             | 
             | > A single database with no partitioning (I avoid
             | PostgreSQL partitioning as it complicates queries and
             | weakens constraint enforcement, and IMHO does not provide
             | many benefits outside of niche use cases).
             | 
             | Beautiful!
        
               | Eikon wrote:
               | > Since you're self hosted, I'm you aren't on AWS. How
               | much is this setup costing you now if you don't mind
               | sharing.
               | 
               | About 28K euros of hardware per replica IIRC + colo
               | costs.
        
               | rednafi wrote:
               | Yearly, 28k Euros I presume.
               | 
               | Damn. I hope you make enough revenue to continue. This is
               | pretty impressive.
        
               | Eikon wrote:
               | No, one time + ongoing colocation costs.
        
               | tracker1 wrote:
                | The CPU itself is around $8-10k for a top-end AMD Epyc;
                | $15-20k for the rest of the server, including memory
                | and storage, is probably about right. There are still
                | $100k+ servers, but they tend to be AI equipment at
                | this point, not general-purpose stuff, which is sub
                | $30-50k now.
        
               | outworlder wrote:
               | > 16TB without nothing weird is pretty impressive. Our
               | devops team reached for Aurora way before that.
               | 
               | Probably depends on the usage patterns too. Our
               | developers commit atrocities in their 'microservices'
               | (which are not micro, or services, but that's another
               | discussion).
        
             | GordonS wrote:
             | I'm also self-hosting Postgres, and the project is getting
             | to the point where a standby would be a good idea to ensure
             | higher availability.
             | 
             | Did you use any particular guide for setting up
                | replication? Also, how do you handle failover/failback
                | to/from the standby, please?
        
               | Tostino wrote:
                | Not OP, but I managed a ~1TB Postgres install for
                | years. You should use something like pgbackrest or
                | barman to help with replication (replicas can pull WAL
                | from your backups when catching up), backups, and
                | failovers.
                | 
                | At least for pgbackrest, set up a spool directory,
                | which allows async WAL push/fetch.
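                | 
                | The wiring is roughly this (stanza name hypothetical):
                | 
                |     -- Ship WAL to the pgbackrest repo as it is
                |     -- produced.
                |     ALTER SYSTEM SET archive_command =
                |         'pgbackrest --stanza=main archive-push %p';
                |     SELECT pg_reload_conf();
                | 
                |     -- Replicas catch up from the repo with, e.g.:
                |     -- restore_command =
                |     --   'pgbackrest --stanza=main archive-get %f "%p"'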
        
             | tracker1 wrote:
              | > A single database with no partitioning (I avoid
              | PostgreSQL partitioning as it complicates queries and
              | weakens constraint enforcement, and IMHO does not provide
              | much benefit outside of niche use-cases).
              | 
              | That's kind of where I'm at now... you can vertically
              | scale a server so much now (compared to even a decade
              | ago) that there's really no need to bring in a lot of
              | complexity, IMO, for databases. Simple read replicas or a
              | hot spare should be sufficient for the vast majority of
              | use cases, and the hardware is way cheaper than a few
              | years ago, relatively speaking.
             | 
             | I spent a large part of the past decade and a half using
             | and understanding all the no-sql options (including
             | sharding with pg) and where they're better or not. At this
             | point my advice is start with PG, grow that DB as far as
             | real hardware will let you... if you grow to the point you
             | need more, then you have the money to deal with your use
             | case properly.
             | 
              | So few applications need to serve beyond a few million
              | simultaneous users, and if you avoid certain pitfalls,
              | it's not that hard. Especially if you're flexible enough
              | to leverage JSONB and a bit of denormalization for fewer
              | joins, you'll go a very, very long way.
        
               | arkh wrote:
               | > you can vertically scale a server so much now
               | 
               | And you often don't really need to.
               | 
                | Just last week, for some small application, I was
                | checking the performance of some queries and had to
                | generate random data on a dev setup -- a dockerized
                | Postgres (with no tuning at all) in a VM on a basic
                | Windows laptop. I inserted enough data to represent
                | what could maybe be there in 20 years (some tables got
                | half a billion rows; it's a small internal app). Still
                | no problem chugging along.
                | 
                | It's crazy when you compare what you can do with
                | databases now on modern hardware with how other
                | software doesn't feel like it has benefited as much.
                | Especially on the frontend side.
        
             | JohnBooty wrote:
             | Your replies are really valuable and informative. Thank you
             | so much.
             | 
             | Question - what is your peak utilization % like? How close
             | are you to saturating these boxes in terms of CPU etc?
        
               | Eikon wrote:
               | I'd say 60-70% overall cpu usage, including database,
               | ingest workers, web app and search.
               | 
               | > Your replies are really valuable and informative. Thank
               | you so much.
               | 
               | Thank you!
        
         | sa46 wrote:
         | I used to run a bunch of Postgres nodes at a similar scale. The
         | most painful parts (by far) were restoring to a new node and
         | major version upgrades.
         | 
         | Any tricks you used for those parts?
        
           | Eikon wrote:
           | > were restoring to a new node
           | 
           | Zfs send / recv or replication.
           | 
            | > and major version upgrades
           | 
           | Replication makes this pretty painless :)
        
             | wg0 wrote:
             | So run a replica? Is there more literature on that?
        
               | Eikon wrote:
               | > So run a replica?
               | 
               | Basically, yes.
               | 
               | https://www.postgresql.org/docs/current/runtime-config-
               | repli...
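                | 
                | The standby setup is small enough to sketch (hostnames
                | hypothetical; pg_basebackup's -R flag writes
                | primary_conninfo and standby.signal for you):
                | 
                |     -- On the primary:
                |     CREATE ROLE replicator WITH REPLICATION LOGIN
                |         PASSWORD 'secret';
                | 
                |     -- On the standby host (shell, not SQL):
                |     -- pg_basebackup -h primary.internal -U replicator
                |     --     -D "$PGDATA" -R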
        
         | marsavar wrote:
         | That's amazing! Could you elaborate more on your VACUUM
         | planning?
        
           | Eikon wrote:
           | Mostly, be careful with long-running transactions (hours
           | long), and modify your autovacuum settings to be as
           | aggressive as possible based on your workload. Be careful
           | with the freezing threshold too.
           | 
            | I ran into
            | 
            |     multixact "members" limit exceeded
            | 
            | quite a bit when starting out :)
        
             | whitepoplar wrote:
             | Can you recommend a good rule of thumb for autovacuum
             | settings given a large workload like yours?
        
               | Eikon wrote:
               | Here's mine:
               | 
                | - autovacuum_max_workers set to my number of tables
                | (only do so if you have enough IO capacity and CPU...)
               | 
               | - autovacuum_naptime 10s
               | 
               | - autovacuum_vacuum_cost_delay 1ms
               | 
               | - autovacuum_vacuum_cost_limit 2000
               | 
               | You probably should read
               | https://www.postgresql.org/docs/current/routine-
               | vacuuming.ht... it's pretty well written and easy to
               | parse!
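                | 
                | In SQL form, that's roughly (a sketch; the worker
                | count is whatever matches your table count):
                | 
                |     ALTER SYSTEM SET autovacuum_naptime = '10s';
                |     ALTER SYSTEM SET autovacuum_vacuum_cost_delay = '1ms';
                |     ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 2000;
                |     SELECT pg_reload_conf();
                | 
                |     -- autovacuum_max_workers needs a restart, not
                |     -- just a reload:
                |     ALTER SYSTEM SET autovacuum_max_workers = 16;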
        
               | ahoka wrote:
               | Do you ever need to VACUUM FULL?
        
               | Eikon wrote:
               | It's too slow at that scale, pg_squeeze works wonders
               | though.
               | 
               | I only "need" that because one of my table requires batch
               | deletion, and I want to reclaim the space. I need to
               | refactor that part. Otherwise nothing like that would be
               | required.
        
         | nesarkvechnep wrote:
         | Because no one else cares to ask, are you running FreeBSD? I
         | ask because you use ZFS.
        
           | Eikon wrote:
           | Debian!
        
             | nesarkvechnep wrote:
             | Damn, a stud! ZFS on Linux.
        
               | Eikon wrote:
               | Ahaha :)
        
         | shayonj wrote:
         | > I insert about 150,000 rows a second
         | 
          | That's amazing - I would love to know if you have done careful
          | data modeling, indexing, etc. that allows you to get to this,
          | and what kind of data is being inserted?
        
           | Eikon wrote:
            | I am not optimizing too much around insertion speed. I avoid
            | GIN, GIST and hash indexes.
            | 
            | The schema is nicely normalized.
            | 
            | I had trouble with hash indexes requiring hundreds of
            | gigabytes of memory to rebuild.
            | 
            | Postgres B-Trees are painless and (very) fast.
            | 
            | Eg. querying one table by id (redacted):
            | 
            |     EXPLAIN ANALYZE SELECT * FROM table_name
            |     WHERE id = [ID_VALUE];
            | 
            |     Index Scan using table_name_pkey on table_name
            |       (cost=0.71..2.93 rows=1 width=32)
            |       (actual time=0.042..0.042 rows=0 loops=1)
            |       Index Cond: (id = '[ID_VALUE]'::bigint)
            |     Planning Time: 0.056 ms
            |     Execution Time: 0.052 ms
            | 
            | Here's a zpool iostat 1:
            | 
            |     # zpool iostat 1
            |       operations      bandwidth
            |      read   write    read   write
            |      148K   183K   1.23G   2.43G
            |      151K   180K   1.25G   2.36G
            |      151K   177K   1.25G   2.33G
            |      148K   153K   1.23G   2.13G
        
         | kelafoja wrote:
          | One of the things I find challenging is understanding the
          | meaning of the word "scales". It is sometimes used differently
          | in different contexts.
          | 
          | Can it be performant in high-load situations? Certainly. Can it
          | elastically scale up and down based on demand? As far as I'm
          | aware, it cannot.
          | 
          | What I'm most interested in is how operations are handled. For
          | example, if it's deployed in a cloud environment and you need
          | more CPU and/or memory, you have to eat the downtime to scale
          | it up. What if it's deployed to bare metal and it cannot handle
          | the increasing load anymore? How costly (in terms of both time
          | and money) is it to migrate it to bigger hardware?
        
           | pluto_modadic wrote:
           | a database scaling dramatically up and down /under load/ and
           | expecting it to perform the same as steady state seems a bit
           | weird, vs a single, beefy database with a beefy ingest job
           | and a bunch of read only clients searching it?
           | 
           | like you're more likely to encounter two phases (building the
           | DB in heavy growth mode, and using the DB in light growth
           | heavy read mode).
           | 
           | A business that doesn't quite yet know what size the DB needs
           | to be has a frightening RDS bill incoming.
        
           | nine_k wrote:
           | When it "scales", it usually means "scales up". A scalable
           | solution is such that can withstand a large and increasing
           | load, past the usual limitations of obvious solutions.
           | 
            | Being _elastic_ is nice, but not always needed. In most
            | cases of database usage, downsizing never happens, or is
            | never expected to happen: logically, data is only added, and
            | any packaging and archiving exists only to keep the size
            | manageable.
        
         | MR4D wrote:
         | Having some issues with your numbers, but I'm probably just
         | missing something...
         | 
         | If you insert 150K rows per second, that's roughly 13 Billion
         | rows per day.
         | 
         | So you're inserting 10%+ of your database size every day?
         | 
         | That seems weird to me. Are you pruning somewhere? If not, is
         | your database less than a month old? I'm confused.
        
           | Eikon wrote:
           | Well, that's why I said "ATM", it's not a sustained rate, all
           | the time. And yes, there's a bunch of DELETEs too.
        
         | oa335 wrote:
         | Can you please share some tips and tricks for achieving such
         | high throughput?
        
         | tnorgaard wrote:
          | Super interesting that you compile pg, I assume with a block
          | size matching the ZFS block size! It was always on our todo
          | list to try, but we never got around to it. If possible, what
          | block size did you end up with? Have you tried ZFS direct IO
          | in 2.3.x, and if so, could you share any findings? Thanks for
          | sharing - and cool website!
        
           | Eikon wrote:
           | I don't think Postgres will be able to benefit from direct
           | io? I might be wrong though!
           | 
           | I use Postgres with 32K BLKSZ.
           | 
            | I am actually using the default 128K ZFS recordsize; in a
            | mixed workload, I found overall performance nicer than
            | matching at 32K, and compression is way better.
           | 
           | > Thanks for sharing - and cool website!
           | 
           | Thank you!
        
         | frollogaston wrote:
         | "Postgres does not scale" means that you can't just give it
         | more machines, which is true. At some point you've got the
         | fastest available machine, maybe you've played with the
         | settings, and that's it. Then you can embark on the kind of
         | manual sharding journey the article describes.
         | 
         | But most of the time, an RDBMS is the right tool for the job
         | anyway, you just have to deal with it.
        
           | Eikon wrote:
           | > "Postgres does not scale" means that you can't just give it
           | more machines, which is true.
           | 
           | Well, it's only true for writes.
        
             | frollogaston wrote:
             | It's still true for reads if you need them to be fully
             | consistent.
        
       | kristianpaul wrote:
        | Logical replication works both ways, that's a good start.
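        | 
        | A sketch of the two-way setup (names hypothetical; the origin
        | filter needs PostgreSQL 16+ to avoid echoing rows back):
        | 
        |     -- On node A:
        |     CREATE PUBLICATION pub_a FOR TABLE orders;
        | 
        |     -- On node B:
        |     CREATE SUBSCRIPTION sub_a
        |         CONNECTION 'host=node-a dbname=app'
        |         PUBLICATION pub_a
        |         WITH (origin = none);
        | 
        | The mirror-image publication/subscription pair runs in the
        | other direction.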
        
       | briankelly wrote:
        | People frequently talk about scale as a single dimension (and
        | usually volume, as it relates to users), but that can be an
        | oversimplification for many kinds of applications. For
        | instance, when you're considering non-trivial partitioning
        | schemes (like when there is high coupling between entities of
        | the same kind, as you see in graphs), you should consider
        | alternatives like the Bigtable-inspired DBs, since those are
        | (relatively) more batteries-included for you.
       | 
       | > It's funny to write this. The Internet contains at least 1 (or
       | maybe 2) meaty blog posts about how this is done
       | 
       | It would've been great to link those here. I'm guessing one
       | refers to StackOverflow which has/had one of the more famous
       | examples of scaled Postgres.
        
         | levkk wrote:
         | I was thinking of the Instagram post years ago. And maybe the
         | Instacart one.
        
           | briankelly wrote:
           | Maybe this one for Instagram? https://instagram-
           | engineering.com/sharding-ids-at-instagram-...
        
       | octernion wrote:
       | i was reading through this and was going "huh this sounds
       | familiar" until i read who wrote it :)
       | 
       | neat piece of tech! excited to try it out.
        
       | fourseventy wrote:
       | I run a postgresql db with a few billion rows at about 2TB right
       | now. We don't need sharding yet but when we do I was considering
       | Citus. Does anyone have experience implementing Citus that could
       | comment?
        
       | eximius wrote:
       | Something I don't see in the pgdog documentation is how cross-
       | shard joins work. Okay, if I do a simple `select * from users
       | order by id`, you'll in-memory order the combined results for me.
       | But if I have group by and aggregations and such? Will it resolve
       | that correctly?
        
         | levkk wrote:
         | Aggregates are a work in progress. We're going to implement
         | them in this order:
         | 
         | 1. count
         | 
         | 2. max, min, sum
         | 
         | 3. avg (needs a query rewrite to include count)
         | 
         | Eventually, we'll do all of these:
         | https://www.postgresql.org/docs/current/functions-
          | aggregate..... If you've got a specific use case, reach out
          | and we'll prioritize.
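          | 
          | The avg rewrite, as a sketch: the router can't average
          | per-shard averages, so it asks each shard for the pieces and
          | combines them itself (table name hypothetical).
          | 
          |     -- Sent to every shard:
          |     SELECT sum(price) AS s, count(price) AS c FROM items;
          | 
          |     -- Combined by the router:
          |     -- avg = sum(s over shards) / sum(c over shards)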
        
           | eximius wrote:
            | Heh, no chance I can introduce this at work, and it's hard
            | to have a personal project requiring it. :)
            | 
            | I think you probably need some documentation to the effect
            | of the current state of affairs, as well as prescriptions
            | for how to work around it. _Most_ live workloads, even if
            | the total dataset is huge, have a pretty small working set.
            | So limiting DB operations to simple fetches and doing any
            | complex operations in memory is viable, but it should be
            | prescribed as the solution, or people will consider its
            | omission a fault instead of a choice.
        
             | levkk wrote:
             | No worries. It's early days, the code and docs are just a
             | few months in the making. I'm happy to keep you updated on
             | the progress. If you want, send your contact info to
             | hi@pgdog.dev.
             | 
             | - Lev
        
       | akshayshah wrote:
       | Apart from being backed by Postgres instead of MySQL, is this
       | different from Vitess (and its commercial vendor PlanetScale)?
       | 
       | https://vitess.io/
        
         | levkk wrote:
         | The goal for this project is to be analogous to Vitess for
         | Postgres.
        
       | Karupan wrote:
        | Tangentially related: is there a good guide or set of setup
        | scripts for running self-hosted Postgres with backups and a
        | secondary standby? I just want something I can deploy to a
        | VPS/dedicated box for all my side projects.
        | 
        | If not, is Supabase the most painless way to get started?
        
       ___________________________________________________________________
       (page generated 2025-03-14 23:01 UTC)