[HN Gopher] Pgagroal: High-performance connection pool for Postg...
___________________________________________________________________
Pgagroal: High-performance connection pool for PostgreSQL
Author : mastabadtomm
Score : 59 points
Date : 2021-11-22 06:56 UTC (16 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| joij2oi332 wrote:
| and all that just because Postgresql team doesn't want to use
| threads/tasks instead of heavy processes... I don't know any
| other DBMS that require an additional infrastructure layer just
| for connection pooling.
| maxpert wrote:
| > This report describe pgagroal in relationship to 3 other
| PostgreSQL connection pool implementations, which we will call
| "a", "b" and "c".
|
| Why? Just why! Please put up names rather than saving faces for
| other poolers!
| iruoy wrote:
| Also make the benchmark code open source, so that we can run it
| on our own hardware and judge it's relevancy.
| hagbarddenstore wrote:
| Uh... https://www.postgresql.org/docs/current/pgbench.html ?
|
| That's what's being used...
| bigbillheck wrote:
| > Why? Just why! Please put up names rather than saving faces
| for other poolers!
|
| It's Larry Ellison's fault:
| https://en.wikipedia.org/wiki/David_DeWitt
| oliwarner wrote:
| This is an [interesting] joke, right?
| bigbillheck wrote:
| No? Oracle started putting clauses in their sales contracts
| limiting publishing benchmarks because they performed
| poorly in a published report.
| oliwarner wrote:
| I mean in this context. Seems implausible you could sneak
| a DeWitt Clause into a FOSS license.
| garyclarke27 wrote:
| I'm curious, does anyone know how this compares to Yandex Odyssey
| or Pgpool?
| skrause wrote:
| "PostgreSQL's model of one process per connection doesn't scale,
| so let's build a connection pooler that uses one process per
| connection."
|
| Certainly sounds like a strange concept. How well does it scale
| with thousands of connections?
| api wrote:
| The popularity and mindshare of Postgres really puzzles me.
| It's so arcane and full of weird failure modes and gotchas. It
| feels like something from a 1980s mainframe that would come
| with a bound "field manual."
|
| We can have file systems like zfs that for all but maybe the
| largest and most exotic use cases are bulletproof and
| relatively easy to administrate. Why can't we have SQL like
| this?
|
| I guess we do in the form of CockroachDB or Yugabyte, but
| Postgres has been so aggressively evangelized try selling
| someone on those. I know of one or two companies that use them
| but it's a tough sell.
|
| There are tons of overpriced Postgres managed cloud offerings
| and consultancies, so my guess is that there is a heavy
| financially motivated astroturf campaign.
| Scarbutt wrote:
| This forum is heavily anti-other-dbs and pro-postgres so you
| won't get objective answers here. FWIW, at least three DBAs
| that know both well have told me that postgres is an
| operational nightmare compared to mysql.
| endisneigh wrote:
| In which ways?
| Scarbutt wrote:
| Don't know the exact details but they mentioned is full
| of operational footguns, like using it slightly wrong can
| utterly tank your performance or availability. Many
| operations are unpredictable so is hard to know what to
| expect.
|
| Another reason they mentioned is InnoDB, in particular
| that is the most battle-tested and well-engineered
| storage engine. With enormous amounts of engineering
| effort in hardening it for performance and for stability
| and operability at scale by companies like Google and
| Facebook.
|
| They also like Vitess.
| rkrzr wrote:
| As a long-time happy user of Postgres, I am happy to answer
| your questions:
|
| > The popularity and mindshare of Postgres really puzzles me.
|
| Why? Postgres really is "The World's Most Advanced Open
| Source Relational Database", just like it says on the tin.
|
| > It's so arcane and full of weird failure modes and gotchas.
|
| Not really. It actually is the best-documented open source
| database that behaves just like they say in the manual. The
| Postgres manual is really quite wonderful, can be read back-
| to-back, like a good book.
|
| > Why can't we have SQL like this?
|
| This has more to do with SQL itself than with Postgres. SQL
| as a standard has evolved over so many decades that the
| latest version of the spec has literally thousands of pages
| [citation needed, couldn't quickly find a link]. Implementing
| all of this, is an enormous task, and no relational DB
| implements all of it.
|
| > I guess we do in the form of CockroachDB or Yugabyte, but
| Postgres has been so aggressively evangelized try selling
| someone on those. I know of one or two companies that use
| them but it's a tough sell.
|
| Postgres covers ~95% of all use cases of your average
| software company. Specialized DBs are really only useful in
| that last ~5% of cases.
|
| > There are tons of overpriced Postgres managed cloud
| offerings and consultancies, so my guess is that there is a
| heavy financially motivated astroturf campaign.
|
| Postgres is quite easy to run yourself, completely free of
| charge. There really isn't any "financially motivated
| astroturf campaign" that I can discern.
| djbusby wrote:
| I've been using PG for like 20 years. Never paid, except
| for odd donation here and there, last trigger by the add of
| jsonb type.
| qaq wrote:
| You do realize that prob 80% of Yugobyte code is PostgreSQL?
| api wrote:
| Yes. The other 20% implements Raft consensus.
| bayesian_horse wrote:
| Some people feel like Postgres with its decades of history is
| the safer bet. It's a known commodity. Or at least more known
| than CockroachDB and Yugabyte.
| gmac wrote:
| First: have you used Postgres, and at what scale? Because
| that hasn't been my experience of it.
|
| Second: do you know something that's comparable but better?
| api wrote:
| I should have been more detailed.
|
| Postgres works really well in single node configurations as
| long as you read up a bit on the weirdness around
| vacuuming. Usually you don't have to worry unless you are
| running write-heavy loads at large scale.
|
| HA Postgres is where things get frustrating and hairy. It's
| 2021. I should be able to set up a master-master multi-node
| database by running a second node and telling it where the
| first node is, and then you're done. Master with read-only
| backup should be similarly easy. If I shut down the "lead"
| node a backup should become leader automatically. I should
| be able to set this up in a day and not worry that it will
| fail to fail over properly in a real scenario because there
| are algorithms now like Raft consensus that are provably
| correct (provided you meet their requirements).
|
| IMHO this kind of easy clustering is a table stakes feature
| for anything billing itself as an "enterprise capable"
| database.
|
| The fact that HA Postgres requires me to think in 2021 is
| what makes me hate the thing. As near as I can tell this
| problem exists to make me buy managed HA Postgres at 1000%
| markup over compute/storage.
|
| I can't complain too much because it's free and it is very
| good for the things it does well. It's just puzzling to me
| that there is no canonical turn-key solution for this after
| so many years of so many people requiring it.
|
| Implementing Raft is _far far_ easier than a lot of the
| other stuff that has been implemented in Postgres.
| whitepoplar wrote:
| Check out pg_auto_failover! It's the easiest HA option
| for Postgres atm.
| jpgvm wrote:
| Not the author but PG connection scalability being blamed on
| per-process model is somewhat misunderstood. You can easily
| have many hundreds of thousands of processes, they are mostly
| equivalent to threads on Linux anyway (with some minor
| differences ofc).
|
| Instead the poor scalability stems from the amount of resources
| allocated to each connection. If postgres was to switch to
| threads without changing this architecture it's likely
| scalability wouldn't be majorly improved.
|
| By the same token a connection pooler that is fork based but
| only allocates a very small amount of resources per connection
| is still likely to a net win as it can do statement based
| pooling of a smaller number of postgres connections.
| anarazel wrote:
| > You can easily have many hundreds of thousands of
| processes, they are mostly equivalent to threads on Linux
| anyway (with some minor differences ofc).
|
| Not really. The page-table overhead of a 100k processes alone
| is going to a eat up a _lot_ of memory - and you don 't have
| that with threads. And with something like postgres the page
| tables aren't going to be tiny, even with explicit use of
| huge pages for applicable allocations. The decrease in TLB
| hit ratio also is very painful. In a lot of workloads
| postgres spends way too much time in TLB faults, and that's
| to a good degree caused by the process model (also caused by
| linux making it way too hard to map executables with huge
| pages, itlb faults are a significant issue).
|
| That's not to say that a 1 connection: 1 thread approach is
| the right thing for a pooler - threads area also don't scale
| great. I think there are good simplicity arguments to be made
| for a 1 connection: 1 thread model for a database server
| itself, but for poolers the benefits seem much smaller.
| anarazel wrote:
| (Although the sibling post pointing to my postgres
| connection scalability post is right - processes are not
| the major bottleneck at the moment, and certainly not with
| < PG 14)
| jpgvm wrote:
| Yeah I agree wholeheartedly. What I was trying to get
| across was that simply s/processes/threads isn't going to
| make a massive difference to postgres if that was the
| only change you were to make.
|
| Your analysis seems pretty on-point with what I expected,
| i.e snapshot scalability and other postgres specific arch
| being the major bottlenecks for higher connection counts.
| anarazel wrote:
| FWIW, a good chunk of the snapshot scalability changes
| are in PG 14, so the situation has gone a bit further
| towards s/processes/threads/ being the primary
| bottleneck. It depends quite a bit on whether you're
| write-mostly (bottlenecks almost completely unrelated to
| processes vs threads) or read-mostly. And in the latter
| case whether it's a context switch heavy workload
| (processes vs threads may be a significant factor), or
| not (processes vs threads can be a significant factor,
| but less likely).
| kasey_junk wrote:
| Pretty good discussion of Postgres' process bottlenecks:
| https://www.citusdata.com/blog/2020/10/08/analyzing-
| connecti...
|
| Summary: The resource per process issue is isolated to just
| the transaction isolation mechanism and can likely be
| improved but you are never getting around the context
| switching and memory locality issues of the per process
| model.
| anarazel wrote:
| I do not get why it's a good idea for a connection pooler to go
| for a process model. The whole "isolation and stability" argument
| is doubtful for postgres itself, but it's imo pretty much
| nonsensical for a pooler.
| pcardoso wrote:
| > Pronounced: p-g-a-gro-al, named after Agroal in Portugal.
|
| Interesting!
| hoseja wrote:
| Sorry but that's clearly p-gag-roll, whatever the creators may
| wish.
| andruby wrote:
| I'd love to read a comparison with PgBouncer (I assume that
| PgBouncer is still the dominant connection pool)
| toredash wrote:
| AFAIK, it is.
___________________________________________________________________
(page generated 2021-11-22 23:02 UTC)