[HN Gopher] Pgagroal: High-performance connection pool for Postg...
       ___________________________________________________________________
        
       Pgagroal: High-performance connection pool for PostgreSQL
        
       Author : mastabadtomm
       Score  : 59 points
       Date   : 2021-11-22 06:56 UTC (16 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | joij2oi332 wrote:
       | and all that just because Postgresql team doesn't want to use
       | threads/tasks instead of heavy processes... I don't know any
       | other DBMS that require an additional infrastructure layer just
       | for connection pooling.
        
       | maxpert wrote:
       | > This report describe pgagroal in relationship to 3 other
       | PostgreSQL connection pool implementations, which we will call
       | "a", "b" and "c".
       | 
       | Why? Just why! Please put up names rather than saving faces for
       | other poolers!
        
         | iruoy wrote:
         | Also make the benchmark code open source, so that we can run it
         | on our own hardware and judge it's relevancy.
        
           | hagbarddenstore wrote:
           | Uh... https://www.postgresql.org/docs/current/pgbench.html ?
           | 
           | That's what's being used...
        
         | bigbillheck wrote:
         | > Why? Just why! Please put up names rather than saving faces
         | for other poolers!
         | 
         | It's Larry Ellison's fault:
         | https://en.wikipedia.org/wiki/David_DeWitt
        
           | oliwarner wrote:
           | This is an [interesting] joke, right?
        
             | bigbillheck wrote:
             | No? Oracle started putting clauses in their sales contracts
             | limiting publishing benchmarks because they performed
             | poorly in a published report.
        
               | oliwarner wrote:
               | I mean in this context. Seems implausible you could sneak
               | a DeWitt Clause into a FOSS license.
        
       | garyclarke27 wrote:
       | I'm curious, does anyone know how this compares to Yandex Odyssey
       | or Pgpool?
        
       | skrause wrote:
       | "PostgreSQL's model of one process per connection doesn't scale,
       | so let's build a connection pooler that uses one process per
       | connection."
       | 
       | Certainly sounds like a strange concept. How well does it scale
       | with thousands of connections?
        
         | api wrote:
         | The popularity and mindshare of Postgres really puzzles me.
         | It's so arcane and full of weird failure modes and gotchas. It
         | feels like something from a 1980s mainframe that would come
         | with a bound "field manual."
         | 
         | We can have file systems like zfs that for all but maybe the
         | largest and most exotic use cases are bulletproof and
         | relatively easy to administrate. Why can't we have SQL like
         | this?
         | 
         | I guess we do in the form of CockroachDB or Yugabyte, but
         | Postgres has been so aggressively evangelized try selling
         | someone on those. I know of one or two companies that use them
         | but it's a tough sell.
         | 
         | There are tons of overpriced Postgres managed cloud offerings
         | and consultancies, so my guess is that there is a heavy
         | financially motivated astroturf campaign.
        
           | Scarbutt wrote:
           | This forum is heavily anti-other-dbs and pro-postgres so you
           | won't get objective answers here. FWIW, at least three DBAs
           | that know both well have told me that postgres is an
           | operational nightmare compared to mysql.
        
             | endisneigh wrote:
             | In which ways?
        
               | Scarbutt wrote:
               | Don't know the exact details but they mentioned is full
               | of operational footguns, like using it slightly wrong can
               | utterly tank your performance or availability. Many
               | operations are unpredictable so is hard to know what to
               | expect.
               | 
               | Another reason they mentioned is InnoDB, in particular
               | that is the most battle-tested and well-engineered
               | storage engine. With enormous amounts of engineering
               | effort in hardening it for performance and for stability
               | and operability at scale by companies like Google and
               | Facebook.
               | 
               | They also like Vitess.
        
           | rkrzr wrote:
           | As a long-time happy user of Postgres, I am happy to answer
           | your questions:
           | 
           | > The popularity and mindshare of Postgres really puzzles me.
           | 
           | Why? Postgres really is "The World's Most Advanced Open
           | Source Relational Database", just like it says on the tin.
           | 
           | > It's so arcane and full of weird failure modes and gotchas.
           | 
           | Not really. It actually is the best-documented open source
           | database that behaves just like they say in the manual. The
           | Postgres manual is really quite wonderful, can be read back-
           | to-back, like a good book.
           | 
           | > Why can't we have SQL like this?
           | 
           | This has more to do with SQL itself than with Postgres. SQL
           | as a standard has evolved over so many decades that the
           | latest version of the spec has literally thousands of pages
           | [citation needed, couldn't quickly find a link]. Implementing
           | all of this, is an enormous task, and no relational DB
           | implements all of it.
           | 
           | > I guess we do in the form of CockroachDB or Yugabyte, but
           | Postgres has been so aggressively evangelized try selling
           | someone on those. I know of one or two companies that use
           | them but it's a tough sell.
           | 
           | Postgres covers ~95% of all use cases of your average
           | software company. Specialized DBs are really only useful in
           | that last ~5% of cases.
           | 
           | > There are tons of overpriced Postgres managed cloud
           | offerings and consultancies, so my guess is that there is a
           | heavy financially motivated astroturf campaign.
           | 
           | Postgres is quite easy to run yourself, completely free of
           | charge. There really isn't any "financially motivated
           | astroturf campaign" that I can discern.
        
             | djbusby wrote:
             | I've been using PG for like 20 years. Never paid, except
             | for odd donation here and there, last trigger by the add of
             | jsonb type.
        
           | qaq wrote:
           | You do realize that prob 80% of Yugobyte code is PostgreSQL?
        
             | api wrote:
             | Yes. The other 20% implements Raft consensus.
        
           | bayesian_horse wrote:
           | Some people feel like Postgres with its decades of history is
           | the safer bet. It's a known commodity. Or at least more known
           | than CockroachDB and Yugabyte.
        
           | gmac wrote:
           | First: have you used Postgres, and at what scale? Because
           | that hasn't been my experience of it.
           | 
           | Second: do you know something that's comparable but better?
        
             | api wrote:
             | I should have been more detailed.
             | 
             | Postgres works really well in single node configurations as
             | long as you read up a bit on the weirdness around
             | vacuuming. Usually you don't have to worry unless you are
             | running write-heavy loads at large scale.
             | 
             | HA Postgres is where things get frustrating and hairy. It's
             | 2021. I should be able to set up a master-master multi-node
             | database by running a second node and telling it where the
             | first node is, and then you're done. Master with read-only
             | backup should be similarly easy. If I shut down the "lead"
             | node a backup should become leader automatically. I should
             | be able to set this up in a day and not worry that it will
             | fail to fail over properly in a real scenario because there
             | are algorithms now like Raft consensus that are provably
             | correct (provided you meet their requirements).
             | 
             | IMHO this kind of easy clustering is a table stakes feature
             | for anything billing itself as an "enterprise capable"
             | database.
             | 
             | The fact that HA Postgres requires me to think in 2021 is
             | what makes me hate the thing. As near as I can tell this
             | problem exists to make me buy managed HA Postgres at 1000%
             | markup over compute/storage.
             | 
             | I can't complain too much because it's free and it is very
             | good for the things it does well. It's just puzzling to me
             | that there is no canonical turn-key solution for this after
             | so many years of so many people requiring it.
             | 
             | Implementing Raft is _far far_ easier than a lot of the
             | other stuff that has been implemented in Postgres.
        
               | whitepoplar wrote:
               | Check out pg_auto_failover! It's the easiest HA option
               | for Postgres atm.
        
         | jpgvm wrote:
         | Not the author but PG connection scalability being blamed on
         | per-process model is somewhat misunderstood. You can easily
         | have many hundreds of thousands of processes, they are mostly
         | equivalent to threads on Linux anyway (with some minor
         | differences ofc).
         | 
         | Instead the poor scalability stems from the amount of resources
         | allocated to each connection. If postgres was to switch to
         | threads without changing this architecture it's likely
         | scalability wouldn't be majorly improved.
         | 
         | By the same token a connection pooler that is fork based but
         | only allocates a very small amount of resources per connection
         | is still likely to a net win as it can do statement based
         | pooling of a smaller number of postgres connections.
        
           | anarazel wrote:
           | > You can easily have many hundreds of thousands of
           | processes, they are mostly equivalent to threads on Linux
           | anyway (with some minor differences ofc).
           | 
           | Not really. The page-table overhead of a 100k processes alone
           | is going to a eat up a _lot_ of memory - and you don 't have
           | that with threads. And with something like postgres the page
           | tables aren't going to be tiny, even with explicit use of
           | huge pages for applicable allocations. The decrease in TLB
           | hit ratio also is very painful. In a lot of workloads
           | postgres spends way too much time in TLB faults, and that's
           | to a good degree caused by the process model (also caused by
           | linux making it way too hard to map executables with huge
           | pages, itlb faults are a significant issue).
           | 
           | That's not to say that a 1 connection: 1 thread approach is
           | the right thing for a pooler - threads area also don't scale
           | great. I think there are good simplicity arguments to be made
           | for a 1 connection: 1 thread model for a database server
           | itself, but for poolers the benefits seem much smaller.
        
             | anarazel wrote:
             | (Although the sibling post pointing to my postgres
             | connection scalability post is right - processes are not
             | the major bottleneck at the moment, and certainly not with
             | < PG 14)
        
               | jpgvm wrote:
               | Yeah I agree wholeheartedly. What I was trying to get
               | across was that simply s/processes/threads isn't going to
               | make a massive difference to postgres if that was the
               | only change you were to make.
               | 
               | Your analysis seems pretty on-point with what I expected,
               | i.e snapshot scalability and other postgres specific arch
               | being the major bottlenecks for higher connection counts.
        
               | anarazel wrote:
               | FWIW, a good chunk of the snapshot scalability changes
               | are in PG 14, so the situation has gone a bit further
               | towards s/processes/threads/ being the primary
               | bottleneck. It depends quite a bit on whether you're
               | write-mostly (bottlenecks almost completely unrelated to
               | processes vs threads) or read-mostly. And in the latter
               | case whether it's a context switch heavy workload
               | (processes vs threads may be a significant factor), or
               | not (processes vs threads can be a significant factor,
               | but less likely).
        
           | kasey_junk wrote:
           | Pretty good discussion of Postgres' process bottlenecks:
           | https://www.citusdata.com/blog/2020/10/08/analyzing-
           | connecti...
           | 
           | Summary: The resource per process issue is isolated to just
           | the transaction isolation mechanism and can likely be
           | improved but you are never getting around the context
           | switching and memory locality issues of the per process
           | model.
        
       | anarazel wrote:
       | I do not get why it's a good idea for a connection pooler to go
       | for a process model. The whole "isolation and stability" argument
       | is doubtful for postgres itself, but it's imo pretty much
       | nonsensical for a pooler.
        
       | pcardoso wrote:
       | > Pronounced: p-g-a-gro-al, named after Agroal in Portugal.
       | 
       | Interesting!
        
         | hoseja wrote:
         | Sorry but that's clearly p-gag-roll, whatever the creators may
         | wish.
        
       | andruby wrote:
       | I'd love to read a comparison with PgBouncer (I assume that
       | PgBouncer is still the dominant connection pool)
        
         | toredash wrote:
         | AFAIK, it is.
        
       ___________________________________________________________________
       (page generated 2021-11-22 23:02 UTC)