[HN Gopher] Understanding Connections and Pools
       ___________________________________________________________________
        
       Understanding Connections and Pools
        
       Author : sudhirj
       Score  : 194 points
       Date   : 2021-01-05 12:03 UTC (10 hours ago)
        
 (HTM) web link (sudhir.io)
 (TXT) w3m dump (sudhir.io)
        
       | orware wrote:
        | Not that many people may be using Oracle and PHP together, but
        | last summer I decided to take a closer look at some performance
        | aspects, since I want to move certain PHP things to the cloud
        | while our Oracle database remains local, which would introduce
        | some extra latency.
       | 
        | In some basic testing on my local machine at work, just
        | connecting to our local Oracle database introduced a good 60-100
        | ms per request (before running any sort of queries), and due to
        | PHP's nature this cost recurs on every new request.
       | 
        | Over the past months/years I had searched for options that would
        | give PHP some sort of connection pool the way languages like
        | Java/Go do, but had come up short each time.
       | 
        | I don't even recall how I came across it, but last July I
        | finally learned about Oracle's Database Resident Connection
        | Pooling feature (and then reminded my coworker in November about
        | wanting to try it out) and was able to run some tests. It does a
        | good job of eliminating most of that unnecessary connection
        | waiting time for PHP applications, so I was pretty happy with
        | the testing, and it didn't seem to increase the load much at all
        | on the database server side.
       | 
        | Even though it might not be as ideal as a truly separate
        | connection pool managed at the application server level, or
        | outside of the database, a connection pool available at the
        | database level still has benefits, so it would be neat to see
        | more of this sort of approach become available from other
        | vendors (if it isn't already).
       | 
        | EDIT: Forgot to add that, as a result of the DRCP testing: even
        | though moving the PHP side to the cloud introduces extra
        | latency, the time savings from DRCP reduce latency enough that
        | the overall result is still faster than the normal/conventional
        | connection approach with the PHP code running locally. For me
        | that was a win (hopefully I'll get to try using it in production
        | in the near future, but I'm waiting on my colleague to sort out
        | a few things with enabling it there at the moment).
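        | 
        | To see that recurring connect cost in isolation, here's a rough
        | sketch of the measurement (in Go only because it's compact; the
        | driver, DSN and output are placeholders, and the idea is the
        | same for Oracle):
        | 
        |   package main
        |   
        |   import (
        |       "context"
        |       "database/sql"
        |       "fmt"
        |       "time"
        |   
        |       _ "github.com/lib/pq" // placeholder Postgres driver
        |   )
        |   
        |   func main() {
        |       ctx := context.Background()
        |   
        |       // Cold connect: what a pool-less PHP-style process pays
        |       // on every single request.
        |       start := time.Now()
        |       db, err := sql.Open("postgres", "postgres://localhost/app")
        |       if err != nil {
        |           panic(err)
        |       }
        |       if err := db.PingContext(ctx); err != nil {
        |           panic(err) // Ping forces a real connection to open
        |       }
        |       fmt.Println("cold connect:", time.Since(start))
        |   
        |       // Warm reuse: the pool hands back the already-open
        |       // connection, so this should be near-instant.
        |       start = time.Now()
        |       if err := db.PingContext(ctx); err != nil {
        |           panic(err)
        |       }
        |       fmt.Println("pooled reuse:", time.Since(start))
        |   }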
        
       | [deleted]
        
       | kohlerm wrote:
        | IMHO a very good summary of the trade-offs, with concrete
        | examples.
        
       | sudhirj wrote:
       | Author here, self-posted. Can AMA.
        
         | artimaeis wrote:
         | Just 6 days ago you published what was a pretty great writeup
         | on message queues, now this thorough dive on connections &
         | pools; what's your writing schedule like? At some 12k words of
         | fairly accurate and precise technical content I'm kind of blown
         | away by your pace.
        
           | sudhirj wrote:
           | I had a few paragraphs written earlier (maybe half) over the
           | course of the pandemic.
           | 
           | I spent a lot of the pandemic procrastinating by trying to
           | build my own blogging software, a more efficient version of
            | Ghost, but it's too much duplicated effort given what I
            | learned at AWS re:Invent. So I decided to just set up and
           | start publishing.
           | 
            | I write for about two hours a day, maybe four days of
            | writing and two of editing for each article. I'd like to
            | publish once
           | a week, but don't want to set that kind of goal because then
           | I'll write even if I have nothing interesting to say. But
           | maybe that's the best process anyway?
           | 
           | Will tell you what's working if you ask again in a couple of
           | months :-/
        
             | johnonolan wrote:
             | Founder of Ghost here - would love to hear what parts you
             | were hoping to make more efficient / if you have any
             | suggestions :)
        
               | sudhirj wrote:
                | I was trying to make it multi-tenant, move it to a
                | serverless database like DynamoDB, and rewrite the
                | server in Go.
               | 
               | I'm trying to make a Ghost hosting service that's free /
               | $5 a month for personal use. I've figured out a way to
               | make the economics work with the existing 500MB disk +
               | 1GB RAM requirements, but it would have been a lot easier
               | with Go and Dynamo.
               | 
               | Thanks a ton to you and the team for the work you're
               | doing. I think you're making a fantastic system, and want
               | to make it available to more people as a personal
                | blogging platform. I just wish it could run
                | multi-tenant or with lower memory needs.
               | 
               | I don't suppose making it easier for others to compete
               | with the official Ghost revenue model is a priority, so I
               | doubt this is anything you'll want to take up. But I'm
               | happy to share revenues on this - I just think Ghost
               | should be a lot more accessible to individuals around the
               | world.
        
               | johnonolan wrote:
                | I'm not clear how those changes would make it more
                | accessible, or what problem you're actually trying to
                | solve. There
               | are lots of things we're working on improving but I
               | promise you changing the programming language of any of
               | those components wouldn't make any difference to the end
               | experience of the software.
               | 
               | A lot of blogging platforms (including Ghost) start out
               | at $5/mo targeting personal use -- but the ones that
               | survive never stay there. Probably some worthwhile
               | takeaways from that :)
        
               | sudhirj wrote:
               | Yeah, the Go decision was more because of familiarity and
               | some tooling I have available for Dynamo. The real
                | problem I was tackling is zero-marginal-cost
                | multi-tenancy. Right now a new Ghost installation costs
                | 500MB on disk and 1GB in RAM, and I would like it to
                | cost one row in an existing database instead.
               | 
               | And yeah, don't think the $5 / month is sustainable for a
               | company, but for a solo founder it might work. Of course,
               | that might be a classic case of _this time it's
               | different_.
        
         | pqdbr wrote:
         | Great article!
         | 
         | We have a Rails app that uses Puma with MySQL, but so far
          | single-threaded (the Puma config has `threads 1,1`), because
          | we migrated from Unicorn and are still worried about
          | thread-safety.
         | 
         | Currently our database.yml pool size is set to 25, because we
         | have 25 puma workers on the server.
         | 
          | Reading your article, this seems to be overkill, since that
          | setting is _per worker_, and since each worker is running
          | with only 1 thread, the default 5 should be more than enough?
         | 
         | Also, I just checked and our MySQL is configured with 151 max
         | connections; if every worker of ours can use up to 25
          | connections, that could mean up to 625 connections to the
         | database, and that's without considering Sidekiq (concurrency
         | 25).
        
           | sudhirj wrote:
           | Well, even the default 5 would probably be overkill, unless
           | you're spawning threads on a request that each use their own
            | DB connections. But yes, I think you can go with the
           | default. Once you're confident about being threadsafe, the
           | Heroku deployment guide for puma is pretty good for all the
           | little knobs to turn.
        
           | sudhirj wrote:
           | Re MySQL counts, based on your notes you're probably using 50
           | conns at max whatever your settings are. 25 for sidekiq
           | (assuming you do run a hot queue) and 25 at max for the
            | servers. Irrespective of DB pool size, I think Rails does
            | lazy connections, so you're never going to go over 1
            | connection per thread anyway.
        
         | mekoka wrote:
         | I'm confused by this statement about pgBouncer et al.
         | 
         |  _These systems allow you to make as many database connections
         | as you want and manage them inefficiently, because the
         | connections they give you are fake & cheap simulated
         | connections that they handle with low overhead._
         | 
         | What makes them inefficient then, if they're cheap and have low
         | management overhead?
        
           | nickelbob wrote:
            | I think he meant that "you" can manage them inefficiently
            | (or not manage them at all) because the proxy pool will take
            | care of that for you. It reads awkwardly.
        
             | sudhirj wrote:
             | Yeah, will edit that. But yes, you don't need to worry
             | about managing them because they're so cheap.
        
         | fotta wrote:
          | OT: do you plan on adding an RSS feed? Would love to
          | subscribe.
        
           | hansvm wrote:
           | In case the author is worried about sufficient readership, I
           | second this motion :)
        
           | sudhirj wrote:
            | I thought this blog engine already had one; I certainly
            | didn't disable anything. Let me check.
        
             | fotta wrote:
             | the "Subscribe Now" button takes me to a login page
        
               | sudhirj wrote:
               | Yeah, that's the email subscription, substack style. Not
               | RSS.
        
           | sudhirj wrote:
            | Yeah, it's active at https://sudhir.io/rss/ - pasting the
            | base URL into a feed reader should usually work, since
            | readers check the /rss path as a convention.
        
             | fotta wrote:
             | fantastic, thank you!
        
         | est wrote:
         | > the popular servers for Python, PHP and other languages that
         | use a global lock or assume single-threaded single-process
         | execution
         | 
          | This is not true for Python. The article you linked mentions
         | Gunicorn, uWSGI and CherryPy, they are threaded and can handle
         | multiple requests per process.
         | 
          | Even if Python does have this GIL thing, it does not lock
          | network operations for a TCP/UDP-based database protocol
          | (though some C-based modules do tend to hold the GIL).
        
           | sudhirj wrote:
            | Will edit that, yeah. I am struggling to explain the GIL
            | without having to talk about the GIL, because if I open that
            | can of worms it becomes a much longer article and goes off
            | track.
            | 
            | All of the GIL and event-loop based stacks go multi-threaded
            | at the lower library levels, really. Will think about how to
            | explain that. And will move Python off that general
            | category.
        
         | sankyo wrote:
         | Thank you for this. Very few people can explain complicated
         | ideas with easy to follow language. You have a gift for
         | explaining.
        
           | sudhirj wrote:
           | Thank you for saying that. I enjoy helping people understand
           | things in minutes, especially if those things took me years.
           | 
            | Being stuck at home, the only way to keep doing that is
            | writing and videos. And I have a pretty bad stammer, so
            | videos aren't an option.
        
         | tjvc wrote:
         | Great article, many thanks.
         | 
         | > Rails auto-management needs to be conservative and cautious,
         | therefore inefficient
         | 
         | Can you help me understand this better? Are you saying that
         | Rails tends to hold onto connections for longer than it needs
         | to?
        
         | toredash wrote:
         | Am I reading this correctly, that with NodeJS and PostgreSQL,
         | the usage of multiple connection poolers _combined_ is a not-
         | so-good idea?
         | 
         | Example, using Knex as the connection pooler for your node app,
         | which is connected to your PgBouncer deployment?
         | 
          | I had not considered this a fault; in one way I find it
          | compelling to have a layer of "protection" by using Knex in
          | addition to PgBouncer, but now I'm not that sure anymore.
        
           | sudhirj wrote:
           | It depends on how knex is used. If it gives you a method to
           | run a DB query, and internally checks out a connection, runs
           | the query and releases the connection, then it's already
           | doing exactly what would happen with a proxy.
           | 
           | If it or you are setting up middleware to start a transaction
           | before every request, or put a checked out connection in the
           | request context, that's a lot of possible inefficiency. If
           | you really want the convenience this offers, and have request
           | flows that do a lot of other non-DB work, you'll need a
           | pooling proxy.
           | 
           | If your request handling is very database heavy and does
           | nothing else, then automatic request level connection check
            | out is still efficient, even without a proxy. Even if you
            | have a proxy, one real connection is going to be working
            | for the entire duration of the request anyway.
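            | 
            | As a rough sketch of those two patterns (using Go's
            | database/sql, which is itself a pool, purely because it
            | makes the contrast short; the DSN and query are
            | placeholders):
            | 
            |   package main
            |   
            |   import (
            |       "context"
            |       "database/sql"
            |   
            |       _ "github.com/lib/pq" // placeholder Postgres driver
            |   )
            |   
            |   func main() {
            |       ctx := context.Background()
            |       db, err := sql.Open("postgres",
            |           "postgres://localhost/app")
            |       if err != nil {
            |           panic(err)
            |       }
            |   
            |       // Pattern 1: per-query checkout. The pool lends a
            |       // connection for exactly the duration of the query,
            |       // then takes it back - what a proxy would also do.
            |       var n int
            |       err = db.QueryRowContext(ctx,
            |           "SELECT count(*) FROM users").Scan(&n)
            |       if err != nil {
            |           panic(err)
            |       }
            |   
            |       // Pattern 2: a transaction pins one connection for
            |       // its whole lifetime - any non-DB work between Begin
            |       // and Commit holds that connection hostage.
            |       tx, err := db.BeginTx(ctx, nil)
            |       if err != nil {
            |           panic(err)
            |       }
            |       // ... slow non-DB work here keeps it checked out ...
            |       if err := tx.Commit(); err != nil {
            |           panic(err)
            |       }
            |   }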
        
             | toredash wrote:
              | Thanks for the feedback. I have some reading to do and
              | some code to check out.
        
         | victor106 wrote:
          | In-depth article; it would be good if you could cover
          | Java-based connection pools like HikariCP. They are very
          | widely used.
        
           | sudhirj wrote:
           | I've added a link to the project in the context of another
           | comment here about deadlocks... is there anything in
           | particular you'd like to mention? Will look into the pool in
            | detail to see if there are any special tips and tricks they
            | use.
        
       | x4m wrote:
       | Thanks for the post, great read! Just one minor note
       | s/Odessey/Odyssey/g // From Odyssey maintainer
        
         | sudhirj wrote:
         | Thanks, will do.
        
       | wiradikusuma wrote:
        | Related, about sizing DB connection pools (counterintuitively):
       | https://github.com/brettwooldridge/HikariCP/wiki/About-Pool-...
        
         | sudhirj wrote:
         | This formula `pool size = Tn x (Cm - 1) + 1` is really
         | interesting. Any idea how it's derived or what branch of math
         | this is from?
        
           | twic wrote:
           | I think that's just arithmetic. The fundamental idea is that
           | you have enough connections that you can guarantee that at
           | least one thread has all the connections it needs - because
           | if it does, it can proceed, finish, release its connections,
            | and then another thread can pick them up and get to proceed,
           | etc.
           | 
           | If a thread needs Cm connections, then Cm - 1 connections is
           | enough to be one thread short of what you need. If you have
           | Tn threads, then Tn * (Cm - 1) connections lets every thread,
           | in the worst case, be one short of what it needs (if the
           | connections are not evenly distributed, then at least one
           | thread already has as many as it needs). So Tn * (Cm - 1) + 1
           | connections means that at least one thread definitely has
           | enough connections.
        
             | sudhirj wrote:
             | Ah, ok. Seems clear that this is for a thread requiring C
             | connections _simultaneously_ to get its work done. That
              | would make it a necessary calculation to avoid deadlocks.
              | I don't see this being necessary in a Rails-type app
              | where there's usually one connection per request, but if
              | one were using Go, setting a max connections parameter,
              | and then simultaneously checking out more than one
              | connection to do
             | work, there's an easy chance of deadlocks.
             | 
             | Will add a section talking about that and this formula.
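              | 
              | As a minimal Go sketch of that failure mode (driver and
              | DSN are placeholders): one worker needs two simultaneous
              | connections, so the formula says the pool must allow
              | 1 x (2 - 1) + 1 = 2, but we cap it at 1:
              | 
              |   package main
              |   
              |   import (
              |       "context"
              |       "database/sql"
              |       "fmt"
              |       "time"
              |   
              |       _ "github.com/lib/pq" // placeholder driver
              |   )
              |   
              |   func main() {
              |       db, err := sql.Open("postgres",
              |           "postgres://localhost/app")
              |       if err != nil {
              |           panic(err)
              |       }
              |       db.SetMaxOpenConns(1) // one below the safe minimum
              |   
              |       ctx, cancel := context.WithTimeout(
              |           context.Background(), 2*time.Second)
              |       defer cancel()
              |   
              |       c1, err := db.Conn(ctx) // first checkout succeeds
              |       if err != nil {
              |           panic(err)
              |       }
              |       defer c1.Close()
              |   
              |       // Second simultaneous checkout by the same worker:
              |       // with only one connection allowed, this blocks
              |       // until the context times out - the deadlock the
              |       // formula is sized to prevent.
              |       if _, err := db.Conn(ctx); err != nil {
              |           fmt.Println("pool deadlock:", err)
              |       }
              |   }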
        
               | twic wrote:
               | > Seems clear that this is for a thread requiring C
               | connections simultaneously to get its work done.
               | 
               | Exactly. As that document says (my emphasis):
               | 
               | > The prospect of "pool-locking" has been raised with
               | respect to single actors _that acquire many connections_.
               | This is largely an application-level issue. Yes,
               | increasing the pool size can alleviate lockups in these
               | scenarios, but we would urge you to examine first what
               | can be done at the application level before enlarging the
               | pool.
               | 
               | > Where Tn is the maximum number of threads, and Cm is
               | the maximum number of _simultaneous connections_ held by
               | a single thread.
               | 
                | To be honest, I think applications that do this are
                | broken ("largely an application-level issue"), so I'm
                | not sure how important it is.
        
         | ramchip wrote:
         | I think latency also matters a lot for client-side DB
         | connection pooling. If the DB is a 10ms round-trip away, and
         | queries take 10ms, the process on the DB side is only really
         | busy half the time.
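          | 
          | In the steady state that's Little's law - as a rough sketch
          | of the arithmetic (numbers purely illustrative):
          | 
          |   busy connections = requests/sec x (round trip + query time)
          | 
          | e.g. 100 requests/sec x (0.010 s + 0.010 s) = 2 connections
          | held on average, twice what the query time alone suggests.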
        
         | batter wrote:
          | This article helped me reduce the connection pool from 80 to
          | 10 per box, which helped us serve traffic spikes from a
          | couple of requests per second to thousands per second.
        
           | [deleted]
        
         | calpaterson wrote:
         | A few months ago I wrote an article testing out the relative
         | speed of Python's async options for web apis/sites
         | (https://calpaterson.com/async-python-is-not-faster.html).
         | 
         | My findings were a bit controversial at the time as I found
         | that uWSGI + sync Python is about the best option, largely
         | because replacing interpreted Python code with native code is a
         | very significant factor in the performance of a Python program.
         | 
         | In the following discussion (and private emails) I was stunned
         | by the number of async enthusiasts who proposed raising the
         | number of database connections in the connection pool by two or
         | three _orders of magnitude_ (I had 20 conns in a pool for a 4
         | cpu machine) for reasons along the lines of  "the async app is
         | starved of connections".
         | 
         | In my opinion that suggestion betrays a misunderstanding of
         | what is likely going on in the database when you have 100s or
         | 1000s of connections all contending for the same data: in SQL
          | even reads create locks. Async applications yield _a lot_, by
          | design, and under hundreds or thousands of requests per second
          | there is a considerable buildup of "work in progress" in the
          | form of async tasks that were yielded from and which have not
          | yet been returned to and completed. Many hundreds of database
          | connections are going to create an enormous volume of
          | bookkeeping on the database side and are very likely to slow
          | things down to an absolute crawl.
         | 
          | Even idle connections are known to have a detrimental effect
          | in Postgres. Someone at Microsoft is apparently working on
          | this and released a great blog post quantifying the problem:
         | 
         | https://techcommunity.microsoft.com/t5/azure-database-for-po...
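          | 
          | For what it's worth, capping the pool is cheap in most
          | clients - a sketch with Go's database/sql (the DSN and
          | numbers are illustrative, not a recommendation):
          | 
          |   package main
          |   
          |   import (
          |       "database/sql"
          |       "time"
          |   
          |       _ "github.com/lib/pq" // placeholder Postgres driver
          |   )
          |   
          |   func main() {
          |       db, err := sql.Open("postgres",
          |           "postgres://localhost/app")
          |       if err != nil {
          |           panic(err)
          |       }
          |       db.SetMaxOpenConns(20) // hard cap on real connections
          |       db.SetMaxIdleConns(20) // keep them warm, don't churn
          |       // Recycle connections periodically so the server side
          |       // can clean up and rebalance.
          |       db.SetConnMaxLifetime(30 * time.Minute)
          |   }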
        
           | spacemanmatt wrote:
           | > Even idle connections are known to have detrimental effect
           | in postgres.
           | 
            | I have a personal backlog task to evaluate whether
            | connection multiplexing mitigates the idle-connection cost
            | issue. I wish it could get more priority, as I'm super
            | curious, but my
           | load/volume are so low for my work that it just doesn't
           | matter yet.
        
             | calpaterson wrote:
             | > connection multiplexing
             | 
             | What do you mean by that term?
        
           | nine_k wrote:
           | > _in SQL even reads create locks._
           | 
           | IIRC, of all popular SQL databases, only MySQL uses locks at
           | all. Postgres, MS SQL, Oracle all use MVCC.
           | 
           | (Edited: typo)
        
             | hilbertseries wrote:
              | InnoDB does not create locks for reads, unless in a
              | transaction.
             | 
             | > SELECT ... FROM is a consistent read, reading a snapshot
             | of the database and setting no locks unless the transaction
             | isolation level is set to SERIALIZABLE. For SERIALIZABLE
             | level, the search sets shared next-key locks on the index
             | records it encounters. However, only an index record lock
             | is required for statements that lock rows using a unique
             | index to search for a unique row.
             | 
             | https://dev.mysql.com/doc/refman/5.7/en/innodb-locks-
             | set.htm...
        
               | calpaterson wrote:
               | > Innodb does not create locks for reads, unless in a
               | transaction.
               | 
                | Right... but I have a feeling that most
                | libraries/frameworks put you in a transaction by default
                | and just roll back at the end of the request lifecycle.
        
               | hilbertseries wrote:
                | I've worked with many libraries and frameworks and I
                | haven't seen transactions by default. Given how easy it
                | is to deadlock yourself this way (for instance, by
                | naively making batch reads in random order), I doubt
                | many libraries would make transactions the default.
               | 
                | For instance, in Rails and Django you need to explicitly
                | specify a transaction. Can you give an example of a
                | framework that turns off autocommit by default and
                | instead commits at the end of HTTP requests?
        
             | csharptwdec19 wrote:
              | MSSQL will lock on reads by default, and can (and will)
              | escalate locks to entire pages for reads.
             | 
             | Making MVCC Mode 'Opt In' requires enabling Snapshot
             | Isolation, and making it default requires another change.
             | It's also worth noting the way MSSQL does MVCC is far more
             | TempDB/Memory hungry than Oracle/Postgres (due to the
             | nature of using snapshots). I've been in more than one shop
             | that did not want to enable it due to the memory
             | requirements.
        
             | calpaterson wrote:
             | MySQM? MySQL presumably? Anyway MVCC is implemented via
             | locks (among other things, like copying, transaction ids,
             | visibility maps, etc).
             | 
             | Regardless of that, my main point is that even reading
             | requires transactional bookkeeping in the database - which
                | some people don't realise. If you read a row, do
                | nothing, and then roll back a few ms later (common in
                | webapps), there is still bookkeeping to be done.
        
               | spacemanmatt wrote:
               | But it shouldn't result in a write to disk unless a page
               | is dirtied.
        
               | calpaterson wrote:
               | I wasn't talking about writes but as an aside: I wouldn't
               | bet on that.
               | 
               | SQL databases are complicated and writes can happen for
               | many reasons - if you manage to bloat the working set by
               | reading more stuff concurrently (old rows, etc) it's not
               | hard to imagine something having to be paged out as a
               | consequence.
        
               | spacemanmatt wrote:
               | Quite agreed. It's still way harder and way less frequent
               | for read activity to force much write activity, although
               | not hard to imagine.
        
           | karmakaze wrote:
           | Idle connections did show up as taking substantial db CPU
           | time on one MySQL (Rails/Ruby) cluster I administered. We had
           | thousands of connections each making a 'ping' request to
            | check the validity of the connection every second. That was
            | enough context switching to create a noticeable load.
        
         | commandlinefan wrote:
         | When I was working at a large online travel website a few
          | decades ago, we had a few hundred front-end boxes, each with
          | a connection pool set to (I think) around 10 - so around
         | 2000 connections on the DB backend. We did some profiling and
         | discovered that, under the heaviest load, we never had more
         | than one connection active per web server (and usually 0), so a
         | single shared DB connection would have been just as effective
         | as the pool was.
        
           | sudhirj wrote:
           | For a read-only load, right? If there were writes and
           | transactions I don't see how that would work.
        
         | Traubenfuchs wrote:
         | Doesn't all of this become mindbogglingly complex once you
         | factor in a) changing replica counts (connection count changes
         | in steps of up to the max pool size) and b) multiple
         | applications using the same DB hardware?
        
           | cogman10 wrote:
            | Ish, but not really. Once you realize your DB is all about
            | resource constraints and connections aren't free, the
            | optimization becomes pretty simple: use as few connections
            | as possible to get whatever work you need done, done.
           | 
            | Once you view connections as a net cost to the system, the
            | math becomes simple. The network isn't infinite, DB hardware
           | isn't infinite. At the end of the day, more connections ==
           | more load on the system as a whole due to management.
           | 
            | This will also lead you towards good DB management. If you
            | want to be able to scale and respond to things, the right
            | thing to do is keep your datasets as small as possible and
            | stop multiple apps from using the same DB hardware. Use a
            | fleet of DBs rather than one big DB and you'll have a much
            | easier time scaling.
           | 
            | I say this as someone who currently works at a company
            | where we have everything on a small set of big DB hardware.
            | We are currently running into hardware and network
            | constraints that are causing us major issues.
        
           | [deleted]
        
       | cxcorp wrote:
       | Good read, I've been doing some work related to database
       | bottlenecks recently and there's definitely food for thought in
       | the article.
       | 
       | On a sidenote, I wonder if this part has a typo?
       | 
       | > These systems allow you to make as many database connections as
       | you want and manage them inefficiently
       | 
       | Perhaps inefficiently should be inexpensively or efficiently?
        
         | sudhirj wrote:
         | Have edited to "These systems allow you to make as many
         | database connections as you want without worrying about
         | management, because the connections..."
        
       | est wrote:
        | One thing I never understood about connection pools is how they
        | handle server-sent close connection (FIN) events. I can
        | guarantee there are tons of CLOSE_WAIT connections on my near-
        | idle Python process for various DB pools - mysql, redis or pg.
        | 
        | For sqlalchemy it just complains "MySQL server has gone away
        | (error(32, 'Broken pipe'))" and never knows how to recover. For
        | redis it just pegs the CPU at 100% and keeps epolling the dead
        | TCP connection.
       | 
       | Example: https://github.com/andymccurdy/redis-py/pull/886
        
         | sudhirj wrote:
         | I'm not familiar with specific implementations, but from what
         | I've seen with Redis and Postgres clients there's always a
         | reconnect hook of some sort, which I assume pooling proxies
         | give you as well. We've had plenty of broken pipe errors and we
         | just reconnect.
        
         | outworlder wrote:
         | > Like I can guarantee there are tons of CLOSE_WAIT connections
         | on my near-idle Python process for various of db pools, mysql
         | or redis or pg.
         | 
         | Generally your client should close the connection when it's
         | done. Server-side close should mostly be related to idle
         | timeouts.
        
         | hilbertseries wrote:
         | Generally a connection pool also checks the health of its
          | connections. In SQL connection pools this can be done by
          | executing a `SELECT 1`; if this fails, then a new connection
          | can be attempted.
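          | 
          | A rough sketch of that check in Go (driver and DSN are
          | placeholders; database/sql also retries bad connections on
          | its own, this just makes the idea concrete):
          | 
          |   package main
          |   
          |   import (
          |       "context"
          |       "database/sql"
          |       "log"
          |       "time"
          |   
          |       _ "github.com/go-sql-driver/mysql" // placeholder driver
          |   )
          |   
          |   // checkout returns a healthy connection, discarding and
          |   // replacing one that died while idle (e.g. the server sent
          |   // FIN and the socket sits in CLOSE_WAIT).
          |   func checkout(ctx context.Context, db *sql.DB) (*sql.Conn,
          |       error) {
          |       var lastErr error
          |       for attempt := 0; attempt < 2; attempt++ {
          |           conn, err := db.Conn(ctx)
          |           if err != nil {
          |               return nil, err
          |           }
          |           // The classic liveness probe: a throwaway SELECT 1.
          |           var one int
          |           lastErr = conn.QueryRowContext(ctx,
          |               "SELECT 1").Scan(&one)
          |           if lastErr == nil {
          |               return conn, nil
          |           }
          |           conn.Close() // drop the dead conn, try a fresh one
          |       }
          |       return nil, lastErr
          |   }
          |   
          |   func main() {
          |       db, err := sql.Open("mysql",
          |           "user:pass@tcp(localhost:3306)/app")
          |       if err != nil {
          |           log.Fatal(err)
          |       }
          |       ctx, cancel := context.WithTimeout(
          |           context.Background(), 5*time.Second)
          |       defer cancel()
          |       conn, err := checkout(ctx, db)
          |       if err != nil {
          |           log.Fatal(err)
          |       }
          |       defer conn.Close()
          |   }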
        
       ___________________________________________________________________
       (page generated 2021-01-05 23:02 UTC)