[HN Gopher] Understanding Connections and Pools
___________________________________________________________________
Understanding Connections and Pools
Author : sudhirj
Score : 194 points
Date : 2021-01-05 12:03 UTC (10 hours ago)
(HTM) web link (sudhir.io)
(TXT) w3m dump (sudhir.io)
| orware wrote:
| Not that many people may be using Oracle and PHP together, but
| last summer I decided to take a closer look at some performance
| aspects, since I want to move certain PHP things to the cloud
| while our Oracle database remains local, which would introduce
| some extra latency.
|
| In some basic testing on my local machine at work, just
| connecting to our local Oracle database added a good 60-100 ms
| per request (before running any queries), and due to PHP's
| nature this cost recurs on every new request.
|
| I had searched for a long time over the past months/years for
| options that would give PHP some sort of connection pool the way
| languages like Java/Go allow, but had come up short each time.
|
| I don't even recall how I came across it, but last July I
| finally learned about Oracle's Database Resident Connection
| Pooling (DRCP) feature (and then reminded my coworker in
| November that I wanted to try it out). In our tests it did a
| good job of eliminating most of that unnecessary connection wait
| time for PHP applications, and it didn't seem to increase load
| much at all on the database server side, so I was pretty happy
| with the results.
|
| Even though it might not be as ideal as a truly separate
| connection pool managed at the application server level, or
| outside the database, a connection pool available at the
| database level still has benefits, so it would be neat to see
| this sort of approach become available from other vendors (if it
| isn't already).
|
| EDIT: Forgot to add that as a result of the DRCP testing, even
| though moving the PHP side to the cloud introduces extra
| latency, the time saved by DRCP reduces latency enough that the
| overall result is still faster than the normal/conventional
| connection approach with the PHP code local, so for me that was
| a win (hopefully I'll get to use it in production in the near
| future, but I'm waiting on my colleague to sort out a few
| things with enabling it there).
| [deleted]
| kohlerm wrote:
| IMHO a very good summary of the trade-offs, with concrete
| examples.
| sudhirj wrote:
| Author here, self-posted. Can AMA.
| artimaeis wrote:
| Just 6 days ago you published what was a pretty great writeup
| on message queues, now this thorough dive on connections &
| pools; what's your writing schedule like? At some 12k words of
| fairly accurate and precise technical content I'm kind of blown
| away by your pace.
| sudhirj wrote:
| I had a few paragraphs written earlier (maybe half) over the
| course of the pandemic.
|
| I spent a lot of the pandemic procrastinating by trying to
| build my own blogging software, a more efficient version of
| Ghost, but it's too much duplicated effort given what I learned
| at AWS re:Invent. So I decided to just set up and start
| publishing.
|
| I write for about two hours a day, maybe four days of writing
| and two of editing for each article. I'd like to publish once
| a week, but don't want to set that kind of goal, because then
| I'll write even if I have nothing interesting to say. But
| maybe that's the best process anyway?
|
| Will tell you what's working if you ask again in a couple of
| months :-/
| johnonolan wrote:
| Founder of Ghost here - would love to hear what parts you
| were hoping to make more efficient / if you have any
| suggestions :)
| sudhirj wrote:
| I was trying to make it multi-tenant, move it to a
| serverless database like DynamoDB, and rewrite the server
| in Go.
|
| I'm trying to make a Ghost hosting service that's free /
| $5 a month for personal use. I've figured out a way to
| make the economics work with the existing 500MB disk +
| 1GB RAM requirements, but it would have been a lot easier
| with Go and Dynamo.
|
| Thanks a ton to you and the team for the work you're
| doing. I think you're making a fantastic system, and want
| to make it available to more people as a personal
| blogging platform. I just wish it could run multi-tenant
| or with lower memory needs.
|
| I don't suppose making it easier for others to compete
| with the official Ghost revenue model is a priority, so I
| doubt this is anything you'll want to take up. But I'm
| happy to share revenues on this - I just think Ghost
| should be a lot more accessible to individuals around the
| world.
| johnonolan wrote:
| It's not clear how those changes would make it more
| accessible, or what problem you're actually trying to
| solve. There are lots of things we're working on
| improving, but I promise you that changing the programming
| language of any of those components wouldn't make any
| difference to the end experience of the software.
|
| A lot of blogging platforms (including Ghost) start out
| at $5/mo targeting personal use -- but the ones that
| survive never stay there. Probably some worthwhile
| takeaways from that :)
| sudhirj wrote:
| Yeah, the Go decision was more because of familiarity and
| some tooling I have available for Dynamo. The real
| problem I was tackling is zero marginal cost multi
| tenancy. Right now a new Ghost installation costs 500MB
| on a disk and 1GB in RAM, and I would like to to cost one
| row in an existing database instead.
|
| And yeah, don't think the $5 / month is sustainable for a
| company, but for a solo founder it might work. Of course,
| that might be a classic case of _this time it's
| different_.
| pqdbr wrote:
| Great article!
|
| We have a Rails app that uses Puma with MySQL, but so far
| single-threaded (the Puma config has `threads 1,1`), because we
| migrated from Unicorn and are still worried about thread-
| safety.
|
| Currently our database.yml pool size is set to 25, because we
| have 25 puma workers on the server.
|
| Reading your article, this seems to be overkill, since that
| setting is _per worker_, and since each worker runs with only
| 1 thread, the default 5 should be more than enough?
|
| Also, I just checked and our MySQL is configured with 151 max
| connections; if every one of our 25 workers can use up to 25
| connections, that could mean up to 625 connections to the
| database, and that's without considering Sidekiq (concurrency
| 25).
| sudhirj wrote:
| Well, even the default 5 would probably be overkill, unless
| you're spawning threads in a request that each use their own
| DB connection. But yes, I think you can go with the default.
| Once you're confident about being threadsafe, the Heroku
| deployment guide for Puma is pretty good for all the little
| knobs to turn.
| sudhirj wrote:
| Re the MySQL counts: based on your notes you're probably using
| 50 conns at max, whatever your settings are. 25 for Sidekiq
| (assuming you do run a hot queue) and 25 at max for the
| servers. Irrespective of DB pool size, I think Rails opens
| connections lazily, so each single-threaded worker is never
| going to go over 1.
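|
| For comparison, these caps are explicit in languages with an
| in-process pool. A minimal sketch in Go, assuming database/sql
| and the go-sql-driver/mysql driver (the DSN is hypothetical):
|
|     package main
|
|     import (
|         "database/sql"
|         "time"
|
|         _ "github.com/go-sql-driver/mysql"
|     )
|
|     func main() {
|         // sql.Open is lazy: nothing is dialled until the first
|         // query, mirroring the lazy-connection behaviour above.
|         db, err := sql.Open("mysql", "user:pass@tcp(db:3306)/app")
|         if err != nil {
|             panic(err)
|         }
|         db.SetMaxOpenConns(5) // hard cap, like the Rails pool size
|         db.SetMaxIdleConns(5) // warm connections kept around
|         db.SetConnMaxLifetime(30 * time.Minute) // recycle before stale
|     }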
| mekoka wrote:
| I'm confused by this statement about pgBouncer et al.
|
| _These systems allow you to make as many database connections
| as you want and manage them inefficiently, because the
| connections they give you are fake & cheap simulated
| connections that they handle with low overhead._
|
| What makes them inefficient then, if they're cheap and have low
| management overhead?
| nickelbob wrote:
| I think he meant that "you" can manage them inefficiently (or
| not manage them at all) because the proxy pool will take care
| of that for you. It reads awkwardly.
| sudhirj wrote:
| Yeah, will edit that. But yes, you don't need to worry
| about managing them because they're so cheap.
| fotta wrote:
| OT: do you plan on adding an RSS feed? Would love to subscribe.
| hansvm wrote:
| In case the author is worried about sufficient readership, I
| second this motion :)
| sudhirj wrote:
| I thought this blog engine already had one, I certainly
| didn't disable anything. Let me check.
| fotta wrote:
| the "Subscribe Now" button takes me to a login page
| sudhirj wrote:
| Yeah, that's the email subscription, substack style. Not
| RSS.
| sudhirj wrote:
| Yeah, it's active at https://sudhir.io/rss/ and pasting the
| base URL into a feed reader should usually check the /rss path
| as a convention.
| fotta wrote:
| fantastic, thank you!
| est wrote:
| > the popular servers for Python, PHP and other languages that
| use a global lock or assume single-threaded single-process
| execution
|
| This is not true for Python. The article you linked mentions
| Gunicorn, uWSGI and CherryPy; they are threaded and can handle
| multiple requests per process.
|
| And even though Python has the GIL, it does not lock network
| operations for a TCP/UDP-based database protocol (though some
| C-based modules do tend to hold the GIL).
| sudhirj wrote:
| Will edit that, yeah. I am struggling to explain the GIL
| without having to talk about the GIL, because if I open that
| can of worms it becomes a much longer article and goes off
| track.
|
| All of the GIL and event-loop based stacks go multi-threaded
| at the lower library levels, really. Will think about how to
| explain that. And will move Python out of that general
| category.
| sankyo wrote:
| Thank you for this. Very few people can explain complicated
| ideas with easy to follow language. You have a gift for
| explaining.
| sudhirj wrote:
| Thank you for saying that. I enjoy helping people understand
| things in minutes, especially if those things took me years.
|
| Being stuck at home, the only way to keep doing that is
| writing and videos. And I have a pretty bad stammer, so videos
| aren't an option.
| tjvc wrote:
| Great article, many thanks.
|
| > Rails auto-management needs to be conservative and cautious,
| therefore inefficient
|
| Can you help me understand this better? Are you saying that
| Rails tends to hold onto connections for longer than it needs
| to?
| toredash wrote:
| Am I reading this correctly, that with NodeJS and PostgreSQL,
| using multiple connection poolers _combined_ is a not-so-good
| idea?
|
| Example, using Knex as the connection pooler for your node app,
| which is connected to your PgBouncer deployment?
|
| I have not considered this a fault; in one way I find it
| compelling to have a layer of "protection" by using Knex in
| addition to PgBouncer, but now I'm not so sure anymore.
| sudhirj wrote:
| It depends on how knex is used. If it gives you a method to
| run a DB query, and internally checks out a connection, runs
| the query and releases the connection, then it's already
| doing exactly what would happen with a proxy.
|
| If it or you are setting up middleware to start a transaction
| before every request, or put a checked out connection in the
| request context, that's a lot of possible inefficiency. If
| you really want the convenience this offers, and have request
| flows that do a lot of other non-DB work, you'll need a
| pooling proxy.
|
| If your request handling is very database-heavy and does
| nothing else, then automatic request-level connection checkout
| is still efficient, even without a proxy. Even if you have a
| proxy, one real connection is going to be working for the
| entire duration of the request anyway.
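|
| A minimal sketch of the two patterns in Go, assuming
| database/sql (the function names and queries are hypothetical):
|
|     package main
|
|     import (
|         "context"
|         "database/sql"
|     )
|
|     // Per-query checkout: a pooled connection is borrowed only
|     // for the duration of the query and released right after,
|     // which is what a pooling proxy would do anyway.
|     func fetchName(ctx context.Context, db *sql.DB, id int) (string, error) {
|         var name string
|         err := db.QueryRowContext(ctx,
|             "SELECT name FROM users WHERE id = ?", id).Scan(&name)
|         return name, err
|     }
|
|     // Transaction-per-request: one real connection stays pinned
|     // for the whole request, idle during any non-DB work.
|     func handleRequest(ctx context.Context, db *sql.DB, id int) error {
|         tx, err := db.BeginTx(ctx, nil)
|         if err != nil {
|             return err
|         }
|         defer tx.Rollback() // no-op if Commit succeeds
|         if _, err := tx.ExecContext(ctx,
|             "UPDATE users SET seen_at = NOW() WHERE id = ?", id); err != nil {
|             return err
|         }
|         // ... slow non-DB work here while the connection sits idle ...
|         return tx.Commit()
|     }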
| toredash wrote:
| Thanks for the feedback. I have some reading to do and some
| code to check out.
| victor106 wrote:
| In-depth article; it would be good if you could cover
| Java-based connection pools like HikariCP. They are very
| widely used.
| sudhirj wrote:
| I've added a link to the project in the context of another
| comment here about deadlocks... is there anything in
| particular you'd like to mention? Will look into the pool in
| detail to see if there are any special tips and tricks they
| use.
| x4m wrote:
| Thanks for the post, great read! Just one minor note
| s/Odessey/Odyssey/g // From Odyssey maintainer
| sudhirj wrote:
| Thanks, will do.
| wiradikusuma wrote:
| Related, about sizing DB connection pools (counterintuitively):
| https://github.com/brettwooldridge/HikariCP/wiki/About-Pool-...
| sudhirj wrote:
| This formula `pool size = Tn x (Cm - 1) + 1` is really
| interesting. Any idea how it's derived or what branch of math
| this is from?
| twic wrote:
| I think that's just arithmetic. The fundamental idea is that
| you have enough connections to guarantee that at least one
| thread has all the connections it needs - because if it does,
| it can proceed, finish, release its connections, and then
| another thread can pick them up and proceed, etc.
|
| If a thread needs Cm connections, then Cm - 1 connections is
| enough to be one thread short of what you need. If you have
| Tn threads, then Tn * (Cm - 1) connections lets every thread,
| in the worst case, be one short of what it needs (if the
| connections are not evenly distributed, then at least one
| thread already has as many as it needs). So Tn * (Cm - 1) + 1
| connections means that at least one thread definitely has
| enough connections.
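|
| A minimal sketch of that arithmetic in Go (the worked numbers
| are hypothetical):
|
|     // Smallest pool that cannot deadlock when each of tn
|     // threads may hold up to cm connections at once: tn*(cm-1)
|     // can leave every thread one connection short, so one more
|     // guarantees some thread can always finish and release.
|     func minPoolSize(tn, cm int) int {
|         return tn*(cm-1) + 1
|     }
|
|     // e.g. 8 threads, up to 3 simultaneous connections each:
|     // minPoolSize(8, 3) == 8*(3-1)+1 == 17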
| sudhirj wrote:
| Ah, ok. Seems clear that this is for a thread requiring Cm
| connections _simultaneously_ to get its work done. That would
| make it a necessary calculation to avoid deadlocks. I don't
| see this being necessary in a Rails-type app where there's
| usually one connection per request, but if one were using Go,
| setting a max connections parameter, and then simultaneously
| checking out more than one connection to do work, there's an
| easy chance of deadlocks.
|
| Will add a section talking about that and this formula.
| twic wrote:
| > Seems clear that this is for a thread requiring C
| connections simultaneously to get its work done.
|
| Exactly. As that document says (my emphasis):
|
| > The prospect of "pool-locking" has been raised with
| respect to single actors _that acquire many connections_.
| This is largely an application-level issue. Yes,
| increasing the pool size can alleviate lockups in these
| scenarios, but we would urge you to examine first what
| can be done at the application level before enlarging the
| pool.
|
| > Where Tn is the maximum number of threads, and Cm is
| the maximum number of _simultaneous connections_ held by
| a single thread.
|
| To be honest, I think applications that do this are broken
| ("largely an application-level issue"), so I'm not sure how
| important it is.
| ramchip wrote:
| I think latency also matters a lot for client-side DB
| connection pooling. If the DB is a 10ms round-trip away, and
| queries take 10ms, the process on the DB side is only really
| busy half the time.
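|
| A rough sketch of that arithmetic in Go (the numbers are
| hypothetical):
|
|     // A query occupies the client's connection for rtt+exec
|     // milliseconds, but the server only works for exec of them.
|     func serverBusyFraction(rttMs, execMs float64) float64 {
|         return execMs / (rttMs + execMs)
|     }
|
|     // e.g. serverBusyFraction(10, 10) == 0.5, so the client
|     // needs roughly twice the connections to keep the server
|     // saturated.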
| batter wrote:
| This article helped me reduce the connection pool from 80 to
| 10 per box, which helped us serve traffic spikes going from a
| couple of requests per second to thousands per second.
| [deleted]
| calpaterson wrote:
| A few months ago I wrote an article testing out the relative
| speed of Python's async options for web apis/sites
| (https://calpaterson.com/async-python-is-not-faster.html).
|
| My findings were a bit controversial at the time as I found
| that uWSGI + sync Python is about the best option, largely
| because replacing interpreted Python code with native code is a
| very significant factor in the performance of a Python program.
|
| In the following discussion (and private emails) I was stunned
| by the number of async enthusiasts who proposed raising the
| number of database connections in the connection pool by two or
| three _orders of magnitude_ (I had 20 conns in a pool for a 4
| cpu machine) for reasons along the lines of "the async app is
| starved of connections".
|
| In my opinion that suggestion betrays a misunderstanding of
| what is likely going on in the database when you have 100s or
| 1000s of connections all contending for the same data: in SQL
| even reads create locks. Async applications yield _a lot_, by
| design, and under hundreds or thousands of requests per second
| there is a considerable buildup of "work in progress" in the
| form of async tasks that were yielded from and have not yet
| been returned to and completed. Many hundreds of database
| connections are going to create an enormous volume of
| bookkeeping on the database side and are very likely to slow
| things down to an absolute crawl.
|
| Even idle connections are known to have a detrimental effect
| in Postgres. Someone at Microsoft is apparently working on
| this and released this great blog post quantifying the
| problem:
|
| https://techcommunity.microsoft.com/t5/azure-database-for-po...
| spacemanmatt wrote:
| > Even idle connections are known to have a detrimental
| effect in Postgres.
|
| I have a personal backlog task to evaluate whether connection
| multiplexing helps with the idle connection cost issue. I
| wish I could give it more priority, as I'm super curious, but
| my load/volume at work are so low that it just doesn't matter
| yet.
| calpaterson wrote:
| > connection multiplexing
|
| What do you mean by that term?
| nine_k wrote:
| > _in SQL even reads create locks._
|
| IIRC, of all popular SQL databases, only MySQL uses locks at
| all. Postgres, MS SQL, Oracle all use MVCC.
|
| (Edited: typo)
| hilbertseries wrote:
| InnoDB does not take locks for reads, unless in a
| transaction.
|
| > SELECT ... FROM is a consistent read, reading a snapshot
| of the database and setting no locks unless the transaction
| isolation level is set to SERIALIZABLE. For SERIALIZABLE
| level, the search sets shared next-key locks on the index
| records it encounters. However, only an index record lock
| is required for statements that lock rows using a unique
| index to search for a unique row.
|
| https://dev.mysql.com/doc/refman/5.7/en/innodb-locks-set.htm...
| calpaterson wrote:
| > Innodb does not create locks for reads, unless in a
| transaction.
|
| Right... but I have a feeling that most
| libraries/frameworks put you in a transaction by default
| and just roll back at the end of the request lifecycle.
| hilbertseries wrote:
| I've worked with many libraries and frameworks and I
| haven't seen transactions by default. Given how easy it
| is to deadlock yourself this way (for instance, by
| naively making batch reads in random order), I doubt many
| libraries would make transactions the default.
|
| For instance, in Rails and Django you need to explicitly
| specify a transaction. Can you give an example of a
| framework that turns off autocommit by default and
| instead commits at the end of HTTP requests?
| csharptwdec19 wrote:
| MSSQL will lock on reads by default, and can (will) escalate
| locks to entire pages for reads.
|
| Making MVCC mode opt-in requires enabling Snapshot
| Isolation, and making it the default requires another
| change. It's also worth noting that the way MSSQL does MVCC
| is far more TempDB/memory hungry than Oracle/Postgres (due
| to the nature of using snapshots). I've been in more than
| one shop that did not want to enable it due to the memory
| requirements.
| calpaterson wrote:
| MySQM? MySQL presumably? Anyway, MVCC is implemented via
| locks (among other things, like copying, transaction ids,
| visibility maps, etc).
|
| Regardless of that, my main point is that even reading
| requires transactional bookkeeping in the database - which
| some people don't realise. If you read a row, do nothing,
| and then roll back a few ms later (common in webapps),
| there is still bookkeeping to be done.
| spacemanmatt wrote:
| But it shouldn't result in a write to disk unless a page
| is dirtied.
| calpaterson wrote:
| I wasn't talking about writes but as an aside: I wouldn't
| bet on that.
|
| SQL databases are complicated and writes can happen for
| many reasons - if you manage to bloat the working set by
| reading more stuff concurrently (old rows, etc) it's not
| hard to imagine something having to be paged out as a
| consequence.
| spacemanmatt wrote:
| Quite agreed. It's still way harder and way less frequent
| for read activity to force much write activity, although
| not hard to imagine.
| karmakaze wrote:
| Idle connections did show up as taking substantial DB CPU
| time on one MySQL (Rails/Ruby) cluster I administered. We had
| thousands of connections, each making a 'ping' request every
| second to check the validity of the connection. That was
| enough context-switching to create a noticeable load.
| commandlinefan wrote:
| When I was working at a large online travel website a few
| decades ago, we had a few hundred front-end boxes, each with
| a connection pool set to (I think) around 10 - so around
| 2,000 connections on the DB backend. We did some profiling
| and discovered that, under the heaviest load, we never had
| more than one connection active per web server (and usually
| 0), so a single shared DB connection would have been just as
| effective as the pool was.
| sudhirj wrote:
| For a read-only load, right? If there were writes and
| transactions I don't see how that would work.
| Traubenfuchs wrote:
| Doesn't all of this become mindbogglingly complex once you
| factor in a) changing replica counts (connection count changes
| in steps of up to the max pool size) and b) multiple
| applications using the same DB hardware?
| cogman10 wrote:
| Ish, but not really. Once you realize your DB is all about
| resource constraints and connections aren't free, the
| optimization becomes pretty simple: use as few connections
| as possible to get whatever work you need done.
|
| Once you view connections as a net cost to the system, the
| math becomes simple. The network isn't infinite, and DB
| hardware isn't infinite. At the end of the day, more
| connections == more load on the system as a whole due to
| management.
|
| This will also lead you towards good DB management. If you
| want to be able to scale and respond to things, the right
| thing to do is keep your datasets as small as possible and
| stop multiple apps from using the same DB hardware. Use a
| fleet of DBs rather than one big DB and you'll have a much
| better time scaling.
|
| I say this as someone who currently works at a company where
| we have everything on a small set of big DB hardware. We are
| currently running into all the problems of hardware and
| network constraints, and they are causing us major issues.
| [deleted]
| cxcorp wrote:
| Good read, I've been doing some work related to database
| bottlenecks recently and there's definitely food for thought in
| the article.
|
| On a sidenote, I wonder if this part has a typo?
|
| > These systems allow you to make as many database connections as
| you want and manage them inefficiently
|
| Perhaps inefficiently should be inexpensively or efficiently?
| sudhirj wrote:
| Have edited to "These systems allow you to make as many
| database connections as you want without worrying about
| management, because the connections..."
| est wrote:
| One thing I never understood about connection pools is how
| they handle server-sent close connection (FIN) events. Like I
| can guarantee there are tons of CLOSE_WAIT connections on my
| near-idle Python process for various db pools: mysql, redis
| or pg.
|
| For sqlalchemy it just complains "MySQL server has gone away
| (error(32, 'Broken pipe'))" and never knows how to recover.
| For redis it just pegs the CPU at 100% and keeps epolling the
| dead TCP connection.
|
| Example: https://github.com/andymccurdy/redis-py/pull/886
| sudhirj wrote:
| I'm not familiar with specific implementations, but from what
| I've seen with Redis and Postgres clients there's always a
| reconnect hook of some sort, which I assume pooling proxies
| give you as well. We've had plenty of broken pipe errors and we
| just reconnect.
| outworlder wrote:
| > Like I can guarantee there are tons of CLOSE_WAIT
| connections on my near-idle Python process for various db
| pools: mysql, redis or pg.
|
| Generally your client should close the connection when it's
| done. Server-side close should mostly be related to idle
| timeouts.
| hilbertseries wrote:
| Generally a connection pool also checks the health of its
| connections. In SQL connection pools this can be done by
| executing a SELECT 1; if that fails, a new connection can be
| attempted.
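|
| A minimal sketch of such a health check in Go, assuming
| database/sql (which also does its own liveness checks
| internally):
|
|     package pool
|
|     import (
|         "context"
|         "database/sql"
|     )
|
|     func validate(ctx context.Context, db *sql.DB) error {
|         var one int
|         // A dead connection fails here; the pool should discard
|         // it and dial a fresh one before retrying real queries.
|         return db.QueryRowContext(ctx, "SELECT 1").Scan(&one)
|     }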
___________________________________________________________________
(page generated 2021-01-05 23:02 UTC)