[HN Gopher] Tips for a healthier Postgres database (2021)
___________________________________________________________________
Tips for a healthier Postgres database (2021)
Author : craigkerstiens
Score : 166 points
Date : 2023-02-10 00:45 UTC (1 day ago)
(HTM) web link (www.crunchydata.com)
(TXT) w3m dump (www.crunchydata.com)
| efxhoy wrote:
| One thing I really like about RDS is the query visualizer. It
| shows which queries ran when and for how long and what they spent
| their time waiting for, CPU or disk or locks or whatever. It's
| really good. Are there any free tools that show a similar view?
|
| edit: I think it's called Performance Insights in RDS
| ekkeke wrote:
| It would be a whole lot better if it didn't truncate the query
| text at some limit (1,000 characters, I think).
|
| We autogenerate many of our own queries, which can be quite
| complex (regularly over 10 joins, sometimes up to 30!), and
| our infrastructure isn't quite there yet to recreate the
| exact query plan a customer saw on their own data without a
| lot of work. It could all be so much simpler, so if there is
| a setting to prevent the truncation, please tell me!
| taspeotis wrote:
| https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_.
| ..
| kaveet wrote:
| You might be looking for the track_activity_query_size
| parameter: SHOW track_activity_query_size;
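|
| A rough sketch of raising it (the default is 1024 bytes and
| a change only takes effect after a restart; on RDS you would
| set it through the parameter group rather than ALTER SYSTEM,
| so treat this as an assumption to verify):
|
| -- check the current limit on captured query text
| SHOW track_activity_query_size;
| -- allow much longer statements in pg_stat_activity /
| -- pg_stat_statements (value is in bytes)
| ALTER SYSTEM SET track_activity_query_size = 32768;
| -- requires a server restart to take effect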
| torvald wrote:
| Before we jumped on all kinds of SaaS offerings we used
| pgbadger.
|
| https://github.com/darold/pgbadger
| cynicalsecurity wrote:
| That was a rather short article.
| napsterbr wrote:
| Which is a great thing. If this were on YouTube, we'd have a
| 20-minute video with a long and unnecessary introduction and
| a sneaky one-minute ad in the middle.
| dexterdog wrote:
| That's why sponsorblock exists
| napsterbr wrote:
| I'll definitely check this out, ty! Currently my
| sponsorblock is my right thumb always ready to +10s as soon
| as I smell bullshit.
| didgetmaster wrote:
| >There seems to be a common lifecycle of indexes within
| applications. First you start off with almost none, maybe a few
| on primary keys. Then you start adding them, one by one, two by
| two, until you've got quite a few indexes for most any query you
| can run. Something is slow? Throw an index at it. What you end up
| with is some contention on overall throughput of your database,
| and well a lot of indexes that became a tangled ball of yarn over
| time.
|
| I spent years doing index maintenance on a large corporate
| database. Creating and rebuilding indexes was a hassle. So I set
| about building a new kind of database engine where every column
| in a table organizes its data for fast access (e.g. a columnar
| store with no separate indexing structure).
|
| I expected my system to be substantially faster than a Postgres
| table without indexes for a broad range of queries. What I didn't
| expect was for it to also be much faster for tables that were
| highly indexed in Postgres. I ran tons of queries on both systems
| and almost all of them were like this video:
|
| https://www.youtube.com/watch?v=OVICKCkWMZE
| CharlesW wrote:
| You self-promote here often [1] (not judging, just an
| observation), and here are some thoughts that have occurred to
| me while reading your posts and comments.
|
| * Your vision for Didget is blurry. Sometimes it's a database
| (or "general purpose data management system"), sometimes it's a
| file system. Products that are dessert toppings _and_ floor
| waxes rarely find purchase. My advice: Decide which problem
| domain you're playing in, then which problem(s) within that
| domain this solves.
|
| * The Postgres comparison seems naive. As the Postgres wiki
| notes, "PostgreSQL ships with a basic configuration tuned for
| wide compatibility _rather than performance_. Odds are good the
| default parameters are very undersized for your system." A
| comparison with a system-appropriate Postgres configuration
| would be more interesting (a rough sketch of the usual
| suspects follows at the end of this comment).
|
| * Promoting a new closed-source database seems anachronistic
| when there's a plethora of fine open-source options. If you're
| wondering why you're not getting many leads from HN, I'd
| imagine this is the main reason.
|
| [1]
| https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...
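|
| (The sketch mentioned above: the usual suspects behind that
| wiki quote, with rough starting values for a machine with
| about 16 GB of RAM. The numbers are illustrative guesses,
| not from the wiki or from the video.)
|
| # postgresql.conf
| shared_buffers = 4GB           # default is only 128MB
| effective_cache_size = 12GB    # planner hint, not an allocation
| work_mem = 64MB                # per sort/hash node, per query
| maintenance_work_mem = 1GB     # index builds, VACUUM
| random_page_cost = 1.1         # assuming SSD storage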
| MuffinFlavored wrote:
| What's the gotcha other than "it's a homegrown database
| engine"? Did you start from scratch? It can do JOINs and stuff?
| Bogdanp wrote:
| One potential hint is that the Postgres queries are being
| executed as sequential scans, as you can see from the EXPLAIN
| output.
| The first query only returns 200k rows (out of 7M), so even
| without any config tweaks it should be using an index if one
| is present. So, my guess is those queries are being made
| against unindexed columns. Also, I'm guessing this is vanilla
| Postgres, so it's possible OP hasn't tweaked its config for
| the machine they're using, and the default config pg ships
| with is designed for fairly resource-constrained machines by
| today's standards. If the data is indexed, it's possible the
| rows are all over the place, and clustering could help.
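|
| A minimal sketch of that check, with a hypothetical table
| and column standing in for whatever the video queries:
|
| -- see whether the planner picks a Seq Scan or an Index Scan
| EXPLAIN (ANALYZE, BUFFERS)
| SELECT * FROM orders WHERE customer_id = 42;
|
| -- if it's a Seq Scan on an unindexed column:
| CREATE INDEX CONCURRENTLY orders_customer_id_idx
|     ON orders (customer_id);
|
| -- if matching rows are scattered across the heap, physically
| -- reordering the table can help too (takes an exclusive lock):
| CLUSTER orders USING orders_customer_id_idx;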
| skunkworker wrote:
| In addition, I wonder if random_page_cost is not set to be
| 1. I've noticed on machines with SSD only pools that makes
| the query planner much more deterministic and utilizes
| indexes over sequential scans almost unused.
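|
| Roughly what that looks like; 1.1 is a common rule of thumb
| for SSDs rather than anything from this thread:
|
| -- default is 4.0, tuned for spinning disks; values close to
| -- seq_page_cost (1.0) make index scans look cheaper
| ALTER SYSTEM SET random_page_cost = 1.1;
| SELECT pg_reload_conf();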
| SoftTalker wrote:
| > First you start off with almost none, maybe a few on primary
| keys.
|
| Primary keys are always indexed. Does the author really know
| what he's talking about?
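|
| For what it's worth, Postgres backs every primary key with a
| unique index automatically; a tiny illustration (table name
| made up):
|
| CREATE TABLE widgets (id bigint PRIMARY KEY, name text);
| -- the backing index exists without any CREATE INDEX:
| SELECT indexname FROM pg_indexes WHERE tablename = 'widgets';
| --   widgets_pkey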
| Petersipoi wrote:
| The author clearly knows this based on the sentence you
| quoted. I'm very confused as to what your complaint is.
|
| > I have almost no fruit, just a few apples.
|
| Indicates that I _do_ know that apples are a fruit. Not that
| I don't.
| dewey wrote:
| What about tables without primary keys?
| wielebny wrote:
| These are usually stored in CSV files.
| rzzzt wrote:
| DBMSs still have to identify a particular row somehow.
| That identifier is CTID in Postgres and ROWID in Oracle.
| Both will change if the data is moved around.
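|
| The Postgres one is easy to peek at; a tiny example with a
| made-up table name:
|
| -- ctid is the row's physical (page, tuple) position; it is
| -- not stable across UPDATE or VACUUM FULL, so don't store it
| SELECT ctid, * FROM accounts LIMIT 3;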
| lgas wrote:
| Generally not a good idea to have these at all.
| [deleted]
| eropple wrote:
| A more generous interpretation would be the assumption that
| primary keys are (as they frequently are) synthetic data, and
| perhaps not the most commonly queried field aside from direct
| get-that-resource queries.
| [deleted]
| chasil wrote:
| In Oracle, Tom Kyte's advice is to never rebuild them, unless
| there are reasons for structural change or physically
| relocating the storage.
|
| I have occasionally seen performance gains by rebuilding a
| composite index with key compression, for example.
|
| Indexes do tend to return to their natural size in heavy use,
| so repeatedly shrinking them will just tax performance for a
| temporary gain in storage.
|
| https://asktom.oracle.com/pls/apex/f?p=100:11:::::P11_QUESTI...
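|
| For the curious, that rebuild looks roughly like this in
| Oracle (index name and prefix length are made up for
| illustration):
|
| -- compress the leading column of a composite index key;
| -- worthwhile only when that leading column is repetitive
| ALTER INDEX orders_cust_date_idx REBUILD COMPRESS 1;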
| pjdkoch wrote:
| This is missing a [2021].
| gregwebs wrote:
| Here's a newer article with much the same content:
| https://www.crunchydata.com/blog/exposing-postgres-performan...
| gnfargbl wrote:
| pg_stat_statements is useful, but it isn't completely free, e.g.
| [1] (admittedly from pg12, which is three major versions ago).
|
| Most people should probably leave it enabled, but if you have a
| database with a very defined and understood query pattern, then
| it might be worth running _perf_ and checking if it is costing
| you.
|
| [1] https://www.alibabacloud.com/blog/postgresql-v12-how-pg-
| stat...
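|
| For anyone who hasn't set it up, a rough sketch (column names
| are the PG 13+ ones, where total_time was split into
| total_exec_time and total_plan_time):
|
| -- first, in postgresql.conf (needs a restart):
| --   shared_preload_libraries = 'pg_stat_statements'
| -- then, in the database:
| CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
|
| -- top statements by cumulative execution time
| SELECT calls,
|        round(total_exec_time::numeric, 1) AS total_ms,
|        round(mean_exec_time::numeric, 2)  AS mean_ms,
|        query
| FROM pg_stat_statements
| ORDER BY total_exec_time DESC
| LIMIT 10;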
| hummus_bae wrote:
| Most modern processors run at a speed that makes it cheaper to
| check than to not check. Some people might argue for lower-
| level optimizations like this, but in this case, I don't think
| it's a big deal.
|
| Bottom line: enabling this doesn't hurt. It might help, and it
| doesn't cost anything to check.
| mixmastamyk wrote:
| What does Crunchy cost? The page says to ask, which typically
| means you can't afford it. Wonder if it is feasible for a
| small company. Is it competitive with RDS?
| BasedInfra wrote:
| They have a service hosted on the major clouds, called Crunchy
| Bridge, which seems reasonably priced.
|
| https://www.crunchydata.com/pricing/calculator
| candiddevmike wrote:
| It's quite a bit more than RDS or the other cloud providers'
| managed offerings. I don't see a reason to use them over a
| cloud provider's native service if you're already in the
| cloud provider Hotel California.
| craigkerstiens wrote:
| Product manager here from Crunchy; we aim to be quite in line
| with any of the major cloud providers. A number of customers
| have moved over from each of the major cloud providers and
| have seen better price-to-performance and quality of support.
| karmelapple wrote:
| Just as a data point: we are planning to move to Crunchy
| and away from Heroku soon. We tested the two, and we have
| 4X improved performance on Crunchy Postgres vs Heroku
| Postgres.
|
| 4X. We were astounded. And so were the Heroku folks I talked
| to, who I'm guessing will want to try to bring similar
| performance boosts.
|
| We haven't done the switch quite yet because we're waiting
| on Crunchy to bring out their Heroku Dataclips replacement,
| which is something we rely on a lot. But once it's ready,
| we're going to switch!
| higeorge13 wrote:
| Just curious, what's special about Heroku Dataclips? You
| can host and use any BI tool with similar functionality
| (the last one I used was Redash).
| mixmastamyk wrote:
| Great. Wonder if you offer any a la carte consulting on
| modeling, performance, etc.? Wouldn't need it often, but
| rather an occasional "ask an expert" when hitting a
| bottleneck. This is a bit different from support.
| doodlesdev wrote:
| The starting prices for hobby instances [0] (i.e. development)
| are about half of what you would pay for AWS RDS [1]. If you
| choose high availability, the prices are about equal. This is
| for Crunchy Bridge, which has a calculator on its website.
| Although, similar to Amazon RDS, the prices increase
| dramatically as you move to a decent machine to run your
| database and increase storage size. I'm using us-east-1 for
| the comparison.
|
| Honestly, as always, I think these managed database services
| are way too expensive. If I really needed to choose one of
| them, I would go with the AWS service, because you also have
| to factor data egress costs into your cloud cost calculation,
| and those will vary a lot depending on your workload.
|
| [0]:
| https://www.crunchydata.com/pricing/calculator?provider=aws&...
|
| [1]: https://calculator.aws/#/addService/RDSPostgreSQL
| whitepoplar wrote:
| https://www.crunchydata.com/pricing/calculator
| hyperman1 wrote:
| Wondering about pgbouncer: a Postgres connection can have
| state attached, e.g. with SET. Does pgbouncer save and restore
| that state, with the associated perf cost, or does a pooled
| connection just receive whatever state the previous owner
| left behind, on the assumption that all state should be
| identical?
| sqlcommando wrote:
| pgbouncer supports session and transaction pooling. With
| session pooling a connection will be associated with the same
| session throughout its use. With transaction pooling each
| transaction on the connection could potentially get a different
| session.
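|
| A minimal pgbouncer.ini sketch of the two modes (database
| name and paths are made up). In session mode pgbouncer runs
| server_reset_query (DISCARD ALL by default) when the client
| disconnects, which wipes SET state; in transaction mode that
| reset is not run between transactions, so session-level SETs
| are unsafe there unless you enable server_reset_query_always
| and pay the extra round trip:
|
| [databases]
| appdb = host=127.0.0.1 port=5432 dbname=appdb
|
| [pgbouncer]
| listen_addr = 127.0.0.1
| listen_port = 6432
| auth_type = md5
| auth_file = /etc/pgbouncer/userlist.txt
| ; session = one server conn per client conn, state preserved
| ; transaction = server conn is reused per transaction
| pool_mode = transaction
| ; cleanup query used in session pooling (the default value)
| server_reset_query = DISCARD ALL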
___________________________________________________________________
(page generated 2023-02-11 23:01 UTC)