[HN Gopher] Tips for a healthier Postgres database (2021)
       ___________________________________________________________________
        
       Tips for a healthier Postgres database (2021)
        
       Author : craigkerstiens
       Score  : 166 points
       Date   : 2023-02-10 00:45 UTC (1 days ago)
        
 (HTM) web link (www.crunchydata.com)
 (TXT) w3m dump (www.crunchydata.com)
        
       | efxhoy wrote:
       | One thing I really like about RDS is the query visualizer. It
       | shows which queries ran when and for how long and what they spent
       | their time waiting for, CPU or disk or locks or whatever. It's
       | really good. Are there any free tools that show a similar view?
       | 
       | edit: i think it's called performance insights in RDS
        
         | ekkeke wrote:
         | Would be a whole lot better if it didn't prune their query logs
         | to x (I think 1000?) characters.
         | 
         | We autogenerate many of our own queries which can have
         | significant complexity (regularly over 10 joins, sometimes up
         | to 30!) and our infrastructure isn't quite there yet to be able
         | to recreate the exact query plan a customer saw on their own
         | data without a lot of work. It could all be so much simpler, so
         | if there is a setting to prevent this please tell me!
        
           | taspeotis wrote:
           | https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_.
           | ..
        
           | kaveet wrote:
           | You might be looking for the track_activity_query_size
           | parameter:                 SHOW track_activity_query_size;
        
         | torvald wrote:
         | Before we jumped on all kinds of SaaS offerings we used
         | pgbadger.
         | 
         | https://github.com/darold/pgbadger
        
       | cynicalsecurity wrote:
       | That was a rather short article.
        
         | napsterbr wrote:
         | Which is a great thing. If this were on YouTube, we'd have a 20
         | minute video, with a long and unnecessary introduction and a
         | sneaky 1 minute ad at the middle of the video.
        
           | dexterdog wrote:
           | That's why sponsorblock exists
        
             | napsterbr wrote:
             | I'll definitely check this out, ty! Currently my
             | sponsorblock is my right thumb always ready to +10s as soon
             | as I smell bullshit.
        
       | didgetmaster wrote:
       | >There seems to be a common lifecycle of indexes within
       | applications. First you start off with almost none, maybe a few
       | on primary keys. Then you start adding them, one by one, two by
       | two, until you've got quite a few indexes for most any query you
       | can run. Something is slow? Throw an index at it. What you end up
       | with is some contention on overall throughput of your database,
       | and well a lot of indexes that became a tangled ball of yarn over
       | time.
       | 
       | I spent years doing index maintenance on a large corporate
       | database. Creating and rebuilding indexes was a hassle. So I set
       | about building a new kind of database engine where every column
       | in a table organizes its data for fast access (e.g. a columnar
       | store with no separate indexing structure).
       | 
       | I expected my system to be substantially faster than a Postgres
       | table without indexes for a broad range of queries. What I didn't
       | expect was for it to also be much faster for tables that were
       | highly indexed in Postgres. I ran tons of queries on both systems
       | and almost all of them were like this video:
       | 
       | https://www.youtube.com/watch?v=OVICKCkWMZE
        
         | CharlesW wrote:
         | You self-promote here often1 (not judging, just an
         | observation), and here are some thoughts that have occurred to
         | me while reading your posts and comments.
         | 
         | * Your vision for Didget is blurry. Sometimes it's a database
         | (or "general purpose data management system"), sometimes it's a
         | file system. Products that are dessert toppings _and_ floor
         | waxes rarely find purchase. My advice: Decide which problem
         | domain you 're playing in, then which problem(s) within that
         | domain this solves.
         | 
         | * The Postgres comparison seems naive. As the Postgres wiki
         | notes, "PostgreSQL ships with a basic configuration tuned for
         | wide compatibility _rather than performance_. Odds are good the
         | default parameters are very undersized for your system. " A
         | comparison with a system-appropriate Postgres configuration
         | would be more interesting.
         | 
         | * Promoting a new closed-source database seems anachronistic
         | when there's a plethora of fine open-source options. If you're
         | wondering why you're not getting many leads from HN, I'd
         | imagine this is the main reason.
         | 
         | 1
         | https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...
        
         | MuffinFlavored wrote:
         | What's the gotcha other than "it's a homegrown database
         | engine"? Did you start from scratch? It can do JOINs and stuff?
        
           | Bogdanp wrote:
           | One potential hint is the Postgres queries are being executed
           | as sequential scans, as you can see from the explain output.
           | The first query only returns 200k rows (out of 7M), so even
           | without any config tweaks it should be using an index if one
           | is present. So, my guess is those queries are being made
           | against unindexed columns. Also, I'm guessing this is vanilla
           | Postgres, so it's possible OP hasn't tweaked its config for
           | the machine they're using, and the default config pg ships
           | with is designed for fairly resource-constrained machines by
           | today's standards. If the data is indexed, it's possible the
           | rows are all over the place, and clustering could help.
        
             | skunkworker wrote:
             | In addition, I wonder if random_page_cost is not set to be
             | 1. I've noticed on machines with SSD only pools that makes
             | the query planner much more deterministic and utilizes
             | indexes over sequential scans almost unused.
        
         | SoftTalker wrote:
         | > First you start off with almost none, maybe a few on primary
         | keys.
         | 
         | Primary keys are always indexed. Does the author really know
         | what he's talking about?
        
           | Petersipoi wrote:
           | The author clearly knows this based on the sentence you
           | quoted. I'm very confused as to what your complaint is.
           | 
           | > I have almost no fruit, just a few apples.
           | 
           | Indicates that I _do_ know that apples are a fruit. Not that
           | I don't.
        
           | dewey wrote:
           | What about tables without primary keys?
        
             | wielebny wrote:
             | These are usually stored in CSV files.
        
             | rzzzt wrote:
             | DBMS-s still have to identify a particular row somehow.
             | That identifier is CTID in Postgres and ROWID in Oracle.
             | Both will change if the data is moved around.
        
             | lgas wrote:
             | Generally not a good idea to have these at all.
        
             | [deleted]
        
           | eropple wrote:
           | A more generous interpretation would be the assumption that
           | primary keys are (as they frequently are) synthetic data, and
           | perhaps not the most commonly queried field aside from direct
           | get-that-resource queries.
        
             | [deleted]
        
         | chasil wrote:
         | In Oracle, Tom Kyte's advice is to never rebuild them, unless
         | there are reasons for structural change or physically
         | relocating the storage.
         | 
         | I have occasionally seen performance gains by rebuilding a
         | composite index with key compression, for example.
         | 
         | Indexes do tend to return to their natural size in heavy use,
         | so repeatedly shrinking them will just tax performance for a
         | temporary gain in storage.
         | 
         | https://asktom.oracle.com/pls/apex/f?p=100:11:::::P11_QUESTI...
        
       | pjdkoch wrote:
       | This is missing a [2021].
        
         | gregwebs wrote:
         | Here's a newer article with much the same content:
         | https://www.crunchydata.com/blog/exposing-postgres-performan...
        
       | gnfargbl wrote:
       | pg_stat_statements is useful, but it isn't completely free, e.g.
       | [1] (admittedly from pg12, which is three major versions ago).
       | 
       | Most people should probably leave it enabled, but if you have a
       | database with a very defined and understood query pattern, then
       | it might be worth running _perf_ and checking if it is costing
       | you.
       | 
       | [1] https://www.alibabacloud.com/blog/postgresql-v12-how-pg-
       | stat...
        
         | hummus_bae wrote:
         | Most modern processors run at a speed that makes it cheaper to
         | check than to not check. Some people might argue for lower-
         | level optimizations like this, but in this case, I don't think
         | it's a big deal.
         | 
         | Bottom line: enabling this doesn't hurt. It might help, and it
         | doesn't cost anything to check
        
       | mixmastamyk wrote:
       | What does crunchy cost? Page says to ask, typically meaning you
       | can't afford. Wonder if it is feasible for a small company. Is it
       | competitive with RDS?
        
         | BasedInfra wrote:
         | They have a hosted on major clouds service called crunchybridge
         | which seems reasonably priced.
         | 
         | https://www.crunchydata.com/pricing/calculator
        
         | candiddevmike wrote:
         | It's quite a bit more than RDS/cloud provider managed. I don't
         | see a reason to use them vs a cloud provider native service if
         | you're already in the cloud provider hotel california.
        
           | craigkerstiens wrote:
           | Product manager here from Crunchy, we aim to be quite in line
           | with any of the major cloud providers. A number of customers
           | have moved over from each of the major cloud providers to see
           | better price to performance and quality of support.
        
             | karmelapple wrote:
             | Just as a data point: we are planning to move to Crunchy
             | and away from Heroku soon. We tested the two, and we have
             | 4X improved performance on Crunchy Postgres vs Heroku
             | Postgres.
             | 
             | 4X. We were astounded. And so we're the Heroku folks I
             | talked to, who I'm guessing will want to try to bring
             | performance boosts.
             | 
             | We haven't done the switch quite yet because we're waiting
             | on Crunchy to bring out their Heroku Dataclips replacement,
             | which is something we rely on a lot. But once it's ready,
             | we're going to switch!
        
               | higeorge13 wrote:
               | Just curious, what's special with heroku dataclips? You
               | can host and use any bi tool which has similar
               | functionality (last one i used was redash)
        
             | mixmastamyk wrote:
             | Great. Wonder if you offer any a la carte consulting on
             | modeling, performance, etc? Wouldn't need it often, but
             | rather occasional "ask an expert" when when hitting a
             | bottleneck. This is a bit different than support.
        
         | doodlesdev wrote:
         | The starting prices for hobby instances [0] (i.e. development)
         | are about half of what you would pay for AWS RDS [1]. If you
         | choose to have high availability then prices are equal in this
         | case. This is for Crunchy Bridge, which has a calculator on the
         | website. Although, similar to Amazon RDS the prices increase
         | dramatically as you get a decent machine to run your database
         | and increase storage size. I'm using us-east-1 for the
         | comparison.
         | 
         | Honestly, as always, I think these managed database services
         | are way too expensive. If I really needed to choose one of them
         | I would go with the AWS service, because you also have to
         | factor in data egress costs in your cloud costs calculation
         | which will vary a lot depending on your workload.
         | 
         | [0]:
         | https://www.crunchydata.com/pricing/calculator?provider=aws&...
         | 
         | [1]: https://calculator.aws/#/addService/RDSPostgreSQL
        
         | whitepoplar wrote:
         | https://www.crunchydata.com/pricing/calculator
        
       | hyperman1 wrote:
       | Wondering about pgbouncer: A postgres connection can have state
       | attached, e.g. with 'set'. Does pgbouncer save and restore state,
       | with associated perf cost, or does any pooled connection just
       | receive the random state from the previous connection owner,
       | assuming all state should be identical?
        
         | sqlcommando wrote:
         | pgbouncer supports session and transaction pooling. With
         | session pooling a connection will be associated with the same
         | session throughout its use. With transaction pooling each
         | transaction on the connection could potentially get a different
         | session.
        
       ___________________________________________________________________
       (page generated 2023-02-11 23:01 UTC)