http://blog.crunchydata.com/blog/cleaning-up-your-postgres-database

# Cleaning Up Your Postgres Database

February 02, 2021

Craig Kerstiens

Last week I was on a call with someone giving an overview of Crunchy Bridge, our multi-cloud, fully managed database as a service. During the call they asked what the best way was to get a sense of how their database was doing, a health check if you will. I mentioned a few monitoring tools, such as pganalyze and pgMonitor, that were good options, but said we could also dig in live and see how things were.

The person I was talking to was very similar to many of the customers we run databases for: an application developer without a full-time DBA on staff. They needed a little extra guidance and support, and they were outgrowing their existing hosting provider, so we dug in to do a bit of spring cleaning and assessment on their database.

While I'm generally happy to spend a few minutes with anyone interested in talking about how Crunchy could help, and to dig into your database with you, I'm also a fan of self-service. Here's a walkthrough of each of the pieces we looked at (all within a few minutes) to give some actionable steps for improving database performance.

## The high level metrics: cache and index hit ratio

The first thing you're going to want to look at is your cache hit ratio and index hit ratio. Your cache hit ratio gives the percentage of time your data is served from memory vs. having to go to disk. Serving data from memory is generally going to be orders of magnitude faster than serving it from disk, so the more you can serve from memory the better. For a typical web application making a lot of short requests, I'm going to target > 99% here.
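One caveat to keep in mind: both ratios are computed from Postgres's cumulative statistics views, so they cover everything since statistics were last reset. If your workload or memory settings changed recently, it can be worth checking when the counters were last reset, or clearing them and letting fresh data accumulate, before reading too much into the numbers. A quick sketch, assuming you're comfortable discarding the existing counters:

```sql
-- See how long the statistics behind the hit ratios have been accumulating.
SELECT datname, stats_reset
FROM pg_stat_database
WHERE datname = current_database();

-- Optionally clear the counters for the current database so the ratios
-- reflect only the current workload (this discards the existing stats).
-- SELECT pg_stat_reset();
```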
This query will give you both ratios:

```sql
SELECT 'index hit rate' AS name,
       (sum(idx_blks_hit)) / nullif(sum(idx_blks_hit + idx_blks_read), 0) AS ratio
FROM pg_statio_user_indexes
UNION ALL
SELECT 'table hit rate' AS name,
       sum(heap_blks_hit) / nullif(sum(heap_blks_hit) + sum(heap_blks_read), 0) AS ratio
FROM pg_statio_user_tables;
```

In their case, the cache hit ratio looked good and healthy. Had it been something like 80% or 90%, the solution is simple: get your database more memory by upgrading to the next step up.

From here we went ahead and took a look at the indexes and how often they're used. When you query a database, the default behavior is to scan the entire table to find records that match. This works fine when your table has 10 records, but for any sizable application, scanning millions of records is simply not feasible. It is likely you've added some indexes as you've gone along to speed things up. The query below shows how often an index was used when querying each respective table:

```sql
SELECT relname,
       CASE idx_scan
         WHEN 0 THEN 'Insufficient data'
         ELSE (100 * idx_scan / (seq_scan + idx_scan))::text
       END percent_of_times_index_used,
       n_live_tup rows_in_table
FROM pg_stat_user_tables
ORDER BY n_live_tup DESC;
```

```
       relname        | percent_of_times_index_used | rows_in_table
----------------------+-----------------------------+---------------
 account              | 11                          |          5409
 deploy               | 69                          |         58276
 app                  | 93                          |          5345
 team                 | 98                          |          1801
 firewall_rule        | 9                           |           984
 ...
```

Personally, I like to use a rough guideline of:

* Rows in the table greater than 1,000 or 10,000
* Percentage of times an index is used greater than 90%

Here there are a couple of tables that would be worth digging into further and adding some indexes. The first is the account table. While the account table only has 5,409 records, it is likely to keep growing, and it is clearly missing some needed indexes, since an index is used only 11% of the time when querying that table. The other table of interest is the deploy table. As we can see, the deploy table is the largest one. It has some indexes, as they are used 69% of the time, but some workload querying that table doesn't have an index that helps it.

## Cleaning up unused indexes

Once we had a sense of where their indexing stood, we shifted to what we could clean up. As an application grows and matures, you likely have some things you did early on that are now an unnecessary tax. The thing to look for here is unused indexes: indexes that exist within the database but that your queries are not utilizing. This means you're getting slower throughput on writes with no real benefit from the index existing.

```sql
SELECT schemaname || '.' || relname AS table,
       indexrelname AS index,
       pg_size_pretty(pg_relation_size(i.indexrelid)) AS index_size,
       idx_scan AS index_scans
FROM pg_stat_user_indexes ui
JOIN pg_index i ON ui.indexrelid = i.indexrelid
WHERE NOT indisunique
  AND idx_scan < 50
  AND pg_relation_size(relid) > 5 * 8192
ORDER BY pg_relation_size(i.indexrelid) / nullif(idx_scan, 0) DESC NULLS FIRST,
         pg_relation_size(i.indexrelid) DESC;
```

While you want to go about removing these indexes with caution, you're usually quite safe removing extra indexes that have no usage and are simply slowing down your throughput.
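If you do decide to drop one, it's worth doing it in a way that doesn't lock the table against reads and writes while the index is removed. A minimal sketch (the index name here is a hypothetical placeholder, not one from the output above):

```sql
-- Double check that the index really isn't being used before dropping it.
SELECT idx_scan, idx_tup_read
FROM pg_stat_user_indexes
WHERE indexrelname = 'index_account_on_created_at';  -- hypothetical index name

-- Drop it without holding a lock that blocks reads and writes on the table.
-- Note: DROP INDEX CONCURRENTLY cannot run inside a transaction block.
DROP INDEX CONCURRENTLY IF EXISTS index_account_on_created_at;
```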
## Keep certain things outside Postgres

One very common habit when you are starting out is to put everything in your database. This isn't all bad; Postgres is quite versatile and can handle a lot of what you throw at it. But some very specific tables often make more sense in other data stores. If you have a table called messages, logs, or events in your Postgres database, there is a good chance it's consuming the bulk of your storage, is your largest table, and could benefit from being moved outside of your main Postgres database.

## Dig into query performance with pg_stat_statements

I've said for many years that pg_stat_statements is one of the most useful extensions for Postgres. It records the queries that are run against your database and a number of stats about them. It is a great place to go once a month or so to look up which queries are slowest on average, as well as which ones consume the most total time against your system. We'll dig into pg_stat_statements more in the future, but for now, if you want a quick look, this query can give you a preview (if the extension isn't enabled on your database yet, there's a short setup sketch at the end of this post):

```sql
SELECT (total_time / 1000 / 60) AS total_minutes,
       (total_time / calls) AS average_time,
       query
FROM pg_stat_statements
ORDER BY 1 DESC
LIMIT 50;
```

(On Postgres 13 and newer, pg_stat_statements exposes total_exec_time rather than total_time.)

## Here's to a healthy and speedy database

You don't need to understand how MVCC works under the hood in Postgres, or how to determine during a failover that it is safe to fail over, to have a well tuned Postgres database. Much of the tuning and improvement that can happen for your database isn't the result of transaction log and Linux huge page tuning; rather, it can come from identifying N+1 queries, or just paying attention to wasteful things like "logs" consuming space and not providing value. Walking through the above, without advanced knowledge of Postgres, should give you a good starting point for cleaning up. And as always, if you do find yourself wanting extra support and guidance in your Postgres journey, we're always happy to chat here @crunchydata.
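P.S. If pg_stat_statements isn't available on your database yet, enabling it is a quick one-time setup. A minimal sketch, assuming you can edit postgresql.conf (or your provider's configuration settings) and restart the server:

```sql
-- In postgresql.conf (takes effect after a restart):
--   shared_preload_libraries = 'pg_stat_statements'

-- Then, in the database you want to inspect:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Confirm it's collecting data:
SELECT count(*) FROM pg_stat_statements;
```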