http://blog.crunchydata.com/blog/cleaning-up-your-postgres-database

# Cleaning Up Your Postgres Database

February 02, 2021

Craig Kerstiens

Last week I was on a call with someone giving an overview of Crunchy Bridge, our multi-cloud, fully managed database as a service. During the call they asked what the best way was to get a sense of how their database was doing, a health check if you will. I mentioned a few monitoring tools, such as pganalyze and pgMonitor, that were good options, but said we could also dig in live and see how things were.

The person I was talking to was very similar to many of the customers we run databases for: an application developer without a full-time DBA on staff. They needed a little extra guidance and support, and they were outgrowing their existing hosting provider, so we dug in to do a bit of spring cleaning and assessment on their database.

While I'm generally happy to spend a few minutes with anyone interested in talking about how Crunchy could help, and to dig into your database with you, I'm also a fan of self-service. Here's a walkthrough of each of the pieces we looked at (all within a few minutes) to give some actionable steps for improving database performance.

## The high level metrics: cache and index hit ratio

The first thing you're going to want to look at is your cache hit ratio and index hit ratio. Your cache hit ratio gives the percentage of time your data is served from memory vs. having to go to disk. Serving data from memory is generally going to be orders of magnitude faster than serving it from disk, so the more you can serve from memory the better. For a typical web application making a lot of short requests, I'm going to target > 99% here.
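One caveat to keep in mind: both ratios are computed from Postgres's cumulative statistics views, so they cover everything since statistics were last reset. If your workload or memory settings changed recently, it can be worth checking when the counters were last reset, or clearing them and letting fresh data accumulate, before reading too much into the numbers. A quick sketch, assuming you're comfortable discarding the existing counters:

```sql
-- See how long the statistics behind the hit ratios have been accumulating.
SELECT datname, stats_reset
FROM pg_stat_database
WHERE datname = current_database();

-- Optionally clear the counters for the current database so the ratios
-- reflect only the current workload (this discards the existing stats).
-- SELECT pg_stat_reset();
```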
This query will give you both ratios:

```sql
SELECT 'index hit rate' AS name,
       (sum(idx_blks_hit)) / nullif(sum(idx_blks_hit + idx_blks_read), 0) AS ratio
FROM pg_statio_user_indexes
UNION ALL
SELECT 'table hit rate' AS name,
       sum(heap_blks_hit) / nullif(sum(heap_blks_hit) + sum(heap_blks_read), 0) AS ratio
FROM pg_statio_user_tables;
```

In their case, the cache hit ratio looked good and healthy. Had it been something like 80% or 90%, the solution is simple: get your database more memory by upgrading to the next step up.

From here we went ahead and took a look at the indexes and how often they're used. When you query a database, the default behavior is to scan the entire table to find records that match. This works fine when your table has 10 records, but for any sizable application, scanning millions of records is simply not feasible. It is likely you've added some indexes as you've gone along to speed things up. The query below shows how often an index was used when querying each respective table:

```sql
SELECT relname,
       CASE idx_scan
         WHEN 0 THEN 'Insufficient data'
         ELSE (100 * idx_scan / (seq_scan + idx_scan))::text
       END percent_of_times_index_used,
       n_live_tup rows_in_table
FROM pg_stat_user_tables
ORDER BY n_live_tup DESC;
```

```
       relname        | percent_of_times_index_used | rows_in_table
----------------------+-----------------------------+---------------
 account              | 11                          |          5409
 deploy               | 69                          |         58276
 app                  | 93                          |          5345
 team                 | 98                          |          1801
 firewall_rule        | 9                           |           984
 ...
```

Personally, I like to use a rough guideline of:

* Rows in the table greater than 1,000 or 10,000
* Percentage of times an index is used greater than 90%

Here there are a couple of tables that would be worth digging into further and adding some indexes. The first is the account table. While the account table only has 5,409 records, it is likely to keep growing, and it is clearly missing some needed indexes, since an index is used only 11% of the time when querying that table. The other table of interest is the deploy table. As we can see, the deploy table is the largest one. It has some indexes, as they are used 69% of the time, but some workload querying that table doesn't have an index that helps it.

## Cleaning up unused indexes

Once we had a sense of where their indexing stood, we shifted to what we could clean up. As an application grows and matures, you likely have some things you did early on that are now an unnecessary tax. The thing to look for here is unused indexes: indexes that exist within the database but that your queries are not utilizing. This means you're getting slower throughput on writes with no real benefit from the index existing.

```sql
SELECT schemaname || '.' || relname AS table,
       indexrelname AS index,
       pg_size_pretty(pg_relation_size(i.indexrelid)) AS index_size,
       idx_scan AS index_scans
FROM pg_stat_user_indexes ui
JOIN pg_index i ON ui.indexrelid = i.indexrelid
WHERE NOT indisunique
  AND idx_scan < 50
  AND pg_relation_size(relid) > 5 * 8192
ORDER BY pg_relation_size(i.indexrelid) / nullif(idx_scan, 0) DESC NULLS FIRST,
         pg_relation_size(i.indexrelid) DESC;
```

While you want to go about removing these indexes with caution, you're usually quite safe removing extra indexes that have no usage and are simply slowing down your throughput.
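If you do decide to drop one, it's worth doing it in a way that doesn't lock the table against reads and writes while the index is removed. A minimal sketch (the index name here is a hypothetical placeholder, not one from the output above):

```sql
-- Double check that the index really isn't being used before dropping it.
SELECT idx_scan, idx_tup_read
FROM pg_stat_user_indexes
WHERE indexrelname = 'index_account_on_created_at';  -- hypothetical index name

-- Drop it without holding a lock that blocks reads and writes on the table.
-- Note: DROP INDEX CONCURRENTLY cannot run inside a transaction block.
DROP INDEX CONCURRENTLY IF EXISTS index_account_on_created_at;
```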
## Keep certain things outside Postgres

One very common habit when you are starting out is to put everything in your database. This isn't all bad; Postgres is quite versatile and can handle a lot of what you throw at it. But some very specific tables often make more sense in other data stores. If you have a table called messages, logs, or events in your Postgres database, there is a good chance it's consuming the bulk of your storage, is your largest table, and could benefit from being moved outside of your main Postgres database.

## Dig into query performance with pg_stat_statements

I've said for many years that pg_stat_statements is one of the most useful extensions for Postgres. It records the queries that are run against your database and a number of stats about them. It is a great place to go once a month or so to look up which queries are slowest on average, as well as which ones consume the most total time against your system. We'll dig into pg_stat_statements more in the future, but for now, if you want a quick look, this query can give you a preview (if the extension isn't enabled on your database yet, there's a short setup sketch at the end of this post):

```sql
SELECT (total_time / 1000 / 60) AS total_minutes,
       (total_time / calls) AS average_time,
       query
FROM pg_stat_statements
ORDER BY 1 DESC
LIMIT 50;
```

(On Postgres 13 and newer, pg_stat_statements exposes total_exec_time rather than total_time.)

## Here's to a healthy and speedy database

You don't need to understand how MVCC works under the hood in Postgres, or how to determine during a failover that it is safe to fail over, to have a well tuned Postgres database. Much of the tuning and improvement that can happen for your database isn't the result of transaction log and Linux huge page tuning; rather, it can come from identifying N+1 queries, or just paying attention to wasteful things like "logs" consuming space and not providing value. Walking through the above, without advanced knowledge of Postgres, should give you a good starting point for cleaning up. And as always, if you do find yourself wanting extra support and guidance in your Postgres journey, we're always happy to chat here @crunchydata.
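P.S. If pg_stat_statements isn't available on your database yet, enabling it is a quick one-time setup. A minimal sketch, assuming you can edit postgresql.conf (or your provider's configuration settings) and restart the server:

```sql
-- In postgresql.conf (takes effect after a restart):
--   shared_preload_libraries = 'pg_stat_statements'

-- Then, in the database you want to inspect:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Confirm it's collecting data:
SELECT count(*) FROM pg_stat_statements;
```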