Achieving a 100x speedup of DELETEs on PostgreSQL

Posted on 2024-05-29 by Ian

"How did it get late so soon?"

Photo by Murray Campbell on Unsplash

I recently had a question from a colleague regarding a slow DELETE query on PostgreSQL. On the face of it, it was simple enough: it had no CASCADE, and they were only deleting a few records at a time using a subquery (in the absence of a LIMIT clause for DELETE; a sketch of the pattern appears at the end of this post). This small-batch approach was working, but it was too slow to keep up, resulting in a growing backlog.

Before making any changes, I had a look at the performance of the query for a batch of 20 records: it took around 1400 ms to complete. After a bit of poking around, I got the same query to complete in 12 ms. Here's how.

Background

PostgreSQL primary keys have indexes associated with them by default. Because of this, people automatically assume the same of foreign keys, and use them liberally. Unfortunately (although sensibly), foreign keys are not indexed by default. That's the first caveat.

The second caveat is that, if a record being deleted is referenced via a foreign key, PostgreSQL runs a trigger to check whether that record's removal would result in a referential integrity violation. This happens per record, per reference: if you are removing 100 records, each with two references, PostgreSQL fires 200 triggers, each with an associated function call. Each of those triggers needs to check whether the primary key being deleted exists among the foreign key values in the referencing table, and if that table is large and unindexed, this can take a while. Consider the case above of 100 records and 200 triggers: if the referencing table contains a million rows and no index on the foreign key column, each check is a sequential scan of (worst case) the whole million rows, and you're doing that 200 times. An EXPLAIN ANALYZE can quickly highlight this if you suspect you have a similar problem; see the second sketch below.

Changes

As it happens, my colleague's two foreign keys were unindexed, and the backlog was large. Because the backlog was growing, the queries were running increasingly slowly as the sequential scans had more work to do. I added the necessary indexes, which took about forty seconds each with the CONCURRENTLY flag set to avoid interrupting the production workload, and got to the 12 ms figure I mentioned earlier. A batch of a couple of thousand records didn't take much longer either, and the backlog was rapidly cleared.
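
For reference, here is a minimal sketch of the batched DELETE pattern described in the introduction. The schema (an "events" table with a "processed_at" column) is a hypothetical stand-in, not the original one; since PostgreSQL's DELETE has no LIMIT clause, the batch is selected in a subquery instead.

```sql
-- A sketch of the batch-delete pattern, on a hypothetical schema.
-- DELETE has no LIMIT clause in PostgreSQL, so a subquery selects
-- the batch of rows to remove.
DELETE FROM events
WHERE id IN (
    SELECT id
    FROM events
    WHERE processed_at < now() - interval '30 days'
    ORDER BY id
    LIMIT 20
);
```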
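To see where the time goes, run EXPLAIN ANALYZE on the DELETE: the referential integrity checks show up as per-constraint trigger timings after the plan itself. Note that EXPLAIN ANALYZE actually executes the statement, so wrap it in a transaction and roll back if you want to keep the rows. The table and constraint names below are again hypothetical.

```sql
-- EXPLAIN ANALYZE executes the DELETE, so wrap it in a transaction
-- and roll back to leave the data untouched.
BEGIN;
EXPLAIN ANALYZE
DELETE FROM events
WHERE id IN (
    SELECT id FROM events ORDER BY id LIMIT 20
);
-- The output ends with per-trigger timings along the lines of:
--   Trigger for constraint event_logs_event_id_fkey: time=... calls=20
-- Large time= values here point at unindexed foreign keys.
ROLLBACK;
```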
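For a small schema you can spot the unindexed foreign keys by eye, but the system catalogs can also be queried. This is a rough sketch that covers single-column foreign keys only and checks just the leading column of each index; it is not exhaustive.

```sql
-- List single-column foreign keys whose referencing column is not
-- the leading column of any index on the referencing table.
SELECT c.conrelid::regclass AS referencing_table,
       c.conname            AS constraint_name,
       a.attname            AS fk_column
FROM pg_constraint c
JOIN pg_attribute a
  ON a.attrelid = c.conrelid
 AND a.attnum   = c.conkey[1]     -- conkey is 1-based
WHERE c.contype = 'f'
  AND NOT EXISTS (
      SELECT 1
      FROM pg_index i
      WHERE i.indrelid  = c.conrelid
        AND i.indkey[0] = c.conkey[1]  -- indkey is 0-based
  );
```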
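Finally, the fix itself. CREATE INDEX CONCURRENTLY builds the index without taking a lock that blocks writes, at the cost of a slower build, and cannot be run inside a transaction block. The index and column names here are placeholders for whatever your referencing table uses.

```sql
-- Index the referencing column so each referential integrity
-- trigger can probe the index instead of sequentially scanning
-- the referencing table. CONCURRENTLY avoids blocking concurrent
-- writes but cannot run inside a transaction block.
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_event_logs_event_id
    ON event_logs (event_id);
```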