Achieving a 100x speedup of DELETEs on PostgreSQL

Posted on 2024-05-29 by Ian

"How did it get late so soon?"

Photo by Murray Campbell on Unsplash

I recently had a question from a colleague regarding a slow DELETE query on PostgreSQL. On the face of it, it was simple enough: it had no CASCADE, and they were only deleting a few records at a time using a subquery (in the absence of a LIMIT clause for DELETE; a sketch of the pattern appears at the end of this post). This small-batch approach was working, but it was too slow to keep up, resulting in a growing backlog.

Before making any changes, I had a look at the performance of the query for a batch of 20 records: it took around 1400 ms to complete. After a bit of poking around, I got the same query to complete in 12 ms. Here's how.

Background

PostgreSQL primary keys have indexes associated with them by default. Because of this, people automatically assume the same of foreign keys, and use them liberally. Unfortunately (although sensibly), foreign keys are not indexed by default. That's the first caveat.

The second caveat is that, if a record being deleted is referenced via a foreign key, PostgreSQL runs a trigger to check whether that record's removal would result in a referential integrity violation. This happens per record, per reference: if you are removing 100 records, each with two references, PostgreSQL fires 200 triggers, each with an associated function call. Each of those triggers needs to check whether the primary key being deleted exists among the foreign key values in the referencing table, and if that table is large and unindexed, this can take a while. Consider the case above of 100 records and 200 triggers: if the referencing table contains a million rows and no index on the foreign key column, each check is a sequential scan of (worst case) the whole million rows, and you're doing that 200 times. An EXPLAIN ANALYZE can quickly highlight this if you suspect you have a similar problem; see the second sketch below.

Changes

As it happens, my colleague's two foreign keys were unindexed, and the backlog was large. Because the backlog was growing, the queries were running increasingly slowly as the sequential scans had more work to do. I added the necessary indexes, which took about forty seconds each with the CONCURRENTLY flag set to avoid interrupting the production workload, and got to the 12 ms figure I mentioned earlier. A batch of a couple of thousand records didn't take much longer either, and the backlog was rapidly cleared.
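
For reference, here is a minimal sketch of the batched DELETE pattern described in the introduction. The schema (an "events" table with a "processed_at" column) is a hypothetical stand-in, not the original one; since PostgreSQL's DELETE has no LIMIT clause, the batch is selected in a subquery instead.

```sql
-- A sketch of the batch-delete pattern, on a hypothetical schema.
-- DELETE has no LIMIT clause in PostgreSQL, so a subquery selects
-- the batch of rows to remove.
DELETE FROM events
WHERE id IN (
    SELECT id
    FROM events
    WHERE processed_at < now() - interval '30 days'
    ORDER BY id
    LIMIT 20
);
```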
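To see where the time goes, run EXPLAIN ANALYZE on the DELETE: the referential integrity checks show up as per-constraint trigger timings after the plan itself. Note that EXPLAIN ANALYZE actually executes the statement, so wrap it in a transaction and roll back if you want to keep the rows. The table and constraint names below are again hypothetical.

```sql
-- EXPLAIN ANALYZE executes the DELETE, so wrap it in a transaction
-- and roll back to leave the data untouched.
BEGIN;
EXPLAIN ANALYZE
DELETE FROM events
WHERE id IN (
    SELECT id FROM events ORDER BY id LIMIT 20
);
-- The output ends with per-trigger timings along the lines of:
--   Trigger for constraint event_logs_event_id_fkey: time=... calls=20
-- Large time= values here point at unindexed foreign keys.
ROLLBACK;
```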
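For a small schema you can spot the unindexed foreign keys by eye, but the system catalogs can also be queried. This is a rough sketch that covers single-column foreign keys only and checks just the leading column of each index; it is not exhaustive.

```sql
-- List single-column foreign keys whose referencing column is not
-- the leading column of any index on the referencing table.
SELECT c.conrelid::regclass AS referencing_table,
       c.conname            AS constraint_name,
       a.attname            AS fk_column
FROM pg_constraint c
JOIN pg_attribute a
  ON a.attrelid = c.conrelid
 AND a.attnum   = c.conkey[1]     -- conkey is 1-based
WHERE c.contype = 'f'
  AND NOT EXISTS (
      SELECT 1
      FROM pg_index i
      WHERE i.indrelid  = c.conrelid
        AND i.indkey[0] = c.conkey[1]  -- indkey is 0-based
  );
```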
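Finally, the fix itself. CREATE INDEX CONCURRENTLY builds the index without taking a lock that blocks writes, at the cost of a slower build, and cannot be run inside a transaction block. The index and column names here are placeholders for whatever your referencing table uses.

```sql
-- Index the referencing column so each referential integrity
-- trigger can probe the index instead of sequentially scanning
-- the referencing table. CONCURRENTLY avoids blocking concurrent
-- writes but cannot run inside a transaction block.
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_event_logs_event_id
    ON event_logs (event_id);
```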