[HN Gopher] Don't rely on IF NOT EXISTS for concurrent index cre...
       ___________________________________________________________________
        
       Don't rely on IF NOT EXISTS for concurrent index creation in
       PostgreSQL
        
       Author : shayonj
       Score  : 54 points
       Date   : 2024-08-12 18:55 UTC (4 hours ago)
        
 (HTM) web link (www.shayon.dev)
 (TXT) w3m dump (www.shayon.dev)
        
       | londons_explore wrote:
       | Other long-running operations use transactions... Do the
       | operation on a snapshot of the data, and then evaluate if it can
       | be committed.
       | 
       | Why isn't index creation done the same way?
        
         | AmericanChopper wrote:
         | CREATE INDEX CONCURRENTLY is not a transactional statement. If
         | you want to create an index inside a transaction it will block
         | all writes to the table until the index is created, which can
         | take a rather long time on large tables.
        
         | singron wrote:
         | Concurrent index creation internally uses multiple
         | transactions, so it can't itself be wrapped in a transaction.
         | 
         | The docs explain the mechanics further:
         | https://www.postgresql.org/docs/current/sql-createindex.html...
        
           | zelphirkalt wrote:
           | That immediately brings up the next question: Why are
           | transactions not composable?
        
             | londons_explore wrote:
             | Ignoring everything but the outer transaction would be
             | technically correct - wonder why it isn't supported?
        
             | paulddraper wrote:
             | They are?
             | 
             | Your transaction isolation mode defines how they compose.
             | 
             | Unless you are asking something different.
        
         | paulddraper wrote:
         | The entire point of CONCURRENTLY is to _not_ be in a single
         | transaction.
         | 
         | So you can interleave writes and (partially complete) indexing.
         | 
         | If you want transactional index creation, just omit
         | CONCURRENTLY.
        
       | ars wrote:
       | "PostgreSQL doesn't automatically remove the partially created
       | index when the operation is aborted due to a lock timeout. This
       | is intentional, as it allows for potential recovery or manual
       | intervention"
       | 
       | This argument doesn't make much sense. What's the usefulness of a
       | partially created index? The entire rest of the article is
       | working hard to explain how this invalid index is 100% useless
       | and should be dropped!
       | 
       | So this seems like a bug in PostgreSQL to me.
        
         | shayonj wrote:
         | I went looking in the postgres code and took it more as a
         | "Feature request" where if the index building / table scanning,
         | etc is interrupted for concurrent builds, PG could drop the
         | index in the end. I can see how PG keeps it around however,
         | since invalid indexes can be useful for administrators.
         | 
         | However, for application developers - unless you are familiar
         | with the internals, this comes off as a big surprise.
         | 
         | Would love to learn more from folks who are more familiar with
         | the internals.
        
         | choilive wrote:
         | Index creation can take a very long time for a large database,
         | think hours or god forbid days. Restarting from scratch for an
         | index that takes 18 hours to build is very painful.. doubly so
         | back when concurrent index creation didn't exist.
        
           | whartung wrote:
           | Do what is the recovery/restart process for a partially
           | created index?
           | 
           | And how do you clean up after an aborted index build?
        
             | asabil wrote:
             | REINDEX INDEX CONCURRENTLY[1]
             | 
             | [1]: https://www.postgresql.org/docs/current/sql-
             | reindex.html
        
           | Groxx wrote:
           | Just to +1 this as a non-Postgres-user who briefly skimmed
           | the docs:                   REINDEX INDEX CONCURRENTLY
           | 
           | ^ it can be rebuilt.
           | 
           | I'm honestly not sure if this is saving anything except re-
           | _defining_ (i.e. the literal  "create index" statement), but
           | they seem definitely not _100%_ useless despite being
           | invalid. Maybe 99% useless, but  "drop it in the background"
           | doesn't seem particularly better either - leaving
           | intermediate state for an admin to tackle by hand seems
           | reasonable.
        
           | SoftTalker wrote:
           | If your database is that big then it should probably be
           | managed by someone competent to do so, not a bunch of
           | developers who only know ActiveRecord.
        
         | victorbjorklund wrote:
         | You can use reindex to rebuild an invalid index. But many
         | prefer to just drop it and try again (like the author does).
         | I'm sure there are scenarios where reindex is a much better
         | option so not a bug (maybe a bad default).
        
         | AmericanChopper wrote:
         | I'm pretty sure it's just because it allows you to retry the
         | index creation using REINDEX INDEX CONCURRENTLY, which allows
         | you to avoid the exclusive lock created by DROP INDEX, because
         | you can't always DROP INDEX CONCURRENTLY
         | 
         | https://www.postgresql.org/docs/current/sql-dropindex.html
        
         | merb wrote:
         | Actually the index can only fail because of a deadlock or a
         | unique failure. Deadlocks are rare and are probably an
         | Application bug and for unique failure you need to fix your
         | database anyway and you either need to drop and recreate it or
         | reinfect it (new stuff will use the index for unique checks
         | even in invalid state but not for queries)
        
       | andatki wrote:
       | Hey Shayon. IF NOT EXISTS or the Active Record version
       | "if_not_exists: true" can be pretty handy though when there's a
       | valid index in production, to help drive schema definition
       | consistency in all environments (prod, dev, CI, etc.). As you
       | pointed out though, it only makes sense when the indexes being
       | checked are valid.
       | 
       | In my experience on large tables that are "busy", sometimes
       | indexes need to be added manually first from a utility session,
       | perhaps inside tmux/screen that's detached from while they are
       | created. This could take hours for large tables. Then once done,
       | and the index is valid, an Active Record migration can be sent
       | out using "if_not_exists: true" to make sure it's applied
       | everywhere.
       | 
       | Your point that it could be misused unintentionally due to not
       | knowing an index is INVALID is a good one, and I feel it should
       | be part of how it works by default in Active Record. Had you
       | considered trying to propose that to rails/rails? I would
       | certainly support that PR (may be able to collaborate) and could
       | add more examples and validation.
        
         | shayonj wrote:
         | Hey Andrew! It's great to see you here. I agree with everything
         | you said, 100%. I'm honestly torn because I was the first one
         | to recommend `if_not_exists` to engineers, but now I'm
         | reconsidering because this can lead to very strange issues in
         | production, especially for ActiveRecord applications.
         | Maintaining an "invalid index reaper job" doesn't sound like a
         | good long-term solution.
         | 
         | I've suggested some proposals to Rails here and would love to
         | propose a PR based on the solution that makes the most sense,
         | too.
         | 
         | https://github.com/rails/rails/issues/52583
        
           | andatki wrote:
           | Do you do any alerting for INVALID indexes? For example, by
           | default PgHero will display them prominently and I believe
           | PgAnalyze does as well. My thought is to put the energy into
           | making INVALID indexes highly visible. Perhaps combined with
           | a process step. 1. Any CREATE INDEX CONCURRENTLY migrations
           | go out in their own deployment. 2. Any queries that depends
           | on that index being present means a PR has a process step
           | asking the author to verify that the index exists and is
           | valid. That way you wouldn't have to lose if_not_exists.
           | 
           | That all said, still would be cool to make this the default
           | in Active Record. Nice idea!
        
             | shayonj wrote:
             | Yeah, def, good call.
             | 
             | I have a job that runs nightly or weekly and performs a
             | concurrent reindex on tables, which nicely covers cases
             | like this.
             | 
             | Given that lock timeouts are very much "rescuable," I'm
             | also thinking about addressing the problem right at the
             | start (when adding migrations) by implementing retries.
             | This approach at least would eliminate the need for any
             | follow-up steps by developers and reduce additional
             | cognitive load.
             | 
             | The ideal goal being "The system just works"(tm) :D
        
               | andatki wrote:
               | Makes sense. Check out the implementation of automatic
               | lock timeout retries in Strong Migrations.
               | 
               | Lock Timeout Retries [experimental]
               | https://github.com/ankane/strong_migrations
        
       | limaoscarjuliet wrote:
       | Concurrent index creation or re-indexing is not all magic. It
       | does scan the table multiple times and might lead to deadlocks if
       | the table is heavily modified.
       | 
       | Furthermore, failed concurrent index or re-index will leave
       | invalid cursors behind that are not usable but incur the update
       | costs. Of course, they consume the disk space.
       | 
       | Use with caution.
        
       | dogsledsleddog wrote:
       | I hope I am missing something in this logic?:
       | 
       | Broken indexes are intentionally left because indexing can be a
       | long operation on a big db, so you may regret not just repairing
       | them..
       | 
       | Let's delete and recreate indexes all the time, just in case
       | because they might be left over..
       | 
       | Hope we don't encounter the reason they left broken indexes and
       | wait a few hours for a rebuild?
        
         | shayonj wrote:
         | Reindex starts fresh on a temp index and then does a swap in
         | the end - https://www.postgresql.org/docs/current/sql-
         | reindex.html
        
           | dogsledsleddog wrote:
           | That's very helpful, but I still don't get the
           | recommendation. The reindex seems to guarantee that any
           | useful index will be used, the deletion and recreation seems
           | to guarantee a period of bad performance as long as an index
           | takes to build even if the old one was fine.
           | 
           | (I.e. I could imagine a customer upgrading from a version
           | that already backported an index to the new major version
           | where it was added and deciding there's a performance
           | regression.)
        
       | kgeist wrote:
       | We generally ban any IF NOT EXISTS in our team (we use MySQL)
       | because it usually masks the real problem (why is the developer
       | trying to create a table which already exists in the test
       | environment? maybe there's a name conflict with another branch
       | etc.)
        
         | shayonj wrote:
         | exactly - i missed to talk about this in the post. I think
         | using IF NOT EXISTS can be a sign of a higher degree issue as
         | well. Being more mindful can help.
        
         | lisper wrote:
         | > why is the developer trying to create a table which already
         | exists
         | 
         | Maybe idempotency was a design goal.
        
           | kgeist wrote:
           | We have a special table which remembers which migrations were
           | already run. So creating a second migration which tries to
           | create the table again is a sign something is wrong. It's
           | probably not what the developer intended and can result in
           | wrong assumptions about the table's schema, for example (they
           | expected their version would be used but the new migration is
           | actually skipped).
        
             | arp242 wrote:
             | In general I agree, but in MySQL/MariaDB this is quite
             | tricky because you can't do a lot of stuff in transactions;
             | definitely seen problems where someone does;
             | -- migrate-2024-08-12-foo.sql       create table [..]
             | alter table [..]       insert into migrations values
             | ('migrate-2024-08-12-foo');
             | 
             | And the create works, but the alter fails. And then you try
             | to run it again and now the create fails because you have
             | half a migration. Herp derp.
             | 
             | And you often can't split those two statements either: that
             | would be "half a migration" that won't work with the code.
             | 
             | As far as I know there isn't really a good solution for
             | this. It can be quite a pain.
             | 
             | In PostgreSQL and SQLite this is not an issue as you can
             | run all the above in a transaction. You indeed rarely need
             | "if not exists" there.
        
               | kgeist wrote:
               | For that exact reason, we have a rule that every DDL
               | migration must be a separate migration in the framework.
               | So if migration #2 out of 3 fails, only 1 migration will
               | be marked as completed. If we see migration failed in
               | CI/CD logs, it's immediately clear what exactly failed,
               | and we can quickly revert what has managed to run (by
               | applying the corresponding down migration), to return the
               | DB to the previous valid state (if we decide to abort the
               | release and fix the migrations first). Or we can rerun
               | individual DDL migrations again, if it's a timeout issue.
               | 
               | We design migrations in such a way so that new
               | schemas/data never conflict with already running (old)
               | code. And new code is activated only after all migrations
               | completed successfully. So in practice half completed
               | migrations is rarely an issue for us.
               | 
               | But yeah, it requires more boilerplate. However, there's
               | fewer surprises, compared to IF NOT EXISTS.
        
               | arp242 wrote:
               | There's lots of cases where that doesn't really work, and
               | can't logically work. An obvious example is:
               | drop index old;       create index new [..];
               | 
               | There's tons of examples like this where things can kind-
               | of work with enough effort, but also not really - if the
               | query takes 5 minutes without that index it's just going
               | to timeout, so not that different from having broken
               | code, and not much you can do about that too. You really
               | want those two combined to be atomic.
               | 
               | And in cases of "create table new; drop table old" using
               | the old table is often 1) a major hassle, and 2) often
               | doesn't actually work all that well as it's not a tested
               | path.
               | 
               | Either way, it seems MariaDB does support transactions
               | for all of this if you disable "autocommit mode", judging
               | from their docs? I haven't used MySQL/MariaDB in quite a
               | few years, but I've never seen migrations for them done
               | well without tons of hassle and it's a major reason I've
               | typically recommended PostgreSQL. It's just eliminates an
               | entire class of common headaches and errors.
        
               | kgeist wrote:
               | For your index example, why not just do:
               | create index new [..]       drop index old;
               | 
               | At every point, there's always an index to use. We
               | usually have to split releases into several
               | "subreleases":
               | 
               | Release #1                 (automatic migration step)
               | - alter the table       - add a new index which uses new
               | columns (etc.)       (code release step)       - new code
               | goes live and starts using the new index
               | 
               | Release #2 (we call it "post release")                 -
               | remove the old unused index
               | 
               | Automation, CI/CD help here. Yeah, it's somewhat error
               | prone, but you usually get used to it. Not saying it's
               | perfect, but manageable.
               | 
               | In any case, you can never make both SQL migration and
               | code release atomic (if you use rolling updates with zero
               | downtime) so you already have to split into such
               | subreleases anyway.
        
               | arp242 wrote:
               | > Yeah, it's somewhat error prone, but you usually get
               | used to it. Not saying it's perfect, but manageable.
               | 
               | Of course; I'm just saying this is why people want
               | idempotency and use "if not exists", especially in
               | MariaDB/MySQL land.
        
               | evanelias wrote:
               | > it seems MariaDB does support transactions for all of
               | this
               | 
               | No, DDL cannot be used in a transaction in any version of
               | MySQL or MariaDB.
               | 
               | > I've typically recommended PostgreSQL. It's just
               | eliminates an entire class of common headaches and errors
               | 
               | That's fair, but it does create new headaches, for
               | example the notion of an invalid index (as described in
               | the post) simply does not exist in MySQL or MariaDB.
               | Index creation does not block concurrent writes, and
               | won't fail mid-stream due to a lock conflict with old row
               | purging (equivalent to PG's vacuum). And if a DDL
               | statement somehow does fail, e.g. due to an ill-timed
               | host crash, it is fully cleaned up -- DDL is atomic, just
               | not transactional.
               | 
               | One way to avoid the "half a migration" problem you
               | mentioned up-thread is to use declarative schema
               | management, which doesn't require tracking completion
               | state of "migrations" at all:
               | https://www.skeema.io/blog/2019/01/18/declarative/
        
               | marcosdumay wrote:
               | There are tools that will break each statement into a
               | different migration, and apply a larger grouping by the
               | files you create.
               | 
               | I have actually never seen people adopt those. And the
               | "enterprise" tools people tend to circle around seem to
               | create more failure modes than it helps solving. But I
               | imagine somebody uses them somewhere.
               | 
               | Without those tools, you can break your migration around
               | the failure point and mark the first half as complete.
               | You should have some way to mark them, because you always
               | need some channel for "free" management of a database.
               | 
               | Anyway, transactional DDL is great.
        
             | 1a527dd5 wrote:
             | We have a similar setup; but there are still scenarios
             | where you need a IF NOT EXISTS. A classic example of a
             | table on dev being close to empty, and a table on prod
             | being huge and taking longer than the default timeout ~30
             | seconds.
             | 
             | Depending on what the timeout is (connection timeout,
             | transaction timeout etc) the DDL operation _may_ still be
             | running. But the migration "process" has failed. So you end
             | up having to slightly modify the attempted migration to
             | account for the fact the thing already exists.
             | 
             | It's an increasingly rare occurrence, probably a few times
             | a year. Like anything the more we do it, the better we get
             | at pulling left and preventing issues in the first place.
        
       | 1a527dd5 wrote:
       | IF NOT EXISTS is incredibly useful to resolve the state of
       | difference between environments.
       | 
       | A classic example is a massive table on production, but an empty
       | table on dev. Dev gets migrated fine, but prod fails for reasons.
       | You now need to resolve it.
       | 
       | Also, we have a convention (not enforced) that we let PostgreSQL
       | name the relation (e.g PK, UQ, Index et al) for us.
       | 
       | So we don't do:-
       | 
       | CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_test_data ON
       | test_table (data);
       | 
       | We do:-
       | 
       | CREATE INDEX CONCURRENTLY ON schema_name.table_name
       | (column_name);
        
         | lmm wrote:
         | How do you do migrations that drop those indices if they're
         | potentially differently-named on dev/prod?
        
       ___________________________________________________________________
       (page generated 2024-08-12 23:00 UTC)