[HN Gopher] How not to change PostgreSQL column type
       ___________________________________________________________________
        
       How not to change PostgreSQL column type
        
       Author : clarkbw
       Score  : 58 points
       Date   : 2024-05-07 14:59 UTC (8 hours ago)
        
 (HTM) web link (notso.boringsql.com)
 (TXT) w3m dump (notso.boringsql.com)
        
       | clarkbw wrote:
       | My tests running ALTER varied from ~20 seconds to ~1 min for the
       | changes.
       | 
       | > Current CI/CD practices often make it very easy for software
       | developers to commit and roll out database migrations to a
       | production environment, only to find themselves in the middle of
       | a production incident minutes later. While a staging deployment
       | might help, it's not guaranteed to share the same characteristics
       | as production (either due to the level of load or monetary
       | constraints).
       | 
       | (neon.tech employee here)
       | 
       | This is where branching databases with production data helps
       | quite a bit. Your CI/CD environment and even staging can
       | experience the schema changes. When you build from a seed
       | database you can often miss this kind of issue because it lacks
       | the characteristics of your production environment.
       | 
       | But the author rightly calls out how staging isn't even enough in
       | the next paragraph:
       | 
       | >The problem is, therefore (and I will repeat myself), the scale
       | of the amount of data being modified, overall congestion of the
       | system, I/O capacity, and the target table's importance in the
       | application design.
       | 
        | Your staging, even when branched from production, won't have the
        | same load patterns as your production database. And that load,
        | and the locks associated with it, will result in a different
        | rollout.
       | 
        | This has me wondering whether you could match production load
        | patterns in staging by setting staging up to mirror production's
        | query traffic. Mirroring like what's available in pgcat could put
        | your staging under similar pressure.
       | 
        | And this also made me think about how we're not capturing the
        | timing of these schema changes. Unless a developer looks and sees
        | that their schema change took 56 seconds to complete in CI, you
        | won't know that the change might have larger knock-on effects in
        | production.
        
         | radimm wrote:
          | Author here - this is my primary goal, exposing the complexity
          | developers might not even think about. I can't even count the
          | number of instances where seemingly inconspicuous changes
          | caused incidents.
         | 
         | "Works on my DB" is new "works on my machine" (and don't
         | trademark it, please :)))
        
           | clarkbw wrote:
            | Agreed! A common ORM pitfall is a column rename, which often
            | gets implemented not as a rename but as a DROP and ADD -
            | which affects the data in a surprising way :-D
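            | 
            | A minimal sketch of the difference (hypothetical users table
            | and column names):
            | 
            |     -- Safe: metadata-only rename, data preserved
            |     ALTER TABLE users RENAME COLUMN fullname TO full_name;
            | 
            |     -- What some ORMs generate instead: the data is lost
            |     ALTER TABLE users DROP COLUMN fullname;
            |     ALTER TABLE users ADD COLUMN full_name text;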
        
       | jbub wrote:
       | "Ensure the column is NULLable and does not have a default value
       | to avoid forcing a full table rewrite."
       | 
          | That one has been fixed since Postgres 11:
       | https://brandur.org/postgres-default
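          | 
          | For reference, a minimal sketch (hypothetical users table):
          | since Postgres 11 this is metadata-only, because the default is
          | stored once in the catalog rather than written into every row:
          | 
          |     -- No table rewrite on PG 11+
          |     ALTER TABLE users ADD COLUMN flags int NOT NULL DEFAULT 0;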
        
         | radimm wrote:
          | Thank you for the correction! Indeed - it's one of those things
          | I've read about but avoided in practice for historic reasons,
          | and therefore forgot about.
        
       | miohtama wrote:
       | For " How to Safely Change a PostgreSQL Column Type" (new column,
       | trigger, etc). Is there a particular reason PostgreSQL cannot set
       | up column migration itself out-of-the-box? I have used PSQL for
       | many years and it is always a bit cumbersome to do the column
       | migrations manually, even though the process itself is quite
       | mechanical.
        
         | paulryanrogers wrote:
         | So Postgresql and MySQL can both do full table copies as
         | needed. And their locking has gotten better. They also can do
         | more changes in place. Yet still too often they'll need
         | exclusive locks which blocks even reads.
         | 
         | For very big tables it's often better to manually add a column,
         | backfill then drop-swap.
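          | 
          | A rough sketch of that drop-swap approach for the int-to-bigint
          | case from the article (table and trigger names are made up;
          | batching, index handling, and error handling omitted):
          | 
          |     ALTER TABLE orders ADD COLUMN id_new bigint;
          | 
          |     -- Keep new writes in sync while backfilling
          |     CREATE FUNCTION orders_sync_id() RETURNS trigger AS $$
          |     BEGIN
          |       NEW.id_new := NEW.id;
          |       RETURN NEW;
          |     END $$ LANGUAGE plpgsql;
          | 
          |     CREATE TRIGGER orders_sync_id_trg
          |       BEFORE INSERT OR UPDATE ON orders
          |       FOR EACH ROW EXECUTE FUNCTION orders_sync_id();
          | 
          |     -- Backfill in small batches, not one giant UPDATE
          |     UPDATE orders SET id_new = id
          |     WHERE id BETWEEN 1 AND 10000;
          | 
          |     -- Finally, swap in one short transaction
          |     BEGIN;
          |     ALTER TABLE orders DROP COLUMN id;
          |     ALTER TABLE orders RENAME COLUMN id_new TO id;
          |     COMMIT;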
        
         | mattashii wrote:
         | > Is there a particular reason PostgreSQL cannot set up column
         | migration itself out-of-the-box?
         | 
          | People haven't asked the right people hard enough, I suppose.
          | PostgreSQL is an open-source project, where wanted and needed
          | features are supplied by willing individuals and companies, and
          | vetted by the community so that the code base remains high
          | quality.
          | 
          | I suppose no one has yet pushed an ALTER TABLE CONCURRENTLY
          | implementation to the point of it being accepted by the
          | community, though another reason might be that the induced
          | metadata churn (at most 1600 distinct user-defined columns per
          | table, over its lifetime) could become problematic with ADD
          | COLUMN ... DROP COLUMN cycles.
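          | 
          | You can see that churn yourself: dropped columns linger in
          | pg_attribute and still count toward the limit (query sketch,
          | any table name works in place of orders):
          | 
          |     SELECT attname, attisdropped
          |     FROM pg_attribute
          |     WHERE attrelid = 'orders'::regclass AND attnum > 0;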
        
       | jihadjihad wrote:
       | Good post, you can tell the author has some burn marks from
       | seeing some sparks in production.
       | 
       | > Let's start with The Bad News - you cannot avoid rewriting the
       | entire table, which will generate a significant amount of WAL
       | files in the process. This is a given, and you must plan how to
       | manage it.
       | 
       | There are some exceptions to this. The blog post focuses on
       | changing a column from int to bigint, which indeed needs a full
       | table rewrite to reallocate 8 bytes per ID value instead of 4.
       | But:
       | 
       | > As an exception, when changing the type of an existing column,
       | if the USING clause does not change the column contents and the
       | old type is either binary coercible to the new type or an
       | unconstrained domain over the new type, a table rewrite is not
       | needed. [0]
       | 
        | So, you could change from varchar to text, for example, or widen
        | varchar(20) to varchar(40), and not incur a full table rewrite
        | (shrinking the limit, on the other hand, does force a rewrite).
       | 
       | 0: https://www.postgresql.org/docs/current/sql-altertable.html
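        | 
        | One way to verify whether a given change skips the rewrite is to
        | compare the table's relfilenode before and after, since a rewrite
        | allocates a new one (sketch with a hypothetical users table):
        | 
        |     SELECT relfilenode FROM pg_class WHERE relname = 'users';
        |     ALTER TABLE users ALTER COLUMN name TYPE text;
        |     SELECT relfilenode FROM pg_class WHERE relname = 'users';
        |     -- Same relfilenode both times => no table rewrite happened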
        
       | Thaxll wrote:
       | This is pretty much how pt-online-schema-change from Percona
       | works.
       | 
       | Good old MySQL days...
        
         | evanelias wrote:
         | My thoughts exactly. It's surprising that external online
         | schema change tools for Postgres have only become a thing
         | fairly recently! The only two I'm aware of are:
         | 
         | * pgroll: Written in Golang, first commits June 2023.
         | https://github.com/xataio/pgroll
         | 
         | * pg-osc: Written in Ruby, first commits Dec 2021.
         | https://github.com/shayonj/pg-osc
         | 
         | Meanwhile over in the MySQL and MariaDB ecosystem, external OSC
         | tools have been around for quite some time, starting with oak-
         | online-alter-table over 15 years ago. The most popular options
         | today are pt-online-schema-change or gh-ost, but other battle-
         | tested solutions include fb-osc, LHM, and the latest entry
         | Spirit.
        
       | vrosas wrote:
        | I'm probably alone here and this goes against the HN consensus
        | but, as great a piece of tech as Postgres is, I've often found
        | it to be a premature optimization to pick as the main DB for
        | newer companies. If you don't know what your data models are,
        | you're going to end up in worlds of pain constantly running
        | schema migrations like the OP describes.
        
         | dewey wrote:
          | I'd say Postgres is very often the right choice for a newer
          | company, as it's well understood, easy to operate, and you know
          | you won't have to switch to a new DB because the vendor got
          | acquired / merged / shut down after 2 years, or is going
          | through growing pains and deprecations themselves.
          | 
          | If you give your schema proper thought (the one place where you
          | shouldn't rush or take shortcuts at the beginning) and, for
          | example, use jsonb columns and later move data out of them if
          | you notice you need to query them more performantly, you can
          | get very far.
          | 
          | The pain of data model migrations is also usually not that big
          | if the company isn't very large and doesn't have a lot of
          | legacy yet.
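          | 
          | That later extraction might look like this (hypothetical events
          | table with a jsonb payload column; batch the UPDATE on big
          | tables):
          | 
          |     ALTER TABLE events ADD COLUMN user_id bigint;
          |     UPDATE events SET user_id = (payload->>'user_id')::bigint;
          |     CREATE INDEX ON events (user_id);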
        
         | beeboobaa3 wrote:
         | > premature optimization
         | 
         | I'm extremely curious to hear what you consider a
         | better/simpler choice. At least postgres gives you the tools to
         | do schema migrations, and if you're operating at a scale where
         | such migrations become a problem (i.e. probably not for a
         | while) you really ought to know what you're doing.
        
         | unclebucknasty wrote:
        | It's not Postgres, specifically, as much as _any_ database that
        | enforces a schema, right?
         | 
         | And if we're saying that's a problem, then sounds like we're
         | going back into the NoSQL debates from a decade ago.
         | 
         | Hopefully not.
         | 
         | I think it's better to understand your schema as much as
         | possible, and have a sane process for applying changes when
         | needed. Defining a schema forces you to think about what you're
         | building.
         | 
         | OTOH, the idea that developers on a project are just going to
         | throw whatever new attributes they need into a document as they
         | go along is a recipe for bugs and inefficiency. Also, near-
         | instant technical debt, as early work frequently survives
         | longer than anticipated.
         | 
         | You also don't completely escape data changes without pain when
         | using a NoSQL database. If for instance you change a string to
         | an int you'd still need to figure out what to do with existing
         | data, either via conversion or handling in-code.
        
         | xboxnolifes wrote:
         | I'm confused by the use of premature optimization here. What
         | exactly are you suggesting is the better default choice?
        
       | j-cheong wrote:
        | Changing a PostgreSQL column type without following the author's
        | instructions, by just running the following command, is very much
        | an anti-pattern. Confused why people do this in the first place.
        | 
        | ALTER TABLE table_name ALTER COLUMN column_name [SET DATA] TYPE
        | new_data_type
       | 
        | > you need to make sure the source system has enough disk space
        | to hold the WAL files for a long enough time
        | 
        | If the asynchronous replication process uses an external buffer
        | instead of relying on the WAL, that addresses this issue.
        
         | chatmasta wrote:
         | > Confused why people do this in the first place
         | 
         | Probably because every tutorial on the Internet, along with the
         | docs, recommends doing it this way. All the gotchas are buried
         | in the footnotes.
        
         | mattashii wrote:
         | > Confused why people do this in the first place.
         | 
          | Because you lose a significant amount of performance if you
          | start adding NULLable and variable-length columns just because
          | you're afraid of a table rewrite.
         | 
          | Because the resulting table will not have accumulated a full
          | table's worth of update-induced bloat by the end of the
          | operation.
         | 
          | Because you can be sure the modification is applied atomically:
          | the migration from A to B either goes through as expected or
          | rolls back gracefully to the old data, rather than getting
          | stuck or failing halfway through.
         | 
         | Because toasted data from DROP-ed columns is not removed from
         | storage with the DROP COLUMN statement, but only after the row
         | that refers to that toasted value is updated or deleted.
         | 
         | ...
         | 
          | Every column you "DROP" remains in the catalogs to make sure
          | old tuples' data can be read from disk. That's overhead you
          | will now have to carry around until the table is dropped. I'm
          | not someone who likes having to carry that bloat around.
        
         | dgan wrote:
          | Not everyone has a billion rows in their tables ¯\_(ツ)_/¯
        
         | yen223 wrote:
         | The unsafe ALTER COLUMN is one step.
         | 
         | The safe option is four steps minimum.
         | 
         | It's not hard to see why people would be tempted by the unsafe
         | option.
        
       | junto wrote:
        | Microsoft SQL Server has a similar issue, and similarly with
        | adding a new column to a table, which I found out by accident
        | when trying to deploy a migration that had worked fine in our
        | non-production environments.
       | 
       | Adding the column as not nullable and setting a default is the
       | absolute killer.
       | 
        | A workaround when you still want a default is to do a three-pass
        | migration: first add the column as nullable with no default,
        | then run an update to set the value on all rows, and finally
        | change it to not nullable and add the default constraint.
       | 
        | The difference in speed is surprising.
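        | 
        | In T-SQL that three-pass approach might look like this
        | (hypothetical table and column names; batch the UPDATE on large
        | tables):
        | 
        |     -- Pass 1: nullable, no default -> metadata-only, fast
        |     ALTER TABLE dbo.orders ADD status int NULL;
        | 
        |     -- Pass 2: backfill existing rows
        |     UPDATE dbo.orders SET status = 0 WHERE status IS NULL;
        | 
        |     -- Pass 3: enforce NOT NULL, then add the default constraint
        |     ALTER TABLE dbo.orders ALTER COLUMN status int NOT NULL;
        |     ALTER TABLE dbo.orders
        |       ADD CONSTRAINT df_orders_status DEFAULT 0 FOR status;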
        
       ___________________________________________________________________
       (page generated 2024-05-07 23:01 UTC)