[HN Gopher] How not to change PostgreSQL column type
___________________________________________________________________
How not to change PostgreSQL column type
Author : clarkbw
Score : 58 points
Date : 2024-05-07 14:59 UTC (8 hours ago)
(HTM) web link (notso.boringsql.com)
(TXT) w3m dump (notso.boringsql.com)
| clarkbw wrote:
| My tests running ALTER varied from ~20 seconds to ~1 min for the
| changes.
|
| > Current CI/CD practices often make it very easy for software
| developers to commit and roll out database migrations to a
| production environment, only to find themselves in the middle of
| a production incident minutes later. While a staging deployment
| might help, it's not guaranteed to share the same characteristics
| as production (either due to the level of load or monetary
| constraints).
|
| (neon.tech employee here)
|
| This is where branching databases with production data helps
| quite a bit. Your CI/CD environment and even staging can
| experience the schema changes. When you build from a seed
| database you can often miss this kind of issue because it lacks
| the characteristics of your production environment.
|
| But the author rightly calls out how staging isn't even enough in
| the next paragraph:
|
| >The problem is, therefore (and I will repeat myself), the scale
| of the amount of data being modified, overall congestion of the
| system, I/O capacity, and the target table's importance in the
| application design.
|
| Your staging, even when branched from production, won't have the
| same load patterns as your production database. And that load,
| and the locks associated with it, will result in a different
| rollout.
|
| This has me wondering whether you could match production load
| patterns in staging by setting staging up to mirror the query
| patterns of production. Mirroring, like what's available from
| pg_cat, could put your staging under similar pressure.
|
| And then this also made me think about how we're not capturing
| the timing of these schema changes. Unless a developer looks and
| sees that their schema change took 56 seconds to complete in the
| CI system, nobody will know that the change might have larger
| knock-on effects in production.
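|
| A minimal sketch of what capturing that could look like, assuming
| the CI migration step runs through psql (table and column names
| are made up):
|
|   \timing on
|   ALTER TABLE orders ALTER COLUMN id TYPE bigint;
|   -- psql then prints e.g. "Time: 56012.345 ms"; the CI job could
|   -- parse that output and flag migrations over some threshold.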
| radimm wrote:
| Author here - this is my primary goal, exposing the complexity
| developers might not even think about. I can't even count the
| number of instances where seemingly inconspicuous changes caused
| incidents.
|
| "Works on my DB" is new "works on my machine" (and don't
| trademark it, please :)))
| clarkbw wrote:
| Agreed! A common ORM pitfall is a column rename, which often gets
| implemented not as a rename but as a DROP and ADD, which affects
| the data in a surprising way :-D
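|
| For illustration (hypothetical table and column), the difference
| looks like this:
|
|   -- What you want: a metadata-only change, data preserved.
|   ALTER TABLE users RENAME COLUMN email TO email_address;
|
|   -- What some ORMs generate instead: the old values are lost.
|   ALTER TABLE users DROP COLUMN email;
|   ALTER TABLE users ADD COLUMN email_address text;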
| jbub wrote:
| "Ensure the column is NULLable and does not have a default value
| to avoid forcing a full table rewrite."
|
| That one has been fixed since Postgres 11:
| https://brandur.org/postgres-default
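|
| In other words, on Postgres 11+ something like this (hypothetical
| table and column) is a quick metadata-only change:
|
|   -- No table rewrite: the default is stored in the catalog and
|   -- applied lazily when old rows are read. Volatile defaults
|   -- (e.g. random()) still force a rewrite.
|   ALTER TABLE events ADD COLUMN source text NOT NULL DEFAULT 'web';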
| radimm wrote:
| Thank you for the correction! Indeed - it's one of those things
| I've read about but avoided in practice for historic reasons, and
| therefore forgot about.
| miohtama wrote:
| For " How to Safely Change a PostgreSQL Column Type" (new column,
| trigger, etc). Is there a particular reason PostgreSQL cannot set
| up column migration itself out-of-the-box? I have used PSQL for
| many years and it is always a bit cumbersome to do the column
| migrations manually, even though the process itself is quite
| mechanical.
| paulryanrogers wrote:
| PostgreSQL and MySQL can both do full table copies as needed, and
| their locking has gotten better. They can also do more changes in
| place. Yet too often they still need exclusive locks, which block
| even reads.
|
| For very big tables it's often better to manually add a column,
| backfill, then drop-swap, as sketched below.
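|
| A rough sketch of that flow for an int -> bigint change (names
| are illustrative; constraint, index, and write-sync handling are
| left out):
|
|   -- 1. Add the new column; NULLable, no default, so no rewrite.
|   ALTER TABLE orders ADD COLUMN id_new bigint;
|
|   -- 2. Backfill in batches to limit lock times and WAL bursts.
|   UPDATE orders SET id_new = id WHERE id BETWEEN 1 AND 10000;
|   -- ...repeat for the remaining ranges...
|
|   -- 3. Swap inside one short transaction.
|   BEGIN;
|   ALTER TABLE orders DROP COLUMN id;
|   ALTER TABLE orders RENAME COLUMN id_new TO id;
|   COMMIT;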
| mattashii wrote:
| > Is there a particular reason PostgreSQL cannot set up column
| migration itself out-of-the-box?
|
| People haven't asked the right people hard enough, I suppose.
| PostgreSQL is an open-source project, where wanted and needed
| features are supplied by willing individuals and companies, and
| vetted by the community so that the code base remains high
| quality.
|
| I suppose no one has yet bothered implementing ALTER TABLE
| CONCURRENTLY to the point of community acceptance, though another
| reason might be the induced metadata churn: with an ADD COLUMN
| ... DROP COLUMN approach, the hard limit of at most 1600 columns
| per table (dropped ones included) might become problematic.
| jihadjihad wrote:
| Good post, you can tell the author has some burn marks from
| seeing some sparks in production.
|
| > Let's start with The Bad News - you cannot avoid rewriting the
| entire table, which will generate a significant amount of WAL
| files in the process. This is a given, and you must plan how to
| manage it.
|
| There are some exceptions to this. The blog post focuses on
| changing a column from int to bigint, which indeed needs a full
| table rewrite to reallocate 8 bytes per ID value instead of 4.
| But:
|
| > As an exception, when changing the type of an existing column,
| if the USING clause does not change the column contents and the
| old type is either binary coercible to the new type or an
| unconstrained domain over the new type, a table rewrite is not
| needed. [0]
|
| So, you could change from varchar to text, for example, or widen
| varchar(20) to varchar(255), and not incur a full table rewrite.
| (Narrowing a varchar still forces a rewrite.)
|
| 0: https://www.postgresql.org/docs/current/sql-altertable.html
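|
| You can verify this from the catalogs: relfilenode only changes
| when the table is rewritten (hypothetical table and column):
|
|   SELECT relfilenode FROM pg_class WHERE relname = 'users';
|
|   -- varchar -> text is binary coercible: metadata-only change.
|   ALTER TABLE users ALTER COLUMN email TYPE text;
|
|   -- Same relfilenode as before: no rewrite happened.
|   SELECT relfilenode FROM pg_class WHERE relname = 'users';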
| Thaxll wrote:
| This is pretty much how pt-online-schema-change from Percona
| works.
|
| Good old MySQL days...
| evanelias wrote:
| My thoughts exactly. It's surprising that external online
| schema change tools for Postgres have only become a thing
| fairly recently! The only two I'm aware of are:
|
| * pgroll: Written in Golang, first commits June 2023.
| https://github.com/xataio/pgroll
|
| * pg-osc: Written in Ruby, first commits Dec 2021.
| https://github.com/shayonj/pg-osc
|
| Meanwhile over in the MySQL and MariaDB ecosystem, external OSC
| tools have been around for quite some time, starting with oak-
| online-alter-table over 15 years ago. The most popular options
| today are pt-online-schema-change or gh-ost, but other battle-
| tested solutions include fb-osc, LHM, and the latest entry
| Spirit.
| vrosas wrote:
| I'm probably alone here, and this goes against the HN consensus,
| but as great a piece of tech as Postgres is, I've often found it
| to be a premature optimization as the main DB for newer
| companies. If you don't know what your data models are, you're
| going to end up in worlds of pain, constantly running schema
| migrations like the OP describes.
| dewey wrote:
| I'd say Postgres is very often the right choice for a newer
| company, as it's well understood, easy to operate, and you know
| you won't have to switch to a new DB because the vendor got
| acquired / merged / shut down after 2 years, or is going through
| growing pains and deprecations themselves.
|
| If you give your schema some good thought (the one place where
| you shouldn't rush or take shortcuts at the beginning) - for
| example, use jsonb columns and later move data out of them if you
| notice you need to query that data more performantly - you can
| get very far.
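|
| A minimal sketch of that approach (table and field names are made
| up):
|
|   CREATE TABLE accounts (id bigint PRIMARY KEY, attrs jsonb);
|
|   -- Later, promote a frequently queried field to a real column.
|   ALTER TABLE accounts ADD COLUMN plan text;
|   UPDATE accounts SET plan = attrs->>'plan';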
|
| The pain of data model migrations is also usually not that big if
| the company isn't very large and doesn't have a lot of legacy
| yet.
| beeboobaa3 wrote:
| > premature optimization
|
| I'm extremely curious to hear what you consider a
| better/simpler choice. At least postgres gives you the tools to
| do schema migrations, and if you're operating at a scale where
| such migrations become a problem (i.e. probably not for a
| while) you really ought to know what you're doing.
| unclebucknasty wrote:
| It's not Postgres specifically, as much as _any_ SQL or otherwise
| schema-based (non-schemaless) database, right?
|
| And if we're saying that's a problem, then sounds like we're
| going back into the NoSQL debates from a decade ago.
|
| Hopefully not.
|
| I think it's better to understand your schema as much as
| possible, and have a sane process for applying changes when
| needed. Defining a schema forces you to think about what you're
| building.
|
| OTOH, the idea that developers on a project will just throw
| whatever new attributes they need into a document as they go
| along is a recipe for bugs and inefficiency. It's also
| near-instant technical debt, as early work frequently survives
| longer than anticipated.
|
| You also don't completely escape painful data changes when using
| a NoSQL database. If, for instance, you change a string to an
| int, you still need to figure out what to do with existing data,
| either via conversion or by handling it in code.
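|
| At least in Postgres that decision is explicit: the USING clause
| makes you spell out how existing values convert (illustrative
| names):
|
|   -- Fails for any row whose value isn't a valid integer, so bad
|   -- data surfaces at migration time, not in application code.
|   ALTER TABLE readings ALTER COLUMN value TYPE int
|     USING value::int;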
| xboxnolifes wrote:
| I'm confused by the use of premature optimization here. What
| exactly are you suggesting is the better default choice?
| j-cheong wrote:
| Changing a PostgreSQL column type without following the author's
| instructions, by just running the following command, is very much
| an anti-pattern. I'm confused why people do this in the first
| place.
|
| ALTER TABLE table_name ALTER COLUMN column_name [SET DATA] TYPE
| new_data_type
|
| >you need to make sure the source system has enough disk space to
| hold the WAL files for a long enough time
|
| If the asynchronous replication process buffers changes
| externally instead of relying on retained WAL, that addresses
| this issue.
| chatmasta wrote:
| > Confused why people do this in the first place
|
| Probably because every tutorial on the Internet, along with the
| docs, recommends doing it this way. All the gotchas are buried
| in the footnotes.
| mattashii wrote:
| > Confused why people do this in the first place.
|
| Because you lose a significant amount of performance if you start
| adding NULLable and variable-length columns just because you're
| afraid of a table rewrite.
|
| Because the resulting table will not carry a full table's worth
| of update-induced bloat at the end of the operation.
|
| Because you can be sure the modification is applied atomically:
| the migration from A to B either goes through as expected or
| rolls back gracefully to the old data, rather than getting stuck
| or failing halfway through.
|
| Because toasted data from dropped columns is not removed from
| storage by the DROP COLUMN statement itself, but only after the
| row that refers to that toasted value is updated or deleted.
|
| ...
|
| Every column you "DROP" remains in the catalogs to make sure
| old tuples' data can be read from disk. That's overhead you now
| will have to carry around until the table is dropped. I'm not
| someone who likes having to carry that bloat around.
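|
| You can see those ghost columns in pg_attribute (assuming a table
| that has had a column dropped):
|
|   -- Dropped columns stay behind with attisdropped = true and a
|   -- placeholder name like "........pg.dropped.2........".
|   SELECT attname, attisdropped
|   FROM pg_attribute
|   WHERE attrelid = 'users'::regclass AND attnum > 0;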
| dgan wrote:
| Not everyone has a billion rows in their tables ¯\\_(ツ)_/¯
| yen223 wrote:
| The unsafe ALTER COLUMN is one step.
|
| The safe option is four steps minimum.
|
| It's not hard to see why people would be tempted by the unsafe
| option.
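|
| The step that tends to surprise people is keeping the new column
| in sync with concurrent writes during the backfill. A minimal
| sketch of just that piece, for an int -> bigint change (Postgres
| 11+ syntax, illustrative names):
|
|   CREATE FUNCTION orders_sync_id() RETURNS trigger AS $$
|   BEGIN
|     NEW.id_new := NEW.id;
|     RETURN NEW;
|   END;
|   $$ LANGUAGE plpgsql;
|
|   CREATE TRIGGER orders_sync
|     BEFORE INSERT OR UPDATE ON orders
|     FOR EACH ROW EXECUTE FUNCTION orders_sync_id();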
| junto wrote:
| Microsoft SQL Server has a similar issue, also when adding a new
| column to a table, which I found out by accident when trying to
| deploy a migration that had worked fine in our non-production
| environment stages.
|
| Adding the column as not nullable and setting a default is the
| absolute killer.
|
| A workaround when you still want a default is a three-pass
| migration: first add the column as nullable with no default, then
| run an update to set the default value on all rows, and finally
| change it to not nullable and add the default constraint.
|
| The difference in speed is surprising.
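|
| In T-SQL terms, the three passes look roughly like this (table,
| column, and constraint names invented for illustration):
|
|   -- Pass 1: nullable, no default; a metadata-only change.
|   ALTER TABLE dbo.orders ADD status int NULL;
|
|   -- Pass 2: backfill existing rows (batched on large tables).
|   UPDATE dbo.orders SET status = 0 WHERE status IS NULL;
|
|   -- Pass 3: tighten to NOT NULL, then attach the default.
|   ALTER TABLE dbo.orders ALTER COLUMN status int NOT NULL;
|   ALTER TABLE dbo.orders ADD CONSTRAINT df_orders_status
|     DEFAULT 0 FOR status;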
___________________________________________________________________
(page generated 2024-05-07 23:01 UTC)