[HN Gopher] Zero-downtime schema migrations in Postgres using views
___________________________________________________________________
Zero-downtime schema migrations in Postgres using views
Author : fabianlindfors
Score : 82 points
Date : 2021-06-16 18:39 UTC (4 hours ago)
(HTM) web link (fabianlindfors.se)
(TXT) w3m dump (fabianlindfors.se)
| nartz wrote:
| Very interesting. I like the idea of "virtualizing" the hard
| schema with views to abstract away some things.
|
 | View performance can become a real concern at larger scale for
 | OLTP workloads. The solution you propose also adds complexity,
 | since you now have two schemas instead of one and, as you rightly
 | point out, complexity in the views themselves. The question
 | becomes: when is this added complexity worth it?
| johbjo wrote:
 | Views are equivalent to "query templates" unless they do
 | operations on subquery aggregate columns (for example, a GROUP
 | BY over a sum()).
|
| So views don't add complexity relative to giving the query
| explicitly. It might even help since the incoming queries can
| be smaller, thus less parsing/processing.
| fabianlindfors wrote:
| I think you hit the nail on the head regarding complexity. I'm
| experimenting with creating a migration tool which handles all
| the views for you. I believe some of the process can be
 | automated and the complexity reduced, but it remains to be seen
 | how well it works!
| ThrustVectoring wrote:
| I looked into this recently, and views cover like 98% of the
| functionality that the client app needs from Postgres. One issue
| I ran into was that Postgres forgets that the primary key is a
| primary key when pulling from a view, which breaks some queries
| that rely on grouping by the primary key.
|
| https://dba.stackexchange.com/questions/195104/postgres-grou...
| has some more info on this
| e12e wrote:
 | Doesn't DISTINCT work in this case?
 |
 |     SELECT DISTINCT ON (vt.id) row_to_json(vt.*)
 |     FROM vt JOIN vy ON vt.id = vy.tid
 |     WHERE vt.id = 1 AND vy.amount > 1;
| ThrustVectoring wrote:
 | The SQL in question which was problematic for me (tables
 | renamed):
 |
 |     SELECT posts.*,
 |            MAX(COALESCE(comments.created_at, posts.created_at))
 |              AS latest_timestamp
 |     FROM posts
 |     LEFT OUTER JOIN comments ON posts.id = comments.post_id
 |     GROUP BY posts.id
 |     ORDER BY latest_timestamp DESC
|
| In short, this is sorting posts by the most recent comment,
| with a fallback to the post date if the post has no comments
| on it. Hard to get rid of the grouping here and get the same
| data back.
| e12e wrote:
| I see, that makes more sense.
| fabianlindfors wrote:
| Interesting link, thanks! It would be really nice if Postgres
| were to close that gap and make them fully equivalent (if that
| is even possible).
| munk-a wrote:
 | I recently transitioned a number of tables over to views as
 | part of a data model rearrangement, and I absolutely loathed my
 | previous self for leveraging that primary-key trick. I don't
 | think there is anything unsafe about allowing any column with a
 | single-column unique index on the table to serve this role, and
 | it'd make things a fair bit more logical.
|
| That all said, until that happens, I'd strongly suggest
| avoiding that functionality since it can lay down some real
| landmines.
| [deleted]
| julian37 wrote:
| This might work to some extent for renaming things but doesn't
| for any other kind of migration I can think of:
|
| - Dropping a column doesn't work (assuming the point of dropping
| it is actual deletion, rather than just hiding it)
|
| - Adding a column doesn't work either
|
| - Changing a column's nullability, default value, or data type
| doesn't work
|
| - Doesn't help with adding or changing constraints or indexes
| fabianlindfors wrote:
 | With this model, migrations would be two-stepped. First the
 | migration is applied, creating a new schema and adding the new
 | information to the underlying table. Once the version is old
 | and no longer used by any clients, the migration is "deleted",
 | removing the schema and the underlying information.
 |
 | So dropping a column would go like this: 1. The migration is
 | applied. The column is removed from the new view but remains in
 | the backing table. 2. The migration is no longer needed. The
 | column is removed from the backing table.
 |
 | This design should be extensible to cover changing a data type
 | (introduce a new column, move values over, and keep the values
 | in sync using a trigger), changing a default, or changing the
 | contents of a full column.
 |
 | Constraints or indices are a bit trickier, but it might work to
 | create a new column duplicated from the old one and add the
 | constraints to that instead.
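 |
 | As a rough sketch of the data type case (table, column and
 | schema names hypothetical, and only the old-client write path
 | shown; a real tool would also need the reverse mapping):
 |
 |     -- Add a column with the target type and backfill it.
 |     ALTER TABLE orders ADD COLUMN amount_v2 numeric;
 |     UPDATE orders SET amount_v2 = amount::numeric;
 |
 |     -- Keep writes from old clients in sync with the new column.
 |     CREATE FUNCTION sync_amount() RETURNS trigger AS $$
 |     BEGIN
 |       NEW.amount_v2 := NEW.amount::numeric;
 |       RETURN NEW;
 |     END;
 |     $$ LANGUAGE plpgsql;
 |
 |     CREATE TRIGGER orders_sync_amount
 |       BEFORE INSERT OR UPDATE ON orders
 |       FOR EACH ROW EXECUTE FUNCTION sync_amount();
 |
 |     -- The new schema's view exposes only the new column.
 |     CREATE VIEW v2.orders AS
 |       SELECT id, amount_v2 AS amount FROM orders;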
| kroolik wrote:
| Adding a column, changing column's nullability and
| adding/changing constraints is already zero-downtime in PG.
|
 | Not sure about a default value change, but I would also say
 | it's zero-downtime, as adding a column with a default is zero-
 | downtime. Haven't checked, though.
|
| Dropping a column and changing the data type are not zero-
| downtime.
|
 | Renaming a column is not zero-downtime as you might have
 | multiple readers/writers using both names.
| mikeklaas wrote:
 | Adding a NOT NULL constraint isn't zero-downtime, since PG
 | scans the whole table to ensure the constraint is not violated.
| dariusj18 wrote:
 | I don't remember which, but there was a DB server I worked with
 | where the tables weren't directly accessible and required
 | aliases to be created, thus enforcing portability.
| fabianlindfors wrote:
| Author here, thanks for reading. As has been mentioned in some
| comments, the article only covers half the story of how this can
 | be used for migrations. The greater idea is to have every
 | migration be short-lived and two-phased: apply and cleanup. A
 | migration is first applied and, once it's no longer used by any
 | clients, it's cleaned up.
|
| 1. Apply: the new schema is created and any new data is added to
| the backing table.
|
| 2. Cleanup: the schema is deleted and any data that is no longer
| needed is removed from the backing table.
|
| For example, to perform a column deletion:
|
| 1. Apply: create new schema without the column which is being
| deleted. Don't change anything in the backing table.
|
| 2. Cleanup: delete the schema and the column from the backing
| table.
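 |
 | As a rough sketch in SQL (schema, table and column names
 | hypothetical):
 |
 |     -- Apply: a new versioned schema whose view omits the
 |     -- column being deleted.
 |     CREATE SCHEMA v2;
 |     CREATE VIEW v2.users AS SELECT id, name FROM users;
 |
 |     -- Clients pick their version through the search path:
 |     SET search_path TO v2;
 |
 |     -- Cleanup: once no client uses v1, drop it along with
 |     -- the column.
 |     DROP SCHEMA v1 CASCADE;
 |     ALTER TABLE users DROP COLUMN email;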
| brycelarkin wrote:
 | I think views are a great way of abstracting out business
 | logic. The one thing keeping me from using them more frequently
 | is that they don't work well with Row Level Security.
| xemoka wrote:
 | I was having issues with this as well; it seemed my problem was
 | that I was creating the view as a superuser. Since view
 | creation is essentially WITH SECURITY DEFINER (to take a term
 | from UDFs), any user accessing the view to see the underlying
 | RLS-enabled table would see all the results (since the
 | `postgres` user bypasses all RLS).
 |
 | Creating separate "data owner" and "api owner" roles to own the
 | data and api schemas+tables respectively alleviates this
 | problem. Your API schema owner != your table owner unless you
 | `ALTER TABLE FORCE ROW LEVEL SECURITY` in addition to just
 | enabling it.
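 |
 | In statements, that boils down to (role, schema and table
 | names hypothetical):
 |
 |     -- Views run with their owner's privileges, so don't let
 |     -- a superuser own views over RLS-protected tables.
 |     ALTER VIEW api.orders OWNER TO api_owner;
 |
 |     -- Apply RLS even to the table's owner:
 |     ALTER TABLE data.orders FORCE ROW LEVEL SECURITY;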
|
| What issues are you running into with RLS?
| kroolik wrote:
 | Interesting approach, with a pretty dark side-effect: the tech
 | debt silently creeping in and staying forever. Apart from that,
 | you can use any abstraction over the base schema, depending on
 | the use-case. Renaming a column can be as simple as renaming a
 | field in the DTO structure.
|
| The interesting part is when you actually get to pay the debt
 | back. Yes, this is not a free lunch. You move fast, but you
 | have to eventually pay the cost of the underlying storage
 | containing all the historical schemas combined at once.
|
| This is the difference between carefully planned schema
| migrations and abstraction-based migrations. The former takes
| more time planning and executing, but the end state is the
| reference schema. The latter keeps on extending the underlying
| schema and building projections over it. Over time, the system
| accumulates tech debt in the form of ever-expanding base schema
| that contains all the historical columns.
| johbjo wrote:
| The point here is to do migrations live while clients randomly
| connect and stay connected for random time. When all old
| clients have disconnected, the old schema can be removed.
|
| The point is not to let old clients work indefinitely.
|
| In other versions of this idea, all views/procedures are
| prefixed with some revision-number but that looks messy.
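 |
 | A rename in that revision-prefixed style might look like
 | (hypothetical names):
 |
 |     CREATE VIEW users_v2 AS
 |       SELECT id, name AS full_name FROM users;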
| kroolik wrote:
 | But what happens when schema Xn adds a column, and schema
 | Xn+1 removes it? The base schema, the source for all the
 | views, will contain the columns from all the migrations
 | until they are purged.
|
| That purging is what has a lot of operational complexity. And
| renaming a column. All the rest is zero-downtime in PG
| already.
| johbjo wrote:
| It would have to be in steps:
|
| Add Xn+1: create new views without column. When no Xn
| clients remain: drop column from tables and drop schema Xn.
|
| The point as I see it is to not break live client
| connections which expect the column to exist.
| kroolik wrote:
 | Yes, this is a very important insight! I think the author
 | proposes this approach as a replacement for regular schema
 | migrations, whilst both approaches complement one
 | another.
|
| You can use views to make migrations that were previously
| tricky zero-downtime.
|
 | If that's not the case, then I must've read the article
 | wrong!
|
 | Edit: although when I think of it - if you want to
 | eventually materialize old migration schemas into the
 | base schema, you need to do the rename, too. Which is
 | not zero-downtime because of the newer migration schemas
 | that do the renaming automatically: their view
 | definitions have to change, meaning lots of locking.
|
| So, you still need maintenance windows to merge all the
| changes. Just not on every change. Otherwise the base
| schema will then eventually be completely out of sync and
| contain tons of old, unused columns.
| fabianlindfors wrote:
| You're right that the article doesn't mention how this
| can actually be practically used for migrations, but that
| is exactly what I had in mind. Maybe I'll write another
| one some day about the two-phase migrations combined with
| views.
| kroolik wrote:
| Looking forward to your future blog posts! The rename
| thing got me really interested.
| fabianlindfors wrote:
| This is precisely the way I was imagining! I'm
| experimenting with creating a migration tool which will
| do this fully automatically.
| zozbot234 wrote:
| Views can perform arbitrary queries. They can achieve _any_
 | kind of inference starting from the base data that's
 | stored in tables. (Keep in mind that modern SQL is fully
| Turing-complete.) This makes any view-augmented database a
| viable source of rich, complex knowledge, not just mere
| information.
| kroolik wrote:
 | But I doubt unleashing the full expressive power of SQL
 | is the point here. It would easily turn a moderately
 | complicated "remove column" migration into a real
 | maintenance hell.
|
| In the simplest form, when you add a column to some
| schema, it should be materialized in the base schema and
| exposed via the migration view.
|
 | The problems start when you add and remove 20 columns,
 | because even though they are no longer visible in the
 | migration schemas, they take up space in the base schema.
| lallysingh wrote:
| That sounds like problems upstream of this technique. Are
| you complaining that it isn't a silver bullet or that
| it's so powerful that it'll be abused?
| kroolik wrote:
| @lallysingh, sorry, can't reply directly.
|
| All I'm saying is that I don't think we need a full
| Turing-complete cannonball to hit the (relatively) small
| fly of no-downtime migrations.
|
 | Is it a silver bullet? It's Turing-complete, so chances
 | are high that it is. But for me it has a high risk of
 | causing silver poisoning.
|
| Personally, I would stick with simpler solutions.
| squiggleblaz wrote:
| Butting in to say: Maximum reply depth is relative to the
| view. Just click on the comment's time and you will get a
| form with a reply field.
| fabianlindfors wrote:
 | Yes, this approach might actually decrease the power of
 | your database, as some things become much trickier. It
 | might be worth trading off some expressive power of the
 | database for smoother, less hands-on migrations though,
 | which is what I'm going for.
| fabianlindfors wrote:
| This is definitely a valid concern. My plan for building
| migrations on top of this is to have older migrations
| automatically be deleted as they become unused. Preferably you
| wouldn't actually keep using the old schemas but rather keep
| them as you roll out changes to your application servers.
| Migrations could then be deleted (in reverse order) based on
| some condition, for example time since the schema was created
| or time since last query.
| kroolik wrote:
| But in order to support the "add column" migration you have
| to add it to the base schema, right? It has to be
| materialized somewhere.
|
| Now, when you remove a column, you just create a new view
| without it being available. But the column in the base schema
| must stay because: you support the old schema, and because
| you want to avoid the operational complexity of physically
| removing it, right?
|
| Now, you can obviously split a table into a list of tables
| each storing the pk and some column value. But you have just
| invented columnar storage in a row-based rdbms :D
|
 | A different approach is to have a scheduled maintenance
 | window, for example quarterly or every 6 months, when you
 | materialize all the changes at once. The amortized
 | operational cost is way lower :)
| luhn wrote:
| I thought this wouldn't work with INSERTs and UPDATEs, but it
| turns out newer versions of Postgres automatically support
| updating simple views. [1]
|
| [1] https://www.postgresql.org/docs/12/sql-createview.html (See
| "Updatable Views")
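 |
 | A quick illustration (table and view names hypothetical):
 |
 |     CREATE TABLE accounts (id int PRIMARY KEY, name text,
 |                            secret text);
 |     CREATE VIEW safe_accounts AS SELECT id, name FROM accounts;
 |
 |     -- Simple single-table views (no aggregates, DISTINCT,
 |     -- etc.) accept writes directly; this row lands in accounts:
 |     INSERT INTO safe_accounts (id, name) VALUES (1, 'alice');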
| munk-a wrote:
| The information on this is, admittedly, hard to dig out of the
| primary docs but there also exists an INSTEAD OF trigger mode -
| so you can CREATE TRIGGER "actually_save_stuff" INSTEAD OF
| INSERT which will intercept an attempted insert on the table
| and write the rows elsewhere.
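 |
 | Sketched out (view, table and column names hypothetical):
 |
 |     CREATE FUNCTION actually_save_stuff() RETURNS trigger AS $$
 |     BEGIN
 |       -- Route the row to the real storage table.
 |       INSERT INTO posts_storage (id, title)
 |         VALUES (NEW.id, NEW.title);
 |       RETURN NEW;
 |     END;
 |     $$ LANGUAGE plpgsql;
 |
 |     CREATE TRIGGER "actually_save_stuff"
 |       INSTEAD OF INSERT ON posts  -- "posts" is now a view
 |       FOR EACH ROW EXECUTE FUNCTION actually_save_stuff();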
|
 | Generally I've found these useful in the short term for letting
 | application logic continue to work, blissfully ignorant of the
 | fact that the "table" it is writing to has been moved, merged,
 | or refactored into a set of tables that the trigger will bridge
 | logic over to. In the long term I've found that the automatic
 | update logic covers the most common case: a view that's
 | filtering out some rows for some reason (permissions, soft
 | deletion) before presenting them to the application logic.
___________________________________________________________________
(page generated 2021-06-16 23:00 UTC)