[HN Gopher] Declarative Schemas for simpler database management
___________________________________________________________________
Declarative Schemas for simpler database management
Author : kiwicopple
Score : 71 points
Date   : 2025-04-03 17:02 UTC (1 day ago)
(HTM) web link (supabase.com)
(TXT) w3m dump (supabase.com)
| kiwicopple wrote:
| Combining this with the Postgres Language Server that was
| released this week, you can now execute statements directly
| within your IDE:
|
| https://x.com/psteinroe/status/1907803477939527728
| xyzzy_plugh wrote:
| This is exactly backwards. You should have declarative schemas
| but inferring migrations is crazy. Only pain will follow.
|
| Instead, I am a fan of having both: either committing the
| resulting schema of a migration, or hand-writing it alongside
| the migration. Then have tests to ensure the database schema
| matches the expected schema after a migration.
|
| Generating these artifacts is fine, but in TFA's case there is no
| chance I wouldn't inspect and possibly modify the generated
| "diff" migration. It's significantly easier to go the other way:
| write the migration and show me the resulting schema diff.
| skybrian wrote:
| It seems like generating the diffs from the schema's version
| history is equivalent to doing it the opposite way, provided
| that each diff is tested to make sure the database upgrade
| works. Not all diffs will correspond to feasible database
| upgrades, so some patches would have to be rejected.
| pounderstanding wrote:
| Migrations give more control:
|
|     ALTER TABLE foo ADD COLUMN bar;
|     UPDATE foo SET bar = baz;
|     ALTER TABLE foo ALTER COLUMN bar SET NOT NULL;
| skybrian wrote:
| True. Perhaps annotating the schema would help for simple
| things?
| bootsmann wrote:
| Not all migrations are as simple as adding and removing
| columns.
| williamdclt wrote:
| Completely agree.
|
| When writing a migration, the resulting schema is usually much,
| much less important than the characteristics of the migration
| itself. When I review a migration, my first question isn't "is
| this the right schema" but "is this migration going to bring
| downtime". I'd much rather a smooth migration to an incorrect
| schema than having a critical table locked for minutes/hours.
|
| I think that updates of stateful components should be
| imperative (explicit migrations), not declarative (implicit
| migrations). For example, I don't think Terraform is a great
| tool to manage RDS: it doesn't tell you the consequences of
| changing an attribute (database restart or other
| downtime-inducing stuff). I'd much rather have to explicitly
| say how to get from state A to state B.
|
| Similarly, I don't think SQL migrations are perfect: they're
| still declarative, and you still need implicit knowledge to
| know whether a migration will take a lock and what the
| consequences will be. I'd much rather have to write "take
| explicit lock; alter table xxx;".
|
| This tool probably allows editing migrations, but I don't think
| it's a step in the right direction. Maybe it's a step towards
| databases being much better at migrations (so that we can rely
| on never having downtime), but even then I think it'll get
| worse before it gets better.
| sweatybridge wrote:
| > it doesn't tell you the consequences of changing an
| attribute (database restart or other downtime-inducing stuff)
|
| Modern diff tools are designed to provide better guardrails
| in these situations. For example, pg-schema-diff [0] tries to
| generate zero-downtime migrations by using lock-free
| operations, and warns you about potentially hazardous
| migrations.
|
| I think it's a good direction to bake these best practices
| into the tooling itself, rather than relying purely on the
| experience of engineers.
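|
| As a hedged illustration (not pg-schema-diff's literal
| output), a lock-aware tool can avoid a long exclusive lock by
| splitting a NOT NULL change into a NOT VALID constraint plus
| a validation pass:
|
|     -- instead of: ALTER TABLE orders ALTER COLUMN status SET NOT NULL;
|     ALTER TABLE orders ADD CONSTRAINT orders_status_not_null
|       CHECK (status IS NOT NULL) NOT VALID;  -- brief lock only
|     -- validation scans the table without blocking writes:
|     ALTER TABLE orders VALIDATE CONSTRAINT orders_status_not_null;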
|
| [0] https://github.com/stripe/pg-schema-diff
| evanelias wrote:
| This is exactly the key point. Declarative schema
| management is typically _better_ at catching hazardous
| operations, because declarative schema management tools
| inherently _require_ the ability to introspect your desired
| CREATE definitions and also introspect the current database
| state.
|
| Once a tool has those abilities, adding linters and
| destructive-action guardrails is much easier. Especially
| when compared to a traditional migration tool, which often
| has no underlying understanding of the ALTER TABLE
| operations being requested.
| evanelias wrote:
| As a concept, declarative schema management isn't crazy at all.
| Several thousand companies use this approach, including some
| huge names like Google, Meta, and GitHub, as well as many
| smaller companies.
|
| When implemented well, with appropriate guardrails and linters,
| it's perfectly safe. And it has many benefits over imperative
| schema migrations, such as a substantially better Git
| versioning flow, and ability to synchronize environments /
| solve schema drift natively.
|
| The only major conceptual downsides are the inability to handle
| row data migrations and the inability to handle renames. These
| can be major hurdles for some smaller companies, but are largely
| irrelevant at companies with large DB infrastructure, who
| require special-case handling for those operations anyway. In
| other words, if you have large tables, row data migrations are
| _already_ substantially more complex than running a single
| UPDATE statement, and you can't do them using a traditional
| imperative migration tool anyway.
|
| > there is no chance I wouldn't inspect and possibly modify the
| generated "diff" migration
|
| Of course, you're generally supposed to do that with these
| tools. Database changes should always be reviewed carefully.
| Same with non-database-related infrastructure-as-code tools.
| kiwicopple wrote:
| > _there is no chance I wouldn't inspect and possibly modify
| the generated "diff" migration_
|
| > _Of course, you're generally supposed to do that with
| these tools_
|
| this seems to be the crux of the comments - we'll try to make
| it much clearer that declarative schemas don't _skip_ the
| migration/review process, they just provide another way of
| generating the migrations (note: you can also generate them
| directly from the database using the CLI)
| nextaccountic wrote:
| Hey, are you using some tool like the unmaintained migra
| https://github.com/djrobstep/migra (perhaps via
| https://github.com/blainehansen/postgres_migrator), or pg-
| schema-diff https://github.com/stripe/pg-schema-diff, or
| maybe this other unrelated pg-schema-diff
| https://github.com/zombodb/pg-schema-diff, or anything like
| it?
|
| Is it open source?
|
| I mean, I see you say
|
| > We then use a schema diff tool, like migra, to figure out
| the necessary updates to views and functions when
| generating the migration file.
|
| But "like migra" is very nonspecific. I guess you are not
| using migra itself
| layer8 wrote:
| Yes. In particular, migrations expressed as SQL statements are
| strictly more powerful than a diff of two schemas, so there are
| situations where you can't infer the former from the latter
| (but you can always infer the latter from the former).
|
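| A concrete example (hypothetical column names): these two
| migrations produce identical before/after schemas, but only
| one preserves the data, and a schema diff cannot tell which
| was intended:
|
|     ALTER TABLE users RENAME COLUMN fullname TO display_name;
|     -- versus:
|     ALTER TABLE users DROP COLUMN fullname;
|     ALTER TABLE users ADD COLUMN display_name text;
|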
| I also object to the use of "declarative" here. Either we are
| talking about the schema as present in the database, then it's
| neither declarative nor non-declarative, it's just whatever is
| in the database. Or we are talking about a schema _definition_,
| and then I really don't know what a non-declarative schema
| definition would look like, in contrast to what is called
| "declarative" here. Thirdly, the traditional "declarative" SQL
| schema definition is really a series of imperative SQL
| statements, so arguably not declarative.
|
| What they seem to mean is a minimal sequence of statements that
| results in the desired schema, as opposed to a longer history
| of schema-altering statements. However, the minimal version is
| technically still a series of schema-altering statements,
| starting from (presumably) an empty schema.
| evanelias wrote:
| "Declarative" is the correct word, as the input to the tool
| is a desired-state set of CREATE statements, and the tool
| figures out what DDL is actually necessary to make the target
| DB reach that desired state.
|
| In other words, the engineer is declaring "make my schema
| look _like this_ " and the tool makes that happen. That's the
| key definition of a declarative system, consistent with how
| this is described in CS classes.
|
| Meanwhile traditional migration tools are "imperative", as
| the engineer must tell the tool exactly which operations (e.g.
| ALTER) to run.
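|
| A minimal sketch of the distinction (hypothetical table):
|
|     -- declarative: describe the desired end state
|     CREATE TABLE users (
|       id   bigint PRIMARY KEY,
|       name text
|     );
|     -- if the live table only has "id", the tool derives:
|     ALTER TABLE users ADD COLUMN name text;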
| perrygeo wrote:
| > You should have declarative schemas but inferring migrations
| is crazy. Only pain will follow.
|
| Inferring migrations isn't crazy. Automagically applying those
| migrations without review is crazy.
| bartvk wrote:
| So to summarize.
|
| In the old situation, you write a CREATE TABLE statement at the
| start of the project. And when you add a feature, you have to
| write an ALTER TABLE script.
|
| In this new situation, you just change the CREATE TABLE script,
| and Supabase uses migra to figure out the difference and alter
| the table accordingly.
|
| What's interesting is that in your SQL code, there's no longer
| any difference between creating a new database and updating an
| existing one.
| infogulch wrote:
| I've used this for Microsoft SQL Server and SQL Database
| Projects. It's basically as you say: write it as if creating a
| new database, then deploy it in CI where it does a diff on the
| live database to come up with the actual migration strategy on
| the fly. If you're clever you add a manual review stage in the
| pipeline and have the db engineers approve the generated
| migration script before deployment is completed automatically.
| https://learn.microsoft.com/en-us/sql/tools/sql-database-pro...
|
| I helped set this up in a Fortune 500 company a few years ago.
| They were using a team of db engineers to execute manually
| written change scripts, with manual reviews, control processes,
| and deployment schedules. You'd be lucky if you got your schema
| change to prod in a month. When they started integrating this
| tool on some trial candidates they found SO many
| inconsistencies between environments: server settings
| differences, extraneous or missing indexes, vestigial "temp"
| tables created during previous migrations, enum tables that
| should be static with extra or missing rows, etc, etc. All the
| environment differences meant that deployments _had_ to be
| manual in the past. Once they got through the initial pain of
| syncing up the environments the whole department got way more
| efficient.
| rjbwork wrote:
| >If you're clever you add a manual review stage in the
| pipeline and have the db engineers approve the generated
| migration script before deployment is completed
| automatically.
|
| This is how I've set it up at my current employer. It works
| well. We modeled it after the Terraform plan/apply steps: we
| double-check that the script generated by the "apply" step
| matches the script generated by the "plan" step, and fail the
| deployment if they differ, since the two steps can occur at a
| significant temporal distance. That way we can be sure that
| what we've read and approved matches what gets executed.
| kiwicopple wrote:
| > db engineers approve the generated migration script
|
| yeah - this is definitely the intended flow here. We won't
| be recommending that anyone blindly apply generated
| migrations.
|
| As you mention, it is expected that you generate & review
| on your local development machine, check into source
| control, push & merge. We've also been using this
| internally for ~2 years now and it works great
| pounderstanding wrote:
| Do you keep the history of applied migrations? (Just in
| case subtle issues need to be investigated later.)
| kiwicopple wrote:
| yeah, migrations are generated from the declarative
| files. For example, the steps for adding a new column
| are:
|
| 1/ Add a new column to the declarative file
|
| 2/ Generate a new migration: `supabase db diff -f
| my_new_migration`
|
| 3/ Review/edit the generated migration, check it into git
|
| 4/ Apply migration to database
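|
| Sketched concretely (hypothetical file path and column; the
| generated SQL may differ in detail):
|
|     -- 1/ supabase/schemas/employees.sql now declares:
|     CREATE TABLE employees (
|       id   bigint PRIMARY KEY,
|       name text,
|       age  smallint  -- the newly added line
|     );
|
|     -- 2/ `supabase db diff -f my_new_migration` then emits
|     -- roughly:
|     ALTER TABLE employees ADD COLUMN age smallint;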
| sweatybridge wrote:
| > When they started integrating this tool on some trial
| candidates they found SO many inconsistencies between
| environments: server settings differences, extraneous or
| missing indexes, vestigial "temp" tables created during
| previous migrations, enum tables that should be static with
| extra or missing rows, etc, etc. All the environment
| differences meant that deployments had to be manual in the
| past. Once they got through the initial pain of syncing up
| the environments the whole department got way more efficient.
|
| That was exactly our experience too.
|
| Perhaps we didn't highlight enough in the blog post that
| schema diff is not meant to replace manual review. It simply
| provides a good starting point for us to iterate on the
| migration, which often boosts efficiency.
| pounderstanding wrote:
| > they found SO many inconsistencies between environments
|
| This implies somebody with admin rights makes alterations in
| an ad-hoc way without first doing it in a test env.
|
| If they continue with ad-hoc stuff, then it means auto-
| generated migrations will be different in test vs prod. (I
| prefer to test exactly the same thing that will be used in
| prod.)
| evanelias wrote:
| > This implies somebody with admin rights makes alterations
| in an ad-hoc way without first doing it in a test env.
|
| Not necessarily. With a large team/org using the same
| database schema, it can just mean multiple people were
| trying to make changes to an environment around the same
| time, e.g. the migrations were applied in a different order
| in staging vs prod.
|
| Some migration tools provide extra checks for strict
| ordering, but many do not. There's often no guarantee that
| the migration file naming scheme ordering, Git commit
| ordering, and actual DB apply ordering line up -- that's 3
| different possible sources of truth, or more since the DB
| state varies by environment (dev/stage/prod etc).
|
| Late-night hot-fixes (to solve an emergency outage) can be
| another source of inconsistencies / drift.
|
| > If they continue with ad-hoc stuff, then it means
| auto-generated migrations will be different in test vs prod
|
| That depends on the declarative tool and whether it fully
| syncs the schema each time, or just generates migrations
| which are frozen into a plan which is executed as-is in all
| environments. Not that full-sync is bad, but yes in that
| case it will generate different things in each env.
| Although the end result is that it will solve the drift,
| and give you the same _end state_ in all environments. And
| that's likely what you want to happen: after running the
| tool, the database state will match the desired state
| expressed by the CREATE statements in your schema repo.
|
| That said, the declarative tooling should have sufficient
| safety checks to ensure it doesn't do anything destructive
| in prod without very loudly telling you and requiring
| manual confirmation. That way, you won't be harmed when
| trying to synchronize an environment that had unexpected
| out-of-band changes.
| prmph wrote:
| But if you have data to migrate, it is not always possible to
| infer the diff, no?
|
| Say I have a users table with a name column. Then I alter the
| table and split the name column into two new columns: first
| name and last name.
|
| How is it possible to infer this change, just from seeing the
| new and old columns?
| efxhoy wrote:
| The best-practice way to swap fullname for firstname/lastname
| would be to:
|
| 1. Write a migration that adds firstname and lastname columns,
| with all nulls
|
| 2. Deploy an application code change to start populating
| firstname and lastname alongside fullname, still reading
| fullname in the code
|
| 3. Backfill the firstname and lastname values with a
| script/command/migration
|
| 4. Change app code to read firstname and lastname and stop
| writing fullname
|
| 5. Drop the fullname column
|
| I don't think there's a safe way to do all that in a single
| migration unless all your app code also lives in the database
| so it can be atomically deployed. If you have multiple app
| servers and do rolling deploys with no downtime I think it
| has to be done in these 5 steps.
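|
| A rough SQL sketch of steps 1, 3 and 5 (hypothetical table;
| naive split, and large tables would need batching):
|
|     -- step 1: expand
|     ALTER TABLE users
|       ADD COLUMN firstname text,
|       ADD COLUMN lastname  text;
|
|     -- step 3: backfill
|     UPDATE users
|     SET firstname = split_part(fullname, ' ', 1),
|         lastname  = split_part(fullname, ' ', 2)
|     WHERE firstname IS NULL;
|
|     -- step 5: contract
|     ALTER TABLE users DROP COLUMN fullname;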
| MrMcCall wrote:
| 6. ensure there are no nulls in firstname and lastname
| 7. alter the columns to be NOT NULL
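|
| For example (a sketch, assuming Postgres):
|
|     UPDATE users SET firstname = '' WHERE firstname IS NULL;
|     ALTER TABLE users ALTER COLUMN firstname SET NOT NULL;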
|
| Because no non-statistician uses nullable columns, right?
|
| Of course, some dbs (Oracle?) treat the empty string as NULL,
| or am I misremembering?
|
| Always having the columns be NOT NULL is a fundamental
| cheat, after always having a PK, or is that too old school
| for 2025?
| efxhoy wrote:
| There's nothing wrong with nullable fields when it's
| appropriate. When kids are born they don't have names.
| Not all users want to tell you their names. A null value
| is data too.
| MrMcCall wrote:
| > when it's appropriate
|
| Yes, it just requires extra care when querying and
| handling the rows.
|
| It's always just easier, if you can, to make it NOT NULL
| after prepopulating all rows' columns to the empty string
| (or real data).
|
| Sometimes NULL is truly different than the empty string,
| but that, like you said, is just a kind of data.
| evanelias wrote:
| With large tables, you can't safely make data changes using
| migration tools either anyway. If you run a single UPDATE
| against a large table, you end up with a very long
| transaction with substantial MVCC implications (old row
| versions that the db needs to clean up) and can easily break
| production.
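|
| The usual workaround is to backfill in small batches so each
| transaction stays short; a sketch (hypothetical table and
| batch size):
|
|     -- run in a loop from a script until 0 rows are updated
|     UPDATE users SET firstname = ''
|     WHERE id IN (
|       SELECT id FROM users WHERE firstname IS NULL LIMIT 1000
|     );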
|
| Side note, but why do folks always bring up this "firstname
| lastname" example? It is not ever possible to implement that
| correctly in an automated fashion: some people have spaces in
| their first names, and some people have spaces in their last
| names. (I don't mean to single you out, as this exact example
| comes up multiple times in every declarative vs imperative
| discussion!)
| prmph wrote:
| Oh, I am aware of the problems with this scheme for names.
| I just used it as a simple example. I wouldn't design my
| users table this way.
|
| If I can, I even avoid storing actual names of users at
| all. If I have to, I use a simple free-form "display_name"
| column; what is put there is up to the user.
| mappu wrote:
| I know it was just an example, but
| https://www.kalzumeus.com/2010/06/17/falsehoods-
| programmers-... definitely stopped me doing separate first
| and last name columns.
| ucarion wrote:
| Sorry if asked and answered: can you hand-edit the generated
| migrations? Like, what if I want to do a create index
| _concurrently_ or something?
| kiwicopple wrote:
| yes, you can edit the migrations. The steps for adding a
| new column, for example, are:
|
| 1/ Add a new column to the declarative file
|
| 2/ Generate a new migration: `supabase db diff -f
| my_new_migration`
|
| 3/ Review/edit the generated migration
|
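| For the concurrent-index case above, that edit could look
| like this (hypothetical names; note that CONCURRENTLY can't
| run inside a transaction block):
|
|     -- replace the generated plain CREATE INDEX with:
|     CREATE INDEX CONCURRENTLY idx_users_email ON users (email);
|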
| Docs are here:
|
| https://supabase.com/docs/guides/local-development/declarati...
| geocar wrote:
| You're going to regret this.
|
| The thing you need to be doing is _testing_ your migrations, and
| some dumbass on your team is going to generate the migration
| during CI and load it into your database as a merge step, and you
| won't realise what a mistake this was until possibly _years_
| later.
|
| The good news is you _might_ be able to pay someone an enormous
| amount of money to unfuck things. Not good for you, I mean,
| obviously. Good for whoever you just bought a car.
| mylons wrote:
| we at tahoe labs dot io are here for you in that case.
| joshAg wrote:
| We built something similar for the managed DB we use, but I think
| it's a mistake to autogenerate the migration scripts instead of
| autogenerating the schema from the migration. Things like
| changing an enum, adding a non-null column that shouldn't have a
| default to an existing table that already has data in it, and
| migrating data from one representation to another (e.g., 'oh hey,
| we definitely shouldn't have made our users table have an fname
| and an lname field, let's change to full_name and
| preferred_name') are easily done in a migration script but hard,
| if not impossible, to infer from schema changes alone.
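|
| For example, in Postgres an enum change is easy to write by
| hand but awkward to infer (hypothetical type name):
|
|     -- simple in a hand-written migration:
|     ALTER TYPE order_status ADD VALUE 'refunded';
|     -- but removing or reordering values means recreating the
|     -- type and rewriting dependent columns, which a schema
|     -- diff can't safely guess at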
| neutralino1 wrote:
| It seems to me Rails has been doing this, but better, for years.
| It definitely keeps atomic and historical migrations, but also
| maintains a schema.sql file that can be loaded as a one-off (e.g.
| for mock DBs in tests).
| kiwicopple wrote:
| > doing this but better
|
| I'm curious what makes it better than the approach in the blog?
|
| If it's this:
|
| > _also maintains a schema.sql file that can be loaded as a
| one-off_
|
| That's exactly what the declarative files provide
| xaas wrote:
| I use ZenStack. In future releases they are moving away from
| supabase...
| wmanley wrote:
| See also:
|
| "Simple declarative schema migration for SQLite"
|
| https://david.rothlis.net/declarative-schema-migration-for-s...
|
| Discussed previously:
|
| https://news.ycombinator.com/item?id=31249823
|
| Disclosure: I am the co author of that article
| mylons wrote:
| am I missing something? what does this offer over raw SQL? it
| honestly looks very similar
| kiwicopple wrote:
| It is raw SQL. The update here is more of a "workflow" change.
|
| In the past we only offered tools to create a sql migration to
| make a change to your database. Now you can write the state of
| your database into files, then the migration is generated for
| you.
|
| This is very similar to something like Rails/Phoenix/whatever,
| where you write your models and then it generates a change. The
| difference is that here you write your models in raw SQL rather
| than in an ORM.
| webster451 wrote:
| I think we are getting close to the peak of "declarative"--or
| rather, I hope we are near the peak.
|
| In my experience, declarative APIs are very powerful abstractions
| for specific cases where finding the path to the declared state
| is better left to a machine. This is seldom the case - in most
| cases, offering the programmer control over the changes leads to
| better behaviors.
|
| Kubernetes and IaC tools lead the way to a declarative state of
| infrastructure and these add a ton of value. But, they were also
| incredibly hard to build - it took many years before Kubernetes
| eventing and control loop abstracts were rock solid. Most CRD-
| backed implementations suffer from tons and tons of bugs, and
| most CRDs are not declarative - they abstract away an imperative
| operation! I guess this is nothing new - "anything in excess is
| bad".
|
| Anyways, I think an imperative approach offers much higher
| control and predictability at a lower cost. The world inherently
| is imperative.
| carlmr wrote:
| >In my experience, declarative APIs are very powerful
| abstractions for specific cases where finding the path to the
| declared state is better left to a machine. This is seldom the
| case
|
| I've been waiting for this top comment for longer than you can
| imagine. The declarative madness has always bothered me.
| Sometimes it's easier to maintain when you see the process, and
| harder to declare the final state. It might look neat, but
| maintainability beats neatness every day.
| Onavo wrote:
| I mean they just reinvented Prisma and Django
| evanelias wrote:
| Yes, although the article isn't claiming to have invented
| declarative schema management. They're just saying it is now
| available as a feature in Supabase. (Personally I think
| that's great!)
|
| Regarding prior art: Django migrations are indeed
| declarative, and were very early in this space. But they're
| tied to Python model definitions in the ORM, which is a bit
| more of a special case than the native SQL CREATE-based
| approach described here.
|
| As for Prisma Migrate, they directly copied several
| innovations from my tool Skeema [1] which has been available
| since 2016, so they can be equally accused of "reinventing"
| things :)
|
| Not that I invented pure-SQL declarative schema management
| either, by any stretch. I was largely inspired by the
| workflow at Facebook, who adopted declarative table
| management company-wide back in ~2012. FB found that having a
| declarative reconciliation loop is an operational necessity
| with a massively sharded system, given some hardware just
| dies every day. And centralized SQL-based schema management
| is advantageous when applications are written in many
| different programming languages.
|
| [1] https://github.com/skeema/skeema
| Onavo wrote:
| Any plans to add Postgres and SQLite support?
| evanelias wrote:
| Skeema will always be specific to MySQL/MariaDB, but I
| hope to eventually release a separate tool which is more
| generic and modular. That's still a long while away
| though.
|
| There are some existing declarative tools for Postgres
| and SQLite though, see
| https://news.ycombinator.com/item?id=43576450
| Guillaume86 wrote:
| Could this be moved into a standalone CLI tool? Is there anything
| Supabase-specific about it? I've always wanted SSDT SQL projects
| for postgres (SSDT is MS's declarative schema management
| solution for SQL Server).
|
| SSDT can also sync db projects (nicely organized DDL .sql files
| representing the schema) and databases (one way or the other).
| With the IDE support you can do stuff like "find all references"
| on a column or any other DB object, and build the project to
| check for errors. Linting the schema becomes possible, etc. I
| have a hard time when I have to go back to imperative schema
| management...
| evanelias wrote:
| There are a few stand-alone declarative Postgres tools
| available, including Stripe's pg-schema-diff [1], Tusker [2]
| which wraps Migra [3], and sqldef [4] which supports multiple
| DBMS.
|
| For sake of completeness for users of other databases: for
| SQLite check out stb-tester's migrator [5], and last but not
| least for MySQL/MariaDB there's my tool Skeema [6].
|
| [1] https://github.com/stripe/pg-schema-diff
|
| [2] https://github.com/bikeshedder/tusker
|
| [3] https://github.com/djrobstep/migra
|
| [4] https://github.com/sqldef/sqldef/
|
| [5] https://david.rothlis.net/declarative-schema-migration-
| for-s...
|
| [6] https://github.com/skeema/skeema
| Guillaume86 wrote:
| Thanks for the list. On the commercial side there's also
| Redgate Flyway, which looks like a full multi-DB SSDT
| alternative.
|
| I wasn't aware of sqldef I will need to take a look.
| Something nice about SSDT is that when we export a DB to a
| SQL project (files), each DB object is neatly organized is
| its own file, and the tooling solve ordering and dependency
| cycles on its own, I'm not sure any OS tool currently do it
| like that (skeema maybe? but no PG).
| evanelias wrote:
| Flyway is primarily an imperative migration tool, rather
| than a declarative-first system. Their paid editions do
| have support for doing diffs ("state-based") and auto-
| generating migrations, but it's bolted on to a workflow
| that was designed to be imperative-first. From what I can
| find, their public docs are rather lacking on what types of
| database objects and edge cases it supports.
|
| Skeema does indeed separate each DB object to its own file,
| and each logical schema to its own subdirectory. It also
| figures out how to correctly apply changes in a valid
| order, but yeah it's MySQL/MariaDB only. I think some of
| the declarative FOSS Postgres tools do attempt to provide a
| correct ordering, but from memory I'm not certain.
| Guillaume86 wrote:
| Thanks for the additional information!
| oulipo wrote:
| So I guess this is a bit the same as using something like Drizzle
| to define your tables and handle your migrations?
___________________________________________________________________
(page generated 2025-04-04 23:02 UTC)