[HN Gopher] The challenges of supporting foreign key constraints
___________________________________________________________________
The challenges of supporting foreign key constraints
Author : drecoe
Score : 77 points
Date : 2023-12-05 16:12 UTC (6 hours ago)
(HTM) web link (planetscale.com)
(TXT) w3m dump (planetscale.com)
| shlomi-noach wrote:
| Post author here, happy to answer technical questions.
| guptamanan100 wrote:
| I am the other post-author, and I am available too.
| padre24 wrote:
| This was a great read, thanks.
|
| Are there any plans to support recursive CTEs? What are the
| technical challenges there?
| guptamanan100 wrote:
| Thank you for the compliment!
|
| We recently started adding support for CTEs in Vitess! You
| can check out https://github.com/vitessio/vitess/pull/14321
| if you want to see some technical details of the
| implementation.
|
| For now, we have added preliminary support by converting
| them to derived tables internally, but we believe that we
| need to make CTEs first-class citizens themselves of query
| planning, specifically because recursive CTEs are very hard
| to model as derived tables. Once we make that change, we
| can look towards supporting recursive CTEs.
|
| This however will take some time, but then, all good things
| do!
| padre24 wrote:
| Awesome news! We work with hierarchical data so it was a
| non starter for us.
| guptamanan100 wrote:
| Oh, I see, that's unfortunate! Hopefully, we can remedy
| that though as soon as possible.
| jtriangle wrote:
| So how do you deal with orphaned child rows when reverting? I
| assume it's up to your users to deal with them or not? This
| very much seems like a clever automation for chosing when to
| care about foreign key constraints and not outright enforcement
| shlomi-noach wrote:
| Yes, you got that right! If you drop a foreign key constraint
| from a child table, and then follow up to INSERT/DELETE rows
| on parent and child in such way that is incompatible with
| foreign key constraints, and then revert, then the child, now
| again with the foreign key constraint, can have orphaned
| rows. It's as if you did `SET FOREIGN_KEY_CHECKS=0` and
| manipulated the data unobstructed. The schema itself remains
| valid, and some rows do not comply.
|
| It's worth noting that MySQL has no problem with this kind of
| situation. It never cares about the existence of orphaned
| rows. It only cares about not letting you creating them in
| the first place, and it cares about cleaning up. But it
| doesn't blow up if orphaned rows do exist. They will just
| become ghosts.
| saltcured wrote:
| Coming from a PostgreSQL worldview, I find this confusing.
| To me a foreign key constraint is about the referential
| integrity of a table, not an insert-time rule that can have
| loopholes to leave invalid data in the table. If the
| constraint is in place, I should be able to trust that any
| queryable data satisfies the constraint.
|
| Also from my PostgreSQL-infused worldview, it seems to me
| you are making your life too difficult by requiring a
| migration to make "one change" to a table or view. The
| brute force idiom I've seen for schema migrations is to
| break it into phases:
|
| 1. drop departing foreign key constraints
|
| 2. restructure table columns/types and values
|
| 3. add new foreign key constraints
|
| This is a bit like running with constraints deferred while
| making data changes that might look invalid until all data
| changes are done. But, it defers expression of the new
| constraints until the table structures are in place to
| support their definitions too, so it isn't just about
| deferring enforcement.
|
| The same strategy can be used for import scenarios to
| support schemas where there are circular foreign key
| reference constraints. I.e. tables are not in a strict
| parent-child hierarchy.
| obviyus wrote:
| The Planetscale blog consistently puts out such high quality
| posts. I can _highly_ recommend following their YouTube channel
| as well. I've learned a ton from the incredibly well made videos
| Aaron puts out!
| guptamanan100 wrote:
| Yep! Aaron is great, isn't he!!? I make it a point to watch all
| the videos he puts out. We appreciate your support!
| chrisjc wrote:
| Thanks for the recommendation.
|
| https://www.youtube.com/@PlanetScale
|
| Subscribed.
| leetrout wrote:
| Every time people would champion vitess / PS as a drop in fix
| this the caveat that would slow them down or stop them.
|
| This feature truly is a game-changer for adoption IMO.
|
| Congrats on shipping to all involved!
| shlomi-noach wrote:
| Thank you!
| irq-1 wrote:
| > At this time, foreign key constraint support is limited to
| unsharded/single shard databases.
| _a_a_a_ wrote:
| I've not met planetscale before and didn't understand why you're
| doing this. From the site:
|
| "Changing a table's schema is one of the most challenging
| problems in relational databases, and in MySQL in particular. In
| today's accelerated and rapid development cycles, engineers find
| that they need to make schema changes sometimes on a daily
| basis."
|
| Okay, but is this really true? Is it really necessary to have no
| downtime except in a very few cases? It's honestly rare to have
| systems that absolutely must stay up all the time, but allowing
| fiddling with the schema as well just seems excessive.
|
| I'd also say as software dev I'd be very uncomfortable with a
| system like yours that claims to maintain consistency while
| running and changing the schema, short of a formal proof.
| rdoherty wrote:
| Once your schema changes take more than a few minutes, yes.
| There's a lot of toil and burden if you need to take down your
| application every time you need a schema change. Announcements,
| coordination with internal teams and customers and then
| coordinating with other engineers.
|
| We aren't talking about zero downtime here, but continual,
| recurring downtime due to schema changes. Once you have beyond
| a few million rows in a normal RDBMS, schema changes can take
| minutes to hours depending on the type. Do this a few times per
| month and you now have 'lots' of downtime and you are blocking
| other engineering work from happening. It eventually becomes so
| much of a hassle that engineers don't want to do schema
| changes, blocking feature work. The more seamless and painless
| you can make them, the better.
| salawat wrote:
| Perhaps we shouldn't be collecting and retaining such large
| datasets that these issues become such a pressing problem?
| paulddraper wrote:
| Interesting, elaborate.
| klooney wrote:
| I feel like you're implying that this is caused by personal
| data collection and tracking, but it's not- you can get
| there pretty easily, in a small to medium sized app, with
| just user tables, or users + things configured.
|
| The giant data lakes for vacuuming up tracking data
| generally never do schema migrations at all.
| _a_a_a_ wrote:
| I'm going to have to be a bit contrary here.
|
| How often do you expect to make the schema changes? I mean I
| quoted this bit "...make schema changes sometimes on a daily
| basis" - is this realistic, or a kind of business insanity
| typically caused by bad management? Ditto "...but continual,
| recurring downtime due to schema changes". This really looks
| like a failure of management rather than a technical problem
| to be solved.
|
| Also aren't you likely to be doing something larger than just
| a schema change very often, in which case that would
| necessitate replacing your application, so changes are not
| just restricted to the database. You now have a bigger
| problem of co-ordinating app and DB changes.
|
| I also asked to do you need permanent uptime because in a lot
| of systems, especially smaller ones (and by the long tail
| most systems are going to be smallish) the users are very
| tolerant of an hours' downtime a month, for example.
|
| "Once you have beyond a few million rows in a normal RDBMS,
| schema changes can take minutes to hours depending on the
| type"
|
| That's a pretty strong claim; what kind of thing is going to
| take hours that your database can do consistently? Does it
| even take hours? I had a 100,000,000 row table of unique ints
| lying around so I put a foreign key from itself to itself (a
| bit daft, but just for timing purposes. DB is MS SQL, table
| is fully hot in memory)
|
| alter table [tmp_ints_clustered] add constraint ffffkkkkk
| foreign key (x) references [tmp_ints_clustered](x);
|
| 21 seconds.
|
| What you're doing (if you can get it correct! Which I have to
| wonder at) is doubtless excellent for some very large
| companies, but in general... I'm afraid I'm not so sure.
|
| Edit: I feel I'm perhaps missing your bigger picture.
| dylan604 wrote:
| I agree with your push back. Even in DEV, I'm not making
| daily schema changes. in fact, I hate schema changes and go
| back and forth on if the change is _really_ necessary.
| sure, changes do become necessary, but sheesh, daily is a
| sign to me that something else needs to be looked at in the
| dev cycle. like, is nobody forward thinking enough to come
| up with a workable schema. is the requirements truly being
| made by the seat of the pants. also, are the new schema
| requests really necessary to existing tables, or can we
| hang a new table and extend the joins? seems like taking a
| bit of time to do some forward thinking on the initial
| schema should keep daily changes from existing
| phkahler wrote:
| >> In today's accelerated and rapid development cycles,
| engineers find that they need to make schema changes sometimes
| on a daily basis.
|
| Surely that's a bad design, and capability to support it is
| enabling continued bad design.
| bob1029 wrote:
| Every time I turn on FK constraints I wind up regretting it and
| ripping them out of the schema. Not one time have I ever ran a
| non-query, received a constraint violation, and thought to myself
| "thank god, what a relief".
|
| Any time I am doing something that could violate some logical
| constraint, I am probably trying to fix something that is already
| fucked. The safety nannies simply add insult to injury at this
| stage. I will delete broken rows in whatever order I please.
|
| If constraint violations are adding value to your solution, you
| probably have a far more severe problem repeatedly slamming into
| the wall somewhere.
| perrygeo wrote:
| Strange, I was just reflecting this morning how grateful I am
| that FK constraints are respected. I got a FK constraint
| violation and without it I would have introduced a bug in my
| app. Specifically deleting X was doing to delete many child Ys.
|
| If there are any downstream queries that assume the
| relationship between X and Y, and you accidentally violated
| that contract, wouldn't you WANT the database to tell you?
| Without a FK constraint in place, I would just have to know
| that I needed to update hundreds of lines of (incidentally)
| dependent code!
|
| It's roughly analagous to the static vs dynamic typing debate.
| Do you want to discover FK violations at runtime in the future?
| Or head them off at insert time? Either way, you cannot sweep
| referential integrity under the rug... you only shift the
| burden from one place to another.
|
| So I thank my database daily for it's work enforcing FK
| constraints. Otherwise I'd have to write that code! You can't
| ignore the lifecycle of references.
| Exuma wrote:
| Can someone help me with a suggestion?
|
| Ive been researching databases now for several days straight, the
| choices are overwhelming but I've pretty much narrowed my use
| case down to an RDBMS system.
|
| I need to essentially handle 100's of millions of "leads" (and
| 10s of millions per day) which can make up any number of user
| fields. over 1B total
|
| I need to resolve duplicate leads either in realtime or near
| realtime. A duplication can occur across a combination of 1 or
| more fields, so basically OLTP type operations (select, update,
| delete on single rows)
|
| I do need to run large OLAP queries as well across all data
|
| I've looked at things like scylla and whatnot but they seem too
| heavy duty for my volume. it's not like i need to store trillions
| of messages like discord in some huge event log.
|
| I was considering these 3 options...
|
| 1. planetscale
|
| 2. citus
|
| 3. cockroachdb
|
| I havent really narrowed it down further than this, but i liked
| the idea of still having RDBMS features without needing to worry
| about storage and scaling with just sheer write volume.
|
| It seemed i could then do my basic OLTP stuff that i need, and
| citus had a cool demo how some OLAP query on 1B rows ran in 20s
| with 10 nodes, and that also fits a reasonable time for queries
| (BI tools will be used for that)
___________________________________________________________________
(page generated 2023-12-05 23:00 UTC)