[HN Gopher] The challenges of supporting foreign key constraints
       ___________________________________________________________________
        
       The challenges of supporting foreign key constraints
        
       Author : drecoe
       Score  : 77 points
       Date   : 2023-12-05 16:12 UTC (6 hours ago)
        
 (HTM) web link (planetscale.com)
 (TXT) w3m dump (planetscale.com)
        
       | shlomi-noach wrote:
       | Post author here, happy to answer technical questions.
        
         | guptamanan100 wrote:
         | I am the other post-author, and I am available too.
        
           | padre24 wrote:
           | This was a great read, thanks.
           | 
           | Are there any plans to support recursive CTEs? What are the
           | technical challenges there?
        
             | guptamanan100 wrote:
             | Thank you for the compliment!
             | 
             | We recently started adding support for CTEs in Vitess! You
             | can check out https://github.com/vitessio/vitess/pull/14321
             | if you want to see some technical details of the
             | implementation.
             | 
             | For now, we have added preliminary support by converting
             | them to derived tables internally, but we believe that we
             | need to make CTEs first-class citizens themselves of query
             | planning, specifically because recursive CTEs are very hard
             | to model as derived tables. Once we make that change, we
             | can look towards supporting recursive CTEs.
             | 
             | This however will take some time, but then, all good things
             | do!
        
               | padre24 wrote:
               | Awesome news! We work with hierarchical data so it was a
               | non starter for us.
        
               | guptamanan100 wrote:
               | Oh, I see, that's unfortunate! Hopefully, we can remedy
               | that though as soon as possible.
        
         | jtriangle wrote:
         | So how do you deal with orphaned child rows when reverting? I
         | assume it's up to your users to deal with them or not? This
         | very much seems like a clever automation for chosing when to
         | care about foreign key constraints and not outright enforcement
        
           | shlomi-noach wrote:
           | Yes, you got that right! If you drop a foreign key constraint
           | from a child table, and then follow up to INSERT/DELETE rows
           | on parent and child in such way that is incompatible with
           | foreign key constraints, and then revert, then the child, now
           | again with the foreign key constraint, can have orphaned
           | rows. It's as if you did `SET FOREIGN_KEY_CHECKS=0` and
           | manipulated the data unobstructed. The schema itself remains
           | valid, and some rows do not comply.
           | 
           | It's worth noting that MySQL has no problem with this kind of
           | situation. It never cares about the existence of orphaned
           | rows. It only cares about not letting you creating them in
           | the first place, and it cares about cleaning up. But it
           | doesn't blow up if orphaned rows do exist. They will just
           | become ghosts.
        
             | saltcured wrote:
             | Coming from a PostgreSQL worldview, I find this confusing.
             | To me a foreign key constraint is about the referential
             | integrity of a table, not an insert-time rule that can have
             | loopholes to leave invalid data in the table. If the
             | constraint is in place, I should be able to trust that any
             | queryable data satisfies the constraint.
             | 
             | Also from my PostgreSQL-infused worldview, it seems to me
             | you are making your life too difficult by requiring a
             | migration to make "one change" to a table or view. The
             | brute force idiom I've seen for schema migrations is to
             | break it into phases:
             | 
             | 1. drop departing foreign key constraints
             | 
             | 2. restructure table columns/types and values
             | 
             | 3. add new foreign key constraints
             | 
             | This is a bit like running with constraints deferred while
             | making data changes that might look invalid until all data
             | changes are done. But, it defers expression of the new
             | constraints until the table structures are in place to
             | support their definitions too, so it isn't just about
             | deferring enforcement.
             | 
             | The same strategy can be used for import scenarios to
             | support schemas where there are circular foreign key
             | reference constraints. I.e. tables are not in a strict
             | parent-child hierarchy.
        
       | obviyus wrote:
       | The Planetscale blog consistently puts out such high quality
       | posts. I can _highly_ recommend following their YouTube channel
       | as well. I've learned a ton from the incredibly well made videos
       | Aaron puts out!
        
         | guptamanan100 wrote:
         | Yep! Aaron is great, isn't he!!? I make it a point to watch all
         | the videos he puts out. We appreciate your support!
        
         | chrisjc wrote:
         | Thanks for the recommendation.
         | 
         | https://www.youtube.com/@PlanetScale
         | 
         | Subscribed.
        
       | leetrout wrote:
       | Every time people would champion vitess / PS as a drop in fix
       | this the caveat that would slow them down or stop them.
       | 
       | This feature truly is a game-changer for adoption IMO.
       | 
       | Congrats on shipping to all involved!
        
         | shlomi-noach wrote:
         | Thank you!
        
         | irq-1 wrote:
         | > At this time, foreign key constraint support is limited to
         | unsharded/single shard databases.
        
       | _a_a_a_ wrote:
       | I've not met planetscale before and didn't understand why you're
       | doing this. From the site:
       | 
       | "Changing a table's schema is one of the most challenging
       | problems in relational databases, and in MySQL in particular. In
       | today's accelerated and rapid development cycles, engineers find
       | that they need to make schema changes sometimes on a daily
       | basis."
       | 
       | Okay, but is this really true? Is it really necessary to have no
       | downtime except in a very few cases? It's honestly rare to have
       | systems that absolutely must stay up all the time, but allowing
       | fiddling with the schema as well just seems excessive.
       | 
       | I'd also say as software dev I'd be very uncomfortable with a
       | system like yours that claims to maintain consistency while
       | running and changing the schema, short of a formal proof.
        
         | rdoherty wrote:
         | Once your schema changes take more than a few minutes, yes.
         | There's a lot of toil and burden if you need to take down your
         | application every time you need a schema change. Announcements,
         | coordination with internal teams and customers and then
         | coordinating with other engineers.
         | 
         | We aren't talking about zero downtime here, but continual,
         | recurring downtime due to schema changes. Once you have beyond
         | a few million rows in a normal RDBMS, schema changes can take
         | minutes to hours depending on the type. Do this a few times per
         | month and you now have 'lots' of downtime and you are blocking
         | other engineering work from happening. It eventually becomes so
         | much of a hassle that engineers don't want to do schema
         | changes, blocking feature work. The more seamless and painless
         | you can make them, the better.
        
           | salawat wrote:
           | Perhaps we shouldn't be collecting and retaining such large
           | datasets that these issues become such a pressing problem?
        
             | paulddraper wrote:
             | Interesting, elaborate.
        
             | klooney wrote:
             | I feel like you're implying that this is caused by personal
             | data collection and tracking, but it's not- you can get
             | there pretty easily, in a small to medium sized app, with
             | just user tables, or users + things configured.
             | 
             | The giant data lakes for vacuuming up tracking data
             | generally never do schema migrations at all.
        
           | _a_a_a_ wrote:
           | I'm going to have to be a bit contrary here.
           | 
           | How often do you expect to make the schema changes? I mean I
           | quoted this bit "...make schema changes sometimes on a daily
           | basis" - is this realistic, or a kind of business insanity
           | typically caused by bad management? Ditto "...but continual,
           | recurring downtime due to schema changes". This really looks
           | like a failure of management rather than a technical problem
           | to be solved.
           | 
           | Also aren't you likely to be doing something larger than just
           | a schema change very often, in which case that would
           | necessitate replacing your application, so changes are not
           | just restricted to the database. You now have a bigger
           | problem of co-ordinating app and DB changes.
           | 
           | I also asked to do you need permanent uptime because in a lot
           | of systems, especially smaller ones (and by the long tail
           | most systems are going to be smallish) the users are very
           | tolerant of an hours' downtime a month, for example.
           | 
           | "Once you have beyond a few million rows in a normal RDBMS,
           | schema changes can take minutes to hours depending on the
           | type"
           | 
           | That's a pretty strong claim; what kind of thing is going to
           | take hours that your database can do consistently? Does it
           | even take hours? I had a 100,000,000 row table of unique ints
           | lying around so I put a foreign key from itself to itself (a
           | bit daft, but just for timing purposes. DB is MS SQL, table
           | is fully hot in memory)
           | 
           | alter table [tmp_ints_clustered] add constraint ffffkkkkk
           | foreign key (x) references [tmp_ints_clustered](x);
           | 
           | 21 seconds.
           | 
           | What you're doing (if you can get it correct! Which I have to
           | wonder at) is doubtless excellent for some very large
           | companies, but in general... I'm afraid I'm not so sure.
           | 
           | Edit: I feel I'm perhaps missing your bigger picture.
        
             | dylan604 wrote:
             | I agree with your push back. Even in DEV, I'm not making
             | daily schema changes. in fact, I hate schema changes and go
             | back and forth on if the change is _really_ necessary.
             | sure, changes do become necessary, but sheesh, daily is a
             | sign to me that something else needs to be looked at in the
             | dev cycle. like, is nobody forward thinking enough to come
             | up with a workable schema. is the requirements truly being
             | made by the seat of the pants. also, are the new schema
             | requests really necessary to existing tables, or can we
             | hang a new table and extend the joins? seems like taking a
             | bit of time to do some forward thinking on the initial
             | schema should keep daily changes from existing
        
         | phkahler wrote:
         | >> In today's accelerated and rapid development cycles,
         | engineers find that they need to make schema changes sometimes
         | on a daily basis.
         | 
         | Surely that's a bad design, and capability to support it is
         | enabling continued bad design.
        
       | bob1029 wrote:
       | Every time I turn on FK constraints I wind up regretting it and
       | ripping them out of the schema. Not one time have I ever ran a
       | non-query, received a constraint violation, and thought to myself
       | "thank god, what a relief".
       | 
       | Any time I am doing something that could violate some logical
       | constraint, I am probably trying to fix something that is already
       | fucked. The safety nannies simply add insult to injury at this
       | stage. I will delete broken rows in whatever order I please.
       | 
       | If constraint violations are adding value to your solution, you
       | probably have a far more severe problem repeatedly slamming into
       | the wall somewhere.
        
         | perrygeo wrote:
         | Strange, I was just reflecting this morning how grateful I am
         | that FK constraints are respected. I got a FK constraint
         | violation and without it I would have introduced a bug in my
         | app. Specifically deleting X was doing to delete many child Ys.
         | 
         | If there are any downstream queries that assume the
         | relationship between X and Y, and you accidentally violated
         | that contract, wouldn't you WANT the database to tell you?
         | Without a FK constraint in place, I would just have to know
         | that I needed to update hundreds of lines of (incidentally)
         | dependent code!
         | 
         | It's roughly analagous to the static vs dynamic typing debate.
         | Do you want to discover FK violations at runtime in the future?
         | Or head them off at insert time? Either way, you cannot sweep
         | referential integrity under the rug... you only shift the
         | burden from one place to another.
         | 
         | So I thank my database daily for it's work enforcing FK
         | constraints. Otherwise I'd have to write that code! You can't
         | ignore the lifecycle of references.
        
       | Exuma wrote:
       | Can someone help me with a suggestion?
       | 
       | Ive been researching databases now for several days straight, the
       | choices are overwhelming but I've pretty much narrowed my use
       | case down to an RDBMS system.
       | 
       | I need to essentially handle 100's of millions of "leads" (and
       | 10s of millions per day) which can make up any number of user
       | fields. over 1B total
       | 
       | I need to resolve duplicate leads either in realtime or near
       | realtime. A duplication can occur across a combination of 1 or
       | more fields, so basically OLTP type operations (select, update,
       | delete on single rows)
       | 
       | I do need to run large OLAP queries as well across all data
       | 
       | I've looked at things like scylla and whatnot but they seem too
       | heavy duty for my volume. it's not like i need to store trillions
       | of messages like discord in some huge event log.
       | 
       | I was considering these 3 options...
       | 
       | 1. planetscale
       | 
       | 2. citus
       | 
       | 3. cockroachdb
       | 
       | I havent really narrowed it down further than this, but i liked
       | the idea of still having RDBMS features without needing to worry
       | about storage and scaling with just sheer write volume.
       | 
       | It seemed i could then do my basic OLTP stuff that i need, and
       | citus had a cool demo how some OLAP query on 1B rows ran in 20s
       | with 10 nodes, and that also fits a reasonable time for queries
       | (BI tools will be used for that)
        
       ___________________________________________________________________
       (page generated 2023-12-05 23:00 UTC)