[HN Gopher] Do you really need foreign keys?
       ___________________________________________________________________
        
       Do you really need foreign keys?
        
       Author : shayonj
       Score  : 83 points
       Date   : 2023-12-21 17:26 UTC (5 hours ago)
        
 (HTM) web link (www.shayon.dev)
 (TXT) w3m dump (www.shayon.dev)
        
       | KeyBoardG wrote:
       | Unless your project is small and for learning, or won't be around
       | for very long, Yes. You absolutely do or will end up in a
       | technical debt world of hurt. I have worked on systems without
       | them that are 15+ years old and the vast, vast majority of fixes
       | and refactoring were self inflicted wounds like this. We added
       | FK's and indexes where necessary and wound up faster and safer.
        
         | x86_64Ubuntu wrote:
         | That sounds awful. My first instinct would be to see where an
         | FK constraint would fail, and then ask the business what it
         | should be. There is no guarantee that every business case from
         | way back needed an FK, or that anyone in the business knows
         | which FK value should be there. It sounds absolutely miserable.
        
           | eddd-ddde wrote:
           | In my experince, adding proper constraints will not only
           | ensure code can never write invalid data to disk, it will
           | also help you realize when business rules are lacking or
           | failed to consider edge cases.
        
       | cyberninja15 wrote:
       | nice, but I wonder if a query optimizer within the database might
       | be the more optimal solution.
        
       | cube2222 wrote:
       | On that note, has anybody figured out a nice way to combine
       | foreign keys and soft deletion (that is, a deleted_at column)?
       | 
       | Soft deletion is occasionally useful, but losing foreign keys for
       | it is a big pain.
       | 
       | EDIT: got a bunch of responses, thanks! To be clear, the issue I
       | have in mind is e.g. you want to have a foreign key that makes
       | sure the "singular" side of a one-to-many relationship isn't
       | soft-deleted (on delete restrict) as long as it has anything in
       | the "many" side. Both sides using soft-deletion.
        
         | michaelmior wrote:
         | I see at least two possible approaches (each with their
         | tradeoffs, neither great):
         | 
         | 1. For each relevant foreign key, have an additional column
         | without a constraint where the foreign key value can be copied
         | to before deletion. (This requires that the original foreign
         | key be nullable.)
         | 
         | 2. Make soft deletion universal so that foreign keys can remain
         | and any associated rows will still exist.
        
         | stop50 wrote:
         | How are foreign keys and deleted_at mutual exclusive?
        
           | dspillett wrote:
           | I assumed cube2222 meant where soft-delete wasn't universal:
           | so you might really delete the parent row but want to soft-
           | delete the child rows. Though I would suggest this is a bad
           | idea: if you want to soft-delete child rows you should enable
           | it on parent rows too.
        
           | cube2222 wrote:
           | E.g. you want to have a foreign key that makes sure the
           | "singular" side of a one-to-many relationship isn't soft-
           | deleted as long as it has anything in the "many" side. Both
           | sides using soft-deletion.
        
         | PeterisP wrote:
         | Just brainstorming - perhaps one way to do that would be a
         | multi-column FK towards the row ID combined with an is_deleted
         | column, which would mean that the FK constraint also enforces
         | updating the related rows if something is deleted.
        
         | arrowleaf wrote:
         | By soft deletion, do you mean you would want to ever re-instate
         | the record and make it valid again, PK/FK constraints and all?
         | Or tracking deleted records? For the latter I use audit tables
         | + triggers to track the changing values.
         | 
         | At a basic level you could duplicate the entire record into the
         | audit table on every action, e.g. the audit table would look
         | like `audit_id | record_id | user/process_id | action (insert,
         | update, delete) | timestamp | ...<record rows>`.
         | 
         | You can optimize it to not duplicating column values unless
         | necessary. On inserts you only need the metadata of the action.
         | On updates, the old value of columns with changes goes into the
         | audit table. On deletes the whole record goes into the audit
         | table.
        
       | zer00eyz wrote:
       | Ahh the old days of LAMP when the M was mysql and foreign keys
       | were just a dream...
       | 
       | When you reduce complexity and take off the safeguards things get
       | faster! Cock that foot gun and hope that it doesn't go off!
       | 
       | Can you do what the author suggests. You sure can and we did it
       | for a long time with MYSQL. Should you? It depends on your team,
       | how in tune they are with working with databases, sql etc...
        
         | wombatpm wrote:
         | If you are willing to forego 3rd normal form and just duplicate
         | data like mad, and have a million column like a giant excel
         | sheet, then you really don't need foreign keys.
        
           | doix wrote:
           | If you duplicate data like mad, how do you ensure that the
           | duplicates are in sync? If you avoid UPDATEs I could maybe
           | see it working.
        
           | zer00eyz wrote:
           | MyISAM I think was the table type that did NOT support FK
           | constraints and was blazing fast. InnoDB had FK constraints.
           | 
           | You would fully normalize, use proper joins etc. But without
           | FK constraints you could end up with orphan data... Not the
           | worst thing in the world, depending on how the joins in your
           | system were structured.
           | 
           | Care and diligence were the order of the day. You had to know
           | your schema and make sure you were doing the RIGHT thing at
           | all points in the stack.
        
         | Clubber wrote:
         | There was a funny NoSql video that said something to the effect
         | of, "if you are willing to sacrifice everything for speed, just
         | pipe all your data to /dev/null."
         | 
         | Found it: https://www.youtube.com/watch?v=b2F-DItXtZs
        
       | norir wrote:
       | I once worked somewhere that used rails in lieu of foreign keys.
       | The result was a brittle nightly delete_orphaned records script
       | as well as obscure user visible bugs. My team started adding
       | foreign keys to our records and unsurprisingly caught bugs in our
       | application code that otherwise would have been missed.
       | Personally, I think the default should always be to usr foreign
       | keys and only if you have a genuine scalability problem might you
       | consider dropping them or put then in a different database with
       | lighter data integrity guarantees.
        
         | shayonj wrote:
         | I agree and makes sense. Starting out with them and then
         | challenging the setup is a good practice. I like the idea of
         | moving to a lighter data integrity setup too (ofc when
         | possible)
        
         | seadan83 wrote:
         | This anecdote is not very surprising to me. Foreign keys are on
         | a scale of DB constraints. I opt to add a maximum of
         | constraints where I can. Adding a date that should be in the
         | future - I'm adding a constraint! Did you mess up your time
         | zone conversions & are trying to add a date in the past - the
         | constraint catches it. At some point the constraints are strong
         | enough where with high confidence the following statement can
         | be said: "if the data exists in the database at all, it is
         | correct & complete." This is the "database as fortress"
         | principle, that only correct data should ever be allowed into
         | the database to begin with.
        
         | enobrev wrote:
         | I worked on a couple projects in the early aughts like this
         | (php / mysql). Even did it myself for a project and fortunately
         | learned my lesson early enough to fix that very project and
         | rely in the database to handle data integrity.
         | 
         | Always important to remember that while learning from others is
         | important; They're just as human as you are.
        
       | eska wrote:
       | I wonder whether there's a measurable performance difference if
       | instead of calling delete two times he just used the second
       | delete with "on delete cascade"?
        
       | endisneigh wrote:
       | You don't need foreign keys, no. But you do need referential
       | integrity and foreign keys implemented and enforced by the
       | database is usually the easiest.
       | 
       | If you're disciplined it's not difficult to implement yourself in
       | the application, but the challenge is if folks can access the
       | database directly. If so, good luck, since it's inevitable that
       | they end up making modifications and break integrity.
       | 
       | If one insists on not having foreign keys (and there are decent
       | reasons to not have them), I would really really suggest not
       | allowing direct access to the database and enforce whatever
       | application be the main or only client to the database.
        
         | shayonj wrote:
         | That is great and I noticed it's something I have always
         | assumed should be the case - no indirect access to the database
         | outside your application / ORM. Because then you open a can of
         | worms.
        
           | out_of_protocol wrote:
           | Doesn't help much though. It only takes not wrapping related
           | inserts into transaction in one place in your code. i know
           | zero frameworks/ORMs that can help you with that
        
             | tacone wrote:
             | I usually wrap everything (including reads) in a
             | transaction at middleware level. Not perfect, has its
             | downsides but for many projects it's perfectly ok.
             | 
             | Of course it's not a real substitute for foreign keys, but
             | definitely better than nothing.
        
         | wharvle wrote:
         | Foreign keys are wholly optional if you're the kind of person
         | who thinks it'd be cool to add "manually repair millions of
         | rows" and/or "lose half a day of data; cry" to the todo list
         | for a 3AM emergency deployment rollback.
        
         | nonethewiser wrote:
         | What would be the advantage of implementing it in the
         | application?
         | 
         | I only see disadvantages:
         | 
         | - no data integrity check in the database
         | 
         | - more complicated definition of foreign relation, or none at
         | all, leaving possibility for deviant data
         | 
         | - scatteted schema. Cant look at the db table and understand
         | the entire model. Have to hunt in code an potentially across a
         | lot of code.
         | 
         | It just seems like a data integrity issue that needs to be
         | enforced at the db level.
        
           | randomdata wrote:
           | _> What would be the advantage of implementing it in the
           | application?_
           | 
           | It is always implemented in the application. The question is
           | about whether you do it in your own application or use
           | someone else's application.
           | 
           | The disadvantage of doing it in your application is that you
           | have to do the work that someone else has probably already
           | done, and are likely not as great of a programmer as that
           | other person, thus more likely to screw it up.
        
           | endisneigh wrote:
           | It's faster, you can do online schema migrations, but it's
           | hard to say unless a specific database is used as an example.
        
           | cogman10 wrote:
           | The main advantage is it's WAY faster if you are writing a
           | lot.
           | 
           | The big disadvantage of foreign keys is they do verify
           | integrity. That means making lookups on every write/update
           | which can be very costly especially as the model becomes more
           | complex.
           | 
           | If you have a read heavy application with low levels of
           | writes then by all means put in foreign keys to you hearts
           | content. But if you are at a point where you have millions or
           | billions of records in multiple tables, they simply aren't
           | feasible.
        
             | mavelikara wrote:
             | But you pay the cost in checking it in the application, as
             | GP said. If so, it simply is moving the cost from db to
             | application layer. Is there a reason the checks can be
             | implemented more efficiently in the application than the DB
             | can?
        
               | endisneigh wrote:
               | The reason is because the application knows what's
               | actually happening with the data.
        
               | LtWorf wrote:
               | It's supposed to know, but it actually doesn't.
               | 
               | At my first job they used mongodb for no reason other
               | than it was the fad at the moment, with the big data and
               | so on.
               | 
               | There were often crashes in the application because our
               | records followed several different schemas, because of
               | bugs in the application that were later fixed, behaviour
               | that got changed, ORM got replaced with hand written code
               | that was much faster...
               | 
               | Basically, what happens with over confident developers
               | and CTO that think they are very good developers and
               | aren't.
        
               | endisneigh wrote:
               | There are plenty of counter examples. It's simply a fact
               | that your application can be faster without foreign keys.
        
               | majkinetor wrote:
               | Yes there is although way more complex.
               | 
               | U could use caching so not to hit db often and also app
               | scales way easier than db.
        
               | butlerm wrote:
               | Referential integrity problems usually happen due to
               | missing deletes, improper deletes, or references that
               | should be cleared.
               | 
               | The overhead of checking for the existence of referred to
               | records in ordinary inserts and updates in application
               | code is unnecessary in most cases, and that is where the
               | problem is. Either you have to check to have any idea
               | what is going on, because your key values are being
               | supplied from an outside source or you should be able
               | write your application so that it does not insert random
               | references into your database.
               | 
               | If you actually need to delete a row that might be
               | referred to, the best thing to do is not to do that,
               | because you will need application level checks to make
               | the reason why you cannot delete something visible in any
               | case. 'Delete failed because the record is referred to
               | somewhere' is usually an inadequate explanation. The
               | application should probably check so that delete isn't
               | even presented as an option in cases like that.
        
               | setr wrote:
               | > If you actually need to delete a row that might be
               | referred to, the best thing to do is not to do that,
               | because you will need application level checks to make
               | the reason why you cannot delete something visible in any
               | case. 'Delete failed because the record is referred to
               | somewhere' is usually an inadequate explanation. The
               | application should probably check so that delete isn't
               | even presented as an option in cases like that.
               | 
               | I feel like this belongs to the same strategy as
               | duplicating form-validation on frontend/backend. The
               | frontend validations can't be trusted (they can be
               | skipped over with e.g. curl POST), so backend validation
               | _must_ be done. But you choose duplicate it to the
               | frontend for user-convenience  / better reporting /
               | faster feedback loop. The backend remains the source of
               | truth on validations.
               | 
               | The same between database and application; the database
               | is much more likely to be correct when enforcing basic
               | data constraints and referential integrity. The
               | application can do it, its just a lot more awkward
               | because they're also juggling other things and have a
               | higher-level view of the data (and the only real way to
               | check you didn't screw up is to make your testcase do
               | exactly the same thing... but be correct about it -- no
               | one else is going to tell you your dataset got fucked.
               | Also true in an RDBMS, but it's trivial to verify by eye,
               | and there's only one place to check per relationship).
               | Thus in my world-view, the database _must_ validate, and
               | the application can choose to duplicate validation for
               | user-convenience  / better reporting. The database
               | remains the source of truth on validations. As an
               | optimization, you remove the database validations, but at
               | your own risk.
               | 
               | And then in a multi-app, single db world, then you really
               | can't trust the application (validations can be skipped),
               | so even that optimization is likely illegal. Or you do
               | many-apps *-> single-api -> db, and maintain the
               | optimization at the cost of pretty much completely
               | dropping the flexibility of having an RDBMS in the first
               | place
        
             | dkjaudyeqooe wrote:
             | This is your typical false economy trading a very small
             | performance gain for strongly degraded data integrity.
             | 
             | If the indexes are too much to ask you're basically saying
             | that the referred to foreign records are never looked up
             | (they have no key index!) and/or that the foreign records
             | are never gathered up for referring table. If that's the
             | case the problem isn't having superfluous indexes it's that
             | you have superfluous data in your database.
        
               | cogman10 wrote:
               | > a very small performance gain for strongly degraded
               | data integrity.
               | 
               | It's not very small. Doing a lookup on every write can be
               | hugely detrimental to performance especially with large
               | numbers of writes.
               | 
               | > If the indexes are too much to ask you're basically
               | saying that the referred to foreign records are never
               | looked up (they have no key index!)
               | 
               | That does not follow. The lookup for foreign records is
               | not free so avoiding doing it when you don't need to will
               | gain faster performance vs doing it all the time. Indexes
               | make lookups faster, they don't make them free.
               | 
               | Further, you have to consider the impact of locks on such
               | a system. Writes to a table that references another will
               | lock the contents of the second table while the write is
               | in flight. So if I wanted to update the foreign record in
               | any way, that task now gets blocked until data integrity
               | check finishes.
               | 
               | In MSSQL, that lock is held until the end of the
               | transaction.
        
             | ndriscoll wrote:
             | Batch processing is also WAY faster (like 1-2 orders of
             | magnitude), but for whatever reason people don't talk about
             | it much (cynically, I guess because it's too
             | straightforward and old-hat). As far as I can tell, Rails
             | doesn't have multithreading/async, so the OP is presumably
             | not batching requests. That seems like a much better place
             | to start than giving up referential integrity.
        
               | cogman10 wrote:
               | We are batch processing. I don't know why you assume we
               | aren't. FK constraints are simply too much burden for us.
               | I'm not the OP, however, my company has experienced the
               | performance headaches of FKs.
        
               | ndriscoll wrote:
               | I didn't assume anything about you; I said OP uses a
               | framework that (as far as I know) makes it difficult to
               | batch requests, so they're probably not batching
               | requests. And they mention doing a `before_create` hook,
               | which again suggests they're operating on individual
               | models. If you're already batching and still running into
               | a wall, then yeah you might need to remove constraint
               | validation or shard. But batching alone can probably take
               | most people far past the scale they'll ever see.
        
         | cosmotic wrote:
         | Using an intermediate client would go against many of the
         | claimed performance benefits of not using the FK. One would
         | also likely lose the ACI from ACID. Replicating those features
         | in the intermediate client sure is risky; it would require
         | significant expertise. I'd imagine the implementation would
         | leverage transactions, which would be 'round-trip' to the
         | database server and thus the lock contention would still exist
         | and also be substantially worse (due to the round-trip
         | latency).
        
         | dkjaudyeqooe wrote:
         | It's basically like saying you're ok with dangling pointers or
         | orphaned records, or that you're awesome enough not to need
         | these "slow", annoying safety features.
         | 
         | That's all fine, but we know how this ends up in reality.
         | 
         | Checking that child records that can exist only if there is a
         | parent record are deleted when the parent is deleted and that
         | the foreign key you're using in a record actually exists in the
         | pointed-to table, for the price of a single index you very
         | likely need anyway, is hardly a problem.
         | 
         | You don't need to cascade deletes or anything, that's a red
         | herring, as is migration issues where you can turn off
         | referential integrity while the process is performed.
        
         | bjhartin wrote:
         | > but the challenge is if folks can access the database
         | directly
         | 
         | This is not the challenge. The challenge is that you think you
         | can do an 'almost-as-good' job of data integrity as the RDBMS
         | designers.
         | 
         | Even if you could, you will not always be the person that
         | maintains that code.
        
       | rpgmaker wrote:
       | 50 years onward and every innovation seems to break, rather than
       | improve, the tried and true relational database model.
        
         | arrowleaf wrote:
         | Implementations with skin in the game are still innovating and
         | improving the 'tried and true' relational database model, IMO.
         | PostgreSQL, MS SQL, MySQL, etc. consistently churn out new
         | features that actually make sense, like FK constraint support
         | for sharded/partitioned tables relevant to this discussion.
        
       | lrobinovitch wrote:
       | This github issue is often linked when this topic is discussed:
       | https://github.com/github/gh-ost/issues/331
       | 
       | > Personally, it took me quite a few years to make up my mind
       | about whether foreign keys are good or evil, and for the past 3
       | years I'm in the unchanging strong opinion that foreign keys
       | should not be used. Main reasons are:
       | 
       | > * FKs are in your way to shard your database. Your app is
       | accustomed to rely on FK to maintain integrity, instead of doing
       | it on its own. It may even rely on FK to cascade deletes
       | (shudder). When eventually you want to shard or extract data out,
       | you need to change & test the app to an unknown extent.
       | 
       | > * FKs are a performance impact. The fact they require indexes
       | is likely fine, since those indexes are needed anyhow. But the
       | lookup made for each insert/delete is an overhead.
       | 
       | > * FKs don't work well with online schema migrations.
        
         | shayonj wrote:
         | ah! thats a blast from the past. I maintain pg-osc (online
         | schema change tool for postgres) and very much agree that FKs
         | make OSC hard.
        
         | iLoveOncall wrote:
         | > FKs are a performance impact. The fact they require indexes
         | is likely fine, since those indexes are needed anyhow. But the
         | lookup made for each insert/delete is an overhead.
         | 
         | This is not a valid argument at all and I'm concerned anyone
         | would think it is.
         | 
         | If you have a foreign key, it means you have a dependency that
         | needs to be updated or deleted. If that's the case, you will
         | have an overhead anyway, the only question being whether it's
         | at the DB level or at the application level.
         | 
         | I don't think there are many cases where there's any advantage
         | to self-manage them at the application level.
         | 
         | > FKs don't work well with online schema migrations
         | 
         | This seems to be related only to the specific project that the
         | issue is about if you read about the detailed explanation
         | below.
        
           | randomdata wrote:
           | _> the only question being whether it 's at the DB level or
           | at the application level._
           | 
           | It is not a binary situation like that. With the rise of
           | 'n-tier' systems that are ever so popular today, there are
           | often multiple DB levels. The question is not so much if it
           | should go into the end user application - pretty much
           | everyone will say definitely not there - but at which DB
           | level it should it go in. That is less clear, and where you
           | will get mixed responses.
        
         | brlewis wrote:
         | Note that this was written in 2016 in the context of a mysql-
         | centric project. You will not find an "unchanging strong
         | opinion that foreign keys should not be used" outside that
         | context.
         | 
         | I haven't kept up with mysql enough to know if there are still
         | good reasons to avoid foreign keys. I just stick with
         | postgresql.
        
           | richardwhiuk wrote:
           | sharding is still a big problem for foreign keys
        
         | grep_it wrote:
         | The responder to that issue has also written some blog posts
         | that go into more detail on the subject.
         | 
         | * https://code.openark.org/blog/mysql/things-that-dont-work-
         | we...
         | 
         | * https://code.openark.org/blog/mysql/the-problem-with-
         | mysql-f...
        
       | _heimdall wrote:
       | Foreign keys allow for pushing a really vital piece of business
       | logic down to the database itself, referential integrity.
       | 
       | This can be done in application logic, but that risks bugs
       | allowing broken references into your data. Its more foolproof
       | when these checks are enforced directly at the db, making sure
       | data is valid before it's stored or updated.
        
         | kevincox wrote:
         | No to mention that most SQL databases have complex consistency
         | rules. Just checking that the target row exists (or not) before
         | adding (or deleting) a record may not result in a correct
         | database state in the face of concurrent transactions.
         | 
         | You better very carefully read your database's concurrency
         | guarantees and ensure that all transactions are running with
         | the right consistency level otherwise you will have a bad day.
         | 
         | On top of that the JOIN method of following foreign keys often
         | leads to missing rows if expected keys don't exist which can be
         | very hard to debug.
        
         | PartiallyTyped wrote:
         | There was an interesting topic on r/experienceddevs [1], where
         | the dev team was arguing with some DBAs on adding another
         | column. The DBA insisted on using FKs over enum / application
         | level logic, etc. The comments there present some excellent
         | arguments on top of everything you said.
         | 
         | [1]
         | https://www.reddit.com/r/ExperiencedDevs/comments/18ldexi/wr...
        
       | gokhan wrote:
       | Title should be "Do you really need Foreign Key constraints?".
       | Foreign Key is the field itself, will still be there without the
       | constraint.
        
         | shayonj wrote:
         | Great call!
        
         | LtWorf wrote:
         | And the body should really be "Yes. You do."
        
         | _a_a_a_ wrote:
         | True, a foreign key by itself is just a business requirement. A
         | foreign key constraint is what the database does to enforce
         | that. There's rarely any confusion about the two concepts,
         | though.
        
       | goostavos wrote:
       | Follow this advice with caution. Dropping foreign keys is
       | effectively giving up part of the C in ACID. It should be done
       | with very, very open eyes to the downsides. I'm not sure the
       | author is selling the "when" side of this very well. Migrations
       | are "hard" because the database is forcing you to handle
       | correctness criteria that are easy to ignore. "Lock contention"
       | is the database covering your sloppy ill-thought out application
       | code from doing the Wrong Thing. Yes, of course, you take a minor
       | performance hit in exchange for consistency, but how many of us
       | actually work on applications where that trade off is the wrong
       | one? Developers are frustratingly superstitious about scale and
       | their need to do it.
       | 
       | The data will very likely outlive the application that created
       | it. It's almost guaranteed to outlive your tenure on the team.
       | Viciously guarding its correctness solves all the problems that
       | _not_ guarding it causes.
        
         | erik_seaberg wrote:
         | Whenever anyone talks about "the application," I immediately
         | ask "what about all the other applications?" I promise you that
         | you will find customer service and accounting and biz dev have
         | also built stuff that uses the database to get their jobs done
         | (probably not with the same ORM or even the same language)
         | unless you have taken draconian measures to prevent them.
        
           | goostavos wrote:
           | I would argue that if you have completely different contexts
           | / business lines / concerns (customer service, accounting,
           | biz dev) all directly accessing the same database you have
           | far, far larger architectural concerns that FK will ever hope
           | to address.
        
             | eddd-ddde wrote:
             | I strongly believe the opposite. Your business IS your
             | database. How do you handle access to the data without
             | access to the data?
             | 
             | It is inevitable that 2 "organizations" will at some point
             | need access to the same data. You could just have each of
             | them decide how they interact with it. It doesn't matter
             | how they do it because the database itself makes sure
             | invariants are kept true (such as FKs).
             | 
             | What's the alternative?
        
               | arrowleaf wrote:
               | Have a single application clearly owning the data. All
               | access to the data is done through that application. Why
               | would you _want_ multiple applications deciding how to
               | handle data in a common store? What about access
               | authorization to sensitive data?
        
               | eddd-ddde wrote:
               | Because that single application already exists, in my
               | case its postgres.
               | 
               | Pgweb is one of my interfaces, admin dashboard for free,
               | and I am sure the changes I make there are as valid as
               | changes through any other interface.
               | 
               | I can implement a website as a SSR app that talks
               | directly to the db. Maybe tomorrow I decide I need to
               | work on a web scraper that will use python, instead of
               | adding more API endpoints to allow the scraper to talk to
               | the database, I just talk to the database...
               | 
               | The database is that "single appplication", no need to
               | write custom endpoints for every operation when SQL is
               | good enough.
               | 
               | It's all a spectrum of course, today you can even go as
               | far as to use something like pg_graphql and you don't
               | even need to write a REST api yourself.
               | 
               |  _Edit_ : I forgot to answer the last section, but
               | postgres can totally handle autorization with RLS for
               | instance, allowing users to only see their own data, or
               | maybe data marked as public, etc.
        
               | DasIch wrote:
               | If you operate at the scale where you consider the
               | performance implications of foreign keys, you probably
               | are not ok with anyone accessing the database or running
               | just any query against it.
               | 
               | It is not realistic that you can trust everyone who needs
               | to access the data with access to the database as they
               | might easily cause problems with poorly written queries.
               | 
               | Additionally you may want invariants maintained, that the
               | database cannot maintain but an application in front of
               | it can.
               | 
               | Also for historical data or analytical queries postgres
               | is not ideal either, so you probably want to move the
               | data into some OLAP database or datalake.
               | 
               | > I can implement a website as a SSR app that talks
               | directly to the db. Maybe tomorrow I decide I need to
               | work on a web scraper that will use python, instead of
               | adding more API endpoints to allow the scraper to talk to
               | the database, I just talk to the database...
               | 
               | If the website and the scraper are just parts of the same
               | application, it makes sense to do this but if they are
               | genuinely different applications. I would use different
               | databases here.
        
               | Akronymus wrote:
               | > If you operate at the scale where you consider the
               | performance implications of foreign keys, you probably
               | are not ok with anyone accessing the database or running
               | just any query against it.
               | 
               | I used to work on an application where ALL database
               | accesses were via stored procedures. Genuinly the best
               | dev experience, to me, so far.
        
               | JackFr wrote:
               | > I used to work on an application where ALL database
               | accesses were via stored procedures. Genuinly the best
               | dev experience, to me, so far.
               | 
               | I found such an environment to be simply terrible.
               | 
               | In general, I think stored procs certainly have their
               | place, but if ALL access is through stored procs, you
               | better have a schema that's basically set in stone
               | otherwise dev will turn into a nightmare.
        
               | jrumbut wrote:
               | > Because that single application already exists, in my
               | case its postgres.
               | 
               | This is exactly what postgres was designed for! Tens of
               | thousands of hours of work over decades to solve the
               | problem of relational database management. That's why we
               | call it an RDBMS!
               | 
               | Which isn't to say you shouldn't make your own API ever.
               | There are a lot of situations where you don't want things
               | to connect directly to postgres.
               | 
               | But you shouldn't be afraid of having multiple systems
               | connect to postgres. It has incredibly mature and robust
               | features to accommodate that use case. It's the expected
               | use case.
        
             | butlerm wrote:
             | Large ERP systems do that sort of thing as a matter of
             | course and have for decades now. It does require careful
             | planning and design. I mean AR / AP / scheduling /
             | manufacturing / inventory and so on.
             | 
             | The main downside of splitting everything into isolated
             | databases is that it makes it approximately impossible to
             | generate reports that require joining across databases. Not
             | without writing new and relatively complex application code
             | to do what used to require a simple SQL query to accomplish
             | anyway.
             | 
             | Of course if you have the sort of business with scalability
             | problems that require abandoning or restructuring your
             | database on a regular basis, then placing that kind of data
             | in a shared database is probably not such a great idea.
             | 
             | It should also be said that common web APIs as a
             | programming technique are much harder to use and implement
             | reliably due to the data marshalling and extra error
             | handling code required than just about any system of
             | queries or stored procedures against a conventional
             | database. The need to page is perverse, for example.
             | 
             | That does not mean that sort of tight coupling is
             | appropriate in many cases, but it is (typically) much
             | easier to implement. Web APIs could use standard support
             | for two phase commit and internally paged queries that
             | preserve some semblance of consistency. The problem is that
             | stateless architecture makes that sort of thing virtually
             | impossible. Who knows which rows will disappear when you
             | query for page two because the positions of all of your
             | records have just shifted? Or which parts of a distributed
             | transaction will still be there if anything goes wrong?
        
             | winnebago wrote:
             | Different business units often want aggregate, filter, join
             | and transform data in different ways - sometimes in ways
             | that are hard to anticipate.
             | 
             | In general, the closer to the persistence layer you can
             | perform those transformations, the better they will scale.
             | If you pull the transform into the app layer, you need to
             | move and serialize more data. If you pull the
             | transformation into a constellation of apps, you need to
             | move and serialize a constellation of data.
             | 
             | (edit: formatting)
        
           | arrowleaf wrote:
           | My assumption (with modern applications!) is that nothing but
           | the role directly owning the data will access the data. The
           | development and DBA teams will likely have a role they can
           | assume after performing a carefully-audited breakglass
           | procedure to use in an emergency (rare) or to fulfill audit
           | tasks. At least in my org this is a well-known problem with
           | legacy applications sharing databases. Limit access to the
           | database to a single role, used by a single application, and
           | you absolve so many issues.
        
             | jbmsf wrote:
             | 100%.
        
             | chasil wrote:
             | When your data has outlived several generations of front-
             | end technologies (mine started with Powerbuilder), you will
             | find that ownership of the data does not control the
             | evolution of how it will be used with what replaces past
             | clients.
             | 
             | One particularly interesting Oracle problem is:
             | ORA-00060: deadlock detected while waiting for resource
             | 
             | Tom Kyte's book, _Expert One-on-One Oracle_ , describes the
             | primary culprit:
             | 
             | "Oracle considers deadlocks to be so rare, so unusual, that
             | it creates a trace file on the server each and every time
             | one does occur... The number one cause of deadlocks in the
             | Oracle database, in my experience, is un-indexed foreign
             | keys."
             | 
             | For another perspective, add to this a default setting in
             | every SQLite database:                 $ sqlite3 verynew.db
             | SQLite version 3.34.1 2021-01-20 14:10:07       Enter
             | ".help" for usage hints.       sqlite> .dump       PRAGMA
             | foreign_keys=OFF;       BEGIN TRANSACTION;       COMMIT;
             | 
             | Foreign keys can cause interesting problems, and SQLite
             | specifically prefers to avoid them.
        
             | pixl97 wrote:
             | >Limit access to the database to a single role, used by a
             | single application, and you absolve so many issues.
             | 
             | This kinda sounds like "Get rid of 90% of the usefulness of
             | having a database"
             | 
             | Of course, maybe you mean make the same data the DB has
             | available via API, or make other users of the data read
             | only.
        
           | mavelikara wrote:
           | +1. When I started in the industry, it was common for more
           | experienced developers to drill the "data outlives the
           | application that generated it" principle into you. Somewhere
           | in the transition to NoSQL and back we lost this.
        
             | Pet_Ant wrote:
             | The _data_ yes, the _database_ no. The data will get
             | migrated to different solutions at different times. I've
             | lost track of how often I've been porting MySQL to SQL
             | Server, SQL Server to Postgres, Postgres to Mongo, Mongo to
             | cloud etc.
             | 
             | EDIT: fixed capitalization of "PostGres"
        
               | winnebago wrote:
               | This really depends on the size of your data set. If you
               | have a large database (TBs+) it's likely to be very long-
               | lived due the effort/hardware resources a migration
               | requires - especially if you want to improve the schema
               | when migrating.
        
               | blowski wrote:
               | I've seen a database migration project outlive
               | application rewrites, multiple changes in CTO, and the
               | company being acquired.
        
               | jhardy54 wrote:
               | Friendly suggestion: the conventional capitalization is
               | either Postgres or PostgreSQL.
        
             | SkyPuncher wrote:
             | Context is key.
             | 
             | If you have an established business or startup, data will
             | outlive the application. However, you need a product that
             | lives long enough for either data or application to matter.
             | 
             | In the startup world, that means making decisions that help
             | you ship now, at the expense of debt/costs down the road.
        
             | da_chicken wrote:
             | It wasn't the transition to NoSQL. It was the transition to
             | _web APIs_. Now the application that can access the
             | database has a universal shim in place that deals with the
             | messy business of taking requests and applying them to the
             | data store.
             | 
             | Now there's typically three applications that need DB
             | access:
             | 
             | 1. The application it was built for.
             | 
             | 2. The read-only reporting and visualization tools.
             | 
             | 3. The web API.
        
           | Pet_Ant wrote:
           | > probably not with the same ORM or even the same language
           | 
           | I mean, they shouldn't? Like you've just identified a bug:
           | another application can access your database. If another
           | department needs your data, they should request an endpoint
           | that you control. You should be using an "application
           | database"[1] not an "integration database"[2].
           | 
           | [1] https://martinfowler.com/bliki/ApplicationDatabase.html
           | [2] https://martinfowler.com/bliki/IntegrationDatabase.html
        
             | epicfeedback wrote:
             | They are you. You is they. It's one company with one goal
             | -- keep companying.
             | 
             | The idea that ever "department" should access every other
             | department's data through some bespoke interface that the
             | latter department _maintains_ might work at some corporate
             | behemoth, but at almost all other scales is absurd.
        
               | ch4s3 wrote:
               | This would be a total waste of effort when you need to be
               | building a product and iterating. I hate articles like
               | this because they do a poor job contextualizing the
               | tradeoffs and when it might be appropriate to do the
               | weird exceptional thing.
               | 
               | IMHO if you have a performance critical case when foreign
               | keys are in the way, load THAT data into an in memory DB
               | on a recurring basis and server time sensitive requests
               | from there.
        
           | dasil003 wrote:
           | Interesting that this whole thread makes no distinction
           | between read and write access, as those are dramatically
           | different use cases. Read access is by far the more
           | necessary, and is usually solved relative easily by saving
           | snapshots to a data warehouse. This is no panacea as data can
           | still easily be misinterpreted or replicated and used out of
           | context in violation of expected production data lifecycle by
           | the team that owns the source-of-truth, but in many cases
           | it's trivial and minimizes coordination overhead.
           | 
           | Write access on the other hand is a different story. IMO
           | restricting write access of a database to a single
           | application is table stakes for scaling complex applications.
           | Sure there are other approaches such as writing all your
           | logic in the database via constraints and stored procedures
           | and making the DBA a god-like figure, but these approaches
           | have fallen out of favor as they've proven less scalable
           | compared to wrapping a DB with a service that has exclusive
           | write access. The latter arrangement allows many constraints
           | to be enforced in a horizontally scalable and more legible
           | layer, while still leveraging the DB to prevent races with a
           | better menu of tradeoffs.
           | 
           | Of course this requires thoughtful service and interface
           | design by competent technical domain experts, which is easier
           | said than done, but the alternative allows the overall system
           | cohesion to degrade to where no one understands the system
           | well enough to make any changes without risking major
           | incidents. At that point, the agency of system builders and
           | maintainers is replaced by care and feeding of the unknowable
           | system to not upset the status quo, accompanied with
           | increasingly byzantine hacks and workarounds to enable any
           | business changes.
        
             | JackFr wrote:
             | Indeed - I immediately thought of hypothetical system
             | (similar to many systems I've worked on) where SELECTS are
             | 100x more common than INSERTS which are 100x more common
             | than DELETES.
             | 
             | The 5x speed up on delete performance is a useless
             | optimization.
        
             | pmontra wrote:
             | The no foreign keys advice tastes of 2005/2010 when the
             | lack of support in some frameworks (Rails included)
             | prompted developers to dismiss them as unnecessary. This
             | piece of advice at least quantifies the terms of the
             | tradeoff.
             | 
             | Regarding the correct "saving snapshots to a data
             | warehouse", if there are one million web apps in the world,
             | how many of them have the scale to noticeably benefit from
             | either doing without foreign keys or from a data warehouse?
             | I've been using many of them like everybody else but they
             | are totally irrelevant to the long tail of apps and
             | developers.
             | 
             | By the way, a good DBA can make miracles for the
             | performances of most databases in that long tail. A few
             | days of work are worth the cost especially if the team pays
             | attention and learn the lesson. The best remark I got about
             | a DB of mine was a "not bad for being only a developer".
             | The DBA version was better.
        
               | ttfkam wrote:
               | If I recall correctly, "no foreign keys" hit its stride
               | back in early PHP days when MySQL didn't support them
               | properly. Rather than cop to the fact that MySQL just
               | implemented them badly, MySQL AB went on a dev PR run
               | telling folks that foreign keys weren't actually useful
               | and just slowed a system down.
               | 
               | Once MySQL implemented them less horribly, the PR push
               | finally started to die down. I will never forgive them
               | for that, and decades later where Oracle controls MySQL
               | (and arguably is doing a better job), I still hold a
               | grudge against MySQL that I have to actively suppress
               | when the contract demands it.
        
         | buttercraft wrote:
         | > The data will very likely outlive the application that
         | created it.
         | 
         | Yeah, if your front end is garbage, you can fix or rewrite it.
         | If your data is garbage, you've lost. Depending on your
         | requirements, of course.
        
         | butlerm wrote:
         | If you do not particularly care about performance or have a
         | great deal of headroom then database enforcement of referential
         | integrity is great. Alternatively you could just write test
         | cases to check for it and not pay the severe performance
         | penalty.
         | 
         | The other major downside of database enforcement of referential
         | integrity is the common need to drop and re-create foreign keys
         | during database schema upgrades and data conversions.
        
           | hermanradtke wrote:
           | > you could just write test cases to check for it
           | 
           | This effectively means you are building an embedded database
           | in your application and using the networked database for
           | storage. There are a few reasons to do this and a million
           | reasons not to.
        
           | dasil003 wrote:
           | "just write test cases to check for [referential integrity]"
           | is doing some awful heavy lifting in this comment.
           | 
           | Assuming a standard n-tier application architecture, how do
           | you guarantee the test prevents race conditions?
        
             | jrumbut wrote:
             | You either end up reinventing foreign keys, your support
             | volume will scale faster than your data, or user experience
             | will suffer.
             | 
             | There may be situations where foreign keys become too much
             | overhead, but it's worth fighting to keep them as long as
             | possible. Data integrity only becomes more important at
             | scale. Every orphaned record is a support ticket, lost
             | sale, etc.
        
               | butlerm wrote:
               | Orphaned detail records are usually inconsequential, like
               | uncollected garbage. References to anything with an
               | optional relationship should use outer joins as a matter
               | of course. If you delete something that really needs to
               | be there you have a problem, which is one of the reasons
               | not to delete rows like that, ever, but rather to mark
               | them as inactive or deleted instead.
        
             | butlerm wrote:
             | Typically you look for orphan rows - the sort of thing ON
             | DELETE CASCADE was invented to prevent. Another thing to
             | check for are records that need to exist but should have
             | references cleared when something else is deleted, e.g. ON
             | DELETE SET NULL. And the third thing is ON DELETE RESTRICT.
             | 
             | You can check for the first two of those things after the
             | fact, and they are relatively benign. In many cases it will
             | make no difference to application queries, especially with
             | the judicious use of outer joins, which should be used for
             | all optional relationships anyway.
             | 
             | If you need ON DELETE RESTRICT application code should
             | probably check anyway, because otherwise you have
             | unexpected delete failures with no application level
             | visibility as to what went wrong. That can be tested for,
             | and pretty much has to be before code that deletes rows
             | subject to delete restrictions is released into production.
             | 
             | As far as race conditions go, they should be eliminated
             | through the use of database transactions. Another
             | alternative is never to delete rows that are referred to
             | elsewhere and just set a deleted flag or something. That is
             | mildly annoying to check for however. Clearing an active
             | flag is simpler because you usually want rows like that to
             | stay around anyway, just not be used in new transactions.
        
           | setr wrote:
           | You're still going to pay the cost of maintaining referential
           | integrity -- you're just doing it on the app side. You can do
           | it faster by being not-correct -- eg you don't need a lock if
           | you ignore race conditions -- but it's not like the database
           | is arbitrarily slow at doing one of its basic fundamental
           | jobs.
           | 
           | Of course, you can just skip the validation altogether and
           | cross your fingers and hope you're correct, but it's the same
           | reasoning as removing array bounds checking from your app
           | code; you've eked out some more performance and it's great
           | until it's catastrophically not so great.
           | 
           | Your reasoning should really be inverted. Be correct first,
           | and maintain excessive validation as you can, and rip it out
           | where performance matters. With OLTP workloads, your data's
           | correctness is generally much more valuable than the
           | additional hardware you might have to throw at it.
           | 
           | I'm also not sure why dropping/creating foreign keys is a big
           | deal for migrations, other than time spent
        
             | butlerm wrote:
             | It is quite common for modern databases to have
             | multiversion concurrency so that writers do not block
             | readers. If you do not your transactions should either be
             | awfully short, you should be prepared to wait, or you
             | should implement dirty reads (which are quite common in any
             | case).
        
         | forinti wrote:
         | That's the best advice: data will outlive the application.
         | 
         | It will also serve other applications.
         | 
         | So treat it as its own thing, not as an appendix of the
         | application.
        
           | postalrat wrote:
           | Imo better advice would be your data may outlive the schema
           | it's stored in or your application may outlive the data.
        
       | charlie0 wrote:
       | Having worked in a db with lots of missing FKs, it becomes very
       | difficult to figure out relationships. This makes it difficult to
       | find out what tables need to be updated when there's a lot of
       | them. In this particular company, the only thing holding it
       | together was institutional knowledge.
        
       | davidw wrote:
       | Start with them, and if you're lucky enough to scale to a point
       | where they're hurting things, reconsider.
        
         | arrowleaf wrote:
         | And then, don't just drop the constraints because you read an
         | article about how FK constraints don't work with a distributed
         | relational model, take a careful look at the docs of whatever
         | DB engine you're using. PostgreSQL 12 for example supports
         | foreign keys with either side being a partitioned table.
        
       | jamestimmins wrote:
       | Foreign keys are often the best (and only) documentation of the
       | data model, and the only one guaranteed not to decay, so this is
       | an important consideration when rethinking them.
       | 
       | That said, I quite like this post and the way the author is
       | thinking critically about software decisions most of us take for
       | granted. Even if you decide that foreign keys are always best,
       | this type of thinking is a great way to improve your
       | understanding.
        
         | shayonj wrote:
         | Thank you :)
        
         | claytongulick wrote:
         | I agree - and I also appreciate the author's objectivity and
         | focus on making the correct technical decision for the problem.
         | Dogma is the bane of good architecture.
         | 
         | I completely agree with the idea that foreign keys aren't
         | always recommended. For example, one of the reasons that
         | MongoDB was so great at edge writes was because of the lack of
         | referential integrity and the "document" as a row concept -
         | denormalize the data into a document and store everything you
         | need there.
         | 
         | That works great for high volume edge writes that you want to
         | operationalize later, asynchronously. Sure, the data can be
         | messy and may have errors - but many use cases tolerate this
         | well.
         | 
         | The nice thing about modern postgres is that you get the best
         | of both worlds, RDBMS stuff when you need it, and NoSQL stuff
         | if you don't.
         | 
         | The author's point, which I think is an excellent one, is that
         | just because it's in a database, it doesn't necessarily make
         | sense to go full 4th normal form on everything.
        
         | seadan83 wrote:
         | +1 for that documentation.
         | 
         | I was recently working on a system that split data across
         | multiple database instances (unnecessarily) and that means
         | referential integrity is lacking and sometimes a huge problem.
         | 
         | To expound, I was tracking down something call a "slotid" and
         | the absence of foreign keys does not mean anything. That data
         | could very well live in some other table on the other DB
         | instance. It turned out though that "slot-id" instead referred
         | to a start time of day as counted by 'minutes-from-midnight'.
         | Thus "slot-id=375" was just "6:15am"... Yup.. 3 hours of my
         | life I will not get back to realize that data did not refer to
         | anything at all. If everything that could have had foreign keys
         | did, then it would have been a super quick investigation to
         | realize the data was not a reference to anything at all.
        
       | belltaco wrote:
       | Do you really need airbags and safety belts? They add a lot of
       | weight, cost and occupy space.
        
       | ww520 wrote:
       | The performance gain by dropping foreign keys doesn't hold water.
       | You still have to do the referential checks in application or in
       | the ORM code. Unless you meant dropping referential checks.
        
         | arrowleaf wrote:
         | The only time I could see it really impacting performance is
         | during bulk operations. But that's where making FK constraints
         | deferred and wrapping the bulk actions in a transaction makes
         | more sense.
        
         | crazygringo wrote:
         | Not really -- in practice, it generally means all inserts are
         | done as a transaction, and deletes simply aren't done.
         | 
         | Then there isn't really any opportunity for inconsistent data.
        
         | magicalhippo wrote:
         | Might be a performance penalty as well.
         | 
         | We had some missing FK constraints, along with missing cleanup
         | in code for a given table. Some customers had many millions of
         | orphaned rows, while only thousands of live rows.
         | 
         | Certainly didn't help index scans...
        
         | arp242 wrote:
         | Here' the common scenario:                 obj = new_object()
         | obj.col1 = get_value_from_somewhere()       obj.user_id =
         | get_logged_in_user_id()       obj.insert()
         | 
         | Or:                 # Find correct ID.       obj.some_id =
         | run_query("select some_id from tbl where x=?", param)
         | 
         | Lots of variation on that, but the user_id and some_id here are
         | pretty much guaranteed to be accurate when implemented
         | correctly. The biggest potential issue might be race conditions
         | with deletes on the parent ID, but just having soft deletes
         | sufficiently alleviates that (potential) small issue.
        
       | pphysch wrote:
       | I've been on both sides of this fence.
       | 
       | I work on a legacy MySQL database that has 100+ moderatedly-
       | related tables with _zero_ foreign key constraints.
       | 
       | Someone wanted to change the name of a particular object, so I
       | did a little investigation and found that approximately two dozen
       | tables were storing this name-as-key, with at least half a dozen
       | different column names. So essentially we have to scan every row
       | in the entire database to robustly maintain referential
       | integrity, and I vetoed the largely cosmetic change. Granted,
       | this isn't 100% a FK constraint issue, but I imagine if the
       | original DBA knew what FKs were, I wouldn't be in this mess.
       | 
       | On the other hand, I have implemented prototypes that clearly had
       | way too many FK constraints, which leads to a bloated schema of
       | unnecessary tables with 1-2 columns each, and makes object
       | lifecycles a spaghetti nightmare.
        
       | amtamt wrote:
       | This post seems like long version of "Directive 595 Part 2" from
       | https://thedailywtf.com/articles/directive-595
       | 
       | Anyone planning to test "YMMV", or full blown follow through, is
       | advised to read that dailywtf.
        
       | BenoitP wrote:
       | My rule of thumb has been: enable them strictly in DEV and INT
       | environments, disable in PROD. They can catch schema
       | discrepancies, but can impede ingestion rates.
       | 
       | Also some referential errors are sort of ok in PROD, as long as
       | it's only about not dropping user data; which can be dealt with
       | later on (INT gets reset with PROD user data from a backup each
       | week, it also helps in the restore plan, fk are enabled, errors
       | are caught, then data gets pruned heavily)
       | 
       | If referential integrity is a business-level bug, then of course
       | we should enable them.
        
         | shayonj wrote:
         | Very interesting! Thanks for sharing
        
         | somehnguy wrote:
         | To me that seems like an invitation for catastrophic bugs or
         | problems to creep through and surprise you in prod, since the
         | lower envs are running different a different configuration.
         | 
         | I suppose it's entirely dependent on the type of data you're
         | working with though.
        
         | LtWorf wrote:
         | > as long as it's only about not dropping user data
         | 
         | Can you tell the name of the company? It's for my lawyer.
        
         | HideousKojima wrote:
         | >My rule of thumb has been: enable them strictly in DEV and INT
         | environments, disable in PROD
         | 
         | I mean I think the constraints should be on in all
         | environments, but disabling them in prod but not dev seems
         | utterly backwards? Protect your test data from getting
         | corrupted but not your actual customer data?
        
       | binarymax wrote:
       | How often is data actually DELETEd from production databases?
       | Unless following through with regulatory removal such as GDPR,
       | it's far better to use an isdeleted flag IMO (so you can un-
       | delete if the action was a mistake).
        
         | pphysch wrote:
         | That's true, but managing the "soft" referential integrity
         | becomes a big authorization/security headache. How do you
         | prevent/allow "deleted" objects from being accessed indirectly?
         | 
         | I like Django's ORM approach which allows you to easily set
         | baseline filters for your Model Managers, so you can implicitly
         | exclude "deleted" objects from most queries.
        
           | binarymax wrote:
           | Good point. I'm one of those "never-ORM" people, so writing &
           | maintaining the SQL is prone to error when the DB gets
           | complex, and it does introduce mental overhead during dev.
        
         | arp242 wrote:
         | There's also an impact on inserts; it's not just deletes.
         | 
         | Essentially a foreign key constraint is an on
         | {insert,update,delete} trigger which checks whether the target
         | check exists, so that's a select on the target table. I'm not
         | sure if that's still the case, but I believe that for a long
         | time foreign keys _were_ just implemented as triggers in
         | PostgreSQL.
         | 
         | For a lot of things that has a minimal performance impact and
         | it's not a big deal. For some other things it can really add
         | up.
        
           | binarymax wrote:
           | FKs are typically primary keys/clustered indices on the
           | related table, so in those cases the integrity overhead would
           | be very minimal.
           | 
           | So in the vast majority of cases, the integrity is worth it.
        
             | arp242 wrote:
             | I assume you mean "performance overhead"? For insert-heavy
             | workloads the difference very much is not "very minimal".
             | In a quick test it's ~100ms vs. ~20ms with two foreign keys
             | which check against a "countries" table (250 rows) - one
             | column for nationality and one for residence.
             | 
             | +-a lot because this is a very quick test I ran just twice,
             | but it fits what I measured before when I very much ran in
             | to this performance penalty a few years back.
             | 
             | And look, 20ms vs. 100ms is very much fine for a lot of use
             | cases. But it's also not for a lot of others. And it's
             | certainly not very minimal, and with many inserts it does
             | amplify a lot.
        
       | karmakaze wrote:
       | Large scale MySQL databases I've worked on typically do not use
       | foreign keys. The payoff is higher write performance. They're
       | implicit based on table/column naming and relationships defined
       | in code, e.g. Rails associations or process/operation classes.
       | 
       | With a certain level of team maturity and thoughtful reviews,
       | this has rarely been an issue. Sometimes there are orphaned rows
       | (from incomplete/buggy writes) which also have clean-ups that
       | tend to have jobs for pruning data that has gone 'out of the
       | retention window'.
        
       | toasted-subs wrote:
       | Something about a secret racist thing about the word "foreign".
       | Personally I wouldn't mind working with more Spanish, Italian,
       | and French people.
       | 
       | Not complaining about the types of coworkers I see, but sometimes
       | tapas are kind of nice.
        
       | simonw wrote:
       | If you work on a large scale system that doesn't include foreign
       | key constraints, try running some queries that should never
       | return results because that would expose data integrity problems.
       | 
       | If your system is hooked up to a data warehouse you can run
       | queries there, too.
       | 
       | I bet you can find all sorts of weird edge-case records doing
       | this.
        
       | iamflimflam1 wrote:
       | Whenever I read posts like this I know, despite all the caveats
       | at the start of the article, a large number of people will take
       | away "we shouldn't have any foreign keys because they impact
       | performance".
       | 
       | I suggest that anyone thinking that has a look at a database
       | where there are no foreign keys.
        
       | forinti wrote:
       | A colleague once insisted in not using FKs because he read
       | somewhere that that was the modern way of doing things.
       | 
       | So the universe decided to punish him and a few weeks later some
       | data got deleted that would have been protected by an FK.
       | 
       | This idea is no longer discussed.
        
       | bhaak wrote:
       | I'm missing the execution plans of these DELETE statements.
       | 
       | Without them it's not clear what's the problem.
       | 
       | But in any case sacrificing referential integrity for speed is a
       | bad idea. You could go NoSQL in that case anyway.
        
       | jelled wrote:
       | I see foreign keys the same way I see types. You may not need
       | them, but they catch a lot of mistakes and give you more
       | confidence in the integrity of your system.
        
       | cdcarter wrote:
       | The Salesforce CRM application somewhat "famously" does not use
       | native DB foreign keys to model most relationships, and has a
       | custom relational integrity and indexing layer.
       | 
       | Mostly, this is in order to support our complex custom schema
       | functionality, described in the Multitenant Whitepaper here
       | (https://www.developerforce.com/media/ForcedotcomBookLibrary/...)
       | 
       | But interestingly, this also lets our internal data modelers
       | choose to make a relationship e.g. leave dangling records or have
       | other update/delete patterns that are appropriate for the data
       | and data volume. Obviously there are cleanups and tradeoffs
       | required here. It's certainly not a model I'd suggest people
       | start with, but there really are times where database native FKs
       | aren't the answer.
        
         | jamestimmins wrote:
         | Do you think this has served y'all well in the long term?
         | 
         | I assume "internal data modelers" is an actual role of person
         | whose job is to ensure long run data integrity?
        
       | _a_a_a_ wrote:
       | A better post than I expected. The only thing I'd add is that
       | foreign keys can actually improve read times because the
       | optimiser knows it can safely skip certain joins e.g. if you have
       | inner equi-joins between tables a, b and c                   a
       | join b join c
       | 
       | If there is an FK from a to b, and likewise from b to c, and you
       | don't use anything in b, then the optimiser can rewrite this to
       | a join c
       | 
       | YMMV
        
         | senderista wrote:
         | Yes, I remember using foreign key relationships to optimize
         | symmetric hash join back when I worked on an OLAP db. The idea
         | was that for a 1:1 relationship you can immediately discard
         | both sides of a joined tuple, while for a 1:many relationship
         | you can immediately discard the "many" side of a joined tuple.
        
           | _a_a_a_ wrote:
           | thanks for the detail!
        
       | Scubabear68 wrote:
       | I do consulting for a lot of different companies. The older ones
       | who did not religiously use PK/FK constraints have databases that
       | are nightmares to maintain. The data will start to rot
       | surprisingly quickly, devs will react by writing weird code to
       | compensate, and your life will not be fun.
        
       | betenoire wrote:
       | I worked for a company where we would not use features that
       | depended on foreign keys (cascading deletes, etc.), but we would
       | define the foreign keys 100% of the time. We ran all of our
       | staging environments with the keys, but when it came to prod our
       | DBA would remove them, among other optimizations. I've not done
       | this at other places that didn't have a dedicated DBA or two,
       | though.
        
       | ceeam wrote:
       | Yes.
       | 
       | And if you're concerned for microspeed of basic operations
       | Postgres isn't your friend anyway.
        
       | knorker wrote:
       | I used MySQL before foreign key support.
       | 
       | It was a nightmare. The article seems to basically be saying "but
       | it's hard!". Well, I'd rather put down the extra effort so that I
       | don't have corrupt data.
       | 
       | You know what's _really_ hard? Fixing corrupt data.
       | 
       | The data is the most important thing in a database.
        
       | jbverschoor wrote:
       | Well, do we really need protected memory and bounds checking?
        
       | mastazi wrote:
       | Despite the title, this article is about dropping foreign key
       | constraints, not about dropping foreign keys altogether (which is
       | possible, see below).
       | 
       | You can drop foreign key constraints and enforce referential
       | integrity in your code, but from a logical standpoint your
       | foo.barId column is still a foreign key.
       | 
       | Actually getting rid of foreign keys on the other hand can be
       | done in a data store that allows nested structures, like e.g. a
       | document based store. The price to pay is data duplication and
       | denormalisation (and the related complexity you will have to deal
       | with in all write operations), the pros are ease and speed of
       | retrieval.
        
       | prudentpomelo wrote:
       | Referential integrity saves you so many headaches. I am dealing
       | with a 500+ table database with no foreign keys at all. This has
       | allowed so many problems to creep in. Writing queries is pure
       | suffering.
        
       | pif wrote:
       | At last, a counterexample to Betteridge's law of headlines!
        
       | mikl wrote:
       | Do you really need referential integrity?
       | 
       | Do you really need consistent data?
       | 
       | Do you really need test coverage?
       | 
       | I mean, you can make do without it. But it's generally better to
       | have it than not.
        
       ___________________________________________________________________
       (page generated 2023-12-21 23:01 UTC)