[HN Gopher] Do you really need foreign keys?
___________________________________________________________________
Do you really need foreign keys?
Author : shayonj
Score : 83 points
Date : 2023-12-21 17:26 UTC (5 hours ago)
(HTM) web link (www.shayon.dev)
(TXT) w3m dump (www.shayon.dev)
| KeyBoardG wrote:
| Unless your project is small and for learning, or won't be around
| for very long, Yes. You absolutely do or will end up in a
| technical debt world of hurt. I have worked on systems without
| them that are 15+ years old and the vast, vast majority of fixes
| and refactoring were self inflicted wounds like this. We added
| FK's and indexes where necessary and wound up faster and safer.
| x86_64Ubuntu wrote:
| That sounds awful. My first instinct would be to see where an
| FK constraint would fail, and then ask the business what it
| should be. There is no guarantee that every business case from
| way back needed an FK, or that anyone in the business knows
| which FK value should be there. It sounds absolutely miserable.
| eddd-ddde wrote:
| In my experince, adding proper constraints will not only
| ensure code can never write invalid data to disk, it will
| also help you realize when business rules are lacking or
| failed to consider edge cases.
| cyberninja15 wrote:
| nice, but I wonder if a query optimizer within the database might
| be the more optimal solution.
| cube2222 wrote:
| On that note, has anybody figured out a nice way to combine
| foreign keys and soft deletion (that is, a deleted_at column)?
|
| Soft deletion is occasionally useful, but losing foreign keys for
| it is a big pain.
|
| EDIT: got a bunch of responses, thanks! To be clear, the issue I
| have in mind is e.g. you want to have a foreign key that makes
| sure the "singular" side of a one-to-many relationship isn't
| soft-deleted (on delete restrict) as long as it has anything in
| the "many" side. Both sides using soft-deletion.
| michaelmior wrote:
| I see at least two possible approaches (each with their
| tradeoffs, neither great):
|
| 1. For each relevant foreign key, have an additional column
| without a constraint where the foreign key value can be copied
| to before deletion. (This requires that the original foreign
| key be nullable.)
|
| 2. Make soft deletion universal so that foreign keys can remain
| and any associated rows will still exist.
| stop50 wrote:
| How are foreign keys and deleted_at mutual exclusive?
| dspillett wrote:
| I assumed cube2222 meant where soft-delete wasn't universal:
| so you might really delete the parent row but want to soft-
| delete the child rows. Though I would suggest this is a bad
| idea: if you want to soft-delete child rows you should enable
| it on parent rows too.
| cube2222 wrote:
| E.g. you want to have a foreign key that makes sure the
| "singular" side of a one-to-many relationship isn't soft-
| deleted as long as it has anything in the "many" side. Both
| sides using soft-deletion.
| PeterisP wrote:
| Just brainstorming - perhaps one way to do that would be a
| multi-column FK towards the row ID combined with an is_deleted
| column, which would mean that the FK constraint also enforces
| updating the related rows if something is deleted.
| arrowleaf wrote:
| By soft deletion, do you mean you would want to ever re-instate
| the record and make it valid again, PK/FK constraints and all?
| Or tracking deleted records? For the latter I use audit tables
| + triggers to track the changing values.
|
| At a basic level you could duplicate the entire record into the
| audit table on every action, e.g. the audit table would look
| like `audit_id | record_id | user/process_id | action (insert,
| update, delete) | timestamp | ...<record rows>`.
|
| You can optimize it to not duplicating column values unless
| necessary. On inserts you only need the metadata of the action.
| On updates, the old value of columns with changes goes into the
| audit table. On deletes the whole record goes into the audit
| table.
| zer00eyz wrote:
| Ahh the old days of LAMP when the M was mysql and foreign keys
| were just a dream...
|
| When you reduce complexity and take off the safeguards things get
| faster! Cock that foot gun and hope that it doesn't go off!
|
| Can you do what the author suggests. You sure can and we did it
| for a long time with MYSQL. Should you? It depends on your team,
| how in tune they are with working with databases, sql etc...
| wombatpm wrote:
| If you are willing to forego 3rd normal form and just duplicate
| data like mad, and have a million column like a giant excel
| sheet, then you really don't need foreign keys.
| doix wrote:
| If you duplicate data like mad, how do you ensure that the
| duplicates are in sync? If you avoid UPDATEs I could maybe
| see it working.
| zer00eyz wrote:
| MyISAM I think was the table type that did NOT support FK
| constraints and was blazing fast. InnoDB had FK constraints.
|
| You would fully normalize, use proper joins etc. But without
| FK constraints you could end up with orphan data... Not the
| worst thing in the world, depending on how the joins in your
| system were structured.
|
| Care and diligence were the order of the day. You had to know
| your schema and make sure you were doing the RIGHT thing at
| all points in the stack.
| Clubber wrote:
| There was a funny NoSql video that said something to the effect
| of, "if you are willing to sacrifice everything for speed, just
| pipe all your data to /dev/null."
|
| Found it: https://www.youtube.com/watch?v=b2F-DItXtZs
| norir wrote:
| I once worked somewhere that used rails in lieu of foreign keys.
| The result was a brittle nightly delete_orphaned records script
| as well as obscure user visible bugs. My team started adding
| foreign keys to our records and unsurprisingly caught bugs in our
| application code that otherwise would have been missed.
| Personally, I think the default should always be to usr foreign
| keys and only if you have a genuine scalability problem might you
| consider dropping them or put then in a different database with
| lighter data integrity guarantees.
| shayonj wrote:
| I agree and makes sense. Starting out with them and then
| challenging the setup is a good practice. I like the idea of
| moving to a lighter data integrity setup too (ofc when
| possible)
| seadan83 wrote:
| This anecdote is not very surprising to me. Foreign keys are on
| a scale of DB constraints. I opt to add a maximum of
| constraints where I can. Adding a date that should be in the
| future - I'm adding a constraint! Did you mess up your time
| zone conversions & are trying to add a date in the past - the
| constraint catches it. At some point the constraints are strong
| enough where with high confidence the following statement can
| be said: "if the data exists in the database at all, it is
| correct & complete." This is the "database as fortress"
| principle, that only correct data should ever be allowed into
| the database to begin with.
| enobrev wrote:
| I worked on a couple projects in the early aughts like this
| (php / mysql). Even did it myself for a project and fortunately
| learned my lesson early enough to fix that very project and
| rely in the database to handle data integrity.
|
| Always important to remember that while learning from others is
| important; They're just as human as you are.
| eska wrote:
| I wonder whether there's a measurable performance difference if
| instead of calling delete two times he just used the second
| delete with "on delete cascade"?
| endisneigh wrote:
| You don't need foreign keys, no. But you do need referential
| integrity and foreign keys implemented and enforced by the
| database is usually the easiest.
|
| If you're disciplined it's not difficult to implement yourself in
| the application, but the challenge is if folks can access the
| database directly. If so, good luck, since it's inevitable that
| they end up making modifications and break integrity.
|
| If one insists on not having foreign keys (and there are decent
| reasons to not have them), I would really really suggest not
| allowing direct access to the database and enforce whatever
| application be the main or only client to the database.
| shayonj wrote:
| That is great and I noticed it's something I have always
| assumed should be the case - no indirect access to the database
| outside your application / ORM. Because then you open a can of
| worms.
| out_of_protocol wrote:
| Doesn't help much though. It only takes not wrapping related
| inserts into transaction in one place in your code. i know
| zero frameworks/ORMs that can help you with that
| tacone wrote:
| I usually wrap everything (including reads) in a
| transaction at middleware level. Not perfect, has its
| downsides but for many projects it's perfectly ok.
|
| Of course it's not a real substitute for foreign keys, but
| definitely better than nothing.
| wharvle wrote:
| Foreign keys are wholly optional if you're the kind of person
| who thinks it'd be cool to add "manually repair millions of
| rows" and/or "lose half a day of data; cry" to the todo list
| for a 3AM emergency deployment rollback.
| nonethewiser wrote:
| What would be the advantage of implementing it in the
| application?
|
| I only see disadvantages:
|
| - no data integrity check in the database
|
| - more complicated definition of foreign relation, or none at
| all, leaving possibility for deviant data
|
| - scatteted schema. Cant look at the db table and understand
| the entire model. Have to hunt in code an potentially across a
| lot of code.
|
| It just seems like a data integrity issue that needs to be
| enforced at the db level.
| randomdata wrote:
| _> What would be the advantage of implementing it in the
| application?_
|
| It is always implemented in the application. The question is
| about whether you do it in your own application or use
| someone else's application.
|
| The disadvantage of doing it in your application is that you
| have to do the work that someone else has probably already
| done, and are likely not as great of a programmer as that
| other person, thus more likely to screw it up.
| endisneigh wrote:
| It's faster, you can do online schema migrations, but it's
| hard to say unless a specific database is used as an example.
| cogman10 wrote:
| The main advantage is it's WAY faster if you are writing a
| lot.
|
| The big disadvantage of foreign keys is they do verify
| integrity. That means making lookups on every write/update
| which can be very costly especially as the model becomes more
| complex.
|
| If you have a read heavy application with low levels of
| writes then by all means put in foreign keys to you hearts
| content. But if you are at a point where you have millions or
| billions of records in multiple tables, they simply aren't
| feasible.
| mavelikara wrote:
| But you pay the cost in checking it in the application, as
| GP said. If so, it simply is moving the cost from db to
| application layer. Is there a reason the checks can be
| implemented more efficiently in the application than the DB
| can?
| endisneigh wrote:
| The reason is because the application knows what's
| actually happening with the data.
| LtWorf wrote:
| It's supposed to know, but it actually doesn't.
|
| At my first job they used mongodb for no reason other
| than it was the fad at the moment, with the big data and
| so on.
|
| There were often crashes in the application because our
| records followed several different schemas, because of
| bugs in the application that were later fixed, behaviour
| that got changed, ORM got replaced with hand written code
| that was much faster...
|
| Basically, what happens with over confident developers
| and CTO that think they are very good developers and
| aren't.
| endisneigh wrote:
| There are plenty of counter examples. It's simply a fact
| that your application can be faster without foreign keys.
| majkinetor wrote:
| Yes there is although way more complex.
|
| U could use caching so not to hit db often and also app
| scales way easier than db.
| butlerm wrote:
| Referential integrity problems usually happen due to
| missing deletes, improper deletes, or references that
| should be cleared.
|
| The overhead of checking for the existence of referred to
| records in ordinary inserts and updates in application
| code is unnecessary in most cases, and that is where the
| problem is. Either you have to check to have any idea
| what is going on, because your key values are being
| supplied from an outside source or you should be able
| write your application so that it does not insert random
| references into your database.
|
| If you actually need to delete a row that might be
| referred to, the best thing to do is not to do that,
| because you will need application level checks to make
| the reason why you cannot delete something visible in any
| case. 'Delete failed because the record is referred to
| somewhere' is usually an inadequate explanation. The
| application should probably check so that delete isn't
| even presented as an option in cases like that.
| setr wrote:
| > If you actually need to delete a row that might be
| referred to, the best thing to do is not to do that,
| because you will need application level checks to make
| the reason why you cannot delete something visible in any
| case. 'Delete failed because the record is referred to
| somewhere' is usually an inadequate explanation. The
| application should probably check so that delete isn't
| even presented as an option in cases like that.
|
| I feel like this belongs to the same strategy as
| duplicating form-validation on frontend/backend. The
| frontend validations can't be trusted (they can be
| skipped over with e.g. curl POST), so backend validation
| _must_ be done. But you choose duplicate it to the
| frontend for user-convenience / better reporting /
| faster feedback loop. The backend remains the source of
| truth on validations.
|
| The same between database and application; the database
| is much more likely to be correct when enforcing basic
| data constraints and referential integrity. The
| application can do it, its just a lot more awkward
| because they're also juggling other things and have a
| higher-level view of the data (and the only real way to
| check you didn't screw up is to make your testcase do
| exactly the same thing... but be correct about it -- no
| one else is going to tell you your dataset got fucked.
| Also true in an RDBMS, but it's trivial to verify by eye,
| and there's only one place to check per relationship).
| Thus in my world-view, the database _must_ validate, and
| the application can choose to duplicate validation for
| user-convenience / better reporting. The database
| remains the source of truth on validations. As an
| optimization, you remove the database validations, but at
| your own risk.
|
| And then in a multi-app, single db world, then you really
| can't trust the application (validations can be skipped),
| so even that optimization is likely illegal. Or you do
| many-apps *-> single-api -> db, and maintain the
| optimization at the cost of pretty much completely
| dropping the flexibility of having an RDBMS in the first
| place
| dkjaudyeqooe wrote:
| This is your typical false economy trading a very small
| performance gain for strongly degraded data integrity.
|
| If the indexes are too much to ask you're basically saying
| that the referred to foreign records are never looked up
| (they have no key index!) and/or that the foreign records
| are never gathered up for referring table. If that's the
| case the problem isn't having superfluous indexes it's that
| you have superfluous data in your database.
| cogman10 wrote:
| > a very small performance gain for strongly degraded
| data integrity.
|
| It's not very small. Doing a lookup on every write can be
| hugely detrimental to performance especially with large
| numbers of writes.
|
| > If the indexes are too much to ask you're basically
| saying that the referred to foreign records are never
| looked up (they have no key index!)
|
| That does not follow. The lookup for foreign records is
| not free so avoiding doing it when you don't need to will
| gain faster performance vs doing it all the time. Indexes
| make lookups faster, they don't make them free.
|
| Further, you have to consider the impact of locks on such
| a system. Writes to a table that references another will
| lock the contents of the second table while the write is
| in flight. So if I wanted to update the foreign record in
| any way, that task now gets blocked until data integrity
| check finishes.
|
| In MSSQL, that lock is held until the end of the
| transaction.
| ndriscoll wrote:
| Batch processing is also WAY faster (like 1-2 orders of
| magnitude), but for whatever reason people don't talk about
| it much (cynically, I guess because it's too
| straightforward and old-hat). As far as I can tell, Rails
| doesn't have multithreading/async, so the OP is presumably
| not batching requests. That seems like a much better place
| to start than giving up referential integrity.
| cogman10 wrote:
| We are batch processing. I don't know why you assume we
| aren't. FK constraints are simply too much burden for us.
| I'm not the OP, however, my company has experienced the
| performance headaches of FKs.
| ndriscoll wrote:
| I didn't assume anything about you; I said OP uses a
| framework that (as far as I know) makes it difficult to
| batch requests, so they're probably not batching
| requests. And they mention doing a `before_create` hook,
| which again suggests they're operating on individual
| models. If you're already batching and still running into
| a wall, then yeah you might need to remove constraint
| validation or shard. But batching alone can probably take
| most people far past the scale they'll ever see.
| cosmotic wrote:
| Using an intermediate client would go against many of the
| claimed performance benefits of not using the FK. One would
| also likely lose the ACI from ACID. Replicating those features
| in the intermediate client sure is risky; it would require
| significant expertise. I'd imagine the implementation would
| leverage transactions, which would be 'round-trip' to the
| database server and thus the lock contention would still exist
| and also be substantially worse (due to the round-trip
| latency).
| dkjaudyeqooe wrote:
| It's basically like saying you're ok with dangling pointers or
| orphaned records, or that you're awesome enough not to need
| these "slow", annoying safety features.
|
| That's all fine, but we know how this ends up in reality.
|
| Checking that child records that can exist only if there is a
| parent record are deleted when the parent is deleted and that
| the foreign key you're using in a record actually exists in the
| pointed-to table, for the price of a single index you very
| likely need anyway, is hardly a problem.
|
| You don't need to cascade deletes or anything, that's a red
| herring, as is migration issues where you can turn off
| referential integrity while the process is performed.
| bjhartin wrote:
| > but the challenge is if folks can access the database
| directly
|
| This is not the challenge. The challenge is that you think you
| can do an 'almost-as-good' job of data integrity as the RDBMS
| designers.
|
| Even if you could, you will not always be the person that
| maintains that code.
| rpgmaker wrote:
| 50 years onward and every innovation seems to break, rather than
| improve, the tried and true relational database model.
| arrowleaf wrote:
| Implementations with skin in the game are still innovating and
| improving the 'tried and true' relational database model, IMO.
| PostgreSQL, MS SQL, MySQL, etc. consistently churn out new
| features that actually make sense, like FK constraint support
| for sharded/partitioned tables relevant to this discussion.
| lrobinovitch wrote:
| This github issue is often linked when this topic is discussed:
| https://github.com/github/gh-ost/issues/331
|
| > Personally, it took me quite a few years to make up my mind
| about whether foreign keys are good or evil, and for the past 3
| years I'm in the unchanging strong opinion that foreign keys
| should not be used. Main reasons are:
|
| > * FKs are in your way to shard your database. Your app is
| accustomed to rely on FK to maintain integrity, instead of doing
| it on its own. It may even rely on FK to cascade deletes
| (shudder). When eventually you want to shard or extract data out,
| you need to change & test the app to an unknown extent.
|
| > * FKs are a performance impact. The fact they require indexes
| is likely fine, since those indexes are needed anyhow. But the
| lookup made for each insert/delete is an overhead.
|
| > * FKs don't work well with online schema migrations.
| shayonj wrote:
| ah! thats a blast from the past. I maintain pg-osc (online
| schema change tool for postgres) and very much agree that FKs
| make OSC hard.
| iLoveOncall wrote:
| > FKs are a performance impact. The fact they require indexes
| is likely fine, since those indexes are needed anyhow. But the
| lookup made for each insert/delete is an overhead.
|
| This is not a valid argument at all and I'm concerned anyone
| would think it is.
|
| If you have a foreign key, it means you have a dependency that
| needs to be updated or deleted. If that's the case, you will
| have an overhead anyway, the only question being whether it's
| at the DB level or at the application level.
|
| I don't think there are many cases where there's any advantage
| to self-manage them at the application level.
|
| > FKs don't work well with online schema migrations
|
| This seems to be related only to the specific project that the
| issue is about if you read about the detailed explanation
| below.
| randomdata wrote:
| _> the only question being whether it 's at the DB level or
| at the application level._
|
| It is not a binary situation like that. With the rise of
| 'n-tier' systems that are ever so popular today, there are
| often multiple DB levels. The question is not so much if it
| should go into the end user application - pretty much
| everyone will say definitely not there - but at which DB
| level it should it go in. That is less clear, and where you
| will get mixed responses.
| brlewis wrote:
| Note that this was written in 2016 in the context of a mysql-
| centric project. You will not find an "unchanging strong
| opinion that foreign keys should not be used" outside that
| context.
|
| I haven't kept up with mysql enough to know if there are still
| good reasons to avoid foreign keys. I just stick with
| postgresql.
| richardwhiuk wrote:
| sharding is still a big problem for foreign keys
| grep_it wrote:
| The responder to that issue has also written some blog posts
| that go into more detail on the subject.
|
| * https://code.openark.org/blog/mysql/things-that-dont-work-
| we...
|
| * https://code.openark.org/blog/mysql/the-problem-with-
| mysql-f...
| _heimdall wrote:
| Foreign keys allow for pushing a really vital piece of business
| logic down to the database itself, referential integrity.
|
| This can be done in application logic, but that risks bugs
| allowing broken references into your data. Its more foolproof
| when these checks are enforced directly at the db, making sure
| data is valid before it's stored or updated.
| kevincox wrote:
| No to mention that most SQL databases have complex consistency
| rules. Just checking that the target row exists (or not) before
| adding (or deleting) a record may not result in a correct
| database state in the face of concurrent transactions.
|
| You better very carefully read your database's concurrency
| guarantees and ensure that all transactions are running with
| the right consistency level otherwise you will have a bad day.
|
| On top of that the JOIN method of following foreign keys often
| leads to missing rows if expected keys don't exist which can be
| very hard to debug.
| PartiallyTyped wrote:
| There was an interesting topic on r/experienceddevs [1], where
| the dev team was arguing with some DBAs on adding another
| column. The DBA insisted on using FKs over enum / application
| level logic, etc. The comments there present some excellent
| arguments on top of everything you said.
|
| [1]
| https://www.reddit.com/r/ExperiencedDevs/comments/18ldexi/wr...
| gokhan wrote:
| Title should be "Do you really need Foreign Key constraints?".
| Foreign Key is the field itself, will still be there without the
| constraint.
| shayonj wrote:
| Great call!
| LtWorf wrote:
| And the body should really be "Yes. You do."
| _a_a_a_ wrote:
| True, a foreign key by itself is just a business requirement. A
| foreign key constraint is what the database does to enforce
| that. There's rarely any confusion about the two concepts,
| though.
| goostavos wrote:
| Follow this advice with caution. Dropping foreign keys is
| effectively giving up part of the C in ACID. It should be done
| with very, very open eyes to the downsides. I'm not sure the
| author is selling the "when" side of this very well. Migrations
| are "hard" because the database is forcing you to handle
| correctness criteria that are easy to ignore. "Lock contention"
| is the database covering your sloppy ill-thought out application
| code from doing the Wrong Thing. Yes, of course, you take a minor
| performance hit in exchange for consistency, but how many of us
| actually work on applications where that trade off is the wrong
| one? Developers are frustratingly superstitious about scale and
| their need to do it.
|
| The data will very likely outlive the application that created
| it. It's almost guaranteed to outlive your tenure on the team.
| Viciously guarding its correctness solves all the problems that
| _not_ guarding it causes.
| erik_seaberg wrote:
| Whenever anyone talks about "the application," I immediately
| ask "what about all the other applications?" I promise you that
| you will find customer service and accounting and biz dev have
| also built stuff that uses the database to get their jobs done
| (probably not with the same ORM or even the same language)
| unless you have taken draconian measures to prevent them.
| goostavos wrote:
| I would argue that if you have completely different contexts
| / business lines / concerns (customer service, accounting,
| biz dev) all directly accessing the same database you have
| far, far larger architectural concerns that FK will ever hope
| to address.
| eddd-ddde wrote:
| I strongly believe the opposite. Your business IS your
| database. How do you handle access to the data without
| access to the data?
|
| It is inevitable that 2 "organizations" will at some point
| need access to the same data. You could just have each of
| them decide how they interact with it. It doesn't matter
| how they do it because the database itself makes sure
| invariants are kept true (such as FKs).
|
| What's the alternative?
| arrowleaf wrote:
| Have a single application clearly owning the data. All
| access to the data is done through that application. Why
| would you _want_ multiple applications deciding how to
| handle data in a common store? What about access
| authorization to sensitive data?
| eddd-ddde wrote:
| Because that single application already exists, in my
| case its postgres.
|
| Pgweb is one of my interfaces, admin dashboard for free,
| and I am sure the changes I make there are as valid as
| changes through any other interface.
|
| I can implement a website as a SSR app that talks
| directly to the db. Maybe tomorrow I decide I need to
| work on a web scraper that will use python, instead of
| adding more API endpoints to allow the scraper to talk to
| the database, I just talk to the database...
|
| The database is that "single appplication", no need to
| write custom endpoints for every operation when SQL is
| good enough.
|
| It's all a spectrum of course, today you can even go as
| far as to use something like pg_graphql and you don't
| even need to write a REST api yourself.
|
| _Edit_ : I forgot to answer the last section, but
| postgres can totally handle autorization with RLS for
| instance, allowing users to only see their own data, or
| maybe data marked as public, etc.
| DasIch wrote:
| If you operate at the scale where you consider the
| performance implications of foreign keys, you probably
| are not ok with anyone accessing the database or running
| just any query against it.
|
| It is not realistic that you can trust everyone who needs
| to access the data with access to the database as they
| might easily cause problems with poorly written queries.
|
| Additionally you may want invariants maintained, that the
| database cannot maintain but an application in front of
| it can.
|
| Also for historical data or analytical queries postgres
| is not ideal either, so you probably want to move the
| data into some OLAP database or datalake.
|
| > I can implement a website as a SSR app that talks
| directly to the db. Maybe tomorrow I decide I need to
| work on a web scraper that will use python, instead of
| adding more API endpoints to allow the scraper to talk to
| the database, I just talk to the database...
|
| If the website and the scraper are just parts of the same
| application, it makes sense to do this but if they are
| genuinely different applications. I would use different
| databases here.
| Akronymus wrote:
| > If you operate at the scale where you consider the
| performance implications of foreign keys, you probably
| are not ok with anyone accessing the database or running
| just any query against it.
|
| I used to work on an application where ALL database
| accesses were via stored procedures. Genuinly the best
| dev experience, to me, so far.
| JackFr wrote:
| > I used to work on an application where ALL database
| accesses were via stored procedures. Genuinly the best
| dev experience, to me, so far.
|
| I found such an environment to be simply terrible.
|
| In general, I think stored procs certainly have their
| place, but if ALL access is through stored procs, you
| better have a schema that's basically set in stone
| otherwise dev will turn into a nightmare.
| jrumbut wrote:
| > Because that single application already exists, in my
| case its postgres.
|
| This is exactly what postgres was designed for! Tens of
| thousands of hours of work over decades to solve the
| problem of relational database management. That's why we
| call it an RDBMS!
|
| Which isn't to say you shouldn't make your own API ever.
| There are a lot of situations where you don't want things
| to connect directly to postgres.
|
| But you shouldn't be afraid of having multiple systems
| connect to postgres. It has incredibly mature and robust
| features to accommodate that use case. It's the expected
| use case.
| butlerm wrote:
| Large ERP systems do that sort of thing as a matter of
| course and have for decades now. It does require careful
| planning and design. I mean AR / AP / scheduling /
| manufacturing / inventory and so on.
|
| The main downside of splitting everything into isolated
| databases is that it makes it approximately impossible to
| generate reports that require joining across databases. Not
| without writing new and relatively complex application code
| to do what used to require a simple SQL query to accomplish
| anyway.
|
| Of course if you have the sort of business with scalability
| problems that require abandoning or restructuring your
| database on a regular basis, then placing that kind of data
| in a shared database is probably not such a great idea.
|
| It should also be said that common web APIs as a
| programming technique are much harder to use and implement
| reliably due to the data marshalling and extra error
| handling code required than just about any system of
| queries or stored procedures against a conventional
| database. The need to page is perverse, for example.
|
| That does not mean that sort of tight coupling is
| appropriate in many cases, but it is (typically) much
| easier to implement. Web APIs could use standard support
| for two phase commit and internally paged queries that
| preserve some semblance of consistency. The problem is that
| stateless architecture makes that sort of thing virtually
| impossible. Who knows which rows will disappear when you
| query for page two because the positions of all of your
| records have just shifted? Or which parts of a distributed
| transaction will still be there if anything goes wrong?
| winnebago wrote:
| Different business units often want aggregate, filter, join
| and transform data in different ways - sometimes in ways
| that are hard to anticipate.
|
| In general, the closer to the persistence layer you can
| perform those transformations, the better they will scale.
| If you pull the transform into the app layer, you need to
| move and serialize more data. If you pull the
| transformation into a constellation of apps, you need to
| move and serialize a constellation of data.
|
| (edit: formatting)
| arrowleaf wrote:
| My assumption (with modern applications!) is that nothing but
| the role directly owning the data will access the data. The
| development and DBA teams will likely have a role they can
| assume after performing a carefully-audited breakglass
| procedure to use in an emergency (rare) or to fulfill audit
| tasks. At least in my org this is a well-known problem with
| legacy applications sharing databases. Limit access to the
| database to a single role, used by a single application, and
| you absolve so many issues.
| jbmsf wrote:
| 100%.
| chasil wrote:
| When your data has outlived several generations of front-
| end technologies (mine started with Powerbuilder), you will
| find that ownership of the data does not control the
| evolution of how it will be used with what replaces past
| clients.
|
| One particularly interesting Oracle problem is:
| ORA-00060: deadlock detected while waiting for resource
|
| Tom Kyte's book, _Expert One-on-One Oracle_ , describes the
| primary culprit:
|
| "Oracle considers deadlocks to be so rare, so unusual, that
| it creates a trace file on the server each and every time
| one does occur... The number one cause of deadlocks in the
| Oracle database, in my experience, is un-indexed foreign
| keys."
|
| For another perspective, add to this a default setting in
| every SQLite database: $ sqlite3 verynew.db
| SQLite version 3.34.1 2021-01-20 14:10:07 Enter
| ".help" for usage hints. sqlite> .dump PRAGMA
| foreign_keys=OFF; BEGIN TRANSACTION; COMMIT;
|
| Foreign keys can cause interesting problems, and SQLite
| specifically prefers to avoid them.
| pixl97 wrote:
| >Limit access to the database to a single role, used by a
| single application, and you absolve so many issues.
|
| This kinda sounds like "Get rid of 90% of the usefulness of
| having a database"
|
| Of course, maybe you mean make the same data the DB has
| available via API, or make other users of the data read
| only.
| mavelikara wrote:
| +1. When I started in the industry, it was common for more
| experienced developers to drill the "data outlives the
| application that generated it" principle into you. Somewhere
| in the transition to NoSQL and back we lost this.
| Pet_Ant wrote:
| The _data_ yes, the _database_ no. The data will get
| migrated to different solutions at different times. I've
| lost track of how often I've been porting MySQL to SQL
| Server, SQL Server to Postgres, Postgres to Mongo, Mongo to
| cloud etc.
|
| EDIT: fixed capitalization of "PostGres"
| winnebago wrote:
| This really depends on the size of your data set. If you
| have a large database (TBs+) it's likely to be very long-
| lived due the effort/hardware resources a migration
| requires - especially if you want to improve the schema
| when migrating.
| blowski wrote:
| I've seen a database migration project outlive
| application rewrites, multiple changes in CTO, and the
| company being acquired.
| jhardy54 wrote:
| Friendly suggestion: the conventional capitalization is
| either Postgres or PostgreSQL.
| SkyPuncher wrote:
| Context is key.
|
| If you have an established business or startup, data will
| outlive the application. However, you need a product that
| lives long enough for either data or application to matter.
|
| In the startup world, that means making decisions that help
| you ship now, at the expense of debt/costs down the road.
| da_chicken wrote:
| It wasn't the transition to NoSQL. It was the transition to
| _web APIs_. Now the application that can access the
| database has a universal shim in place that deals with the
| messy business of taking requests and applying them to the
| data store.
|
| Now there's typically three applications that need DB
| access:
|
| 1. The application it was built for.
|
| 2. The read-only reporting and visualization tools.
|
| 3. The web API.
| Pet_Ant wrote:
| > probably not with the same ORM or even the same language
|
| I mean, they shouldn't? Like you've just identified a bug:
| another application can access your database. If another
| department needs your data, they should request an endpoint
| that you control. You should be using an "application
| database"[1] not an "integration database"[2].
|
| [1] https://martinfowler.com/bliki/ApplicationDatabase.html
| [2] https://martinfowler.com/bliki/IntegrationDatabase.html
| epicfeedback wrote:
| They are you. You is they. It's one company with one goal
| -- keep companying.
|
| The idea that ever "department" should access every other
| department's data through some bespoke interface that the
| latter department _maintains_ might work at some corporate
| behemoth, but at almost all other scales is absurd.
| ch4s3 wrote:
| This would be a total waste of effort when you need to be
| building a product and iterating. I hate articles like
| this because they do a poor job contextualizing the
| tradeoffs and when it might be appropriate to do the
| weird exceptional thing.
|
| IMHO if you have a performance critical case when foreign
| keys are in the way, load THAT data into an in memory DB
| on a recurring basis and server time sensitive requests
| from there.
| dasil003 wrote:
| Interesting that this whole thread makes no distinction
| between read and write access, as those are dramatically
| different use cases. Read access is by far the more
| necessary, and is usually solved relative easily by saving
| snapshots to a data warehouse. This is no panacea as data can
| still easily be misinterpreted or replicated and used out of
| context in violation of expected production data lifecycle by
| the team that owns the source-of-truth, but in many cases
| it's trivial and minimizes coordination overhead.
|
| Write access on the other hand is a different story. IMO
| restricting write access of a database to a single
| application is table stakes for scaling complex applications.
| Sure there are other approaches such as writing all your
| logic in the database via constraints and stored procedures
| and making the DBA a god-like figure, but these approaches
| have fallen out of favor as they've proven less scalable
| compared to wrapping a DB with a service that has exclusive
| write access. The latter arrangement allows many constraints
| to be enforced in a horizontally scalable and more legible
| layer, while still leveraging the DB to prevent races with a
| better menu of tradeoffs.
|
| Of course this requires thoughtful service and interface
| design by competent technical domain experts, which is easier
| said than done, but the alternative allows the overall system
| cohesion to degrade to where no one understands the system
| well enough to make any changes without risking major
| incidents. At that point, the agency of system builders and
| maintainers is replaced by care and feeding of the unknowable
| system to not upset the status quo, accompanied with
| increasingly byzantine hacks and workarounds to enable any
| business changes.
| JackFr wrote:
| Indeed - I immediately thought of hypothetical system
| (similar to many systems I've worked on) where SELECTS are
| 100x more common than INSERTS which are 100x more common
| than DELETES.
|
| The 5x speed up on delete performance is a useless
| optimization.
| pmontra wrote:
| The no foreign keys advice tastes of 2005/2010 when the
| lack of support in some frameworks (Rails included)
| prompted developers to dismiss them as unnecessary. This
| piece of advice at least quantifies the terms of the
| tradeoff.
|
| Regarding the correct "saving snapshots to a data
| warehouse", if there are one million web apps in the world,
| how many of them have the scale to noticeably benefit from
| either doing without foreign keys or from a data warehouse?
| I've been using many of them like everybody else but they
| are totally irrelevant to the long tail of apps and
| developers.
|
| By the way, a good DBA can make miracles for the
| performances of most databases in that long tail. A few
| days of work are worth the cost especially if the team pays
| attention and learn the lesson. The best remark I got about
| a DB of mine was a "not bad for being only a developer".
| The DBA version was better.
| ttfkam wrote:
| If I recall correctly, "no foreign keys" hit its stride
| back in early PHP days when MySQL didn't support them
| properly. Rather than cop to the fact that MySQL just
| implemented them badly, MySQL AB went on a dev PR run
| telling folks that foreign keys weren't actually useful
| and just slowed a system down.
|
| Once MySQL implemented them less horribly, the PR push
| finally started to die down. I will never forgive them
| for that, and decades later where Oracle controls MySQL
| (and arguably is doing a better job), I still hold a
| grudge against MySQL that I have to actively suppress
| when the contract demands it.
| buttercraft wrote:
| > The data will very likely outlive the application that
| created it.
|
| Yeah, if your front end is garbage, you can fix or rewrite it.
| If your data is garbage, you've lost. Depending on your
| requirements, of course.
| butlerm wrote:
| If you do not particularly care about performance or have a
| great deal of headroom then database enforcement of referential
| integrity is great. Alternatively you could just write test
| cases to check for it and not pay the severe performance
| penalty.
|
| The other major downside of database enforcement of referential
| integrity is the common need to drop and re-create foreign keys
| during database schema upgrades and data conversions.
| hermanradtke wrote:
| > you could just write test cases to check for it
|
| This effectively means you are building an embedded database
| in your application and using the networked database for
| storage. There are a few reasons to do this and a million
| reasons not to.
| dasil003 wrote:
| "just write test cases to check for [referential integrity]"
| is doing some awful heavy lifting in this comment.
|
| Assuming a standard n-tier application architecture, how do
| you guarantee the test prevents race conditions?
| jrumbut wrote:
| You either end up reinventing foreign keys, your support
| volume will scale faster than your data, or user experience
| will suffer.
|
| There may be situations where foreign keys become too much
| overhead, but it's worth fighting to keep them as long as
| possible. Data integrity only becomes more important at
| scale. Every orphaned record is a support ticket, lost
| sale, etc.
| butlerm wrote:
| Orphaned detail records are usually inconsequential, like
| uncollected garbage. References to anything with an
| optional relationship should use outer joins as a matter
| of course. If you delete something that really needs to
| be there you have a problem, which is one of the reasons
| not to delete rows like that, ever, but rather to mark
| them as inactive or deleted instead.
| butlerm wrote:
| Typically you look for orphan rows - the sort of thing ON
| DELETE CASCADE was invented to prevent. Another thing to
| check for are records that need to exist but should have
| references cleared when something else is deleted, e.g. ON
| DELETE SET NULL. And the third thing is ON DELETE RESTRICT.
|
| You can check for the first two of those things after the
| fact, and they are relatively benign. In many cases it will
| make no difference to application queries, especially with
| the judicious use of outer joins, which should be used for
| all optional relationships anyway.
|
| If you need ON DELETE RESTRICT application code should
| probably check anyway, because otherwise you have
| unexpected delete failures with no application level
| visibility as to what went wrong. That can be tested for,
| and pretty much has to be before code that deletes rows
| subject to delete restrictions is released into production.
|
| As far as race conditions go, they should be eliminated
| through the use of database transactions. Another
| alternative is never to delete rows that are referred to
| elsewhere and just set a deleted flag or something. That is
| mildly annoying to check for however. Clearing an active
| flag is simpler because you usually want rows like that to
| stay around anyway, just not be used in new transactions.
| setr wrote:
| You're still going to pay the cost of maintaining referential
| integrity -- you're just doing it on the app side. You can do
| it faster by being not-correct -- eg you don't need a lock if
| you ignore race conditions -- but it's not like the database
| is arbitrarily slow at doing one of its basic fundamental
| jobs.
|
| Of course, you can just skip the validation altogether and
| cross your fingers and hope you're correct, but it's the same
| reasoning as removing array bounds checking from your app
| code; you've eked out some more performance and it's great
| until it's catastrophically not so great.
|
| Your reasoning should really be inverted. Be correct first,
| and maintain excessive validation as you can, and rip it out
| where performance matters. With OLTP workloads, your data's
| correctness is generally much more valuable than the
| additional hardware you might have to throw at it.
|
| I'm also not sure why dropping/creating foreign keys is a big
| deal for migrations, other than time spent
| butlerm wrote:
| It is quite common for modern databases to have
| multiversion concurrency so that writers do not block
| readers. If you do not your transactions should either be
| awfully short, you should be prepared to wait, or you
| should implement dirty reads (which are quite common in any
| case).
| forinti wrote:
| That's the best advice: data will outlive the application.
|
| It will also serve other applications.
|
| So treat it as its own thing, not as an appendix of the
| application.
| postalrat wrote:
| Imo better advice would be your data may outlive the schema
| it's stored in or your application may outlive the data.
| charlie0 wrote:
| Having worked in a db with lots of missing FKs, it becomes very
| difficult to figure out relationships. This makes it difficult to
| find out what tables need to be updated when there's a lot of
| them. In this particular company, the only thing holding it
| together was institutional knowledge.
| davidw wrote:
| Start with them, and if you're lucky enough to scale to a point
| where they're hurting things, reconsider.
| arrowleaf wrote:
| And then, don't just drop the constraints because you read an
| article about how FK constraints don't work with a distributed
| relational model, take a careful look at the docs of whatever
| DB engine you're using. PostgreSQL 12 for example supports
| foreign keys with either side being a partitioned table.
| jamestimmins wrote:
| Foreign keys are often the best (and only) documentation of the
| data model, and the only one guaranteed not to decay, so this is
| an important consideration when rethinking them.
|
| That said, I quite like this post and the way the author is
| thinking critically about software decisions most of us take for
| granted. Even if you decide that foreign keys are always best,
| this type of thinking is a great way to improve your
| understanding.
| shayonj wrote:
| Thank you :)
| claytongulick wrote:
| I agree - and I also appreciate the author's objectivity and
| focus on making the correct technical decision for the problem.
| Dogma is the bane of good architecture.
|
| I completely agree with the idea that foreign keys aren't
| always recommended. For example, one of the reasons that
| MongoDB was so great at edge writes was because of the lack of
| referential integrity and the "document" as a row concept -
| denormalize the data into a document and store everything you
| need there.
|
| That works great for high volume edge writes that you want to
| operationalize later, asynchronously. Sure, the data can be
| messy and may have errors - but many use cases tolerate this
| well.
|
| The nice thing about modern postgres is that you get the best
| of both worlds, RDBMS stuff when you need it, and NoSQL stuff
| if you don't.
|
| The author's point, which I think is an excellent one, is that
| just because it's in a database, it doesn't necessarily make
| sense to go full 4th normal form on everything.
| seadan83 wrote:
| +1 for that documentation.
|
| I was recently working on a system that split data across
| multiple database instances (unnecessarily) and that means
| referential integrity is lacking and sometimes a huge problem.
|
| To expound, I was tracking down something call a "slotid" and
| the absence of foreign keys does not mean anything. That data
| could very well live in some other table on the other DB
| instance. It turned out though that "slot-id" instead referred
| to a start time of day as counted by 'minutes-from-midnight'.
| Thus "slot-id=375" was just "6:15am"... Yup.. 3 hours of my
| life I will not get back to realize that data did not refer to
| anything at all. If everything that could have had foreign keys
| did, then it would have been a super quick investigation to
| realize the data was not a reference to anything at all.
| belltaco wrote:
| Do you really need airbags and safety belts? They add a lot of
| weight, cost and occupy space.
| ww520 wrote:
| The performance gain by dropping foreign keys doesn't hold water.
| You still have to do the referential checks in application or in
| the ORM code. Unless you meant dropping referential checks.
| arrowleaf wrote:
| The only time I could see it really impacting performance is
| during bulk operations. But that's where making FK constraints
| deferred and wrapping the bulk actions in a transaction makes
| more sense.
| crazygringo wrote:
| Not really -- in practice, it generally means all inserts are
| done as a transaction, and deletes simply aren't done.
|
| Then there isn't really any opportunity for inconsistent data.
| magicalhippo wrote:
| Might be a performance penalty as well.
|
| We had some missing FK constraints, along with missing cleanup
| in code for a given table. Some customers had many millions of
| orphaned rows, while only thousands of live rows.
|
| Certainly didn't help index scans...
| arp242 wrote:
| Here' the common scenario: obj = new_object()
| obj.col1 = get_value_from_somewhere() obj.user_id =
| get_logged_in_user_id() obj.insert()
|
| Or: # Find correct ID. obj.some_id =
| run_query("select some_id from tbl where x=?", param)
|
| Lots of variation on that, but the user_id and some_id here are
| pretty much guaranteed to be accurate when implemented
| correctly. The biggest potential issue might be race conditions
| with deletes on the parent ID, but just having soft deletes
| sufficiently alleviates that (potential) small issue.
| pphysch wrote:
| I've been on both sides of this fence.
|
| I work on a legacy MySQL database that has 100+ moderatedly-
| related tables with _zero_ foreign key constraints.
|
| Someone wanted to change the name of a particular object, so I
| did a little investigation and found that approximately two dozen
| tables were storing this name-as-key, with at least half a dozen
| different column names. So essentially we have to scan every row
| in the entire database to robustly maintain referential
| integrity, and I vetoed the largely cosmetic change. Granted,
| this isn't 100% a FK constraint issue, but I imagine if the
| original DBA knew what FKs were, I wouldn't be in this mess.
|
| On the other hand, I have implemented prototypes that clearly had
| way too many FK constraints, which leads to a bloated schema of
| unnecessary tables with 1-2 columns each, and makes object
| lifecycles a spaghetti nightmare.
| amtamt wrote:
| This post seems like long version of "Directive 595 Part 2" from
| https://thedailywtf.com/articles/directive-595
|
| Anyone planning to test "YMMV", or full blown follow through, is
| advised to read that dailywtf.
| BenoitP wrote:
| My rule of thumb has been: enable them strictly in DEV and INT
| environments, disable in PROD. They can catch schema
| discrepancies, but can impede ingestion rates.
|
| Also some referential errors are sort of ok in PROD, as long as
| it's only about not dropping user data; which can be dealt with
| later on (INT gets reset with PROD user data from a backup each
| week, it also helps in the restore plan, fk are enabled, errors
| are caught, then data gets pruned heavily)
|
| If referential integrity is a business-level bug, then of course
| we should enable them.
| shayonj wrote:
| Very interesting! Thanks for sharing
| somehnguy wrote:
| To me that seems like an invitation for catastrophic bugs or
| problems to creep through and surprise you in prod, since the
| lower envs are running different a different configuration.
|
| I suppose it's entirely dependent on the type of data you're
| working with though.
| LtWorf wrote:
| > as long as it's only about not dropping user data
|
| Can you tell the name of the company? It's for my lawyer.
| HideousKojima wrote:
| >My rule of thumb has been: enable them strictly in DEV and INT
| environments, disable in PROD
|
| I mean I think the constraints should be on in all
| environments, but disabling them in prod but not dev seems
| utterly backwards? Protect your test data from getting
| corrupted but not your actual customer data?
| binarymax wrote:
| How often is data actually DELETEd from production databases?
| Unless following through with regulatory removal such as GDPR,
| it's far better to use an isdeleted flag IMO (so you can un-
| delete if the action was a mistake).
| pphysch wrote:
| That's true, but managing the "soft" referential integrity
| becomes a big authorization/security headache. How do you
| prevent/allow "deleted" objects from being accessed indirectly?
|
| I like Django's ORM approach which allows you to easily set
| baseline filters for your Model Managers, so you can implicitly
| exclude "deleted" objects from most queries.
| binarymax wrote:
| Good point. I'm one of those "never-ORM" people, so writing &
| maintaining the SQL is prone to error when the DB gets
| complex, and it does introduce mental overhead during dev.
| arp242 wrote:
| There's also an impact on inserts; it's not just deletes.
|
| Essentially a foreign key constraint is an on
| {insert,update,delete} trigger which checks whether the target
| check exists, so that's a select on the target table. I'm not
| sure if that's still the case, but I believe that for a long
| time foreign keys _were_ just implemented as triggers in
| PostgreSQL.
|
| For a lot of things that has a minimal performance impact and
| it's not a big deal. For some other things it can really add
| up.
| binarymax wrote:
| FKs are typically primary keys/clustered indices on the
| related table, so in those cases the integrity overhead would
| be very minimal.
|
| So in the vast majority of cases, the integrity is worth it.
| arp242 wrote:
| I assume you mean "performance overhead"? For insert-heavy
| workloads the difference very much is not "very minimal".
| In a quick test it's ~100ms vs. ~20ms with two foreign keys
| which check against a "countries" table (250 rows) - one
| column for nationality and one for residence.
|
| +-a lot because this is a very quick test I ran just twice,
| but it fits what I measured before when I very much ran in
| to this performance penalty a few years back.
|
| And look, 20ms vs. 100ms is very much fine for a lot of use
| cases. But it's also not for a lot of others. And it's
| certainly not very minimal, and with many inserts it does
| amplify a lot.
| karmakaze wrote:
| Large scale MySQL databases I've worked on typically do not use
| foreign keys. The payoff is higher write performance. They're
| implicit based on table/column naming and relationships defined
| in code, e.g. Rails associations or process/operation classes.
|
| With a certain level of team maturity and thoughtful reviews,
| this has rarely been an issue. Sometimes there are orphaned rows
| (from incomplete/buggy writes) which also have clean-ups that
| tend to have jobs for pruning data that has gone 'out of the
| retention window'.
| toasted-subs wrote:
| Something about a secret racist thing about the word "foreign".
| Personally I wouldn't mind working with more Spanish, Italian,
| and French people.
|
| Not complaining about the types of coworkers I see, but sometimes
| tapas are kind of nice.
| simonw wrote:
| If you work on a large scale system that doesn't include foreign
| key constraints, try running some queries that should never
| return results because that would expose data integrity problems.
|
| If your system is hooked up to a data warehouse you can run
| queries there, too.
|
| I bet you can find all sorts of weird edge-case records doing
| this.
| iamflimflam1 wrote:
| Whenever I read posts like this I know, despite all the caveats
| at the start of the article, a large number of people will take
| away "we shouldn't have any foreign keys because they impact
| performance".
|
| I suggest that anyone thinking that has a look at a database
| where there are no foreign keys.
| forinti wrote:
| A colleague once insisted in not using FKs because he read
| somewhere that that was the modern way of doing things.
|
| So the universe decided to punish him and a few weeks later some
| data got deleted that would have been protected by an FK.
|
| This idea is no longer discussed.
| bhaak wrote:
| I'm missing the execution plans of these DELETE statements.
|
| Without them it's not clear what's the problem.
|
| But in any case sacrificing referential integrity for speed is a
| bad idea. You could go NoSQL in that case anyway.
| jelled wrote:
| I see foreign keys the same way I see types. You may not need
| them, but they catch a lot of mistakes and give you more
| confidence in the integrity of your system.
| cdcarter wrote:
| The Salesforce CRM application somewhat "famously" does not use
| native DB foreign keys to model most relationships, and has a
| custom relational integrity and indexing layer.
|
| Mostly, this is in order to support our complex custom schema
| functionality, described in the Multitenant Whitepaper here
| (https://www.developerforce.com/media/ForcedotcomBookLibrary/...)
|
| But interestingly, this also lets our internal data modelers
| choose to make a relationship e.g. leave dangling records or have
| other update/delete patterns that are appropriate for the data
| and data volume. Obviously there are cleanups and tradeoffs
| required here. It's certainly not a model I'd suggest people
| start with, but there really are times where database native FKs
| aren't the answer.
| jamestimmins wrote:
| Do you think this has served y'all well in the long term?
|
| I assume "internal data modelers" is an actual role of person
| whose job is to ensure long run data integrity?
| _a_a_a_ wrote:
| A better post than I expected. The only thing I'd add is that
| foreign keys can actually improve read times because the
| optimiser knows it can safely skip certain joins e.g. if you have
| inner equi-joins between tables a, b and c a
| join b join c
|
| If there is an FK from a to b, and likewise from b to c, and you
| don't use anything in b, then the optimiser can rewrite this to
| a join c
|
| YMMV
| senderista wrote:
| Yes, I remember using foreign key relationships to optimize
| symmetric hash join back when I worked on an OLAP db. The idea
| was that for a 1:1 relationship you can immediately discard
| both sides of a joined tuple, while for a 1:many relationship
| you can immediately discard the "many" side of a joined tuple.
| _a_a_a_ wrote:
| thanks for the detail!
| Scubabear68 wrote:
| I do consulting for a lot of different companies. The older ones
| who did not religiously use PK/FK constraints have databases that
| are nightmares to maintain. The data will start to rot
| surprisingly quickly, devs will react by writing weird code to
| compensate, and your life will not be fun.
| betenoire wrote:
| I worked for a company where we would not use features that
| depended on foreign keys (cascading deletes, etc.), but we would
| define the foreign keys 100% of the time. We ran all of our
| staging environments with the keys, but when it came to prod our
| DBA would remove them, among other optimizations. I've not done
| this at other places that didn't have a dedicated DBA or two,
| though.
| ceeam wrote:
| Yes.
|
| And if you're concerned for microspeed of basic operations
| Postgres isn't your friend anyway.
| knorker wrote:
| I used MySQL before foreign key support.
|
| It was a nightmare. The article seems to basically be saying "but
| it's hard!". Well, I'd rather put down the extra effort so that I
| don't have corrupt data.
|
| You know what's _really_ hard? Fixing corrupt data.
|
| The data is the most important thing in a database.
| jbverschoor wrote:
| Well, do we really need protected memory and bounds checking?
| mastazi wrote:
| Despite the title, this article is about dropping foreign key
| constraints, not about dropping foreign keys altogether (which is
| possible, see below).
|
| You can drop foreign key constraints and enforce referential
| integrity in your code, but from a logical standpoint your
| foo.barId column is still a foreign key.
|
| Actually getting rid of foreign keys on the other hand can be
| done in a data store that allows nested structures, like e.g. a
| document based store. The price to pay is data duplication and
| denormalisation (and the related complexity you will have to deal
| with in all write operations), the pros are ease and speed of
| retrieval.
| prudentpomelo wrote:
| Referential integrity saves you so many headaches. I am dealing
| with a 500+ table database with no foreign keys at all. This has
| allowed so many problems to creep in. Writing queries is pure
| suffering.
| pif wrote:
| At last, a counterexample to Betteridge's law of headlines!
| mikl wrote:
| Do you really need referential integrity?
|
| Do you really need consistent data?
|
| Do you really need test coverage?
|
| I mean, you can make do without it. But it's generally better to
| have it than not.
___________________________________________________________________
(page generated 2023-12-21 23:01 UTC)