[HN Gopher] YAGRI: You are gonna read it
___________________________________________________________________
YAGRI: You are gonna read it
Author : escot
Score : 303 points
Date : 2025-04-23 21:47 UTC (1 day ago)
(HTM) web link (www.scottantipa.com)
(TXT) w3m dump (www.scottantipa.com)
| al_borland wrote:
| I agree with this as written, as I think it's important to have
| some degree of forethought when building out the DB to plan for
| future growth and needs.
|
| That said, the monkey paw of this would be someone reading it and
| deciding they should capture and save all possible user data,
| "just in case", which becomes a liability.
| arjonagelhout wrote:
| A little while back, I had a conversation with a colleague about
| sorting entries by "updated at" in the user interface, and to my
| surprise this was not added by the backend team.
|
| Many of these "we are going to need it"s come from experience.
| For example, in the context of data structures (DS), I have made
| many "mistakes" that I only get right the second time. These
| mistakes made writing algorithms for the DS harder, or gave the
| DS bad performance.
|
| Sadly, it's hard to transfer this underlying breadth of knowledge
| and intuition for making good tradeoffs. As such, a one-off tip
| like this is limited in its usefulness.
| SchemaLoad wrote:
| Somewhat related, but I suggest having both the record updated
| at, and some kind of "user editing updated at". As I've
| encountered issues where some data migration ends up touching
| records and bumping their updated-at, which shocks users: they
| see the UI reshuffle and think they've been hacked, because the
| records show updates at times they didn't make any.
| XorNot wrote:
| I mean this is what audit logs are for I'd say: generally
| speaking you want to know what was changed, by who and why.
|
| So really you probably just want a reference to the tip of
| the audit log chain.
| tcdent wrote:
| Database schemas being perfect out-of-the gate was replaced by
| reliable migrations.
|
| If it's not data that's essential to serving the current
| functionality, just add a column later. `updated_at` doesn't
| have to be accurate for your entire dataset; just set it to
| `NOW()` when you run the migration.
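The backfill step described above can be sketched concretely. A minimal illustration using SQLite as a stand-in (table and column names are invented for the example; on SQLite the add-column and backfill must be two statements, since added columns can't default to a non-constant expression):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO items (name) VALUES ('widget'), ('gadget')")

# The migration: add the column, then backfill existing rows with the
# migration's own run time rather than leaving them NULL.
conn.execute("ALTER TABLE items ADD COLUMN updated_at TEXT")
conn.execute("UPDATE items SET updated_at = datetime('now')")

rows = conn.execute("SELECT name, updated_at FROM items").fetchall()
print(rows)
```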
| SOLAR_FIELDS wrote:
| "Reliable migrations" almost seems like an oxymoron.
| Migrations are complicated, difficult and error prone. I
| think there's a good takeaway here around good initial schema
| design practices. The less you have to morph your schema over
| time, the fewer of those risky migrations need to run.
| tcdent wrote:
| My experience over the last decade has been different.
|
| Use a popular framework. Run it against your test database.
| Always keep backups in case something unforeseen happens.
|
| Something especially trivial like adding additional columns
| is a solved problem.
| __float wrote:
| My experience has not been so smooth. Migrations are
| reasonable, but they're not free and "always keeps
| backups" sounds like you'd tolerate downtime more than I
| would.
|
| Even in the best case (e.g. basic column addition), the
| migration itself can be "noisy neighbors" for other
| queries. It can cause pressure on downstream systems
| consuming CDC (and maybe some of those run queries too,
| and now your load is even higher).
| imcritic wrote:
| Still depends on what the data represent: you could get
| yourself in a storm of phone calls from customers if after
| your latest release there's now a weird note saying their
| saved document was last updated today.
|
| "HOW DARE YOU MODIFY MY DOCUMENTS WITHOUT MY..."
| phire wrote:
| Sure, migrations are bearable (especially ones that only add
| columns).
|
| But for the example of the "updated_at" column, or "soft
| delete" functionality, you only find out you need it because
| the operations team suddenly discovered they needed that
| functionality on existing production rows because something
| weird happened.
| smithkl42 wrote:
| In C#-land, we just have it as a standard that ~every table
| inherits from `ITrackable`, and we wrote a little EF plugin
| to automatically update the appropriate columns.
|
| public interface ITrackable {
|     DateTime CreatedOn { get; set; }
|     DateTime ModifiedOn { get; set; }
| }
|
| Saves so much time and hassle.
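A rough Python analog of the ITrackable idea (the mixin and class names here are hypothetical; a real project would hook this into the ORM's update events rather than `__setattr__`):

```python
from datetime import datetime, timezone

class Trackable:
    """Stamp created_on once, bump modified_on on every field change."""
    def __init__(self):
        now = datetime.now(timezone.utc)
        object.__setattr__(self, "created_on", now)
        object.__setattr__(self, "modified_on", now)

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        if name not in ("created_on", "modified_on"):
            object.__setattr__(self, "modified_on",
                               datetime.now(timezone.utc))

class Article(Trackable):
    def __init__(self, title):
        super().__init__()
        self.title = title   # triggers the modified_on bump

a = Article("draft")
a.title = "final"
assert a.modified_on >= a.created_on
```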
| reillyse wrote:
| One thing I do quite frequently which is related to this (and
| possibly is a pattern in rails) is to use times in place of
| Booleans.
|
| So is_deleted would contain a timestamp to represent the
| deleted_at time for example. This means you can store more
| information for a small marginal cost. It helps that rails will
| automatically let you use it as a Boolean and will interpret a
| timestamp as true.
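The timestamp-as-boolean trick translates directly to Python, where `None` is falsy and a datetime object is truthy (a sketch of the idea, not Rails itself):

```python
from datetime import datetime, timezone
from typing import Optional

# deleted_at doubles as the boolean flag *and* records when it happened.
deleted_at: Optional[datetime] = None
assert not deleted_at   # None is falsy: not deleted

deleted_at = datetime.now(timezone.utc)
assert deleted_at       # a timestamp is truthy: deleted, and we know when
```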
| acedTrex wrote:
| This one little change alone can bring such huge benefits
| later.
| noman-land wrote:
| This is all well and good until you need to represent something
| that happened on Jan 1 1970 00:00 UTC.
| jchw wrote:
| 32-bit UNIX timestamps are often signed so you can actually
| go before that, but most UNIX timestamps are 64-bit now,
| which can represent a much larger range. And SQL datetime
| types might have a totally different range.
|
| Not that it _really_ matters; deleted_at times for your
| database records will rarely predate the existence of said
| database.
| noman-land wrote:
| It's not about the scale, it's that `if (0)` will evaluate
| to `false` in many languages.
| xp84 wrote:
| So you're still fine as long as you're not tracking things
| that were deleted at that exact instant 50 years ago, which is
| a safe assumption for an application that has existed for far
| less time than that. That said, I haven't ever seen this
| implemented in a way that casts. It's usually implemented with
| scopes in the ORM:
| MyModel.nondeleted.where(<criteria>)
|
| etc.
|
| which generates a query with "WHERE deleted_at IS NULL"
|
| 1-1-1970 is fine.
| jchw wrote:
| In addition to the sibling comment, which is exactly
| right (you should be using a nullable column here, if
| you're using SQL, for multiple reasons) I reckon this is a
| design issue in the programming language that is largely
| unrelated to how you model the database. It's pretty easy
| to run into bugs especially if you compound it with other
| quirky APIs, like strcmp: `if (strcmp(a, b)) // forgot to
| do == 0; accidentally backwards!` -- So really, you just
| don't have much of a choice other than to tread carefully
| and enable compiler warnings. Personally in this case I'd
| use an Optional wrapper around the underlying timestamp
| type anyways, if I needed to be able to represent the
| UNIX timestamp at 0 as well as an empty state.
| BeFlatXIII wrote:
| Leave it null for non-deleted items.
| jfengel wrote:
| I consider booleans a code smell. It's not a bug, but it's a
| suggestion that I'm considering something wrong. I will
| probably want to replace it with something more meaningful in
| the future. It might be an enum, a subclass, a timestamp,
| refactoring, or millions of other things, but the Boolean was
| probably the wrong thing to do even if I don't know it yet.
| xp84 wrote:
| This seems at first like a controversial idea, but the more I
| think about it the more I like this thought technology.
| Merely the idea of asking myself if there's a better way to
| store a fact like that will potentially improve designs.
| crdrost wrote:
| The enum idea is often wise; also: for just an example that
| has probably occurred a hundred thousand times across the
| world in various businesses...
|
| Original design: store a row that needs to be reported to
| someone, with an is_reported column that is boolean.
|
| Problem: one day for whatever reason the ReporterService
| turns out to need to run two of these in parallel. Maybe
| it's that the reporting is the last step after ingestion in
| a single service and we need to ingest in parallel. Maybe
| it's that there are too many reports to different people
| and the reports themselves are parallelizable (grab 5
| clients, grab unreported rows that foreign key to them,
| report those rows... whoops sometimes two processes choose
| the same client!)... Maybe it's just that these are run in
| Kubernetes and if the report happens when you're rolling
| pods then the request gets retried by both the dying pod
| and the new pod.
|
| Alternative to boolean: unreported and reported records
| both live in the `foo` table and then a trigger puts a row
| for any new Foos into the `foo_unreported` table. This
| table can now store a lock timestamp, a locker UUID, and
| denormalize any columns you need (client_id) to select
| them. The reporter UPDATEs a bunch of rows reserving them,
| SELECTs whatever it has successfully reserved, reports
| them, then DELETEs them. It reserves rows where the lock
| timestamp IS NULL or is less than now minus 5 minutes, and
| the Reporter itself runs with a 5 minute timeout. The DB
| will do the barest amount of locking to make sure that two
| UPDATES don't conflict, there is no risk of deadlock, and
| the Boolean has turned into whether something exists in a
| set or not.
|
| A similar trick is used in the classic Python talk "Stop
| Writing Classes" by @jackdied where a version of The Game
| of Life is optimized by saying that instead of holding a
| big 2D array of true/false booleans on a finite gameboard,
| we'll hold an infinite gameboard with a set of (x,y) pairs
| of living cells which will internally be backed by a
| hashmap.
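The reservation pattern described above can be sketched roughly as follows, using SQLite as a stand-in with invented table and column names (a real system would run this against Postgres, inside proper transactions):

```python
import sqlite3, time, uuid

LOCK_TIMEOUT = 300  # seconds; the reporter itself runs with this timeout

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE foo_unreported (
    foo_id    INTEGER PRIMARY KEY,
    locked_at REAL,   -- NULL until some worker reserves the row
    locker    TEXT)""")
conn.executemany("INSERT INTO foo_unreported (foo_id) VALUES (?)",
                 [(1,), (2,), (3,)])

def reserve(conn, batch=2):
    """Atomically claim rows that are unclaimed or whose lock expired."""
    me, now = str(uuid.uuid4()), time.time()
    conn.execute(
        """UPDATE foo_unreported SET locked_at = ?, locker = ?
           WHERE foo_id IN (
               SELECT foo_id FROM foo_unreported
               WHERE locked_at IS NULL OR locked_at < ?
               LIMIT ?)""",
        (now, me, now - LOCK_TIMEOUT, batch))
    return conn.execute(
        "SELECT foo_id FROM foo_unreported WHERE locker = ?",
        (me,)).fetchall()

claimed = reserve(conn)
# ... report the claimed rows, then DELETE them to mark them done:
conn.executemany("DELETE FROM foo_unreported WHERE foo_id = ?", claimed)
```

The boolean has become membership in `foo_unreported`: a row exists until it has been reported, and a stale lock simply becomes reclaimable.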
| lud_lite wrote:
| Booleans also force the true/false framing.
|
| E.g. a field called userCannotLoginWithoutOTP.
|
| Then in code "if not userCannotLoginWithoutOTP or otpPresent
| then..."
|
| This may seem easy until you have a few flags to combine and
| check.
|
| An enum called LoginRequirements with values Password,
| PasswordAndOTP is one less negation and easier to read.
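A minimal Python sketch of the enum-over-negated-boolean idea (names are illustrative):

```python
from enum import Enum

class LoginRequirement(Enum):
    PASSWORD = "password"
    PASSWORD_AND_OTP = "password_and_otp"

def may_login(req: LoginRequirement, otp_present: bool) -> bool:
    # No double negatives: state the requirement positively and check it.
    if req is LoginRequirement.PASSWORD_AND_OTP:
        return otp_present
    return True

assert may_login(LoginRequirement.PASSWORD, otp_present=False)
assert not may_login(LoginRequirement.PASSWORD_AND_OTP, otp_present=False)
```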
| NitpickLawyer wrote:
| For me enums win especially when you consider that you can
| get help from your environment every time you add/remove
| stuff. Some languages force you to deal with the changes
| (i.e. rust) or you could add linter rules for other
| languages. But you're more likely to catch a problem before
| it arises, rather than deal with ever increasing bool
| checks. Makes reasoning about states a lot easier.
| retropragma wrote:
| Tangential: I was recently wishing that bitwise flags had
| better support in Postgres. For now, bools are just easier to
| work with
| crabmusket wrote:
| The way I think about it: a boolean is usually an _answer to
| a question_ about the state, not the _state itself_.
|
| A light switch doesn't have an atomic state, it has a range
| of motion. The answer to the question "is the switch on?" is
| a boolean answer to a question whose input state is a range
| (e.g. is distance between contacts <= epsilon).
| hadriendavid wrote:
| << Anytime you store a Boolean, a kitten dies >> Nobody has ever
| said that but nobody wants any kitten to die so nobody has ever
| challenged me anytime I use that statement.
| seveibar wrote:
| Agree, although the acronym in the article could be interpreted
| to mean "you are going to read it, so index it appropriately",
| which is sort of bad advice and can lead to overindexing. There
| is probably something better for "add appropriate and
| conventional metadata" (the author suggests updated_at,
| created_at etc)
|
| Not a huge fan of the example of soft delete, i think hard
| deletes with archive tables (no foreign key enforcement) is a
| much much better pattern. Takes away from the main point of the
| article a bit, but glad the author hinted at deleted_at only
| being used for soft deletes.
| mkhalil wrote:
| Event-sourcing solves this. And with how cheap storage is, it
| should be more prevalent in the industry. IMO the biggest thing
| holding it back is that there isn't a framework that's plug-and-
| play (say like Next.js is to React) that provides people with
| that ability.
|
| I've been working on one in Typescript (with eventual re-writes
| in other langs. like Rust and Go), but it's difficult even coming
| up with conventions.
| rising-sky wrote:
| Care to share what you've been working on?
| code_biologist wrote:
| Event sourcing is an expensive solution and I don't mean from a
| storage perspective -- it burns engineering cognitive
| horsepower quickly on things that don't matter. Do it if you're
| in finance or whatever. Having been burned by my own "let's
| event source" impulse on data change tracking systems, I now
| prefer less sophisticated solutions. Figuring out how to deal
| with slow projections, watching a projection rebuild go from
| minutes to hours to a few days as a system I expected to handle
| a few events/minute grew to 20 events/second. Fancy caches can't
| save you if you want to use that vaunted ability to reconstruct
| from scratch. Event schema evolution also presents difficult
| tradeoffs: when old events stop having meaning or evolve in
| meaning you either end up adding on new event subtypes and
| variants leaving old cruft to accumulate, or you do migrations
| and edit history on really large tables.
|
| I'd counsel anyone considering event sourcing to use more "low
| power" solutions like audit logs or soft deletes (if really
| necessary) first if possible.
| 000ooo000 wrote:
| Appreciate your perspective, and it makes me wish there was
| some kind of online 'engineers learning from their mistakes'
| forum (rare to see "I burned myself"). To hear hard won
| knowledge distilled like this is a nice reminder to spend
| ones complexity budget wisely.
| koakuma-chan wrote:
| Why not keep the projection always updated?
| evanmoran wrote:
| Just curious, how do people feel about this general style of soft
| deletes currently? Do people still use these in production or
| prefer to just delete fully or alternatively move deleted rows to
| a separate tables / schema?
|
| I find the complexity to still feel awkward enough that makes me
| wonder if deleted_at is worth it. Maybe there are better patterns
| out there to make this cleaner, like triggers to prevent
| deletion, or something else?
|
| As for the article, I couldn't agree more on having timestamps /
| user ids on all actions. I'd even suggest updated_by to add to
| the list.
| swagasaurus-rex wrote:
| I think soft deletes using timestamptz are a good thing.
|
| Deleting rows directly could mean you're breaking references.
| For example, say you have a product that the seller wants to
| delete. Well, what happens if customers have purchased that
| product? You still want it in the database, and you still want
| to fulfill the orders placed.
|
| Your backend can selectively query for products, filter out
| deleted_at for any customer facing queries, but show all
| products when looking at purchase history.
|
| There are times when deleting rows makes sense, but that's
| usually because you have a write-heavy table that needs
| clearing. Yes, soft-deletes requires being careful with WHERE
| statements filtering out deleted rows, but that's a feature not
| a bug.
| pmontra wrote:
| > what happens if customers have purchased that product? You
| still want it in the database, and you still want to fulfill
| the orders placed.
|
| You might still want to show to those customers their
| purchase history including what they bought 25 years ago. For
| example, my ISP doesn't have anymore that 10 Mb/s fiber optic
| product I bought in 2000, because it was superseded by 100
| Mb/s products and then by 1 Gb/s ones. It's also not my ISP
| anymore but I use it for the SIM in my phone. That also
| accumulated a number of product changes along the years.
|
| And think about the inventory of eshops with a zillion
| products and the archive of the past orders. Maybe they keep
| the last few years, maybe everything until the db gets too
| large.
| imcritic wrote:
| Always soft-deletion first. Then it gets exported to a
| separate archive, and only then, after some time, may it be
| fully deleted from the original database.
| zie wrote:
| If you have a good audit log, it really doesn't matter. You can
| always restore it if need be.
|
| If you have no audit log(or a bad one), like lots of apps, then
| you have to care a lot.
|
| Personally, I just implement a good audit log and then I just
| delete with impunity. Worst case scenario, someone(maybe even
| me) made a mistake and I have to run undo_log_audit() with the
| id of the audit log entry I want to put back. Nearly zero
| hassle.
|
| The upside, when something goes wrong, I can tell you who, what
| and when. I usually have to infer the why, or go ask a human,
| but it's not usually even difficult to do that.
| dml2135 wrote:
| Can you share more about what makes a good audit log? My
| company doesn't currently have one and I'm a little lost on
| where to start.
|
| Should this be at the application code level, or the ORM, or
| the database itself?
| jimbokun wrote:
| Probably application level in most cases as those other
| levels probably don't have all the information you want to
| include.
| zie wrote:
| That depends on where the data you need to keep track of is
| and your architecture. The important thing is, you want
| your audit log to be able to tell you:
| * Who
| * What
| * When
| * Ideally, why
|
| For any change in the system. Also when storing the audit
| log, take into account that you might need to undo things
| that happened(not just deletes). For instance maybe some
| process went haywire and inserted 100k records it wasn't
| supposed to. A good audit log, you should be able to run
| something like undo_log_audit(rec1, rec100k) and it will do
| the right thing. I'm not saying that code needs to exist
| day 1, but you should take into account the ability to do
| that when designing it.
|
| Also you need to take into account your regulatory
| environment. Sometimes it's very very important that your
| audit logs are write once, and read only afterwards and are
| stored off machine, etc. Other times it's just for internal
| use and you can be a little more lax about data integrity
| of your audit logs.
|
| Our app is heavily database centric. We push into the DB
| the current unix user, the current PID of the process
| connecting to the DB, etc(also every user has their own
| login to the DB so it handles our authentication too). This
| means our database(Postgres) does all of the audit logging
| for us. There are plenty of Postgres audit logging
| extensions. We run 2 of them. One that is trigger based
| creating entries in a log_audit table(which the
| undo_log_audit() code uses along with most reporting use
| cases) and a second one that writes out to syslog(so we can
| move logs off machine and keep them read only). We are in a
| regulated industry that gets audited regularly however. Not
| everyone needs the same level of audit logging.
|
| You need to figure out how you can answer the above
| questions given your architecture. Normally the "Why"
| question is hard to answer without talking with a human,
| but unless you have the who, what and when, it's nearly
| impossible to even get to the Why part of the question.
| lexh wrote:
| It is Postgres specific, but I've gotten a lot of mileage
| out of the advice in this article:
|
| https://supabase.com/blog/postgres-audit
| mrkeen wrote:
| I once worked in a small VB6-based team - you can probably
| guess the attitudes and surrounding tech were just as out
| of date.
|
| I tried to push for using svn, rather than just making
| copies of our source code folders and adding dates to them.
|
| My manager allowed me to use svn, but to make sure I also
| did things the proper way by making copies of the source
| code folders.
|
| That's the current level of discourse around audit logs.
| Write down what happened using your data tables ... but
| write down what _really_ happened in the audit logs.
|
| At some point you should just lean into putting audit logs
| first (just like developers reach for git first).
| refset wrote:
| > Maybe there are better patterns out there to make this
| cleaner
|
| SQL:2011 temporal tables are worth a look.
| j_w wrote:
| Financial world: records have a "close" or "expire" date which
| is then purged after some period of time. A deletion doesn't
| just happen, the record is updated to be "closed" or "expired"
| and some time after that it's deleted.
|
| Something like a loan could live in a production environment
| for well over a year after closing, while an internal note may
| last just a month.
| PeterStuer wrote:
| There can be legal requirements to retain data for a specified
| time for law enforcement and audits, while at the same time
| other legal requirements require you to delete data upon
| customer request.
|
| Doing this with pure 'hard' deletes is not possible, unless you
| maintain 2 different tables, one of which would still have the
| soft delete explicit or implicit. You could argue the full db
| log would contain the data for the former requirement, but
| while academically correct this does not fly in practice.
| metanonsense wrote:
| In our product, we have different strategies depending on the
| requirements. Sometimes, we just delete. Sometimes, we do soft
| delete with timestamps. Sometimes, we have a history table with
| or without versioned entities. Sometimes, we have versions in
| the table. Sometimes, we have an audit log. Sometimes, we use
| event sourcing (although everyone in the team hates it ;-)
| ffitch wrote:
| curious that both YAGNI and YAGRI arguments could realistically
| be made for the same fields. guess it boils down to whether
| someone's YAGRI is stronger than their colleague's YAGNI ( :
| nailer wrote:
| I have a different way of thinking about this: data loss. If you
| are throwing away data about who performed a delete it is a data
| loss situation. You should think about whether that's OK. It
| probably isn't.
| zie wrote:
| Literally just have a good audit log and then you get all of this
| for free and more.
| shaunofthedev wrote:
| The perils of UI design wagging the horse.
|
| I like the heuristics described here. However if these things
| aren't making it into a product spec where appropriate, then I
| smell some dysfunction that goes beyond what's being stored by
| default.
|
| Product need (expressed as spec, design, etc) should highlight
| the failure cases where we would expect fields like these to be
| surfaced.
|
| I'd hope that any given business shouldn't need someone with
| production database access on hand to inform as to why/when/how
| 'thing' was deleted. Really we'd want the user (be it 'boss' or
| someone else) to be able to access that information in a
| controlled manner.
|
| "What information do we need when something goes wrong?". Ask it.
| Drill it. Ask it again.
|
| That said, if you can't get those things, this seems a fine way
| to be pragmatic.
| Hasnep wrote:
| "Wagging the horse" is a great turn of phrase, better than
| "putting the cart before the dog."
| sgarland wrote:
| > But I've never heard someone complain about a table having too
| many timestamps.
|
| I do. Each one is 8 bytes. At the billions of rows scale, that
| adds up. Disk is cheap, but not free; more importantly, memory is
| not cheap at all.
| hoipaloi wrote:
| Counter point: "Soft Deletion Probably Isn't Worth It"
| https://brandur.org/soft-deletion
| jimbokun wrote:
| But auditing probably is.
| drewcoo wrote:
| To every point there is a counterpoint. So what?
| yoshicoder wrote:
| Well, in the same vein that we discuss "points" and talk about
| their merits, it's useful to discuss and understand their
| counterpoints. I for one did not know about this and thought it was
| insightful when building a product that hasn't fully been
| scoped out and is more greenfield
| im_anon_on_hn wrote:
| As an acronym, it's easy to be misremembered as "You ARENT gonna
| read it" (based on the popularity of yagni) - and have the
| opposite advice spread..
| 000ooo000 wrote:
| How about: IIPTWNIRTIoSoDIQMBSNRTL
|
| (It Is Probable That While Not Immediately Required The
| Implementation of Storage of Data In Question May Be Simpler
| Now Rather Than Later)
|
| I've gone ahead and included additional detail in the acronym
| in the event that the clarity is required later, as this would
| be difficult to retrofit into a shorter, more-established
| acronym.
| willismichael wrote:
| Why do I feel like that should be a cheat code in a FPS game?
| __float wrote:
| YRIS: you'll read it someday
| oofManBang wrote:
| How do you distinguish from "you aren't gonna read it"? The
| acronym is poorly designed.
| gorgoiler wrote:
| Additionally, mutable fields will quite often benefit from having
| a separate _edit_ table which records the old value, the new
| value, who changed it, and when. Your main table's _created_ and
| _updated_ times can be a function of (or a complement to) the
| _edit_ table.
|
| It is tempting to supernormalize everything into the relations
| _object(id, type)_ and _edit(time, actor_id, object_id, key,
| value)_. This is getting dangerously and excitingly close to a
| graph database implemented in a relational database! Implement
| one at your peril -- what you gain in schemaless freedom you also
| lose in terms of having the underlying database engine no longer
| enforcing consistency on your behalf.
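A toy version of the edit-table idea, with the main table's updated-at derived from it (SQLite, invented names; a real implementation would likely use triggers or ORM hooks rather than a hand-rolled helper):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE doc (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE edit (
    at        TEXT DEFAULT (datetime('now')),
    actor_id  INTEGER,
    object_id INTEGER,
    key       TEXT,
    old_value TEXT,
    new_value TEXT);
""")

def set_title(conn, actor_id, doc_id, new_title):
    # Record old value, new value, who, and when; then apply the change.
    old = conn.execute("SELECT title FROM doc WHERE id = ?",
                       (doc_id,)).fetchone()
    conn.execute("INSERT INTO edit (actor_id, object_id, key, old_value, "
                 "new_value) VALUES (?, ?, 'title', ?, ?)",
                 (actor_id, doc_id, old[0] if old else None, new_title))
    conn.execute("INSERT INTO doc (id, title) VALUES (?, ?) "
                 "ON CONFLICT(id) DO UPDATE SET title = excluded.title",
                 (doc_id, new_title))

set_title(conn, actor_id=7, doc_id=1, new_title="v1")
set_title(conn, actor_id=8, doc_id=1, new_title="v2")

# updated_at becomes a *function* of the edit table:
updated_at = conn.execute(
    "SELECT MAX(at) FROM edit WHERE object_id = 1").fetchone()[0]
history = conn.execute(
    "SELECT actor_id, old_value, new_value FROM edit").fetchall()
print(history)   # [(7, None, 'v1'), (8, 'v1', 'v2')]
```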
| crabmusket wrote:
| > This is getting dangerously and excitingly close to a graph
| database implemented in a relational database!
|
| This feels like a great unresolved tension in database /
| backend design - or maybe I'm just not sophisticated enough to
| notice the solutions?
|
| Is the solution event sourcing and using the relational
| database as a "read model" only? Is that where the truly
| sophisticated application developers are at? Is it really
| overkill for everybody not working in finance? Or is there just
| not a framework that's made it super easy yet?
|
| Users demand flexible schemas - should we tell them no?
| wavemode wrote:
| > supernormalize everything into the relations object(id, type)
| and edit(time, actor_id, object_id, key, value)
|
| I frankly hate this sort of thing whenever I see it. Software
| engineers have a tendency to optimize for the wrong things.
|
| Generic relations reduce the number of tables in the database.
| But who cares about the number of tables in the database? Are
| we paying per table? Optimize for the data model actually being
| understandable and consistently enforced (+ bonus points for
| ease of querying).
| motorest wrote:
| > Additionally, mutable fields will quite often benefit from
| having a separate edit table which records the old value, the
| new value, who changed it, and when.
|
| Aren't you describing a non-functional approach to event
| sourcing? I mean, if the whole point of your system is to track
| events that caused changes, why isn't your system built around
| handling events that cause changes?
| smitty1e wrote:
| YAGRI proponents organized themselves into a community to develop
| their...YAGRIculture.
|
| I'll show myself out.
| alganet wrote:
| *_at and *_by fields in SQL are just denormalization + pruning
| patterns consolidated, right?
|
| Do the long walk:
|
| Make the schema fully auditable (one record per edit) and the
| tables normalized (it will feel weird). Then suffer with it,
| discover that normalization leads to performance decrease.
|
| Then discover that pruned auditing records is a good middle
| ground. Just the last edit and by whom is often enough (ominous
| foreshadowing).
|
| Fail miserably by discovering that a single missing auditing
| record can cost a lot.
|
| Blame database engines for making you choose. Adopt an
| experimental database with full auditing history. Maybe do
| incremental backups. Maybe both, since you have grown paranoid by
| now.
|
| Discover that it is not enough again. Find that no silver bullet
| exists for auditing.
|
| Now you can make a conscious choice about it. Then you won't need
| acronyms to remember stuff!
| klysm wrote:
| My current state is have the database be the current state and
| use logical replication (CDC) to keep the log of changes in
| case you need it
| datadrivenangel wrote:
| It is interesting thinking about record changes as a spectrum
| towards application logs. At some point too much detail is
| expensive to store, and you must adopt an archival strategy.
| klysm wrote:
| Really depends on the app. If you have a low throughput
| line-of-business kind of application you can probably get
| away with storing everything.
| alganet wrote:
| If you see it from the pure SQL point of view, you are in the
| "blame database engines and adopt an experimental solution" stage.
|
| It is the point where you give up modeling the audit as part
| of the systems tables.
|
| The drawbacks of this choice are often related to retrieval.
| It depends on the engine.
|
| I once maintained a system that kept a fully working log
| replicated instance delayed by 24h, ready for retrieval
| queries, in addition to regular disk backups (slow costy
| retrieval).
|
| I am more developer than DBA, so I can probably speak more
| about modeling solutions than infra-centric solutions.
| klysm wrote:
| Yeah 100% giving up on pure SQL to solve the problem,
| mainly from the perspective that doing full versioning etc.
| in SQL is really damn hard.
| globular-toast wrote:
| The problem with this is the audit log is only at the CRUD
| level which is often too low. Ambiguities can arise. For
| example if the question is "who published the article" do you
| look for a create or do you look for an update with
| published=true? It's even worse when you consider the schema
| can change over time, so both can be correct but at different
| points in time. Event sourcing is the way if you want to
| capture business-level events.
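The distinction shows up in a toy event-sourced query: the business intent lives in the event name, so "who published it?" needs no schema archaeology (the events and helper here are hypothetical):

```python
# Hypothetical business-level events, replayed to answer the question.
events = [
    {"type": "ArticleCreated",   "actor": "ann", "id": 1},
    {"type": "ArticlePublished", "actor": "bob", "id": 1},
]

def who_published(events, article_id):
    # The intent is explicit in the event name; no guessing whether a
    # row UPDATE with published=true "meant" publishing.
    for e in events:
        if e["type"] == "ArticlePublished" and e["id"] == article_id:
            return e["actor"]
    return None

assert who_published(events, 1) == "bob"
```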
| lud_lite wrote:
| Another option is audit info could go to another table or
| datastore entirely.
|
| If you never use it, that data can be dumped to s3 glacier
| periodically (e.g. after 90 days).
|
| By losing the foreign key you gain flexibility in what you
| audit. Maybe audit the operation and not the 20 writes it
| causes.
| yes_man wrote:
| Fair enough, but now your application is relying on 100%
| uptime of AWS and S3 and no network failures in between. And
| what happens if your transaction goes through, but the
| request to AWS doesn't? What happens if another operation
| mutates the target meanwhile before you can retry with
| current state? Your app is also slowing down since it needs
| to send the events to S3 and guarantee they got there. Now
| you are reinventing two-stage commits. Unless you aren't
| actually making an audit log and don't care if events are
| guaranteed to be logged?
|
| So like OP said, no silver bullets exist for auditing.
| lud_lite wrote:
| Correct. This is a system design problem. You want this to
| be transactional and work at scale? That might be hard to
| achieve. Maybe if the data can be partitioned, then each node
| handles its own auditing in a table as part of the
| transaction. There are many possibilities. Allowing
| inconsistency might be OK too depending on what is
| required.
| awesome_dude wrote:
| But wait, there's Event Driven Architectures and Event
| Sourcing, meaning that the events are your log of edits!
| alganet wrote:
| Doesn't that also fall under the "blame the database engines
| and go for an experimental solution"?
|
| I'm not saying databases are blameless. It's just that
| experiencing the issues they have by yourself is rewarding!
|
| There is also a walk before the long walk of databases. Store
| things in text files and use basic tools (cat, sed, sh...).
|
| The event driven stuff (like Kafka) reminds me of that. I am
| not very familiar with it though, just played a little bit
| with it once or twice.
| awesome_dude wrote:
| Kind of. The WAL in Postgres is effectively an event log,
| and many people keep replicas of it for backup reasons,
| which is auditable. So EDA/event sourcing is just a
| shinier version of that?
| guy4261 wrote:
| The author is being kind! In practice, I have often seen
| only the CR/CRU of CRUD get implemented.
|
| For example: as a company aspires to launch its product, one of
| the first features implemented in any system is to add a new
| user. But when the day comes when a customer leaves, suddenly you
| discover no one implemented off-boarding and cleanup of any sort.
| crazygringo wrote:
| These are not decisions that should be taken solely by whoever is
| programming the backend.
|
| They need to be surfaced to the product owner to decide. There
| may very well be reasons pieces of data should _not_ be stored.
| And all of this adds complexity, more things to go wrong.
|
| If the product owner wants to start tracking every change and by
| who, that can completely change your database requirements.
|
| So have that conversation properly. Then decide it's either not
| worth it and don't add any of these "extra" fields you "might"
| need, or decide it is and fully spec it out and how much
| additional time and effort it will be to do it as a proper
| feature. But don't do it as some half-built just-in-case "favor"
| to a future programmer who may very well have to rip it out.
|
| On a personal project, do whatever you want. But on something
| professional, this stuff needs to be specced out and accounted
| for. This isn't a programming decision, it's a product decision.
| hamandcheese wrote:
| Some things are trivial and nearly free - created_at,
| updated_at. I don't think engineers need to bring trivialities
| like this to a "product owner". Own your craft.
| kace91 wrote:
| A soft delete might not be, for compliance reasons (GDPR and
| the like). Otherwise I agree.
| danielPort9 wrote:
| I think the tricky part lies on knowing which things can be
| done without consulting any product owner. I agree that
| created_at and updated_at don't cause any harm. deleted_at on
| the other hand cannot be decided by engineers only (mainly
| because of GDPR reasons: if something is expected to be
| totally deleted, then that must be it). As usual, these
| kinds of things are obvious to engineers with years of
| experience, not so much to newcomers.
| IanCal wrote:
| Although those can be more complicated, and it should be
| clear what they're for and _why_ they exist. Will this result
| in an object having an updated_at timestamp elsewhere in a
| larger application? Is it clear which properties that refers
| to?
| hombre_fatal wrote:
| I never worked at a place with product owners, but their post
| made me appreciate my roles where I'm trusted to help design
| the product myself. Yeesh.
|
| Being unable to even call the shot of whether a database
| table should have an updated_at or soft-delete sounds like a
| Dilbertian hellscape to me.
| bayindirh wrote:
| When the product you're developing is governed by regulations
| and standards you need to comply, owning your craft is doing
| things by the book, not adding fields on your own because it
| might be useful later.
| dogleash wrote:
| So what? I've worked places with lots of regulation. Part
| of every development job is learning the product domain. In
| that case devs become comfortable with reading
| standard/law/regulations and anticipating when software
| implementation might interact with the areas covered.
|
| Sure, there were people whose job was to offload as much
| compliance work as possible from everyone else: by turning
| it into internal requirements, participating in design
| discussions, and specializing in ensuring compliance. But
| trying to
| isolate the development team from it is just asking for
| micromanagers.
| bayindirh wrote:
| > So what?
|
| Think before you act. The machine has no brain. Use
| yours.
|
| > Part of every development job is learning the product
| domain.
|
| Yes.
|
| > In that case devs become comfortable with reading
| standard/law/regulations and anticipating when software
| implementation might interact with the areas covered.
|
| This is what I'm saying, too. A developer needs to think
| about whether what they are doing is OK under the
| regulations they're flying against. They need to ask for
| permission by asking themselves, "wait, is this OK under
| the regulation I'm trying to comply with?".
|
| > But trying to isolate the development team from it is
| just asking for micromanagers.
|
| Nope, I'm all for taking initiatives, and against
| micromanagement. However, I'm also against "I need no
| permission because I'm doing something amazing" attitude.
| So own your craft, "code responsibly".
| dogleash wrote:
| Oh, I thought you were disagreeing with hamandcheese's
| point that every little decision doesn't need to go
| through a product owner before anything happens.
| bayindirh wrote:
| No, not at all. by "the book", I meant regulations, not
| the management. :)
| Chris2048 wrote:
| > And all of this adds complexity, more things to go wrong
|
| That's a little vague given this specific example, which
| appears to be about maintaining some form of informative
| logging; though I don't think it necessarily needs to be in the
| form of a DB table.
| chii wrote:
| > don't do it as some half-built just-in-case "favor" to a
| future programmer who may very well have to rip it out.
|
| in other words - YAGNI !
| damnitbuilds wrote:
| Yes. Adding fields to a DB is not a risk-free thing a
| programmer should just do.
| Cthulhu_ wrote:
| A product owner may (probably does not) have these things on
| their radar, it's up to a developer to inform them of industry
| best practices like these.
| Extasia785 wrote:
| This entirely depends on the company culture. I worked in teams
| where every small decision is in the hand of the PO and I've
| worked in teams where a software engineer is a respected
| professional enabled to make their own technical decisions. I
| found the second option to create higher quality software
| faster.
|
| Also not sure what you mean by additional effort? Created_at,
| updated_at or soft-deletes are part of most proper frameworks.
| In Spring all you need is an annotation, I've been using those
| in major projects and implementation cost is around a few
| seconds with so far zero seconds of maintenance effort in years
| of development. At least those fields are solved problems.
| exe34 wrote:
| even better, assume soft deletes and inform them this is how
| delete works - if there's a hard requirement for hard delete,
| they will tell you.
|
| the hypothetical future programmer is you in two weeks.
| crazygringo wrote:
| But what if it's not a technical decision? What if there are
| legal implications around data retention that it's not your
| job to be aware of?
|
| I've been parts of teams where features had to be totally
| thrown out and rebuilt because developers made big
| assumptions that turned out to be wrong, because they didn't
| think it was worth it to check with the product owner.
| Because they _assumed_ it was only a "technical decision",
| or they _assumed_ they understood the customer needs despite
| never actually asking the customer.
|
| This doesn't mean checking with product around each line of
| your code, obviously. But deciding what information gets
| stored in the database, what level of event tracking you do,
| whether deletes are hard or soft -- these have massive
| _product_ implications, and potentially legal ones.
|
| And it _is_ additional effort. Now you have to write tests
| for all those things. Are the timestamps being stored
| correctly? Are the permission bits being stored correctly? Is
| "created_by" coming from the right user? Are we sure a
| malicious user can't spoof that? Do we care? Is "updated_at"
| _actually_ being updated on _every_ row change? But are we
| making sure "updated_at" is _not_ getting changed when we
| import data from a separate table? How often do we remove
| soft-deleted data in order to comply with privacy policies
| and regulations, and with what cron job, and who maintains
| that? Where do alerts go if the cron job fails? What happens
| if that employee leaves? I could go on and on and on.
|
| So that's what I mean by additional effort. It's not "around
| a few seconds". Because it's not just a technical question,
| it's a product one. It's a proper feature that needs to be
| properly defined and properly scoped out and properly tested.
| jeremyjh wrote:
| No. Just no. Put created_at, updated_at on every table. You
| are really grasping to find a problem with it, because
| there isn't one, and its been helpful in literally every
| job I've had for the last 28 years. Product owners don't do
| application support.
| PaulHoule wrote:
| I've seen product owners who _get blindsided every time_ by
| this sort of thing.
|
| On the other hand, in some shops there is a dedicated DBA who
| is in charge of database schemas and possibly everything else.
| Before it became fashionable to create a "service layer" where
| people access the database (now database(s)) through web
| endpoints, some organizations would put all the database access
| into a set of stored procedures managed by the DBA. Maybe
| that's extreme, but in the real world product owners come and
| go but the database is forever and deserves to have somebody
| speaking out for its interests.
| makeitdouble wrote:
| > And all of this adds complexity, more things to go wrong.
|
| This argument is one of the reasons why a backend engineer could
| just add the created_at and updated_at fields without asking a
| product owner.
|
| It doesn't make much sense from the pure product perspective,
| so the standard answer will be "no, let's add them when we have
| a real case they're needed". The product I'm inheriting right
| now misses these fields on half of the tables. Except when you
| really want the data, it won't be there as you're not going
| back in time.
|
| Trying to convince someone that it's worth it will also give
| the impression they're optional, when you already decided you
| need them. So at the end of the day, it's your
| responsibility as an engineer to do what's required,
| without punting it to non-technical stakeholders to cover
| your back.
|
| I also wouldn't ask a product manager if they think table
| schema should be orthogonal.
|
| Now keeping or not IPs or user IDs in a table is a whole
| different story and requires a lot more consulting, and not
| just with the PO.
| escot wrote:
| In a good work environment you'll have some telepathy with
| your product people and you won't have to communicate every
| minor change, which wastes time. It's similar to
| implementing a UI design -- you and the designer will
| develop a relationship where you don't need to communicate
| every pixel; you fill in the blanks, show results, repeat.
| arialdomartini wrote:
| While I like the YAGRI principle very much, I find that adding
|
| - updated_at
|
| - deleted_at (soft deletes)
|
| - created_by etc
|
| - permission used during CRUD
|
| to every table is a solution weaker than having a separate audit
| log table.
|
| I feel that mixing audit fields with transactional data in the
| same table is a violation of the separation of concerns
| principle.
|
| In the proposed solution, updated_at captures only the last
| change, a problem that a separate audit log table does not
| have.
| motorest wrote:
| Event sourcing also works great. You don't need an audit log
| per se if you already track a history of all commands that
| introduced changes to your system.
| arialdomartini wrote:
| Yep. But Event Sourcing comes with its own set of other
| problems. Also, I don't think this would apply to OP's post:
| with Event Sourcing you would not even have those DB tables.
| motorest wrote:
| The DB tables suggested by OP are akin to snapshots,
| whereas each event would require a separate data store. OP
| is trying to shoehorn event history into the snapshots,
| which hardly makes any sense.
| nine_k wrote:
| Event sourcing and "the right to be forgotten" are not always
| easy to marry.
| motorest wrote:
| > Event sourcing and "the right to be forgotten" are not
| always easy to marry.
|
| The absolute basics is to support snapshots and event
| replay. This is hardly rocket science.
| nine_k wrote:
| If you try to redact a part of the past, it can also
| affect the present, as any time traveler knows.
|
| Let's assume we want to remove every message related to
| user A.
|
| A photo by user B got to be the best of the day because
| it collected most upvotes. Without the A's vote, it's no
| longer so. The photo also got to become the best of the
| month because it was later voted as the top from the
| best-of-the-day entries, and received a prize. Should we
| now play the message stream without the A's upvote,
| things are going to end up radically different, or end up
| in a processing error.
|
| User B was able to send a message to user C, and thus
| start a long thread, because user A had introduced them.
| With user A removed, the message replay chokes at the
| attempt of B to communicate with C.
|
| One way is to ignore the inconsistencies; it deprives you
| of most of the benefits of event sourcing.
|
| Another way is anonymizing: replace messages about user A
| with messages about some null user, representing the
| removed users. This can lead to more paradoxes and
| message replay inconsistencies.
| motorest wrote:
| > If you try to redact a part of the past, it can also
| affect the present, as any time traveler knows.
|
| That's not how snapshots work. You record the state of
| your system at a point in time, and then you keep all
| events that occurred after that point. This means you
| retain the ability to rebuild the current state from that
| snapshot by replaying all events. I.e., event sourcing's
| happy flow.
|
| > User B was able to send a message to user C, and thus
| start a long thread, because user A had introduced them.
| With user A removed, the message replay chokes at the
| attempt of B to communicate with C.
|
| Not really. That's just your best attempt at reasoning
| how the system could work. In the meantime, depending on
| whether you have a hard requirement on retaining messages
| from removed users you can either keep them assigned to a
| deleted user or replace them by deleted messages. This is
| not a problem caused by event sourcing; it's a problem
| caused by failing to design a system that meets its
| requirements.
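The snapshot-plus-replay flow described above can be sketched in a few lines of plain Python. This is a toy illustration with invented names (`apply`, `replay`, a dict-of-counters state), not any particular event-sourcing framework: take a snapshot of the state at some point, discard the events before it (e.g. to honor a deletion request), and replaying only the remaining events still reproduces the current state.

```python
# Toy event store: state is a dict of counters, an event increments one key.
def apply(state, event):
    key, delta = event
    new = dict(state)
    new[key] = new.get(key, 0) + delta
    return new

def replay(snapshot, events):
    # Rebuild state by folding events over a starting snapshot.
    state = dict(snapshot)
    for e in events:
        state = apply(state, e)
    return state

events = [("a", 1), ("b", 2), ("a", 3), ("b", 1)]

# Take a snapshot after the first two events...
snapshot = replay({}, events[:2])
# ...then the older events can be erased; only the tail is kept.
tail = events[2:]

assert replay(snapshot, tail) == replay({}, events)  # same current state
print(replay(snapshot, tail))  # {'a': 4, 'b': 3}
```

The caveats in the thread still apply: if a to-be-erased event is *after* the latest snapshot, you must either take a fresh snapshot first or accept replay inconsistencies.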
| grey-area wrote:
| An audit log table often takes a huge amount of space compared
| to simple fields on the records so there are tradeoffs. Which
| solution is best depends on how important change logs are.
| wodenokoto wrote:
| I kinda agree, but don't underestimate the power of having
| things where people are looking.
|
| Put your documentation in doc strings where the function is
| defined - don't have a separate file in a separate folder for
| that. It might separate concerns, but no one is looking there.
|
| Similarly if those fields aren't nullable, someone trying to
| add new rows will have to fill in something for those metadata
| fields - and that something will now very likely be what's
| needed, rather than not pushing anything to the audit table.
|
| Obviously your app can outgrow these simple columns, but you're
| getting value now.
| simonw wrote:
| Related from a few years ago: PAGNIs, for Probably Are Gonna Need
| Its - https://simonwillison.net/2021/Jul/1/pagnis/
| KingOfCoders wrote:
| OT As the great Alone ("Last Psychiatrist") said, "if you read
| it, it's for you" - IYRIIFY
| silisili wrote:
| I don't like general advice like this, because it's too general.
| For many, it's probably good advice. For others, not so much.
|
| Anyone who has worked at a small company selling to large B2B
| SaaS can attest we get like 20 hits a day on a busy day. Most of
| that is done by one person in one company, who is probably also
| the only person from said company you've ever talked to.
|
| From that lens, this is all overkill. It's not bad advice;
| it's just that it will get quoted for scenarios it doesn't
| apply to. The same goes for K8s, or even microservices at
| large, and most 'do as I say' tech blogs.
| begueradj wrote:
| >I don't like general advice like this, because it's too
| general. For many, it's probably good advice. For others, not
| so much.
|
| That's true for any other good advice you may have heard of.
| ltbarcly3 wrote:
| This is good advice except for deleted_at. Soft deletion is
| rarely smart. Deleted things just accumulate and every time you
| query that table is a new opportunity to forget to omit deleted
| things. Query performance suffers a lot. It's just needless
| complexity.
|
| Instead, just for the tables where you want to support soft
| delete, copy the data somewhere else. Make a table like `deleteds
| (tablename text not null, data jsonb not null default '{}')` that
| you can stuff a serialized copy of the rows you delete from other
| tables (but just the ones you think you want to support soft
| delete on).
|
| The theory here is: You don't actually want soft delete, you are
| just being paranoid and you will never go undelete anything. If
| you actually do want to undelete stuff, you'll end up building a
| whole feature around it to expose that to the user anyway so that
| is when you need to actually think through building the feature.
| In the meantime you can sleep at night, safe in the knowledge
| that the data you will never go look at anyway is safe in some
| table that doesn't cause increased runtime cost and development
| complexity.
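A minimal sketch of this graveyard-table pattern, using SQLite and plain JSON text in place of Postgres jsonb (the `users` table, `hard_delete_with_copy` helper, and row contents are illustrative, not from the comment):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    -- One generic graveyard table instead of deleted_at on every table.
    CREATE TABLE deleteds (tablename TEXT NOT NULL,
                           data TEXT NOT NULL DEFAULT '{}');
""")
conn.execute("INSERT INTO users VALUES (1, 'alice', 'a@example.com')")

def hard_delete_with_copy(conn, table, row_id):
    # Serialize the row, stash it in the graveyard, then really delete it.
    cur = conn.execute(f"SELECT * FROM {table} WHERE id = ?", (row_id,))
    row = cur.fetchone()
    if row is None:
        return
    record = dict(zip([c[0] for c in cur.description], row))
    conn.execute("INSERT INTO deleteds (tablename, data) VALUES (?, ?)",
                 (table, json.dumps(record)))
    conn.execute(f"DELETE FROM {table} WHERE id = ?", (row_id,))

hard_delete_with_copy(conn, "users", 1)
print(conn.execute("SELECT COUNT(*) FROM users").fetchone()[0])  # 0
print(conn.execute("SELECT data FROM deleteds").fetchone()[0])
```

Live queries stay simple (no `deleted_at IS NULL` everywhere), and the graveyard can be archived or purged on its own schedule.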
| Aldipower wrote:
| Five years ago everybody would have laughed about "soft
| deletes" or "marked as deleted". Whoever thought this was a
| good idea from a data protection perspective? You are also
| lying to the faces of your users with such behavior. Shame.
| tliltocatl wrote:
| Except almost every database (and most storage devices
| nowadays) works this way - mark an entry as deleted, then batch
| delete a lot of entries during garbage collection. It's
| fundamentally impossible to efficiently erase a record from the
| middle of a file, except maybe by using an encryption tree,
| which would still be fairly inefficient.
|
| Actually erasing data is quite hard. Soft deletes don't add
| any new lies, they just move the lies to the upper layer.
| Aldipower wrote:
| Come on. With a manual "marked as deleted" it stays that
| way forever; it is not deleted and never will be, and the
| "deleted" data also lands in database backups, is still
| query-able, and so on. I do not care if the deleted data
| stays for a while on disk or in memory, as long as it will
| eventually be deleted by the garbage collector and isn't
| query-able anymore.
| nexo-v1 wrote:
| Shipped and supported enough startup apps to learn this the hard
| way: users will delete things they shouldn't, and you will be
| asked to explain or undo it. Soft deletes and basic metadata
| (created_at, deleted_by, etc.) have saved me multiple times --
| not for some future feature, just for basic operational sanity.
| IshKebab wrote:
| Yes! Why something happened is incredibly important. Gitlab made
| this mistake _hard_. We have a medium sized instance with some
| complex CI pipelines, and often they'll just get cancelled
| and it doesn't say why or even by whom. And anyone can do
| it! The only
| option is to ask the entire company "did anyone cancel this?"
| optician_owl wrote:
| It's a terrible post. What it suggests is to switch your
| head off and follow an overly generalised principle. I
| guess when somebody invents yet another acronym, it is a
| red flag.
|
| Data has its own life cycles in every area it passes through. And
| it's part of requirements gathering to find those cycles: the
| dependent systems, the teams, and the questions you need to
| answer. Mindlessly adding fields won't save you in every
| situation.
|
| Bonus point: when you start collecting questions while designing
| your service, you'll discover how mature your colleagues'
| thinking is.
| KronisLV wrote:
| I don't get why all of the big RDBMSes (PostgreSQL,
| MariaDB/MySQL, SQL Server, Oracle, ...) don't seem to have built
| in support for soft deletes up front and center?
| CREATE TABLE ... WITH SOFT DELETES
|
| Where the regular DELETE wouldn't get rid of the data for real
| but rather you could query the deleted records as well, probably
| have timestamps for everything as a built in low level feature,
| vs having to handle this with a bunch of ORMs and having to
| remember to put AND deleted_at IS NULL in all of your custom
| views.
|
| If we like to talk about in-database processing so much, why
| don't we just put the actual common features in the DB, so that
| toggling them on or off doesn't take a bunch of code changes in
| app, or that you'd even be able to add soft deletes to any legacy
| app that knows nothing of the concept, on a per table basis or
| whatever.
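Short of engine support, the usual emulation is a view over the live rows, so application queries never have to repeat the `deleted_at IS NULL` filter. A SQLite sketch (table and column names are illustrative; in Postgres you would add `INSTEAD OF` triggers to make the view writable):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders_all (id INTEGER PRIMARY KEY,
                             item TEXT,
                             deleted_at TEXT);
    -- The view is what the application reads; it hides soft-deleted rows.
    CREATE VIEW orders AS
        SELECT id, item FROM orders_all WHERE deleted_at IS NULL;
""")
conn.execute("INSERT INTO orders_all (id, item) VALUES (1, 'widget'), (2, 'gadget')")
# "Delete" order 2 by stamping it, not by removing the row.
conn.execute("UPDATE orders_all SET deleted_at = datetime('now') WHERE id = 2")

print([r[1] for r in conn.execute("SELECT id, item FROM orders")])  # ['widget']
print(conn.execute("SELECT COUNT(*) FROM orders_all").fetchone()[0])  # 2
```

This keeps the filter in exactly one place, which is most of what a built-in `WITH SOFT DELETES` would buy.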
| moritzwarhier wrote:
| Maybe my intuition is wrong, but to me this sounds like a
| violation of the principle of least power.
|
| "Soft deletes" is just a name for a regular write operation,
| with specific semantics.
|
| Adding a layer of magic to the DB for this doesn't seem right
| to me.
|
| And applications could have many different requirements for
| soft deletes, like the article points out. For example, the
| simplest version would be just a boolean "deleted" column, but
| it could also be "deleted_at", "deleted_by", etc.
|
| All of these cases require a bunch of code changes anyway, and
| the more complex ones could interfere with an implementation of
| this feature at the database level: such a transparent
| implementation couldn't access app-specific concerns such as
| user data, for example.
|
| Adding soft deletes to a legacy app that knows nothing about it
| would only work for a boolean flag and a maybe date-time value,
| unless the DBMS would also offer triggers for soft deletes etc?
|
| Seems to me to that this capability would make a DBMS much more
| complicated.
| Spivak wrote:
| I don't think it's all magic, because you have to implement
| soft deletes in your application layer anyway, and it would
| be nice to have a little DB support for it. It doesn't have
| to be some big thing; just having selects and such work
| only on undeleted rows by default, unless you ask for them,
| would be nice, so it doesn't pollute your code and doesn't
| make you always have to remember to point the gun away from
| your foot.
| PaulHoule wrote:
| I'd argue that what SQL needs is better facilities for code
| reuse, metaprogramming and such, it ought to give you the
| tools that you can make something that lets you add
| something to the language such that you can add
| ... WITH SOFT UPDATES
|
| and it adds to the table definition as well as to the
| schema that will cause subsequent statements to be
| rewritten. There's a lot of interesting logic (in the
| literal sense) in SQL that is hidden by a strange,
| irregular syntax that is more obvious in other approaches
| to databases such as Datalog. I think it was little
| appreciated outside the hardcore semantic web community
| that you could compile SPARQL + OWL to SQL and get powerful
| inference facilities. SQL is a great target for
| metaprogramming precisely because it is not Turing complete
| and that a code generator doesn't have to think at all
| about the order that events are sequenced in. It's kinda
| sad that metaprogramming tools for SQL are almost all pre-
| Chomsky and pre-dragon book internal DSLs like JooQ and
| SQLAlchemy which have their charms (JooQ's excellent
| integration with Java IDEs) but fall short of what could be
| done with SQL-to-SQL and SQL-to-X transformations.
|
| Stored procedures are great but many shops don't use them
| for various reasons. It doesn't help that they look like a
| mix of FORTRAN and COBOL and also come in a few variations
| from the (better) set-based PL/SQL of Oracle to the (worse)
| Transact-SQL based stored procs of Microsoft SQL Server and
| PostgreSQL. The other day I talked with Krisztian Szabo of
|
| https://schemamap.io/
|
| who developed a compiler that writes stored procs that do
| database synchronization.
|
| On the other hand, if you've got access to the internals of
| the frickin' database I think you can do something better
| than the ordinary application level soft updates. For
| instance a "customer record" might very well be not just a
| row in one table but maybe 15 rows in four tables that are
| inserted in a transaction and you want to be able to
| undelete them as a unit.
| hnthrow90348765 wrote:
| >Adding soft deletes to a legacy app
|
| As an aside, I've never found this to be worth it since you
| have to change too much and re-test everything for minimal
| user benefit and time savings. The effort is way worse if the
| code is not great in the first place. It can be a great
| decision to make before everything is written.
|
| Maybe it's worth it for files which are hard to reproduce,
| but you can also rely on DB backups to get those back. If
| people are regularly deleting things they're not supposed to,
| you're better off removing the user-facing delete actions,
| limiting the action to specific users, etc.
| hypeatei wrote:
| Temporal tables in SQL server fit this use-case[0], I think.
|
| 0: https://learn.microsoft.com/en-us/sql/relational-
| databases/t...
| rjbwork wrote:
| Available on postgres as an extension. It's a bit jank and
| doesn't have language integrated clauses like sql server.
| ajuc wrote:
| Oracle has flashback queries (SELECT ... AS OF timestamp).
|
| It's one of these things that are available but most people
| ignore it and implement it manually with created_at
| updated_at deleted_at columns etc. I think one reason for
| this is lack of support in ORMs and lack of standardization
| between RDBMSes.
| mastax wrote:
| Also System-Versioned Tables in MariaDB:
| https://mariadb.com/kb/en/system-versioned-tables/
| crazygringo wrote:
| Because it's too dependent on business logic.
|
| Different products will handle soft deletes differently. Which
| queries need to include soft-deleted rows and which don't? What
| about different levels of soft deletes, e.g. done by the user
| (can be undone by user) vs. done by an admin (can't be undone
| by user)?
|
| Implementing soft deletes yourself isn't hard. Yes you'll have
| to make a bunch of decisions about how they work in every
| circumstance, but that's the point.
| marcosdumay wrote:
| It's just not bothersome enough to deviate from the standard.
|
| If they did this, nobody would use it. They do lots of more
| useful things that people don't use because it's not portable.
|
| There's a sibling comment about temporal databases. Those solve
| a very bothersome problem, so a few people use them. That means
| that there's a chance soft deletes get adopted as a side effect
| of a much more complex standard.
| ajuc wrote:
| Oracle has this already. SELECT ... AS OF timestamp.
|
| It needs to be enabled of course and it's not free.
| perrygeo wrote:
| The whole model of RDBMS is based on mutable tuples; soft
| deletes don't make much sense as an intrinsic part of that
| model. If you want soft deletes, you create an application
| layer or use a different data model.
|
| Most of the time if you want "soft deletes", you really want an
| immutable log so that you time travel to any point in the
| history. XTDB and Datomic are worth looking at if you want to
| solve the problem at the data model level.
| 8bitme wrote:
| For `updated_at` and `deleted_at`, making them nullable and
| null until touched is incredibly useful.
|
| Answering queries like how many of these were never updated? Or
| how many of these were never cancelled?
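With nullable timestamps, those questions become one-line aggregates. A SQLite sketch with an invented `bookings` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE bookings (id INTEGER PRIMARY KEY,
                                       updated_at TEXT,
                                       deleted_at TEXT)""")
# NULL means "never touched", so the absence of an event is queryable.
conn.executemany("INSERT INTO bookings VALUES (?, ?, ?)", [
    (1, None,         None),
    (2, '2025-04-01', None),
    (3, None,         '2025-04-02'),
])

never_updated = conn.execute(
    "SELECT COUNT(*) FROM bookings WHERE updated_at IS NULL").fetchone()[0]
never_cancelled = conn.execute(
    "SELECT COUNT(*) FROM bookings WHERE deleted_at IS NULL").fetchone()[0]
print(never_updated, never_cancelled)  # 2 2
```

Defaulting these columns to the creation time instead would erase that distinction.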
| kamatour wrote:
| Why can't databases just remember stuff we delete, like a trash
| can?
| fwip wrote:
| Assuming you're serious, there are two main reasons. You want
| to regain storage space after you delete things, and sometimes
| you want to actually delete things (e.g. to be in compliance
| with regulations).
| ajuc wrote:
| They do if you tell them to. Oracle, MSSQL and PostgreSQL
| all have this implemented. I've only used it in Oracle,
| where it's called flashback queries: you do SELECT ... AS
| OF timestamp and you can see the state of the table from
| the past.
|
| It needs to be enabled tho, and it has some problems (for
| example you need to remove data for legal reasons sometimes).
|
| Reading this thread is very frustrating, with people
| reinventing the wheel and asking why it wasn't invented
| already :)
| yellowapple wrote:
| The problem with updated_at and updated_by is that a given record
| could experience multiple updates by multiple people at multiple
| times, and you'd only have visibility into the most recent.
|
| The logical conclusion here is to log the updates (and creations
| and deletions and undeletions and such) themselves:
CREATE TABLE foo_log (id, foo_id, whodunnit, action,
                      performed_at, column_1, column_2,
                      -- ...
                      column_n);
|
| Technically you don't even need the "foo" table anymore, since
| you can reconstruct its contents by pulling the most recent
| transaction for a given foo_id and discarding the reconstructed
| record if the most recent action on it was a deletion. Probably
| still a good idea to create a view or somesuch for the sake of
| convenience, but the point of this is that the log itself becomes
| the record of truth - and while this approach does cost some disk
| space (due to duplicated data) and read performance (due to the
| more complex query involved), it's invaluable for tracking down a
| record's full lifecycle. Even better if you can enforce append-
| only access to that table.
|
| This is a pretty typical approach for things like bookkeeping and
| inventory management (though usually those are tracking the
| deltas between the old and new states, instead of recording the
| updated states directly as the above example would imply).
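Reconstructing the current state from such a log is a "latest row per id" query that discards ids whose latest action is a delete. A SQLite sketch of that query, using the hypothetical `foo_log` layout trimmed to a single payload column:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE foo_log (
    id INTEGER PRIMARY KEY, foo_id INTEGER, whodunnit TEXT,
    action TEXT, performed_at TEXT, payload TEXT)""")
conn.executemany("INSERT INTO foo_log VALUES (?, ?, ?, ?, ?, ?)", [
    (1, 10, 'alice', 'create', '2025-01-01', 'v1'),
    (2, 10, 'bob',   'update', '2025-01-02', 'v2'),
    (3, 11, 'alice', 'create', '2025-01-03', 'x1'),
    (4, 11, 'alice', 'delete', '2025-01-04', None),
])

# Latest entry per foo_id; ids whose latest action is a delete are dropped.
current = conn.execute("""
    SELECT foo_id, payload FROM (
        SELECT foo_id, action, payload,
               ROW_NUMBER() OVER (PARTITION BY foo_id
                                  ORDER BY performed_at DESC) AS rn
        FROM foo_log)
    WHERE rn = 1 AND action != 'delete'
""").fetchall()
print(current)  # [(10, 'v2')]
```

This is the query you would hide behind the convenience view mentioned above; the log stays the record of truth.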
| PaulHoule wrote:
| Funny I've been developing an adaption layer that implements the
| functionality that I use in
|
| https://docs.python-arango.com/en/main/
|
| over tables in Postgres that have a primary _key and a
| JSONB document field. The issue is that I have a number of
| prototypes I've developed with ArangoDB, but the license is
| awful and I don't feel like I can either open source _or_
| commercialize any of them until I'm running on an open
| source database.
|
| It's a fun project because I don't need to support everything in
| python-arango, in fact I don't even need to support 100% of the
| features I use because I am free to change my applications. Also
| it's a chance to make the library that I _really_ want to use so
| already it has real integer and UUID primary keys.
|
| I just added a feature to have the library manage _created and
| _updated fields not just because I thought it was good in general
| but it was a feature I needed for a particular application, a
| crawler that fetches headlines from the HN API. I want to fetch
| headlines right away so I can avoid submitting duplicates but I
| also want accurate counts of how many votes and comments articles
| got and that involves recrawling again in two weeks. Of course
| _created and _updated are helpful for that.
| r7n wrote:
| We did exactly this when designing StatelyDB. We realized there
| are a set of metadata fields[0] that almost everyone needs when
| modeling data. Our approach takes this a step further in that we
| always track these fields transparently so if you forget to add
| them to your schema initially for any reason, that's okay, you
| can always add them later!
|
| [0] https://docs.stately.cloud/schema/fields/#metadata-fields
| imoreno wrote:
| It's not really YAGNI if you need it to debug, is it?
___________________________________________________________________
(page generated 2025-04-24 23:02 UTC)