[HN Gopher] YAGRI: You are gonna read it
___________________________________________________________________
YAGRI: You are gonna read it
Author : escot
Score : 303 points
Date : 2025-04-23 21:47 UTC (1 day ago)
(HTM) web link (www.scottantipa.com)
(TXT) w3m dump (www.scottantipa.com)
| al_borland wrote:
| I agree with this as written, as I think it's important to have
| some degree of forethought when building out the DB to plan for
| future growth and needs.
|
| That said, the monkey paw of this would be someone reading it and
| deciding they should capture and save all possible user data,
| "just in case", which becomes a liability.
| arjonagelhout wrote:
| A little while back, I had a conversation with a colleague about
| sorting entries by "updated at" in the user interface, and to my
| surprise this was not added by the backend team.
|
| Many of these "we are going to need it"s come from experience.
| For example, in the context of data structures (DS), I have made
| many "mistakes" that I only get right the second time. These
| mistakes made writing algorithms for the DS harder, or gave the
| DS bad performance.
|
| Sadly, it's hard to transfer this underlying breadth of knowledge
| and intuition for making good tradeoffs. As such, a one-off tip
| like this is limited in its usefulness.
| SchemaLoad wrote:
| Somewhat related, but I suggest having both the record updated
| at, and some kind of "user editing updated at". As I've
| encountered issues where some data migration ends up touching
| records and bumping their updated-at, which shocks users: they
| see the UI reshuffle and think they've been hacked, because the
| records show updates at times they didn't make any.
| XorNot wrote:
| I mean this is what audit logs are for I'd say: generally
| speaking you want to know what was changed, by who and why.
|
| So really you probably just want a reference to the tip of
| the audit log chain.
| tcdent wrote:
| Database schemas being perfect out-of-the gate was replaced by
| reliable migrations.
|
| If it's not data that's essential to serving the current
| functionality, just add a column later. `updated_at` doesn't
| have to be accurate for your entire dataset; just set it to
| `NOW()` when you run the migration.
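The backfill step described above can be sketched concretely. A minimal illustration using SQLite as a stand-in (table and column names are invented for the example; on SQLite the add-column and backfill must be two statements, since added columns can't default to a non-constant expression):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO items (name) VALUES ('widget'), ('gadget')")

# The migration: add the column, then backfill existing rows with the
# migration's own run time rather than leaving them NULL.
conn.execute("ALTER TABLE items ADD COLUMN updated_at TEXT")
conn.execute("UPDATE items SET updated_at = datetime('now')")

rows = conn.execute("SELECT name, updated_at FROM items").fetchall()
print(rows)
```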
| SOLAR_FIELDS wrote:
| "Reliable migrations" almost seems like an oxymoron.
| Migrations are complicated, difficult and error prone. I
| think there's a good takeaway here around good initial schema
| design practices. The less you have to morph your schema over
| time, the fewer of those risky migrations need to run.
| tcdent wrote:
| My experience over the last decade has been different.
|
| Use a popular framework. Run it against your test database.
| Always keep backups in case something unforeseen happens.
|
| Something especially trivial like adding additional columns
| is a solved problem.
| __float wrote:
| My experience has not been so smooth. Migrations are
| reasonable, but they're not free and "always keeps
| backups" sounds like you'd tolerate downtime more than I
| would.
|
| Even in the best case (e.g. basic column addition), the
| migration itself can be "noisy neighbors" for other
| queries. It can cause pressure on downstream systems
| consuming CDC (and maybe some of those run queries too,
| and now your load is even higher).
| imcritic wrote:
| Still depends on what the data represent: you could get
| yourself in a storm of phone calls from customers if after
| your latest release there's now a weird note saying their
| saved document was last updated today.
|
| "HOW DARE YOU MODIFY MY DOCUMENTS WITHOUT MY..."
| phire wrote:
| Sure, migrations are bearable (especially ones that only add
| columns).
|
| But for the example of the "updated_at" column, or "soft
| delete" functionality, you only find out you need it because
| the operations team suddenly discovered they needed that
| functionality on existing production rows because something
| weird happened.
| smithkl42 wrote:
| In C#-land, we just have it as a standard that ~every table
| inherits from `ITrackable`, and we wrote a little EF plugin
| to automatically update the appropriate columns.
|
| public interface ITrackable {
|     DateTime CreatedOn { get; set; }
|     DateTime ModifiedOn { get; set; }
| }
|
| Saves so much time and hassle.
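A rough Python analog of the ITrackable idea (the mixin and class names here are hypothetical; a real project would hook this into the ORM's update events rather than `__setattr__`):

```python
from datetime import datetime, timezone

class Trackable:
    """Stamp created_on once, bump modified_on on every field change."""
    def __init__(self):
        now = datetime.now(timezone.utc)
        object.__setattr__(self, "created_on", now)
        object.__setattr__(self, "modified_on", now)

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        if name not in ("created_on", "modified_on"):
            object.__setattr__(self, "modified_on",
                               datetime.now(timezone.utc))

class Article(Trackable):
    def __init__(self, title):
        super().__init__()
        self.title = title   # triggers the modified_on bump

a = Article("draft")
a.title = "final"
assert a.modified_on >= a.created_on
```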
| reillyse wrote:
| One thing I do quite frequently which is related to this (and
| possibly is a pattern in rails) is to use times in place of
| Booleans.
|
| So is_deleted would contain a timestamp to represent the
| deleted_at time for example. This means you can store more
| information for a small marginal cost. It helps that rails will
| automatically let you use it as a Boolean and will interpret a
| timestamp as true.
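The timestamp-as-boolean trick translates directly to Python, where `None` is falsy and a datetime object is truthy (a sketch of the idea, not Rails itself):

```python
from datetime import datetime, timezone
from typing import Optional

# deleted_at doubles as the boolean flag *and* records when it happened.
deleted_at: Optional[datetime] = None
assert not deleted_at   # None is falsy: not deleted

deleted_at = datetime.now(timezone.utc)
assert deleted_at       # a timestamp is truthy: deleted, and we know when
```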
| acedTrex wrote:
| This one little change alone can bring such huge benefits
| later.
| noman-land wrote:
| This is all well and good until you need to represent something
| that happened on Jan 1 1970 00:00 UTC.
| jchw wrote:
| 32-bit UNIX timestamps are often signed so you can actually
| go before that, but most UNIX timestamps are 64-bit now,
| which can represent a much larger range. And SQL datetime
| types might have a totally different range.
|
| Not that it _really_ matters; deleted_at times for your
| database records will rarely predate the existence of said
| database.
| noman-land wrote:
| It's not about the scale, it's that `if (0)` will evaluate
| to `false` in many languages.
| xp84 wrote:
| So you're still fine as long as you're not tracking things
| that were deleted at that exact instant 50 years ago, which is
| a safe assumption for an application that has existed for far
| less time than that. That said, I haven't ever seen this
| implemented in a way that casts. It's usually implemented with
| scopes in the ORM:
| MyModel.nondeleted.where(<criteria>)
|
| etc.
|
| which generates a query with "WHERE deleted_at IS NULL"
|
| 1-1-1970 is fine.
| jchw wrote:
| In addition to the sibling comment, which is exactly
| right (you should be using a nullable column here, if
| you're using SQL, for multiple reasons) I reckon this is a
| design issue in the programming language that is largely
| unrelated to how you model the database. It's pretty easy
| to run into bugs especially if you compound it with other
| quirky APIs, like strcmp: `if (strcmp(a, b)) // forgot to
| do == 0; accidentally backwards!` -- So really, you just
| don't have much of a choice other than to tread carefully
| and enable compiler warnings. Personally in this case I'd
| use an Optional wrapper around the underlying timestamp
| type anyways, if I needed to be able to represent the
| UNIX timestamp at 0 as well as an empty state.
| BeFlatXIII wrote:
| Leave it null for non-deleted items.
| jfengel wrote:
| I consider booleans a code smell. It's not a bug, but it's a
| suggestion that I'm considering something wrong. I will
| probably want to replace it with something more meaningful in
| the future. It might be an enum, a subclass, a timestamp,
| refactoring, or millions of other things, but the Boolean was
| probably the wrong thing to do even if I don't know it yet.
| xp84 wrote:
| This seems at first like a controversial idea, but the more I
| think about it the more I like this thought technology.
| Merely the idea of asking myself if there's a better way to
| store a fact like that will potentially improve designs.
| crdrost wrote:
| The enum idea is often wise; also: for just an example that
| has probably occurred a hundred thousand times across the
| world in various businesses...
|
| Original design: store a row that needs to be reported to
| someone, with an is_reported column that is boolean.
|
| Problem: one day for whatever reason the ReporterService
| turns out to need to run two of these in parallel. Maybe
| it's that the reporting is the last step after ingestion in
| a single service and we need to ingest in parallel. Maybe
| it's that there are too many reports to different people
| and the reports themselves are parallelizable (grab 5
| clients, grab unreported rows that foreign key to them,
| report those rows... whoops sometimes two processes choose
| the same client!)... Maybe it's just that these are run in
| Kubernetes and if the report happens when you're rolling
| pods then the request gets retried by both the dying pod
| and the new pod.
|
| Alternative to boolean: unreported and reported records
| both live in the `foo` table and then a trigger puts a row
| for any new Foos into the `foo_unreported` table. This
| table can now store a lock timestamp, a locker UUID, and
| denormalize any columns you need (client_id) to select
| them. The reporter UPDATEs a bunch of rows reserving them,
| SELECTs whatever it has successfully reserved, reports
| them, then DELETEs them. It reserves rows where the lock
| timestamp IS NULL or is less than now minus 5 minutes, and
| the Reporter itself runs with a 5 minute timeout. The DB
| will do the barest amount of locking to make sure that two
| UPDATES don't conflict, there is no risk of deadlock, and
| the Boolean has turned into whether something exists in a
| set or not.
|
| A similar trick is used in the classic Python talk "Stop
| Writing Classes" by @jackdied where a version of The Game
| of Life is optimized by saying that instead of holding a
| big 2D array of true/false booleans on a finite gameboard,
| we'll hold an infinite gameboard with a set of (x,y) pairs
| of living cells which will internally be backed by a
| hashmap.
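The reservation pattern described above can be sketched roughly as follows, using SQLite as a stand-in with invented table and column names (a real system would run this against Postgres, inside proper transactions):

```python
import sqlite3, time, uuid

LOCK_TIMEOUT = 300  # seconds; the reporter itself runs with this timeout

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE foo_unreported (
    foo_id    INTEGER PRIMARY KEY,
    locked_at REAL,   -- NULL until some worker reserves the row
    locker    TEXT)""")
conn.executemany("INSERT INTO foo_unreported (foo_id) VALUES (?)",
                 [(1,), (2,), (3,)])

def reserve(conn, batch=2):
    """Atomically claim rows that are unclaimed or whose lock expired."""
    me, now = str(uuid.uuid4()), time.time()
    conn.execute(
        """UPDATE foo_unreported SET locked_at = ?, locker = ?
           WHERE foo_id IN (
               SELECT foo_id FROM foo_unreported
               WHERE locked_at IS NULL OR locked_at < ?
               LIMIT ?)""",
        (now, me, now - LOCK_TIMEOUT, batch))
    return conn.execute(
        "SELECT foo_id FROM foo_unreported WHERE locker = ?",
        (me,)).fetchall()

claimed = reserve(conn)
# ... report the claimed rows, then DELETE them to mark them done:
conn.executemany("DELETE FROM foo_unreported WHERE foo_id = ?", claimed)
```

The boolean has become membership in `foo_unreported`: a row exists until it has been reported, and a stale lock simply becomes reclaimable.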
| lud_lite wrote:
| Booleans also force the true/false framing.
|
| E.g. a field called userCannotLoginWithoutOTP.
|
| Then in code "if not userCannotLoginWithoutOTP or otpPresent
| then..."
|
| This may seem easy until you have a few flags to combine and
| check.
|
| An enum called LoginRequirements with values Password,
| PasswordAndOTP is one less negation and easier to read.
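A minimal Python sketch of the enum-over-negated-boolean idea (names are illustrative):

```python
from enum import Enum

class LoginRequirement(Enum):
    PASSWORD = "password"
    PASSWORD_AND_OTP = "password_and_otp"

def may_login(req: LoginRequirement, otp_present: bool) -> bool:
    # No double negatives: state the requirement positively and check it.
    if req is LoginRequirement.PASSWORD_AND_OTP:
        return otp_present
    return True

assert may_login(LoginRequirement.PASSWORD, otp_present=False)
assert not may_login(LoginRequirement.PASSWORD_AND_OTP, otp_present=False)
```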
| NitpickLawyer wrote:
| For me enums win especially when you consider that you can
| get help from your environment every time you add/remove
| stuff. Some languages force you to deal with the changes
| (i.e. rust) or you could add linter rules for other
| languages. But you're more likely to catch a problem before
| it arises, rather than deal with ever increasing bool
| checks. Makes reasoning about states a lot easier.
| retropragma wrote:
| Tangential: I was recently wishing that bitwise flags had
| better support in Postgres. For now, bools are just easier to
| work with
| crabmusket wrote:
| The way I think about it: a boolean is usually an _answer to
| a question_ about the state, not the _state itself_.
|
| A light switch doesn't have an atomic state, it has a range
| of motion. The answer to the question "is the switch on?" is
| a boolean answer to a question whose input state is a range
| (e.g. is distance between contacts <= epsilon).
| hadriendavid wrote:
| << Anytime you store a Boolean, a kitten dies >> Nobody has ever
| said that but nobody wants any kitten to die so nobody has ever
| challenged me anytime I use that statement.
| seveibar wrote:
| Agree, although the acronym in the article could be interpreted
| to mean "you are going to read it, so index it appropriately",
| which is sort of bad advice and can lead to overindexing. There
| is probably something better for "add appropriate and
| conventional metadata" (the author suggests updated_at,
| created_at etc)
|
| Not a huge fan of the example of soft delete, i think hard
| deletes with archive tables (no foreign key enforcement) is a
| much much better pattern. Takes away from the main point of the
| article a bit, but glad the author hinted at deleted_at only
| being used for soft deletes.
| mkhalil wrote:
| Event-sourcing solves this. And with how cheap storage is, it
| should be more prevalent in the industry. IMO the biggest thing
| holding it back is that there isn't a framework that's plug-and-
| play (say like Next.js is to React) that provides people with
| that ability.
|
| I've been working on one in Typescript (with eventual re-writes
| in other langs. like Rust and Go), but it's difficult even coming
| up with conventions.
| rising-sky wrote:
| Care to share what you've been working on?
| code_biologist wrote:
| Event sourcing is an expensive solution and I don't mean from a
| storage perspective -- it burns engineering cognitive
| horsepower quickly on things that don't matter. Do it if you're
| in finance or whatever. Having been burned by my own "let's
| event source" impulse on data change tracking systems, I now
| prefer less sophisticated solutions. Figuring out how to deal
| with slow projections, watching a projection rebuild go from
| minutes to hours to a few days as a system I expected to handle
| a few events/minute grew to 20 events/second. Fancy caches can't
| save you if you want to use that vaunted ability to reconstruct
| from scratch. Event schema evolution also presents difficult
| tradeoffs: when old events stop having meaning or evolve in
| meaning you either end up adding on new event subtypes and
| variants leaving old cruft to accumulate, or you do migrations
| and edit history on really large tables.
|
| I'd counsel anyone considering event sourcing to use more "low
| power" solutions like audit logs or soft deletes (if really
| necessary) first if possible.
| 000ooo000 wrote:
| Appreciate your perspective, and it makes me wish there was
| some kind of online 'engineers learning from their mistakes'
| forum (rare to see "I burned myself"). To hear hard won
| knowledge distilled like this is a nice reminder to spend
| ones complexity budget wisely.
| koakuma-chan wrote:
| Why not keep the projection always updated?
| evanmoran wrote:
| Just curious, how do people feel about this general style of soft
| deletes currently? Do people still use these in production or
| prefer to just delete fully or alternatively move deleted rows to
| a separate tables / schema?
|
| I find the complexity to still feel awkward enough that makes me
| wonder if deleted_at is worth it. Maybe there are better patterns
| out there to make this cleaner, like triggers to prevent
| deletion, or something else?
|
| As for the article, I couldn't agree more on having timestamps /
| user ids on all actions. I'd even suggest updated_by to add to
| the list.
| swagasaurus-rex wrote:
| I think soft deletes using timestamptz are a good thing.
|
| Deleting rows directly could mean you're breaking references.
| For example, say you have a product that the seller wants to
| delete. Well, what happens if customers have purchased that
| product? You still want it in the database, and you still want
| to fulfill the orders placed.
|
| Your backend can selectively query for products, filter out
| deleted_at for any customer facing queries, but show all
| products when looking at purchase history.
|
| There are times when deleting rows makes sense, but that's
| usually because you have a write-heavy table that needs
| clearing. Yes, soft-deletes requires being careful with WHERE
| statements filtering out deleted rows, but that's a feature not
| a bug.
| pmontra wrote:
| > what happens if customers have purchased that product? You
| still want it in the database, and you still want to fulfill
| the orders placed.
|
| You might still want to show to those customers their
| purchase history including what they bought 25 years ago. For
| example, my ISP doesn't have anymore that 10 Mb/s fiber optic
| product I bought in 2000, because it was superseded by 100
| Mb/s products and then by 1 Gb/s ones. It's also not my ISP
| anymore but I use it for the SIM in my phone. That also
| accumulated a number of product changes along the years.
|
| And think about the inventory of eshops with a zillion
| products and the archive of the past orders. Maybe they keep
| the last few years, maybe everything until the db gets too
| large.
| imcritic wrote:
| Always soft-deletion first. Then it gets exported to a
| separate archive, and only then, after some time, may it be
| fully deleted from the original database.
| zie wrote:
| If you have a good audit log, it really doesn't matter. You can
| always restore it if need be.
|
| If you have no audit log(or a bad one), like lots of apps, then
| you have to care a lot.
|
| Personally, I just implement a good audit log and then I just
| delete with impunity. Worst case scenario, someone(maybe even
| me) made a mistake and I have to run undo_log_audit() with the
| id of the audit log entry I want to put back. Nearly zero
| hassle.
|
| The upside, when something goes wrong, I can tell you who, what
| and when. I usually have to infer the why, or go ask a human,
| but it's not usually even difficult to do that.
| dml2135 wrote:
| Can you share more about what makes a good audit log? My
| company doesn't currently have one and I'm a little lost on
| where to start.
|
| Should this be at the application code level, or the ORM, or
| the database itself?
| jimbokun wrote:
| Probably application level in most cases as those other
| levels probably don't have all the information you want to
| include.
| zie wrote:
| That depends on where the data you need to keep track of is
| and your architecture. The important thing is, you want
| your audit log to be able to tell you:
| * Who
| * What
| * When
| * Ideally, why
|
| For any change in the system. Also when storing the audit
| log, take into account that you might need to undo things
| that happened(not just deletes). For instance maybe some
| process went haywire and inserted 100k records it wasn't
| supposed to. A good audit log, you should be able to run
| something like undo_log_audit(rec1, rec100k) and it will do
| the right thing. I'm not saying that code needs to exist
| day 1, but you should take into account the ability to do
| that when designing it.
|
| Also you need to take into account your regulatory
| environment. Sometimes it's very very important that your
| audit logs are write once, and read only afterwards and are
| stored off machine, etc. Other times it's just for internal
| use and you can be a little more lax about data integrity
| of your audit logs.
|
| Our app is heavily database centric. We push into the DB
| the current unix user, the current PID of the process
| connecting to the DB, etc(also every user has their own
| login to the DB so it handles our authentication too). This
| means our database(Postgres) does all of the audit logging
| for us. There are plenty of Postgres audit logging
| extensions. We run 2 of them. One that is trigger based
| creating entries in a log_audit table(which the
| undo_log_audit() code uses along with most reporting use
| cases) and a second one that writes out to syslog(so we can
| move logs off machine and keep them read only). We are in a
| regulated industry that gets audited regularly however. Not
| everyone needs the same level of audit logging.
|
| You need to figure out how you can answer the above
| questions given your architecture. Normally the "Why"
| question is hard to answer without talking with a human,
| but unless you have the who, what and when, it's nearly
| impossible to even get to the Why part of the question.
| lexh wrote:
| It is Postgres specific, but I've gotten a lot of mileage
| out of the advice in this article:
|
| https://supabase.com/blog/postgres-audit
| mrkeen wrote:
| I once worked in a small VB6-based team - you can probably
| guess the attitudes and surrounding tech were just as out
| of date.
|
| I tried to push for using svn, rather than just making
| copies of our source code folders and adding dates to them.
|
| My manager allowed me to use svn, but to make sure I also
| did things the proper way by making copies of the source
| code folders.
|
| That's the current level of discourse around audit logs.
| Write down what happened using your data tables ... but
| write down what _really_ happened in the audit logs.
|
| At some point you should just lean into putting audit logs
| first (just like developers reach for git first).
| refset wrote:
| > Maybe there are better patterns out there to make this
| cleaner
|
| SQL:2011 temporal tables are worth a look.
| j_w wrote:
| Financial world: records have a "close" or "expire" date which
| is then purged after some period of time. A deletion doesn't
| just happen, the record is updated to be "closed" or "expired"
| and some time after that it's deleted.
|
| Something like a loan could live in a production environment
| for well over a year after closing, while an internal note may
| last just a month.
| PeterStuer wrote:
| There can be legal requirements to retain data for a specified
| time for law enforcement and audits, while at the same time
| other legal requirements require you to delete data upon
| customer request.
|
| Doing this with pure 'hard' deletes is not possible, unless you
| maintain 2 different tables, one of which would still have the
| soft delete explicit or implicit. You could argue the full db
| log would contain the data for the former requirement, but
| while academically correct this does not fly in practice.
| metanonsense wrote:
| In our product, we have different strategies depending on the
| requirements. Sometimes, we just delete. Sometimes, we do soft
| delete with timestamps. Sometimes, we have a history table with
| or without versioned entities. Sometimes, we have versions in
| the table. Sometimes, we have an audit log. Sometimes, we use
| event sourcing (although everyone in the team hates it ;-)
| ffitch wrote:
| curious that both YAGNI and YAGRI arguments could realistically
| be made for the same fields. guess it boils down to whether
| someone's YAGRI is stronger than their colleague's YAGNI ( :
| nailer wrote:
| I have a different way of thinking about this: data loss. If you
| are throwing away data about who performed a delete it is a data
| loss situation. You should think about whether that's OK. It
| probably isn't.
| zie wrote:
| Literally just have a good audit log and then you get all of this
| for free and more.
| shaunofthedev wrote:
| The perils of UI design wagging the horse.
|
| I like the heuristics described here. However if these things
| aren't making it into a product spec where appropriate, then I
| smell some dysfunction that goes beyond what's being stored by
| default.
|
| Product need (expressed as spec, design, etc) should highlight
| the failure cases where we would expect fields like these to be
| surfaced.
|
| I'd hope that any given business shouldn't need someone with
| production database access on hand to inform as to why/when/how
| 'thing' was deleted. Really we'd want the user (be it 'boss' or
| someone else) to be able to access that information in a
| controlled manner.
|
| "What information do we need when something goes wrong?". Ask it.
| Drill it. Ask it again.
|
| That said, if you can't get those things, this seems a fine way
| to be pragmatic.
| Hasnep wrote:
| "Wagging the horse" is a great turn of phrase, better than
| "putting the cart before the dog."
| sgarland wrote:
| > But I've never heard someone complain about a table having too
| many timestamps.
|
| I do. Each one is 8 bytes. At the billions of rows scale, that
| adds up. Disk is cheap, but not free; more importantly, memory is
| not cheap at all.
| hoipaloi wrote:
| Counter point: "Soft Deletion Probably Isn't Worth It"
| https://brandur.org/soft-deletion
| jimbokun wrote:
| But auditing probably is.
| drewcoo wrote:
| To every point there is a counterpoint. So what?
| yoshicoder wrote:
| Well, in the same vein that we discuss "points" and talk about
| their merits, it's useful to discuss and understand their
| counterpoints. I for one did not know about this and thought it was
| insightful when building a product that hasn't fully been
| scoped out and is more greenfield
| im_anon_on_hn wrote:
| As an acronym, it's easy to be misremembered as "You ARENT gonna
| read it" (based on the popularity of yagni) - and have the
| opposite advice spread..
| 000ooo000 wrote:
| How about: IIPTWNIRTIoSoDIQMBSNRTL
|
| (It Is Probable That While Not Immediately Required The
| Implementation of Storage of Data In Question May Be Simpler
| Now Rather Than Later)
|
| I've gone ahead and included additional detail in the acronym
| in the event that the clarity is required later, as this would
| be difficult to retrofit into a shorter, more-established
| acronym.
| willismichael wrote:
| Why do I feel like that should be a cheat code in a FPS game?
| __float wrote:
| YRIS: you'll read it someday
| oofManBang wrote:
| How do you distinguish from "you aren't gonna read it"? The
| acronym is poorly designed.
| gorgoiler wrote:
| Additionally, mutable fields will quite often benefit from having
| a separate _edit_ table which records the old value, the new
| value, who changed it, and when. Your main table's _created_ and
| _updated_ times can be a function of (or a complement to) the
| _edit_ table.
|
| It is tempting to supernormalize everything into the relations
| _object(id, type)_ and _edit(time, actor_id, object_id, key,
| value)_. This is getting dangerously and excitingly close to a
| graph database implemented in a relational database! Implement
| one at your peril -- what you gain in schemaless freedom you also
| lose in terms of having the underlying database engine no longer
| enforcing consistency on your behalf.
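A toy version of the edit-table idea, with the main table's updated-at derived from it (SQLite, invented names; a real implementation would likely use triggers or ORM hooks rather than a hand-rolled helper):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE doc (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE edit (
    at        TEXT DEFAULT (datetime('now')),
    actor_id  INTEGER,
    object_id INTEGER,
    key       TEXT,
    old_value TEXT,
    new_value TEXT);
""")

def set_title(conn, actor_id, doc_id, new_title):
    # Record old value, new value, who, and when; then apply the change.
    old = conn.execute("SELECT title FROM doc WHERE id = ?",
                       (doc_id,)).fetchone()
    conn.execute("INSERT INTO edit (actor_id, object_id, key, old_value, "
                 "new_value) VALUES (?, ?, 'title', ?, ?)",
                 (actor_id, doc_id, old[0] if old else None, new_title))
    conn.execute("INSERT INTO doc (id, title) VALUES (?, ?) "
                 "ON CONFLICT(id) DO UPDATE SET title = excluded.title",
                 (doc_id, new_title))

set_title(conn, actor_id=7, doc_id=1, new_title="v1")
set_title(conn, actor_id=8, doc_id=1, new_title="v2")

# updated_at becomes a *function* of the edit table:
updated_at = conn.execute(
    "SELECT MAX(at) FROM edit WHERE object_id = 1").fetchone()[0]
history = conn.execute(
    "SELECT actor_id, old_value, new_value FROM edit").fetchall()
print(history)   # [(7, None, 'v1'), (8, 'v1', 'v2')]
```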
| crabmusket wrote:
| > This is getting dangerously and excitingly close to a graph
| database implemented in a relational database!
|
| This feels like a great unresolved tension in database /
| backend design - or maybe I'm just not sophisticated enough to
| notice the solutions?
|
| Is the solution event sourcing and using the relational
| database as a "read model" only? Is that where the truly
| sophisticated application developers are at? Is it really
| overkill for everybody not working in finance? Or is there just
| not a framework that's made it super easy yet?
|
| Users demand flexible schemas - should we tell them no?
| wavemode wrote:
| > supernormalize everything into the relations object(id, type)
| and edit(time, actor_id, object_id, key, value)
|
| I frankly hate this sort of thing whenever I see it. Software
| engineers have a tendency to optimize for the wrong things.
|
| Generic relations reduce the number of tables in the database.
| But who cares about the number of tables in the database? Are
| we paying per table? Optimize for the data model actually being
| understandable and consistently enforced (+ bonus points for
| ease of querying).
| motorest wrote:
| > Additionally, mutable fields will quite often benefit from
| having a separate edit table which records the old value, the
| new value, who changed it, and when.
|
| Aren't you describing a non-functional approach to event
| sourcing? I mean, if the whole point of your system is to track
| events that caused changes, why isn't your system built around
| handling events that cause changes?
| smitty1e wrote:
| YAGRI proponents organized themselves into a community to develop
| their...YAGRIculture.
|
| I'll show myself out.
| alganet wrote:
| *_at and *_by fields in SQL are just denormalization + pruning
| patterns consolidated, right?
|
| Do the long walk:
|
| Make the schema fully auditable (one record per edit) and the
| tables normalized (it will feel weird). Then suffer with it,
| discover that normalization leads to performance decrease.
|
| Then discover that pruned auditing records is a good middle
| ground. Just the last edit and by whom is often enough (ominous
| foreshadowing).
|
| Fail miserably by discovering that a single missing auditing
| record can cost a lot.
|
| Blame database engines for making you choose. Adopt an
| experimental database with full auditing history. Maybe do
| incremental backups. Maybe both, since you have grown paranoid by
| now.
|
| Discover that it is not enough again. Find that no silver bullet
| exists for auditing.
|
| Now you can make a conscious choice about it. Then you won't need
| acronyms to remember stuff!
| klysm wrote:
| My current state is have the database be the current state and
| use logical replication (CDC) to keep the log of changes in
| case you need it
| datadrivenangel wrote:
| It is interesting thinking about record changes as a spectrum
| towards application logs. At some point too much detail is
| expensive to store, and you must adopt an archival strategy.
| klysm wrote:
| Really depends on the app. If you have a low throughput
| line-of-business kind of application you can probably get
| away with storing everything.
| alganet wrote:
| If you see it from the pure SQL point of view, you are in the
| "blame database engines and adopt an experimental solution" stage.
|
| It is the point where you give up modeling the audit as part
| of the systems tables.
|
| The drawbacks of this choice are often related to retrieval.
| It depends on the engine.
|
| I once maintained a system that kept a fully working log
| replicated instance delayed by 24h, ready for retrieval
| queries, in addition to regular disk backups (slow costy
| retrieval).
|
| I am more developer than DBA, so I can probably speak more
| about modeling solutions than infra-centric solutions.
| klysm wrote:
| Yeah 100% giving up on pure SQL to solve the problem,
| mainly from the perspective that doing full versioning etc.
| in SQL is really damn hard.
| globular-toast wrote:
| The problem with this is the audit log is only at the CRUD
| level which is often too low. Ambiguities can arise. For
| example if the question is "who published the article" do you
| look for a create or do you look for an update with
| published=true? It's even worse when you consider the schema
| can change over time, so both can be correct but at different
| points in time. Event sourcing is the way if you want to
| capture business-level events.
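The distinction shows up in a toy event-sourced query: the business intent lives in the event name, so "who published it?" needs no schema archaeology (the events and helper here are hypothetical):

```python
# Hypothetical business-level events, replayed to answer the question.
events = [
    {"type": "ArticleCreated",   "actor": "ann", "id": 1},
    {"type": "ArticlePublished", "actor": "bob", "id": 1},
]

def who_published(events, article_id):
    # The intent is explicit in the event name; no guessing whether a
    # row UPDATE with published=true "meant" publishing.
    for e in events:
        if e["type"] == "ArticlePublished" and e["id"] == article_id:
            return e["actor"]
    return None

assert who_published(events, 1) == "bob"
```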
| lud_lite wrote:
| Another option is audit info could go to another table or
| datastore entirely.
|
| If you never use it, that data can be dumped to s3 glacier
| periodically (e.g. after 90 days).
|
| By losing the foreign key you gain flexibility in what you
| audit. Maybe audit the operation and not the 20 writes it
| causes.
| yes_man wrote:
| Fair enough, but now your application is relying on 100%
| uptime of AWS and S3 and no network failures in between. And
| what happens if your transaction goes through, but the
| request to AWS doesn't? What happens if another operation
| mutates the target meanwhile before you can retry with
| current state? Your app is also slowing down since it needs
| to send the events to S3 and guarantee they got there. Now
| you are reinventing two-stage commits. Unless you aren't
| actually making an audit log and don't care if events are
| guaranteed to be logged?
|
| So like OP said, no silver bullets exist for auditing.
| lud_lite wrote:
| Correct. This is a system design problem. You want this to
| be transactional and work at scale? That might be hard to
| achieve. Maybe if the data can be partitioned, then each node
| handles its own auditing in a table as part of the
| transaction. There are many possibilities. Allowing
| inconsistency might be OK too depending on what is
| required.
| awesome_dude wrote:
| But wait, there's Event Driven Architectures and Event
| Sourcing, meaning that the events are your log of edits!
| alganet wrote:
| Doesn't that also fall under the "blame the database engines
| and go for an experimental solution"?
|
| I'm not saying databases are blameless. It's just that
| experiencing the issues they have by yourself is rewarding!
|
| There is also a walk before the long walk of databases. Store
| things in text files and use basic tools (cat, sed, sh...).
|
| The event driven stuff (like Kafka) reminds me of that. I am
| not very familiar with it though, just played a little bit
| with it once or twice.
| awesome_dude wrote:
| Kind of. The WAL in Postgres is effectively an event log,
| and many people keep replicas of it for backup reasons,
| which is auditable. So EDA/event sourcing is just a
| shinier version of that?
| guy4261 wrote:
| The author is being kind! In practice, I have often seen
| only the CR/CRU of CRUD get implemented.
|
| For example: as a company aspires to launch its product, one of
| the first features implemented in any system is to add a new
| user. But when the day comes when a customer leaves, suddenly you
| discover no one implemented off-boarding and cleanup of any sort.
| crazygringo wrote:
| These are not decisions that should be taken solely by whoever is
| programming the backend.
|
| They need to be surfaced to the product owner to decide. There
| may very well be reasons pieces of data should _not_ be stored.
| And all of this adds complexity, more things to go wrong.
|
| If the product owner wants to start tracking every change and by
| who, that can completely change your database requirements.
|
| So have that conversation properly. Then decide it's either not
| worth it and don't add any of these "extra" fields you "might"
| need, or decide it is and fully spec it out and how much
| additional time and effort it will be to do it as a proper
| feature. But don't do it as some half-built just-in-case "favor"
| to a future programmer who may very well have to rip it out.
|
| On a personal project, do whatever you want. But on something
| professional, this stuff needs to be specced out and accounted
| for. This isn't a programming decision, it's a product decision.
| hamandcheese wrote:
| Some things are trivial and nearly free - created_at,
| updated_at. I don't think engineers need to bring trivialities
| like this to a "product owner". Own your craft.
| kace91 wrote:
| A soft delete might not be, for compliance reasons (GDPR and
| the like). Otherwise I agree.
| danielPort9 wrote:
| I think the tricky part lies on knowing which things can be
| done without consulting any product owner. I agree that
| created_at and updated_at don't cause any harm. deleted_at on
| the other hand cannot be decided by engineers only (mainly
| because of GDPR reasons: if something is expected to be
| totally deleted, then that must be it). As usual, these
| kinds of things are obvious to engineers with years of
| experience, not so much to newcomers.
| IanCal wrote:
| Although those can be more complicated, and it should be
| clear what they're for and _why_ they exist. Will this result
| in an object having an updated_at timestamp elsewhere in a
| larger application? Is it clear which properties that refers
| to?
| hombre_fatal wrote:
| I never worked at a place with product owners, but their post
| made me appreciate my roles where I'm trusted to help design
| the product myself. Yeesh.
|
| Being unable to even call the shot of whether a database
| table should have an updated_at or soft-delete sounds like a
| Dilbertian hellscape to me.
| bayindirh wrote:
| When the product you're developing is governed by regulations
| and standards you need to comply, owning your craft is doing
| things by the book, not adding fields on your own because it
| might be useful later.
| dogleash wrote:
| So what? I've worked places with lots of regulation. Part
| of every development job is learning the product domain. In
| that case devs become comfortable with reading
| standard/law/regulations and anticipating when software
| implementation might interact with the areas covered.
|
| Sure, there were people whose job was to offload as much
| compliance work as possible from everyone else: by turning
| it into internal requirements, participating in design
| discussions, and specializing in ensuring compliance. But
| trying to
| isolate the development team from it is just asking for
| micromanagers.
| bayindirh wrote:
| > So what?
|
| Think before you act. The machine has no brain. Use
| yours.
|
| > Part of every development job is learning the product
| domain.
|
| Yes.
|
| > In that case devs become comfortable with reading
| standard/law/regulations and anticipating when software
| implementation might interact with the areas covered.
|
| This is what I'm saying, too. A developer needs to think
| about whether what they are doing is OK under the
| regulations they're flying against. They need to ask for
| permission by asking themselves, "wait, is this OK under
| the regulation I'm trying to comply with?".
|
| > But trying to isolate the development team from it is
| just asking for micromanagers.
|
| Nope, I'm all for taking initiatives, and against
| micromanagement. However, I'm also against "I need no
| permission because I'm doing something amazing" attitude.
| So own your craft, "code responsibly".
| dogleash wrote:
| Oh, I thought you were disagreeing with hamandcheese's
| point that every little decision doesn't need to go
| through a product owner before anything happens.
| bayindirh wrote:
| No, not at all. by "the book", I meant regulations, not
| the management. :)
| Chris2048 wrote:
| > And all of this adds complexity, more things to go wrong
|
| That's a little vague given this specific example, which
| appears to be about maintaining some form of informative
| logging; though I don't think it necessarily needs to be in the
| form of a DB table.
| chii wrote:
| > don't do it as some half-built just-in-case "favor" to a
| future programmer who may very well have to rip it out.
|
| in other words - YAGNI !
| damnitbuilds wrote:
| Yes. Adding fields to a DB is not a risk-free thing a
| programmer should just do.
| Cthulhu_ wrote:
| A product owner may (probably does not) have these things on
| their radar, it's up to a developer to inform them of industry
| best practices like these.
| Extasia785 wrote:
| This entirely depends on the company culture. I worked in teams
| where every small decision is in the hand of the PO and I've
| worked in teams where a software engineer is a respected
| professional enabled to make their own technical decisions. I
| found the second option to create higher quality software
| faster.
|
| Also not sure what you mean by additional effort? Created_at,
| updated_at or soft-deletes are part of most proper frameworks.
| In Spring all you need is an annotation, I've been using those
| in major projects and implementation cost is around a few
| seconds with so far zero seconds of maintenance effort in years
| of development. At least those fields are solved problems.
| exe34 wrote:
| even better, assume soft deletes and inform them this is how
| delete works - if there's a hard requirement for hard delete,
| they will tell you.
|
| the hypothetical future programmer is you in two weeks.
| crazygringo wrote:
| But what if it's not a technical decision? What if there are
| legal implications around data retention that it's not your
| job to be aware of?
|
| I've been parts of teams where features had to be totally
| thrown out and rebuilt because developers made big
| assumptions that turned out to be wrong, because they didn't
| think it was worth it to check with the product owner.
| Because they _assumed_ it was only a "technical decision",
| or they _assumed_ they understood the customer needs despite
| never actually asking the customer.
|
| This doesn't mean checking with product around each line of
| your code, obviously. But deciding what information gets
| stored in the database, what level of event tracking you do,
| whether deletes are hard or soft -- these have massive
| _product_ implications, and potentially legal ones.
|
| And it _is_ additional effort. Now you have to write tests
| for all those things. Are the timestamps being stored
| correctly? Are the permission bits being stored correctly? Is
| "created_by" coming from the right user? Are we sure a
| malicious user can't spoof that? Do we care? Is "updated_at"
| _actually_ being updated on _every_ row change? But are we
| making sure "updated_at" is _not_ getting changed when we
| import data from a separate table? How often do we remove
| soft-deleted data in order to comply with privacy policies
| and regulations, and with what cron job, and who maintains
| that? Where do alerts go if the cron job fails? What happens
| if that employee leaves? I could go on and on and on.
|
| So that's what I mean by additional effort. It's not "around
| a few seconds". Because it's not just a technical question,
| it's a product one. It's a proper feature that needs to be
| properly defined and properly scoped out and properly tested.
| jeremyjh wrote:
| No. Just no. Put created_at, updated_at on every table. You
| are really grasping to find a problem with it, because
| there isn't one, and its been helpful in literally every
| job I've had for the last 28 years. Product owners don't do
| application support.
| PaulHoule wrote:
| I've seen product owners who _get blindsided every time_ by
| this sort of thing.
|
| On the other hand, in some shops there is a dedicated DBA who
| is in charge of database schemas and possibly everything else.
| Before it became fashionable to create a "service layer" where
| people access the database (now database(s)) through web
| endpoints, some organizations would put all the database access
| into a set of stored procedures managed by the DBA. Maybe
| that's extreme, but in the real world product owners come and
| go but the database is forever and deserves to have somebody
| speaking out for its interests.
| makeitdouble wrote:
| > And all of this adds complexity, more things to go wrong.
|
| This argument is one of the reasons why a backend engineer could
| just add the created_at and updated_at fields without asking a
| product owner.
|
| It doesn't make much sense from the pure product perspective,
| so the standard answer will be "no, let's add them when we have
| a real case they're needed". The product I'm inheriting right
| now misses these fields on half of the tables. Except when you
| really want the data, it won't be there as you're not going
| back in time.
|
| Trying to convince someone that it's worth it will also give
| the impression they're optional, when you already decided you
| need them. So at the end of the day, it's your
| responsibility as an engineer to do what's required,
| without punting it to non-technical stakeholders to cover
| your back.
|
| I also wouldn't ask a product manager if they think table
| schema should be orthogonal.
|
| Now keeping or not IPs or user IDs in a table is a whole
| different story and requires a lot more consulting, and not
| just with the PO.
| escot wrote:
| In a good work environment you'll have some telepathy with
| your product people and you won't have to communicate every
| minor change, which wastes time. It's similar to
| implementing a UI design -- you and the designer will
| develop a relationship where you don't need to communicate
| every pixel; you fill in the blanks, show results, repeat.
| arialdomartini wrote:
| While I like the YAGRI principle very much, I find that adding
|
| - updated_at
|
| - deleted_at (soft deletes)
|
| - created_by etc
|
| - permission used during CRUD
|
| to every table is a solution weaker than having a separate audit
| log table.
|
| I feel that mixing audit fields with transactional data in the
| same table is a violation of the separation of concerns
| principle.
|
| In the proposed solution, updated_at captures only the last
| change, a problem that a separate audit log table does not
| have.
| motorest wrote:
| Event sourcing also works great. You don't need an audit log
| per se if you already track a history of all commands that
| introduced changes to your system.
| arialdomartini wrote:
| Yep. But Event Sourcing comes with its own set of other
| problems. Also, I don't think this would apply to OP's post:
| with Event Sourcing you would not even have those DB tables.
| motorest wrote:
| The DB tables suggested by OP are akin to snapshots,
| whereas each event would require a separate data store. OP
| is trying to shoehorn event history into the snapshots,
| which hardly makes any sense.
| nine_k wrote:
| Event sourcing and "the right to be forgotten" are not always
| easy to marry.
| motorest wrote:
| > Event sourcing and "the right to be forgotten" are not
| always easy to marry.
|
| The absolute basics is to support snapshots and event
| replay. This is hardly rocket science.
| nine_k wrote:
| If you try to redact a part of the past, it can also
| affect the present, as any time traveler knows.
|
| Let's assume we want to remove every message related to
| user A.
|
| A photo by user B got to be the best of the day because
| it collected most upvotes. Without the A's vote, it's no
| longer so. The photo also got to become the best of the
| month because it was later voted as the top from the
| best-of-the-day entries, and received a prize. Should we
| now play the message stream without the A's upvote,
| things are going to end up radically different, or end up
| in a processing error.
|
| User B was able to send a message to user C, and thus
| start a long thread, because user A had introduced them.
| With user A removed, the message replay chokes at the
| attempt of B to communicate with C.
|
| One way is to ignore the inconsistencies; it deprives you
| of most of the benefits of event sourcing.
|
| Another way is anonymizing: replace messages about user A
| with messages about some null user, representing the
| removed users. This can lead to more paradoxes and
| message replay inconsistencies.
| motorest wrote:
| > If you try to redact a part of the past, it can also
| affect the present, as any time traveler knows.
|
| That's not how snapshots work. You record the state of
| your system at a point in time, and then you keep all
| events that occurred after that point. This means you
| retain the ability to rebuild the current state from that
| snapshot by replaying all events. I.e., event sourcing's
| happy flow.
|
| > User B was able to send a message to user C, and thus
| start a long thread, because user A had introduced them.
| With user A removed, the message replay chokes at the
| attempt of B to communicate with C.
|
| Not really. That's just your best attempt at reasoning
| how the system could work. In the meantime, depending on
| whether you have a hard requirement on retaining messages
| from removed users you can either keep them assigned to a
| deleted user or replace them by deleted messages. This is
| not a problem caused by event sourcing; it's a problem
| caused by failing to design a system that meets its
| requirements.
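The snapshot-plus-replay flow described above can be sketched in a few lines of plain Python. This is a toy illustration with invented names (`apply`, `replay`, a dict-of-counters state), not any particular event-sourcing framework: take a snapshot of the state at some point, discard the events before it (e.g. to honor a deletion request), and replaying only the remaining events still reproduces the current state.

```python
# Toy event store: state is a dict of counters, an event increments one key.
def apply(state, event):
    key, delta = event
    new = dict(state)
    new[key] = new.get(key, 0) + delta
    return new

def replay(snapshot, events):
    # Rebuild state by folding events over a starting snapshot.
    state = dict(snapshot)
    for e in events:
        state = apply(state, e)
    return state

events = [("a", 1), ("b", 2), ("a", 3), ("b", 1)]

# Take a snapshot after the first two events...
snapshot = replay({}, events[:2])
# ...then the older events can be erased; only the tail is kept.
tail = events[2:]

assert replay(snapshot, tail) == replay({}, events)  # same current state
print(replay(snapshot, tail))  # {'a': 4, 'b': 3}
```

The caveats in the thread still apply: if a to-be-erased event is *after* the latest snapshot, you must either take a fresh snapshot first or accept replay inconsistencies.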
| grey-area wrote:
| An audit log table often takes a huge amount of space compared
| to simple fields on the records so there are tradeoffs. Which
| solution is best depends on how important change logs are.
| wodenokoto wrote:
| I kinda agree, but don't underestimate the power of having
| things where people are looking.
|
| Put your documentation in doc strings where the function is
| defined - don't have a separate file in a separate folder for
| that. It might separate concerns, but no one is looking there.
|
| Similarly if those fields aren't nullable, someone trying to
| add new rows will have to fill in something for those metadata
| fields - and that something will now very likely be what's
| needed, rather than not pushing anything to the audit table.
|
| Obviously your app can outgrow these simple columns, but you're
| getting value now.
| simonw wrote:
| Related from a few years ago: PAGNIs, for Probably Are Gonna Need
| Its - https://simonwillison.net/2021/Jul/1/pagnis/
| KingOfCoders wrote:
| OT As the great Alone ("Last Psychiatrist") said, "if you read
| it, it's for you" - IYRIIFY
| silisili wrote:
| I don't like general advice like this, because it's too general.
| For many, it's probably good advice. For others, not so much.
|
| Anyone who has worked at a small company selling to large B2B
| SaaS can attest we get like 20 hits a day on a busy day. Most of
| that is done by one person in one company, who is probably also
| the only person from said company you've ever talked to.
|
| From that lens, this is all overkill. It's not bad advice;
| it's just that it will get quoted for scenarios it doesn't
| apply to. The same goes for K8s, or even microservices at
| large, and most 'do as I say' tech blogs.
| begueradj wrote:
| >I don't like general advice like this, because it's too
| general. For many, it's probably good advice. For others, not
| so much.
|
| That's true for any other good advice you may have heard of.
| ltbarcly3 wrote:
| This is good advice except for deleted_at. Soft deletion is
| rarely smart. Deleted things just accumulate and every time you
| query that table is a new opportunity to forget to omit deleted
| things. Query performance suffers a lot. It's just needless
| complexity.
|
| Instead, just for the tables where you want to support soft
| delete, copy the data somewhere else. Make a table like `deleteds
| (tablename text not null, data jsonb not null default '{}')` that
| you can stuff a serialized copy of the rows you delete from other
| tables (but just the ones you think you want to support soft
| delete on).
|
| The theory here is: You don't actually want soft delete, you are
| just being paranoid and you will never go undelete anything. If
| you actually do want to undelete stuff, you'll end up building a
| whole feature around it to expose that to the user anyway so that
| is when you need to actually think through building the feature.
| In the meantime you can sleep at night, safe in the knowledge
| that the data you will never go look at anyway is safe in some
| table that doesn't cause increased runtime cost and development
| complexity.
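A minimal sketch of this graveyard-table pattern, using SQLite and plain JSON text in place of Postgres jsonb (the `users` table, `hard_delete_with_copy` helper, and row contents are illustrative, not from the comment):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    -- One generic graveyard table instead of deleted_at on every table.
    CREATE TABLE deleteds (tablename TEXT NOT NULL,
                           data TEXT NOT NULL DEFAULT '{}');
""")
conn.execute("INSERT INTO users VALUES (1, 'alice', 'a@example.com')")

def hard_delete_with_copy(conn, table, row_id):
    # Serialize the row, stash it in the graveyard, then really delete it.
    cur = conn.execute(f"SELECT * FROM {table} WHERE id = ?", (row_id,))
    row = cur.fetchone()
    if row is None:
        return
    record = dict(zip([c[0] for c in cur.description], row))
    conn.execute("INSERT INTO deleteds (tablename, data) VALUES (?, ?)",
                 (table, json.dumps(record)))
    conn.execute(f"DELETE FROM {table} WHERE id = ?", (row_id,))

hard_delete_with_copy(conn, "users", 1)
print(conn.execute("SELECT COUNT(*) FROM users").fetchone()[0])  # 0
print(conn.execute("SELECT data FROM deleteds").fetchone()[0])
```

Live queries stay simple (no `deleted_at IS NULL` everywhere), and the graveyard can be archived or purged on its own schedule.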
| Aldipower wrote:
| Five years ago everybody would have laughed about "soft
| deletes" or "marked as deleted". Whoever thought this was a
| good idea from a data protection perspective? You are also
| lying to the faces of your users with such behavior. Shame.
| tliltocatl wrote:
| Except almost every database (and most storage devices
| nowadays) works this way - mark an entry as deleted, then batch
| delete a lot of entries during garbage collection. It's
| fundamentally impossible to efficiently erase a record from the
| middle of a file, except maybe by using an encryption tree,
| which would still be fairly inefficient.
|
| Actually erasing data is quite hard. Soft deletes don't add
| any new lies, they just move the lies to the upper layer.
| Aldipower wrote:
| Come on. With a manual "marked as deleted" it stays that
| way forever; it is not deleted and never will be, and the
| "deleted" data also lands in database backups, is still
| query-able, and so on. I do not care if the deleted data
| stays for a while on disk or in memory, as long as it will
| eventually be deleted by the garbage collector and isn't
| query-able anymore.
| nexo-v1 wrote:
| Shipped and supported enough startup apps to learn this the hard
| way: users will delete things they shouldn't, and you will be
| asked to explain or undo it. Soft deletes and basic metadata
| (created_at, deleted_by, etc.) have saved me multiple times --
| not for some future feature, just for basic operational sanity.
| IshKebab wrote:
| Yes! Why something happened is incredibly important. Gitlab made
| this mistake _hard_. We have a medium sized instance with some
| complex CI pipelines, and often they'll just get cancelled
| and it doesn't say why or even by whom. And anyone can do
| it! The only
| option is to ask the entire company "did anyone cancel this?"
| optician_owl wrote:
| It's a terrible post. What it suggests is to switch your
| head off and follow an overly generalised principle. I
| guess when somebody invents yet another acronym, it is a
| red flag.
|
| Data has its own life cycles in every area it passes through. And
| it's part of requirements gathering to find those cycles: the
| dependent systems, the teams, and the questions you need to
| answer. Mindlessly adding fields won't save you in every
| situation.
|
| Bonus point: when you start collecting questions while designing
| your service, you'll discover how mature your colleagues'
| thinking is.
| KronisLV wrote:
| I don't get why all of the big RDBMSes (PostgreSQL,
| MariaDB/MySQL, SQL Server, Oracle, ...) don't seem to have built
| in support for soft deletes up front and center?
| CREATE TABLE ... WITH SOFT DELETES
|
| Where the regular DELETE wouldn't get rid of the data for real
| but rather you could query the deleted records as well, probably
| have timestamps for everything as a built in low level feature,
| vs having to handle this with a bunch of ORMs and having to
| remember to put AND deleted_at IS NULL in all of your custom
| views.
|
| If we like to talk about in-database processing so much, why
| don't we just put the actual common features in the DB, so that
| toggling them on or off doesn't take a bunch of code changes in
| app, or that you'd even be able to add soft deletes to any legacy
| app that knows nothing of the concept, on a per table basis or
| whatever.
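Short of engine support, the usual emulation is a view over the live rows, so application queries never have to repeat the `deleted_at IS NULL` filter. A SQLite sketch (table and column names are illustrative; in Postgres you would add `INSTEAD OF` triggers to make the view writable):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders_all (id INTEGER PRIMARY KEY,
                             item TEXT,
                             deleted_at TEXT);
    -- The view is what the application reads; it hides soft-deleted rows.
    CREATE VIEW orders AS
        SELECT id, item FROM orders_all WHERE deleted_at IS NULL;
""")
conn.execute("INSERT INTO orders_all (id, item) VALUES (1, 'widget'), (2, 'gadget')")
# "Delete" order 2 by stamping it, not by removing the row.
conn.execute("UPDATE orders_all SET deleted_at = datetime('now') WHERE id = 2")

print([r[1] for r in conn.execute("SELECT id, item FROM orders")])  # ['widget']
print(conn.execute("SELECT COUNT(*) FROM orders_all").fetchone()[0])  # 2
```

This keeps the filter in exactly one place, which is most of what a built-in `WITH SOFT DELETES` would buy.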
| moritzwarhier wrote:
| Maybe my intuition is wrong, but to me this sounds like a
| violation of the principle of least power.
|
| "Soft deletes" is just a name for a regular write operation,
| with specific semantics.
|
| Adding a layer of magic to the DB for this doesn't seem right
| to me.
|
| And applications could have many different requirements for
| soft deletes, like the article points out. For example, the
| simplest version would be just a boolean "deleted" column, but
| it could also be "deleted_at", "deleted_by", etc.
|
| All of these cases require a bunch of code changes anyway, and
| the more complex ones could interfere with an implementation of
| this feature at the database level: such a transparent
| implementation couldn't access app-specific concerns such as
| user data, for example.
|
| Adding soft deletes to a legacy app that knows nothing about it
| would only work for a boolean flag and a maybe date-time value,
| unless the DBMS would also offer triggers for soft deletes etc?
|
| Seems to me to that this capability would make a DBMS much more
| complicated.
| Spivak wrote:
| I don't think it's all magic, because you have to implement
| soft deletes in your application layer anyway, and it would
| be nice to have a little DB support for it. It doesn't have
| to be some big thing; just having selects and such work
| only on undeleted rows by default, unless you ask for them,
| would be nice, so it doesn't pollute your code and doesn't
| make you always have to remember to point the gun away from
| your foot.
| PaulHoule wrote:
| I'd argue that what SQL needs is better facilities for code
| reuse, metaprogramming and such, it ought to give you the
| tools that you can make something that lets you add
| something to the language such that you can add
| ... WITH SOFT UPDATES
|
| and it adds to the table definition as well as to the
| schema that will cause subsequent statements to be
| rewritten. There's a lot of interesting logic (in the
| literal sense) in SQL that is hidden by a strange,
| irregular syntax that is more obvious in other approaches
| to databases such as Datalog. I think it was little
| appreciated outside the hardcore semantic web community
| that you could compile SPARQL + OWL to SQL and get powerful
| inference facilities. SQL is a great target for
| metaprogramming precisely because it is not Turing complete
| and that a code generator doesn't have to think at all
| about the order that events are sequenced in. It's kinda
| sad that metaprogramming tools for SQL are almost all pre-
| Chomsky and pre-dragon book internal DSLs like JooQ and
| SQLAlchemy which have their charms (JooQ's excellent
| integration with Java IDEs) but fall short of what could be
| done with SQL-to-SQL and SQL-to-X transformations.
|
| Stored procedures are great but many shops don't use them
| for various reasons. It doesn't help that they look like a
| mix of FORTRAN and COBOL and also come in a few variations
| from the (better) set-based PL/SQL of Oracle to the (worse)
| Transact-SQL based stored procs of Microsoft SQL Server and
| PostgreSQL. The other day I talked with Krisztian Szabo of
|
| https://schemamap.io/
|
| who developed a compiler that writes stored procs that do
| database synchronization.
|
| On the other hand, if you've got access to the internals of
| the frickin' database I think you can do something better
| than the ordinary application level soft updates. For
| instance a "customer record" might very well be not just a
| row in one table but maybe 15 rows in four tables that are
| inserted in a transaction and you want to be able to
| undelete them as a unit.
| hnthrow90348765 wrote:
| >Adding soft deletes to a legacy app
|
| As an aside, I've never found this to be worth it since you
| have to change too much and re-test everything for minimal
| user benefit and time savings. The effort is way worse if the
| code is not great in the first place. It can be a great
| decision to make before everything is written.
|
| Maybe it's worth it for files which are hard to reproduce,
| but you can also rely on DB backups to get those back. If
| people are regularly deleting things they're not supposed to,
| you're better off removing the user-facing delete actions,
| limiting the action to specific users, etc.
| hypeatei wrote:
| Temporal tables in SQL server fit this use-case[0], I think.
|
| 0: https://learn.microsoft.com/en-us/sql/relational-
| databases/t...
| rjbwork wrote:
| Available on postgres as an extension. It's a bit jank and
| doesn't have language integrated clauses like sql server.
| ajuc wrote:
| Oracle has flashback queries (SELECT ... AS OF timestamp).
|
| It's one of these things that are available but most people
| ignore it and implement it manually with created_at
| updated_at deleted_at columns etc. I think one reason for
| this is lack of support in ORMs and lack of standardization
| between RDBMSes.
| mastax wrote:
| Also System-Versioned Tables in MariaDB:
| https://mariadb.com/kb/en/system-versioned-tables/
| crazygringo wrote:
| Because it's too dependent on business logic.
|
| Different products will handle soft deletes differently. Which
| queries need to include soft-deleted rows and which don't? What
| about different levels of soft deletes, e.g. done by the user
| (can be undone by user) vs. done by an admin (can't be undone
| by user)?
|
| Implementing soft deletes yourself isn't hard. Yes you'll have
| to make a bunch of decisions about how they work in every
| circumstance, but that's the point.
| marcosdumay wrote:
| It's just not bothersome enough to deviate from the standard.
|
| If they did this, nobody would use it. They do lots of more
| useful things that people don't use because it's not portable.
|
| There's a sibling comment about temporal databases. Those solve
| a very bothersome problem, so a few people use them. That means
| that there's a chance soft deletes get adopted as a side effect
| of a much more complex standard.
| ajuc wrote:
| Oracle has this already. SELECT ... AS OF timestamp.
|
| It needs to be enabled of course and it's not free.
| perrygeo wrote:
| The whole model of RDBMS is based on mutable tuples; soft
| deletes don't make much sense as an intrinsic part of that
| model. If you want soft deletes, you create an application
| layer or use a different data model.
|
| Most of the time if you want "soft deletes", you really want an
| immutable log so that you time travel to any point in the
| history. XTDB and Datomic are worth looking at if you want to
| solve the problem at the data model level.
| 8bitme wrote:
| For `updated_at` and `deleted_at`, making them nullable and
| null until touched is incredibly useful.
|
| Answering queries like how many of these were never updated? Or
| how many of these were never cancelled?
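With nullable timestamps, those questions become one-line aggregates. A SQLite sketch with an invented `bookings` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE bookings (id INTEGER PRIMARY KEY,
                                       updated_at TEXT,
                                       deleted_at TEXT)""")
# NULL means "never touched", so the absence of an event is queryable.
conn.executemany("INSERT INTO bookings VALUES (?, ?, ?)", [
    (1, None,         None),
    (2, '2025-04-01', None),
    (3, None,         '2025-04-02'),
])

never_updated = conn.execute(
    "SELECT COUNT(*) FROM bookings WHERE updated_at IS NULL").fetchone()[0]
never_cancelled = conn.execute(
    "SELECT COUNT(*) FROM bookings WHERE deleted_at IS NULL").fetchone()[0]
print(never_updated, never_cancelled)  # 2 2
```

Defaulting these columns to the creation time instead would erase that distinction.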
| kamatour wrote:
| Why can't databases just remember stuff we delete, like a trash
| can?
| fwip wrote:
| Assuming you're serious, there are two main reasons. You want
| to regain storage space after you delete things, and sometimes
| you want to actually delete things (e.g. to be in compliance
| with regulations).
| ajuc wrote:
| They do if you tell them to. Oracle, MSSQL and PostgreSQL
| all have this implemented. I've only used it in Oracle,
| where it's called flashback queries: you do SELECT ... AS
| OF timestamp and you can see the state of the table from
| the past.
|
| It needs to be enabled tho, and it has some problems (for
| example you need to remove data for legal reasons sometimes).
|
| Reading this thread is very frustrating, with people
| reinventing the wheel and asking why it wasn't invented
| already :)
| yellowapple wrote:
| The problem with updated_at and updated_by is that a given record
| could experience multiple updates by multiple people at multiple
| times, and you'd only have visibility into the most recent.
|
| The logical conclusion here is to log the updates (and creations
| and deletions and undeletions and such) themselves:
CREATE TABLE foo_log (id, foo_id, whodunnit, action,
                      performed_at, column_1, column_2,
                      -- ...
                      column_n);
|
| Technically you don't even need the "foo" table anymore, since
| you can reconstruct its contents by pulling the most recent
| transaction for a given foo_id and discarding the reconstructed
| record if the most recent action on it was a deletion. Probably
| still a good idea to create a view or somesuch for the sake of
| convenience, but the point of this is that the log itself becomes
| the record of truth - and while this approach does cost some disk
| space (due to duplicated data) and read performance (due to the
| more complex query involved), it's invaluable for tracking down a
| record's full lifecycle. Even better if you can enforce append-
| only access to that table.
|
| This is a pretty typical approach for things like bookkeeping and
| inventory management (though usually those are tracking the
| deltas between the old and new states, instead of recording the
| updated states directly as the above example would imply).
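Reconstructing the current state from such a log is a "latest row per id" query that discards ids whose latest action is a delete. A SQLite sketch of that query, using the hypothetical `foo_log` layout trimmed to a single payload column:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE foo_log (
    id INTEGER PRIMARY KEY, foo_id INTEGER, whodunnit TEXT,
    action TEXT, performed_at TEXT, payload TEXT)""")
conn.executemany("INSERT INTO foo_log VALUES (?, ?, ?, ?, ?, ?)", [
    (1, 10, 'alice', 'create', '2025-01-01', 'v1'),
    (2, 10, 'bob',   'update', '2025-01-02', 'v2'),
    (3, 11, 'alice', 'create', '2025-01-03', 'x1'),
    (4, 11, 'alice', 'delete', '2025-01-04', None),
])

# Latest entry per foo_id; ids whose latest action is a delete are dropped.
current = conn.execute("""
    SELECT foo_id, payload FROM (
        SELECT foo_id, action, payload,
               ROW_NUMBER() OVER (PARTITION BY foo_id
                                  ORDER BY performed_at DESC) AS rn
        FROM foo_log)
    WHERE rn = 1 AND action != 'delete'
""").fetchall()
print(current)  # [(10, 'v2')]
```

This is the query you would hide behind the convenience view mentioned above; the log stays the record of truth.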
| PaulHoule wrote:
| Funny I've been developing an adaption layer that implements the
| functionality that I use in
|
| https://docs.python-arango.com/en/main/
|
| over tables in Postgres that have a primary _key and a
| JSONB document field. The issue is that I have a number of
| prototypes I've developed with ArangoDB, but the license is
| awful and I don't feel like I can either open source _or_
| commercialize any of them until I'm running on an open
| source database.
|
| It's a fun project because I don't need to support everything in
| python-arango, in fact I don't even need to support 100% of the
| features I use because I am free to change my applications. Also
| it's a chance to make the library that I _really_ want to use so
| already it has real integer and UUID primary keys.
|
| I just added a feature to have the library manage _created and
| _updated fields not just because I thought it was good in general
| but it was a feature I needed for a particular application, a
| crawler that fetches headlines from the HN API. I want to fetch
| headlines right away so I can avoid submitting duplicates but I
| also want accurate counts of how many votes and comments articles
| got and that involves recrawling again in two weeks. Of course
| _created and _updated are helpful for that.
| r7n wrote:
| We did exactly this when designing StatelyDB. We realized there
| are a set of metadata fields[0] that almost everyone needs when
| modeling data. Our approach takes this a step further in that we
| always track these fields transparently so if you forget to add
| them to your schema initially for any reason, that's okay, you
| can always add them later!
|
| [0] https://docs.stately.cloud/schema/fields/#metadata-fields
| imoreno wrote:
| It's not really YAGNI if you need it to debug, is it?
___________________________________________________________________
(page generated 2025-04-24 23:02 UTC)