[HN Gopher] Soft deletion probably isn't worth it
___________________________________________________________________
Soft deletion probably isn't worth it
Author : lfittl
Score : 340 points
Date : 2022-07-19 18:35 UTC (4 hours ago)
(HTM) web link (brandur.org)
(TXT) w3m dump (brandur.org)
| habibur wrote:
| Which is why I don't add that extra deleted field. Rather
| duplicate all the tables into a new database called "archive" and
| then insert there before deleting from main.
|
| That works for updates too, by preserving the old data and
| showing you a time machine like backlog. But the archive database
| gets too large over time and you need to purge it periodically.
| You can create some delete triggers for automating this "save
| before delete" behavior.
| tehbeard wrote:
| How do you account for maintaining integrity in the archive?
|
| E.g. you have 3 users sign up with the same email (a unique
| field) one after the other with deletions in-between each sign-
| up?
| habibur wrote:
| No PK, FK or Unique constrains on the archive. Rather use
| simple index to speed up queries.
| justin_oaks wrote:
| I'm not who you asked the question of, but I do sometimes
| make use of archive/deleted/history tables. I'll refer to
| them as history tables from here on out.
|
| In short, I leave data integrity to the original table and
| drop it for the history table.
|
| The history table isn't identical to the original table. It
| has it's own primary keys that are separate from the original
| table. It doesn't include the original table's unique
| constraints or foreign key constraints. It also generally has
| a timestamp to know when the record was put there.
| spfzero wrote:
| I like the deleted-items-table suggestion the author makes. It's
| useful though, to think about the cases where you'd want to
| delete, say a customer with existing invoices. In one situation,
| you may have made a mistake and want to start over, say an
| operator creates the customer and order, but then the customer
| changes their mind. In that situation a hard delete is in order;
| you want to "undo" the _creation_ of the customer and invoice,
| and nothing further has happened as far as referencing their key
|
| In other situations though, you may have some reason to treat the
| customer as if they were deleted, but better to examine the
| reason for that, and use an attribute more relevant to that
| reason, such as active/inactive etc. Would be different for
| different entities of course.
| jasonhansel wrote:
| It's pretty easy to solve the foreign key issue (where you need
| to write elaborate DELETE queries to avoid breaking foreign keys)
| in Postgres using deferrable constraints. Just start a
| transaction, run "SET CONSTRAINTS ALL DEFERRED," delete rows from
| various tables in any order, then commit the transaction. The
| DELETE statements will effectively ignore the foreign key
| constraints, but any remaining "broken" foreign keys will be
| caught when the transaction commits.
| bob1029 wrote:
| If you are going to think about this pattern, why not go one step
| further and simply event source everything with an append-only,
| immutable log?
|
| You could even sprinkle cryptographic guarantees into the mix.
| This would be very challenging to do with mutable DB rows.
| phibz wrote:
| I've definitely seen soft delete work in practice. A couple
| things: for small data sets you can implement the naive
| deleted_at you can hide the records from your users by forcing
| them to use a view. You can also handle updates on the view to
| prevent data conflicting with deleted data if you need to.
|
| For foreign key constraints you can set the foreign key to null
| and orphan the records if the relation is deleted. You could also
| hard delete them in this case. It depend on your use case.
|
| When the data volume grows or the ratio of soft deleted to normal
| records is high, you should consider another solution. One
| solution you suggested, moving the record to a deleted table is a
| fine one.
|
| The other solution that I've used successfully is to journal your
| deletions in another table or system. For smaller volumes having
| an audit table Journaling the data and storing the pk, fkeys, and
| a serialized version of the record, json works great in postgres,
| works well. For large volumes or frequent deletions something
| like Kafka or PubSub work better.
|
| You may very well find others interested in consuming your audit
| journal to track changes. Updates and even inserts fit great in
| the more general case.
| waspight wrote:
| I use soft deletes to maintain insights. For instance I would
| like to know how many users that has been created in total even
| if some has been deleted later on. Is this a bad approach? Most
| of the other comments here seems to use it only to be able to
| restore deleted entries.
| dafelst wrote:
| Views are a simple solution to this problem. Pretty much all
| moderns RDBMSs support updatable views, so creating views over
| your tables with a simple WHERE deleted_at IS NULL solves the
| majority of the author's problems, including (IIRC) foreign key
| issues, assuming the deletes are done appropriately.
|
| I feel like a lot of developers underutilize the capabilities of
| the massively advanced database engines they code against. Sure,
| concerns about splitting logic between the DB and app layers are
| valid, but there are fairly well developed techniques for keeping
| DB and app states, logic and schemas aligned via migrations and
| partitioning and whatnot.
| OJFord wrote:
| I had the same reaction to the 'code leakage' section, but
| 'foreign keys'? You can't reference a view; so you either don't
| use them (fks) or they point at the underlying table and you
| have the problem described.
|
| You could have views that say 'thing I have a foreign key to is
| not deleted' of course, but that sort of seems like 'code
| leakage' again, just in SQL this time.
| vlunkr wrote:
| That doesn't solve the foreign key problem. You can still
| easily have a reference to a record that is "deleted"
| yen223 wrote:
| Also in Postgres, you cannot have a foreign key constraint
| that references a view, not even a materialised view.
|
| I'm with the author on this one. Any soft delete logic does
| have a tendency to bleed into other systems and make your
| systems more complicated, for very little gain.
| 7crow wrote:
| > there are fairly well developed techniques for keeping DB and
| app states, logic and schemas aligned via migrations and
| partitioning and whatnot.
|
| Hi, <1 yr experience swe here. Would HN mind unpacking
| "whatnot" with specific names of some these techniques?
| semiquaver wrote:
| The main problem with views for this use case in practice is
| that they ossify your schema. Views and matviews are
| effectively a dependency tree, and many common types of schema
| evolution become substantially more difficult when the system
| forces you to wrap your DDL in a series of view drop/recreation
| steps.
|
| This is merely annoying when dealing with regular views because
| recreating even a large number of views is fast, but can be
| catastrophic if you have any matviews in your table dependency
| tree. A matview can easily turn what should be an instantaneous
| DDL operation into a partial outage while the matview is being
| regenerated.
|
| (this is all postgres specific, it may be untrue for other
| systems)
| smallnamespace wrote:
| To avoid an outage, have you tried fronting the matview with
| an additional view to allow hot-swapping?
| enepture wrote:
| As an FYI using a tool like DBT solves this problem. As
| someone who was not a data engineer I was not familiar, there
| were tools like this
| teej wrote:
| dbt is great, but I'm not sure it's appropriate to manage
| objects in a production transactional database. It's
| designed for analytical, columnar databases.
| cpursley wrote:
| I have this problem. What is DBT?
| banashark wrote:
| I believe this: https://docs.getdbt.com/docs/introduction
| PaulMest wrote:
| In case you're not familiar with DBT, it helps with the
| "transform" in ELT
| https://docs.getdbt.com/docs/introduction.
| slt2021 wrote:
| it is very dangerous to have dependency on materialized view
| - it is a poor architectural decision from DBA to do that.
|
| if you want view depending on mat view - materialize it
| yourself in a table, and refresh it yourself controllably.
| semiquaver wrote:
| Aren't you just saying that materialized views should not
| be used? I'm pretty sure that was my original point.
| ibejoeb wrote:
| > developers underutilize the capabilities of the massively
| advanced database engines
|
| So true. There are so many amazing, powerful features in all of
| the major players.
|
| Also: updatable views are amazing. With query rewriting
| (whatever you vendor calls it) you can affect some truly
| material changes to the system without any changes to the
| client applications. An example would be implementing temporal
| relations.
| chrisshroba wrote:
| How would a view solve the foreign key issue? Are you
| suggesting coding specific deletion triggers into the view such
| that appropriate foreign keys are "cascade" deleted when a row
| in the view is deleted?
| jermaustin1 wrote:
| because when you are getting a child record through a join
| (in the view), the parent will never have deleted_at set.
|
| Say I have a simple view `select * from foo join bar on
| foo.foo_id = bar.foo_id where foo.deleted_at is null`
|
| I never have to worry about deleting from bar, because I
| should never grab a child when the parent is 'deleted'.
| thehappypm wrote:
| Assuming that the joins are always joined against the
| views, not the raw tables. This introduces an opportunity
| for a mistake.
| rodw wrote:
| Seriously. That "Downsides: Code leakage" point is nonsensical.
|
| ``` CREATE OR REPLACE VIEW active_customer AS SELECT * FROM
| customer WHERE deleted_at IS NULL OR deleted_at <= NOW() ; ```
|
| There, I fixed it.
|
| Just use `active_customer` instead of `customer ... deleted_at
| IS NULL`.
|
| In fact, since the deleted_at column is a timestamp, the
| original "leakage" query:
|
| ``` SELECT * FROM customer WHERE id = @id AND deleted_at IS
| NULL; ```
|
| is actually broken. A non-null `deleted_at` timestamp that's in
| the future implies the record hasn't been deleted yet, right?
|
| I've often had junior devs assert that views are some kind of
| code smell, but these sorts of "canned query/filter that you
| want to apply very very often" seem like the perfect use case
| for a view to me. It's DRY, and the fact that your standard
| "query" is in the database" means you can change it more
| readily than trying to make sure you hit all the points it
| might be embedded in the application code.
|
| > I feel like a lot of developers underutilize the capabilities
| of the massively advanced database engines they code against
|
| Early-ish in the JDBC days a senior dev I was working with at
| the time (as a junior dev myself) made a pretty good case that
| "the database is part of the application" that's always stuck
| with me. Full database independence via software level
| abstractions is a pretty silly goal outside of library code. If
| you have a service that makes extensive use of the database,
| don't throw away the database features in the interest of some
| abstract "we could swap out oracle with mysql without changing
| anything" objective. If you want it to be generic, use the SQL
| standard, but don't be afraid to have a few db-specific bits in
| the app code if that's a subsystem you _might_ replace once a
| decade or something.
|
| I blame the DBA/Dev divide for a lot of this. A lot of the
| impedance between these layers is social/procedural. If you can
| change the DB as easily as the code, there's a lot less fear of
| using the right tool for the specific job.
| doctor_eval wrote:
| > the database is part of the application
|
| 100% this. If you accept that the database is part of the
| application, you give yourself permission to use the full
| feature set of the database, and life becomes a lot simpler.
| Using views, stored procedures and other features lets you
| implement things like soft delete trivially, without it
| infecting all your application code.
|
| In my entire career I've changed backend databases for an
| application exactly twice. It's not easy, and no amount of
| abstraction is likely to make it easier.
| djur wrote:
| The query isn't broken. In the Rails community at least it is
| very common to use a nullable frobbed_at column to indicate
| both "was it frobbed" and "when was it frobbed". In that
| context, the boolean check is always NULL/NOT NULL, rather
| than a time comparison.
| madisp wrote:
| I'd argue that the simple `deleted_at IS NULL` check is not
| broken - unless your product / domain specifically allows and
| requires scheduled future deletions adding such logic can
| easily introduce bugs. For example, you could to get the
| comparison flipped by accident, and if it's only in one place
| out of many that bug could go unnoticed for a while.
| doctor_eval wrote:
| You don't put the check in your code, you put it in the
| view, and access the data exclusively through that view. In
| that way, the check is defined exactly once.
| nousermane wrote:
| > (...updatable view...) WHERE deleted_at IS NULL
|
| This is the way. Also, save record creation timestamp, and you
| can have very flexible "time-machine" selects/views of your
| table essentially for free.
| layer8 wrote:
| How does this help with foreign keys? Normally you can't have
| foreign keys referencing a view.
|
| I agree that one should make use of RDBMS capabilities. A check
| constraint may be practical instead of (or in addition to) the
| foreign-key constraint.
| zozbot234 wrote:
| Views can be used to implement pretty much any kind of
| automated inference, reasoning, rules etc. on the "raw" table
| data. The example of filtering out deleted records is just one
| of the simplest. That one single feature can easily transform a
| simple DB platform into a fully-featured knowledge base system,
| easily usable to support even complex reasoning tasks.
| aeyes wrote:
| At least in Postgres, having a huge amount of "dead" data in
| large tables is problematic because vacuum always has to read
| the full data set.
|
| Even with conditional indexes where you exclude deleted data
| you take a significant performance hit reading dead blocks
| because there is no way to quickly vacuum them. You accumulate
| hours of bloat until your vacuum finishes.
|
| You can't beat a separate insert only archive table which you
| never have to vacuum.
| anarazel wrote:
| Vacuum does not have to read the full data set every time.
| The visibility map tracks, on a block level, whether all rows
| in a page are known to be visible to everyone (starting in
| 8.4 or such) and whether the page is "frozen", i.e., does not
| contain visibility information that might need vacuuming
| (starting in 9.6 IIRC).
|
| However, indexes _do_ currently have to be scanned as a
| whole. But that 's only done by autovacuum if there's enough
| row versions for that to be worth it (in recent versions).
| layer8 wrote:
| Shouldn't partitioning help with that? (I have no experience
| with Postgres.)
| tomnipotent wrote:
| Only if those partitions are on separate storage, otherwise
| you have the same number of dead tuples/dirty pages.
| SoftTalker wrote:
| A problem (unless something has changed, my context is Oracle
| from some time ago) is that NULL values are not indexed. So the
| "WHERE deleted_at IS NULL" could trigger a full table scan. It
| can also cause row migration when the NULL value is eventually
| filled in. Unless you explicitly need the deleted date, it's
| probably better to use a non-nullable Y/N for this.
| remram wrote:
| It seems Oracle does it although there is a special syntax to
| opt-in. That seems wild. I am not aware of another DBMS
| having that limitation though.
| firloop wrote:
| Views can really bite you performance wise, at least with
| Postgres. If you add a WHERE against a query on a view,
| Postgres (edit: often) won't merge in your queries' predicates
| with the predicates of the view, often leading to large table
| scans.
| AdrianB1 wrote:
| There is no impact with views in MS SQL. You can also have
| indexed views and filtered indexes, so you can have even
| better performance.
| dafelst wrote:
| IIRC Postgres has supported predicate push down on trivial
| views like this for over a decade now, and possibly even more
| complex views these days (I haven't kept up with the latest
| greatest changes).
| firloop wrote:
| Postgres can do it, you're correct, but in my experience it
| rarely happens with any view that's even slightly non-
| trivial even on recent versions of Postgres. Most views
| with a join break predicate pushdown. It greatly reduces
| the usecases of views in practice.
| sarchertech wrote:
| I haven't had any problems with this at all and I've been
| using joins in my views for years.
|
| Are you using CTEs in your views?
| tomnipotent wrote:
| Postgres 12 fixed the CTE issue.
| bavell wrote:
| Postgres docs on CTEs:
| https://www.postgresql.org/docs/current/queries-with.html
| [deleted]
| [deleted]
| jaydub wrote:
| Based on my experience, I like the author's approach since it
| makes things pretty clear-cut and optimized the storage in the
| core table (in my experience as well, deletes happen frequently
| and the soft deletes are rarely touched). In large, row-
| oriented tables that that storage can add up and even with
| views/materialized views there's a cost to using/maintaining
| those as well.
| 5e92cb50239222b wrote:
| This is one of those situations where a good ORM can simplify
| things greatly. For example, with EF Core you can add a global
| filter which will filter out soft-deleted rows in all queries
| automatically (unless you add .IgnoreQueryFilters()).
|
| It couples nicely with some hackery which turns removes into
| soft-deletes. You can remove objects as usual and they get
| soft-deleted in the database.
|
| I've used this in a few projects and it's fantastic.
|
| https://docs.microsoft.com/en-us/ef/core/querying/filters
|
| https://www.thereformedprogrammer.net/ef-core-in-depth-soft-...
| airstrike wrote:
| Views are such a powerful concept I'm honestly disheartened by
| how hard it is to use, replicate or leverage that functionality
| outside of dropping straight into the db shell
| quickthrower2 wrote:
| See http://materialize.com
| pbardea wrote:
| > assuming the deletes are done appropriately
|
| This is one gripe I have with soft-deletion. Since I can no
| longer rely on ON DELETE CASCADE relationships, I need to re-
| defined these relationship between objects at the application
| layer. This gets more and more difficult as relationships
| between objects increase.
|
| If the goal is to keep a history of all records for compliance
| reasons or "just in case", I tend to prefer a CDC stream into a
| separate historical system of record.
| dexwiz wrote:
| You may end up doing this anyways if you have any application
| code that needs access to delete hooks, or access control
| varies across objects. At this point, you are probably using
| a ORM instead of direct queries, and place logic that could
| be in the db instead at the app layer.
| danielrhodes wrote:
| Being unable to effectively use foreign key relationships is
| definitely a downside of using soft deletes. But it's also
| worth asking if these types of behaviors, which would also
| include a feature like triggers, really belongs in a database
| or whether it's better to have at the application level (or
| at least at a layer above the data layer). I'd argue that
| ultimately you probably don't want these things at the DB
| level because you get into a situation where you're sharing
| business logic between two (or more places).
| Volundr wrote:
| I'm less likely to use triggers, but I'll say I pretty much
| always want proper foreign key relationships set up in the
| database. Unique and other constraints too. In principal I
| might agree with you that it's an application level
| concern, but being able to setup these kind of invariants
| and trust that they will be enforced even in the face of
| bugs in my code (and there will be bugs in my code) is just
| too powerful to let go of in the name of purity. I'd much
| rather let a bit of business logic creep between multiple
| layers that discover that I have a whole bunch of orphaned
| records and no sensible way to reconcile them.
| corrral wrote:
| The DB's responsible for maintaining the integrity of data
| in it, unless there's some very good reason you can't let
| it do that. It's faster and better at it than your
| application, 99% of the time, and it can keep doing that
| even when/if a _second_ application starts using the same
| database, or when being used from a SQL prompt, or
| whatever.
| mmerlin wrote:
| My perspective is DB level triggers are the absolute very
| best place to put cascading update/delete logic so it only
| ever needs to be written once and is consistent regardless
| of any future frontend clients that might be written in a
| different language and/or framework than the original
| codebase.
|
| Right now in $dayjob I am converting an old non-DRY
| codebase from NoSQL data layer format to proper relational
| SQL backend.
|
| This old front-end was created by verbosely coding up a
| relational cascading update/delete system for the NoSQL
| backend, in numerous places redundantly with subtle
| differences and inconsistencies, making the code brittle.
|
| My current estimate is some front end functions will be
| reduced in LOC size by 95% once we use the power of SQL in
| the backend.
|
| And the backend SQL Triggers+StoredProcedure required to
| replace these long NoSQL frontend functions doing cascading
| updates/deletes is only around 10% the size of the replaced
| front-end code.
|
| And now future new frontends can reuse this without
| exploding the size of their codebase where complex data
| operations are required. And no need to reinvent the same
| data handling algorithm all over again (and risk subtle
| variation creeping in from the different front-end
| implementation of data algorithms)
| munk-a wrote:
| If we're assuming you're using a view based approach which
| elides the soft deleted rows automatically then you'll get a
| lot of these dependent objects correctly updated for free
| assuming you're pulling them out of the DB with JOINs -
| SELECT FROM foo JOIN bar (assuming bar is a view into
| barwithdeleted) will automatically filter out the invalid
| rows from foo... if you're using this information to populate
| a CRUD interface it's likely you'll be JOINing bar already to
| get some metadata for display (like maybe bar.name instead of
| the surrogate bar.id key you use for joining).
| nfm wrote:
| Yes, but other queries (any aggregate queries that don't
| join the soft deleted table, any joins to other tables)
| will now return rows that would have been deleted under
| hard deletion with cascade.
| corrral wrote:
| You could use a trigger to cascade soft-delete flag
| toggles, provided all the relevant tables have such a
| column. Still have to factor that into your other
| queries, but at least you wouldn't have to make
| potentially-impossible-to-exhaustively-check joins to
| figure out if a given row was "deleted".
| munk-a wrote:
| This is definitely something to watch out for, but in
| practice (as someone that migrated a system without soft
| deletes to one that had them) I found that it doesn't
| tend to come up nearly as much as you might think -
| usually the table being transitioned to support soft
| deletes is a relatively core table (since ancillary
| tables usually aren't worth the complexity to transition)
| so a lot of your reporting queries will already be
| pulling in that table. You definitely need to check to
| make sure you're not missing anything - and sometimes
| CRUD interfaces will need to be completely revamped to
| include the table in question - but it's usually not
| _that_ hard.
| dspillett wrote:
| _> Since I can no longer rely on ON DELETE CASCADE
| relationships_
|
| Cascaded deletes scare me anyway. It only takes one idiot to
| implement UPSERT as DELETE+INSERT because it seems easier,
| and child data is lost. You could always use triggers to
| cascade you soft-delete flags as an alternative method,
| though that would be less efficient (and more likely to be
| buggy) than the built-in solution that cascaded deletes are.
|
| If you look at how system-versioned (or "temporal") tables
| are implemented in some DBMSs, that is a good compromise. The
| history table is your audit, containing all old versions of
| rows even deleted ones, and the base table can be really
| deleted from, so you don't need views or other abstractions
| away from the base data to avoid accidentally resurrecting
| data. You can also apply different storage options to the
| archive data (compression/not, different indexes, ...
| depending on expected use cases) without more manaully
| setting up partitioning based on the deleted/not flag. It can
| make some query times less efficient (you need to union two
| tables to get the latest version of things including deleted
| ones, etc.) but they make other things easier (especially
| with the syntactic sugar like AS AT SYSTEM_TIME <when> and so
| forth) and yet more things are rendered possible (if
| inefficient) where they were not before.
|
| _> I tend to prefer a CDC stream into a separate historical
| system of record._
|
| This is similar, though with system versioned tables you are
| pretty much always keeping the history/audit in the same DB.
|
| ---
|
| FWIW: we create systems for highly regulated finance
| companies where _really_ deleting things is often verboten,
| until it isn 't and then you have the other extreme and need
| to _absolutely_ purge information, so these things are often
| on my mind.
| michaelt wrote:
| _> It only takes one idiot to implement UPSERT as
| DELETE+INSERT because it seems easier, and child data is
| lost._
|
| Seems unfortunate to miss out on all the referential
| integrity benefits of a serious database when hiring
| standards, training and code reviews should all be
| preventing idiotic changes.
|
| If I'm making a shopping cart system, I want to know every
| order line belongs to an order, every order belongs to a
| user and so on. Anyone who can't be trusted to write an
| update statement certainly can't be trusted to avoid
| creating a bunch of orphan records IMHO.
| wvenable wrote:
| Often you don't have to rely on ON DELETE CASCADE
| relationships. Because you are never deleting anything, you
| will never have any orphaned records. If you don't want to
| see say Invoices for a deleted Customer then that's just
| another filter feature.
|
| Mostly I use soft-delete because for auditing requirements we
| pretty much can't remove anything but also because nothing
| ever truly goes away. If we have an Invoice or Order then,
| from our perspective, we must have those forever even if the
| corresponding client is deleted and can never place another
| one.
| nkozyra wrote:
| > Often you don't have to rely on ON DELETE CASCADE
| relationships. Because you are never deleting anything, you
| will never have any orphaned records
|
| Exactly. Unless you're doing something silly like adding
| deleted at to bridge tables ... which, you probably don't
| need even in 1:many.
| kukx wrote:
| Is not there any attempt to improve the soft deletion at the
| engine/SQL level? I can see it as a possible feature request.
| pbardea wrote:
| One interesting feature that some DBs implement is
| something like SELECT AS OF SYSTEM TIME
| (https://www.cockroachlabs.com/docs/stable/as-of-system-
| time....) which _kinda_ does this.
|
| However in practice this usually dramatically slows down
| reads if you have to constantly skip over the historic rows
| so you probably don't want to keep garbage round longer
| than absolutely necessary. The concept of a historic table
| mentioned below could be interesting though - especially if
| it could be offloaded to cold storage.
| simcop2387 wrote:
| There's the idea of temporal tables,
| https://pgxn.org/dist/temporal_tables/
|
| It's not a standard (I think) but it'd let you do a
| cascading delete and then be able to go and look at the old
| objects as they were at time of deletion too.
|
| You'd need to do things very differently to show a list of
| deleted objects though.
| psYchotic wrote:
| It appears that there's been an attempt at standardizing
| temporal features in SQL in the SQL:2011 standard:
| https://en.wikipedia.org/wiki/SQL:2011
| dspillett wrote:
| _> temporal tables ... It 's not a standard_
|
| They were introduced in ANSI SQL 2011.
|
| How closely implementations follow the standard I don't
| know, but something close exists in several DBMSs: I use
| them regualrly in MS SQL Server, there are plug-ins for
| postgres, MariaDB has them, and so forth.
| smallnamespace wrote:
| If you're using PostgreSQL, you can implement cascading
| soft-deletes yourself.
|
| The information schema table holds all foreign key
| relationships, so one can write a generic procedure that
| cascades through the fkey graph to soft-delete rows in any
| related tables.
| BeefySwain wrote:
| Could someone take a stab at an example of what this
| would look like? Sounds really interesting.
| xwdv wrote:
| Doubt it. It seems like something obvious yet I've waited
| so long for it. Seems like you have to rely on third party
| plugins.
| dragonwriter wrote:
| > This is one gripe I have with soft-deletion. Since I can no
| longer rely on ON DELETE CASCADE relationships
|
| If you use soft deletes on all tables, you can also cascade
| them as long as you either cascade updates to the real keys
| as well, or prevent such updates, by having a deleted flag
| column on each table, including it in a unique constraint
| with the actual key column(s), and including it in the
| foreign key.
| coding123 wrote:
| I was going to chime in with this. thanks. One issue with views
| however is that a lot of these features require more and more
| nuanced knowledge of RDBMSes where these days unless you have a
| veteran architect, most of the team just knows the various
| library/tooling that interacts with "a variety of databases" so
| there is often less effort to go deeper.
| quickthrower2 wrote:
| Where is this anti-fb culture? Is it a startup thing?
|
| Everywhere I have worked people know a decent amount about
| their data store. Not architects, just mid devs and higher.
| [deleted]
| lowercased wrote:
| "Instead, we rolled forward by creating a new app, and helping
| them copy environment and data from the deleted app to it. So
| even where soft deletion was theoretically most useful, we still
| didn't use it."
|
| But... weren't you using all those env and data info from the
| soft-deleted set?
|
| I've typically been using soft-deletes for most projects for
| years. People have accidentally deleted records, and having a
| process to undelete them - manually or giving them a screen to
| review/restore - has usually been great.
|
| Yes, if there's a lot of related artefacts not in the database
| (files/etc) that were literally deleted, you may not be able to
| get them back. But that's an ever greater edge case in projects I
| work in as to not be a huge issue. We probably have some files in
| a backup somewhere, if it's recent. Trying to 'undelete' a record
| from years ago - yeah, likely ain't gonna happen.
|
| People are used to 'undo' and 'undelete'. Soft-deletes are one
| way to provide that functionality for some projects.
| tehbeard wrote:
| > Instead, we rolled forward by creating a new app, and helping
| them copy environment and data from the deleted app to it. So
| even where soft deletion was theoretically most useful, we still
| didn't use it
|
| I don't get this statement. You wouldn't have had the env or data
| without soft delete? You did use it!
|
| I would say, soft delete isn't a tick the box and done solution
| as many ORMs make it.
|
| You need to consider the data model, and adjust your queries to
| that.
|
| It may make sense for a product to be deleted, but orderlines
| still able to access it to display product name etc.
|
| With blob data, I tend to move that to a "bin" with a 30-60 day
| grace period. Customers know quickly reporting, we can fully
| recover, while outside that time they'll have to provide images
| etc. It's a decent compromise.
|
| Reuse of unique fields is the sticking point I run into often, as
| mysql interprets null as not clashing with other nulls so
| composite uniques using the ID and deletion date don't work.
| TylerE wrote:
| > mysql interprets null as not clashing with other nulls
|
| Which is correct per SQL. Null is NaN, not zero (or negative
| infinity).
| GartzenDeHaes wrote:
| Personally I like no delete designs, which give you a full audit
| history of changes. This is similar to generally accepted
| accounting principles.
| https://en.wikipedia.org/wiki/Generally_Accepted_Accounting_...
| znpy wrote:
| Your taste in database design is probably nor gdpr compliant, i
| hope you don't work in the eu.
| munk-a wrote:
| It's surprising but the EU tends to be one of the most
| stringent regions to work in both when it comes to totally
| permanently deleting things and when it comes to never ever
| deleting things - as with anything like this where there is a
| debate (rather than a settled best practice) there are some
| times when soft deletion is appropriate and necessary (i.e.
| to adhere to log retention requirements common in the EU) and
| some times when it's unnecessary... and the occasional fun
| time when it's both necessary to support soft deletes and
| hard deletes - when logs need to be retained for auditing
| purposes but also when some users can force a hard delete
| (leading to that data either being purged or moved to an
| archive storage if it's still needed for auditing purposes).
|
| The world is almost never as simple as it seems.
| munk-a wrote:
| So if an account was active in your system and is active no
| longer... do you soft delete it (even if that means UPDATE ...
| SET active = 'f') or hard delete it?
| GartzenDeHaes wrote:
| That depends on the problem domain and how you design the
| system, since there are several way to do this. Typically in
| financial systems you would either have a start and end date
| on a summary record, or add an inactive record to a
| transaction table that has a record for each change to the
| account, or both.
| munk-a wrote:
| I just wanted to touch on the fact that eliding soft-deleted rows
| from queries is really, really easy - this article makes it out
| to be a constant headache but here's my suggested approach.
| ALTER TABLE blah ADD COLUMN deleted_at NULL TIMESTAMP;
| ALTER TABLE blah RENAME TO blahwithdeleted; CREATE VIEW
| blah (SELECT * FROM blahwithdeleted WHERE deleted_at IS NULL);
|
| And thus your entire application just needs to keep SELECTing
| from blah while only a few select pieces of code related to
| undeleting things (or generating reports including deleted
| things) need to be shifted to read from blahwithdeleted.
| yladiz wrote:
| But, assuming you don't really need the data, why make your
| queries more complex and instead just actually delete the data?
| munk-a wrote:
| This is a really scenario specific question - sometimes it's
| needed, sometimes it isn't. At my shop we have customers that
| will suspend their account but our sales team is pretty damn
| awesome so usually they end up renewing after going a while
| without our product - so being able to easily restore a large
| swath of former customers with all their permissions and
| preferences intact with a simple click of a button is a huge
| win compared to having a dev try to piece the data together
| out of backups that are six months out of date (a little
| secret... we never did this and just put the obligation on
| the CS team to manually recreate the records since that cost
| the company less).
| yladiz wrote:
| If I was a tech lead and also wanted to advocate for hard
| deletion, I would ask the question for this scenario:
| "What's the cost of keeping all this data unnecessarily,
| modifying most queries to filter for deleted data, and
| dealing with other various consequences of soft deletion,
| and how does this cost compare with the cost of building a
| bespoke tool to restore deleted data at a customer's
| request within a certain time frame and compare with the
| benefit of a customer being able to restore their data at
| the 'click of a button'?".
|
| Having dealt with systems that have hundreds of millions of
| records or more, many of which reference deleted data and
| are therefore useless, I lean towards hard deletion more
| and more and on the off chance that deleted data really
| needs to be recovered you build a separate
| system/infrastructure to support that, rather than building
| your _entire_ system around the small likelihood you really
| need to restore it.
| munk-a wrote:
| As your data architect I'd probably answer your question
| "Well, it'll take a little while longer to vet all the
| indices - since we'll probably want most indices to
| filter on WHERE deleted_at IS NULL but we'll likely want
| a few without that constraint for managing undeletion.
| We'll use more space on disk which, honestly, is pretty
| much a non-issue in the modern world - and if it gets bad
| enough we can always partition the table on deletion
| status and dump the soft deleted rows on a secondary
| server... but I wouldn't worry about that until we hit
| facebook scale. In terms of application developer time -
| I'll probably need an hour of their time to explain it
| once and all existing queries keep working as they're
| working now... and as for that undelete tool, well, we
| don't actually have to build it - if we don't build it we
| can just ask devs to submit manual queries to undelete
| data rows as needed via our migration, or oneoff or
| console interface. We'll want to do a sweep for any
| queries that reference tables dependent on the soft-
| delete having table - just to make sure they're throwing
| an INNER JOIN against the view but, honestly, we could
| just bop those bugs on the head if they ever come up."
|
| Like, I'm definitely not saying soft-deletion is always
| the answer, it takes additional effort and planning, but
| it's really, really easy to do safely.
| TylerE wrote:
| Becauase that's a huge assumption. From my experience, it's
| sorta like the giant box of old cables most techies keep
| around.
|
| Do you reach into it often? No.
|
| But when you do it absolutely saves the day and makes you a
| hero.
| bjourne wrote:
| This is not a solution. It introduces a leaky abstraction which
| sooner or later will lead to errors. Sure, all code _you_ write
| will access the view and not the table. But how can you ensure
| all other code in the organisation uses the view? Perhaps you
| add some access control to the table so that only authorized
| users can read directly from it, but that 's even more
| technical overhead. Then you have foreign keys. If you have a
| "deleted" column in the Customer table you need to remake the
| Invoice table as a view so that it hides invoices for deleted
| customers. The same goes for the InvoiceItem table (foreign key
| of a foreign key) and all author auxiliary information related
| to the soft-deleted customers.
|
| Furthermore, the cost of an error is potentially massive.
| Someone new at the company makes a revenue report based in the
| billed Invoices and does not realize they should query the view
| and not the table... Not great if 90% of all invoices belong to
| soft-deleted customers!
|
| The author is right; soft-deletes are probably most definitely
| not worth it. There are many better ways to solve the problem.
| croes wrote:
| If you mention the costs of reporting error you should
| consider the costs of wrongful hard delete too.
|
| And reports are developed and tested before they are used for
| crucial purposes.
| tshaddox wrote:
| > Furthermore, the cost of an error is potentially massive.
| Someone new at the company makes a revenue report based in
| the billed Invoices and does not realize they should query
| the view and not the table... Not great if 90% of all
| invoices belong to soft-deleted customers!
|
| I'm not sure I buy this argument. It's certainly conceivable
| for that to happen, but no more so than any other case of
| "the engineer queried the wrong table and thus got incorrect
| results." There's never going to be any _technical_ way of
| preventing this: if you have access to multiple sets of
| numbers, and you want to sum up one set of numbers but
| mistakenly sum up the other set of numbers, you 're going to
| get the wrong answer!
| bjourne wrote:
| > "the engineer queried the wrong table and thus got
| incorrect results."
|
| The difference is that the path of least resistance, the
| most obvious method - just query the damn table - is
| incorrect. Bad design can certainly make a system more
| error prone.
| tshaddox wrote:
| In the example from this thread the names of the table
| and view are reasonably clear, so even in a hypothetical
| project without any external documentation of naming
| conventions or engineering processes (code review, etc.)
| the most obvious thing would be to query _blah_ instead
| of _blahwithdeleted_.
|
| Of course, this is _extremely_ hypothetical, and in any
| real project where you 're generating a financial report
| you absolutely must have a detailed understanding of the
| relation that you're aggregating over. Even if your
| project has very strict naming conventions for tables,
| views, etc. you've gotta put in more work than "this
| short string sounds like a plausible label for the
| relation I want to aggregate over."
| bjourne wrote:
| "reasonably clear" are famous last words. It ignores tons
| of evidence on how commercial software development works.
| Everything is "reasonably clear" in isolation, but not
| when you throw in thousands of other "reasonably clear"
| things developers are supposed to keep track while not
| missing tight deadlines. Fact is that if you add
| opportunities for people to screw up then you will make
| people screw up, regardless of how "reasonably clear" or
| "obvious" the system is.
|
| Maybe you need more work experience because believing
| that it is implausible for someone to accidentally query
| the customer_with_deleted table over the customer view is
| incredibly naive. Likewise, people generating financial
| reports _can_ have detailed understanding of data
| modelling but scarily often don 't. Give them extra
| opportunities to fuck up and they will take them every
| time. KISS
| munk-a wrote:
| I don't really agree with that. Within an organization you
| have documentation and instruction as tools - but you're also
| making the dumb approach (SELECT * FROM blah) the correct
| approach. If a user is writing a query against the DB, has no
| idea what the layout of the data is, and decides to prefer
| blahwithdeleted over blah then I'd really question whats
| going on at your organization - blahwithdeleted is pretty
| clearly self-documenting and it's likely a lot of your other
| domain specific tables with be much harder to naively
| discover your way through.
|
| I, personally, would in no way restrict access to
| blahwithdeleted, but I have made a pattern of it in our DB,
| there are about a dozen blahwithdeleted tables - each with a
| corresponding blah view... I usually get about one question
| per every two new employees about which table to use which I
| can answer in less than a minute with a helpful little
| explanation.
|
| I'd also mention I've not made a specific value statement on
| soft deletions in a general case since, if there was a clear
| general case solution we'd just all do that. This is a
| decision that needs to be made on a per table basis - it's a
| rather trivial decision in most cases, but it's very specific
| to the problem at hand.
| scifibestfi wrote:
| > When I worked at Heroku, we used soft deletion. When I worked
| at Stripe, we used soft deletion. At my job right now, we use
| soft deletion.
|
| > As far as I'm aware, never once, in ten plus years, did anyone
| at any of these places ever actually use soft deletion to
| undelete something.
|
| That's wild. So it seems the idea of needing undelete is largely
| an unfounded fear.
| revskill wrote:
| In realworld, there's no concept as deletion from DB !
|
| There's only deactivate account, archive a legacy product,...
|
| Because there's no such thing as delete something from real
| world.
| dfee wrote:
| My experience is that soft-deletes are blunt tools bridging the
| gap between hard deletes and event sourcing (capturing all the
| changes against the table, in a replay-worthy stream).
|
| Event sourcing is hard - because the engineers responsible for
| setting it up and managing it aren't generally well skilled in
| this domain (myself included) and there aren't a wealth of great
| tools helping engineers find their way into the pit of success.
|
| The downsides of soft-deletes (as identified in the article) are
| numerous. The biggest problem is that it appears "simple" at
| first blush (just add a deleted_at column!), but it rots your
| data model from the inside out.
| vbezhenar wrote:
| You can restore to any point of time from your database backup.
| So it can cover some requirements.
| Terr_ wrote:
| > My experience is that soft-deletes are blunt tools bridging
| the gap between hard deletes and event sourcing
|
| Agreed, sometimes it makes business-sense to implement it, but
| in the big picture it's still kludgy and not-ideal.
|
| While full-on event-sourcing isn't always the answer, once
| business-rules prevent you from un-deleting anything there's
| not much point of having all those dead-rows interspersed in
| your regular tables.
| zozbot234 wrote:
| An event store is just a special case of a temporal database.
| The whole point of temporal databases is to natively support
| the notion of historical vs. current data.
| redavni wrote:
| Just want to add that the downsides as identified in the
| article make little sense. Deleting a customers invoices should
| be a very rare thing. I can't imagine any accountant or auditor
| is going to be happy with an IT guy deciding when to delete
| invoices.
|
| If accidentally writing the wrong query is a problem, then
| writing the wrong query is your problem.
| btown wrote:
| The question for either of these systems IMO is: do you trust
| that a change from your upstream represents a true, everlasting
| intention, or is it something that may need to be reinterpreted
| or rolled back in the future?
|
| At my startup, soft deletes for our SKUs are critical, because
| we work with data sources where notoriously both the technical
| systems and the humans driving will all-too-frequently
| accidentally represent something to our connection as deleted.
| Or there might be an irrecoverable error when asking "what
| things are still active upstream" - but that doesn't mean the
| SKUs are deleted, we might just not have certain live details
| until a bugfix is made. So "error status" and "soft delete" are
| somewhat synonymous, and both require investigation into root
| causes and root intents. Yes, the concept of "unerrored and
| active" is peppered through our codebase and analytics - but
| our ability to recover from supplier technical mistakes is much
| higher as a result. And we could absolutely do this with an
| event sourced system - but the tooling for relational databases
| is so much better, it's night and day.
| rodelrod wrote:
| Or you can see it the other way around: soft-deletes are a
| pragmatic alternative to event sourcing that provides a lot of
| the value without requiring a team of super-humans and a
| radical redesign of the existing systems.
| willlll wrote:
| For the control plane part of Crunchy Bridge, on day one I
| decided to go with the deleted_records table that is mentioned at
| the end of this post. It's been great. No need to keep around
| dead data that no one ever looks at.
|
| We don't need to have `where deleted_at is null` on every single
| query. But the best part though is our actual working data set of
| records we actually care about is tiny compared to the deleted
| cruft that would have otherwise been just sticking around
| forever. Backups and restores take no time at all. It's really
| cool that postgres lets you have conditional indexes on things,
| but it's even cooler not to need them.
| dcdc123 wrote:
| If you are using a state manager with models in something like
| rails/django/etc then it is trivial to support soft deletion
| without it infecting your entire code base.
| duxup wrote:
| > All our selects look something like this: SELECT * FROM
| customer WHERE id = @id AND deleted_at IS NULL;
|
| Solution... a whole other table of deleted stuff... in a new
| structure.
|
| Man soft deletes just look better to my eye.
| satyrnein wrote:
| We switched a lot of tables to soft deletes so we could replicate
| those deletes into our data warehouse. You can also use bin log
| replication for hard deletes, but every schema change would break
| it.
| msie wrote:
| I've used Soft Deletion so many times so I'll say it's been worth
| it. I believe using an audit table would have made recovery more
| difficult for me. Anyways take this advice with a grain of salt.
| It's only one guy's opinion. As is mine.
| lolsal wrote:
| In my 20 years of software experience the soft delete is not so
| often used to _undelete_ something, but more often used to _know
| what_ has been deleted. If you delete a record from a table, did
| it ever exist? Can you reference that customer /user/product ever
| again? Not to mention the one-in-a-million case where a customer
| had their account erroneously or fraudulently deleted -
| undeleting saves time/money/bacon when it's needed and is
| relatively inexpensive to maintain.
| justin_oaks wrote:
| The author didn't mention it, but restoring data from a database
| backup is a perfectly reasonable way to handle undeletes. By this
| I mean the situations that are "Oh crap, we didn't mean to delete
| that!" instead of usual business operations.
|
| I've probably restored data from backup maybe 4 times in my
| career. I greatly prefer to do this on the rare one-off scenario
| than to deal with the overhead of soft deleting everything.
| marcosdumay wrote:
| The difference in framing one gets by looking around is
| amazing, even funny.
|
| > I've probably restored data from backup maybe 4 times in my
| career.
|
| Yet, I often use soft-deletes because it allows people to
| undelete things from the software interface and not call me all
| day long.
|
| But that's not the most common reason I have for them. Normally
| it is because the data just can not be gone, and the full table
| is still important somewhere.
| hn_throwaway_99 wrote:
| The deleted records table he mentions at the end is a good
| approach, but:
|
| 1. This can easily be done with a trigger, so that you just call
| a DELETE on the table and deleted tables are copied to the
| deletion table automatically.
|
| 2. I prefer, instead of having a jsonb column, that each table
| has a corresponding `deleted_original_table_name` table that
| exactly matches the schema of the base table, with the addition
| that the first column is a `deleted_at` timestamp. It's easy to
| use helper methods in schema migrations to always keep the table
| definitions in sync.
| pierrebai wrote:
| The author claims pruning soft-deleted entries requires a complex
| query, but hard-deleting an entry would have required the same
| complexity. So it's really not an argument.
| tgbugs wrote:
| One use case that I think is not sufficiently considered in this
| is related to two comments I made about a year ago [0, 1].
|
| If you can _actually_ delete something, then that means that a
| malicious actor can fabricate data an claim that you deleted it.
| GDPR may be well intentioned but systems that have the ability to
| remove any record of a thing lay the groundwork for systematic
| fabrication of data, because any record of the past has been
| erased.
|
| Operationally, I can totally see why soft delete might be
| considered to be problematic in certain cases, but from an
| information security point of view I think it is absolutely
| critic for protecting users against a whole class of attacks.
|
| 0. https://news.ycombinator.com/item?id=27249738 1.
| https://news.ycombinator.com/item?id=27691442
| jonstaab wrote:
| If you implement soft delete, you should surface it to your user.
| That's who is accidentally deleting things, and that's who will
| want to un-delete them. As for side effects like spinning up/down
| servers, build that into your data model (of course, in a case
| like Heroku's that can be prohibitively expensive, so don't).
|
| Source: I write back of house software for resale store owners,
| and accidental deletes happen occasionally. Being able to restore
| things instills a lot of confidence for our customers.
| kleer001 wrote:
| Yea, it is.
| kache_ wrote:
| wait until this guy finds out about financial regulations
| jelkand wrote:
| Soft deletion is certainly very situationally worth it. I've
| found the most value when 1. it is well supported at the ORM
| layer and 2. business requirements dictate strong auditability of
| data. While I have undeleted items on occasion, I've used soft
| deletes more frequently to debug and build a timeline of events
| around the data.
|
| For context, I've worked in fintech where I often needed to
| review backoffice approvals, transactions, offers, etc.
| krstf13 wrote:
| Wouldn't storing the deleted data in an immutable storage, with
| time stamp, be much better for auditability ? I mean how could
| you audit deleted, restored and deleted again data with that
| setup? Also, while I know it's not really accurate, I tend to
| understand relations as sets, it makes me uncomfortable to have
| soft deleted data that are neither member or not member of the
| set.
| chomp wrote:
| Yep, we have an abstraction layer on top of the ORM to provide
| common queries. "Give me all X" will always return stuff not
| soft deleted. Data people also like to go diving through old
| data, and without getting into data warehousing and stuff like
| that, it's not too complex to support a single flag to enable
| us to keep old stuff.
| delusional wrote:
| They might like to, but you should definitely consider if
| saving data no longer required for your business violates
| privacy regulation/ethical considerations.
| chomp wrote:
| Definitely. When a user "deletes" their account, we null
| out all identifying fields to "DELETED_PII_$user_id". We
| have running metrics we compute that would go off the rails
| if we dropped the row completely.
| hinkley wrote:
| In my limited experience, soft deletion also has better
| prospects where partial indexes are involved, since it reduces
| the size of the index and reduces search and insert time a
| little bit. If soft deletes are rare, you aren't going to see
| much of a payback for your investment in code complexity.
|
| And since you can never really be sure what you'll need 2 years
| from now, I imagine there are a lot of anecdotes out there of
| people who implemented it thinking it would be used a lot, and
| turned out to be wrong.
| pilgrimfff wrote:
| All you need is a layer of abstraction to get past the downsides
| of soft deletion. You can use views or your ORM (if you use one)
|
| In Django, it's really easy to create almost seamless soft
| deletion logic in the model manager or in your querysets.
|
| Over the last decade, I find myself using soft deletion more and
| more - usually to accommodate user/client requests.
| adrianmsmith wrote:
| > Instead of keeping deleted data in the same tables from which
| it was deleted from, there can be a new relation specifically for
| storing all deleted data
|
| The disadvantage of this is that if you ever _do_ want to access
| this "deleted" data, e.g. in admin or compliance tools, you now
| have to do it in two different ways, one way for the main data
| and a different way in case the data has been "deleted".
|
| The article asserts you'll never need to "undelete" the data. So
| they're presenting a solution with that assumption, fair enough.
| Without that assumption, however, moving the data back from an
| archive table becomes a pain, and if there are any unique
| constraints e.g. on username or email address, you'll have a
| problem if you've moved the data out of the main table and
| another user has used that username or email address.
| layer8 wrote:
| > now have to do it in two different ways
|
| Use a view.
|
| > if there are any unique constraints e.g. on username or email
| address
|
| Have those in a dedicated table where they aren't deleted, and
| add a synthetic key referenced by the other tables.
| Terr_ wrote:
| > The article asserts you'll never need to "undelete" the data.
|
| IMO it's worth distinguishing between (A) some kind of "click
| to undelete" feature versus (B) simply having that old-data
| conveniently exposed for a developer to manually-edit things or
| craft database-change scripts.
|
| In practice I've only ever seen the latter get used, because it
| requires a developer to figure out how the heck to get "the
| parts that matter" back while preserving the integrity of other
| newer data and obeying certain business-rules.
| dunkelheit wrote:
| This brings back memories... Some time ago I was an intern in a
| team working on a UGC map editor. We were using this soft-delete
| pattern and for some task I needed to deploy a database migration
| that fiddled with the "deleted" status field. It was quite late
| and after the migration finished I almost went home but for some
| reason decided to check community forums. There users were having
| a time of their life taking screenshots of deleted objects that
| suddenly became visible (many of them quite amusing, including
| swear words written in 500km letters). Dunno how this escaped
| testing, but horror of what I have done brought clarity of mind
| and I quickly found an error and devised another migration that
| fixed the data. That worked and I was able to finally go home.
|
| So yeah, be careful with the soft-delete pattern :)
| scott_w wrote:
| The example the author gives is... frankly awful.
|
| I can't think of a single case where you'd want to remove the
| invoices of a customer you delete. Ever. In fact, the opposite is
| more likely to be a big problem, accidentally cascading your
| delete to your financial records!
|
| Using a soft delete, your invoices won't "disappear" because your
| app WILL have a view for looking at just the invoices.
|
| Source: I built a bookkeeping system and soft deletes is a
| necessary feature.
| Pxtl wrote:
| Trivial case I hit:
|
| 1) Client wants to remove user from the system who have left
| their org but
|
| 2) There are objects that were contributed by that user which are
| required to persist beyond the user's deletion.
|
| Those are ideal cases for soft deletion. We can still query
| information about the deleted user to explain who created this
| object, with the note that their account has been deleted.
|
| Probably I should be doing full event-sourcing for this case, but
| delete flag works well. MS offers temporal tables for this use
| case and I'm still considering the implications there -- AFAIK
| ORM support is WIP.
|
| And unlike the article author, I _have_ used soft deletion to
| undelete things. Many times. Maybe he has better users than I do,
| I don 't know.
| outworlder wrote:
| I wish Datomic was made open-source (with maybe some features
| available as an 'enterprise' offering) so that we could actually
| have a decent alternative for this 'soft-delete' problem.
| joshuanapoli wrote:
| Atlassian's deletion-related outage demonstrates why soft
| deletion should be the default. Use hard deletion after a grace
| period for data that truly needs to be expunged. Even if undelete
| is not part of the normal workflow, experience shows that swift
| recovery from bugs and operator errors is a universal part of
| serving users. The less data motion involved in deletion (and
| recovery) the better for both the original deletion process and
| also any recovery process.
|
| https://news.ycombinator.com/item?id=31015813
| dragonwriter wrote:
| > The concept behind soft deletion is to make deletion safer, and
| reversible
|
| IME, as with "updated_by" and "last_modified_at" columns, it's
| usually hazy audit requirements, not making deletion reversible,
| that motivates it.
|
| A proper history store maintained by appropriate triggers solves
| this, and leaves the referential integrity constraints on the
| base table intact. (It can also be used for reversibility if you
| need that.)
|
| Views conceptually would work, but then you get bitten by all the
| ways that all relations are not equal in real-world RDBMSs.
| kardianos wrote:
| This poster misses the point completly. Soft delete is a must
| have for historical data, where you want to keep history, but
| keep the current set clean.
|
| Effectively, you don't check for the soft delete flag if you get
| to it from a an un-deleted record, but you do check for it if you
| access it the other way around.
| armchairhacker wrote:
| Dumb solution: make soft deletes explicit in your backup system.
|
| Your company has a database backup system right? That system
| should be configured so that when it runs a backup, it will not
| remove deleted entries from the previous backup, instead just
| mark them as "deleted_since" the current backup time.
|
| Idk if any backup system actually support this, if there's some
| glaring problem (like you can't just overwrite parts of a
| database backup for some reason), or if most companies just don't
| have backups because they're too expensive (probably not), but
| this is the solution I would go with. It works for other sorts of
| data like file systems as well.
| deerIRL wrote:
| As someone who has done development work with Class A data and
| specifically in the realm of justice, soft deletes aren't simply
| a good idea, they are required by law.
|
| Most of these downsides are easily mitigatable issues as well. As
| many users have stated, something like views solves the issue of
| forgetting the 'deleted' clause.
|
| Lastly, I'm not sure the issue with foreign keys/stray records
| really resonates with me. I'd be hard pressed to be comfortable
| allowing a developer or DBA who isn't fully comfortable with the
| data model to be hard deleting records, let alone flagging them
| as soft deleted.
| ThePhysicist wrote:
| I don't get what the problem is with cascading deletes. I mean
| you typically only use them for foreign keys where deletion of
| the parent object makes the referencing object simply invalid, so
| there would be no reason to leave the referencing object in the
| database.
|
| The point that is true is that queries get more complicated as
| you'll have to add a "WHERE deleted_at IS NULL" to every SELECT
| (once for each table you refer to), but that can be automated if
| you use an ORM. A paradigm that I often use is that all objects
| in the database belong to a role object that determines who can
| read/write/delete the given object. So before doing anything with
| an object I always check the role object (e.g. the "user"
| referencing an "invoice", to stay with the example OP gives), and
| as part of this I check whether the user object still exists.
| Alternatively, you could automate most of the required update
| logic using triggers as well.
|
| But otherwise I agree, soft deletes often don't seem to be a
| worthwhile tradeoff, not sure if I would use them again when
| designing a relational schema. They are very useful for auditing
| and undo though: In a current project, whenever a set of objects
| gets updated I soft-delete the old versions and create new
| objects, keeping the UUIDs intact. That allows me to display the
| entire version history of each object to the user, which can be
| necessary e.g. for compliance reasons. You can achieve this with
| an audit log as well but that would require more logic and
| different queries, whereas querying soft-deleted objects just
| requires a slight modification of existing queries.
| giantg2 wrote:
| "The concept behind soft deletion is to make deletion safer, and
| reversible."
|
| That's one part. The other part is that in many industries you
| have regulatory data retention and audit requirements. This is
| arguably the most valuable and common reason to perform Logical
| deletes.
| brtkdotse wrote:
| In banking and bookkeeping, there's no such thing as a
| "delete". Once something is in the ledger you can't undo it -
| you have to make a new entry that negates the old one.
| hammock wrote:
| Is the same true in mass surveillance?
| m-p-3 wrote:
| and the blockchain.
| tomrod wrote:
| Which is a decentralized, distributed ledger, so...
| banking! :)
| cweagans wrote:
| "banking"
| i_hate_pigeons wrote:
| not every thing is a transaction though, say a deleted piece
| of ref data that still has FKs to other stuff etc
| Hallucinaut wrote:
| These are still versioned in many serious finance systems
| munk-a wrote:
| Yes, but banks tend to have websites with accounts and those
| accounts need to be deactivated when a customer should no
| longer have access (or, even more finicky, specific accounts
| for a client need to be deactivated or activated as they
| change their usage).
|
| All this essentially forces the use of some sort of soft
| deletion. (Activation flags are sort of just a more
| complicated form of soft deletion).
| AdrianB1 wrote:
| Status (active/inactive/other) and existence (present or
| deleted) are very different things, they may be separated
| most of the time. Legal requirements can prevent deletion
| of inactive data, so in some cases one may have a lot of
| inactive but must keep data. Soft delete can help in some
| scenarios, for example you want to give people a way to re-
| activate accounts, but hide the ones marked as deleted.
| jiggywiggy wrote:
| Ha, and then there is the opposite regulation that you have to
| delete user data.
| giantg2 wrote:
| In some industries the retention regulations trump the
| deletion ones. I believe finance is one area where they will
| delete some of your data, but are still required to maintain
| 7 years of specifically listed data.
| MonkeyMalarky wrote:
| Well you see, you need a complete record of when it was
| created, every change that occurred, everyone who could view
| it and also log every access attempt. But it. You're not
| actually supposed to keep it. Just everything surrounding it.
| scott_w wrote:
| Not quite. GDPR (and equivalents) have clear escape hatches
| to allow you to store data if you have good reason (even if
| the data subject requests its removal).
|
| Invoices, from the article, is a great example. That record
| must remain unchanged in most financial regulations. I'd
| wager a customer sending a deletion request for invoices will
| be met with raucous laughter from the legal and finance
| teams.
| nirvdrum wrote:
| My experience at a few start-ups has been that account deletion
| just isn't prioritized. It's not a focus when building an MVP.
| If the application ever gains traction, everyone is then
| terrified they'll accidentally delete customer data that they
| never delete anything. It's a shame. As a user, when I delete
| my account or data in my account, I want you to permanently
| delete it, not keep it around and just make inaccessible to me.
| nerdponx wrote:
| I've also seen businesses retain "deleted" data in order to
| support legitimate data analysis work in the future. And it
| actually can help significantly. Maybe scrubbing PII from
| deleted accounts is a good idea, but those deleted accounts
| are perfectly good data points, especially in smaller/newer
| companies with lower-volume data streams.
| nirvdrum wrote:
| If you want to retain anonymous statistics, fine. I'm not
| keen on you retaining my actual data so you can monetize it
| after our business relationship has concluded. The biggest
| thing a small team can learn is why they failed to retain a
| customer or a trial that didn't convert. For that, usage
| metrics are considerably more valuable than user data.
| jrockway wrote:
| If it makes you feel better, the startups that don't have
| time to delete your data probably don't have a viable
| disaster recovery plan either.
|
| As we learned from the Atlassian snafu, even giant companies
| with billions in revenue often can't recover from disasters.
| (I try to test mine every 6 months. I've never had a test go
| perfectly.)
| tomcam wrote:
| Good to know. What do you were tests look like?
| dexwiz wrote:
| I think billions have been spent bridging the gap between
| "ideal" software and what businesses actually need. Access
| control is another thing I see developers wanting to simplify
| or push to implement later, but is actually a key feature.
| necovek wrote:
| And yet another part is making deletes (appear) instantaneous:
| useful when it involves cleaning up a bunch of "related" data
| possibly living on different services (eg. S3, ES...).
|
| This also helps with the original goal of making them safer by
| manually implementing "eventual consistency" for data living
| outside the transactional world.
| taeric wrote:
| Don't make deletes appear instantaneous? If you have heavy
| weight systems, then it makes sense for provisioning and
| deleting entities is a process, that should be open to
| monitoring.
| jewayne wrote:
| I would argue that in many cases the concept behind soft
| deletion is to make deletion permanent.
|
| Hard deletes retain no memory of what you wanted to be gone, so
| any malfunctioning sync process will continuously recreate the
| deleted record soon after it's deleted. Soft deletes are often
| the only way to make sure deleted records don't reappear.
| trinovantes wrote:
| Assuming you're using a modern database, replication is done
| with paxos/raft and they are formally proven to not allow
| this to happen as both edits/deletions are both just entries
| in distributed event log.
| eeperson wrote:
| Couldn't a malfunctioning sync process undo a soft delete as
| well?
| jayd16 wrote:
| Null is more ambiguous than an explicit conflict. If there
| is literally no record, even of the delete action then
| there's no timestamp for last write wins.
| [deleted]
| silisili wrote:
| Why not just use an audit table, to keep from littering your
| indices?
| jacobr1 wrote:
| You still need a record of the data, which probably isn't
| fully captured in the audit table
| jelkand wrote:
| An audit table is a similar, but not entirely equivalent
| tool. There are circumstance where both audit tables / soft
| deletes are appropriate, and where only one of the two is.
| Other systems that work are append-only tables, event driven
| systems, and I'm sure there are more that I'm not aware of.
| MonkeyMalarky wrote:
| Then there's always the joy of a situation where your client is
| being sued by one of their clients and now needs your help
| recovering everything possible from your platform. And you're
| going to help them because you'd like to keep them as a client
| rather than let them be sued into oblivion.
| taeric wrote:
| This is almost certainly going to bite you if you don't push
| all customer identification data out of your main data stores.
|
| And it will go a long way to making your services harder to use
| if you don't allow users to associate friendly names with
| things. And to assume that the same friendly name will be used
| for a future item. (For example, if you name devices based on
| the room you put them in. Is reasonable to think that when you
| replace a device, that you are likely to want to reuse the
| name.)
| [deleted]
| Apreche wrote:
| I agree with the author that a separate table is the way to go,
| but I go one step further than the author and use database
| triggers to manage that second table. Alternatively, a
| combination of database views and triggers can do the same thing
| without having an actual extra table to manage.
|
| Either way, it allows you to have soft deletion and/or full
| activity logging functionality without the application having to
| know about it.
| encoderer wrote:
| Even if you don't "undelete" something, soft deletes make it
| possible to instantly hide something while saving the expensive
| sql delete for processing later.
| rubyist5eva wrote:
| One thing that I could find in the article: performance.
|
| At least for our use case, soft deletes made everything slower
| because it's much harder to index. For our database we basically
| had to do an audit of all of our WHERE clauses and create partial
| indexes on "not yet deleted" records. Of course, this bloats your
| indexes/disk and hurts write performance so it's not a silver
| bullet.
|
| We've also taken to inserting into "delete records tables" for
| records we may want to recover or for historical reasons. You
| still lose foreign keys but indexing and query optimization is a
| lot easier, and your old data is just still a simple query away.
| jacobsenscott wrote:
| Deletion is never worth it full stop. How do you delete from a
| backup? You can't delete all your backups. Effective dates and
| app level encryption to allow for cryptographic "deletes" is the
| way to go.
| whack wrote:
| We use soft-deletes extensively at our startup. Here's a couple
| reasons:
|
| - Feature creep. "Sometimes our users accidentally hit the delete
| button, or change their minds a minute later. We want to give
| them a way to undo the deletion." Or "I know we said last quarter
| that we users want to delete stuff, but they also want to see a
| list of everything they've deleted in the past." Soft-deletes
| handle feature-creep a lot better than hard-deletions
|
| - It simplifies foreign-keys management. If you want to hard-
| delete something that some other entity is referencing, you'll
| have to hard-delete or modify that other entity first. And
| potentially repeat this process recursively for their own
| references. This is a pain. One could argue that if you really
| want to delete something, you should be deleting all children as
| well. Such arguments are highly domain specific, and very bad
| universal claims. We've seen some use-cases where such pedantry
| is not necessary
|
| - It makes it easier to recover from mistakes and bugs. Customer
| deleted something accidentally and emailed you begging for help?
| Your code has a bug causing stuff to get deleted when it
| shouldn't be? You'll be thankful you did a soft-delete and not a
| hard-delete. Is it going to solve every single problem where the
| data has system-wide ripple effects in a unicorn sized
| organization? No. But it'll still solve a number of problems
| where the data impact is more localized
|
| - It makes debugging easier. You have a clear record of
| everything that used to exist. You don't have to go digging
| through your logs to find something that used to exist but has
| now been deleted
|
| - Speed. All of the above problems can be solved in other ways
| too. The author suggests putting all deleted data in a "deleted
| records table." So now you need to maintain a 2nd table for every
| table that you may want to delete stuff from. All schema updates
| will need to be mirrored on this 2nd table. And you'll need to
| write and maintain code to populate this deleted-records-table
| every time you delete stuff from the original table. All doable
| and straight-forward but takes time away from other things you
| could be doing instead
|
| The main benefit from hard-deletions is data compliance and
| liability. Ie, being able to tell privacy-conscious customers
| that you actually deleted their data. If you're handling any
| sensitive data, you should definitely do hard-deletions at some
| point for this reason. But the other reason the author gave - _"
| it's annoying having to check for `deleted_at` when writing SQL
| queries"_ - seems pretty minor compared to the benefits.
| waspight wrote:
| It seems that it is too easy to delete things in your system.
| Rather than solving it with reversible soft deletes I would
| suggest to improve the UX. I don't agree that it simplifies
| foreign key management, it is most often the opposite from my
| point of view.
| codemac wrote:
| Well, there are several problems with this analysis when you go
| very large (>10000 machines):
|
| - For many applications, it's easiest to put the state of the
| object in the primary key, and thus point reads will fail when
| something gets deleted. This has other problems though with
| hotspotting and compaction during deletes. The deleted table
| doesn't really solve this either.
|
| - For storage systems, GC is critical functionality to implement.
| Most systems whether they want to believe it or not are glorified
| storage systems. Garbage collection is hard to do at scale, and
| I've never seen it implemented as SQL statements rather than
| code. Especially for GDPR etc.
|
| - For large scale distributed systems, foreign key constraints
| are rare if impossible to implement with reasonable latency, so
| they don't exist either way. I haven't worked on a system in >15
| years that had fk constraints.
|
| - For large scale restores where you need to undelete trillions
| of rows, keeping the rows basically pre-assigns the distribution
| of writes. When you have to re-create the rows, you tend to get
| intense hotspotting and failures along the way as you attempt to
| load balance on the keyspace of the writes.
|
| A deleted records table is good for smaller (<10000 machine)
| systems when latency between nodes can be kept within the same
| campus. It can really improve performance of your GC if reading
| by column isn't fast compared to reading by table.
| khaledh wrote:
| One reason we encourage keeping soft-deleted records at least for
| a while is synchronizing data across systems. We want to
| propagate deletions downstream. At some point when all downstream
| consumers have caught up, we can purge the soft-deleted records.
| JohnBooty wrote:
| I've been a software dev since the 90s and at this point, I've
| learned to basically do things like audit trails and soft
| deletion by default, unless there's some reason _not_ to.
|
| Somebody _always_ wants to undelete something, or examine it to
| see why it was deleted, or see who changed something, or blah
| blah blah. It helps the business, it helps _you_ as developer by
| giving you debug information as well as helping you to cover your
| ass when you are blamed for some data loss bug that was really
| user error.
|
| Soft deletion has obvious drawbacks but is usually far less work
| than implementing equivalent functionality out-of-stream, with
| verbose logging or some such.
|
| Retrofitting your app and adding soft deletion and audit trails
| after the fact is usually an order of magnitude more work. Can
| always add it pre-launch and leave it turned off.
|
| If performance is a concern, this is usually something that can
| be mitigated. You can e.g. have a reaper job that runs daily and
| hard-deletes everything that was soft-deleted more than _n_ days
| ago, or whatever.
| rjzzleep wrote:
| In rails you get these things for free. What I don't get is why
| everyone rolls their own framework with node.js. It's basically
| 90s PHP all over again.
|
| EDIT: Soft delete is a trivial piece of code when the framework
| has a well defined transaction system for its ORM. It's not
| really related to Rails per se. Your statement is extremely
| disingenuous, while trying to look smart. Audit trails _can_
| be(but don't have to be) more complex, especially when the
| framework uses a lot of stored procedures to handle operations.
| But other than that these frameworks are specifically designed
| to REDUCE complexity of such operations, dependency costs -
| which are huge in node.js, specifically because you can mix and
| match anything into everything.
|
| Node.js people tend to stitch together XSS solutions, random
| templating solutions based on their frontend work, even basic
| salting of auth passwords becomes unpredictable because you
| have 30 options on minimal auth libraries.
|
| But yes nothing is ever free. If you want to use Rails you
| still have to learn ruby and the framework and a basic
| understand of how ActiveRecord builds queries if you want to be
| writing performant code. And the same applies to Laravel,
| Django, or whatever of the 50 patchwork node.js solutions you
| want to base your code on.
| joshmanders wrote:
| I don't know why you're ragging on Node.js users or even PHP
| for that matter as both ecosystems have this stuff covered
| too.
|
| Also you're comparing language/runtime with an actual
| framework and then dogging those users...
|
| If you want to compare Rails with Node/PHP then I'd suggest
| comparing with things like Laravel (PHP), Adonis (Node) and
| you'll find everything you can do in Rails is done in
| Node/PHP too.
| sky_rw wrote:
| Nothing is free. In Rails you have _currently well maintained
| libraries_ for this. There are still complexity costs,
| dependency costs, data costs, performance cots, etc, etc,
| etc.
| gmiller123456 wrote:
| The author uses the "no one ever undeleted anything" as the
| primary justification. I think this is the part they miss. I've
| never undeleted a user either, but there have been many times
| I've gone back to look at something. Either a complaint finally
| gets around to me as to why the user wanted their account
| deleted (e.g. feature not working) and it helps to figure out
| why. Or they're returning and want things set up like they
| were. Or someone is taking over their roll and needs to be set
| up like the last person who's already gone.
|
| Though you really shouldn't be relying on a database for an
| audit trail. It might help find some issues, but things
| actually used for security shouldn't be writable so easily.
| alerighi wrote:
| It may be convenient, but under the GDPR is illegal. When an
| user deletes an account, all the personal data associated
| with that user must be deleted (or anonymize it in a way that
| it's no longer possible to associate it back to the
| particular user).
|
| You cannot just keep user information forever "just in case"
| they are useful again.
| changoplatanero wrote:
| User deleting an account is just one way that stuff gets
| deleted. There are other deletion scenarios where it is
| appropriate to keep the information after its deleted.
| Akronymus wrote:
| Yeah. Basically anything that an employee can mess up,
| should be reversible IMO. But actions such as deleting an
| account should have the option of "YES, DELETE IT
| PERMANENTLY, THIS CAN'T BE UNDONE"
| smolder wrote:
| I wonder what percentage of companies who even _appear_ to
| comply with deletion requests actually do full deletion in
| practice. I suspect it 's small, knowing how many things
| are coded to fake-delete for convenience. Businesses also
| tend to keep cold data backups around. (Maybe backups are
| exempted? I don't know.) There might even be cases where
| ostensibly deleted data can still be recovered from a disk,
| if they haven't overwritten.
| dataflow wrote:
| It's illegal but companies don't necessarily care to avoid
| soft deletes regardless. I think companies wait to get sued
| so they can try to argue in court why their soft deletions
| are reasonable and why it's too technically difficult for
| them to do hard deletes.
|
| To be honest, in the age of modern overprovisioned storage
| drives that remap blocks frequently, I'm not really sure
| you _can_ implement genuine "hard" deletes without
| choosing significantly unorthodox hardware (or destroying a
| drive every time you need a single bit erased), no matter
| how much you want to in software. One of those details that
| I'm both surprised and unsurprised doesn't seem to have
| been addressed legally. I feel like a court ought to at
| least buy this aspect of the argument, so maybe they'll buy
| that it can be difficult in terms of the software too? Who
| knows. My guess is that a reasonable court would
| accommodate something that's reasonable for a given
| company, but there are lots of variations that could fall
| into that category.
| afiori wrote:
| I would guess that there are quite a few limits to that...
|
| A user has a long history of participating on your forum
| and other users have quoted their messages far and wide.
| Collectively all of the messages posted on your forums
| (with or without timestamps) reveal a few PII about the
| user. Do you have to delete those?
|
| The user filed a bug report about a functionality not
| working, do you have to delete the text of the bug report?
|
| Arguably if your user table look like [user_id,
| creation_date, deletion_date, status, account_type] then
| this table does not contain any PII.
|
| Assuming that user content is not automatically PII, whose
| responsibility is it to track where PII can be?
|
| Sometimes users doxx themselves (like mistakenly sharing
| tax return forms instead cat pics), in such a case it is
| the user responsibility to signal this to you.
|
| If the user filed an issue saying "when I insert my name
| (Abe Cox) the input validation fails" is it you or the user
| that need to read through all the issues to find this case?
|
| My point is that GDPR + right to be forgotten cannot make
| it look like you never had an account at all, especially
| without user assistance.
| 411111111111111 wrote:
| Stop spreading misinformation. Nobody is required to hard
| delete the very second the [delete] button is pressed.
| You're following the law as long as that mentioned reaper
| job exists and runs at least once per quarter (90 days)
| Akronymus wrote:
| I messed up a mass update query enough times to leave myself
| SOME provision to undo it.
|
| The exceptions are when there is a well tested query that
| affects a single account or something. Like GDPR
| wnevets wrote:
| This is something that I was forced to learn the hard way more
| than once. Literally today I needed to undelete a record
| because a customer was confused by what the "delete" button did
| and wanted their record back.
| wst_ wrote:
| Isn't it the problem of UI, though. If the user would be
| informed about the consequences (possibly with bold red font
| and with a confirmation checkbox) would they still click that
| button?
| thrashh wrote:
| As a user, I would still sometimes and then regret my
| decision
|
| Soft deletions are awesome
| robertlagrant wrote:
| One will, one day.
| wnevets wrote:
| All of that exist.
| doctor_eval wrote:
| File this under "falsehoods programmers believe about
| users": they act rationally.
| pg_1234 wrote:
| This 100%
| nicoburns wrote:
| +1 on audit trails. And one should always store audit trails in
| machine readable format. That way you can not only manually
| inspect what happened, but you can query it too (and
| reconstruct the entire state as it existed in the past if
| necessary).
| pradn wrote:
| Soft deletion is just one way to achieve undeletion. The
| author's proposed solution of moving the resource to another
| table works just as well. You can move it back to the non-
| deleted table to perform the undeletion. You can keep around
| these deleted objects as long as you want; they work as a
| subset of a proper audit trail. The cost of course is you have
| more tables, but that is less of a cost than having to add
| "deleted=False" predicates in all of your queries.
|
| Also note, if you use a soft-deleted column, indexes need to be
| keyed by that column as well if you want to access non-deleted
| objects quickly. That's extra complexity.
| mjevans wrote:
| Even more important; the deleted records don't need to live
| in your cache / RAM / etc. Potentially faster queries.
| smackeyacky wrote:
| What seems to be missing here is the DB tech he is using. On
| a proper database you can do your "undeleted" with triggers
| and it's relatively trivial. Nonsense like a "deleted" column
| on your main data table just seems silly.
| robertlagrant wrote:
| Triggers don't solve what the author's getting at, i.e. who
| knows what else outside this database changes upon deletion
| that would need to be reversed? It's all in the article.
| icedchai wrote:
| I worked at a place that kept all the deleted stuff in
| their main tables. It turned out over 90% of the rows were
| deleted. I'm not sure how often something was undeleted,
| but it was not very frequent. Some of these soft deleted
| rows were 5+ years old. Archive that crap.
| augustiine wrote:
| Scarbutt wrote:
| With the "cloud" you can do PITR as a fork for most mainstream
| databases, which is an enough solution for some.
| efsavage wrote:
| Also if people know that deletion is reversible, they're more
| likely to actually do it, which can keep things generally
| tidier.
|
| I don't actually like using a "deleted" column, my standard
| table has a status column, and deleted is one of those states,
| along with active/pending/suspended/etc, as the needs dictate.
| This way I get soft deletes for basically free both in the
| schema, but also in the queries (which would generally default
| to active), so it's not really the spaghetti that the author
| discusses.
| robertlagrant wrote:
| That still has the same issues. You have to remember to set
| every linked table's records to the same state, or remember
| to query every linked table through the table that has the
| lifecycle column on it.
| augustl wrote:
| This is why I don't understand why Datomic isn't more popular.
| Pretty much every system I've worked on never needed to scale
| past 100s of writes per second due to hard limits on the system
| (internal backoffice stuff, fundamenally scoped/shardable to
| defined regions, etc etc). And since Datomic is built with that
| in mind, you get the trade-off of full history, first class
| transactions and being able to query for things like "who
| changes this attribute to its current value, when, and why" is
| such as super power!
| rjbwork wrote:
| In the past I've used MSSQL's Temporal Tables (also called
| System-Versioned Tables) to implement this kind of
| functionality. This also gets you, for free, Type 2 SCD
| functionality for OLAP-style queries.
|
| I can't wait until Postgres has this kind of functionality
| baked in. It's such a nice feature.
| synthc wrote:
| Datomic is great but i think its missing some features that
| many enterprises need (access control and a query planner).
| Also it seems to be mostly built for DynamoDB/AWS.
| nightski wrote:
| Personally looking at their pricing since it is so tied to
| AWS it is completely non-transparent how much it's going to
| cost us now or in the future.
|
| I really like the concept of datomic though.
| Scarbutt wrote:
| Too niche of a technology, tied to clojure, not open source
| and very slow(doesn't matter for most internal apps though).
| For many, it's better to do the tedious thing here and there
| with postgres. SQL also has strong grip on databases and if
| you look at it the other way around, postgres has lots of
| features that datomic lacks, with datomic you almost always
| need a secondary database.
| tsuujin wrote:
| It's hard to get adoption for expensive toys.
|
| I think Datomic is neat, and I'd like to use it, but it is
| prohibitively expensive for a personal or hobby project.
| Personal projects are where I get excited about tech, and
| when I'm excited I'm more likely to adopt it in my day job.
|
| They're really shooting themselves in the foot by not having
| a one-click install free tier or a self hosted option.
| marcofiset wrote:
| You may want to look into XTDB then. Not quite the same as
| Datomic, but they share many similarities. It's free and
| open-source.
| tsuujin wrote:
| Neat, thanks!
| th0ma5 wrote:
| For me it is simply that isn't open source (at least last I
| checked.)
| corrral wrote:
| I'd add tagging, for anything that could conceivably use it,
| when you're doing DB design. May as well start with support,
| even if the functionality's initially dormant. Someone _will_
| ask for it, directly or indirectly, and it won 't take long
| before they do.
| pgt wrote:
| XTDB: xtdb.com
| jandrewrogers wrote:
| The complexity of soft deletes is that they implicitly introduce
| the difficult semantics of bi-temporality into the data model,
| typically without the benefit of a formal specification that
| minimizes the number of edge cases that have to be dealt with.
|
| Mechanically, I've typically supported soft deletes with audit
| tables that shadow the primary table, with a bunch of automation
| in the database to make management mostly automagic. It isn't too
| bad in PostgreSQL.
| Minor49er wrote:
| It's interesting that the author notes that, as far as he's
| aware, nobody's ever undeleted something. It could be true. But
| I'm wondering if maybe he simply hasn't seen it first-hand since
| the action of recovering something is often handled by a
| customer-facing team and not by a developer.
| wizofaus wrote:
| The assumption seems to be that the undelete operation is
| performed by the vendor's support staff, rather than the end
| user. I've been involved in the implementation/ maintenance of
| systems with soft delete that was entirely for that purpose - it
| allowed the user to delete/undelete at will. In our case it also
| meant certain uniqueness constraints were kept in place
| effectively reserving things like email addresses or business
| registration numbers that couldn't be reused until a hard delete
| was issued. Arguably it's more like an "is active" flag in such a
| case, but it's debatable what the distinction is.
| vivegi wrote:
| If you do want to retain the deleted records for any purpose
| (audit, compliance etc.,) it is better to design a DELETED table
| to maintain the history (just as suggested in the article towards
| the end).
|
| Once your main tables start getting to the order of tens of
| millions of records, the filtering by 'deleted_at is NULL' or
| 'deleted_at is NOT NULL' gets in the way of query performance.
|
| NULL is also not indexed. So, that throws the spanner in the
| works sometimes (depending on the query).
| danielrhodes wrote:
| In a previous place I worked, we were programmatically using Box
| to store files. One day we were presented with a case study in
| Murphy's Law: a script went awry and deleted everything (10s of
| thousands of files). There was no clear way to recover these
| files, they were gone from what we could see. It was a disaster.
| We got a Box support person on the phone and described what had
| happened. There was a pause, some mouse clicking and then: "Ok,
| those files will be back in your account in an hour."
|
| It was 100% our fault. But soft deletes saved us that day. If
| you're in a situation where you or your customers could benefit
| from the same, it's wise to not only embrace them but also make
| sure they work.
| latchkey wrote:
| That sounds more like a lack of backups and disaster recovery
| than it does soft deletes.
| pradn wrote:
| The author agrees with you in principle. All the author is
| arguing against is the use of "deleted" bool column to indicate
| deletion. His solution of moving deleted objects to their own
| column gives you the ability to un-delete, just as before. Only
| now, your queries and indexes are simpler and you get to use
| foreign keys and other useful futures.
| jtwebman wrote:
| The bigger reason to use soft deletes is to keep history. Just
| because someone does not access doesn't mean we should report on
| the things they did months ago.
| Ensorceled wrote:
| I use soft deletes in our system and literally used it to restore
| an accidentally deleted item about 3 hours ago. Took a second to
| toggle the deleted item.
|
| I don't get how this rocket science. Almost every query in the
| system is some kind of where clause on a fk to account or user or
| project or some other critical object ... so there are only a few
| places in the ORM where I need to support this.
| vyrotek wrote:
| I've found SQL Server Temporal Tables are a good alternative to
| get the benefits of soft-deletes without some of the drawbacks.
|
| https://docs.microsoft.com/en-us/sql/relational-databases/ta...
| tfigment wrote:
| Mysql also has this now. I've wanted to rewrite out apps to use
| it but haven't gotten around to it. Postgres has it as an addon
| but feels like it wouldn't work for us until its first class
| supported.
| evanelias wrote:
| MariaDB has this -- called system-versioned tables -- but
| MySQL actually does not. Although they share a common
| lineage, MySQL and MariaDB are somewhat distinct databases at
| this point, with each one having a number of features that
| the other lacks.
| nicoburns wrote:
| I believe first-class support is in development for Postgres.
| The article I read made it sound like it would probably land
| in either the next major version, or the one after that.
| AdrianB1 wrote:
| Soft deletes are really worth in the right scenario. There are
| cases when they can be avoided, cases when they are not worth and
| cases when they are worth, for the problems presented in the
| article there are solutions or workarounds.
| ajuc wrote:
| If you need this why reimplement it when you can use database
| history (dbms_flashback or SELECT AS OF in Oracle)?
| 202206241203 wrote:
| It's something that a team of a PM, a QA and two developers can
| bill for at least a sprint. So, well worth it.
| gigatexal wrote:
| There's a lot wrong with this write up. Why would anyone want to
| delete corresponding invoices when you "delete" the corresponding
| user? And GDPR provides a caveat that if you need the data for a
| biz usecase like legacy reporting you can keep the data (I think
| it has to be masked or something but it's not insane to say you
| must delete data on request that could materially affect a
| company like removing transactions).
|
| Just put a filtered index on the column to better query non
| deleted data.
|
| On the whole I don't think in practice the author's take makes
| much sense.
| justin_oaks wrote:
| For those who are expressing favor with soft deletes, do you
| default to soft deletes on every table unless you know you won't
| need them? Or do you only apply them where you know you'll need
| them?
|
| I think people arguing for and against soft deletes both
| understand that there are cases where you want to use them and
| when you don't.
| baq wrote:
| soft delete everywhere by default. true deletes only after
| retention policy expires, if FK constraints allow it (best if
| you can drop whole partitions).
| mrinterweb wrote:
| For audit trails in rails, I still like papertrail.
| https://github.com/paper-trail-gem/paper_trail. It provides the
| ability to restore records as well as auditing abilities.
| muhaaa wrote:
| Always use a temporal database (datomic, postgres with
| temporal_tables extension). You get out of the box the full
| history of your data. That is really helpful for business
| intelligence and analytics, auditing / audit log (security,
| accountability), live sync & real-time features and as a bonus
| easy recovery after application fails.
|
| If disk gets to full, project the latest time slice into a new
| database and move the old database onto a cold storage.
| sam_lowry_ wrote:
| I do not understand the foreign keys issue. Do not use the
| _deleted_at_ timestamp that is nullable by default. Instead,
| nullify the field when the line is deleted. Foreign keys on NULL
| values will be possible.
|
| In any case, soft deletion is usually a sign of incompetence.
| Whenever I saw it on a project, both soft deletion and the
| project turned sour.
| nwah1 wrote:
| If you have a lot of stored procs then the argument makes some
| sense. If you do most things in code, then I would argue these
| complaints are moot.
|
| In your code you can isolate all soft-deleting from business
| logic in the ORM layer or data layer, so the complaint about
| littering your codebase is moot for me. For instance, using
| Entity Framework, you can change deletes to soft deletes in a
| centralized place for all records matching a particular
| interface, then add a query filter that applies in the background
| for all queries.
|
| The complaint that soft deleting is never done is maybe valid
| since you can review things with audit logging or backups without
| risking unknown effects of an un-delete. But if you need a
| recycle bin feature then you get that for free if you just build
| that in from the start, and it is one more guarantee.
|
| The risk of orphaned records is real, although you could probably
| handle most cases generically in the data layer or ORM as well.
| It seems like there's just tradeoffs to the various approaches.
| Do you want to err on the side of deleting data, or on the side
| of keeping it? Do you worry more about orphaned records or data
| loss?
| [deleted]
| openthc wrote:
| I use a delta-log table, so each INSERT/UPDATE/DELETE on objects
| I care about are captured (via trigger) -- but that one has to
| get date partitioned. So in my system a DELETE statement (and
| DELETE CASCADE) work as expected -- any history has to be
| discovered from the logs
| timomax2 wrote:
| We just have a history table (for each table) where all deleted
| and past versions of record are stored. Seems to solve all the
| issues. The history table is NOT part of the application, but is
| there for audit and diagnostics etc.
| [deleted]
| ivank wrote:
| https://github.com/xocolatl/periods implements SYSTEM VERSIONING
| for PostgreSQL and moves deleted rows to a history table.
| llimos wrote:
| Do any databases let you refer to constant values in foreign
| keys?
|
| Then you could do FOREIGN KEY (foreign_id, NULL)
| REFERENCES foreign_table(id, deleted_at)
| munk-a wrote:
| I don't believe that's possible in postgres at least - but I
| don't think it's a huge concern either - you can have
| deleted_at cascade via trigger or just use views to hide the
| data - both are extremely easy to implement at the DB level
| without the application devs ever needing to worry about what's
| what.
| agentultra wrote:
| Soft deletion by storing the row in JSON won't survive months of
| schema migrations. If restoring a record is rare you don't want
| to have to find out that there's no way to map the old data to
| the new table when it matters.
|
| There are cases where you shouldn't be deleting or updating data;
| auditable and non-repudiation systems for some regulatory
| compliance come to mind. Best to use patterns that don't require
| those operations.
|
| Soft deletion does come at a cost. Choose carefully!
| empiko wrote:
| The "Code leakage" problem can easily be solved by using views.
| Or am I missing something?
| dboreham wrote:
| Solved with "...and deleted = false"
| krascovict wrote:
| If it's the case of deleting files safely, I recommend shared,
| it's very good...
|
| https://wiki.archlinux.org/title/Securely_wipe_disk
| Smoosh wrote:
| DB2 has implemented temporal tables which can automatically
| capture all changes to the primary table.
|
| https://www.ibm.com/docs/en/db2/10.1.0?topic=tables-history
| jacksnipe wrote:
| The ONLY reason that you should avoid soft deletion is that
| deleting things permanently in a soft-deletion-based system is
| hard and error prone.
|
| GDPR, among other regulations, requires that you be able to do
| this sometimes; and it requires that the data REALLY BE GONE.
|
| But I really think that soft deletion should be the default
| unless you think you'll be fielding user data deletion requests.
___________________________________________________________________
(page generated 2022-07-19 23:00 UTC)