[HN Gopher] Soft Deletion Probably Isn't Worth It (2022)
___________________________________________________________________
Soft Deletion Probably Isn't Worth It (2022)
Author : thunderbong
Score : 71 points
Date : 2024-01-09 12:18 UTC (1 days ago)
(HTM) web link (brandur.org)
(TXT) w3m dump (brandur.org)
| bazil376 wrote:
| The GDPR deletion requirement has been most compelling (and a
| headache) for my application
| allan_s wrote:
| IANAL but I got confirmed several time that you can soft delete
| if all fields are then anonymized, the law does not care about
| the concept of database row as long as the PII is deleted
| Detrytus wrote:
| If you anonymize data then it isn't soft delete anymore as
| you cannot un-delete it. What you are describing is more
| suited to generating test data for machine learning
| algorithms, or simply for developers that need to set up db
| instance in DEV environment.
| rawgabbit wrote:
| To satisfy GDPR requirements when customers leave, you can
| blank out the PII fields. No need to blow up your application
| by deleting records.
| taeric wrote:
| Also, a lot of records are the businesses? They may have to
| remove who it was that they did business with, for
| compliance, but they do not have to forget that they did
| business? (Right? Genuine question.)
| The_Colonel wrote:
| Soft-deletion ("recycle bin") is an important feature in many
| user-facing applications.
|
| But if it's not surfaced in UI then I agree it doesn't make much
| sense.
| graftak wrote:
| Soft delete is not a recycle bin but the process of marking a
| record as deleted and revoke access instead of actually
| deleting it.
| antonyt wrote:
| One of the most common reasons to "mark a record as deleted"
| rather than deleting it is so you can undo deletes without
| having to fish data out of a backup. I.e. soft delete is
| effectively a recycle bin. It might not be a user-facing
| recycle bin, but it's a recycle bin nonetheless.
| graftak wrote:
| That makes absolute sense.
|
| The way it was overlooked by me is because a 'bin' is
| something you clear regularly, but deleted records will
| likely persist forever.
|
| That being said; I once helped out a friend who's computer
| became slow to a crawl, only to find out they'd never
| emptied their bin so storage had only a few GB of free
| space left.
| falcor84 wrote:
| >deleted records will likely persist forever.
|
| If space is an issue, you can have a scheduled task to
| really delete all "deleted" items that were last updated
| e.g. more than 3 months ago.
| fhub wrote:
| > And forgetting that extra predicate on deleted_at can have
| dangerous consequences as it accidentally returns data that's no
| longer meant to be seen.
|
| For postgres you could consider using a view instead. See
| https://evilmartians.com/chronicles/soft-deletion-with-postg...
| mike256 wrote:
| You can do the same in other dbms. Just use an instead of
| delete trigger instead of the rule.
| icedchai wrote:
| This sort of logic is a good candidate for views. Of course,
| someone could choose not to use the view, but you can solve it
| with DB permissions, etc.
| OscarCunningham wrote:
| The 'deleted records table' they suggest at the end of the
| article is a really good alternative!
| brycelarkin wrote:
| Large tables use the delete table. Small tables use the
| deleted_at column.
|
| Surprised you never got a request to undelete something. It's
| such a common customer request in b2b saas.
| urdbjtdvbg wrote:
| Exactly. As a "backup" for when a client shoots themselves in
| the foot (or face!) it's invaluable. But only if the client is
| worth enough to bother.
|
| If you don't have high value clients I can see how it's
| unnecessary I suppose.
| alganet wrote:
| The way I like to solve this is using database snapshots. Just
| make sure you have an incremental snapshot setup for:
|
| - Last round hour
|
| - Last round day
|
| - Last round week
|
| These periods can be adjusted to your/client needs. If you need
| something that was deleted, you restore the snapshot into a new
| live db and copy whatever you need. You could charge for
| solving "I just shot myself in the foot" support requests based
| on the cost of restoring those snapshots.
|
| From my experience, this is much cleaner than having a spread
| of soft delete columns all over the codebase.
| pixl97 wrote:
| I mean, if you have a functional fast acting organization,
| maybe.
|
| It seems in the vast majority of organizations I talk to,
| this is not how things work at all. To get a db restore on a
| different box you have to have at least 2 different teams
| involved. One to provide a machine/vm/aws instance to restore
| to, and another to provide the snapshot/backup.
|
| With a soft delete all you need is the application
| administrator that has SQL access, which is typically far
| closer to the person that needs the data restored than the
| operations team is, hence it gets 'fixed' in a more timely
| fashion.
| alganet wrote:
| That's unfair. For the snapshot scenario, you described the
| bureaucratic sad path for doing it. For the soft delete,
| you used a happy path.
|
| Let's compare both in either best-scenario or worst-
| scenario:
|
| Snapshots on good team with no bureaucracy: Everything is
| automated. Even retrieval is automated via individual,
| commited, retrieval commands. Setup is done once and
| requires little maintenance, retrievals can be reused if
| fallen in same category (restore account, restore
| conversation, etc).
|
| Snapshots on shitty team with lots of bureaucracy:
| Everything is manual, AWS-panel operated and only a few
| people have access, you have to escalate to initiate the
| proccess.
|
| Soft deletes on good team with no bureaucracy: Everyones
| respect the soft deletion, no one queries it or do funny
| stuff unless for recovery purposes. The soft delete related
| columns, structures and tools are standardized.
|
| Soft deletes on shitty team with lots of bureaucracy: There
| might be multiple, competing standards for soft delete
| column naming and structure. Other teams ignore the soft
| delete and query hidden data either knowingly (tricky
| hacks) or unknowingly (data team unaware of soft delete
| model), you can't do shit about it. When it breaks your
| soft deleted data, you're the one who has to fix it.
|
| If we put external stuff in the mix (teams, people, org
| structures) we can spin it the way we want, but that
| doesn't reveal much information about the technical issue
| itself (having some kind of reasonable data recovery).
| bhouston wrote:
| Always. And sometimes they are a big customer who screwed up
| something big and if you can not restore it easily, you have to
| go find a DB backup and write some scripts to bring it back
| anyhow.
|
| Soft delete is a necessity in SaaS. And then combine it with
| hard delete either triggered manually (empty recycle bin) or
| time based (after a month, etc.)
| joshstrange wrote:
| I hate that I agree with the author overall. I mean I probably
| will keep adding a `deleted` column to my tables and cursing
| myself when I mess up queries to hide them but I can't help
| thinking they will save me a ton of headache one day...
|
| It's one of those "If you have it you will never need it but if
| you don't you will wish you had it"-types of things. I'm sure
| it's silly/stupid but it makes me feel better/safer.
| graftak wrote:
| Would a safer method be to have a <entities>-deleted table and
| migrate a deleted record to that table when it's 'deleted'.
|
| I'm not versed in databases but I'd expect you can even set
| additional access rules to individual tables making cross
| referencing very hard.
| joshstrange wrote:
| Yes that would work and be safer in a number of ways. I think
| I always avoided that because it requires 2 operations
| (delete and insert) and a tiny bit of added complexity. I
| think I'll try that approach the next time I do something
| like this.
| vaidhy wrote:
| Yes and no. If you are thinking of a no-sql database, then
| this is an easy approach. However, I would expect most of the
| systems to be using normalized tables and this approach would
| be pretty hard due to relational constraints.
|
| You would have to have a separate database copy running for
| deleted entries so that it can be consistent within itself.
| graftak wrote:
| They way I see it relationships to the deleted record must
| always be nullified/marked deleted, because otherwise you
| get broken records referencing deletions. And relationships
| from a deleted record might as well persist because the
| foreign key will (fingers crossed) remain untouched.
|
| Again, not versed in db at all so this is likely total
| rubbish.
| vaidhy wrote:
| You are correct.. But if you move the deleted record to a
| different db, you have to follow the chain of
| relationships and move more records to the "deleted" db.
| Which means you would have a create a full database to
| maintain the entity relationships.
| nielsbot wrote:
| That's what the article suggests (at the end)
| Sohcahtoa82 wrote:
| Why not name your database "my_db_all" and then create a view
| called "my_db" that's just "select * from my_db_all where
| deleted != true", or something to that effect?
| waych wrote:
| This seems to ignore the issues raised concerning foreign
| keys and consumer data protection pruning.
| icedchai wrote:
| Just because a row is soft-deleted now doesn't mean it
| can't be actually deleted later.
| eastbound wrote:
| Why not use a history table with JSON serialization of data, if
| the goal is to restore but not have to deal with orphan records
| and forgotten WHERE clauses?
| vlod wrote:
| Previous comments:
|
| https://news.ycombinator.com/item?id=32156009
| krasznahorkai wrote:
| For Data Warehousing purposes it's nice if source systems use
| soft deletion for easy detection of deleted rows since a certain
| point in time. I don't think the use-case is limited to undoing
| user deletions.
|
| If there is no soft deletion or audit table available, the
| alternative is to either 1) Select all the source primary keys to
| find out which no longer exist in the source but do exist in the
| data warehouse 2) If there is no primary key available, select
| the whole dateset for full row comparison
|
| Since in corporate settings data warehouses load at least once a
| day, you can imagine the amounts of data being moved around
| needlessly.
| taeric wrote:
| For data warehousing, you usually have a ledger of events? Such
| that you would just record that a record with a given
| identifier was created/modified/deleted/etc. Right?
| kdamica wrote:
| Soft deletion is important if you want to do analytics on your
| data. Hard deletes mean that you can never reconstruct an
| accurate history.
|
| Also, if you're not careful, hard deletes can lead to primary
| keys getting reused, which can cause all kinds of problems if the
| delete doesn't cascade into every other system. I had this happen
| in practice once and it was a huge pain.
| taeric wrote:
| Of course, soft deletes can cause their own problems where you
| do want to reuse a key. That, or you have to start imposing
| uniqueness constraints on a non-key value. Certainly valid, but
| not at all user friendly.
|
| Consider, if you are creating resources in something like a
| cloud provider, you can setup permissions to things if they are
| based on names or tags. If that name or tag is a UUID like
| thing, that means you cannot setup the permissions before the
| thing is created. And if it is ever accidentally deleted, you
| cannot restore things easily.
| 0x457 wrote:
| You can't reconstruct an accurate history with either style of
| deletes. Only append-only event logs allow that.
| eternityforest wrote:
| Is there a database that does the WHERE deleted=null for you?
|
| Like, some way to create a special column that makes it so every
| query to that table automatically has a certain filter unless you
| use some other keyword to override it?
| rawgabbit wrote:
| DBs with Materialized Views
|
| Here is the syntax for Snowflake. CREATE MATERIALIZED VIEW mymv
| COMMENT='filtered view' AS SELECT * FROM mytable WHERE
| isdeleted=0;
| setr wrote:
| Why materialize? A normal view is sufficient
| rawgabbit wrote:
| It was to answer the OP question. Querying a materialized
| view is essentially the same as querying the original
| table. The MV allows for automated indexes etc.
| culturedsystems wrote:
| Temporal tables, aka system versioned tables, are something
| like this: https://mariadb.com/kb/en/system-versioned-tables/
| loevborg wrote:
| IS NULL!!!
|
| not = NULL!!!
| 6th wrote:
| cover both bases due to this ambiguity =)
|
| deleted IS NULL OR LOWER(deleted) = 'null'
| vyrotek wrote:
| After having done "deleted_at" for years, we prefer systems like
| MSSQL Temporal Tables. Otherwise, managing a deleted_at column is
| something you have to remember to account for in indexes and in
| all queries.
|
| Instead, just nuke that row and have a copy in a separate time-
| based table.
|
| https://learn.microsoft.com/en-us/sql/relational-databases/t...
| 0x457 wrote:
| Someone either never worked with b2b, LEA or UGC...
|
| Someone breaks community rules and bullies someone or something
| similar, they delete their comments. What do you do now?
|
| Business customers expect that everything they delete is possible
| to restore, even if the delete button clearly says, "this action
| is irreversible".
|
| Law Enforcement Agencies would expect you to be able to pull data
| from them as well.
| miki123211 wrote:
| On the other hand, other law enforcement agencies explicitly
| require that kind of data to be deleted, especially when the
| GDPR is concerned.
| zachmu wrote:
| Using a version controlled database gives you "soft deletes" with
| none of the downsides he mentions (except GDPR -- a versioned
| data store is probably hard to make GDPR compliant).
|
| https://www.dolthub.com/blog/2022-11-03-soft-deletes/
___________________________________________________________________
(page generated 2024-01-10 23:01 UTC)