[HN Gopher] Soft Deletion Probably Isn't Worth It (2022)
       ___________________________________________________________________
        
       Soft Deletion Probably Isn't Worth It (2022)
        
       Author : thunderbong
       Score  : 71 points
       Date   : 2024-01-09 12:18 UTC (1 days ago)
        
 (HTM) web link (brandur.org)
 (TXT) w3m dump (brandur.org)
        
       | bazil376 wrote:
       | The GDPR deletion requirement has been most compelling (and a
       | headache) for my application
        
         | allan_s wrote:
         | IANAL but I got confirmed several time that you can soft delete
         | if all fields are then anonymized, the law does not care about
         | the concept of database row as long as the PII is deleted
        
           | Detrytus wrote:
           | If you anonymize data then it isn't soft delete anymore as
           | you cannot un-delete it. What you are describing is more
           | suited to generating test data for machine learning
           | algorithms, or simply for developers that need to set up db
           | instance in DEV environment.
        
         | rawgabbit wrote:
         | To satisfy GDPR requirements when customers leave, you can
         | blank out the PII fields. No need to blow up your application
         | by deleting records.
        
           | taeric wrote:
           | Also, a lot of records are the businesses? They may have to
           | remove who it was that they did business with, for
           | compliance, but they do not have to forget that they did
           | business? (Right? Genuine question.)
        
       | The_Colonel wrote:
       | Soft-deletion ("recycle bin") is an important feature in many
       | user-facing applications.
       | 
       | But if it's not surfaced in UI then I agree it doesn't make much
       | sense.
        
         | graftak wrote:
         | Soft delete is not a recycle bin but the process of marking a
         | record as deleted and revoke access instead of actually
         | deleting it.
        
           | antonyt wrote:
           | One of the most common reasons to "mark a record as deleted"
           | rather than deleting it is so you can undo deletes without
           | having to fish data out of a backup. I.e. soft delete is
           | effectively a recycle bin. It might not be a user-facing
           | recycle bin, but it's a recycle bin nonetheless.
        
             | graftak wrote:
             | That makes absolute sense.
             | 
             | The way it was overlooked by me is because a 'bin' is
             | something you clear regularly, but deleted records will
             | likely persist forever.
             | 
             | That being said; I once helped out a friend who's computer
             | became slow to a crawl, only to find out they'd never
             | emptied their bin so storage had only a few GB of free
             | space left.
        
               | falcor84 wrote:
               | >deleted records will likely persist forever.
               | 
               | If space is an issue, you can have a scheduled task to
               | really delete all "deleted" items that were last updated
               | e.g. more than 3 months ago.
        
       | fhub wrote:
       | > And forgetting that extra predicate on deleted_at can have
       | dangerous consequences as it accidentally returns data that's no
       | longer meant to be seen.
       | 
       | For postgres you could consider using a view instead. See
       | https://evilmartians.com/chronicles/soft-deletion-with-postg...
        
         | mike256 wrote:
         | You can do the same in other dbms. Just use an instead of
         | delete trigger instead of the rule.
        
         | icedchai wrote:
         | This sort of logic is a good candidate for views. Of course,
         | someone could choose not to use the view, but you can solve it
         | with DB permissions, etc.
        
       | OscarCunningham wrote:
       | The 'deleted records table' they suggest at the end of the
       | article is a really good alternative!
        
       | brycelarkin wrote:
       | Large tables use the delete table. Small tables use the
       | deleted_at column.
       | 
       | Surprised you never got a request to undelete something. It's
       | such a common customer request in b2b saas.
        
         | urdbjtdvbg wrote:
         | Exactly. As a "backup" for when a client shoots themselves in
         | the foot (or face!) it's invaluable. But only if the client is
         | worth enough to bother.
         | 
         | If you don't have high value clients I can see how it's
         | unnecessary I suppose.
        
         | alganet wrote:
         | The way I like to solve this is using database snapshots. Just
         | make sure you have an incremental snapshot setup for:
         | 
         | - Last round hour
         | 
         | - Last round day
         | 
         | - Last round week
         | 
         | These periods can be adjusted to your/client needs. If you need
         | something that was deleted, you restore the snapshot into a new
         | live db and copy whatever you need. You could charge for
         | solving "I just shot myself in the foot" support requests based
         | on the cost of restoring those snapshots.
         | 
         | From my experience, this is much cleaner than having a spread
         | of soft delete columns all over the codebase.
        
           | pixl97 wrote:
           | I mean, if you have a functional fast acting organization,
           | maybe.
           | 
           | It seems in the vast majority of organizations I talk to,
           | this is not how things work at all. To get a db restore on a
           | different box you have to have at least 2 different teams
           | involved. One to provide a machine/vm/aws instance to restore
           | to, and another to provide the snapshot/backup.
           | 
           | With a soft delete all you need is the application
           | administrator that has SQL access, which is typically far
           | closer to the person that needs the data restored than the
           | operations team is, hence it gets 'fixed' in a more timely
           | fashion.
        
             | alganet wrote:
             | That's unfair. For the snapshot scenario, you described the
             | bureaucratic sad path for doing it. For the soft delete,
             | you used a happy path.
             | 
             | Let's compare both in either best-scenario or worst-
             | scenario:
             | 
             | Snapshots on good team with no bureaucracy: Everything is
             | automated. Even retrieval is automated via individual,
             | commited, retrieval commands. Setup is done once and
             | requires little maintenance, retrievals can be reused if
             | fallen in same category (restore account, restore
             | conversation, etc).
             | 
             | Snapshots on shitty team with lots of bureaucracy:
             | Everything is manual, AWS-panel operated and only a few
             | people have access, you have to escalate to initiate the
             | proccess.
             | 
             | Soft deletes on good team with no bureaucracy: Everyones
             | respect the soft deletion, no one queries it or do funny
             | stuff unless for recovery purposes. The soft delete related
             | columns, structures and tools are standardized.
             | 
             | Soft deletes on shitty team with lots of bureaucracy: There
             | might be multiple, competing standards for soft delete
             | column naming and structure. Other teams ignore the soft
             | delete and query hidden data either knowingly (tricky
             | hacks) or unknowingly (data team unaware of soft delete
             | model), you can't do shit about it. When it breaks your
             | soft deleted data, you're the one who has to fix it.
             | 
             | If we put external stuff in the mix (teams, people, org
             | structures) we can spin it the way we want, but that
             | doesn't reveal much information about the technical issue
             | itself (having some kind of reasonable data recovery).
        
         | bhouston wrote:
         | Always. And sometimes they are a big customer who screwed up
         | something big and if you can not restore it easily, you have to
         | go find a DB backup and write some scripts to bring it back
         | anyhow.
         | 
         | Soft delete is a necessity in SaaS. And then combine it with
         | hard delete either triggered manually (empty recycle bin) or
         | time based (after a month, etc.)
        
       | joshstrange wrote:
       | I hate that I agree with the author overall. I mean I probably
       | will keep adding a `deleted` column to my tables and cursing
       | myself when I mess up queries to hide them but I can't help
       | thinking they will save me a ton of headache one day...
       | 
       | It's one of those "If you have it you will never need it but if
       | you don't you will wish you had it"-types of things. I'm sure
       | it's silly/stupid but it makes me feel better/safer.
        
         | graftak wrote:
         | Would a safer method be to have a <entities>-deleted table and
         | migrate a deleted record to that table when it's 'deleted'.
         | 
         | I'm not versed in databases but I'd expect you can even set
         | additional access rules to individual tables making cross
         | referencing very hard.
        
           | joshstrange wrote:
           | Yes that would work and be safer in a number of ways. I think
           | I always avoided that because it requires 2 operations
           | (delete and insert) and a tiny bit of added complexity. I
           | think I'll try that approach the next time I do something
           | like this.
        
           | vaidhy wrote:
           | Yes and no. If you are thinking of a no-sql database, then
           | this is an easy approach. However, I would expect most of the
           | systems to be using normalized tables and this approach would
           | be pretty hard due to relational constraints.
           | 
           | You would have to have a separate database copy running for
           | deleted entries so that it can be consistent within itself.
        
             | graftak wrote:
             | They way I see it relationships to the deleted record must
             | always be nullified/marked deleted, because otherwise you
             | get broken records referencing deletions. And relationships
             | from a deleted record might as well persist because the
             | foreign key will (fingers crossed) remain untouched.
             | 
             | Again, not versed in db at all so this is likely total
             | rubbish.
        
               | vaidhy wrote:
               | You are correct.. But if you move the deleted record to a
               | different db, you have to follow the chain of
               | relationships and move more records to the "deleted" db.
               | Which means you would have a create a full database to
               | maintain the entity relationships.
        
           | nielsbot wrote:
           | That's what the article suggests (at the end)
        
         | Sohcahtoa82 wrote:
         | Why not name your database "my_db_all" and then create a view
         | called "my_db" that's just "select * from my_db_all where
         | deleted != true", or something to that effect?
        
           | waych wrote:
           | This seems to ignore the issues raised concerning foreign
           | keys and consumer data protection pruning.
        
             | icedchai wrote:
             | Just because a row is soft-deleted now doesn't mean it
             | can't be actually deleted later.
        
         | eastbound wrote:
         | Why not use a history table with JSON serialization of data, if
         | the goal is to restore but not have to deal with orphan records
         | and forgotten WHERE clauses?
        
       | vlod wrote:
       | Previous comments:
       | 
       | https://news.ycombinator.com/item?id=32156009
        
       | krasznahorkai wrote:
       | For Data Warehousing purposes it's nice if source systems use
       | soft deletion for easy detection of deleted rows since a certain
       | point in time. I don't think the use-case is limited to undoing
       | user deletions.
       | 
       | If there is no soft deletion or audit table available, the
       | alternative is to either 1) Select all the source primary keys to
       | find out which no longer exist in the source but do exist in the
       | data warehouse 2) If there is no primary key available, select
       | the whole dateset for full row comparison
       | 
       | Since in corporate settings data warehouses load at least once a
       | day, you can imagine the amounts of data being moved around
       | needlessly.
        
         | taeric wrote:
         | For data warehousing, you usually have a ledger of events? Such
         | that you would just record that a record with a given
         | identifier was created/modified/deleted/etc. Right?
        
       | kdamica wrote:
       | Soft deletion is important if you want to do analytics on your
       | data. Hard deletes mean that you can never reconstruct an
       | accurate history.
       | 
       | Also, if you're not careful, hard deletes can lead to primary
       | keys getting reused, which can cause all kinds of problems if the
       | delete doesn't cascade into every other system. I had this happen
       | in practice once and it was a huge pain.
        
         | taeric wrote:
         | Of course, soft deletes can cause their own problems where you
         | do want to reuse a key. That, or you have to start imposing
         | uniqueness constraints on a non-key value. Certainly valid, but
         | not at all user friendly.
         | 
         | Consider, if you are creating resources in something like a
         | cloud provider, you can setup permissions to things if they are
         | based on names or tags. If that name or tag is a UUID like
         | thing, that means you cannot setup the permissions before the
         | thing is created. And if it is ever accidentally deleted, you
         | cannot restore things easily.
        
         | 0x457 wrote:
         | You can't reconstruct an accurate history with either style of
         | deletes. Only append-only event logs allow that.
        
       | eternityforest wrote:
       | Is there a database that does the WHERE deleted=null for you?
       | 
       | Like, some way to create a special column that makes it so every
       | query to that table automatically has a certain filter unless you
       | use some other keyword to override it?
        
         | rawgabbit wrote:
         | DBs with Materialized Views
         | 
         | Here is the syntax for Snowflake. CREATE MATERIALIZED VIEW mymv
         | COMMENT='filtered view' AS SELECT * FROM mytable WHERE
         | isdeleted=0;
        
           | setr wrote:
           | Why materialize? A normal view is sufficient
        
             | rawgabbit wrote:
             | It was to answer the OP question. Querying a materialized
             | view is essentially the same as querying the original
             | table. The MV allows for automated indexes etc.
        
         | culturedsystems wrote:
         | Temporal tables, aka system versioned tables, are something
         | like this: https://mariadb.com/kb/en/system-versioned-tables/
        
         | loevborg wrote:
         | IS NULL!!!
         | 
         | not = NULL!!!
        
           | 6th wrote:
           | cover both bases due to this ambiguity =)
           | 
           | deleted IS NULL OR LOWER(deleted) = 'null'
        
       | vyrotek wrote:
       | After having done "deleted_at" for years, we prefer systems like
       | MSSQL Temporal Tables. Otherwise, managing a deleted_at column is
       | something you have to remember to account for in indexes and in
       | all queries.
       | 
       | Instead, just nuke that row and have a copy in a separate time-
       | based table.
       | 
       | https://learn.microsoft.com/en-us/sql/relational-databases/t...
        
       | 0x457 wrote:
       | Someone either never worked with b2b, LEA or UGC...
       | 
       | Someone breaks community rules and bullies someone or something
       | similar, they delete their comments. What do you do now?
       | 
       | Business customers expect that everything they delete is possible
       | to restore, even if the delete button clearly says, "this action
       | is irreversible".
       | 
       | Law Enforcement Agencies would expect you to be able to pull data
       | from them as well.
        
         | miki123211 wrote:
         | On the other hand, other law enforcement agencies explicitly
         | require that kind of data to be deleted, especially when the
         | GDPR is concerned.
        
       | zachmu wrote:
       | Using a version controlled database gives you "soft deletes" with
       | none of the downsides he mentions (except GDPR -- a versioned
       | data store is probably hard to make GDPR compliant).
       | 
       | https://www.dolthub.com/blog/2022-11-03-soft-deletes/
        
       ___________________________________________________________________
       (page generated 2024-01-10 23:01 UTC)