[HN Gopher] Why is Snapshot Isolation not enough?
       ___________________________________________________________________
        
       Why is Snapshot Isolation not enough?
        
       Author : todsacerdoti
       Score  : 58 points
       Date   : 2024-06-01 11:43 UTC (1 day ago)
        
 (HTM) web link (concurrencyfreaks.blogspot.com)
 (TXT) w3m dump (concurrencyfreaks.blogspot.com)
        
       | belter wrote:
       | Related: "A Critique of Snapshot Isolation" -
       | https://news.ycombinator.com/item?id=40516582
        
       | Max-Ganz-II wrote:
       | Amazon Redshift very recently has made this the default choice.
        
          | mirekrusin wrote:
          | MSSQL on Azure also has READ_COMMITTED_SNAPSHOT (a server-level
          | setting, default ON on Azure SQL Database, default OFF on SQL
          | Server <<wtf?>>). When ON, it transparently changes your
          | explicitly specified isolation level of "READ COMMITTED" in a
          | query/transaction to "READ COMMITTED SNAPSHIT" (genuine typo
          | made here, decided to not correct) instead. Good luck fixing
          | your prod incident when locally you're using the mssql docker
          | image (default OFF).
          | 
          | If you want to say "just use their edge mssql container image",
          | yeah man - mssql/Azure SQL Database/the mssql docker image all
          | support JSON_PATH_EXISTS, but the edge sql docker image
          | doesn't, so much for better compatibility.
          | 
          | What a mess.
        
           | jiggawatts wrote:
           | Microsoft loves to make their support matrix look like a
           | random tapestry.
        
        | _3u10 wrote:
        | It's enough - just make sure you're not trying to fly an airplane
        | or prevent the detonation of nuclear weapons using a database;
        | it's the wrong tool for the job.
        | 
        | If you're storing doubly linked lists in a DB you're doing it
        | wrong.
        | 
        | Updating doubly linked lists can be done at about 200 million
        | ops/sec on a single thread. I'm not sure why you'd need multiple
        | threads updating the list at the same time. What exactly are you
        | doing that can't be solved by putting a ring buffer in front of a
        | single thread that updates the values, needs no locks, and is
        | cache coherent?
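The single-writer pattern described above can be sketched in Python. This is a hypothetical illustration, not the commenter's code: all mutations are funneled through one queue to one thread, so the data structure itself needs no locks. (Python's queue.Queue is internally locked, so this shows the pattern rather than a true lock-free SPSC ring buffer.)

```python
import threading
import queue

class SingleWriterList:
    """All mutations are applied by one dedicated writer thread."""

    def __init__(self):
        self.items = []                 # only the writer thread touches this
        self.ops = queue.Queue()        # stand-in for the ring buffer
        self.writer = threading.Thread(target=self._run, daemon=True)
        self.writer.start()

    def _run(self):
        while True:
            op = self.ops.get()
            if op is None:              # shutdown sentinel
                break
            op(self.items)              # mutation applied single-threaded

    def submit(self, op):
        self.ops.put(op)                # producers just enqueue closures

    def close(self):
        self.ops.put(None)
        self.writer.join()

lst = SingleWriterList()
for i in range(5):
    lst.submit(lambda items, i=i: items.append(i))
lst.submit(lambda items: items.reverse())
lst.close()
```

Because every operation is applied in FIFO order by the same thread, the result is deterministic with no locking on the list itself.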
        
         | roncesvalles wrote:
         | >It's enough, just make sure you're not trying to fly an
         | airplane or prevent the detonation of nuclear weapons
         | 
          | My gripe with this kind of argument is that _today, you
          | aren't_. You can write application-side duct tape to deal with
          | any kind of wonky database situation, but the whole point of
          | having a database with strong and easy-to-reason-about
          | guarantees is that it makes future development easier.
        
         | bsaul wrote:
          | OP provides even simpler examples than doubly linked lists. It
          | made me realize that pg's default isolation level is actually
          | quite a footgun, and I went back and reread the docs on the
          | various isolation levels much more carefully.
        
         | macintux wrote:
         | Good luck troubleshooting why your data is subtly (or not-so-
         | subtly) corrupt when the database transactions that started the
         | problem occurred months previously, because the guarantees were
         | too loose/not well-understood.
        
         | sam_bishop wrote:
         | > If you're storing doubly linked lists in a DB you're doing it
         | wrong.
         | 
         | Assuming that the database uses B+ trees (like most do), then
         | the database records themselves are very likely to be in a
         | doubly linked list.
         | 
         | Not every doubly linked list is the kind you see in an
         | introductory data structures class.
        
       | ko_pivot wrote:
        | This is so well explained. CockroachDB implements serializable
        | isolation by default, and it's a great experience being able to
        | write SQL with that in mind.
        
       | lll-o-lll wrote:
       | It's the ABA problem that also comes up with normal
       | multithreading when trying to go lock free. Lock free data
       | structures are hard. SI is giving you the same guarantees (your
       | update is atomic, but things might have changed between when you
       | read and when you write). If you can handle this extra
       | complexity, all good, but it's generally something that needs to
       | be carefully abstracted. Nobody wants to be thinking this hard
       | every time they have to interact with a DB.
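The ABA hazard mentioned above can be sketched in Python. This is a hypothetical illustration (real lock-free code would use hardware atomic CAS, not Python methods): a value-only compare-and-swap cannot see an A->B->A interleaving, while pairing the value with a version counter catches it.

```python
class VersionedCell:
    """A cell supporting value-only CAS and (value, version) CAS."""

    def __init__(self, value):
        self.value = value
        self.version = 0

    def cas_value(self, expected, new):
        # Naive CAS: compares the value only -> vulnerable to ABA.
        if self.value == expected:
            self.value = new
            self.version += 1
            return True
        return False

    def cas_versioned(self, expected_value, expected_version, new):
        # Compares (value, version) -> an A->B->A interleaving is detected.
        if self.value == expected_value and self.version == expected_version:
            self.value = new
            self.version += 1
            return True
        return False

def aba_detected(use_version):
    cell = VersionedCell("A")
    val, ver = cell.value, cell.version     # our "snapshot" read
    # Interleaved work by another thread: A -> B, then B -> A.
    cell.cas_value("A", "B")
    cell.cas_value("B", "A")
    if use_version:
        # Version moved from 0 to 2, so this CAS correctly fails.
        return not cell.cas_versioned(val, ver, "C")
    # The value looks unchanged, so this CAS wrongly succeeds.
    return not cell.cas_value(val, "C")
```

This mirrors the SI situation: the value you read matches at commit time, but the world changed in between.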
        
       | josephg wrote:
        | The problem with all of these examples is that they use a model
        | of Snapshot Isolation which guarantees writes don't conflict,
        | but doesn't provide a similar guarantee for reads. What you
        | really want is the pair of constraints:
       | 
       | - Anything I write hasn't been concurrently written to by another
       | transaction
       | 
       | - (added): Anything I read also hasn't been concurrently written
       | to by another transaction
       | 
       | Adding "conflicting reads" fixes every single example in this
       | post. The linked list example can be fixed in any number of ways.
       | The most efficient might be to "spread out" the set of
       | conflicting keys to also include the next / prev pointers of the
       | deleted items b and c. This makes the transactions conflict. Or
       | maybe better - include all the next pointers of all items before
       | the modified item. This adds an implicit guarantee that the
       | visited / modified item must be in the list for the transaction
       | to be successfully committed.
       | 
       | Now, adding reads to the conflict set might change SI into a
       | different concurrency model. The blog post does use the same
       | definition as Wikipedia. And that definition only mentions
       | writes. But I think any SI system can be made to work this way by
       | setting X = X for any value read during each transaction.
       | 
        | The "two threads" example becomes this:
        | 
        |     beginTxn();
        |         if (x == 0 && y == 0) x = 1;
        |         x = x; y = y; // implicit
        |     endTxn();
        | 
        |     beginTxn();
        |         if (x == 0 && y == 0) y = 1;
        |         x = x; y = y; // implicit
        |     endTxn();
       | 
       | And the two transactions will function correctly.
       | 
       | Having the database handle this for you is best because:
       | 
       | - Humans will forget
       | 
        | - If two transactions read some value but neither transaction
        | writes to it, they don't actually need to conflict. The manual
        | X = X trick is too strict in that case.
       | 
        | FoundationDB, in my opinion, has the best API for this I've
        | seen. Reads and writes within a transaction are tracked like
        | this automatically, and you can manually add, remove or change
        | the set of conflicting keys using an explicit API if you want
        | to. But I'm sure it's not alone.
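The write-skew scenario and the effect of folding reads into the conflict set can be made concrete with a toy first-committer-wins snapshot-isolation engine in Python (all class and function names are hypothetical): with write-set-only conflict detection both transactions commit and write skew appears; with reads included, the second commit aborts.

```python
class Conflict(Exception):
    pass

class Store:
    """Committed state plus a per-key commit counter."""

    def __init__(self, **values):
        self.data = dict(values)
        self.version = {k: 0 for k in values}

    def begin(self):
        return Txn(self)

class Txn:
    def __init__(self, store):
        self.store = store
        self.snapshot = dict(store.data)        # reads come from here
        self.snap_ver = dict(store.version)
        self.writes = {}
        self.reads = set()

    def read(self, key):
        self.reads.add(key)
        return self.writes.get(key, self.snapshot[key])

    def write(self, key, value):
        self.writes[key] = value

    def commit(self, include_reads=False):
        conflict_keys = set(self.writes)
        if include_reads:
            conflict_keys |= self.reads         # the "x = x" trick, automated
        for k in conflict_keys:
            if self.store.version[k] != self.snap_ver[k]:
                raise Conflict(k)               # someone committed k since our snapshot
        for k, v in self.writes.items():
            self.store.data[k] = v
            self.store.version[k] += 1

def write_skew(include_reads):
    s = Store(x=0, y=0)
    t1, t2 = s.begin(), s.begin()               # concurrent snapshots of x=0, y=0
    if t1.read("x") == 0 and t1.read("y") == 0:
        t1.write("x", 1)
    if t2.read("x") == 0 and t2.read("y") == 0:
        t2.write("y", 1)
    t1.commit(include_reads)
    try:
        t2.commit(include_reads)
    except Conflict:
        pass                                    # t2 aborts; a real app would retry
    return s.data
```

With write-set-only detection the write sets {x} and {y} don't overlap, so both transactions commit and the invariant "x and y are not both 1" is violated. With reads in the conflict set, t2's commit sees that x's version moved and aborts.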
        
         | Negitivefrags wrote:
          | I think most databases have some method of turning your reads
          | into write locks without needing to actually do a write.
          | 
          | Like SELECT FOR UPDATE.
        
       ___________________________________________________________________
       (page generated 2024-06-02 23:00 UTC)