[HN Gopher] Does Your Test Suite Account for Weak Transaction Is...
___________________________________________________________________
Does Your Test Suite Account for Weak Transaction Isolation?
Author : todsacerdoti
Score : 34 points
Date : 2024-01-04 15:06 UTC (7 hours ago)
(HTM) web link (concerningquality.com)
(TXT) w3m dump (concerningquality.com)
| skyde wrote:
| I've known about this issue for some time (the default DB
| isolation level is weaker than SERIALIZABLE on most DBs).
| 
| But assuming you do not want to change the default isolation for
| your transaction or for the whole DB, how do you write a unit
| test that checks your code works correctly because it's using
| "select balance from accounts FOR UPDATE"?
| 
| It seems to me the only way to really validate this is using a
| tool like TLA+, or something like
| https://github.com/microsoft/coyote
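| 
| For concreteness, the pattern in question looks roughly like
| this (a Postgres-flavored sketch; the schema is illustrative):
| 
|   BEGIN;
|   -- lock the row so concurrent writers queue behind it
|   SELECT balance FROM accounts WHERE id = 1 FOR UPDATE;
|   -- the application checks the balance, then:
|   UPDATE accounts SET balance = balance - 10 WHERE id = 1;
|   COMMIT;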
| skyde wrote:
| Even using a tool like this, it seems to be a combinatorial
| explosion issue.
| 
| If your app contains 10 distinct transactions:
| 
| you have to test what happens if 2 instances of tx #1 run
| concurrently.
| 
| But also if tx #1 and tx #2 run concurrently.
| 
| And also what if tx #1 and tx #3 run concurrently ... That is
| already 55 pairs (45 distinct pairs plus 10 self-pairs), before
| considering interleavings of 3 or more transactions.
| amw-zero wrote:
| The issue is that the race condition exists in the database,
| not in any application code. So you can either simulate the
| race, or you need to integration test, and that's outside the
| scope of TLA+ and model checking.
|
| I haven't heard of Coyote (it's still amazing how many tools
| are out there). It's good that it operates on the actual code
| level, but I'm still not sure that would reproduce concurrency
| non-determinism at the database level.
| skyde wrote:
| Yes, if your tests use a mock of the DB (e.g.
| https://github.com/microsoft/coyote/blob/main/Samples/Accoun...)
| you can simulate the races.
| 
| The problem is that your mock needs to implement the same
| isolation level as your real DB and support transactions ...
| 
| You could use a SQLite in-memory DB to run your tests, but that
| would make your tests a lot slower, I assume.
| amw-zero wrote:
| Exactly, it's a tricky problem. Implementing all
| transaction isolation levels in a mock is quite an
| ambitious endeavor.
| mrloba wrote:
| It's not perfect, but I've had success running Postgres in
| Docker and pointing integration tests at it. Usually you can
| trigger the problem by running a handful of queries in parallel.
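| 
| For instance, an interleaving like the following (two sessions
| against the same database; schema and values are illustrative)
| reproduces the classic lost update under the default READ
| COMMITTED level:
| 
|   -- Session A
|   BEGIN;
|   SELECT balance FROM accounts WHERE id = 1;  -- reads 100
|   
|   -- Session B, interleaved
|   BEGIN;
|   SELECT balance FROM accounts WHERE id = 1;  -- also reads 100
|   UPDATE accounts SET balance = 90 WHERE id = 1;
|   COMMIT;
|   
|   -- Session A resumes, unaware of B's committed write
|   UPDATE accounts SET balance = 90 WHERE id = 1;
|   COMMIT;  -- B's withdrawal is silently lost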
| amw-zero wrote:
| Yes exactly. That's the approach taken here. The main
| downside is that it can require sleeps to properly reproduce
| race conditions.
| skyde wrote:
| This is the current setup we use on my team, but I am trying to
| move away from it to something more systematic.
| nick__m wrote:
| Why is optimistic locking never considered in these articles?
| Adding a version column and checking the number of affected rows
| scales a lot better than a select for update...
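| 
| A minimal sketch of the pattern (illustrative schema; the
| application retries from the read whenever zero rows are
| affected):
| 
|   -- read the row and remember its version, e.g. 7
|   SELECT balance, version FROM accounts WHERE id = 1;
|   
|   -- write back only if nobody updated the row in the meantime
|   UPDATE accounts
|      SET balance = 90, version = version + 1
|    WHERE id = 1 AND version = 7;
|   -- 0 affected rows means a concurrent writer won: retry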
| amw-zero wrote:
| Probably because it's more work, and it's logic that you have
| to write and get correct. Whereas "for update" pretty much just
| works.
|
| But there are definitely many valid solutions, including just
| setting the specific transaction to serializable and handling
| transaction retries.
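| 
| In Postgres that looks roughly like the following (a sketch; the
| retry loop itself lives in application code):
| 
|   BEGIN ISOLATION LEVEL SERIALIZABLE;
|   SELECT balance FROM accounts WHERE id = 1;
|   UPDATE accounts SET balance = balance - 10 WHERE id = 1;
|   COMMIT;
|   -- if any statement fails with SQLSTATE 40001
|   -- (serialization_failure), re-run the whole transaction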
| GauntletWizard wrote:
| _and handling retries_ is something I almost never see, and even
| more rarely see done right.
| gav wrote:
| Optimistic locking is a great solution to avoid holding a
| lock for an indeterminate period of time as you wait for a
| user or external system to provide updates.
| hinkley wrote:
| I don't know about databases, but I've done something similar
| when testing interactive web apps, and it works quite well. When
| your architecture supports it, at least.
|
| Essentially a hidden field on the page that tracks interactions
| and outstanding Ajax calls. When you assert that a field on the
| page hasn't changed, you need to know that it's not still in
| the process of painting. And just waiting "long enough" makes
| for very slow testing overall.
| skyde wrote:
| Because a row version does not make it possible to update more
| than one row atomically.
| giovannibonetti wrote:
| Yes it does: UPDATE foo SET bar = 'abc' WHERE (id, version) IN
| ((123, 1), (234, 2))
| skyde wrote:
| While true, this is not what I meant.
| 
| If your goal is to transfer 10 dollars from account A to account
| B, but only if account A's balance is at least 10 dollars:
| 
| You have to update account A to be 10 dollars less,
| 
| and
| 
| you have to update account B to be 10 dollars more.
| 
| You can read both accounts, keep track of the current version
| number for each, and then do:
| 
|   UPDATE Accounts
|      SET Balance = CASE AccountID WHEN 1 THEN Balance - 10
|                                   WHEN 2 THEN Balance + 10
|                    END
|    WHERE (AccountID, version) IN ((1, v1), (2, v2))
| 
| But when using the Read Uncommitted isolation level, I am not
| sure this UPDATE would actually lock both rows until commit.
| skyde wrote:
| You still have to do
| 
|   IF @@ROWCOUNT = 2
|     COMMIT TRANSACTION;
|   ELSE
|     ROLLBACK TRANSACTION;
| 
| to make sure both rows have been updated, so it's a lot simpler
| to explicitly take a lock on the rows.
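| 
| For comparison, the pessimistic version is just (a Postgres-
| flavored sketch of the same transfer):
| 
|   BEGIN;
|   -- lock both rows up front; concurrent transfers wait here
|   SELECT AccountID, Balance FROM Accounts
|    WHERE AccountID IN (1, 2) FOR UPDATE;
|   -- check account 1's balance in the application, then:
|   UPDATE Accounts SET Balance = Balance - 10 WHERE AccountID = 1;
|   UPDATE Accounts SET Balance = Balance + 10 WHERE AccountID = 2;
|   COMMIT;
| 
| (Locking rows in a consistent order avoids deadlocks between
| concurrent transfers.)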
| giovannibonetti wrote:
| You're right. With the versioned update, the operation only goes
| through atomically if both accounts are still in their pristine
| state; otherwise locking the rows or using serializable
| isolation would be required.
| trhway wrote:
| >Hopefully it's clear why this is an issue. If you have an
| important column value, say a user's account balance, you might
| query multiple different values in the same transaction
|
| and that is a feature, not a bug. During your transaction, the
| user withdrew money from their account. If you want to see the
| same value, i.e. block that user from withdrawing while you
| process their current balance, you take a lock on that row. It's
| similar in many ways to concurrent programming with threads.
| aphyr wrote:
| The point of isolation levels is that the database manages
| those locks (or other concurrency control mechanisms) for you.
| wrs wrote:
| Bearing in mind that "handling" the locks may mean a deadlock
| is detected and the transaction is aborted by the DB, which
| the application has to handle by retrying in a correct
| manner. (Which is another point often missed in naive
| application code.)
| trhway wrote:
| Not exactly - the database doesn't know whether you want to
| block that account update or not. So, speaking very roughly, it
| lets you choose a general level of such blocking behavior. It
| manages those locks very coarsely at serializable, and some of
| those locks at repeatable read. At read committed it manages
| basically only a small set of locks, just enough to avoid dirty
| reads (plus of course all the WAL/rollback/etc., which is the
| main feature and the point of the database, with the isolation
| levels being just the icing on that cake). Isolation levels are
| a tool, and you choose what is needed and suitable.
| 
| Serializable is actually a pain in the neck for any meaningfully
| serious large enterprise application. The decreased concurrency
| kills overall performance, and I haven't seen any such
| application running in serializable. For example, at my current
| job our customers run our application on servers with low tens
| of TB of RAM (the largest so far was 40TB) per single machine,
| with high hundreds of cores and thousands of users - in
| serializable that would slow to a crawl, as it wouldn't achieve
| the concurrency the database on those machines is otherwise
| capable of. While not a precise illustration, it still
| communicates my point - serializable versus read committed is
| like Python's global interpreter lock versus fine-grained locks
| in other languages.
| jen20 wrote:
| If you're storing a balance, rather than a ledger of the
| transactions which caused that balance to be the case, you're
| probably already off to a bad start when it comes to design.
|
| Just imagine if you called your bank and asked them why your
| balance was $500, and they said "I don't know, hit F5 and see if
| it changes"...
| gav wrote:
| When it comes to this topic, almost all examples fail to match
| how things work in the real world, where systems are eventually
| consistent.
|
| For example, my bank lets me go negative and then if I don't
| settle by the end of the day, overdraft protection will kick
| in for a little more than that negative amount.
| allknowingfrog wrote:
| I briefly dabbled in financial ledgers in a previous job. We
| made plans to do what you seem to be describing, which is to
| store the transactions themselves and calculate a balance as
| needed, but we frankly lacked expertise. I always wondered
| what would happen if we scaled to millions of transactions in
| that system.
|
| Is there a standard strategy for optimizing balance queries
| against a series of transactions? I assumed that we would
| eventually have to cache some kind of snapshot for each
| account to avoid the overhead of totaling every transaction
| on every balance check.
| agentultra wrote:
| Generally, for a lot of application usage patterns, _read
| committed_ is the default for a reason.
|
| From the Postgres docs:
|
| > _The partial transaction isolation provided by Read Committed
| level is adequate for many applications, and this level is fast
| and simple to use. However, for applications that do complex
| queries and updates, it may be necessary to guarantee a more
| rigorously consistent view of the database than the Read
| Committed level provides._
|
| Also, consider storing the transactions that change the balances
| (credits, debits, etc.) in a ledger and calculating the balance
| from it. You can avoid the complex update logic and keep your
| accountants and auditors happy.
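| 
| A minimal sketch of that shape (table and column names are
| illustrative):
| 
|   CREATE TABLE ledger (
|     id         bigserial   PRIMARY KEY,
|     account_id bigint      NOT NULL,
|     amount     numeric     NOT NULL,  -- credit > 0, debit < 0
|     created_at timestamptz NOT NULL DEFAULT now()
|   );
|   
|   -- the balance is derived from the entries, not stored
|   SELECT COALESCE(SUM(amount), 0) AS balance
|     FROM ledger
|    WHERE account_id = 1;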
|
| Stricter isolation levels are useful but have a lot of trade-offs
| to be aware of.
|
| Sometimes restructuring your data is a lot more effective than
| trying to reason about locks and concurrency. The more locks you
| have, the more points of contention, deadlocks, etc. you have to
| be concerned about. Humans and general rules/folklore are
| alright at avoiding some mistakes, but these systems get
| complicated quickly and can hide a surprising number of errors.
|
| As for testing these things, I've gotten a lot of good mileage
| out of model checking. TLA+ and Alloy are better at finding
| concurrency errors than I am at avoiding them.
| karamanolev wrote:
| Can you recommend a good article that goes over the ledger
| pattern (the easy part) and the performant calculation of
| balances as the ledger grows? I'm going to start working on a
| warehouse management system in the coming months, and that would
| come in handy.
| SoftTalker wrote:
| I don't have a link, but when I've implemented systems with a
| "ledger" of changes, I would have a field that indicated whether
| the ledger item had been "posted" to the affected account(s);
| the balance was then just the account balance +/- any unposted
| ledger items.
|
| I've had ledgers that were posted once a day, or more often,
| up to nearly real-time.
|
| But once a ledger item is posted, it should never be changed.
| If an error is discovered, make a new ledger entry to correct
| it.
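| 
| In SQL terms, the read then looks something like this (a sketch;
| the schema is illustrative):
| 
|   -- balance = last posted snapshot plus unposted ledger items
|   SELECT a.posted_balance
|          + COALESCE(SUM(l.amount), 0) AS current_balance
|     FROM accounts a
|     LEFT JOIN ledger l
|       ON l.account_id = a.id AND NOT l.posted
|    WHERE a.id = 1
|    GROUP BY a.posted_balance;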
| amw-zero wrote:
| How can you store ledger operations without locking on the
| balance? The race condition still exists - you can't accept a
| withdrawal if the user doesn't have enough funds.
| stetrain wrote:
| You can use some sort of concurrency index on the ledger.
|
| For example, you read the ledger entries for account ID X.
| 
| Those ledger entries are numbered sequentially up to the current
| index Y.
| 
| You calculate the current balance, approve the withdrawal, and
| try to insert the next ledger entry as Y + 1.
| 
| If another operation has already inserted a Y + 1, your insert
| fails and you throw an error or try again, starting with the new
| ledger state and balance.
| 
| Basically optimistic concurrency, but knowing that you are
| always appending to the ledger and that the index is always
| incrementing.
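| 
| Sketched in SQL (illustrative; assumes the ledger table carries
| a per-account seq column, and :y is the highest sequence number
| read above):
| 
|   -- a unique index does the conflict detection
|   CREATE UNIQUE INDEX ledger_account_seq
|       ON ledger (account_id, seq);
|   
|   -- after computing the balance from entries 1..Y, append:
|   INSERT INTO ledger (account_id, seq, amount)
|   VALUES (1, :y + 1, -10);
|   -- a unique_violation here means another writer appended
|   -- Y + 1 first: reload the ledger and retry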
___________________________________________________________________
(page generated 2024-01-04 23:01 UTC)