[HN Gopher] Jepsen: Datomic Pro 1.0.7075
___________________________________________________________________
Jepsen: Datomic Pro 1.0.7075
Author : aphyr
Score : 140 points
Date : 2024-05-15 16:57 UTC (6 hours ago)
(HTM) web link (jepsen.io)
(TXT) w3m dump (jepsen.io)
| amgreg wrote:
| It struck me that Jepsen has identified clear situations leading
| to invariant violations but Datomic's approach seems to have been
| purely to clarify their documentation. Does this essentially mean
| the Datomic team accepts that the violations will happen, but
| doesn't care?
|
| From the article:
|
| > From Datomic's point of view, the grant workload's invariant
| violation is a matter of user error. Transaction functions do not
| execute atomically in sequence. Checking that a precondition
| holds in a transaction function is unsafe when some other
| operation in the transaction could invalidate that precondition!
| aphyr wrote:
| Yeah, this basically boils down to "a potential pitfall, but
| consistent with documentation, and working as designed".
| Whether this actually matters depends on whether users are
| writing transaction functions which are _intended_ to preserve
| some invariant, but would only do so if executed sequentially,
| rather than concurrently.
|
| Datomic's position (and Datomic, please chime in here!) is that
| users simply do not write transaction functions like this very
| often. This is defensible: the docs did explicitly state that
| transaction functions observe the start-of-transaction state,
| not one another! On the other hand, there was also language in
| the docs that suggested transaction functions could be used to
| preserve invariants: "[txn fns] can atomically analyze and
| transform database values. You can use them to ensure atomic
| read-modify-update processing, and integrity constraints...".
| That language, combined with the fact that basically every
| other Serializable DB uses sequential intra-transaction
| semantics, is why I devoted so much attention to this issue in
| the report.
|
| It's a complex question and I don't have a clear-cut answer!
| I'd love to hear what the general DB community and Datomic
| users in particular make of these semantics.
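The difference between start-of-transaction and sequential semantics can be sketched with a toy model. This is a simplified illustration, not Datomic's API: the db is a plain dict, and `approve`, `deny`, `transact_snapshot`, and `transact_sequential` are hypothetical names invented here, loosely following the grant workload quoted above.

```python
# Toy model only: a "db" is a dict, and a "transaction function" takes the db
# it can see and returns a dict of facts to assert. Not Datomic's API.

def approve(db):
    # Precondition: the grant must not already be denied.
    if "denied" in db:
        raise ValueError("already denied")
    return {"approved": True}

def deny(db):
    # Precondition: the grant must not already be approved.
    if "approved" in db:
        raise ValueError("already approved")
    return {"denied": True}

def transact_snapshot(db, fns):
    """Every function sees the start-of-transaction db; results are merged."""
    facts = {}
    for fn in fns:
        facts.update(fn(db))  # fn never sees the other functions' output
    return {**db, **facts}

def transact_sequential(db, fns):
    """Each function sees the effects of the functions before it."""
    for fn in fns:
        db = {**db, **fn(db)}
    return db

grant = {"id": 1}

# Both preconditions hold against the before-db, so both facts are asserted.
snapshot_result = transact_snapshot(grant, [approve, deny])

# Under sequential semantics, deny sees the approval and aborts.
try:
    transact_sequential(grant, [approve, deny])
    sequential_outcome = "committed"
except ValueError as e:
    sequential_outcome = "rejected: " + str(e)

print(snapshot_result)
print(sequential_outcome)
```

In this sketch the snapshot variant commits a grant that is both approved and denied, while the sequential variant rejects the second call. Neither is "wrong" in isolation; the surprise comes from assuming one model while running on the other.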
| nickpeterson wrote:
| I feel like "enough rope to shoot yourself" is kind of baked
| into any high power, low ceremony tool.
| stuarthalloway wrote:
| As a proponent of just such tools I would say also that
| "enough rope to shoot(?) yourself" is inherent in tools
| powerful enough to get anything done, and is not a tradeoff
| encountered only when reaching for high power or low
| ceremony.
| refset wrote:
| I don't know whether it was intentional or not, but IIRC
| DataScript opted for sequential intra-transaction semantics
| instead.
| SoftTalker wrote:
| Sounds similar to the need to know that in some relational
| databases, you need to SELECT ... FOR UPDATE if you intend to
| perform an update that depends on the values you just selected.
| stuarthalloway wrote:
| As Jepsen confirmed, Datomic's mechanisms for enforcing
| invariants work as designed. What does this mean practically
| for users? Consider the following transactional pseudo-data:
|
| [
|
| [Stu favorite-number 41]
|
| ;; maybe more stuff
|
| [Stu favorite-number 42]
|
| ]
|
| An operational reading of this data would be that early in the
| transaction I liked 41, and that later in the transaction I
| liked 42. Observers after the end of the transaction would
| hopefully see only that I liked 42, and we would have to worry
| about the conditions under which observers might see that 41.
|
| This operational reading of intra-transaction semantics is
| typical of many databases, but it presumes the existence of
| multiple time points inside a transaction, which Datomic
| neither has nor wants -- we quite like not worrying about what
| happened "in the middle of" a transaction. All facts in a
| transaction take place at the same point in time, so in Datomic
| this transaction states that I started liking both numbers
| simultaneously.
|
| If you incorrectly read Datomic transactions as composed of
| multiple operations, you can of course find all kinds of
| "invariant anomalies". Conversely, you can find "invariant
| anomalies" in SQL by incorrectly imposing Datomic's model on
| SQL transactions. Such potential misreadings emphasize the need
| for good documentation. To that end, we have worked with Jepsen
| to enhance our documentation [1], tightening up casual language
| in the hopes of preventing misconceptions. We also added a tech
| note [2] addressing this particular misconception directly.
|
| [1] https://docs.datomic.com/transactions/transactions.html#tran...
|
| [2] https://docs.datomic.com/tech-notes/comparison-with-updating...
| aphyr wrote:
| To build on this, Datomic includes a pre-commit conflict
| check that would prevent this particular example from
| committing at all: it detects that there are two incompatible
| assertions for the same entity/attribute pair, and rejects
| the transaction. We think this conflict check likely prevents
| many users from actually hitting this issue in production.
|
| The issue we discuss in the report only occurs when the
| transaction expands to non-conflicting datoms--for instance:
|
| [Stu favorite-number 41]
|
| [Stu hates-all-numbers-and-has-no-favorite true]
|
| These entity/attribute pairs are disjoint, so the conflict
| checker allows the transaction to commit, producing a record
| which is in a logically inconsistent state!
|
| On the documentation front--Datomic users could be forgiven
| for thinking of the elements of transactions as "operations",
| since Datomic's docs called them both "operations" and
| "statements". ;-)
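A rough sketch of the kind of conflict check described above, assuming single-valued attributes and modelling datoms as (entity, attribute, value) tuples. The function name `find_conflicts` is hypothetical, and Datomic's real check covers more cases than this.

```python
def find_conflicts(datoms):
    """Return the (entity, attribute) pairs asserted with more than one value."""
    seen = {}
    conflicts = set()
    for entity, attribute, value in datoms:
        key = (entity, attribute)
        if key in seen and seen[key] != value:
            conflicts.add(key)
        seen.setdefault(key, value)
    return conflicts

# Two incompatible assertions for the same entity/attribute pair.
same_pair = [
    ("Stu", "favorite-number", 41),
    ("Stu", "favorite-number", 42),
]

# Disjoint entity/attribute pairs: nothing for the check to object to,
# even though the two facts contradict each other logically.
disjoint = [
    ("Stu", "favorite-number", 41),
    ("Stu", "hates-all-numbers-and-has-no-favorite", True),
]

print(find_conflicts(same_pair))
print(find_conflicts(disjoint))
```

The check is purely structural: it compares values per entity/attribute pair and knows nothing about application-level invariants spanning several attributes.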
| stuarthalloway wrote:
| Mea culpa on the docs, mea culpa. Better now [1].
|
| In order for user code to impose invariants over the entire
| transaction, it must have access to the entire transaction.
| Entity predicates have such access (they are passed the
| after db, which includes the pending transaction and all
| other transactions to boot). Transaction functions are
| unsuitable, as they have access only to the before db. [2]
|
| Use entity predicates for arbitrary functional validations
| of the entire transaction.
|
| [1] https://docs.datomic.com/transactions/transactions.html#tran...
|
| [2] https://docs.datomic.com/transactions/transaction-functions....
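The before-db versus after-db distinction can be illustrated with another toy model. The predicate signature here is a simplification and an assumption (it takes only the after-db), and `transact`, `not_both`, `approve`, and `deny` are hypothetical names, not Datomic functions.

```python
# Toy model only: the db is a dict; this is not Datomic's API.

def approve(before_db):
    return {"approved": True}

def deny(before_db):
    return {"denied": True}

def not_both(after_db):
    """Invariant over the whole result: never approved and denied at once."""
    return not ("approved" in after_db and "denied" in after_db)

def transact(before_db, fns, predicate):
    # Transaction functions only ever see the before-db.
    facts = {}
    for fn in fns:
        facts.update(fn(before_db))
    # The predicate sees the after-db, including all pending facts.
    after_db = {**before_db, **facts}
    if not predicate(after_db):
        raise ValueError("entity predicate failed")
    return after_db

ok = transact({"id": 1}, [approve], not_both)

try:
    transact({"id": 1}, [approve, deny], not_both)
    rejected = False
except ValueError:
    rejected = True

print(ok, rejected)
```

Because the predicate runs against the merged result, it can reject combinations that no individual transaction function could detect from the before-db alone.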
| Voultapher wrote:
| The man, the myth, the legend himself. I haven't ceased to be
| awed by how often the relevant person shows up in the HN
| comment section.
|
| Loved your talks.
| puredanger wrote:
| Datomic transactions are not "operations to perform", they
| are a set of novel facts to incorporate at a point in time.
|
| Just like a git commit describes a set of modifications, do
| you or should you want to care about which order or how the
| adds, updates, and deletes occur in a single git commit? OMG
| no, that sounds awful.
|
| The really unusual thing is that developers expect
| intra-transaction ordering and accept it from every other
| database. OMG, that sounds awful, how do you live like that?
| voganmother42 wrote:
| Nested transactions or savepoints also exist in other
| systems.
| koito17 wrote:
| This is the first time I've tried reading a Jepsen report in depth,
| but I really like the clear description of Datomic's intra-
| transaction behavior. I didn't realize how little I understood
| the difference between Datomic's transactions and those of SQL
| databases.
|
| One thing that stands out to me is this paragraph:
|
| > Datomic used to refer to the data structure passed to d/transact
| as a "transaction", and to its elements as "statements" or
| "operations". Going forward, Datomic intends to refer to this
| structure as a "transaction request", and to its elements as
| "data".
|
| What does this mean for d/transact-async and related
| functionality from the datomic.api namespace? I haven't used
| Datomic in nearly a year. A lot seems to have changed.
| stuarthalloway wrote:
| Datomic software needed no changes as a result of Jepsen
| testing. All functionality in datomic.api is unchanged.
| CrazyPyroLinux wrote:
| aphyr has given some conference talks on previous analyses
| (available on YouTube) that are informative and entertaining.
| adrianco wrote:
| I was a fly on the wall as this work was being done and it was
| super interesting to see the discussions. I was also surprised
| that Jepsen didn't find critical bugs. Clarifying the docs and
| unusual (intentional) behaviors was a very useful outcome. It was
| a very worthwhile confidence building exercise given that we're
| running a bank on Datomic...
| belter wrote:
| > I was also surprised that Jepsen didn't find critical bugs.
|
| From the report: "...we can prove the presence of bugs, but
| not their absence..."
| vasco wrote:
| That's consistent with the usual definition of "finding"
| anything.
| thom wrote:
| I've not really spent much time with Datomic in anger because
| it's super weird, but is any of this surprising? Datomic
| transactions are basically just batches and I always thought it
| was single threaded so obviously it doesn't have a lot of race
| conditions. It's slow and safe by design.
___________________________________________________________________
(page generated 2024-05-15 23:00 UTC)