[HN Gopher] Jepsen: Datomic Pro 1.0.7075
       ___________________________________________________________________
        
       Jepsen: Datomic Pro 1.0.7075
        
       Author : aphyr
       Score  : 140 points
       Date   : 2024-05-15 16:57 UTC (6 hours ago)
        
 (HTM) web link (jepsen.io)
 (TXT) w3m dump (jepsen.io)
        
       | amgreg wrote:
       | It struck me that Jepsen has identified clear situations leading
       | to invariant violations but Datomic's approach seems to have been
       | purely to clarify their documentation. Does this essentially mean
       | the Datomic team accepts that the violations will happen, but
       | doesn't care?
       | 
       | From the article:
       | 
       | > From Datomic's point of view, the grant workload's invariant
       | violation is a matter of user error. Transaction functions do not
       | execute atomically in sequence. Checking that a precondition
       | holds in a transaction function is unsafe when some other
       | operation in the transaction could invalidate that precondition!
        
         | aphyr wrote:
         | Yeah, this basically boils down to "a potential pitfall, but
         | consistent with documentation, and working as designed".
         | Whether this actually matters depends on whether users are
         | writing transaction functions which are _intended_ to preserve
         | some invariant, but would only do so if executed sequentially,
         | rather than concurrently.
         | 
         | Datomic's position (and Datomic, please chime in here!) is that
         | users simply do not write transaction functions like this very
         | often. This is defensible: the docs did explicitly state that
         | transaction functions observe the start-of-transaction state,
         | not one another! On the other hand, there was also language in
         | the docs that suggested transaction functions could be used to
         | preserve invariants: "[txn fns] can atomically analyze and
         | transform database values. You can use them to ensure atomic
         | read-modify-update processing, and integrity constraints...".
         | That language, combined with the fact that basically every
         | other Serializable DB uses sequential intra-transaction
         | semantics, is why I devoted so much attention to this issue in
         | the report.
         | 
         | It's a complex question and I don't have a clear-cut answer!
         | I'd love to hear what the general DB community and Datomic
         | users in particular make of these semantics.
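
The difference between the two semantics discussed above can be sketched with a toy model. This is only an illustration in plain Python, not Datomic's API: the `approve` and `deny` functions, the attribute names, and the dict-based "database" are all assumptions invented for the example, loosely inspired by the grant workload mentioned in the quote.

```python
# Toy model only: dicts and plain functions, not Datomic's API.
# A "db" maps (entity, attribute) -> value; a "datom" is (entity, attribute, value).

def approve(db, grant):
    """Hypothetical transaction function: approve unless already denied."""
    if db.get((grant, "denied-at")) is not None:
        raise ValueError("already denied")
    return [(grant, "approved-at", 1)]

def deny(db, grant):
    """Hypothetical transaction function: deny unless already approved."""
    if db.get((grant, "approved-at")) is not None:
        raise ValueError("already approved")
    return [(grant, "denied-at", 1)]

def apply_datoms(db, datoms):
    """Return a new db with the given datoms added."""
    new_db = dict(db)
    for entity, attribute, value in datoms:
        new_db[(entity, attribute)] = value
    return new_db

def transact_snapshot(db, calls):
    """Every function sees the start-of-transaction state, not one another."""
    datoms = []
    for fn, arg in calls:
        datoms.extend(fn(db, arg))
    return apply_datoms(db, datoms)

def transact_sequential(db, calls):
    """Each function sees the effects of the functions before it."""
    for fn, arg in calls:
        db = apply_datoms(db, fn(db, arg))
    return db

before = {("grant-1", "created-at"): 0}
calls = [(approve, "grant-1"), (deny, "grant-1")]

# Snapshot semantics: both preconditions pass against the before-db,
# so the grant ends up both approved and denied.
after = transact_snapshot(before, calls)
both_set = ("grant-1", "approved-at") in after and ("grant-1", "denied-at") in after

# Sequential semantics: deny sees the approval and aborts the transaction.
try:
    transact_sequential(before, calls)
    sequential_rejected = False
except ValueError:
    sequential_rejected = True
```

In this sketch the invariant "never both approved and denied" survives only under the sequential model, which is the behavior the precondition checks implicitly assume.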
        
           | nickpeterson wrote:
           | I feel like "enough rope to shoot yourself" is kind of baked
           | into any high power, low ceremony tool.
        
             | stuarthalloway wrote:
             | As a proponent of just such tools I would say also that
             | "enough rope to shoot(?) yourself" is inherent in tools
             | powerful enough to get anything done, and is not a tradeoff
             | encountered only when reaching for high power or low
             | ceremony.
        
           | refset wrote:
           | I don't know whether it was intentional or not, but IIRC
           | DataScript opted for sequential intra-transaction semantics
           | instead.
        
         | SoftTalker wrote:
         | Sounds similar to the need to know that in some relational
         | databases, you need to SELECT ... FOR UPDATE if you intend to
         | perform an update that depends on the values you just selected.
        
         | stuarthalloway wrote:
         | As Jepsen confirmed, Datomic's mechanisms for enforcing
         | invariants work as designed. What does this mean practically
         | for users? Consider the following transactional pseudo-data:
         | 
         | [
         |  [Stu favorite-number 41]
         |  ;; maybe more stuff
         |  [Stu favorite-number 42]
         | ]
         | 
         | An operational reading of this data would be that early in the
         | transaction I liked 41, and that later in the transaction I
         | liked 42. Observers after the end of the transaction would
         | hopefully see only that I liked 42, and we would have to worry
         | about the conditions under which observers might see that 41.
         | 
         | This operational reading of intra-transaction semantics is
         | typical of many databases, but it presumes the existence of
         | multiple time points inside a transaction, which Datomic
         | neither has nor wants -- we quite like not worrying about what
         | happened "in the middle of" a transaction. All facts in a
         | transaction take place at the same point in time, so in Datomic
         | this transaction states that I started liking both numbers
         | simultaneously.
         | 
         | If you incorrectly read Datomic transactions as composed of
         | multiple operations, you can of course find all kinds of
         | "invariant anomalies". Conversely, you can find "invariant
         | anomalies" in SQL by incorrectly imposing Datomic's model on
         | SQL transactions. Such potential misreadings emphasize the need
         | for good documentation. To that end, we have worked with Jepsen
         | to enhance our documentation [1], tightening up casual language
         | in the hopes of preventing misconceptions. We also added a tech
         | note [2] addressing this particular misconception directly.
         | 
         | [1] https://docs.datomic.com/transactions/transactions.html#tran...
         | 
         | [2] https://docs.datomic.com/tech-notes/comparison-with-updating...
        
           | aphyr wrote:
           | To build on this, Datomic includes a pre-commit conflict
           | check that would prevent this particular example from
           | committing at all: it detects that there are two incompatible
           | assertions for the same entity/attribute pair, and rejects
           | the transaction. We think this conflict check likely prevents
           | many users from actually hitting this issue in production.
           | 
           | The issue we discuss in the report only occurs when the
           | transaction expands to non-conflicting datoms--for instance:
           | 
           | [Stu favorite-number 41]
           | [Stu hates-all-numbers-and-has-no-favorite true]
           | 
           | These entity/attribute pairs are disjoint, so the conflict
           | checker allows the transaction to commit, producing a record
           | which is in a logically inconsistent state!
           | 
           | On the documentation front--Datomic users could be forgiven
           | for thinking of the elements of transactions as "operations",
           | since Datomic's docs called them both "operations" and
           | "statements". ;-)
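
A rough sketch of the kind of check described above, written as a toy Python model rather than anything from Datomic itself. The function name `find_conflicts`, the tuple representation of datoms, and the assumption that every attribute holds a single value per entity are all made up for illustration.

```python
# Toy model only: tuples stand in for datoms; this is not Datomic's API.

def find_conflicts(datoms):
    """Return (entity, attribute) pairs asserted with more than one value.

    Assumes every attribute holds a single value per entity.
    """
    values = {}
    for entity, attribute, value in datoms:
        values.setdefault((entity, attribute), set()).add(value)
    return sorted(pair for pair, vs in values.items() if len(vs) > 1)

# Two incompatible assertions for the same entity/attribute pair.
conflicting = [
    ("Stu", "favorite-number", 41),
    ("Stu", "favorite-number", 42),
]

# Disjoint entity/attribute pairs: nothing for this check to object to,
# even though the combination is logically inconsistent.
disjoint = [
    ("Stu", "favorite-number", 41),
    ("Stu", "hates-all-numbers-and-has-no-favorite", True),
]

print(find_conflicts(conflicting))  # [('Stu', 'favorite-number')]
print(find_conflicts(disjoint))     # []
```

A check of this shape only compares values within one entity/attribute pair, so it cannot see relationships between different attributes.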
        
             | stuarthalloway wrote:
             | Mea culpa on the docs, mea culpa. Better now [1].
             | 
             | In order for user code to impose invariants over the entire
             | transaction, it must have access to the entire transaction.
             | Entity predicates have such access (they are passed the
             | after db, which includes the pending transaction and all
             | other transactions to boot). Transaction functions are
             | unsuitable, as they have access only to the before db. [2]
             | 
             | Use entity predicates for arbitrary functional validations
             | of the entire transaction.
             | 
             | [1] https://docs.datomic.com/transactions/transactions.html#tran...
             | 
             | [2] https://docs.datomic.com/transactions/transaction-functions....
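
The before-db versus after-db distinction can be illustrated with a toy Python model. Nothing here is Datomic's API: `transact_with_predicate`, `consistent_preferences`, and the dict-based database are hypothetical names chosen for the sketch, which only assumes that a validation function is handed the state that already includes the pending transaction.

```python
# Toy model only: a "db" maps (entity, attribute) -> value. Not Datomic's API.

def apply_datoms(db, datoms):
    """Return a new db with the given datoms added."""
    new_db = dict(db)
    for entity, attribute, value in datoms:
        new_db[(entity, attribute)] = value
    return new_db

def consistent_preferences(after_db, entity):
    """Hypothetical predicate: no favorite number if the entity hates all numbers."""
    has_favorite = (entity, "favorite-number") in after_db
    hates_all = after_db.get((entity, "hates-all-numbers-and-has-no-favorite"), False)
    return not (has_favorite and hates_all)

def transact_with_predicate(before_db, datoms, predicate, entity):
    """Build the after-db first, then validate it as a whole."""
    after_db = apply_datoms(before_db, datoms)
    if not predicate(after_db, entity):
        raise ValueError("predicate failed; transaction rejected")
    return after_db

before = {}

# Accepted: the after-db is consistent.
ok_db = transact_with_predicate(
    before, [("Stu", "favorite-number", 41)], consistent_preferences, "Stu"
)

# Rejected: the predicate sees both datoms together in the after-db.
try:
    transact_with_predicate(
        before,
        [
            ("Stu", "favorite-number", 41),
            ("Stu", "hates-all-numbers-and-has-no-favorite", True),
        ],
        consistent_preferences,
        "Stu",
    )
    rejected = False
except ValueError:
    rejected = True
```

Because the check runs against the combined result, it catches inconsistencies between datoms that no single before-db check could observe.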
        
             | Voultapher wrote:
             | The man, the myth, the legend himself. I haven't ceased to be
             | awed by how often the relevant person shows up in the HN
             | comment section.
             | 
             | Loved your talks.
        
           | puredanger wrote:
           | Datomic transactions are not "operations to perform", they
           | are a set of novel facts to incorporate at a point in time.
           | 
           | Just like a git commit describes a set of modifications, do
           | you or should you want to care about which order or how the
           | adds, updates, and deletes occur in a single git commit? OMG
           | no, that sounds awful.
           | 
           | The really unusual thing is that developers expect
           | intra-transaction ordering because it is what they accept from
           | every other database. OMG, that sounds awful; how do you live
           | like that?
        
             | voganmother42 wrote:
             | Nested transactions or savepoints also exist in other
             | systems
        
       | koito17 wrote:
       | This is the first time I've tried reading a Jepsen report in depth,
       | but I really like the clear description of Datomic's
       | intra-transaction behavior. I didn't realize how little I understood
       | the difference between Datomic's transactions and those of SQL
       | databases.
       | 
       | One thing that stands out to me is this paragraph:
       | 
       | > Datomic used to refer to the data structure passed to d/transact
       | as a "transaction", and to its elements as "statements" or
       | "operations". Going forward, Datomic intends to refer to this
       | structure as a "transaction request", and to its elements as
       | "data".
       | 
       | What does this mean for d/transact-async and related
       | functionality from the datomic.api namespace? I haven't used
       | Datomic in nearly a year. A lot seems to have changed.
        
         | stuarthalloway wrote:
         | Datomic software needed no changes as a result of Jepsen
         | testing. All functionality in datomic.api is unchanged.
        
       | CrazyPyroLinux wrote:
       | aphyr has given some conference talks on previous analyses
       | (available on YouTube) that are informative and entertaining.
        
       | adrianco wrote:
       | I was a fly on the wall as this work was being done and it was
       | super interesting to see the discussions. I was also surprised
       | that Jepsen didn't find critical bugs. Clarifying the docs and
       | unusual (intentional) behaviors was a very useful outcome. It was
       | a very worthwhile confidence building exercise given that we're
       | running a bank on Datomic...
        
         | belter wrote:
         | > I was also surprised that Jepsen didn't find critical bugs.
         | 
         | From the report..."...we can prove the presence of bugs, but
         | not their absence..."
        
           | vasco wrote:
           | That's consistent with the usual definition of "finding"
           | anything.
        
       | thom wrote:
       | I've not really spent much time with Datomic in anger because
       | it's super weird, but is any of this surprising? Datomic
       | transactions are basically just batches, and I always thought it
       | was single-threaded, so obviously it doesn't have a lot of race
       | conditions. It's slow and safe by design.
        
       ___________________________________________________________________
       (page generated 2024-05-15 23:00 UTC)