[HN Gopher] Building a BFT JSON CRDT
___________________________________________________________________
Building a BFT JSON CRDT
Author : raykyri
Score : 127 points
Date : 2022-11-21 16:41 UTC (6 hours ago)
(HTM) web link (jzhao.xyz)
(TXT) w3m dump (jzhao.xyz)
| adamdusty wrote:
| For people like me:
|
| BFT - Byzantine Fault Tolerant [0]
|
| CRDT - Conflict-free Replicated Data Type [1]
|
| [0] https://en.wikipedia.org/wiki/Byzantine_fault
|
| [1] https://en.wikipedia.org/wiki/Conflict-
| free_replicated_data_...
| [deleted]
| karmakaze wrote:
| RGA/CT - Replicated Growable Array/Causal Tree
|
| I don't like unexplained acronyms/initialisms.
| hansvm wrote:
| JSON - JavaScript Object Notation [0]
|
| [0] https://en.m.wikipedia.org/wiki/JSON
| MWil wrote:
| when you see it...them...
| athorax wrote:
| we are legion
| Karrot_Kream wrote:
| Hash graph reconciliation is generally the harder part of this
| problem as noted in future work, but this is a very exciting
| direction.
| jchanimal wrote:
| Check out this implementation of hash-stable trees. The same
| dataset generates the same Merkle hash, regardless of insertion
| orders. This makes verifiable sync computationally efficient.
| https://github.com/mikeal/prolly-trees
| simonw wrote:
| > I write this blog post mostly as a note to my past self,
| distilling a lot of what I've learned since into a blog post I
| wish I had read before going in
|
| The best kind of blog post!
| lukeigel wrote:
| Let's go Jackie!!!
| thruflo wrote:
| From the article, this appears to be an implementation of Making
| CRDTs Byzantine Fault Tolerant [0] in Rust [1].
|
| [0]: https://martin.kleppmann.com/papers/bft-crdt-papoc22.pdf
|
| [1]: https://github.com/jackyzha0/bft-json-crdt
| recuter wrote:
| To somebody who knows Erlang, isn't this reinventing the wheel
| basically?
| NavinF wrote:
| AFAIK Erlang isn't byzantine fault tolerant, doesn't use CRDTs,
| and doesn't even have a JSON library in its stdlib. So just
| from the title you can tell it has nothing to do with the
| article.
| jansommer wrote:
| Could you elaborate? How do you get CRDT's for free in Erlang?
| thruflo wrote:
| Erlang is a great language and runtime to build distributed
| systems in but it doesn't provide primitives to resolve
| concurrent overlapping updates.
|
| Hence projects like https://www.antidotedb.eu (CRDT database in
| Erlang)
| PointyFluff wrote:
| A little early in the morning for so much alphabet soup.
| MWil wrote:
| or to give up on an opportunity to learn/enjoy something
| jitl wrote:
| I'm quite surprised by the [benchmarks versus Automerge JS &
| Rust](https://github.com/jackyzha0/bft-json-crdt#benchmarks) when
| it comes to memory:
|
| > Ours (Basic) 27.6MB
|
| > Ours (BFT) 59.5MB
|
| > Automerge (Rust) 232.5MB
|
| I would expect adding the public key tracking to use more memory;
| I wonder how Automerge is spending so much more memory. Possibly
| on a bunch of internal caches or memoization that give the order-
| of-magnitude improvement in speed?
|
| > Ops: 100k
|
| > Ours (Basic) 9.321s
|
| > Ours (BFT) 38.842s
|
| > Automerge (Rust) 0.597s
| samwillis wrote:
| It's the Byzantine Fault Tolerant part of this that is
| particularly innovative and based on Kleppmanns most recent work.
| I believe it solves the issue of either malicious actors in the
| networks modifying others transactions, spoofing them, or the
| messages being modified by third parties ("outside" the network)
| who have MITM the connection. These are really great problems to
| solve.
|
| However, when I was experimenting with CRDTs a while back, it
| seemed to me the other big issue is where multiple transactions
| from different users combine to create an invalid state. Most
| CRDT toolkits are aiming for a combination of a JSON like type
| structure, with the addition on specific structures for rich
| text. The JSON structures for general purpose data, and those for
| rich text as that is the most common current use case. These sort
| of general purpose CRDTs don't have a way to handle, say, the
| maximum length of an array/string, or minimum and maximum values
| for a number. They leave this to the application using the
| toolkit.
|
| For the Yjs + ProseMirror system, effectively the CRDTs are
| resolved outside of ProseMirror. Thats useful as they can be
| combined/resolved on the server without a ProseMirror instance
| running. However there is a strong possibility that the resulting
| structure is no longer valid for your ProseMirror Schema, this is
| currently handled by ProseMirror throwing away the invalid parts
| of the document when it loads.
|
| What I think is needed is a Schema system that is a layer either
| on top of these toolkits, or as part of them, that provides rules
| and conflict resolution. So there is a way to specify the maximum
| length of an array/string, or what the maximum value of a number
| is. Effectively generic CRDTs that have an understanding of
| developer supplied rules and bounds.
|
| The "maximum length" is an interesting one, as depending on the
| order of transactions you could end up with a different result.
| sambroner wrote:
| When I was working on this problem with Fluid Framework, we did
| a few interesting experiments with "owned objects" and
| centralized ACL objects. I believe the team primarily
| implemented centralized ACL because that implementation works
| for many enterprise use cases.
|
| With a centralized schema provider, you run a connected node on
| a trusted server and reject changes that are out of schema or
| should not be accessed by a user.
|
| An owned object is an object where a user (or user group that
| votes via quorum) that owns the object can veto changes to the
| object. The changes are temporarily applied until accepted by
| the owners. I haven't dug deep enough into this BFT
| implementation to know how our model would map to this model.
| sroussey wrote:
| How does that compare to Google's Zanzibar?
| sbazerque wrote:
| I think this is the schema system you're asking for:
| https://www.hyperhyperspace.org/
|
| (specifically, the data representation part)
| jitl wrote:
| This is a cool idea, but I didn't find any examples of max
| length constraints or other normalization rules in the source
| code I reviewed. Maybe there's something in there.
|
| Here's some source code for an early, work-in-progress Wiki
| CRDT: https://github.com/hyperhyperspace/wiki-
| collab/blob/master/s...
|
| Page in the Wiki. Note that data types have a validate method
| that returns true or false; maybe if false, they're just
| dropped from the UI? Not sure how the method is used.
| https://github.com/hyperhyperspace/wiki-
| collab/blob/master/s...
|
| I haven't found the underlying text or rich text CRDT
| implementation yet.
| jackyzhao wrote:
| Hey! Author of the post responding here :) Unfortunately
| max/min-length is a global invariant and not one that can
| be reinforced by CRDTs without coordination
|
| bloom-lang.net is a really cool project working on trying
| to figure out what types of program state actually require
| coordination at compile time
| jitl wrote:
| Right - I think what Sam is wondering about is if there's
| a better way to maintain some of those invariants when
| decoding the CRDT data or at least preserve user intent
| with more fidelity by taking the invariants into account.
| For example, if a ProseMirror schema says "figure can
| have one caption child", then a CRDT library can assist
| by making Child a LWW register instead of an array; or
| picking the last item in the array instead of the first
| when limiting length.
| tekkk wrote:
| Interesting, yeah access-control is kinda open problem with
| Yjs. Regarding ProseMirror and rich-text documents, you can
| mess up documents in other ways as well. Eg deploy a faulty
| command with a transaction that inserts nodes with invalid
| children (can be prevented though by using createChecked). Or
| just changing your schema in an incompatible way with the
| previous version. So you kinda have to deal with possible
| malformed documents either way.
|
| Havent had documents corrupted by Yjs allowing changes that are
| not parseable by schema though - has this happened to you?
|
| And about the schema layer on top of Yjs, you possibly could
| inspect every update and apply some validation rules. Arent all
| operations just inserts, updates or deletes of nodes? You can
| at least rollback to previous version as you flush the updates
| to the doc in the db. Not ideal though.
| samwillis wrote:
| No, I haven't seen a corrupt "unparceable" document. It's
| more issues around documents that don't comply with the
| schema.
|
| Say for example you have a <figure> node that can contain a
| single optional <caption>. If two users concurrently add a
| caption, then merge their changes, the Yjs document will
| contain both. It has no concept of what the valid structure
| is. When this is loaded into the ProseMirror the second
| caption will be dropped.
| tekkk wrote:
| Hmm hadn't thought about that. Yeah Yjs should use more
| granular steps instead of just replacing the whole doc each
| time remote update is received.
| barbazoo wrote:
| That was a bit creepy while it lasted. (the mouse pointer thing
| that is)
| EGreg wrote:
| Yessss finally someone is doing it.
___________________________________________________________________
(page generated 2022-11-21 23:00 UTC)