[HN Gopher] CRDT-richtext: Rust implementation of Peritext and F...
___________________________________________________________________
CRDT-richtext: Rust implementation of Peritext and Fugue
Author : samwillis
Score : 146 points
Date : 2023-05-18 14:00 UTC (9 hours ago)
(HTM) web link (loro-dev.notion.site)
(TXT) w3m dump (loro-dev.notion.site)
| xpe wrote:
| I'm hoping to find a literature review that situates CRDT text
| algorithms along with text differencing and source control. Are
| there some key "contours" where certain algorithms shine with
| regards to (a) number of concurrent edits; (b) length of source
| text; (c) length of changes; (d) time difference (is this a
| reasonable metric?) between synchronization; (e) reconciliation /
| merging / verification.
|
| Apologies if this sounds like a professor without funding asking
| for free work. I can assure you I am not a professor, but it is
| true that I have no funding.
| josephg wrote:
| Diamond types author here! Congratulations on getting your crdt
| working! It's lovely to see a new generation of CRDTs which have
| decent performance.
|
| And nice stuff implementing peritext! I'd love to do the same in
| diamond types at some point. You beat me to it!
|
| Im building a little repository of real world collaborative
| editing traces to use when benchmarking, comparing and optimising
| text based CRDTs[1]. The automerge-perf editing trace isn't
| enough on its own. And we're increasingly converging on a format
| for multi user concurrent editing traces too[2]. It'd be great to
| add some rich text editing traces in the mix if you're interested
| in recording something, so we can also compare how peritext
| performs in different systems.
|
| Anyway, welcome to the community! Love to have more
| implementations around!
|
| https://github.com/josephg/crdt-benchmarks
|
| https://github.com/dmonad/crdt-benchmarks/issues/20
| Zanfa wrote:
| I'm wondering if anybody has seen a way to represent mixed rich-
| text and structured data like tables as CRDTs?
| samwillis wrote:
| So both Yjs and Automerge support rich text types and JSON like
| structures. They also have bindings to various rich text editor
| frameworks such a ProseMirror. It's completely possible to
| build an editor that combines rich text and _typographic_
| tables.
|
| Your reference to "structured data" however makes me thing your
| referring to data tables. Again with both Yjs and Automerge
| these can be represented as just a list of maps. So you have
| one CRDT document that combines a rich text section and the
| data section, or even nests them.
|
| It would even be possible with Prosemirror, for example, to
| build a rich text and spreadsheet like editor backed by CRDTs.
| xpe wrote:
| > Peritext is a novel rich-text CRDT algorithm. It can merge
| concurrent edits of rich text without losing the intended edits.
|
| The page lost some foundational credibility with me by making
| this claim without immediately following it [1] with citations,
| clarifications, or caveats.
|
| Why? Because "intention" is a _very_ big tent. At the very least,
| the project would benefit by characterizing the mapping between
| _intention_ and edits.
|
| Perhaps this project's notion of intention could be summarized
| very compactly. Let's see what we find and share it here?
|
| P.S. I'm feeling like philosophy, which clarifies so many
| concepts -- or at least explodes them by asking a bunch of
| specific questions -- is more and more relevant when it comes to
| designing computational systems. The idea of intention is so key
| to so many human interactions. It would be a travesty for
| computer scientists and software engineers to gloss over its
| impacts.
|
| [1] Even a "lowly" superscript would do.
| czx111331 wrote:
| Thanks for pointing that out. I refined the text and added a
| bookmark to make the citation stand out.
| xpe wrote:
| Did I miss the bookmark?
|
| Also, in terms of a text chance, I noticed this:
|
| > It is capable of merging concurrent edits in rich text
| format while preserving users' intent as much as possible.
|
| Saying "as much as possible" doesn't address my concerns. It
| still leaves open the question of intent and what
| specifically that means.
|
| The Peritext page has a whole section on intent. It would be
| nice to have one sentence pointing specifically to it.
| czx111331 wrote:
| Alright, I just added a link to the section of the original
| Peritext document related to preserving the author's
| intent: https://www.inkandswitch.com/peritext/#preserving-
| the-author...
|
| The specific definition of user intent in the context of
| concurrent rich text editing can't be clearly explained in
| a few words. it's best understood through specific
| examples. I've provided some examples in my brief
| introduction to Peritext.
| [deleted]
| samwillis wrote:
| The page here has a good overview of Peritext:
| https://www.inkandswitch.com/peritext/
|
| It's a separate project from the leading academics in CRDTs. It
| does what is claimed.
| xpe wrote:
| In reading your comment, I don't see the connection to my
| concerns about intent. This leaves open the question of "is
| it worth clicking?"
|
| In this case it was, so thank you. Here is the quote:
|
| > Before we can implement an algorithm to perform this
| automatic merge, we first need to define a clear model for
| user intent, and define the expected results when concurrent
| edits are merged. In this section we develop such a model
| through a series of examples.
|
| I only go to the trouble of mentioning this to help remind
| people that it is a win-win to make a clear connection when
| commenting. In this case, simply saying "This link talks
| about intent in detail" would have worked.
| samwillis wrote:
| Agreed, apologies for that.
|
| In this case, and I'm not joking, my wife called me through
| as she had just put dinner was on the table, clearly that
| took priority. So I hit send before adding any more context
| or comment, which I would otherwise have done.
|
| I'm pleased you clicked, that page is such a brilliant
| overview of CRDTs, rich text editing with them and
| capturing intent.
| matlin wrote:
| I'm happy to see even further performance improvements towards
| rich-text CRDTs but at this point, I think the barrier to
| adoption isn't speed or compactness but instead integration with
| existing backends and databases.
|
| I have a hunch that we're reinventing the wheel by creating new
| B-Tree implementations in Rust when we could be figuring out how
| to make an existing database do the hard work of storing and
| retrieving characters in the correct order. I know Martin
| Kleppmann has looked at Datalog being a potential solution to
| this problem but until we have a good full-stack solution to
| collaborative text editing, I don't think we'll see major
| adoption of these CRDT solutions.
| conradev wrote:
| Yeah, I would rather have Peritext inside of SQLite itself:
| https://github.com/vlcn-io/cr-sqlite
| dspillett wrote:
| _> we 're reinventing the wheel by creating new B-Tree
| implementations in Rust when we could be figuring out how to
| make an existing database do the hard work of storing and
| retrieving characters in the correct order_
|
| That is liable to produce a significantly less efficient result
| in both computational complexity and storage requirements. The
| b-tree implementations used in existing databases are optimised
| for managing data in fixed(ish) size pages which I don't think
| matches the CRDT use case.
|
| Heck, there are projects looking at whether current
| implementations in DBs which were designed when spinning disks
| were the best online storage we had (or all but the very monied
| could afford in significant size) are sub-optimal enough in our
| current RAM-is-relatively-cheap and random-access-IO-is-
| massively-less-latent-than-it-used-to-be world that the
| compatibility hump might be worth making significant changes to
| pages sizes and other considerations in those structures for
| even the DB work they are designed for.
| samwillis wrote:
| I agree that full stack support is the missing pice that's need
| to make use explode. But I do think the current implementations
| will get there.
|
| CRDTs are fairly unique in that you need them to be exposed
| very close to the front of your stack in order to capture user
| intent, but you also need support further back in your stack
| for merging, replication, and querying.
|
| We have good front end support and there are multiple exciting
| projects building collaboration servers (PartyKit in particular
| looks awesome) for them. What I think is missing is support _in
| database_ , that's what I've been experimenting with (below).
| If you are building an offline enabled app, having the ability
| to generate diffs and merge in database enables easy multi
| document sync.
|
| Also most general purpose CRDTs are a combination of JSON and
| XML like data structures, it's useful to be able to query the
| structures in your database. For example if you build a notes
| app that supports inline tags, it's useful to be able to query
| and index those from within the XML like structure without
| having to dump the whole thing out at another layer of your
| stack.
|
| Essentially, you need to capture user intent in the CRDT on the
| front end, but then have support through your whole backend.
|
| Yjs Postgres: https://github.com/samwillis/yjs-pg-test
|
| Yjs SQLite: https://github.com/samwillis/yjs-sqlite-test
|
| (These are just early experiments, I'm working on a cleaner
| shared implementation with support for various SQLite bindings,
| and better querying)
| tantaman wrote:
| Very cool. Left some comments related to some drawbacks I see
| of your current approach -- https://github.com/vlcn-io/cr-
| sqlite/issues/65#issuecomment-...
|
| If you're open to resolving those I'd love to stick it into
| cr-sqlite :)
| samwillis wrote:
| Hey Matt, thanks. I will drop my thoughts into that thread
| tomorrow. Cr-SQLite is awesome, I found that thread a few
| weeks ago and actually had it on my list to get in touch.
| matlin wrote:
| Cool, I'll check out your repos. Supabase also has Postgres
| CRDT (https://github.com/supabase/pg_crdt).
|
| But, in general, I think searching in and across documents
| and database storage is going to a thorny problem even with
| existing, hyper-optimized CRDT algorithms. For example, if
| you just store your Yjs document as a binary blob in a
| Postgres column, Postgres becomes the bottleneck and all the
| fancy range-tree or b-tree optimizations are no longer that
| helpful on the server. Plus, it'd be great to have an
| document edit just be a tiny insert into the database that
| can be streamed to other clients rather than a big row
| update.
|
| This may because the focus of CRDTs seems to be rooted in a
| fully decentralized, P2P system but many developers want
| collaborative text editing in a more traditional client-
| server model.
| [deleted]
| smasher164 wrote:
| Yeah, I think a good step would be to integrate a rich CRDT
| into sqlite, whether that's through CTEs, a custom extension,
| or library code that manages the tables.
|
| Layering a CRDT on top of an existing schema might not be very
| efficient though. You'd likely need to modify the db's
| internals. I have no evidence for this statement though :)
| pvh wrote:
| Hey there! It's great to see more folks building more CRDTs. On
| behalf of the Peritext authors, Automerge, and Ink & Switch,
| welcome to the community.
| czx111331 wrote:
| Thank you for the invitation! I'm really looking forward to
| being part of the community.
| [deleted]
| stusmall wrote:
| Is the long term plan for this to open source it or monetize it?
| awestroke wrote:
| MIT-licensed: https://github.com/loro-dev/crdt-richtext
| stusmall wrote:
| Their site points to another repo that gives a 404 and the
| you linked to says its a subset of the larger project. It
| says the larger project isn't open source _yet_ , which I
| guess is the answer.
| czx111331 wrote:
| Yeah, that's right
| stusmall wrote:
| Awesome! Super appreciate it and this project looks
| extremely interesting.
| agg23 wrote:
| Really interested to see how this turns out. In addition to
| solving the number of problems that CRDTs solve when it comes to
| text, I would like to see examples of integrating a CRDT system
| that doesn't solely have text as its stored asset.
|
| For example, say you're storing events alongside your CRDT
| datastructure (the text). Assuming the events can be blindly
| merged, what does a P2P sync look like? If it's something that's
| handled by the CRDT framework (as it looks like Loro will do), do
| you have to maintain two different sync processes?
| codetrotter wrote:
| What kind of events do you have in mind?
| agg23 wrote:
| Naively, literally logs/timeseries events (i.e. stuff that
| can literally be inserted and queried in order of their
| timestamp).
|
| This scenario can be made increasingly complex as you add
| user interaction, such as a settings JSON blob that you
| probably do want to intelligently sync.
|
| ----
|
| In general, particularly for the timeseries case, I just want
| to know what the intended P2P flow looks like for different
| datatypes, including those that probably aren't managed by
| the CRDT algorithms. It seems like these systems beeline for
| the hard text case (which makes sense, because it's the
| interesting and challenging problem), but forget that in the
| real world, you need to sync more than just text.
| mycall wrote:
| I've never need an event-based data flow that doesn't require
| correct sequencing. Blindly merging is quite a delicate
| assumption.
| agg23 wrote:
| Sure, there's plenty of complexity there too. For simplicity,
| I was just assuming timestamps are fine, and we're not
| concerned about apparent concurrent events or incorrect
| timestamps from a host.
| iudqnolq wrote:
| Your crdt library should be maintaining some kind of
| logical timestamp. I'd put that in your event and use it to
| order. That way you get a consistent view of time.
|
| (For example yjs calls this a state vector. Might also be
| called a Lamport timestamp or a vector clock.)
| iudqnolq wrote:
| Simple CRDTs are surprisingly simple. If you don't need text
| the complexity drops off quite quickly.
|
| I just ripped yjs out of a personal project and replaced it
| with something I wrote from scratch. Getting exactly the
| semantics I want is nice. For example it's easier to make
| compound properties update atomically. A good place to get
| started is one of the figma founders' blogs:
| https://madebyevan.com/algos/crdt-fractional-indexing/
|
| You'll notice if you use Figma that it's mostly last-writer-
| wins fields. That seems to often turn out to be the ideal UI
| and it's the simplest to implement. If you the text to green
| and I set it to red the right answer isn't yellow.
| czx111331 wrote:
| Hi, I'm the developer of crdt-richtext. I'm humbly grateful for
| everyone's support and quite surprised to see the level of
| interest this project has generated. It's still just an
| experimental first step for us and not mature for production yet.
| As some of the comments have pointed out, performance may not be
| the main bottleneck for adopting CRDTs anymore. The potential
| bottlenecks in the user path might lie in the DX and the
| integration with existing tool stacks. Our primary project, Loro,
| which is still in closed development, aims to address these
| issues in the near future. We look forward to sharing updates
| with you as we progress.
| satvikpendem wrote:
| How does this compare to automerge, yjs-rust and especially
| diamond-types which was also designed for text?
| red_hare wrote:
| The author has references for how it compares in behavior and
| performance to yjs and automerge in the article.
| alecnotthompson wrote:
| There's a link to everything in the blog post, including
| crates.io in the sentence it talks about the crate the post is
| about, but no link to the crate itself :/
| samwillis wrote:
| This looks super interesting, always excited for a new CRDT
| implementation.
|
| I'm slightly confused by the benchmarks, they look incredible,
| somewhat surprised by its performance over Yjs/Yrs and Automerge.
| Going to have to dig in more to understand.
| czx111331 wrote:
| The source code of the benchmark is available here
| https://github.com/zxch3n/fugue-bench
| cmrdporcupine wrote:
| Has anybody done any work or research integrating CRDT-ish text
| conflict resolution with a database transaction process? E.g. in
| MVCC when detection of version conflict (e.g. via timestamps)
| happens, the approach is generally to abort and retry; but if the
| conflict in the tuple/row is at textual level and can be resolved
| by this kind of algorithm, it seems this could handle some
| conflict resolution without full retry or abort?
| samsquire wrote:
| I am working toward that goal of integrating text diffing or
| CRDT diffing into a database.
|
| I wanted extremely high performance multimaster scalability of
| writes with linear scalability and semiautomatic text diffing
| and merging. I think this calls for an eventually consistent
| architecture.
|
| The idea is to implement the GUI so that diffs are exposed to
| the user if there is a conflict but otherwise use the Myers
| algorithm.
|
| I implemented a simple MVCC in Java.
| https://github.com/samsquire/multiversion-concurrency-contro...
|
| And I implemented a 3 way text diff with myers algorithm based
| on https://blog.jcoglan.com/2017/02/12/the-myers-diff-
| algorithm...
|
| https://github.com/samsquire/text-diff
|
| I implemented an eventually consistent mesh protocol that uses
| timestamps to provide last write wins
| https://github.com/samsquire/eventually-consistent-mesh
|
| I implemented an barebones toy SQL and Graph cypher and
| document storage database https://github.com/samsquire/hash-db
|
| I need to combine the ideas in each of these projects into a
| cohesive solution.
|
| I did some incomplete work on trying to implement the YATA
| algorithm.
| samwillis wrote:
| Not quite the answer to your question, but there are quite a
| few people looking at doing eventually consistent database
| syncing with CRDTs. Two interesting ones are:
|
| https://electric-sql.com/ - SQLite and Postgres syncing
|
| https://vlcn.io/ - SQLite syncing
|
| I've been experimenting with Yjs and merging documents in
| database via SQL, for both SQLite and Postgres. And allowing
| you to query the contents of those documents, similar to how
| you can query JSON. Early days, these are just tests:
|
| https://github.com/samwillis/yjs-sqlite-test
|
| https://github.com/samwillis/yjs-pg-test
| agg23 wrote:
| I hadn't heard of Electric-SQL before. Was potentially
| interested, but the lack of native language bindings is a
| dealbreaker. I don't particularly like a centralized server
| being required for the sync either, but it's ok.
|
| VLCN is very interesting to me, and I really need to play
| around with it. It's not incredibly obvious to me how these
| extensions to SQLite play out in actual use.
| stusmall wrote:
| Have you looked at https://www.ditto.live/
|
| Full disclosure: I work there
| agg23 wrote:
| I have actually, but I'm only interested in open source
| and self-hostable solutions, so I stopped looking at it
| pretty quickly, sorry.
| thruflo wrote:
| We (ElectricSQL) have some work underway on a Rust/WASM
| port of our core client component, that's designed to open
| up more language support.
|
| This is a community contributed Dart/Flutter client
| https://github.com/SkillDevs/electric_dart that may also be
| a useful reference.
| cmrdporcupine wrote:
| Yeah I'm more interested in having this as part of the core
| DB tx commit logic, rather than as an ad-on or sync piece.
___________________________________________________________________
(page generated 2023-05-18 23:01 UTC)