hngopher.com

       [HN Gopher] CRDT-richtext: Rust implementation of Peritext and F...
       ___________________________________________________________________
        
       CRDT-richtext: Rust implementation of Peritext and Fugue
        
       Author : samwillis
       Score  : 146 points
       Date   : 2023-05-18 14:00 UTC (9 hours ago)
        
 (HTM) web link (loro-dev.notion.site)
 (TXT) w3m dump (loro-dev.notion.site)
        
       | xpe wrote:
       | I'm hoping to find a literature review that situates CRDT text
       | algorithms along with text differencing and source control. Are
       | there some key "contours" where certain algorithms shine with
       | regards to (a) number of concurrent edits; (b) length of source
       | text; (c) length of changes; (d) time difference (is this a
       | reasonable metric?) between synchronization; (e) reconciliation /
       | merging / verification.
       | 
       | Apologies if this sounds like a professor without funding asking
       | for free work. I can assure you I am not a professor, but it is
       | true that I have no funding.
        
       | josephg wrote:
       | Diamond types author here! Congratulations on getting your crdt
       | working! It's lovely to see a new generation of CRDTs which have
       | decent performance.
       | 
       | And nice stuff implementing peritext! I'd love to do the same in
       | diamond types at some point. You beat me to it!
       | 
       | Im building a little repository of real world collaborative
       | editing traces to use when benchmarking, comparing and optimising
       | text based CRDTs[1]. The automerge-perf editing trace isn't
       | enough on its own. And we're increasingly converging on a format
       | for multi user concurrent editing traces too[2]. It'd be great to
       | add some rich text editing traces in the mix if you're interested
       | in recording something, so we can also compare how peritext
       | performs in different systems.
       | 
       | Anyway, welcome to the community! Love to have more
       | implementations around!
       | 
       | https://github.com/josephg/crdt-benchmarks
       | 
       | https://github.com/dmonad/crdt-benchmarks/issues/20
        
       | Zanfa wrote:
       | I'm wondering if anybody has seen a way to represent mixed rich-
       | text and structured data like tables as CRDTs?
        
         | samwillis wrote:
         | So both Yjs and Automerge support rich text types and JSON like
         | structures. They also have bindings to various rich text editor
         | frameworks such a ProseMirror. It's completely possible to
         | build an editor that combines rich text and _typographic_
         | tables.
         | 
         | Your reference to "structured data" however makes me thing your
         | referring to data tables. Again with both Yjs and Automerge
         | these can be represented as just a list of maps. So you have
         | one CRDT document that combines a rich text section and the
         | data section, or even nests them.
         | 
         | It would even be possible with Prosemirror, for example, to
         | build a rich text and spreadsheet like editor backed by CRDTs.
        
       | xpe wrote:
       | > Peritext is a novel rich-text CRDT algorithm. It can merge
       | concurrent edits of rich text without losing the intended edits.
       | 
       | The page lost some foundational credibility with me by making
       | this claim without immediately following it [1] with citations,
       | clarifications, or caveats.
       | 
       | Why? Because "intention" is a _very_ big tent. At the very least,
       | the project would benefit by characterizing the mapping between
       | _intention_ and edits.
       | 
       | Perhaps this project's notion of intention could be summarized
       | very compactly. Let's see what we find and share it here?
       | 
       | P.S. I'm feeling like philosophy, which clarifies so many
       | concepts -- or at least explodes them by asking a bunch of
       | specific questions -- is more and more relevant when it comes to
       | designing computational systems. The idea of intention is so key
       | to so many human interactions. It would be a travesty for
       | computer scientists and software engineers to gloss over its
       | impacts.
       | 
       | [1] Even a "lowly" superscript would do.
        
         | czx111331 wrote:
         | Thanks for pointing that out. I refined the text and added a
         | bookmark to make the citation stand out.
        
           | xpe wrote:
           | Did I miss the bookmark?
           | 
           | Also, in terms of a text chance, I noticed this:
           | 
           | > It is capable of merging concurrent edits in rich text
           | format while preserving users' intent as much as possible.
           | 
           | Saying "as much as possible" doesn't address my concerns. It
           | still leaves open the question of intent and what
           | specifically that means.
           | 
           | The Peritext page has a whole section on intent. It would be
           | nice to have one sentence pointing specifically to it.
        
             | czx111331 wrote:
             | Alright, I just added a link to the section of the original
             | Peritext document related to preserving the author's
             | intent: https://www.inkandswitch.com/peritext/#preserving-
             | the-author...
             | 
             | The specific definition of user intent in the context of
             | concurrent rich text editing can't be clearly explained in
             | a few words. it's best understood through specific
             | examples. I've provided some examples in my brief
             | introduction to Peritext.
        
         | [deleted]
        
         | samwillis wrote:
         | The page here has a good overview of Peritext:
         | https://www.inkandswitch.com/peritext/
         | 
         | It's a separate project from the leading academics in CRDTs. It
         | does what is claimed.
        
           | xpe wrote:
           | In reading your comment, I don't see the connection to my
           | concerns about intent. This leaves open the question of "is
           | it worth clicking?"
           | 
           | In this case it was, so thank you. Here is the quote:
           | 
           | > Before we can implement an algorithm to perform this
           | automatic merge, we first need to define a clear model for
           | user intent, and define the expected results when concurrent
           | edits are merged. In this section we develop such a model
           | through a series of examples.
           | 
           | I only go to the trouble of mentioning this to help remind
           | people that it is a win-win to make a clear connection when
           | commenting. In this case, simply saying "This link talks
           | about intent in detail" would have worked.
        
             | samwillis wrote:
             | Agreed, apologies for that.
             | 
             | In this case, and I'm not joking, my wife called me through
             | as she had just put dinner was on the table, clearly that
             | took priority. So I hit send before adding any more context
             | or comment, which I would otherwise have done.
             | 
             | I'm pleased you clicked, that page is such a brilliant
             | overview of CRDTs, rich text editing with them and
             | capturing intent.
        
       | matlin wrote:
       | I'm happy to see even further performance improvements towards
       | rich-text CRDTs but at this point, I think the barrier to
       | adoption isn't speed or compactness but instead integration with
       | existing backends and databases.
       | 
       | I have a hunch that we're reinventing the wheel by creating new
       | B-Tree implementations in Rust when we could be figuring out how
       | to make an existing database do the hard work of storing and
       | retrieving characters in the correct order. I know Martin
       | Kleppmann has looked at Datalog being a potential solution to
       | this problem but until we have a good full-stack solution to
       | collaborative text editing, I don't think we'll see major
       | adoption of these CRDT solutions.
        
         | conradev wrote:
         | Yeah, I would rather have Peritext inside of SQLite itself:
         | https://github.com/vlcn-io/cr-sqlite
        
         | dspillett wrote:
         | _> we 're reinventing the wheel by creating new B-Tree
         | implementations in Rust when we could be figuring out how to
         | make an existing database do the hard work of storing and
         | retrieving characters in the correct order_
         | 
         | That is liable to produce a significantly less efficient result
         | in both computational complexity and storage requirements. The
         | b-tree implementations used in existing databases are optimised
         | for managing data in fixed(ish) size pages which I don't think
         | matches the CRDT use case.
         | 
         | Heck, there are projects looking at whether current
         | implementations in DBs which were designed when spinning disks
         | were the best online storage we had (or all but the very monied
         | could afford in significant size) are sub-optimal enough in our
         | current RAM-is-relatively-cheap and random-access-IO-is-
         | massively-less-latent-than-it-used-to-be world that the
         | compatibility hump might be worth making significant changes to
         | pages sizes and other considerations in those structures for
         | even the DB work they are designed for.
        
         | samwillis wrote:
         | I agree that full stack support is the missing pice that's need
         | to make use explode. But I do think the current implementations
         | will get there.
         | 
         | CRDTs are fairly unique in that you need them to be exposed
         | very close to the front of your stack in order to capture user
         | intent, but you also need support further back in your stack
         | for merging, replication, and querying.
         | 
         | We have good front end support and there are multiple exciting
         | projects building collaboration servers (PartyKit in particular
         | looks awesome) for them. What I think is missing is support _in
         | database_ , that's what I've been experimenting with (below).
         | If you are building an offline enabled app, having the ability
         | to generate diffs and merge in database enables easy multi
         | document sync.
         | 
         | Also most general purpose CRDTs are a combination of JSON and
         | XML like data structures, it's useful to be able to query the
         | structures in your database. For example if you build a notes
         | app that supports inline tags, it's useful to be able to query
         | and index those from within the XML like structure without
         | having to dump the whole thing out at another layer of your
         | stack.
         | 
         | Essentially, you need to capture user intent in the CRDT on the
         | front end, but then have support through your whole backend.
         | 
         | Yjs Postgres: https://github.com/samwillis/yjs-pg-test
         | 
         | Yjs SQLite: https://github.com/samwillis/yjs-sqlite-test
         | 
         | (These are just early experiments, I'm working on a cleaner
         | shared implementation with support for various SQLite bindings,
         | and better querying)
        
           | tantaman wrote:
           | Very cool. Left some comments related to some drawbacks I see
           | of your current approach -- https://github.com/vlcn-io/cr-
           | sqlite/issues/65#issuecomment-...
           | 
           | If you're open to resolving those I'd love to stick it into
           | cr-sqlite :)
        
             | samwillis wrote:
             | Hey Matt, thanks. I will drop my thoughts into that thread
             | tomorrow. Cr-SQLite is awesome, I found that thread a few
             | weeks ago and actually had it on my list to get in touch.
        
           | matlin wrote:
           | Cool, I'll check out your repos. Supabase also has Postgres
           | CRDT (https://github.com/supabase/pg_crdt).
           | 
           | But, in general, I think searching in and across documents
           | and database storage is going to a thorny problem even with
           | existing, hyper-optimized CRDT algorithms. For example, if
           | you just store your Yjs document as a binary blob in a
           | Postgres column, Postgres becomes the bottleneck and all the
           | fancy range-tree or b-tree optimizations are no longer that
           | helpful on the server. Plus, it'd be great to have an
           | document edit just be a tiny insert into the database that
           | can be streamed to other clients rather than a big row
           | update.
           | 
           | This may because the focus of CRDTs seems to be rooted in a
           | fully decentralized, P2P system but many developers want
           | collaborative text editing in a more traditional client-
           | server model.
        
             | [deleted]
        
         | smasher164 wrote:
         | Yeah, I think a good step would be to integrate a rich CRDT
         | into sqlite, whether that's through CTEs, a custom extension,
         | or library code that manages the tables.
         | 
         | Layering a CRDT on top of an existing schema might not be very
         | efficient though. You'd likely need to modify the db's
         | internals. I have no evidence for this statement though :)
        
       | pvh wrote:
       | Hey there! It's great to see more folks building more CRDTs. On
       | behalf of the Peritext authors, Automerge, and Ink & Switch,
       | welcome to the community.
        
         | czx111331 wrote:
         | Thank you for the invitation! I'm really looking forward to
         | being part of the community.
        
       | [deleted]
        
       | stusmall wrote:
       | Is the long term plan for this to open source it or monetize it?
        
         | awestroke wrote:
         | MIT-licensed: https://github.com/loro-dev/crdt-richtext
        
           | stusmall wrote:
           | Their site points to another repo that gives a 404 and the
           | you linked to says its a subset of the larger project. It
           | says the larger project isn't open source _yet_ , which I
           | guess is the answer.
        
             | czx111331 wrote:
             | Yeah, that's right
        
               | stusmall wrote:
               | Awesome! Super appreciate it and this project looks
               | extremely interesting.
        
       | agg23 wrote:
       | Really interested to see how this turns out. In addition to
       | solving the number of problems that CRDTs solve when it comes to
       | text, I would like to see examples of integrating a CRDT system
       | that doesn't solely have text as its stored asset.
       | 
       | For example, say you're storing events alongside your CRDT
       | datastructure (the text). Assuming the events can be blindly
       | merged, what does a P2P sync look like? If it's something that's
       | handled by the CRDT framework (as it looks like Loro will do), do
       | you have to maintain two different sync processes?
        
         | codetrotter wrote:
         | What kind of events do you have in mind?
        
           | agg23 wrote:
           | Naively, literally logs/timeseries events (i.e. stuff that
           | can literally be inserted and queried in order of their
           | timestamp).
           | 
           | This scenario can be made increasingly complex as you add
           | user interaction, such as a settings JSON blob that you
           | probably do want to intelligently sync.
           | 
           | ----
           | 
           | In general, particularly for the timeseries case, I just want
           | to know what the intended P2P flow looks like for different
           | datatypes, including those that probably aren't managed by
           | the CRDT algorithms. It seems like these systems beeline for
           | the hard text case (which makes sense, because it's the
           | interesting and challenging problem), but forget that in the
           | real world, you need to sync more than just text.
        
         | mycall wrote:
         | I've never need an event-based data flow that doesn't require
         | correct sequencing. Blindly merging is quite a delicate
         | assumption.
        
           | agg23 wrote:
           | Sure, there's plenty of complexity there too. For simplicity,
           | I was just assuming timestamps are fine, and we're not
           | concerned about apparent concurrent events or incorrect
           | timestamps from a host.
        
             | iudqnolq wrote:
             | Your crdt library should be maintaining some kind of
             | logical timestamp. I'd put that in your event and use it to
             | order. That way you get a consistent view of time.
             | 
             | (For example yjs calls this a state vector. Might also be
             | called a Lamport timestamp or a vector clock.)
        
         | iudqnolq wrote:
         | Simple CRDTs are surprisingly simple. If you don't need text
         | the complexity drops off quite quickly.
         | 
         | I just ripped yjs out of a personal project and replaced it
         | with something I wrote from scratch. Getting exactly the
         | semantics I want is nice. For example it's easier to make
         | compound properties update atomically. A good place to get
         | started is one of the figma founders' blogs:
         | https://madebyevan.com/algos/crdt-fractional-indexing/
         | 
         | You'll notice if you use Figma that it's mostly last-writer-
         | wins fields. That seems to often turn out to be the ideal UI
         | and it's the simplest to implement. If you the text to green
         | and I set it to red the right answer isn't yellow.
        
       | czx111331 wrote:
       | Hi, I'm the developer of crdt-richtext. I'm humbly grateful for
       | everyone's support and quite surprised to see the level of
       | interest this project has generated. It's still just an
       | experimental first step for us and not mature for production yet.
       | As some of the comments have pointed out, performance may not be
       | the main bottleneck for adopting CRDTs anymore. The potential
       | bottlenecks in the user path might lie in the DX and the
       | integration with existing tool stacks. Our primary project, Loro,
       | which is still in closed development, aims to address these
       | issues in the near future. We look forward to sharing updates
       | with you as we progress.
        
       | satvikpendem wrote:
       | How does this compare to automerge, yjs-rust and especially
       | diamond-types which was also designed for text?
        
         | red_hare wrote:
         | The author has references for how it compares in behavior and
         | performance to yjs and automerge in the article.
        
       | alecnotthompson wrote:
       | There's a link to everything in the blog post, including
       | crates.io in the sentence it talks about the crate the post is
       | about, but no link to the crate itself :/
        
       | samwillis wrote:
       | This looks super interesting, always excited for a new CRDT
       | implementation.
       | 
       | I'm slightly confused by the benchmarks, they look incredible,
       | somewhat surprised by its performance over Yjs/Yrs and Automerge.
       | Going to have to dig in more to understand.
        
         | czx111331 wrote:
         | The source code of the benchmark is available here
         | https://github.com/zxch3n/fugue-bench
        
       | cmrdporcupine wrote:
       | Has anybody done any work or research integrating CRDT-ish text
       | conflict resolution with a database transaction process? E.g. in
       | MVCC when detection of version conflict (e.g. via timestamps)
       | happens, the approach is generally to abort and retry; but if the
       | conflict in the tuple/row is at textual level and can be resolved
       | by this kind of algorithm, it seems this could handle some
       | conflict resolution without full retry or abort?
        
         | samsquire wrote:
         | I am working toward that goal of integrating text diffing or
         | CRDT diffing into a database.
         | 
         | I wanted extremely high performance multimaster scalability of
         | writes with linear scalability and semiautomatic text diffing
         | and merging. I think this calls for an eventually consistent
         | architecture.
         | 
         | The idea is to implement the GUI so that diffs are exposed to
         | the user if there is a conflict but otherwise use the Myers
         | algorithm.
         | 
         | I implemented a simple MVCC in Java.
         | https://github.com/samsquire/multiversion-concurrency-contro...
         | 
         | And I implemented a 3 way text diff with myers algorithm based
         | on https://blog.jcoglan.com/2017/02/12/the-myers-diff-
         | algorithm...
         | 
         | https://github.com/samsquire/text-diff
         | 
         | I implemented an eventually consistent mesh protocol that uses
         | timestamps to provide last write wins
         | https://github.com/samsquire/eventually-consistent-mesh
         | 
         | I implemented an barebones toy SQL and Graph cypher and
         | document storage database https://github.com/samsquire/hash-db
         | 
         | I need to combine the ideas in each of these projects into a
         | cohesive solution.
         | 
         | I did some incomplete work on trying to implement the YATA
         | algorithm.
        
         | samwillis wrote:
         | Not quite the answer to your question, but there are quite a
         | few people looking at doing eventually consistent database
         | syncing with CRDTs. Two interesting ones are:
         | 
         | https://electric-sql.com/ - SQLite and Postgres syncing
         | 
         | https://vlcn.io/ - SQLite syncing
         | 
         | I've been experimenting with Yjs and merging documents in
         | database via SQL, for both SQLite and Postgres. And allowing
         | you to query the contents of those documents, similar to how
         | you can query JSON. Early days, these are just tests:
         | 
         | https://github.com/samwillis/yjs-sqlite-test
         | 
         | https://github.com/samwillis/yjs-pg-test
        
           | agg23 wrote:
           | I hadn't heard of Electric-SQL before. Was potentially
           | interested, but the lack of native language bindings is a
           | dealbreaker. I don't particularly like a centralized server
           | being required for the sync either, but it's ok.
           | 
           | VLCN is very interesting to me, and I really need to play
           | around with it. It's not incredibly obvious to me how these
           | extensions to SQLite play out in actual use.
        
             | stusmall wrote:
             | Have you looked at https://www.ditto.live/
             | 
             | Full disclosure: I work there
        
               | agg23 wrote:
               | I have actually, but I'm only interested in open source
               | and self-hostable solutions, so I stopped looking at it
               | pretty quickly, sorry.
        
             | thruflo wrote:
             | We (ElectricSQL) have some work underway on a Rust/WASM
             | port of our core client component, that's designed to open
             | up more language support.
             | 
             | This is a community contributed Dart/Flutter client
             | https://github.com/SkillDevs/electric_dart that may also be
             | a useful reference.
        
           | cmrdporcupine wrote:
           | Yeah I'm more interested in having this as part of the core
           | DB tx commit logic, rather than as an ad-on or sync piece.
        
       ___________________________________________________________________
       (page generated 2023-05-18 23:01 UTC)