[HN Gopher] Faster CRDTs: An Adventure in Optimization
___________________________________________________________________
Faster CRDTs: An Adventure in Optimization
Author : xnx
Score : 469 points
Date : 2021-07-31 11:37 UTC (11 hours ago)
(HTM) web link (josephg.com)
(TXT) w3m dump (josephg.com)
| hinkley wrote:
| I'm getting mixed messages on CRDTs. Are we at the point now
| where they are general enough that the human observer is not
| constantly confronted with 'surprises' from the behavior of the
| system?
|
| Some of the talks by Kleppmann go straight into the weeds and
| make it hard to tell if he's just nerding out about finer points
| or lamenting unsolved problems, or even paradoxes.
| JW_00000 wrote:
| By the way, as someone who has published academic papers, if
| you're ever bothered about a paper or have some comments, don't
| hesitate to mail the authors. (Their e-mail addresses are always
| on the paper; especially target the first author because they
| have normally done the work.) We are happy to hear when someone
| has read our work and I at least would've liked to have known if
| someone found a problem with my papers.
| zladuric wrote:
| > especially target the first author because they have normally
| done the work
|
| As someone living with a recently promoted (is that the
| correct term?) PhD in social sciences, this surprises me. Is
| that something specific to my country or to social sciences,
| or did my wife simply land in a case full of rotten apples?
| galgalesh wrote:
| In my uni in Belgium, this is the case too. First author does
| most of the work.
| shrimpx wrote:
| In computer science the first author does the work and is
| usually a PhD student. The last author is usually the
| professor that pushed and helped develop the idea, provided
| funding, and probably wrote or was heavily involved in
| writing the paper's abstract, intro and conclusion sections
| -- the bulk of "framing" the work.
|
| But there are exceptions. Some profs are less student-
| oriented or don't like delegating so much, and remain
| "individual contributors" deep into their careers. Those tend
| to publish a nonzero number of first- and single-author
| papers.
|
| Edit: I've noticed that in Theory and Algorithms, profs tend
| to take first author even though the student slaved over the
| proofs. That field is kind of an outlier in that it's close
| to pure math, and I think borrows cultural artifacts from
| math research.
| [deleted]
| exmadscientist wrote:
| At least in the fields I've published in, first author does
| the work (or at least most of it, almost always including
| writing the paper) and last author secured the funding.
| Sometimes authors are listed in alphabetical order, in which
| case one or two are usually listed as "corresponding author".
| The less-senior corresponding author usually did the work,
| and the more-senior corresponding author is usually either
| the one who secured the funding, or a long-term member of the
| research group who is likely to actually be _around_ to
| answer correspondence in the future (and who also probably
| helped out enough to be worth corresponding with).
| anonymousDan wrote:
| I think it's pretty field specific. In a lot of CS for
| example the first author did most of the work and the last
| author got the funding. The people in the middle may have
| done varying degrees of work ranging from helping
| substantially with the implementation, evaluation or writing
| of the paper to once having read the paper abstract :)
| robbles wrote:
| I think the term you're looking for might be "postdoc"
| (postdoctoral researcher)? I've seen "recent PhD" a few times
| as well, seems pretty clear.
| lewisjoe wrote:
| I've been looking for a practical OT alternative for our online
| word processor (https://zoho.com/writer). We already use OT for
| syncing our realtime edits, and we're exploring CRDTs targeting
| stronger consistency for tackling offline edits (which are
| typically huge & defragmented, since the edits are not synced in
| realtime).
|
| So the baseline is that OT has a better model for holding state
| in terms of performance/memory, since the edits can be compiled
| into plain string types. CRDTs in comparison force us to hold
| deleted states as well and demand richer information per unit
| (character/string/etc) - which makes it harder on the CPU/RAM.
|
| Here's the story as I understand it:
|
| 1. Automerge tackles this by just moving to a faster lower-level
| runtime: Rust.
|
| 2. Yjs handles this with a similar technique, i.e. relying on
| V8's hidden classes to handle the performance optimizations and
| assuming real-world cases to narrow down and optimize its data
| structures.
|
| But none of these seem to be a fundamental breakthrough in the
| efficiency of the algorithm itself. At best they all look like
| workarounds, and this keeps bothering me.
| kevinjahns wrote:
| I know that it is hard to comprehend why modern CRDT
| implementations are fast. But the data confirms that they work
| great. OT seems to be much simpler, but there are real
| advantages in using CRDTs. The performance problems have been
| solved through an efficient representation of the CRDT model.
|
| The gist of the read below [1] is that it is impossible for a
| human to create a document that Yjs can't handle (even in the
| worst-case scenario). But yes, it handles real-world scenarios
| particularly well.
|
| The concept of "hidden classes" is super old. It was first
| implemented in a fork of Smalltalk and then became a
| foundational concept of runtime engines for scripting
| languages. It is implemented in V8, Python, Ruby, SpiderMonkey,
| etc.
|
| Yjs does not assume a "real-world scenario" and it is not
| optimized for any specific runtime engine. It runs fast in any
| browser. The benchmarks confirm this. [2]
|
| Yjs is being used in practice by several companies (eg Nimbus
| Notes with >3 million users) for quite some time now. I'm not
| aware of any performance problems.
|
| [1]: https://blog.kevinjahns.de/are-crdts-suitable-for-shared-
| edi... [2]: https://github.com/dmonad/crdt-benchmarks
| pfraze wrote:
| You can remove tombstones in a cleanup pass if you constrain
| behavior a bit.
|
| For instance, if there's just a single server, after 5 minutes
| of no active connections you could clean out tombstones. After
| that, if a client connects with some changes they had been
| holding onto but got DCed, you can reject the write and let the
| user's client merge by some other means (perhaps even manual).
| josephg wrote:
| If you've got big offline edits (or you're merging multiple
| large sets of edits), even existing CRDTs will generally handle
| that more efficiently than OT will. OT algorithms are usually
| O(n * m) time complexity when merging _n_ edits from one peer
| with _m_ edits from another peer. A CRDT like diamond-types is
| O((n + m) * log(s)), where s is the current size of the
| document. In practice it's super fast.
|
| As for holding deleted states and richer information per unit,
| it's not so bad in absolute terms. 1-2 MB of data in memory for
| a 17 page document is honestly fine. But there's also a few
| different techniques that exist to solve this in CRDTs:
|
| 1. Yjs supports "garbage collection" APIs. Essentially you say
| "anything deleted earlier than this point is irrelevant now"
| and the data structures will flatten all runs of items which
| were deleted earlier. So storage stays proportional to the size
| of the not-deleted content.
|
| 2. Sync9 has an algorithm called "antimatter" which mike still
| hasn't written up _poke poke mike!_. Antimatter actively tracks
| the set of all peers which are on the network. When a version
| is known to have been witnessed by all peers, all extra
| information is safely discarded. You can also set it up to
| assume any peer which has been offline for however long is gone
| forever.
|
| 3. Personally I want a library to have an API method for taking
| all the old data and just saving it to disk somewhere. The idea
| would be to reintroduce the same devops simplicity of OT where
| you can just archive old history when you know it probably
| won't ever be referenced again. Keep the last week or two hot,
| and delete or archive history at will. If you combined this
| with a "rename" operation, you could reduce the "hot" dataset
| to basically nothing. This would also make the implementation
| much simpler - because we wouldn't need all these performance
| tricks to make a CRDT like diamond-types fast if the dataset
| stayed tiny anyway.
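The garbage-collection idea in point 1 can be sketched as follows. This is NOT Yjs's actual API or data layout, just a toy illustration of the principle: runs of items deleted before a cutoff version collapse into a single stub, so storage stays proportional to the live content plus recent history.

```python
# Toy sketch (not Yjs's real API) of tombstone flattening: adjacent
# items deleted at or before `cutoff` merge into one placeholder.

def flatten_tombstones(items, cutoff):
    """items: list of dicts {'len': int, 'deleted_at': version or None}.
    Returns a new list where old deleted runs are collapsed."""
    out = []
    for item in items:
        dead = item["deleted_at"] is not None and item["deleted_at"] <= cutoff
        if dead and out and out[-1].get("gc"):
            out[-1]["len"] += item["len"]  # extend the collapsed run
        elif dead:
            out.append({"len": item["len"], "gc": True})
        else:
            out.append(dict(item))
    return out
```

After flattening, only the total length of each dead run survives, which is all a later merge needs for positional bookkeeping.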
| vanderZwan wrote:
| On a meta-level, does anyone else think that the whole idea of
| writing a peer reviewed paper that is just a benchmark of
| different algorithms should be really rigorously reviewed before
| being accepted? Writing good benchmarks is hard, and so highly
| contextual that writing fair comparisons between algorithms (or
| data structures) is almost impossible unless you're an expert in
| all of the algorithms involved.
| thysultan wrote:
| Only the holy experts can divulge the realm of gods that is
| benchmarking and reveal to us the results. How do we beseech
| them, o wise one?
| Diggsey wrote:
| Yeah, I've also seen several academic papers on performance or
| "optimization" of existing algorithms which just demonstrate a
| complete lack of knowledge about how those algorithms are
| implemented in practice.
|
| For example, there was a paper explaining how you could
| optimize the GJK algorithm by reducing the number of distance
| checks required, and in turn the number of square-roots...
| Despite the fact that everyone (including the authors of the
| original GJK algorithm) knows that you don't _actually_ need to
| do a square-root to compare distances...
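The square-root trick referred to here is just comparing squared distances; since sqrt is monotonic on non-negative numbers, the ordering is preserved. A minimal illustration (not from the paper in question):

```python
# Compare distances without a square root: dist(p, q) < r holds
# exactly when dist(p, q)^2 < r^2, so the sqrt can be skipped.

def closer_than(p, q, radius):
    """True iff the 2D points p and q are closer than `radius`."""
    dx = p[0] - q[0]
    dy = p[1] - q[1]
    return dx * dx + dy * dy < radius * radius
```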
| meepmorp wrote:
| > Despite the fact that everyone (including the authors of
| the original GJK algorithm) knows that you don't actually
| need to do a square-root to compare distances..
|
| Academia's purpose is to produce research, typically measured
| in publications per unit time. Optimizing one paper leads to
| a global reduction in the size of the literature by pruning
| opportunities for subsequent research, harming the overall
| performance of the system.
| lostdog wrote:
| Benchmarking papers are inaccurate when the original algorithms
| are not open sourced, and the grad student needs to rewrite the
| algorithm from scratch. They can easily create different
| implementation details, and wind up with an algorithm that's
| slower than the original.
|
| I do think that the original algorithm authors should have the
| opportunity to correct the benchmarking code, or to release
| their original implementation as open source to be benchmarked.
|
| In some sense, the benchmarking paper with a slower
| implementation is more "correct," since an engineer evaluating
| which algorithm to use is just as likely to implement the
| algorithm in a slower way than the original paper. The
| incentives are right too: the original paper author should be
| giving enough details to recreate their work, and the
| benchmarker is showing that their published algorithm really is
| slow.
| knuthsat wrote:
| Problem is that academics are rarely experts at programming or
| have knowledge of computer architectures as much as someone in
| the industry. There are various tricks that are never taught at
| college, therefore academics have no idea some stuff even
| exists.
|
| Best example is discrete optimization research (traveling
| salesman, vehicle routing and its variants, schedule rostering
| etc.). Stuff you find in the papers there achieves state-of-
| the-art results very slowly (using integer linear programming
| or some rarely optimized heuristics) making you believe these
| instances of a general NP-hard problem can't be solved quickly.
|
| When you start tinkering, you either find that data structures
| can be added that reduce the complexity significantly or that
| there are regularities in instances that, when exploited,
| support massive speedups.
|
| I would say that TSP research is an exception, but most of the
| stuff coming out that has a lot of citations is way too slow and
| is never as brilliantly implemented as the Lin-Kernighan
| heuristic or other stuff from the age of insanely slow
| computers.
| makmanalp wrote:
| > Problem is that academics are rarely experts at programming
| or have knowledge of computer architectures as much as
| someone in the industry. There are various tricks that are
| never taught at college, therefore academics have no idea
| some stuff even exists.
|
| I want to push back on this generalization a bit. The
| academics that are focused on pushing the mathematical
| boundaries of discrete optimization are focused, no surprise,
| on only that. If theoretical improvements are not what you
| want, don't read those papers; read ones by people more
| focused on applications. Read literally any databases paper,
| or stuff like hyperscan, simdjson. I'd argue that a non-
| trivial amount of these are vastly /ahead/ of what's standard
| in industry, but industry is slow to adopt new ideas and
| catch up because of legacy software and existing codebases.
| It's very similar in the programming languages sphere: it
| took ages for industry to adopt ideas that were popular in
| academia for a long time (eg: Rust, LINQ). The idea that
| academia (at least in CS) is an ivory tower, far from the
| concerns of industry, is not very true as of late. There's
| a lot of cross-pollination and a revolving door of people
| going back and forth from one to the other spreading ideas.
| knuthsat wrote:
| I must say that when it comes to discrete optimization, the
| genetic/ant/simulated annealing/etc. stuff is more popular
| in academia than in industry (at least the industry that
| doesn't heavily include academics).
|
| Works like Lin-Kernighan heuristic are extremely rare and a
| bunch of knowledge exists in industry only. Even the
| mentioned heuristic was for decades being implemented
| incorrectly until one individual came and demonstrated its
| superiority (K. Helsgaun).
|
| I mean, most of the code that gets released today doesn't
| even handle the time-windows constraint in the best way
| possible (optimal updates, constant-time feasibility
| evaluations, etc.). I believe no open source vehicle
| routing libraries have any data-structure-relevant
| optimizations for handling time windows, task precedence,
| dynamic workshift starts, etc. All are fairly simple
| implementation tricks that just got lost under giant
| amounts of population-based heuristics or mathematical
| linear programming approaches.
| hodgesrm wrote:
| 100x +1. The whole field of query optimization has been
| devoted to improvement of efficiency and speed of database
| systems for decades. [1] Academic results are studied quite
| closely in industry. Also, "academic" includes people like
| Mike Stonebraker and Andy Pavlo. They aren't exactly
| slouches regarding issues of performance.
|
| More generally, major waves of performance innovation in
| the IT field have been driven by advances in storage,
| compression, vectorization, virtualization, etc. Academic
| results have led to entirely new products like Vertica
| (from C-Store [2]) or VMware (from the Disco system [3]).
|
| [1] https://en.wikipedia.org/wiki/Query_optimization
|
| [2] https://w6113.github.io/files/papers/cstore-vldb05.pdf
|
| [3] http://www.cs.cmu.edu/~15712/papers/bugnion97.pdf
|
| edit: clarity
| kaba0 wrote:
| You do realize there is a whole area of the research field
| dedicated to heuristic algorithms? They have a proper
| academic basis just as much as the "correct" solutions.
| pyrale wrote:
| From what I saw (I worked in an academia-consulting
| partnership to solve scheduling problems in transports), the
| OR researchers very frequently work in association with OR
| consulting shops, and they're well-aware of the difference
| between academic datasets and industrial datasets. In the
| conferences I saw, it was not infrequent to see different
| tracks for industrial and academic papers, both attended by
| the same crowd.
|
| The point I agree with, though, is that this is not reflected
| in the papers. Academic papers focus on academic instances
| because they are more general (and usually harder, as you
| said) and because optimizations of specific instances of the
| problem are not that useful from an academic pov.
|
| It's hard to know who works with who and who has experience
| with what if you're not an insider, though.
| raverbashing wrote:
| > Stuff you find in the papers there achieves state-of-the-
| art results very slowly (using integer linear programming or
| some rarely optimized heuristics) making you believe these
| instances of a general NP-hard problem can't be solved
| quickly.
|
| Yes, or looking for things like mathematical purity, linear
| problem statement, etc
|
| In practice: you don't need the best solution and you can get
| a great solution in a multitude of ways.
| comicjk wrote:
| On the one hand, peer review takes long enough already. On the
| other... I saw an influential paper that published obviously-
| wrong timing data, essentially saying that time(A) + time(B) <
| time(B). It seems they were including the Python interpreter
| startup time (~0.1s) in a profile of a ~0.01s algorithm.
| z3t4 wrote:
| What I like about "tests" in software development is that anyone
| can run them, just download the source code, then run ./test or
| right click and "run tests". It would be cool if computer science
| could offer the same experience, just download the source code
| and run it, compare if you got the same result, inspect and learn
| from the source code, etc. Instead of "here's some pseudo-code
| we've never tried", and here's a mathematical formula that you
| need to be a mathematics professor to understand... Yes we know
| you are not a professional software developer, the code is going
| to be at a beginners level, but that is fine, I am not reading
| your paper to criticize your code for being "impure", or not
| using the latest syntax and frameworks, I'm reading it to
| understand how to implement something, to solve a problem.
| paulgb wrote:
| This is great! I'd like to quote a line here, because I think the
| answer is "someone on HN knows" and I'd like to hear the answer
| as well.
|
| > V8 is actually suspiciously fast at this part, so maybe v8
| isn't using an array internally to implement Arrays? Who knows!
| bouk wrote:
| The V8 blog is a good starting point for learning how it works
| under the hood: https://v8.dev/blog/fast-properties
| vlovich123 wrote:
| Hmm... I would have thought they implemented something
| similar to a std::deque so that you have amortized O(1)
| insertions into the middle of a vector.
| mfbx9da4 wrote:
| Reminds me of the data structures in this markdown library
| https://github.com/markdown-it/markdown-it/blob/master/docs/...
|
| The author hand-waves away the possibility that there could be
| memory-locality performance benefits from using arrays in JS,
| but my hunch is that there is something to it. I know the React
| codebase, for example, went for a monomorphic object structure
| to represent components to leverage inline caching of hidden
| classes.
| dpiepgrass wrote:
| > my range tree is just a slightly modified b-tree. But usually
| when people talk about b-trees they mean a BTreeMap. That's not
| what I'm doing here. Instead of storing keys, each internal node
| of the b-tree stores the total number of characters (recursively)
| in that item's children. So we can look up any item in the
| document by character position, or insert or delete anywhere in
| the document in log(n) time.
|
| Cool! This is essentially the same idea I implemented in 2012; I
| call it the AList or A-List data structure:
| http://core.loyc.net/collections/alists-part1
|
| Ever since I made it, I've been looking for a good application
| for it. I guess this is it! I mean, I knew A-List was a good data
| structure for text editing, but most applications can use a Gap
| Buffer which is much simpler. But when it comes to concurrent
| editing, you've got multiple editing points so a Gap Buffer is
| suddenly much less attractive.
|
| > Honestly I'm shocked and a little suspicious of how little ram
| Yjs uses in this test.
|
| It's good, but still it uses ~30x as much RAM as plain string
| edits. Not surprisingly, you got 3x better memory usage by using
| A-List and a more efficient language (Rust in this case, but C#
| and C/C++ can also do well.)
|
| There is a great article about a CRDT concept called "Causal
| Trees" [1]. I wonder how it compares to flat-list-based CRDTs
| (it's been too long since I researched this).
|
| By the way, Microsoft has a new set of libraries for concurrent
| editing called Fluid Framework[2]. I'm told it's a "generalized
| data structure" that was inspired by Causal Trees but with a
| unique and intention-preserving editing scheme. I found out about
| it after they decided to use my fast-cloning-copy-on-write B+Tree
| for TypeScript[3]... they sent me a pull request for diffing
| versions of B+ trees, but I haven't yet looked into the
| architecture of their concurrent data type.
|
| [1] http://archagon.net/blog/2018/03/24/data-laced-with-history/
|
| [2] https://fluidframework.com/
|
| [3] https://www.npmjs.com/package/sorted-btree
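The count-annotated tree in the quoted passage can be sketched like this. A plain binary node stands in for the real b-tree (and for the A-List), but the idea is the same: each internal node stores its subtree's total character count, so lookup by character position takes O(log n).

```python
# Sketch of an order-statistic text tree: internal nodes cache the
# character count of their subtree; leaves hold runs of text.

class Node:
    def __init__(self, left=None, right=None, text=None):
        self.left, self.right, self.text = left, right, text
        self.count = len(text) if text is not None else left.count + right.count

    def char_at(self, pos):
        if self.text is not None:          # leaf: index into the run
            return self.text[pos]
        if pos < self.left.count:          # descend using subtree counts
            return self.left.char_at(pos)
        return self.right.char_at(pos - self.left.count)

# "hello world" split across three leaves:
doc = Node(Node(text="hell"), Node(Node(text="o wo"), Node(text="rld")))
```

Inserts and deletes work the same way: walk down by count, edit the leaf, and update the counts on the path back up.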
| stephc_int13 wrote:
| Trees are a powerful and practical data structure, but even if
| it does not appear clearly in big-O style complexity analysis,
| they are usually slow.
|
| Unfortunately, the difference between slow and fast can be
| several orders of magnitude, while the perception of the
| programmer doing a back-of-the-envelope analysis seems to be a
| logarithmic scaling of reality...
| josephg wrote:
| Trees seemed to work pretty well for me here!
|
| The problem with trees is usually that people don't pack enough
| data into each node in the tree. I implemented a skip list in C
| a few years ago[1]. For a lark I benchmarked its performance
| against the C++ SGI rope class which was shipped in the C++
| header directory somewhere. My skip list was 20x faster - which
| was suspicious. I looked into what the SGI rope does and it
| turns out it was only putting one character into each leaf node
| in the tree it constructed. Benchmarking showed the optimal
| number was ~120 or so characters per leaf. Memcpy is much much
| faster in practice than main memory lookups.
|
| In diamond-types (benchmarked here), the internal nodes in my
| B-tree store 16 pointers and leaf nodes store 32 entries. With
| run-length encoding, all 180,000 inserted characters in this
| data set end up in a tree with just 88 internal nodes and a
| depth of 3. It goes fast like this. But if you think an array
| based solution would work better, I'd love to see it! It would
| certainly need a lot less code.
|
| [1] https://github.com/josephg/librope
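The run-length encoding mentioned here can be sketched as follows. This is not diamond-types' real data layout, just the core idea: consecutive characters typed by one peer extend an existing run instead of adding a new entry, so a long typing session collapses into a handful of spans.

```python
# Toy run-length-encoded insert log. Each run is [agent, start_pos,
# text]; an insert that directly continues the last run extends it.

def insert(runs, agent, pos, ch):
    """Record a single-character insert, merging sequential typing."""
    if runs:
        a, start, text = runs[-1]
        if a == agent and pos == start + len(text):
            runs[-1][2] += ch              # append to the open run
            return
    runs.append([agent, pos, ch])

runs = []
for i, ch in enumerate("hello"):
    insert(runs, "A", i, ch)
# all five inserts collapse into the single run ["A", 0, "hello"]
```

With this kind of merging, the 180,000 inserted characters in the benchmark can occupy far fewer entries than characters, which is why the tree stays so shallow.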
| feikname wrote:
| Correct me if I'm mistaken:
|
| The difference between diamond native and diamond WASM
| demonstrates how, even with WASM, native implementations beat
| browsers _hard_, and native implementations are still very
| worthwhile performance-wise, especially for lower-powered
| devices and, perhaps, for reducing battery usage (as a
| consequence of less CPU use) on mobile devices.
| Jweb_Guru wrote:
| The wasm implementation here was still running under a
| JavaScript test harness, so I suspect it's the JS-WASM boundary
| interactions that are causing the slowdown. WASM itself (if it
| doesn't need to interact with JavaScript) usually runs with a
| much smaller performance penalty.
| josephg wrote:
| I suspect so too - given there are 280 000 calls across the
| JS-wasm boundary, and most of those calls pass a string. I'd
| love to know for sure though. I considered making this
| benchmark pass the whole data set in one go through JSON or
| something, but that felt like cheating - since thats not how
| the API would be used in practice during a real editing
| session.
|
| But even paying that cost, it still seems much faster to use
| rust + WASM than run the same algorithm in native javascript.
| And the JS-wasm boundary will probably get gradually faster
| over the next few years.
| __s wrote:
| Yes. Ultimately WASM is executing within a sandbox & involves
| being JIT compiled (read: not heavily optimized except for hot
| loops eventually). If native compilation is an option it makes
| sense to go that route
|
| WASM competes with asm.js not asm (or, arguably, jvm etc)
| Rusky wrote:
| WASM JIT implementations tend to be quite a bit different
| from JavaScript JIT, so that's not really where the perf
| difference comes from.
|
| First, WASM gets all the heavy AOT optimizations from the
| middle end of the compiler producing it. At runtime, WASM JIT
| doesn't start from program source, but from something that's
| already been through inlining, constant propagation, common
| subexpression elimination, loop optimizations, dead code
| elimination, etc. And WASM is already typed, so the JIT
| doesn't have to bother with inline caching, collecting type
| feedback, or supporting deoptimization.
|
| Because of that, the only really beneficial work left to do
| is from the back end (i.e. arch-specific) part of the
| compiler- basically, register allocation and instruction
| selection. WASM JIT compilers don't bother trying to find hot
| loops or functions before optimizing. Instead, they do a fast
| "streaming" or "baseline" codegen pass for fast startup, and
| then eagerly run a smarter tier over the whole module and
| hot-swap it in as soon as possible. (See e.g.
| https://hacks.mozilla.org/2018/01/making-webassembly-even-
| fa...)
|
| The perf difference vs native rather comes from the
| sandboxing itself- memory access is bounds checked, support
| for threads and SIMD is limited (for now), talking to the
| browser has some overhead from crossing the boundary into
| JavaScript (though this overhead will go down over time as
| WASM evolves), etc.
| galesky wrote:
| This is awesome! I'm researching delta-state-based CRDTs for a
| master's dissertation; these kinds of optimizations on op-based
| CRDTs are really interesting.
| rawoke083600 wrote:
| Damn! What a fantastic read! And of course brilliant
| coding/thinking :) Well done, sir!
| lewisjoe wrote:
| People who are interested in the topic: I just found out they
| have open meetings about the parent project, and it seems like
| anybody can join - https://braid.org/
|
| Great way to share progress. Kudos! :)
| toomim wrote:
| Yep! Our next meeting is two Mondays from now, on August 2nd,
| at 4:00pm Pacific Time. All are welcome:
| https://braid.org/meeting-16
| iampims wrote:
| This is a fantastic write-up. Worth a read even if you don't care
| about CRDTs.
| josephg wrote:
| Hello HN! Post author here. I'm happy to answer questions & fix
| typos once morning rolls around here in Australia
| gfodor wrote:
| Thanks for ShareDB. It's dope. I extended it to support
| collaborative voxel editing (https://jel.app) and it works
| great.
| teodorlu wrote:
| Have you used CRDTs to solve any practical problems?
|
| If so, how does the CRDT solution compare to a non-CRDT
| solution? If a non-CRDT solution is feasible at all?
| mirekrusin wrote:
| Article mentions at the beginning that the author used CRDT
| in Google Wave/ShareJS.
| mirekrusin wrote:
| My mistake, it says OT was used, thank you for correcting
| me.
| paulgb wrote:
| Wave used OT rather than CRDTs, as the author discusses in
| another post: https://josephg.com/blog/crdts-are-the-
| future/
| saurik wrote:
| AFAIK Wave and ShareJS both used OT (which the paper that
| this article referred to was also attempting to benchmark).
|
| FWIW, I am myself also curious about this (the question of
| comparing CRDT to non-CRDT solutions): I found OT
| beautiful, but never really felt CRDT had the same feeling
| of elegance; and so I am downright fascinated to see the
| person I have always seen as a "god of OT" deciding to
| forsake it and move to CRDTs going forward. Are they really
| that great? Is there something simple I am missing for the
| explanation for why they are so much better? (Is it maybe
| that they can support arbitrary merging without a sequencer
| to linearize the edits? Honestly, my question probably
| sucks a bit as I haven't spent any time thinking about OT
| or CRDT in at least three years--and even then only once
| since a few years before that, as I have had other things
| to spend most of my mental energy on recently--and so I am
| failing to remember the details of my use cases or the
| tradeoffs I saw or the implementation issues I felt were
| easy/hard.)
| the-smug-one wrote:
| Do you want a centralized server to control the data?
| Then just use OT. Do you want users to control the data,
| and have your server essentially just be a forever-
| present user? Then use CRDT.
|
| CRDTs certainly do have a mathematical elegance to them.
| josephg wrote:
| Yep, this is the best practical advice at the moment.
| Well, for list CRDTs. State CRDTs (like a counter) are
| small and fast, and kinda better than OT in every way.
|
| List ("operation-based") CRDTs and OT systems are
| "equivalent" in a very academic sense that nobody really
| talks about or understands. It's really not obvious unless
| you've been staring at this stuff for years, but the
| equivalence is there:
|
| You can make a CRDT out of any OT system by just shipping
| the entire history of operations to each peer. List CRDTs
| essentially do that, with a whole lot of tricks to
| compress that data set and use it without needing to
| linearly scan.
|
| And you can convert the other way too. You can add a
| "rename" operation into a list CRDT which assigns a new
| name to each element currently in the document. Before
| the rename operation document "hello" might have IDs [a4,
| b2, b3, b1, a5]. The rename operation changes the IDs to
| [c1, c2, c3, c4, c5]. When an operation happens you
| specify the version and the ID at that version of the
| predecessor (eg c2). The insert happens there. Then you
| need a method to take the ID at one version and
| "transform" it to the ID of the same item at a different
| version. Do the rename operation implicitly after _every_
| change, and voila! You now have OT semantics. "Insert
| after c1" means "Insert after position 1".
|
| OT systems have one big advantage which is that you don't
| have to ship the CRDT state to every peer. With a rename
| operation, we can add back the operational simplicity of
| OT systems into a CRDT. But the code is (and always will
| be) much more complicated. So I think OT makes sense for
| strictly server-client systems.
|
| You can also have a hybrid server, which talks CRDT to
| full peers on the network but just does OT when talking
| to browser clients and things like that. We talked about
| this at our public braid meeting at the start of the
| week. The discussion about this stuff starts about 30
| minutes in: https://braid.org/meeting-15
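The "rename" step above can be illustrated with a toy mapping (this is not a real CRDT implementation): give the live items fresh sequential IDs, so "insert after c2" can be read positionally as "insert at position 2".

```python
# Toy illustration of the rename operation: map each live item's
# current ID to a fresh sequential ID under a new prefix.

def rename(ids, prefix):
    """Return a dict mapping each old ID to prefix1, prefix2, ..."""
    return {old: f"{prefix}{i + 1}" for i, old in enumerate(ids)}

# The comment's example: document "hello" with IDs [a4, b2, b3, b1, a5]
mapping = rename(["a4", "b2", "b3", "b1", "a5"], "c")
# a4 -> c1, b2 -> c2, b3 -> c3, b1 -> c4, a5 -> c5
```

Doing this implicitly after every change is what makes the item IDs line up with document positions, recovering OT-style positional operations.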
| zozbot234 wrote:
| > And you can convert the other way too. You can add a
| "rename" operation into a list CRDT which assigns a new
| name to each element currently in the document.
|
| Operations in a CRDT must be commutative for merge/update
| to be well-defined, so it's not immediately clear how a
| "rename" operation can be expected to work properly.
| Jailbird wrote:
| Drifting off-topic but I've wondered this myself - I've been
| interested in CRDTs in a "read the news" way but not a "this
| rises to something I'm going to implement" way.
|
| Perhaps it's blindingly obvious to all here, so no one
| mentions it: thinking about more practical and real-world
| problems seems like collaboration on more complex/structured
| data. Today, it seems one would still have to transform that
| data into a large flat string underneath, and implement an
| editor that only performs edits that maintain the integrity
| of the higher-level object, while the flat string provides
| the collaboration features. Perhaps it's possible to
| implement an XML schema known to the editor so all
| insertions/deletions keep the document syntactically correct.
|
| I wonder if there's some other way to either generalize on
| top of "big string" or create another base model (somehow?)
| detaro wrote:
| CRDT is a general concept, editing text is just one
| possible application. If you have a stronger datatype,
| great, you can build operations on top of it to implement a
| CRDT system, depending on its properties.
| Jailbird wrote:
| What I've read talks about character insertion and
| concepts that apply to editing text.
|
| Perhaps I just need to find bedrock to build up from
| about the properties of a stronger datatype that allow
| CRDTs to work.
| detaro wrote:
| https://arxiv.org/pdf/1805.06358.pdf looks like a decent
| starting point, with references to work on other types.
| Jailbird wrote:
| and pointers to other papers, too - thanks!
| josephg wrote:
| > Today, it seems one would still have to transform that
| data to a large flat string underneath, and implement an
| editor that only performs edits that maintain the integrity
| of the higher-level object, while the flat string provides
| collaboration features.
|
| Lots of people think this and have mentioned it over the
| years, but it's a dangerous idea. The way concurrent edits
| are handled makes it really easy for the automatic merging
| algorithms to mess up the syntax of your XML / JSON
| content. And that's really hard for non-programmers to fix.
|
| The right answer for this stuff is to just make CRDTs which
| support other data types, the same way we did for OT with
| ot-types[1]. We need CRDT code for lists + text (working
| now - text is just a list of characters). And rich-text and
| JSON. And that'd cover 99% of the use cases. I'm tackling
| strings first in diamond-types because it's probably the
| hardest case; but I want to have native support for other
| data structures soon too.
|
| Yjs and automerge already do this. They have support for
| plain text, XML and arbitrary JSON structures.
|
| The simplest implementation for JSON structures + tuples is
| probably shelf[2], which is so simple you can implement it
| in about 25 lines of javascript. Shelf doesn't support
| lists, but combining shelf with the list code I already
| have in diamond-types is probably going to be good enough
| for most applications.
|
| [1] https://github.com/ottypes/
|
| [2] Shelf's description + code is here:
| https://braid.org/algorithms/shelf . Or you can watch the
| video of Greg (shelf's author) discussing the algorithm
| here: https://braid.org/meeting-8 . Kevin Jahns (Yjs's
| author) is in that recording too, and he's super jazzed
| about how simple and beautiful shelf is.
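| For a flavour of how small it is, here's a rough Python sketch
| of a shelf-style merge (my reading of the braid.org
| description; the exact tie-breaking rule is an assumption):

```python
# Rough sketch of a Shelf-style CRDT. Each value is stored as
# [value, version]; on merge, the higher version wins, ties on
# dicts merge per key, and other ties break deterministically.
def merge(a, b):
    (av, an), (bv, bn) = a, b
    if an != bn:
        return a if an > bn else b
    if isinstance(av, dict) and isinstance(bv, dict):
        merged = {}
        for k in av.keys() | bv.keys():
            if k in av and k in bv:
                merged[k] = merge(av[k], bv[k])
            else:
                merged[k] = av.get(k, bv.get(k))
        return [merged, an]
    # deterministic tiebreak: compare serialized values
    return a if repr(av) >= repr(bv) else b

doc1 = [{'title': ['hello', 1]}, 0]
doc2 = [{'title': ['world', 2]}, 0]
assert merge(doc1, doc2) == [{'title': ['world', 2]}, 0]
```

| Convergence follows because the merge only ever keeps the
| "larger" of two states, much like the max in a grow-only set.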
| lewisl9029 wrote:
| First of all, thank you for the amazing read! I
| thoroughly enjoyed the entire article, and it gave me a
| new perspective on the feasibility of CRDTs for real
| world applications performance-wise.
|
| Though I am curious now to hear your thoughts on the
| conflict resolution side of the equation for complex data
| structures like deeply nested JSON.
|
| The biggest takeaway I got from Martin's talk on the
| topic from a few years ago was that while CRDTs are
| theoretically guaranteed to eventually converge, the
| resulting converged state might not make any sense to
| applications that need to consume and act on that state
| [1].
|
| It seemed like a pretty fundamental challenge to using
| CRDTs to store arbitrary application state to me at the
| time, but I imagine the field has progressed immensely
| since then, so would love to hear any insights you might
| have around strategies that could be used to alleviate or
| at least work around this challenge if I wanted to build
| a CRDT-based app today.
|
| [1] https://youtu.be/8_DfwEpHE88?t=2232
| username91 wrote:
| It's a great article - really informative and enjoyable to
| read. Thanks for making it happen. :)
| conaclos wrote:
| Hi josephg, I'm a CRDT researcher. This is great to see so much
| work around CRDT nowadays!
|
| Some of the optimizations you discuss have already been
| proposed in papers and implementations.
|
| For instance, LogootSplit [1] proposes an implementation based
| on an AVL tree with extra metadata to get a range tree.
| LogootSplit also proposes a block-wise approach that stores
| strings instead of individual characters. Xray [2], an
| experimental editor built by GitHub and written in Rust, uses a
| copy-on-write B-tree. Teletype [3] uses a splay tree to speed
| up local insertions/deletions, based on the observation that a
| user performs several edits on the same region.
|
| [1]
| https://members.loria.fr/CIgnat/files/pdf/AndreCollabCom13.p...
| [2] https://github.com/atom-archive/xray [3]
| https://github.com/atom/teletype
| josephg wrote:
| Cool! It'd be interesting to see those CRDT implementations
| added to Kevin Jahns' CRDT Benchmarks page[1]. The
| LogootSplit paper looks interesting. It looks like xray is
| abandoned, and I'm not sure about teletype. Though teletype's
| CRDT looks to be entirely implemented in javascript[2]? If
| the authors are around I'd love to see some benchmarks so we
| can compare approaches and learn what actually works well.
|
| And I'm not surprised these techniques have been invented
| before. Realising a tree is an appropriate data structure
| here is a pretty obvious step if you have a mind for data
| structures.
|
| To name it, I often find myself feeling defensive when people
| read my work and respond with a bunch of links to academic
| papers. It's probably totally unfair and a complete projection
| from my side, but I hear a voice in my head reword your
| comment to instead say something awful like: "Cool, but
| everything you did was done before. Even if they didn't make
| any of their work practical, usable or good they still
| published first and you obviously didn't do a good enough
| literature review if you didn't know that." And I feel an
| unfair defensiveness arise in me as a result that wants to
| find excuses to dismiss the work, even if the work might be
| otherwise interesting.
|
| It's hard to compare their benchmark results because they used
| synthetic randomized editing traces, which always have
| different performance profiles than real edits for this
| stuff. Their own university gathered some great real world
| data in an earlier study. It would have been much more
| instructive if that data set was used here. At a glance their
| RAM usage looks to be about 2 orders of magnitude worse than
| diamond-types or yjs. And their CPU usage... ?? I can't tell
| because they have no tables of results. Just some hard to
| read charts with log scales, so you can't even really eyeball
| the figures. So it's really hard to tell if their work ends up
| performance-competitive without spending a couple of days
| getting their enterprise-style Java code running with a
| better data set. Do you think that's worth doing?
|
| [1] https://github.com/dmonad/crdt-benchmarks
|
| [2] https://github.com/atom/teletype-crdt
| mirekrusin wrote:
| It seems that reproducibility in computer science, at least
| where no gigantic/proprietary datasets are needed, should be a
| solved problem: simply publish a repository with the code. Are
| there any forces at play that make this so rare in practice?
| josephg wrote:
| Credit where it's due, the academics did publish the code they
| wrote on github. But I don't know if anyone - reviewers or
| readers - actually took the time to read it. Let alone
| understand why it throws doubt on the paper's conclusions.
| JW_00000 wrote:
| Usually, (at least in my specific niche of the computer
| science field,) if the code is published it's only
| published after the paper has been reviewed. This is partly
| to preserve anonymity during the review process, and also
| because usually the code isn't seen as "part of the paper"
| (i.e. "the paper should stand on its own"). Although I
| agree that you could argue that for papers about
| benchmarks, the code should definitely be considered an
| essential part of the paper.
| robmorris wrote:
| That's an impressive optimisation! Out of curiosity, what do
| you think are the most interesting or useful possible
| applications for an optimised CRDT?
|
| When you're approaching an optimisation like this, do you mind
| me asking how you think about it and approach it?
| politician wrote:
| Great post! I had no idea that list CRDTs could actually be
| fast because I read the papers showing how they were
| impractical. Thanks for investigating and writing this up --
| please accept my offer of a legitimate academic credential.
| [deleted]
| GlennS wrote:
| When optimizing `findItem`, did you consider storing the
| original index of each item on itself and using that as a
| starting point?
|
| Obviously this might move later (maybe it can only increase?),
| but usually not by much, so I would guess it would make an
| efficient starting point / be immediately correct 99% of the
| time?
|
| Looks like you already have 2 good solutions to this though
| (start from index of recent edits and range tree).
| benjismith wrote:
| I've been following your work for years (and I'm actually neck-
| deep in a ShareDB project right now) so I just want to say
| thank you for all of your contributions! I especially enjoyed
| this post.
| pookeh wrote:
| Wait, it doesn't look like you used the performance branch of
| automerge (which is now merged into master). It is
| significantly faster.
|
| https://github.com/automerge/automerge/pull/253
| josephg wrote:
| I did use the performance branch. And I had a chat with a few
| people in the automerge community about the performance
| numbers I was seeing long before I published to see if I was
| doing anything wrong. I tested a few different versions of
| automerge, but in this test there wasn't much performance
| difference between 0.12 and the 1.0.x-preview versions (which
| are built from the merged performance branch); I also tried
| the unreleased automerge-rs. When I ran my tests, timing
| numbers for automerge ranged from about 5m with the old non-
| performance branch down to about 4m20s or so with automerge-
| rs. Still far from Yjs's 0.9 seconds.
|
| I just checked and it looks like automerge 1.0.1-preview-4
| has landed. I wrote the post benchmarking preview-2. I've
| been knee deep in diamond types lately and haven't been
| watching. Fingers crossed there's some more performance
| improvements in the pipeline. I'd love to do a follow up in 6
| months showing much improved performance.
| lewisjoe wrote:
| Thank you for writing this piece Joseph.
|
| Just want to check whether something's a possible typo or
| I'm getting it all wrong :)
|
| Quote: "But how do we figure out which character goes first? We
| could just sort using their agent IDs or something. But argh,
| if we do that the document could end up as "abcX", even though
| Mike inserted X before the b. That would be really confusing."
|
| Since the conflict is only between the children of (seph, 0),
| the only possibilities are either ending up with "aXbc" or
| "abXc", right? Or is there a legitimate possibility of ending
| up with "abcX"?
|
| I'm assuming we'll apply a common sorting logic only to
| clashing siblings.
| josephg wrote:
| Good question. That part of the article could probably use
| another diagram to explain it.
|
| The resulting document is generated by doing a depth-first
| prefix traversal of the tree. The ambiguity comes because "b"
| and "X" are both direct children of "a". So its not clear how
| they should be ordered relative to each other. Because "c" is
| a child of "b" in this example, the "X" can't appear between
| the "c" and "b". The only valid orderings are, as I said,
| "aXbc" or "abcX". But without knowing how "b" and "X" should
| be ordered, its ambiguous which one to use.
|
| Let me know if thats still confusing! This stuff is hard to
| explain without a whiteboard.
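| In lieu of a whiteboard, the traversal can be sketched in a few
| lines of Python (a hypothetical helper, not the article's
| actual code):

```python
# Document text = depth-first prefix traversal of the insert tree.
# Each child's entire subtree is emitted contiguously, so sibling
# order decides everything the tree itself leaves ambiguous.
def traverse(node, children):
    out = [node]
    for child in children.get(node, []):
        out += traverse(child, children)
    return out

# sibling order X-before-b:
assert ''.join(traverse('a', {'a': ['X', 'b'], 'b': ['c']})) == 'aXbc'
# sibling order b-before-X:
assert ''.join(traverse('a', {'a': ['b', 'X'], 'b': ['c']})) == 'abcX'
# "abXc" is impossible: X can never land inside b's subtree
```
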
| lewisjoe wrote:
| Clear enough Joseph. Thanks :)
| trishume wrote:
| Have you seen my Xi CRDT writeup from 2017 before?
| https://xi-editor.io/docs/crdt-details.html
|
| It's a CRDT in Rust and it uses a lot of similar ideas. Raph
| and I had a plan for how to make it fast and memory efficient
| in very similar ways to your implementation. I think the piece
| I got working during my internship hits most of the memory
| efficiency goals like using a Rope and segment list
| representation. However we put off some of the speed
| optimizations you've done, like using a range tree instead of a
| Vec of ranges. I think it also uses a different style of
| algorithm without any parents.
|
| We never finished optimizing and polishing it up, so it's
| awesome that there's now an optimized text CRDT in Rust people
| can use!
| josephg wrote:
| Oooohhhh no I haven't read that - thanks for the link! I feel
| embarrassed to say this but I knew about Xi editor years ago
| but I totally forgot to go read & learn about your crdt
| implementation when I was learning about Yjs and automerge
| and others. I'll have a read.
|
| And thanks for writing such an in depth article. It's really
| valuable going forward. Maybe it's addressed in your write up
| but are there any plans for that code, or has everyone moved
| on? I'd love to have a zoom chat about it and hear about your
| experiences at some point if you'd be willing.
| neolog wrote:
| Out of curiosity, what do you use to make those diagrams?
| trishume wrote:
| https://www.figma.com/ and putting a lot of effort into
| them
| ta988 wrote:
| This was a great read, thank you. I wish there were more
| explanations of the "black magic" part of Yjs. I'll have to dig
| into that.
| josephg wrote:
| If you're interested in learning more about Yjs's internals,
| I interviewed Kevin for 3 hours on zoom and got him to take
| me through Yjs's code. He's a great guy.
|
| The video of that conversation is here:
| https://youtu.be/0l5XgnQ6rB4
| thechao wrote:
| I love high-level systems languages like C/C++ and Rust... but
| everything you said about JavaScript being slow is the same
| thing assembly programmers experience when optimizing high-
| level systems languages.
|
| In general, when I see C code and I'm asked to speed it up, I
| always use "100x" as my target baseline.
| josephg wrote:
| Whoa, that's a really impressive baseline to reach for when
| optimizing C code! I'd love to hear some war stories.
|
| As you can probably tell from my article, most of my skill at
| this stuff is from hard won tricks I've picked up over the
| years - like reducing heap allocations and packing memory for
| cache coherency. There are probably lots of things I just
| haven't learned because I haven't discovered them on my own.
|
| Do you have a blog, or any recommendations for stuff to read
| by you or others?
| thechao wrote:
| I learned low level programming while at Intel, working
| with one of their performance teams; so, nothing written.
| In general, though, assembly is more _expressive_ than
| higher level languages; it lets you "tighten up" the code
| the computer executes, reducing work & bandwidth.
|
| Specifically, a guy named Mike Abrash taught me some of
| this stuff, and he's got some books explaining the theory
| in terms of practice.
| Majromax wrote:
| When you write:
|
| > Yjs does one more thing to improve performance. Humans
| usually type in runs of characters. So when we type "hello" in
| a document, instead of storing ['h','e','l','l','o'], Yjs just
| stores: ['hello']. [...] This is the same information, just
| stored more compactly.
|
| Isn't this no longer the same information when faced with
| multiple editors? In the first implementation, if I pause to
| think after typing 'hel', another editor might be able to
| interject with 'd' to finish the word in another way.
|
| In my view, these data structures are only "the same
| information" if you provide for a reasonably-sized, fixed
| quantum of synchronization. The merging makes sense if e.g. you
| batch changes every one or two seconds. It makes less sense if
| you would otherwise stream changes to the coordinating agent as
| they happen, even with latency.
| goldenkey wrote:
| The chunking should depend entirely on the network
| synchronization. If synchronization is editing-debounced,
| then chunking could be applied rather safely.
| josephg wrote:
| I didn't explain this well but the transformation is
| lossless. No data is lost from compressing like this. It has
| no impact on the concurrency protocol or network protocol; it
| just impacts how the data is stored locally.
|
| If we need to, we could split the run back out again into
| individual characters without losing information. And that
| does happen - we do that if something later gets inserted
| into the middle of the run of characters. Pausing doesn't
| help. Even the same user could later type something in the
| middle of a word they'd typed previously.
|
| This limits what we can run-length encode. The code in
| diamond only run-length encodes items when they are:
|
| - Typed sequentially in time
|
| - And sequentially in insert location (position 10, 11, 12,
| 13, ...)
|
| - And either all of the characters have been deleted or none
| of them have been deleted
|
| Notice the other fields in that example in the post. Each
| subsequent character has:
|
| - An id field = the previous id + 1
|
| - A parent field = the id of the previous item
|
| - The same value for _isDeleted_
|
| If any of this stuff didn't match, the run would be split up.
|
| Instead of storing those fields individually, we can store
| the id and parent of the first item, an isDeleted flag and a
| length field. That's all the information we actually need to
| store. With yjs style entries (which is what diamond-types
| actually implements), the code is here if you can make heads
| or tails of it. The poorly named "truncate" method splits an
| item in two. Spans are only extended if can_append() returns
| true: https://github.com/josephg/diamond-
| types/blob/c4d24499b70a23...
|
| With real data from Martin's editing trace, 182 000 inserted
| characters get compacted down to 12 500 list items. Which is
| still pretty good - that's about 14x fewer entries. And it means
| performance doesn't stutter when large paste events happen.
| jg23 wrote:
| I have a related question to this: if you're storing
| ["hello"] as one chunk, what happens when you perform an
| edit, say adding an extra ["e"] after the ["e"]? In the
| unoptimised structure I know you can just add the new ["e"]
| as a child of the original ["e"]. So here would you then
| delete the chunk ["hello"] and split it into two halves
| like ["he"] and ["llo"]?
| josephg wrote:
| Yes exactly. You replace the chunk with "he" and "llo"
| then insert the extra item in the middle. The result is
| ["he", "e", "llo"]. The code to do all this inline in a
| b-tree is pretty hairy but it works pretty well!
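| Outside the b-tree, the span logic itself is small. A rough
| Python sketch of appending and splitting runs (the names
| `can_append` and `split` echo the discussion above, but the
| struct layout is an assumption, not diamond-types' real
| representation):

```python
# A run-length encoded span: characters typed sequentially in time
# and position, all sharing the same deleted flag.
class Span:
    def __init__(self, start_id, parent_id, text, deleted=False):
        self.start_id, self.parent_id = start_id, parent_id
        self.text, self.deleted = text, deleted

    def can_append(self, other):
        # other directly follows us: consecutive IDs, parented on our
        # last character, and the same deleted state
        return (other.start_id == self.start_id + len(self.text)
                and other.parent_id == self.start_id + len(self.text) - 1
                and other.deleted == self.deleted)

    def split(self, offset):
        # split into two spans so an insert can land between them
        left = Span(self.start_id, self.parent_id,
                    self.text[:offset], self.deleted)
        right = Span(self.start_id + offset, self.start_id + offset - 1,
                     self.text[offset:], self.deleted)
        return left, right

h = Span(1, 0, 'hello')                    # 'hello' typed as IDs 1..5
left, right = h.split(2)                   # make room after the 'e'
assert (left.text, right.text) == ('he', 'llo')
assert right.start_id == 3                 # each 'l' keeps its ID
assert h.can_append(Span(6, 5, ' world'))  # a following run can merge
```
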
| Majromax wrote:
| Ah, I see. I had thought that the consolidation gave a
| batched update with a single ID, so 'h' + 'ell' + 'o' would
| have IDs of 1, 2, and 3 respectively. That would have made
| an editing conflict in the middle of 'ell' impossible.
| josephg wrote:
| Ah, that makes sense! Concurrent changes are one thing,
| but they're rare. The big problem if we
| did it that way is that it would become impossible to
| insert in the middle of "ell", because we wouldn't be
| able to name any of those internal positions.
|
| To get around that we assign a sequential ID for every
| inserted character, regardless of how they're typed or
| stored. Typing "hello" would assign IDs 1-5 even if you
| paste "hello" from the clipboard.
| kohlerm wrote:
| Very nice. When I read "doubly linked list" I immediately thought
| "what about a b-tree-like structure?" I guess Martin's idea to
| replace the IDs comes from the "vector clock" idea for concurrent
| updates.
| audidude wrote:
| If anyone is looking for the combination of a piece table and
| B+tree (which appears to be what is talked about in this
| article), I have one that I've been using for years across
| various GTK components.
|
| https://gitlab.gnome.org/chergert/textrange/
| gritzko wrote:
| About a decade ago, I implemented the Causal Tree CRDT (aka RGA,
| Timestamped Insertion Tree) _in regular expressions_ using a
| Unicode string as a storage. Later we made a collaborative editor
| for Yandex based on that code. It used many of the tricks
| described in the text, even the optimization where you remember
| the last insertion point. So I am terribly glad it all gets
| rediscovered.
|
| The code is on GitHub [1]. There is also an article, which might
| be tough reading, so no link. Recently I greatly improved CTs by
| introducing the Chronofold data structure [2]. Regarding that
| benchmark article, I spent many years in academia, so the quality
| of content problem is familiar to me. In other words, I don't
| take such articles seriously. CRDTs are fast enough, that is not
| a concern.
|
| [1]: https://github.com/gritzko/citrea-model
|
| [2]: https://arxiv.org/abs/2002.09511
| omgtehlion wrote:
| That is nice!
|
| A couple of questions: Have you released a CT implementation
| on top of Chronofold? Do you have any plans to benchmark it
| against other algorithms?
| gritzko wrote:
| Thanks. No, I didn't release it, although there are
| implementations on GitHub. Rust and Kotlin, at least. The RON
| proof-of-concept wiki runs on the (unreleased) C++
| implementation [1]. I benchmarked it at some point,
| everything was like "k nanoseconds" [2].
|
| [1]: http://doc.replicated.cc/%5EWiki/ron.sm
|
| [2]: https://github.com/dmonad/crdt-
| benchmarks/issues/3#issue-599...
| omgtehlion wrote:
| Thanks!
|
| I could only find one Rust implementation:
| https://github.com/dkellner/chronofold which seems to
| produce invalid results on some inputs. Actually the hard
| part (integration) is made of hacks there...
|
| That PoC wiki sounds really interesting, as does the whole
| replicated.cc project! Any plans on releasing it?
| gritzko wrote:
| Here is Kotlin https://github.com/decentralized-
| hse/collab-edit
|
| libron will be released, yes.
| _hl_ wrote:
| I remember seeing that (regex CTs) and immediately thinking
| "wtf, why would anyone want to do that". Took me quite a while
| to understand that it's actually a pretty clever way to write
| fast state machines in browserland. So thank you for this work!
| ta988 wrote:
| CRDT: https://en.m.wikipedia.org/wiki/Conflict-
| free_replicated_dat...
| teleforce wrote:
| This previous HN discussion of an OT vs CRDT paper is an
| excellent overview of the topic [1],[2].
|
| From the paper conclusions "Our discoveries from this work
| revealed facts and evidences that refute CRDT superiority claims
| over OT on all accounts, which helps to explain the underlying
| reasons behind the choices between OT and CRDT in the real
| world."
|
| The fact that the paper, in its footnotes, refutes one of the
| points made in the HN discussion [1] (i.e. it references the
| discussion itself) regarding the claimed CRDT superiority is
| rather amusing, and the first such thing I have personally seen
| in a published paper.
|
| [1]https://news.ycombinator.com/item?id=18191867
|
| [2]https://arxiv.org/abs/1810.02137
| tekkk wrote:
| Excellent article! As someone who has to work with collaborative
| editing I must say the complexity of the whole area is at times
| daunting to say the least. So many edge-cases. So many mines to
| step on.
|
| Now I think I am convinced that the OT vs CRDT performance
| comparison is kind of a moot point, and the question is more
| about the user experience: which version produces nicer
| results when two very diverged documents are merged. Maybe
| one of these days I'll read an article about that too.
|
| To get off on a tangent a little bit, I'd be interested to know
| how one could add in tracking of changes to Diamond or other
| CRDT? Can you add an arbitrary payload to the operation and then
| just materialize the areas different from the original snapshot?
| I know Yjs can do this by creating snapshots and then comparing
| them to another snapshot but it seemed a bit awkward and not
| suited for real-time editing.
| johnarno wrote:
| If it weren't for academic papers, we probably wouldn't have the
| beautiful and nice concept of a CRDT in the first place. We might
| have 100 different solutions promising us 100 different
| replication schemes, some with and some without guarantees,
| hiding behind a gazillion different undefined terms that want to
| make us believe each solution is better than sliced bread. Also,
| I don't think Google Wave failed due to its OT algorithm.
| uyt wrote:
| I think this data structure is usually called a Counted B-tree
| https://www.chiark.greenend.org.uk/~sgtatham/algorithms/cbtr...
| rather than a range tree.
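| The counted-tree idea is easy to sketch in Python (shown here
| on a plain binary tree for brevity; a counted B-tree stores a
| subtree size per child pointer):

```python
# Each node caches the size of its subtree, so finding the item at
# a given index is O(log n) instead of an O(n) scan.
class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right
        self.size = 1 + size(left) + size(right)

def size(n):
    return n.size if n else 0

def nth(node, i):
    # the index of node.value within its subtree equals the size of
    # its left subtree
    left = size(node.left)
    if i < left:
        return nth(node.left, i)
    if i == left:
        return node.value
    return nth(node.right, i - left - 1)

# balanced tree over 'abcdefg'
tree = Node('d',
            Node('b', Node('a'), Node('c')),
            Node('f', Node('e'), Node('g')))
assert [nth(tree, i) for i in range(7)] == list('abcdefg')
```

| The same size-per-subtree trick is what lets a range tree map
| a document position to an item without walking every entry.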
___________________________________________________________________
(page generated 2021-07-31 23:00 UTC)