[HN Gopher] Faster CRDTs: An Adventure in Optimization
___________________________________________________________________
Faster CRDTs: An Adventure in Optimization
Author : xnx
Score : 469 points
Date : 2021-07-31 11:37 UTC (11 hours ago)
(HTM) web link (josephg.com)
(TXT) w3m dump (josephg.com)
| hinkley wrote:
| I'm getting mixed messages on CRDTs. Are we at the point now
| where they are general enough that the human observer is not
| constantly confronted with 'surprises' from the behavior of the
| system?
|
| Some of the talks by Kleppmann go straight into the weeds and
| make it hard to tell if he's just nerding out about finer points
| or lamenting unsolved problems, or even paradoxes.
| JW_00000 wrote:
| By the way, as someone who has published academic papers, if
| you're ever bothered about a paper or have some comments, don't
| hesitate to mail the authors. (Their e-mail addresses are always
| on the paper; especially target the first author because they
| have normally done the work.) We are happy to hear when someone
| has read our work and I at least would've liked to have known if
| someone found a problem with my papers.
| zladuric wrote:
| > especially target the first author because they have normally
| done the work
|
| As someone living with a recently promoted (is that the
| correct term?) PhD in social sciences, this surprises me. Is
| that something specific to my country or to social sciences,
| or did my wife simply land in a case full of rotten apples?
| galgalesh wrote:
| In my uni in Belgium, this is the case too. First author does
| most of the work.
| shrimpx wrote:
| In computer science the first author does the work and is
| usually a PhD student. The last author is usually the
| professor that pushed and helped develop the idea, provided
| funding, and probably wrote or was heavily involved in
| writing the paper's abstract, intro and conclusion sections
| -- the bulk of "framing" the work.
|
| But there are exceptions. Some profs are less student-
| oriented or don't like delegating so much, and remain
| "individual contributors" deep into their careers. Those tend
| to publish a nonzero number of first- and single-author
| papers.
|
| Edit: I've noticed that in Theory and Algorithms, profs tend
| to take first author even though the student slaved over the
| proofs. That field is kind of an outlier in that it's close
| to pure math, and I think borrows cultural artifacts from
| math research.
| [deleted]
| exmadscientist wrote:
| At least in the fields I've published in, first author does
| the work (or at least most of it, almost always including
| writing the paper) and last author secured the funding.
| Sometimes authors are listed in alphabetical order, in which
| case one or two are usually listed as "corresponding author".
| The less-senior corresponding author usually did the work,
| and the more-senior corresponding author is usually either
| the one who secured the funding, or a long-term member of the
| research group who is likely to actually be _around_ to
| answer correspondence in the future (and who also probably
| helped out enough to be worth corresponding with).
| anonymousDan wrote:
| I think it's pretty field specific. In a lot of CS for
| example the first author did most of the work and the last
| author got the funding. The people in the middle may have
| done varying degrees of work ranging from helping
| substantially with the implementation, evaluation or writing
| of the paper to once having read the paper abstract :)
| robbles wrote:
| I think the term you're looking for might be "postdoc"
| (postdoctoral researcher)? I've seen "recent PhD" a few times
| as well, seems pretty clear.
| lewisjoe wrote:
| I've been looking for a practical OT alternative for our online
| word processor (https://zoho.com/writer). We already use OT for
| syncing our realtime edits, and we're exploring CRDTs targeting
| stronger consistency for tackling offline edits (which are
| typically huge & defragmented, since the edits are not synced in
| realtime).
|
| So the baseline is that OT has a better model for holding state
| in terms of performance/memory, since the edits can be compiled
| into plain string types. CRDTs in comparison force us to hold
| deleted states as well and demand richer information per unit
| (character/string/etc) - which makes it harder on the CPU/RAM.
|
| Here's the story as I understand it:
|
| 1. Automerge tackles this by just moving to a faster lower-level
| runtime: Rust.
|
| 2. Yjs handles this with a similar technique, i.e. relying on
| V8's hidden classes to handle the performance optimizations and
| assuming real-world cases to narrow down and optimize its data
| structures.
|
| But none of these seem to be a fundamental breakthrough in the
| efficiency of the algorithm itself. At best they all look like
| workarounds, and this keeps bothering me.
| kevinjahns wrote:
| I know that it is hard to comprehend why modern CRDT
| implementations are fast. But the data confirms that they work
| great. OT seems to be much simpler, but there are real
| advantages in using CRDTs. The performance problems have been
| solved through an efficient representation of the CRDT model.
|
| The gist of the read below [1] is that it is impossible for a
| human to create a document that Yjs can't handle (even in the
| worst-case scenario). But yes, it handles real-world scenarios
| particularly well.
|
| The concept of "hidden classes" is super old. It was first
| implemented in a fork of Smalltalk and then became a
| foundational concept of runtime engines for scripting
| languages. It is implemented in V8, Python, Ruby, SpiderMonkey,
| etc.
|
| Yjs does not assume a "real-world scenario" and it is not
| optimized for any specific runtime engine. It runs fast in any
| browser. The benchmarks confirm this. [2]
|
| Yjs is being used in practice by several companies (eg Nimbus
| Notes with >3 million users) for quite some time now. I'm not
| aware of any performance problems.
|
| [1]: https://blog.kevinjahns.de/are-crdts-suitable-for-shared-
| edi... [2]: https://github.com/dmonad/crdt-benchmarks
| pfraze wrote:
| You can remove tombstones in a cleanup pass if you constrain
| behavior a bit.
|
| For instance, if there's just a single server, after 5 minutes
| of no active connections you could clean out tombstones. After
| that, if a client connects with some changes they had been
| holding onto but got DCed, you can reject the write and let the
| user's client merge by some other means (perhaps even manual).
| josephg wrote:
| If you've got big offline edits (or you're merging multiple
| large sets of edits), even existing CRDTs will generally handle
| that more efficiently than OT will. OT algorithms are usually
| O(n * m) time complexity when merging _n_ edits from one peer
| with _m_ edits from another peer. A CRDT like diamond-types is
| O((n + m) * log(s)), where s is the current size of the
| document. In practice it's super fast.
|
| As for holding deleted states and richer information per unit,
| it's not so bad in absolute terms. 1-2 MB of data in memory for
| a 17 page document is honestly fine. But there's also a few
| different techniques that exist to solve this in CRDTs:
|
| 1. Yjs supports "garbage collection" APIs. Essentially you say
| "anything deleted earlier than this point is irrelevant now"
| and the data structures will flatten all runs of items which
| were deleted earlier. So storage stays proportional to the size
| of the not-deleted content.
|
| 2. Sync9 has an algorithm called "antimatter" which mike still
| hasn't written up _poke poke mike!_. Antimatter actively tracks
| the set of all peers which are on the network. When a version
| is known to have been witnessed by all peers, all extra
| information is safely discarded. You can also set it up to
| assume any peer which has been offline for however long is gone
| forever.
|
| 3. Personally I want a library to have an API method for taking
| all the old data and just saving it to disk somewhere. The idea
| would be to reintroduce the same devops simplicity of OT where
| you can just archive old history when you know it probably
| won't ever be referenced again. Keep the last week or two hot,
| and delete or archive history at will. If you combined this
| with a "rename" operation, you could reduce the "hot" dataset
| to basically nothing. This would also make the implementation
| much simpler - because we wouldn't need all these performance
| tricks to make a CRDT like diamond-types fast if the dataset
| stayed tiny anyway.
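The garbage-collection idea in point 1 can be sketched as follows. This is NOT Yjs's actual API or data layout, just a toy illustration of the principle: runs of items deleted before a cutoff version collapse into a single stub, so storage stays proportional to the live content plus recent history.

```python
# Toy sketch (not Yjs's real API) of tombstone flattening: adjacent
# items deleted at or before `cutoff` merge into one placeholder.

def flatten_tombstones(items, cutoff):
    """items: list of dicts {'len': int, 'deleted_at': version or None}.
    Returns a new list where old deleted runs are collapsed."""
    out = []
    for item in items:
        dead = item["deleted_at"] is not None and item["deleted_at"] <= cutoff
        if dead and out and out[-1].get("gc"):
            out[-1]["len"] += item["len"]  # extend the collapsed run
        elif dead:
            out.append({"len": item["len"], "gc": True})
        else:
            out.append(dict(item))
    return out
```

After flattening, only the total length of each dead run survives, which is all a later merge needs for positional bookkeeping.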
| vanderZwan wrote:
| On a meta-level, does anyone else think that the whole idea of
| writing a peer reviewed paper that is just a benchmark of
| different algorithms should be really rigorously reviewed before
| being accepted? Writing good benchmarks is hard, and so highly
| contextual that writing fair comparisons between algorithms (or
| data structures) is almost impossible unless you're an expert in
| all of the algorithms involved.
| thysultan wrote:
| Only the holy experts can divulge the realm of gods that is
| benchmarking and reveal to us the results. How do we beseech
| them, o wise one?
| Diggsey wrote:
| Yeah, I've also seen several academic papers on performance or
| "optimization" of existing algorithms which just demonstrate a
| complete lack of knowledge about how those algorithms are
| implemented in practice.
|
| For example, there was a paper explaining how you could
| optimize the GJK algorithm by reducing the number of distance
| checks required, and in turn the number of square-roots...
| Despite the fact that everyone (including the authors of the
| original GJK algorithm) knows that you don't _actually_ need to
| do a square-root to compare distances...
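The square-root trick referred to here is just comparing squared distances; since sqrt is monotonic on non-negative numbers, the ordering is preserved. A minimal illustration (not from the paper in question):

```python
# Compare distances without a square root: dist(p, q) < r holds
# exactly when dist(p, q)^2 < r^2, so the sqrt can be skipped.

def closer_than(p, q, radius):
    """True iff the 2D points p and q are closer than `radius`."""
    dx = p[0] - q[0]
    dy = p[1] - q[1]
    return dx * dx + dy * dy < radius * radius
```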
| meepmorp wrote:
| > Despite the fact that everyone (including the authors of
| the original GJK algorithm) knows that you don't actually
| need to do a square-root to compare distances..
|
| Academia's purpose is to produce research, typically measured
| in publications per unit time. Optimizing one paper leads to
| a global reduction in the size of the literature by pruning
| opportunities for subsequent research, harming the overall
| performance of the system.
| lostdog wrote:
| Benchmarking papers are inaccurate when the original algorithms
| are not open sourced, and the grad student needs to rewrite the
| algorithm from scratch. They can easily create different
| implementation details, and wind up with an algorithm that's
| slower than the original.
|
| I do think that the original algorithm authors should have the
| opportunity to correct the benchmarking code, or to release
| their original implementation as open source to be benchmarked.
|
| In some sense, the benchmarking paper with a slower
| implementation is more "correct," since an engineer evaluating
| which algorithm to use is just as likely to implement the
| algorithm in a slower way than the original paper. The
| incentives are right too: the original paper author should be
| giving enough details to recreate their work, and the
| benchmarker is showing that their published algorithm really is
| slow.
| knuthsat wrote:
| Problem is that academics are rarely experts at programming or
| have knowledge of computer architectures as much as someone in
| the industry. There are various tricks that are never taught at
| college, therefore academics have no idea some stuff even
| exists.
|
| Best example is discrete optimization research (traveling
| salesman, vehicle routing and its variants, schedule rostering
| etc.). Stuff you find in the papers there achieves state-of-
| the-art results very slowly (using integer linear programming
| or some rarely optimized heuristics) making you believe these
| instances of a general NP-hard problem can't be solved quickly.
|
| When you start tinkering, you either find that data structures
| can be added that reduce the complexity significantly or that
| there are regularities in instances that, when exploited,
| support massive speedups.
|
| I would say that TSP research is an exception, but most of the
| stuff coming out that has a lot of citations is way too slow and
| is never as brilliantly implemented as the Lin-Kernighan
| heuristic or other stuff from the age of insanely slow
| computers.
| makmanalp wrote:
| > Problem is that academics are rarely experts at programming
| or have knowledge of computer architectures as much as
| someone in the industry. There are various tricks that are
| never taught at college, therefore academics have no idea
| some stuff even exists.
|
| I want to push back on this generalization a bit. The
| academics that are focused on pushing the mathematical
| boundaries of discrete optimization are focused, no surprise,
| on only that. If theoretical improvements are not what you
| want, don't read those papers; read ones by people more
| focused on applications. Read literally any databases paper,
| or stuff like hyperscan, simdjson. I'd argue that a non-
| trivial amount of these are vastly /ahead/ of what's standard
| in industry, but industry is slow to adopt new ideas and
| catch up because of legacy software and existing codebases.
| It's very similar in the programming languages sphere: it
| took ages for industry to adopt ideas that were popular in
| academia for a long time (eg: Rust, LINQ). The idea that
| academia (at least in CS) is an ivory tower, far from the
| concerns of industry, is not very true as of late. There's
| a lot of cross-pollination and a revolving door of people
| going back and forth from one to the other spreading ideas.
| knuthsat wrote:
| I must say that when it comes to discrete optimization, the
| genetic/ant/simulated annealing/etc. stuff is more popular
| in academia than in industry (at least the industry that
| doesn't heavily include academics).
|
| Works like Lin-Kernighan heuristic are extremely rare and a
| bunch of knowledge exists in industry only. Even the
| mentioned heuristic was for decades being implemented
| incorrectly until one individual came and demonstrated its
| superiority (K. Helsgaun).
|
| I mean, most of the code that gets released today doesn't
| even handle the time-windows constraint in the best way
| possible (optimal updates, constant-time feasibility
| evaluations, etc.). I believe no open source vehicle
| routing libraries have any data-structure-relevant
| optimizations for handling time windows, task precedence,
| dynamic workshift starts, etc. All are fairly simple
| implementation tricks that just got lost under giant
| amounts of population-based heuristics or mathematical
| linear programming approaches.
| hodgesrm wrote:
| 100x +1. The whole field of query optimization has been
| devoted to improvement of efficiency and speed of database
| systems for decades. [1] Academic results are studied quite
| closely in industry. Also, "academic" includes people like
| Mike Stonebraker and Andy Pavlo. They aren't exactly
| slouches regarding issues of performance.
|
| More generally, major waves of performance innovation in
| the IT field have been driven by advances in storage,
| compression, vectorization, virtualization, etc. Academic
| results have led to entirely new products like Vertica
| (from C-Store [2]) or VMware (from the Disco system [3]).
|
| [1] https://en.wikipedia.org/wiki/Query_optimization
|
| [2] https://w6113.github.io/files/papers/cstore-vldb05.pdf
|
| [3] http://www.cs.cmu.edu/~15712/papers/bugnion97.pdf
|
| edit: clarity
| kaba0 wrote:
| You do realize there is a whole area of the research field
| dedicated to heuristic algorithms? They have a proper
| academic basis just as much as the "correct" solutions.
| pyrale wrote:
| From what I saw (I worked in an academia-consulting
| partnership to solve scheduling problems in transports), the
| OR researchers very frequently work in association with OR
| consulting shops, and they're well-aware of the difference
| between academic datasets and industrial datasets. In the
| conferences I saw, it was not infrequent to see different
| tracks for industrial and academic papers, both attended by
| the same crowd.
|
| The point I agree with, though, is that this is not reflected
| in the papers. Academic papers focus on academic instances
| because they are more general (and usually harder, as you
| said) and because optimizations of specific instances of the
| problem are not that useful from an academic pov.
|
| It's hard to know who works with who and who has experience
| with what if you're not an insider, though.
| raverbashing wrote:
| > Stuff you find in the papers there achieves state-of-the-
| art results very slowly (using integer linear programming or
| some rarely optimized heuristics) making you believe these
| instances of a general NP-hard problem can't be solved
| quickly.
|
| Yes, or looking for things like mathematical purity, linear
| problem statement, etc
|
| In practice: you don't need the best solution and you can get
| a great solution in a multitude of ways.
| comicjk wrote:
| On the one hand, peer review takes long enough already. On the
| other... I saw an influential paper that published obviously-
| wrong timing data, essentially saying that time(A) + time(B) <
| time(B). It seems they were including the Python interpreter
| startup time (~0.1s) in a profile of a ~0.01s algorithm.
| z3t4 wrote:
| What I like about "tests" in software development is that anyone
| can run them, just download the source code, then run ./test or
| right click and "run tests". It would be cool if computer science
| could offer the same experience, just download the source code
| and run it, compare if you got the same result, inspect and learn
| from the source code, etc. Instead of "here's some pseudo-code
| we've never tried", and here's a mathematical formula that you
| need to be a mathematics professor to understand... Yes we know
| you are not a professional software developer, the code is going
| to be at a beginners level, but that is fine, I am not reading
| your paper to criticize your code for being "impure", or not
| using the latest syntax and frameworks, I'm reading it to
| understand how to implement something, to solve a problem.
| paulgb wrote:
| This is great! I'd like to quote a line here, because I think the
| answer is "someone on HN knows" and I'd like to hear the answer
| as well.
|
| > V8 is actually suspiciously fast at this part, so maybe v8
| isn't using an array internally to implement Arrays? Who knows!
| bouk wrote:
| The V8 blog is a good starting point for learning how it works
| under the hood: https://v8.dev/blog/fast-properties
| vlovich123 wrote:
| Hmm... I would have thought they implemented something
| similar to a std::deque so that you have amortized O(1)
| insertions into the middle of a vector.
| mfbx9da4 wrote:
| Reminds me of the data structures in this markdown library
| https://github.com/markdown-it/markdown-it/blob/master/docs/...
|
| The author hand-waves away the possibility that there could be
| memory-locality performance benefits from using arrays in JS,
| but my hunch is that there is something to it. I know the React
| codebase, for example, went for a monomorphic object structure
| to represent components to leverage inline caching of hidden
| classes.
| dpiepgrass wrote:
| > my range tree is just a slightly modified b-tree. But usually
| when people talk about b-trees they mean a BTreeMap. That's not
| what I'm doing here. Instead of storing keys, each internal node
| of the b-tree stores the total number of characters (recursively)
| in that item's children. So we can look up any item in the
| document by character position, or insert or delete anywhere in
| the document in log(n) time.
|
| Cool! This is essentially the same idea I implemented in 2012; I
| call it the AList or A-List data structure:
| http://core.loyc.net/collections/alists-part1
|
| Ever since I made it, I've been looking for a good application
| for it. I guess this is it! I mean, I knew A-List was a good data
| structure for text editing, but most applications can use a Gap
| Buffer which is much simpler. But when it comes to concurrent
| editing, you've got multiple editing points so a Gap Buffer is
| suddenly much less attractive.
|
| > Honestly I'm shocked and a little suspicious of how little ram
| Yjs uses in this test.
|
| It's good, but still it uses ~30x as much RAM as plain string
| edits. Not surprisingly, you got 3x better memory usage by using
| A-List and a more efficient language (Rust in this case, but C#
| and C/C++ can also do well.)
|
| There is a great article about a CRDT concept called "Causal
| Trees" [1]. I wonder how it compares to flat-list-based CRDTs
| (it's been too long since I researched this).
|
| By the way, Microsoft has a new set of libraries for concurrent
| editing called Fluid Framework[2]. I'm told it's a "generalized
| data structure" that was inspired by Causal Trees but with a
| unique and intention-preserving editing scheme. I found out about
| it after they decided to use my fast-cloning-copy-on-write B+Tree
| for TypeScript[3]... they sent me a pull request for diffing
| versions of B+ trees, but I haven't yet looked into the
| architecture of their concurrent data type.
|
| [1] http://archagon.net/blog/2018/03/24/data-laced-with-history/
|
| [2] https://fluidframework.com/
|
| [3] https://www.npmjs.com/package/sorted-btree
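The count-annotated tree in the quoted passage can be sketched like this. A plain binary node stands in for the real b-tree (and for the A-List), but the idea is the same: each internal node stores its subtree's total character count, so lookup by character position takes O(log n).

```python
# Sketch of an order-statistic text tree: internal nodes cache the
# character count of their subtree; leaves hold runs of text.

class Node:
    def __init__(self, left=None, right=None, text=None):
        self.left, self.right, self.text = left, right, text
        self.count = len(text) if text is not None else left.count + right.count

    def char_at(self, pos):
        if self.text is not None:          # leaf: index into the run
            return self.text[pos]
        if pos < self.left.count:          # descend using subtree counts
            return self.left.char_at(pos)
        return self.right.char_at(pos - self.left.count)

# "hello world" split across three leaves:
doc = Node(Node(text="hell"), Node(Node(text="o wo"), Node(text="rld")))
```

Inserts and deletes work the same way: walk down by count, edit the leaf, and update the counts on the path back up.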
| stephc_int13 wrote:
| Trees are a powerful and practical data structure, but even if
| it does not appear clearly in big-O style complexity analysis,
| they are usually slow.
|
| Unfortunately, the difference between slow and fast can be
| several orders of magnitude, while the perception of the
| programmer doing a back-of-the-envelope analysis seems to be a
| logarithmic scaling of reality...
| josephg wrote:
| Trees seemed to work pretty well for me here!
|
| The problem with trees is usually that people don't pack enough
| data into each node in the tree. I implemented a skip list in C
| a few years ago[1]. For a lark I benchmarked its performance
| against the C++ SGI rope class which was shipped in the C++
| header directory somewhere. My skip list was 20x faster - which
| was suspicious. I looked into what the SGI rope does and it
| turns out it was only putting one character into each leaf node
| in the tree it constructed. Benchmarking showed the optimal
| number was ~120 or so characters per leaf. Memcpy is much much
| faster in practice than main memory lookups.
|
| In diamond-types (benchmarked here), the internal nodes in my
| B-tree store 16 pointers and leaf nodes store 32 entries. With
| run-length encoding, all 180,000 inserted characters in this
| data set end up in a tree with just 88 internal nodes and a
| depth of 3. It goes fast like this. But if you think an array
| based solution would work better, I'd love to see it! It would
| certainly need a lot less code.
|
| [1] https://github.com/josephg/librope
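The run-length encoding mentioned here can be sketched as follows. This is not diamond-types' real data layout, just the core idea: consecutive characters typed by one peer extend an existing run instead of adding a new entry, so a long typing session collapses into a handful of spans.

```python
# Toy run-length-encoded insert log. Each run is [agent, start_pos,
# text]; an insert that directly continues the last run extends it.

def insert(runs, agent, pos, ch):
    """Record a single-character insert, merging sequential typing."""
    if runs:
        a, start, text = runs[-1]
        if a == agent and pos == start + len(text):
            runs[-1][2] += ch              # append to the open run
            return
    runs.append([agent, pos, ch])

runs = []
for i, ch in enumerate("hello"):
    insert(runs, "A", i, ch)
# all five inserts collapse into the single run ["A", 0, "hello"]
```

With this kind of merging, the 180,000 inserted characters in the benchmark can occupy far fewer entries than characters, which is why the tree stays so shallow.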
| feikname wrote:
| Correct me if I'm mistaken:
|
| The difference between diamond native and diamond WASM
| demonstrates how, even with WASM, native implementations beat
| browsers _hard_, and native implementations are still very
| worthwhile performance-wise, especially for lower-powered
| devices and, perhaps, for reducing battery usage (as a
| consequence of less CPU use) on mobile devices.
| Jweb_Guru wrote:
| The wasm implementation here was still running under a
| JavaScript test harness, so I suspect it's the JS-WASM boundary
| interactions that are causing the slowdown. WASM itself (if it
| doesn't need to interact with JavaScript) usually runs with a
| much smaller performance penalty.
| josephg wrote:
| I suspect so too - given there are 280 000 calls across the
| JS-wasm boundary, and most of those calls pass a string. I'd
| love to know for sure though. I considered making this
| benchmark pass the whole data set in one go through JSON or
| something, but that felt like cheating - since thats not how
| the API would be used in practice during a real editing
| session.
|
| But even paying that cost, it still seems much faster to use
| rust + WASM than run the same algorithm in native javascript.
| And the JS-wasm boundary will probably get gradually faster
| over the next few years.
| __s wrote:
| Yes. Ultimately WASM is executing within a sandbox & involves
| being JIT compiled (read: not heavily optimized except for hot
| loops eventually). If native compilation is an option it makes
| sense to go that route
|
| WASM competes with asm.js not asm (or, arguably, jvm etc)
| Rusky wrote:
| WASM JIT implementations tend to be quite a bit different
| from JavaScript JIT, so that's not really where the perf
| difference comes from.
|
| First, WASM gets all the heavy AOT optimizations from the
| middle end of the compiler producing it. At runtime, WASM JIT
| doesn't start from program source, but from something that's
| already been through inlining, constant propagation, common
| subexpression elimination, loop optimizations, dead code
| elimination, etc. And WASM is already typed, so the JIT
| doesn't have to bother with inline caching, collecting type
| feedback, or supporting deoptimization.
|
| Because of that, the only really beneficial work left to do
| is from the back end (i.e. arch-specific) part of the
| compiler- basically, register allocation and instruction
| selection. WASM JIT compilers don't bother trying to find hot
| loops or functions before optimizing. Instead, they do a fast
| "streaming" or "baseline" codegen pass for fast startup, and
| then eagerly run a smarter tier over the whole module and
| hot-swap it in as soon as possible. (See e.g.
| https://hacks.mozilla.org/2018/01/making-webassembly-even-
| fa...)
|
| The perf difference vs native rather comes from the
| sandboxing itself- memory access is bounds checked, support
| for threads and SIMD is limited (for now), talking to the
| browser has some overhead from crossing the boundary into
| JavaScript (though this overhead will go down over time as
| WASM evolves), etc.
| galesky wrote:
| This is awesome! I'm researching delta-state-based CRDTs for a
| master's dissertation; these kinds of optimizations on op-based
| CRDTs are really interesting.
| rawoke083600 wrote:
| Damn! What a fantastic read! And of course brilliant
| coding/thinking :) Well done, sir!
| lewisjoe wrote:
| People who are interested in the topic: I just found out they
| have open meetings about the parent project, and it seems like
| anybody can join - https://braid.org/
|
| Great way to share progress. Kudos! :)
| toomim wrote:
| Yep! Our next meeting is two Mondays from now, on August 2nd,
| at 4:00pm Pacific Time. All are welcome:
| https://braid.org/meeting-16
| iampims wrote:
| This is a fantastic write-up. Worth a read even if you don't care
| about CRDTs.
| josephg wrote:
| Hello HN! Post author here. I'm happy to answer questions & fix
| typos once morning rolls around here in Australia
| gfodor wrote:
| Thanks for ShareDB. It's dope. I extended it to support
| collaborative voxel editing (https://jel.app) and it works
| great.
| teodorlu wrote:
| Have you used CRDTs to solve any practical problems?
|
| If so, how does the CRDT solution compare to a non-CRDT
| solution? If a non-CRDT solution is feasible at all?
| mirekrusin wrote:
| Article mentions at the beginning that the author used CRDT
| in Google Wave/ShareJS.
| mirekrusin wrote:
| My mistake, it says OT was used, thank you for correcting
| me.
| paulgb wrote:
| Wave used OT rather than CRDTs, as the author discusses in
| another post: https://josephg.com/blog/crdts-are-the-
| future/
| saurik wrote:
| AFAIK Wave and ShareJS both used OT (which the paper that
| this article referred to was also attempting to benchmark).
|
| FWIW, I am myself also curious about this (the question of
| comparing CRDT to non-CRDT solutions): I found OT
| beautiful, but never really felt CRDT had the same feeling
| of elegance; and so I am downright fascinated to see the
| person I have always seen as a "god of OT" deciding to
| forsake it and move to CRDTs going forward. Are they really
| that great? Is there something simple I am missing for the
| explanation for why they are so much better? (Is it maybe
| that they can support arbitrary merging without a sequencer
| to linearize the edits? Honestly, my question probably
| sucks a bit as I haven't spent any time thinking about OT
| or CRDT in at least three years--and even then only once
| since a few years before that, as I have had other things
| to spend most of my mental energy on recently--and so I am
| failing to remember the details of my use cases or the
| tradeoffs I saw or the implementation issues I felt were
| easy/hard.)
| the-smug-one wrote:
| Do you want a centralized server to control the data?
| Then just use OT. Do you want users to control the data,
| and have your server essentially just be a forever-
| present user? Then use CRDT.
|
| CRDTs certainly do have a mathematical elegance to them.
| josephg wrote:
| Yep, this is the best practical advice at the moment.
| Well, for list CRDTs. State CRDTs (like a counter) are
| small and fast, and kinda better than OT in every way.
|
| List ("operation-based") CRDTs and OT systems are
| "equivalent" in a very academic sense that nobody really
| talks about or understands. It's really not obvious unless
| you've been staring at this stuff for years, but the
| equivalence is there:
|
| You can make a CRDT out of any OT system by just shipping
| the entire history of operations to each peer. List CRDTs
| essentially do that, with a whole lot of tricks to
| compress that data set and use it without needing to
| linearly scan.
|
| And you can convert the other way too. You can add a
| "rename" operation into a list CRDT which assigns a new
| name to each element currently in the document. Before
| the rename operation document "hello" might have IDs [a4,
| b2, b3, b1, a5]. The rename operation changes the IDs to
| [c1, c2, c3, c4, c5]. When an operation happens you
| specify the version and the ID at that version of the
| predecessor (eg c2). The insert happens there. Then you
| need a method to take the ID at one version and
| "transform" it to the ID of the same item at a different
| version. Do the rename operation implicitly after _every_
| change, and voila! You now have OT semantics. "Insert
| after c1" means "Insert after position 1".
|
| OT systems have one big advantage which is that you don't
| have to ship the CRDT state to every peer. With a rename
| operation, we can add back the operational simplicity of
| OT systems into a CRDT. But the code is (and always will
| be) much more complicated. So I think OT makes sense for
| strictly server-client systems.
|
| You can also have a hybrid server, which talks CRDT to
| full peers on the network but just does OT when talking
| to browser clients and things like that. We talked about
| this at our public braid meeting at the start of the
| week. The discussion about this stuff starts about 30
| minutes in: https://braid.org/meeting-15
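The "rename" step above can be illustrated with a toy mapping (this is not a real CRDT implementation): give the live items fresh sequential IDs, so "insert after c2" can be read positionally as "insert at position 2".

```python
# Toy illustration of the rename operation: map each live item's
# current ID to a fresh sequential ID under a new prefix.

def rename(ids, prefix):
    """Return a dict mapping each old ID to prefix1, prefix2, ..."""
    return {old: f"{prefix}{i + 1}" for i, old in enumerate(ids)}

# The comment's example: document "hello" with IDs [a4, b2, b3, b1, a5]
mapping = rename(["a4", "b2", "b3", "b1", "a5"], "c")
# a4 -> c1, b2 -> c2, b3 -> c3, b1 -> c4, a5 -> c5
```

Doing this implicitly after every change is what makes the item IDs line up with document positions, recovering OT-style positional operations.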
| zozbot234 wrote:
| > And you can convert the other way too. You can add a
| "rename" operation into a list CRDT which assigns a new
| name to each element currently in the document.
|
| Operations in a CRDT must be commutative for merge/update
| to be well-defined, so it's not immediately clear how a
| "rename" operation can be expected to work properly.
| Jailbird wrote:
| Drifting off-topic but I've wondered this myself - I've been
| interested in CRDTs in a "read the news" way but not a "this
| rises to something I'm going to implement" way.
|
| Perhaps it's blindingly obvious to all here, so no one
| mentions it: thinking about more practical and real-world
| problems seems like collaboration on more complex/structured
| data. Today, it seems one would still have to transform that
| data into a large flat string underneath, and implement an
| editor that only performs edits that maintain the integrity
| of the higher-level object, while the flat string provides
| the collaboration features. Perhaps it's possible to
| implement an XML schema known to the editor so all
| insertions/deletions keep the document syntactically correct.
|
| I wonder if there's some other way to either generalize on
| top of "big string" or create another base model (somehow?)
| detaro wrote:
| CRDT is a general concept, editing text is just one
| possible application. If you have a stronger datatype,
| great, you can build operations on top of it to implement a
| CRDT system, depending on its properties.
| Jailbird wrote:
| What I've read talks about character insertion and
| concepts that apply to editing text.
|
| Perhaps I just need to find bedrock to build up from
| about the properties of a stronger datatype that allow
| CRDTs to work.
| detaro wrote:
| https://arxiv.org/pdf/1805.06358.pdf looks like a decent
| starting point, with references to work on other types.
| Jailbird wrote:
| and pointers to other papers, too - thanks!
| josephg wrote:
| > Today, it seems one would still have to transform that
| data to a large flat string underneath, and implement an
| editor that only performs edits that maintain the integrity
| of the higher-level object, while the flat string provides
| collaboration features.
|
| Lots of people think this and have mentioned it over the
| years, but it's a dangerous idea. The way concurrent edits
| are handled makes it really easy for the automatic merging
| algorithms to mess up the syntax of your XML / JSON
| content. And that's really hard for non-programmers to fix.
|
| The right answer for this stuff is to just make CRDTs which
| support other data types, the same way we did for OT with
| ot-types[1]. We need CRDT code for lists + text (working
| now - text is just a list of characters). And rich-text and
| JSON. And that'd cover 99% of the use cases. I'm tackling
| strings first in diamond-types because it's probably the
| hardest case; but I want to have native support for other
| data structures soon too.
|
| Yjs and automerge already do this. They have support for
| plain text, XML and arbitrary JSON structures.
|
| The simplest implementation for JSON structures + tuples is
| probably shelf[2], which is so simple you can implement it
| in about 25 lines of javascript. Shelf doesn't support
| lists, but combining shelf with the list code I already
| have in diamond-types is probably going to be good enough
| for most applications.
|
| [1] https://github.com/ottypes/
|
| [2] Shelf's description + code is here:
| https://braid.org/algorithms/shelf . Or you can watch the
| video of Greg (shelf's author) discussing the algorithm
| here: https://braid.org/meeting-8 . Kevin Jahns (Yjs's
| author) is in that recording too, and he's super jazzed
| about how simple and beautiful shelf is.
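| For a flavour of how small it is, here's a rough Python sketch
| of a shelf-style merge (my reading of the braid.org
| description; the exact tie-breaking rule is an assumption):

```python
# Rough sketch of a Shelf-style CRDT. Each value is stored as
# [value, version]; on merge, the higher version wins, ties on
# dicts merge per key, and other ties break deterministically.
def merge(a, b):
    (av, an), (bv, bn) = a, b
    if an != bn:
        return a if an > bn else b
    if isinstance(av, dict) and isinstance(bv, dict):
        merged = {}
        for k in av.keys() | bv.keys():
            if k in av and k in bv:
                merged[k] = merge(av[k], bv[k])
            else:
                merged[k] = av.get(k, bv.get(k))
        return [merged, an]
    # deterministic tiebreak: compare serialized values
    return a if repr(av) >= repr(bv) else b

doc1 = [{'title': ['hello', 1]}, 0]
doc2 = [{'title': ['world', 2]}, 0]
assert merge(doc1, doc2) == [{'title': ['world', 2]}, 0]
```

| Convergence follows because the merge only ever keeps the
| "larger" of two states, much like the max in a grow-only set.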
| lewisl9029 wrote:
| First of all, thank you for the amazing read! I
| thoroughly enjoyed the entire article, and it gave me a
| new perspective on the feasibility of CRDTs for real
| world applications performance-wise.
|
| Though I am curious now to hear your thoughts on the
| conflict resolution side of the equation for complex data
| structures like deeply nested JSON.
|
| The biggest takeaway I got from Martin's talk on the
| topic from a few years ago was that while CRDTs are
| theoretically guaranteed to eventually converge, the
| resulting converged state might not make any sense to
| applications that need to consume and act on that state
| [1].
|
| It seemed like a pretty fundamental challenge to using
| CRDTs to store arbitrary application state to me at the
| time, but I imagine the field has progressed immensely
| since then, so would love to hear any insights you might
| have around strategies that could be used to alleviate or
| at least work around this challenge if I wanted to build
| a CRDT-based app today.
|
| [1] https://youtu.be/8_DfwEpHE88?t=2232
| username91 wrote:
| It's a great article - really informative and enjoyable to
| read. Thanks for making it happen. :)
| conaclos wrote:
| Hi josephg, I'm a CRDT researcher. This is great to see so much
| work around CRDT nowadays!
|
| Some of the optimizations you discuss have already been
| proposed in papers and implementations.
|
| For instance, LogootSplit [1] proposes an implementation based
| on an AVL tree with extra metadata to get a range tree.
| LogootSplit also proposes a block-wise approach that stores
| strings instead of individual characters. Xray [2], an
| experimental editor built by GitHub and written in Rust, uses a
| copy-on-write B-tree. Teletype [3] uses a splay tree to speed
| up local insertions/deletions, based on the observation that a
| user performs several edits on the same region.
|
| [1]
| https://members.loria.fr/CIgnat/files/pdf/AndreCollabCom13.p...
| [2] https://github.com/atom-archive/xray [3]
| https://github.com/atom/teletype
| josephg wrote:
| Cool! It'd be interesting to see those CRDT implementations
| added to Kevin Jahns' CRDT Benchmarks page[1]. The
| LogootSplit paper looks interesting. It looks like xray is
| abandoned, and I'm not sure about teletype. Though teletype's
| CRDT looks to be entirely implemented in javascript[2]? If
| the authors are around I'd love to see some benchmarks so we
| can compare approaches and learn what actually works well.
|
| And I'm not surprised these techniques have been invented
| before. Realising a tree is an appropriate data structure
| here is a pretty obvious step if you have a mind for data
| structures.
|
| To name it, I often find myself feeling defensive when people
| read my work and respond with a bunch of links to academic
| papers. It's probably totally unfair and a complete projection
| from my side, but I hear a voice in my head reword your
| comment to instead say something awful like: "Cool, but
| everything you did was done before. Even if they didn't make
| any of their work practical, usable or good they still
| published first and you obviously didn't do a good enough
| literature review if you didn't know that." And I feel an
| unfair defensiveness arise in me as a result that wants to
| find excuses to dismiss the work, even if the work might be
| otherwise interesting.
|
| It's hard to compare their benchmark results because they used
| synthetic randomized editing traces, which always have
| different performance profiles than real edits for this
| stuff. Their own university gathered some great real world
| data in an earlier study. It would have been much more
| instructive if that data set was used here. At a glance their
| RAM usage looks to be about 2 orders of magnitude worse than
| diamond-types or yjs. And their CPU usage... ?? I can't tell
| because they have no tables of results. Just some hard to
| read charts with log scales, so you can't even really eyeball
| the figures. So it's really hard to tell if their work ends up
| performance-competitive without spending a couple of days
| getting their enterprise-style Java code running with a
| better data set. Do you think that's worth doing?
|
| [1] https://github.com/dmonad/crdt-benchmarks
|
| [2] https://github.com/atom/teletype-crdt
| mirekrusin wrote:
| It seems that reproducibility in computer science, at least
| where no gigantic/proprietary datasets are needed, should be a
| solved problem: simply publish a repository with the code. Are
| there any forces at play that make this so rare in practice?
| josephg wrote:
| Credit where it's due, the academics did publish the code they
| wrote on github. But I don't know if anyone - reviewers or
| readers - actually took the time to read it. Let alone
| understand why it throws doubt on the paper's conclusions.
| JW_00000 wrote:
| Usually, (at least in my specific niche of the computer
| science field,) if the code is published it's only
| published after the paper has been reviewed. This is partly
| to preserve anonymity during the review process, and also
| because usually the code isn't seen as "part of the paper"
| (i.e. "the paper should stand on its own"). Although I
| agree that you could argue that for papers about
| benchmarks, the code should definitely be considered an
| essential part of the paper.
| robmorris wrote:
| That's an impressive optimisation! Out of curiosity, what do
| you think are the most interesting or useful possible
| applications for an optimised CRDT?
|
| When you're approaching an optimisation like this, do you mind
| me asking how you think about it and approach it?
| politician wrote:
| Great post! I had no idea that list CRDTs could actually be
| fast because I read the papers showing how they were
| impractical. Thanks for investigating and writing this up --
| please accept my offer of a legitimate academic credential.
| [deleted]
| GlennS wrote:
| When optimizing `findItem`, did you consider storing the
| original index of each item on itself and using that as a
| starting point?
|
| Obviously this might move later (maybe it can only increase?),
| but usually not by much, so I would guess it would make an
| efficient starting point / be immediately correct 99% of the
| time?
|
| Looks like you already have 2 good solutions to this though
| (start from index of recent edits and range tree).
| benjismith wrote:
| I've been following your work for years (and I'm actually neck-
| deep in a ShareDB project right now) so I just want to say
| thank you for all of your contributions! I especially enjoyed
| this post.
| pookeh wrote:
| Wait, it doesn't look like you used the performance branch of
| automerge (which is now merged into master). It is
| significantly faster.
|
| https://github.com/automerge/automerge/pull/253
| josephg wrote:
| I did use the performance branch. And I had a chat with a few
| people in the automerge community about the performance
| numbers I was seeing long before I published to see if I was
| doing anything wrong. I tested a few different versions of
| automerge, but in this test there wasn't much performance
| difference between 0.12 and the 1.0.x-preview versions (which
| are built from the merged performance branch); I also tried
| the unreleased automerge-rs. When I ran my tests, timing
| numbers for automerge ranged from about 5m with the old non-
| performance branch down to about 4m20s or so with automerge-
| rs. Still far from Yjs's 0.9 seconds.
|
| I just checked and it looks like automerge 1.0.1-preview-4
| has landed. I wrote the post benchmarking preview-2. I've
| been knee deep in diamond types lately and haven't been
| watching. Fingers crossed there's some more performance
| improvements in the pipeline. I'd love to do a follow up in 6
| months showing much improved performance.
| lewisjoe wrote:
| Thank you for writing this piece Joseph.
|
| Just want to check whether something's a possible typo or
| I'm getting it all wrong :)
|
| Quote: "But how do we figure out which character goes first? We
| could just sort using their agent IDs or something. But argh,
| if we do that the document could end up as "abcX", even though
| Mike inserted X before the b. That would be really confusing."
|
| Since the conflict is only between the children of (seph, 0),
| the only possibilities are either ending up with "aXbc" or
| "abXc", right? Or is there a legitimate possibility of ending
| up with "abcX"?
|
| I'm assuming we'll apply a common sorting logic only to
| clashing siblings.
| josephg wrote:
| Good question. That part of the article could probably use
| another diagram to explain it.
|
| The resulting document is generated by doing a depth-first
| prefix traversal of the tree. The ambiguity comes because "b"
| and "X" are both direct children of "a". So its not clear how
| they should be ordered relative to each other. Because "c" is
| a child of "b" in this example, the "X" can't appear between
| the "c" and "b". The only valid orderings are, as I said,
| "aXbc" or "abcX". But without knowing how "b" and "X" should
| be ordered, its ambiguous which one to use.
|
| Let me know if thats still confusing! This stuff is hard to
| explain without a whiteboard.
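| In lieu of a whiteboard, the traversal can be sketched in a few
| lines of Python (a hypothetical helper, not the article's
| actual code):

```python
# Document text = depth-first prefix traversal of the insert tree.
# Each child's entire subtree is emitted contiguously, so sibling
# order decides everything the tree itself leaves ambiguous.
def traverse(node, children):
    out = [node]
    for child in children.get(node, []):
        out += traverse(child, children)
    return out

# sibling order X-before-b:
assert ''.join(traverse('a', {'a': ['X', 'b'], 'b': ['c']})) == 'aXbc'
# sibling order b-before-X:
assert ''.join(traverse('a', {'a': ['b', 'X'], 'b': ['c']})) == 'abcX'
# "abXc" is impossible: X can never land inside b's subtree
```
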
| lewisjoe wrote:
| Clear enough Joseph. Thanks :)
| trishume wrote:
| Have you seen my Xi CRDT writeup from 2017 before?
| https://xi-editor.io/docs/crdt-details.html
|
| It's a CRDT in Rust and it uses a lot of similar ideas. Raph
| and I had a plan for how to make it fast and memory efficient
| in very similar ways to your implementation. I think the piece
| I got working during my internship hits most of the memory
| efficiency goals like using a Rope and segment list
| representation. However we put off some of the speed
| optimizations you've done, like using a range tree instead of a
| Vec of ranges. I think it also uses a different style of
| algorithm without any parents.
|
| We never finished optimizing and polishing it up, so it's
| awesome that there's now an optimized text CRDT in Rust people
| can use!
| josephg wrote:
| Oooohhhh no I haven't read that - thanks for the link! I feel
| embarrassed to say this but I knew about Xi editor years ago
| but I totally forgot to go read & learn about your crdt
| implementation when I was learning about Yjs and automerge
| and others. I'll have a read.
|
| And thanks for writing such an in depth article. It's really
| valuable going forward. Maybe it's addressed in your write up
| but are there any plans for that code, or has everyone moved
| on? I'd love to have a zoom chat about it and hear about your
| experiences at some point if you'd be willing.
| neolog wrote:
| Out of curiosity, what do you use to make those diagrams?
| trishume wrote:
| https://www.figma.com/ and putting a lot of effort into
| them
| ta988 wrote:
| This was a great read, thank you. I wish there were more
| explanations of the "black magic" part of Yjs. I'll have to dig
| into that.
| josephg wrote:
| If you're interested in learning more about Yjs's internals,
| I interviewed Kevin for 3 hours on zoom and got him to take
| me through Yjs's code. He's a great guy.
|
| The video of that conversation is here:
| https://youtu.be/0l5XgnQ6rB4
| thechao wrote:
| I love high-level systems languages like C/C++ and Rust... but
| everything you said about JavaScript being slow is the same
| thing assembly programmers experience when optimizing high-
| level systems languages.
|
| In general, when I see C code and I'm asked to speed it up, I
| always use "100x" as my target baseline.
| josephg wrote:
| Whoa, that's a really impressive baseline to reach for when
| optimizing C code! I'd love to hear some war stories.
|
| As you can probably tell from my article, most of my skill at
| this stuff is from hard won tricks I've picked up over the
| years - like reducing heap allocations and packing memory for
| cache coherency. There are probably lots of things I just
| haven't learned because I haven't discovered them on my own.
|
| Do you have a blog, or any recommendations for stuff to read
| by you or others?
| thechao wrote:
| I learned low level programming while at Intel, working
| with one of their performance teams; so, nothing written.
| In general, though, assembly is more _expressive_ than
| higher level languages; it lets you "tighten up" the code
| the computer executes, reducing work & bandwidth.
|
| Specifically, a guy named Mike Abrash taught me some of
| this stuff, and he's got some books explaining the theory
| in terms of practice.
| Majromax wrote:
| When you write:
|
| > Yjs does one more thing to improve performance. Humans
| usually type in runs of characters. So when we type "hello" in
| a document, instead of storing ['h','e','l','l','o'], Yjs just
| stores: ['hello']. [...] This is the same information, just
| stored more compactly.
|
| Isn't this no longer the same information when faced with
| multiple editors? In the first implementation, if I pause to
| think after typing 'hel', another editor might be able to
| interject with 'd' to finish the word in another way.
|
| In my view, these data structures are only "the same
| information" if you provide for a reasonably-sized, fixed
| quantum of synchronization. The merging makes sense if e.g. you
| batch changes every one or two seconds. It makes less sense if
| you would otherwise stream changes to the coordinating agent as
| they happen, even with latency.
| goldenkey wrote:
| The chunking should depend entirely on the network
| synchronization. If synchronization is editing-debounced,
| then chunking could be applied rather safely.
| josephg wrote:
| I didn't explain this well but the transformation is
| lossless. No data is lost from compressing like this. It has
| no impact on the concurrency protocol or network protocol; it
| just impacts how the data is stored locally.
|
| If we need to, we could split the run back out again into
| individual characters without losing information. And that
| does happen - we do that if something later gets inserted
| into the middle of the run of characters. Pausing doesn't
| help. Even the same user could later type something in the
| middle of a word they'd typed previously.
|
| This limits what we can run-length encode. The code in
| diamond only run-length encodes items when they are:
|
| - Typed sequentially in time
|
| - And sequentially in insert location (position 10, 11, 12,
| 13, ...)
|
| - And either all of the characters have been deleted or none
| of them have been deleted
|
| Notice the other fields in that example in the post. Each
| subsequent character has:
|
| - An id field = the previous id + 1
|
| - A parent field = the id of the previous item
|
| - The same value for _isDeleted_
|
| If any of this stuff didn't match, the run would be split up.
|
| Instead of storing those fields individually, we can store
| the id and parent of the first item, an isDeleted flag and a
| length field. That's all the information we actually need to
| store. With yjs style entries (which is what diamond-types
| actually implements), the code is here if you can make heads
| or tails of it. The poorly named "truncate" method splits an
| item in two. Spans are only extended if can_append() returns
| true: https://github.com/josephg/diamond-
| types/blob/c4d24499b70a23...
|
| With real data from Martin's editing trace, 182 000 inserted
| characters get compacted down to 12 500 list items. Which is
| still pretty good - that's about 14x fewer entries. And it means
| performance doesn't stutter when large paste events happen.
| jg23 wrote:
| I have a related question to this: if you're storing
| ["hello"] as one chunk, what happens when you perform an
| edit, say adding an extra ["e"] after the ["e"]? In the
| unoptimised structure I know you can just add the new ["e"]
| as a child of the original ["e"]. So here would you then
| delete the chunk ["hello"] and split it into two halves
| like ["he"] and ["llo"]?
| josephg wrote:
| Yes exactly. You replace the chunk with "he" and "llo"
| then insert the extra item in the middle. The result is
| ["he", "e", "llo"]. The code to do all this inline in a
| b-tree is pretty hairy but it works pretty well!
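| Outside the b-tree, the span logic itself is small. A rough
| Python sketch of appending and splitting runs (the names
| `can_append` and `split` echo the discussion above, but the
| struct layout is an assumption, not diamond-types' real
| representation):

```python
# A run-length encoded span: characters typed sequentially in time
# and position, all sharing the same deleted flag.
class Span:
    def __init__(self, start_id, parent_id, text, deleted=False):
        self.start_id, self.parent_id = start_id, parent_id
        self.text, self.deleted = text, deleted

    def can_append(self, other):
        # other directly follows us: consecutive IDs, parented on our
        # last character, and the same deleted state
        return (other.start_id == self.start_id + len(self.text)
                and other.parent_id == self.start_id + len(self.text) - 1
                and other.deleted == self.deleted)

    def split(self, offset):
        # split into two spans so an insert can land between them
        left = Span(self.start_id, self.parent_id,
                    self.text[:offset], self.deleted)
        right = Span(self.start_id + offset, self.start_id + offset - 1,
                     self.text[offset:], self.deleted)
        return left, right

h = Span(1, 0, 'hello')                    # 'hello' typed as IDs 1..5
left, right = h.split(2)                   # make room after the 'e'
assert (left.text, right.text) == ('he', 'llo')
assert right.start_id == 3                 # each 'l' keeps its ID
assert h.can_append(Span(6, 5, ' world'))  # a following run can merge
```
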
| Majromax wrote:
| Ah, I see. I had thought that the consolidation gave a
| batched update with a single ID, so 'h' + 'ell' + 'o' would
| have IDs of 1, 2, and 3 respectively. That would have made
| an editing conflict in the middle of 'ell' impossible.
| josephg wrote:
| Ah, that makes sense! Concurrent changes are one thing,
| but they're rare. The big problem if we
| did it that way is that it would become impossible to
| insert in the middle of "ell", because we wouldn't be
| able to name any of those internal positions.
|
| To get around that we assign a sequential ID for every
| inserted character, regardless of how they're typed or
| stored. Typing "hello" would assign IDs 1-5 even if you
| paste "hello" from the clipboard.
| kohlerm wrote:
| Very nice. When I read "doubly linked list" I immediately thought
| "what about a b-tree-like structure?" I guess Martin's idea to
| replace the IDs comes from the "vector clock" idea for concurrent
| updates.
| audidude wrote:
| If anyone is looking for the combination of a piece table and
| B+tree (which appears to be what is talked about in this
| article), I have one that I've been using for years across
| various GTK components.
|
| https://gitlab.gnome.org/chergert/textrange/
| gritzko wrote:
| About a decade ago, I implemented the Causal Tree CRDT (aka RGA,
| Timestamped Insertion Tree) _in regular expressions_ using a
| Unicode string as a storage. Later we made a collaborative editor
| for Yandex based on that code. It used many of the tricks
| described in the text, even the optimization where you remember
| the last insertion point. So I am terribly glad it all gets
| rediscovered.
|
| The code is on GitHub [1]. There is also an article, which might
| be tough reading, so no link. Recently I greatly improved CTs by
| introducing the Chronofold data structure [2]. Regarding that
| benchmark article, I spent many years in academia, so the quality
| of content problem is familiar to me. In other words, I don't
| take such articles seriously. CRDTs are fast enough, that is not
| a concern.
|
| [1]: https://github.com/gritzko/citrea-model
|
| [2]: https://arxiv.org/abs/2002.09511
| omgtehlion wrote:
| That is nice!
|
| A couple of questions: Have you released a CT implementation
| on top of Chronofold? Do you have any plans to benchmark it
| against other algorithms?
| gritzko wrote:
| Thanks. No, I didn't release it, although there are
| implementations on GitHub. Rust and Kotlin, at least. The RON
| proof-of-concept wiki runs on the (unreleased) C++
| implementation [1]. I benchmarked it at some point,
| everything was like "k nanoseconds" [2].
|
| [1]: http://doc.replicated.cc/%5EWiki/ron.sm
|
| [2]: https://github.com/dmonad/crdt-
| benchmarks/issues/3#issue-599...
| omgtehlion wrote:
| Thanks!
|
| I could only find one Rust implementation:
| https://github.com/dkellner/chronofold which seems to
| produce invalid results on some inputs. Actually the hard
| part (integration) is made of hacks there...
|
| That PoC wiki sounds really interesting, as does the whole
| replicated.cc project! Any plans on releasing it?
| gritzko wrote:
| Here is Kotlin https://github.com/decentralized-
| hse/collab-edit
|
| libron will be released, yes.
| _hl_ wrote:
| I remember seeing that (regex CTs) and immediately thinking
| "wtf, why would anyone want to do that". Took me quite a while
| to understand that it's actually a pretty clever way to write
| fast state machines in browserland. So thank you for this work!
| ta988 wrote:
| CRDT: https://en.m.wikipedia.org/wiki/Conflict-
| free_replicated_dat...
| teleforce wrote:
| This previous HN discussion of an OT vs CRDT paper is an
| excellent overview of the topic [1],[2].
|
| From the paper conclusions "Our discoveries from this work
| revealed facts and evidences that refute CRDT superiority claims
| over OT on all accounts, which helps to explain the underlying
| reasons behind the choices between OT and CRDT in the real
| world."
|
| The fact that the paper, in its footnotes, refutes one of the
| points made in the HN discussion [1] (i.e. it references the
| discussion itself) regarding the claimed CRDT superiority is
| rather amusing, and the first such thing I have personally seen
| in a published paper.
|
| [1]https://news.ycombinator.com/item?id=18191867
|
| [2]https://arxiv.org/abs/1810.02137
| tekkk wrote:
| Excellent article! As someone who has to work with collaborative
| editing I must say the complexity of the whole area is at times
| daunting to say the least. So many edge-cases. So many mines to
| step on.
|
| Now I think I am convinced that the OT vs CRDT performance
| comparison is kind of a moot point, and the question is more
| about the user experience: which version produces nicer
| results when two very diverged documents are merged. Maybe
| one of these days I'll read an article about that too.
|
| To get off on a tangent a little bit, I'd be interested to know
| how one could add in tracking of changes to Diamond or other
| CRDT? Can you add an arbitrary payload to the operation and then
| just materialize the areas different from the original snapshot?
| I know Yjs can do this by creating snapshots and then comparing
| them to another snapshot but it seemed a bit awkward and not
| suited for real-time editing.
| johnarno wrote:
| If it weren't for academic papers, we probably wouldn't have the
| beautiful and nice concept of a CRDT in the first place. We might
| have 100 different solutions promising us 100 different
| replication schemes, some with and some without guarantees,
| hiding behind a gazillion different undefined terms that want to
| make us believe each solution is better than sliced bread. Also,
| I don't think Google Wave failed due to its OT algorithm.
| uyt wrote:
| I think this data structure is usually called a Counted B-tree
| https://www.chiark.greenend.org.uk/~sgtatham/algorithms/cbtr...
| rather than a range tree.
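| The counted-tree idea is easy to sketch in Python (shown here
| on a plain binary tree for brevity; a counted B-tree stores a
| subtree size per child pointer):

```python
# Each node caches the size of its subtree, so finding the item at
# a given index is O(log n) instead of an O(n) scan.
class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right
        self.size = 1 + size(left) + size(right)

def size(n):
    return n.size if n else 0

def nth(node, i):
    # the index of node.value within its subtree equals the size of
    # its left subtree
    left = size(node.left)
    if i < left:
        return nth(node.left, i)
    if i == left:
        return node.value
    return nth(node.right, i - left - 1)

# balanced tree over 'abcdefg'
tree = Node('d',
            Node('b', Node('a'), Node('c')),
            Node('f', Node('e'), Node('g')))
assert [nth(tree, i) for i in range(7)] == list('abcdefg')
```

| The same size-per-subtree trick is what lets a range tree map
| a document position to an item without walking every entry.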
___________________________________________________________________
(page generated 2021-07-31 23:00 UTC)