[HN Gopher] CRDT: Fractional Indexing
___________________________________________________________________
CRDT: Fractional Indexing
Author : mblode
Score : 279 points
Date : 2022-11-27 17:03 UTC (11 hours ago)
(HTM) web link (madebyevan.com)
(TXT) w3m dump (madebyevan.com)
| [deleted]
| parentheses wrote:
| Maybe I'm missing the point. Random offsets feels like an
| inelegant solution in the space of elegant solutions (read:
| CRDTs).
| paulgb wrote:
| This stuff is fun to play with. I implemented a Rust version of
| fractional indexing based on another of Evan's blog posts.
|
| https://docs.rs/fractional_index/latest/fractional_index/
| dang wrote:
| Ok, let's try changing the URL from
| https://madebyevan.com/algos/ (a generic list) to
| https://docs.rs/fractional_index/latest/fractional_index/ (the
| item you're taking about).
|
| Explanation here:
| https://news.ycombinator.com/item?id=33764956. Thanks!
| chrisoverzero wrote:
| Does a Vec<u8> with that calculation listed end up smaller than
| two integers of (presumably) arbitrary length?
|
| When I read about the problem, I imagined using the mediant to
| calculate "between". I think that mediant of $value and 1/1
| would work for "after" and mediant of $value and 0/1 would work
| for "before".
|
| There must be a requirement I'm missing, though!
|
| mediant: https://en.wikipedia.org/wiki/Mediant_(mathematics)
| paulgb wrote:
| That would be an interesting thing to test! My first instinct
| is that the comparison operation (when denominators differ)
| would be more expensive.
|
| In general some approaches should be better than others
| depending on how inserts are distributed -- I think my
| approach is optimal when inserts are made at random (I
| haven't proved this), but in practice it may be common for a
| bunch of things to be inserted in sequence for some
| applications. In those cases, there are more space-optimal
| approaches.
| conaclos wrote:
| This is basically the idea behind Logoot [Weis_2009] that was
| improved by LSeq [Nedelec_2013] and later extended to the first
| block-wise sequence CRDT: LogootSplit [Andre_2013]. LogootSplit
| was recently improved as Dotted LogootSplit [1] [Elvinger_2021].
|
| Disclamer: I'm the author of Dotted LogootSplit.
|
| [Weis_2009] https://hal.inria.fr/inria-00432368
|
| [Nedelec_2013] https://hal.archives-ouvertes.fr/hal-00921633/en
|
| [Andre_2013] https://hal.archives-ouvertes.fr/hal-01246212
|
| [Elvinger_2021] https://hal.univ-lorraine.fr/tel-03284806
|
| [1] https://github.com/coast-team/dotted-logootsplit
| superb-owl wrote:
| CRDTs are incredible. I recommend checking out the original link
| [1] as well as this "awesome" list [2]
|
| [1] https://madebyevan.com/algos/
|
| [2] https://github.com/alangibson/awesome-crdt
| jiffyjeff wrote:
| Conflict-free replicated data type, or CRDT, is often referred to
| as simply "CRDT" in trade jargon without any definition.
| Apparently.
| dang wrote:
| This looks like great stuff if you follow the pointers, but lists
| don't make good submissions to HN (which itself is already a
| list). They tend not to lead to deep discussion because comments
| are about lowest common denominator of the items on the list, and
| this is usually pretty generic. What's better is to pick the most
| interesting item on the list and submit that instead.
| https://hn.algolia.com/?dateRange=all&page=0&prefix=true&sor...
|
| Edit: let's do that in this case. I've changed the URL from
| https://madebyevan.com/algos/ based on
| https://news.ycombinator.com/item?id=33764875.
| mblode wrote:
| Thank you dang for updating the title and URL!
| dang wrote:
| The funny thing is that the discussion so far has still
| mostly been generic, yet I swear the comments are higher-
| quality than they would have been without the change.
| (Impossible to say for sure, of course)
| LAC-Tech wrote:
| CRDTs are often talked about in the same breath as collaborative
| editing software, but they're useful for much more than that.
|
| They really are a theoretical model of how distributed,
| convergent, multi-master systems have to work. IE the DT in CRDT
| could be a whole datastore, not as just an individual document.
|
| (Wish I could remember who on HN alerted me to this. I had read
| the paper but didn't grok the full implications).
| jdvh wrote:
| We're using a single end-to-end encrypted document tree synced
| with CRDTs for our collaborative task IDE[1]. All data for a
| team is a single tree (graph really, if you count
| transclusions) and its kind of magical how simple everything
| gets when you know all state will sync in a deterministic way
| between all clients. It doesn't matter whether you drag&drop
| some object or add a new user, or rename something. It all
| reduces to a handful of CRDT operations.
|
| (We have a central server and the app works offline so the
| algorithm from the linked article doesn't apply in our case.)
|
| [1] https://thymer.com
| jitl wrote:
| Are you using Yjs? How are you thinking about scale-up for
| teams with 100s - 1000s of users?
| jdvh wrote:
| We've looked at Yjs but ultimately decided to write our own
| thing from scratch.
|
| Even scale-ups with thousands of users will have people
| working on different parts of the document tree in
| practice. The client doesn't need to receive every
| keystroke for people editing somewhere far away in the
| document tree. Those updates can be sent in batches every
| once in a while.
|
| If 10.000 people decide to edit the exact same location in
| a document then performance will degrade. The client will
| start to lag behind if it can't keep up with the stream of
| updates (cpu or internet bottleneck) but eventually it will
| process everything and all clients will end up with the
| same state. We have two channels. One for state updates
| (soft real-time) and one for UI hints ("hard" real-time)
| and other user actions that aren't CRDT mutations.
| adamnemecek wrote:
| Probably something by Martin Kleppmann?
| newhouseb wrote:
| You might be thinking of this [1] paper? The core idea being if
| your problem space lends itself to monotonicity (see the paper
| for a more precise definition than I can give), then you can
| build a globally consistent database (around a CRDT) where the
| end-user doesn't need to concern themselves with inconsistent
| states while consistency is reached.
|
| [1] https://arxiv.org/pdf/1901.01930.pdf
| LAC-Tech wrote:
| I wasn't clear - by "the paper" I meant the original CRDT
| paper. I read it, thought I understood it on some level, but
| had not drawn the dots between the theory and real world
| problems.
|
| Regardless, thanks very much for linking the paper! Right up
| my alley.
| sitkack wrote:
| Marc Shapiro and Nuno Preguica created CRDTs in 2007.
|
| https://pages.lip6.fr/Marc.Shapiro/pubs.html#CRDTs
|
| The original paper would be
| https://hal.inria.fr/inria-00177693/
| bhl wrote:
| Is there a good resource on designing collaborative apps with
| CRDTS, that is which type of CRDT and conflict resolution to pick
| for a given application? Something like
| https://mattweidner.com/2022/02/10/collaborative-data-design...
| but generalizes.
| samwillis wrote:
| Anyone unsure of what a CRDT is (I think everyone on HN must know
| by now), this is the perfect intro:
| https://www.inkandswitch.com/peritext/
|
| The two most widely used CRDT implementations (combining JSON
| like general purpose types and rich text editing types) are:
|
| - Automerge https://github.com/automerge/automerge
|
| - Yjs https://github.com/yjs/yjs
|
| Both have JS and Rust implementations, and have bindings to most
| online rich text editors.
|
| CRDTs are addictive one you get into them.
| [deleted]
| taneq wrote:
| Even that link was 5 pages in (on my phone) before it deigned
| to mention:
|
| > It is a Conflict-free Replicated Data Type (CRDT)
|
| What happened to the idea of defining all non-universally-
| recognised acronyms the _first_ time you use the term? With
| people making up new terms exponentially faster than ever
| before, it's now more important than ever.
| al2o3cr wrote:
| The first use is a hyperlink to a whole article defining the
| term, what are you on about?
| lelandfe wrote:
| TBF, it uses the acronym 8 times across 500 words before
| giving you the actual term.
| glenngillen wrote:
| Also, outside the page title/headings and the reference to
| the name of said external paper the first use in the
| document _is_ where they use the expanded name.
| jitl wrote:
| To me the other algorithms described in the list are more novel
| and interesting:
|
| https://madebyevan.com/algos/crdt-tree-based-indexing/ - for when
| precise order is critical, like paragraphs in a document. This
| algorithm is _almost_ like storing adjacency information like a
| linked list, but is more convergent. Very interesting for [my
| use-case](https://www.notion.so/blog/data-model-behind-notion).
|
| https://madebyevan.com/algos/crdt-mutable-tree-hierarchy/ - for
| tree-shaped data, like blocks in a Notion page that should have
| exactly one parent, but allow concurrent re-parenting operations
|
| https://madebyevan.com/algos/log-spaced-snapshots/ - log space
| snapshots, for choosing what fidelity of historical information
| to store. For context, many CRDTs for rich text or sequences
| store unbounded history so that any edit made at any time can be
| merged into the sequence. For long-lived documents, this could be
| impractical to sync to all clients or keep in "hot" memory.
| Instead, we can decide to compact historical data and move it to
| cold storage, imposing a time boundary on what writes the system
| can accept on the hot path. The log-spaced snapshots algorithm
| here could be used to decide what should be kept "hot", and how
| to tune the cold storage.
| maxmcd wrote:
| If you're working on CRDT stuff in production (or possibly in
| production) do you have thoughts on the CRDT vs OT debate? I
| would expect Notion to use operational transform given the
| availability of reliable central servers. But I know quite
| little! Interested in your thoughts.
| ChadNauseam wrote:
| I'm not the GP, but OT is pretty annoying to implement. There
| are so many cases that it's quite difficult to formally prove
| an OT correct. On the other hand, a large subset of CRDTs can
| be implemented in Datalog and if you do that you can't
| possibly end up with an invalid CRDT.
|
| From wikipedia:
|
| > Similarly, Joseph Gentle who is a former Google Wave
| engineer and an author of the Share.JS library wrote,
| "Unfortunately, implementing OT sucks. There's a million
| algorithms with different tradeoffs, mostly trapped in
| academic papers. [...] Wave took 2 years to write and if we
| rewrote it today, it would take almost as long to write a
| second time." But later he amends his comment with "I no
| longer believe that wave would take 2 years to implement now
| - mostly because of advances in web frameworks and web
| browsers."
| throwaway81523 wrote:
| This is kind of interesting but "fractional indexing" doesn't
| seem to be a computer science topic, and I think it might be
| clearer to treat these indexes as lists of numbers (or ordinals
| in o^o, if you prefer) rather than fractions. Those are simpler
| to generate and compare than arbitrary-precision fractions. Or as
| jitl's post suggests, using trees as indexes (I haven't yet
| looked at jitl's linked articles). Those would presumably have
| order type e0. It's not clear to me why you'd want that, but it
| seems doable. In all these schemes you might occasionally want a
| "stop the world garbage collection" where you reset all the
| indices to be ordinary integers or maybe pairs of integers. I
| guess that is also doable without having to pause all the
| updates, at least if you use pairs.
| zaphar wrote:
| In a large distributed system stop the world garbage collection
| is probably not going to work very well.
| fatneckbeardz wrote:
| reminds me a lot of Ford Circle / Farey Diagram / Stern Brocot
| tree
|
| Basically a tree of fractions where you take two rational points
| on a number line, a/b and c/d, then the next point in the tree is
| (a+b) / (c+d). Turns out that every single point you create this
| way has a unique position and never duplicate each other, and it
| forms a tree like structure.
|
| https://en.wikipedia.org/wiki/Ford_circle
|
| not sure if this would be useful, but basically it could be a
| fractional index that has a built in tree structure, since it
| basically means any fraction is a leaf on a Stern-Brocot tree.
| brilee wrote:
| Doesn't this end up being effectively a binary heap, with a
| maximum tree depth of 23 (floating point mantissa precision)? I
| imagine there must be a rebalancing operation required every so
| often, possibly more frequently for pathological insertion
| orders.
| luto wrote:
| > Fractional positions should be represented using arbitrary-
| precision decimals so that they don't run out of precision.
| Floating-point numbers are insufficient.
| newhouseb wrote:
| Is this the same as LSeq [1] except rather than using bytes one
| is basically using digits in a floating point representation
| (given this is JS where most things are floats)?
|
| [1] https://bartoszsypytkowski.com/operation-based-crdts-
| arrays-...
| Shohrux wrote:
| est wrote:
| Reminds me of TokuDB engine for MySQL. Good times.
| lelandfe wrote:
| For those unaware: Evan is the cofounder, former CTO, and
| technical wizard behind Figma; creator of esbuild, etc.
___________________________________________________________________
(page generated 2022-11-28 05:00 UTC)