[HN Gopher] You might not need a CRDT
___________________________________________________________________
You might not need a CRDT
Author : paulgb
Score : 264 points
Date : 2022-12-05 13:57 UTC (9 hours ago)
(HTM) web link (driftingin.space)
(TXT) w3m dump (driftingin.space)
| dustingetz wrote:
| Remember, "conflict free" is the opposite of "transactional"
| (ACID). When you choose conflict-free, you're choosing to give up
| the ability to reject a change due to conflict and send it back
| to the user for resolution. This is the crux of the issue in my
| opinion.
| JayStavis wrote:
| Agreed - conflict resolution and authority handling are key to
| advanced game networking techniques and underpin the
| architecture of many big multiplayer games.
| aboodman wrote:
| I don't think this is true on both accounts:
|
| * There is no such thing as "conflict free". When users edit
| data disconnected from each other, they can conflict, because
| the user intent of the edits can be different and not possible
| to reconcile. Nothing can fix that, not even crdts (As a
| thought experiment - two users are working on a report about
| ferrets. While offline, one decides the report should be about
| lemurs, while the other decides it should be about otters. The
| resulting set of changes will not be mergable without conflicts
| in any reasonable sense.)
|
| * What CRDTs give you is actually _convergence_ which is very
| important. But there are other ways to get convergence, and it
| can be done transactionallym as long as your are willing to
| weaken the D (durability). Replicache (my thing -
| replicache.dev) is one way, the way described here is another.
|
| In general game networking is also transactional in this same
| sense. Operations are transactions, that can affect multiple
| data items and enforce arbitrary invariants. But the _order_
| they apply in drifts, and so durability is sacrificed.
|
| Stepping back even further, CRDTs are also often transactional
| in the same way. Each operation in an op-based crdt can be
| thought of a transaction that applies completely or not at all.
| It's just that the set of available operations in a CRDT is
| typically fixed.
| dustingetz wrote:
| Replicache is cool
|
| 1. So by "weaken durability" you mean basically allow
| committed changes to be "lost" in the case of a common
| conflict, aka give it up entirely?
|
| 2. CRDTs are meant to converge even in the presence of
| network partitions (offline first), so you're saying your
| committed transactions can be uncommitted? Or you're saying
| writers have no idea if their transaction is committed
| because it might converge the other way after arbitrary
| delay?
|
| "Durability" means "transactions that have committed will
| survive permanently" - wikipedia
| aboodman wrote:
| Thanks :)
|
| > So by "weaken durability" you mean basically allow
| committed changes to be "lost" in the case of a common
| conflict, aka give it up entirely?
|
| I should have said "weaken ordering".
|
| I mean that the transaction will run, but its order with
| respect to other concurrently running transactions may be
| different in the end than the submitting client initially
| thought. As an extreme case, you submitted the transaction
| and your local client thought you got the concert tickets,
| but by the time it hits the server, the last ticket was
| already sold. Transaction still runs: it just runs first
| optimistically on the client, and then later
| authoritatively on the server with a potentially different
| result. Clients converge to serve view of the world and
| application invariants are maintained regardless (number of
| sold tickets doesn't go negative).
|
| Or just read Jepsen's take:
| https://doc.replicache.dev/concepts/consistency
|
| In the case of concert tickets, this design doesn't make a
| lot of sense, but in lots of other cases -- in particular
| document editing applications -- it makes tons of sense.
| The chance of concurrent edits is low, but it is critical
| to maintain application-defined invariants when conflicts
| do occur.
|
| > CRDTs are meant to converge even in the presence of
| network partitions (offline first)
|
| Sure. Replicache converges in the presence of network
| partitions, as does OP's design, as do most games,
| distributed databases, etc. What CRDTs uniquely provide is
| the ability to converge *without a leader* -- in a p2p
| environment. As OP observes, this is not a property that
| most applications today need. Because they have an obvious
| leader: the server.
|
| > so you're saying your committed transactions can be
| uncommitted? Or you're saying writers have no idea if their
| transaction is committed because it might converge the
| other way after arbitrary delay?
|
| They run optimistically (speculatively) on the client, and
| then later authoritatively on the server. Application can
| render the optimistic result, and then later the
| authoritative result when it is known.
|
| This is exactly what CRDTs do. The only difference is that
| CRDTs can do it without a leader, but the Replicache (and
| OP's) design offers additional flexibility -- ability to
| define your own operation/transactions.
|
| ===
|
| We need new terminology for distributed systems. Terms
| invented in the 70s aren't going to make sense. I believe
| that "transactions" in the common parlance mean "a set of
| code that runs together or not at all, isolated from other
| concurrently running transactions". This is entirely
| possible to provide with convergence if we are willing to
| weaken ordering.
| dustingetz wrote:
| i accept we need new terminology 100%
|
| i accept we can weaken ordering, 100%
|
| under the model you propose, how does a writer know if
| their transaction has been committed (by the authority)
| and what is the upper bound on how long they will need to
| wait to know the _final_ result? and is there an
| operation for the writer to wait for it?
| aboodman wrote:
| Each client sends a sequence of mutations to the server,
| each identified by a mutationID. During downstream sync,
| server says up to which mutation it has processed.
|
| In Replicache, you can known when a mutation is finalized
| with the `onSync` API, or more ergonomically, with the
| upcoming `pendingMutation` API.
|
| https://doc.replicache.dev/api/classes/Replicache#onsync
| https://doc.replicache.dev/api/classes/Replicache#experim
| ent...
|
| PS- there actually is terminology for this already.
| Replicache is technically a Causal+ Consistent
| transactional system:
| https://jepsen.io/consistency/models/causal
| Sytten wrote:
| Unless I am wrong, if your server uses end-to-end encryption you
| are still better off with CRDT since the server cannot merge the
| state. So there is a caveat to be said about this claim that you
| only need CRDT for peer-to-peer scenarios.
| paulgb wrote:
| That's a good point, but the server can still enforce an
| ordering of encrypted messages it can't read. You just don't
| get the benefit of the server being able to detect invariant
| violations in this case; all change operations would be sent to
| all clients and the client state machine would throw away the
| ones that created conflicts.
| dgb23 wrote:
| Quite a big takeaway here:
|
| > A general theme of successful multiplayer approaches we've seen
| is not overcomplicating things. We've heard a number of companies
| confess that their multiplayer approach feels naive -- especially
| compared to the academic literature on the topic -- and yet it
| works just fine in practice.
|
| The KISS and YAGNI principles apply.
|
| And on another note, it seems like one big industry is often
| overlooked in these types of discussions. I wonder what web
| development could learn from online game development when it
| comes to "multiplayer" apps - I mean it's right there in the
| name.
|
| One very pragmatic approach that strikes me as reasonable is
| having fixed delay changes. The game Warcraft 3 (a real time
| strategy game) did something like this. What it accomplishes is
| that changes are basically forced to be in sync by delaying them.
|
| Note that you will get used to the delay very quickly, you won't
| notice it too much after a short while. It is _much_ more
| noticeable if the delays are variable, but a consistent delay
| gets basically filtered out after a short while.
| paulgb wrote:
| > I wonder what web development could learn from online game
| development when it comes to "multiplayer" apps - I mean it's
| right there in the name.
|
| 100%. The architecture we used for https://plane.dev takes a
| lot of inspiration from how game servers work, but using
| HTTP/WebSocket instead of UDP.
|
| A lot of state sharing techniques that people are now using on
| the web (like cursor position smoothing and optimistic updates)
| have decades of precedent in multiplayer games.
|
| We've also seen more apps start to use an approach to pixel
| streaming which more closely resembles stadia than traditional
| VNC. https://digest.browsertech.com/archive/browsertech-digest-
| wo...
| JayStavis wrote:
| The intersection of state replication and pixel streaming is
| indeed interesting. Cloud games for example are usually using
| game state sync techniques, and then add in the client<>gpu
| latency on top of that.
|
| A true cloud native game (sadly part of Stadia's original
| ambitions) would hopefully have a more e2e state management
| approach inclusive of the pixel latency.
|
| This is very different from "local multiplayer" pixel
| streaming, which may be what companies like Switchboard[1]
| are doing: multiple clients control the same I/O channel
| (e.g. you and I share a mouse on a single remote desktop)
|
| [1] https://www.switchboard.app/
| Joeri wrote:
| _but using HTTP /WebSocket instead of UDP_
|
| I think websockets run over UDP when hosted on http/3.
| paulgb wrote:
| They do, but with HTTP/3 there's also WebTransport, which
| is even better because it means we will be able to avoid
| head-of-line blocking.
| munificent wrote:
| There is definitely a lot we can learn from multiplayer games,
| but the non-game use cases often have different constraints
| that lead to different solutions.
|
| With a game, it's a given that everyone is playing in real-
| time, right now, with as little lag as they can get away with.
| The synchronization happens constantly and ideally the game's
| internal state is completely in sync across all players.
| Players _want_ the game to be kept in sync as quickly as
| possible so that they can react to what other players are
| doing. That means the amount of synchronization to perform at
| any one point in time is generally fairly small.
|
| Also, the structure of the game itself often implicitly
| partitions the game's mutable state among the players. Players
| can obviously do things that affect each other and there are
| mutable parts of the world that can be affected by multiple
| players. But each player often controls their own avatar in the
| game and has an implicit lock on that data.
|
| This means that there is rarely any "merging" happening. It's
| usually just look at the events, determine an order to apply
| them, and do them one at a time. If the result is a little
| weird in terms of state (maybe a wall is built right when
| another player is crossing it and the game lets the teleport
| through), well, it's a game. The programmers can just declare
| by fiat that it's part of the game mechanics.
|
| Outside of games, a common use case is "let me work on this
| document offline for the duration of a flight and then resync
| everything when I get to the airport". There may be hundreds of
| changes and other users may also have huge batches of changes.
| There can be a very complex reconciliation and megre process
| because the changes are many and the data is precious. You
| can't just say it's "part of the game" if your multi-user
| spreadsheet goofs someone's tax statement.
| smolder wrote:
| I think the general term should be multi-actor. I understand
| why people use the term multiplayer, I think: familiarity. But
| players imply play which implies games, so it's not a great fit
| outside of them.
|
| The type of thing you're talking about with Warcraft is a
| deterministic simulation. As long as you have deterministic
| simulation, and you synchronize your inputs by buffering
| sufficiently for the input latency of all the remote
| participants, that ensures everything stays 1:1 between each
| local simulation.
| yazaddaruvala wrote:
| Honestly, I just restarting playing World of Warcraft, and even
| tho there have been amazing tech improvements to Blizzard's
| engine and servers, there is so much simple stuff lacking.
|
| I would honestly say it's likely better for game server
| developers to be taking ideas from web development.
|
| When it comes to CRDT's everyone keeps talking about text-
| editing, honestly one of the hardest problems for a CRDT to
| solve. Text is entirely unstructured, requires extremely high
| precision on the merge of any conflict.
|
| With a video game instead, players already used to "rewind and
| replay" merges, they are already used to having a "miss" when
| it should have been a "hit", and they roll with it because
| while that is friction it's a game and precision isn't that
| important. I'll say differently, precision is already not what
| game servers provide.
|
| At the end of the day, game servers are "just" low-latency chat
| applications with "bots" that process the messages. I'd love to
| see the game server written by the maintainers of WhatsApp /
| Messenger / Slack / etc. That would be a truly scalable game!
|
| P.S. If you think "chat is so simple, can't compare it to a
| game server". Please just think through the design for FB's /
| Discord's display widget that shows "currently active friends".
| wtatum wrote:
| This is an interesting idea but I do wonder if it can be
| applied to generic collaborative editing "as is". The approach
| for synchronized ticks (I think most RTS games do something
| similar to the Blizzard approach) does require each participant
| to "ack" the tick. You can see this when one participant is
| lagging in that the simulation freezes for all players until
| they acknowledge. The online gaming approach is to let other
| players vote to kick someone who is lagging the game but that
| probably wouldn't be considered suitable in something like
| enterprise content management.
|
| Can some variation of the approach be taken without the strict
| requirement that: - There is a known set of participants at any
| time - The list of participants has a reasonable well-bounded
| size - Peers can communicate with each other directly (WebRTC
| is available but there are sharp corners) - It's acceptable to
| freeze all players based on one lagging player - Need an
| unambiguous way to handle simultaneous player actions against
| shared state
|
| Most likely the "fix" for the friction is that you just slow
| the ticks down _a lot_ compared to what you would do in a
| multi-player game.
| wongarsu wrote:
| There are probably good techniques worth stealing from FPS or
| MMORPGs. Those also usually run on a fixed tick rate of about
| 50ms, but can't afford to have clients vote on it (FPS
| because lag is unacceptable, MMORPGs because of user count).
| There are various approaches in use, depending on whether you
| want to just degrade the experience of the lagging
| participant, or give preference based on action (e.g. FPS
| resolve conflicts in favor of the person who fired).
| matlin wrote:
| There's actually an approach that sits in between this and formal
| CRDTs.
|
| I had the same insight that having a total ordered log of changes
| solves a lot of the issues with concurrent updates. We solved
| this by creating one sequence CRDT that ensures all events from
| clients eventually end up in the same order and then our data
| structures are just reductions over those events. It's a little
| bit like a distributed, synchronized Redux store. This let us
| avoid having to think in terms of CRDT types (LWW registers,
| Grow-only sets, sequences, etc) and just think in terms of the
| domain-specific business logic (e.g. "what should happen when
| someone marks a todo as done, after it's deleted?").
|
| There's a little but more to so if this is interesting to anyone,
| I can write a more detailed explanation.
| lifeisstillgood wrote:
| How do you solve the ordering? Timestamps can be wildly out of
| whack? Be interested to hear more
| matlin wrote:
| We use logical timestamps instead of datetime.
|
| So each "event" has three properties that allow us to resolve
| to a sensible order across clients: session (integer),
| device_id (uuid), order (integer). The `session` number is
| set by the highest `order` that your device has seen from the
| server and the `order` gets incremented on each new event.
|
| So an example you might have a sequence of events like this.
| [session] [order] 0 1 0 2 0 3 ---sync --- 3 4 3 5 3 6
|
| We can then sort all events by (session, device_id, order)
| and we'll get any events that happen on a device to be sorted
| in a row even if some other device created a bunch of
| concurrent events at the same time.
| Karrot_Kream wrote:
| What about situations where two different clients both
| begin to publish new events from the same starting point?
| Those events can't be absolutely ordered right? If you're
| thinking of Lamport-style happens-before relations, then
| you can't enforce total ordering of those events. Do you
| just arbitrarily mark one of those client event streams as
| failed and force the client to absorb new state and try
| again?
| LAC-Tech wrote:
| Isn't that essentially doing a CRDT, but the replica is the
| whole datastore?
|
| Ties in nicely with event sourcing - the ordered sequence of
| events is the CRDT, and sending events between nodes are the
| delta states.
| derefr wrote:
| Have you considered backing your application's state-
| synchronization with a centralized CQRS/ES event store (e.g.
| https://www.eventstore.com/)? You get the same semantics,
| without paying the overhead of building them on top of a CRDT
| abstraction; and the event store itself also "knows" (in the
| sense that it supplies the framework and you supply the logic)
| how to reduce over states in order to build snapshot states
| ("aggregates") for clients to download for fast initial
| synchronization.
| matlin wrote:
| We basically have that as well! We use CRDTs on the client to
| allow for optimistic updates and fast replication of events
| directly to other clients but we also do exactly what you
| describe on the server so that a user loading the data for
| the first time just gets the "snapshot" and then plays events
| on top of it.
| maclockard wrote:
| I actually talked to the drifting in space team about our
| experience implementing multiplayer. Really cool to see their
| findings!
|
| Our team ended up having a similar take away, we wrote a blog
| post detailing our approach and experience here:
| https://hex.tech/blog/a-pragmatic-approach-to-live-collabora...
|
| For the most part is been surprisingly solid! I was very much
| expecting to need to completely rewrite it by now, but its met
| most of our present needs.
|
| The only real shortcoming is not being able to support line
| specific commenting. While we _could_ support it, any minor
| update to the text would result in the comment being marked
| 'stale' with a high false positive rate. I've considered adopting
| CRDT/OT just for the text editing portion to solve this.
| jawadch93 wrote:
| toxicFork wrote:
| If I want to have an app that needs to eventually persist user
| state to a server but still work well when the user is offline,
| e.g. a personal vocabulary app that can work with multiple
| devices but also can work when the user is offline, so the user
| can still define new words manually on both devices, and then
| when the phone is back online, the state still makes sense in
| both, should I use CRDT?
| eternalban wrote:
| A definitive no is the answer. This complexity is necessary
| only if others care about your shared state aka vocabulary,
| and, multiple write ops are being done on same records at the
| same relative time on different devices. For example, a
| document editor _could_ use CRDTs without looking silly. A CRDT
| is an _eventually consistent_ distributed object but all its
| complexity is there to handle concurrent modifications.
|
| In your application, the consistency requirements are pretty
| weak and easy to meet with a basic client/server architecture
| with a well defined sync point set on its life-cycle. For
| example, sync on boot, sync on net-reconnect, etc. Or, you
| could use a simple P2P protocol so on boot you discover your
| other connected devices and your apps can shake hand and
| exchange notes to sync up as they modify their local state:
| "Hey gang, + "KISS". "Keep it simple ..". Use json if you must.
|
| Further simple characteristics of this app is that the state
| ops on your dictionary are commutative (like a CRDT!). On one
| device you add word "Foo" and on the other you add "Bar". You
| do not care (do you?) that you added them in a certain order,
| so you don't even have to bother with what is rather difficult
| to do in a performant manner: maintain a total ordering over
| the state of the system. (Think sharded ordered-sets..)
| fwip wrote:
| There's a few edge cases that you'd probably need to handle
| (like adding the same word with two different definitions),
| but probably few enough that you could do them ad-hoc and
| punt to the user. ("Hey, merging these two sets failed,
| here's the conflict, which do you want to keep?")
|
| Something fancy like Operational Transforms or CRDT might
| make your life easier if you find a library that's ergonomic
| for your use case, but it's definitely not necessary.
| jamil7 wrote:
| The article talks about this, in the case where your clients
| are syncing with a centralised server you don't necessarily
| need to use CRDTs, and it might be simpler not to.
| layer8 wrote:
| This doesn't address how changes that were committed offline
| are merged on the central server once the clients are online
| again.
| dinosaurdynasty wrote:
| Basic CRDTs are still useful (idempotency, associativity, and
| commutativity are powerful mental tools for all distributed
| systems, even OT) but you can _massively_ simplify the CRDTs
| if you assume a super-peer (aka centralized server all
| updates go through).
|
| Example: last-write-wins register, which is a (v, t) pair,
| normally has a merge function of "return the pair with the
| greatest t (with deterministic tie breaking for same t)", and
| for a super peer just have v's and "the latest value to hit
| the server wins".
| robertlagrant wrote:
| Good shoutout to Automerge in this.
| syrusakbary wrote:
| > CRDTs are designed for decentralized systems where there is no
| single central authority to decide what the final state should
| be. There is some unavoidable performance and memory overhead
| with doing this. Since Figma is centralized (our server is the
| central authority), we can simplify our system by removing this
| extra overhead and benefit from a faster and leaner
| implementation
|
| I think this is on point. It's also super refreshing to see the
| work on Aper [1] [2] (a Rust library implementing state machine
| synchronization across a trusted network). Looking forward next
| series of articles here!
|
| [1]: https://aper.dev/
|
| [2]: https://github.com/drifting-in-space/aper
| fwip wrote:
| Just to clarify, Aper also uses a client/server model, correct?
| It appears as though it uses a central server to totally order
| incoming change requests.
| paulgb wrote:
| Yes, it does.
| paulgb wrote:
| Thanks, Syrus!
| eatonphil wrote:
| Great breakdown of CRDTs vs replicated state machines (like Raft
| or VSR) and the difference between eventually convergent and
| globally orderable workloads, thanks for this Paul!
| satvikpendem wrote:
| Does anyone have any resources on making an offline-first app
| from scratch (I don't want to use stuff like Firebase and more
| importantly want to learn how it's done)? Like a task manager for
| example that can sync with an online account but still works
| completely offline.
| LAC-Tech wrote:
| I swear I've looked into firebase multiple times and I have no
| idea how they actually detect and handle conflicts. They just
| hand-wave that it works offline... not that comforting.
|
| Regardless, there's no one size fits all for offline.
| Approaches that work for append only data don't work for data
| where multiple nodes are doing destructive updates. And even if
| you're doing that, some data is amenable to deterministic
| merging (ie, CRDTs) and some you need a user to step in to
| decide (revision based multi-master systems, prior art being
| Pouch/Couch or indeed git).
|
| Basically if you can tell me what exactly a taskmanager is and
| how it might be updated I might be able to say something more
| helfpul!
| lewisjoe wrote:
| This is the most underrated problem with CRDTs:
|
| > Both paths will allow us to ensure that each replica converges
| on the same state, but that alone does not guarantee that this
| state is the "correct" state from the point of view of user
| intent.
|
| In context of richtext editors, it's easy to converge a rich-text
| JSON into a single state for all collaborators. What's difficult
| is to ensure that the converged state is renderable as richtext.
| For example, is there a table cell that was inserted where a
| column was deleted?
|
| I wrote a summary of this issue with CRDTs here -
| https://writer.zohopublic.com/writer/published/grcwy5c699d67...
| for those interested.
|
| While I'm optimistic it's not impossible to solve, it's
| surprising why most CRDT literatures don't go into these details
| on correctness. Appreciate the author for putting this out.
|
| PS: We currently use OT for collaboration in Zoho Writer
| (https://writer.zoho.com/). I'm looking out for practical CRDT
| ideas that works well with richtext.
| truculent wrote:
| To what extent is this an issue with CRDTs and to what extent
| is it an issue with the data structure(s) used to represent the
| state?
| dkjaudyeqooe wrote:
| I think the short answer is: if you want your CRDTs to
| enforce application constraints you need to bake them into
| the data structure.
| truculent wrote:
| That makes sense. Thank you.
| [deleted]
| jkaptur wrote:
| The intuitively obvious approach here is to have some kind of
| schema-guided or constraint-guided CRDT. That is, you get the
| guarantee that not only do you end up with _valid JSON_ , you
| get the guarantee that f(that_json) == true. I imagine this is
| considerably easier said than done, but I'm curious if there's
| research (or results!) in the area.
| quickthrower2 wrote:
| Parts of the tree with "don't merge children" might be ok,
| then you show the user a conflict dialog when this happens.
|
| Or, inspired by your comment: run f for the tree, if false,
| search for the smallest subset where f is false and show the
| conflict dialog for that. The user chooses "mine" or
| "theirs".
| kevsim wrote:
| We also had the OT vs. CRDT discussion with regards to our
| editor at Kitemaker (kitemaker.co), and also ended up with OT.
| For us there were a number of factors, but (in addition to the
| case you mentioned) it came down to the fact that (at that
| time) most of the CRDT libs like automerge generate huge diffs
| that we'd need to send over the wire and also that the open
| source editor we use (SlateJS) had operations that just slotted
| very nicely into OT.
|
| Your editor feels really nice BTW. Well done!
| expede wrote:
| > I'm looking out for practical CRDT ideas that works well with
| richtext.
|
| Have you seen Peritext from Ink & Switch?
| https://www.inkandswitch.com/peritext/ It's relatively new, but
| is a CRDT aimed at rich text!
| jamil7 wrote:
| Their linked summary mentions Peritext and the current
| limitations.
|
| Agree it's exciting though and you can implement it fairly
| easily yourself by following the blog post as it's so well
| written and explained.
| samwillis wrote:
| Exactly, the "C" in CRDT is a little misleading. They are
| "conflict free" in as much as they are able to merge and
| resolve in the basic sense. That does not mean that the
| resulting state is valid or meets your internal schema.
|
| A good example of this is using Yjs with ProseMirror, it's
| possible for two concurrent edits to resolve to a structure
| that doesn't meet your ProseMirror schema. An example I have
| used before is a <figure> element that can contain a single
| optional <caption> element. It's possible for two users to add
| that caption to the figure concurrently, Yjs will happily merge
| its XMLFragment structures and place two captions in the
| figure. However when this is loaded into ProseMirror it will
| see that the document does not match its schema and dump the
| second caption. Fortunately the captions are ordered and so the
| same state will re resolved by all parties.
|
| This is all ok if you are doing the merging on the front end,
| but if you try to merge the documents on the server, not
| involving ProseMirror, the structure you save, and potentially
| index or save as html, will be incorrect.
|
| Point is, it's still essential with CRDTs to have a schema and
| a validation/resolution process. That or you use completely
| custom CRDTs that encodes into their resolution process the
| validation of the schema.
| lewisjoe wrote:
| Your example is spot on :)
|
| I believe we shouldn't lock our richtext state to
| Prosemirror's specs. The state should better be modelled as a
| CRDT and ensuring correctness should be part of CRDT
| operations. This way we unlock other possibilities like
| server-side merging and the editing library can be swapped
| (eg. lexical)
|
| All this sounds right but it's lot of work.
| lijogdfljk wrote:
| Yea, this is kinda why i don't understand a lot of these off
| the shelf CRDT solutions. Like CRDTs for JSON.
|
| I'm still _trying_ to learn CRDTs; but early on in my
| learning process i had the thought that CRDTs don 't have
| mere data types - as you say, they have Data Type + Conflict
| Resolution bundled into one. It's not enough to make a String
| type, because different types need to be resolved
| differently. Plus some types of resolution have a lot more
| metadata, so you want to choose the best fitting one.
|
| I found that my goal to make SQL + CRDT meant i had to expose
| this combo of Type + Resolution to the end user. It seems
| essential to how the app works.
|
| But maybe i don't know anything, /shrug. I'm still learning
| them.
| toomim wrote:
| Yes! We call this the "Merge Type" of data in the braid.org
| group.
|
| Each datum as both a _data type_ and a _merge type_. The
| programmer just needs to specify these two types, and then
| the programming runtime can choose which synchronization
| algorithm to use.
| naasking wrote:
| > Point is, it's still essential with CRDTs to have a schema
| and a validation/resolution process. That or you use
| completely custom CRDTs that encodes into their resolution
| process the validation of the schema.
|
| I'm surprised that MS's concurrent revisions haven't taken
| off because this is what they do by default: you specify a
| custom merge function for any versioned data type so you can
| resolve conflicts using any computable strategy.
| CGamesPlay wrote:
| I don't think I've seen any implementations of this, but the
| table example you listed is solvable. The insight is to stop
| thinking of the table as a bundle of markup and think about it
| as itself a primitive CRDT type that contains text in the table
| cells.
|
| The columns are a CRDT set of column IDs. The rows are a CRDT
| map of row IDs to CRDT maps of column IDs to cell values, where
| the cell values are again the same type as your document text.
| Column headings are a specially formatted row. Rows are ordered
| using a sequence type.
|
| Removing a row is a normal CRDT map deletion. Removing a column
| is a normal CRDT map deletion from the columns map. Rendering
| the table will ignore map entries that don't correspond to real
| columns.
| georgelyon wrote:
| This comment really needs a "yo dawg" meme.
| CGamesPlay wrote:
| I mean, JSON is objects all the way down, CRDTs are
| conflict-free maps all the way down.
| inglor wrote:
| So with your solution, the inserted table cell would
| "disappear" from what is rendered?
|
| That's not necessarily (depends on the product) great or even
| acceptable user experience. As a user I would certainly want
| a way to know there is an "orphaned" "edited after delete"
| cell to salvage the data inputted or understand the confusion
| between editors.
| jameshart wrote:
| What you would 'certainly want' depends very much on the
| application this is implemented in. The data isn't gone
| from the CRDT table unless the program decides it is, so
| many ui choices are possible.
| georgelyon wrote:
| The best UX resolution to this kind of issue I have seen is
| to display "This table cell was added by Bob concurrently
| to Alice removing its row. How would you like to proceed:
| recover the row or ignore the change?"
| drfuchs wrote:
| Who gets the pop-up? If both Alice and Bob, and they
| simultaneously give conflicting replies, you're right
| back to where you started.
| georgelyon wrote:
| Yes, this is the correct behavior. If Alice and Bob
| disagree about whether or not that row should be in the
| table you aren't going to fix it with math. More often
| than not, the expected outcome will be obvious (i.e.
| spelling correction in a deleted cell can be ignored) and
| if it isn't obvious, there will need to be human
| interaction to determine the best path forward.
| layer8 wrote:
| This isn't really a problem. If Alice and Bob keep
| disagreeing, then it is appropriate that a conflict
| remains unresolved.
| Dylan16807 wrote:
| You're not back to where you started. You've gone from a
| technical conflict to a direct disagreement about what is
| supposed to happen. There's no technical solution to an
| edit war. So maybe ask anew about the conflicting
| answers, or pick one arbitrarily at that point. Or make
| them play rock paper scissors. But it still helps to ask
| _before_ it becomes a direct disagreement.
| jlokier wrote:
| Usually that doesn't happen because their replies don't
| conflict within the time interval of a round trip between
| the two of them, so one can see the other's action first,
| and decide to agree.
|
| But if it does happen, either they get a new popup,
| eventually, saying that their conflict-resolution
| decisions also conflicted and what would they like to do,
| or the system decays to a CRDT-style pre-determined
| resolution rule for the second-order conflict -- with
| similar problems to the original CRDT resolution rule we
| tried to avoid. Things like "give priority to the person
| who chose to keep information instead of deleting", "give
| priority to the person with the biggest random number",
| "merge them by keeping information along with a note in
| the document about the attempt to delete the column", or
| "delete the column and put the edited cell in a note".
| But this time, only when the users conflict after the
| first popup, so it doesn't occur as often.
|
| I think you're describing a metastability effect which
| occurs in all distributed systems that don't use pre-
| determined resolution rules like pure CRDTs. It happens
| in network protocols, distributed transactions, and
| consensus protocols in general. If the process you use to
| resolve depends on input from multiple parties who don't
| have a way to communicate and could choose conflicting
| values, there's a non-zero probability of needing another
| round or to escalate upwards to a higher-level system to
| resolve.
|
| In many technical systems, _provided there 's a way to
| communicate_ it's not difficult to make the probability
| of repeated rounds or escalations tend arbitrarily close
| to zero (but not actually zero). If there's a possibility
| of network partition, though, the conflict can come back
| later when communication returns; there's no way to
| completely make it go away.
| tomhallett wrote:
| Stupid question - are there any good resources on User
| Experience patterns/recipes for implementing OT/CRDTs?
|
| Context - a lot of the literature is about the backend
| algorithm. But that still leaves individual teams
| tripping over the frontend interaction design and
| usability.
|
| Example of what I'm looking for - hex.tech has a video
| tutorial of their "Cell locking" feature, where the cell
| gets disabled and a "Mac Lockard has taken control of a
| cell you were editing" toast notification appears. [1].
| This is fully baked and a recipe which if I put in a jira
| ticket could actually get implemented. :)
|
| I would love to read a book which is a "Don't make me
| think" and has an example with "This table cell was added
| by Bob concurrently to Alice removing its row. How would
| you like to proceed: recover the row or ignore the
| change?". All the UI's in my head would look like
| "Clippy" from MS Word - or "Pipey" from Silicon Valley
| [2], haha.
|
| 1: https://hex.tech/blog/a-pragmatic-approach-to-live-
| collabora...
|
| 2: https://www.youtube.com/watch?v=E2YcOV5C2x4
| toomim wrote:
| This is close: https://decentpatterns.com/library/
|
| ...but my general belief is that UI patterns need to
| evolve as (1) specific UIs before they become (2) general
| patterns, and we simply don't have enough good solutions
| yet to make a pattern library out of them.
| jdvh wrote:
| For the table situation we solve this by have a generic concept
| of a "card" that is rendered as a table cell when it's the
| grandchild of a table (table > tr > td) but otherwise it's just
| a standalone item, much like a markdown callout.
| aboodman wrote:
| This is a cool approach. It reminds me of statebox by mochimedia:
| https://github.com/mochi/statebox.
|
| If I'm understanding correctly, it requires the mutations to be
| deterministic in order for the nodes to converge.
|
| Replicache (replicache.dev - my thing) takes a similar approach
| except it does _not_ requires the mutations to be deterministic,
| which is very useful because it enables the mutations to reach
| out to other services, be authenticated, etc.
|
| Both the idea here and Replicache's approach are closely related
| to game networking. If you are interested in these ideas, a
| really excellent set of content is:
| https://www.gabrielgambetta.com/client-server-game-architect....
|
| If you are interested in how Replicache's sync protocol works and
| achieves transactional changes with server authority, it is
| described here: https://doc.replicache.dev/concepts/how-it-works.
| anonymousDan wrote:
| Am I missing something? I'm all for SMR when possible but the
| whole point of moving to something with weaker consistency is
| that you don't want to rely on availability of a central server
| (or quorum of servers in the case of SMR)?
| LAC-Tech wrote:
| First off, well done on examining your problem domain, realising
| that your operations are not in-fact commutative, and realising
| you need to discard your model. A lot of people would have just
| decided it was 'agile' to 'iterate' over there fundamentally
| broken solution until they had coded themselves in a corner, but
| I digress.
|
| I would note that your path 2 is essentially another kind of CRDT
| though. Instead of your replica being in individual document,
| your replica is now the entire DB, and your changes are a grow
| only sequence. (Wish I could remember how pointed this out to me
| on HN, my mind was blown).
| wim wrote:
| For certain data structures the "pure" CRDT approach can get
| tricky. In our case, we're building a collaborative notes/task
| IDE [1] which is a hybrid of a text editor and an outliner at the
| same time. So you can perform all the usual text editor
| operations on a document as if it was text, but the underlying
| data structure is a tree (or graph really).
|
| So just like the example in article we use a hybrid approach and
| global order for certain (tree) operations to enforce invariants
| on the data structure when syncing. That still works with
| offline-mode and end-to-end encryption, but as we don't have the
| requirement to get rid of a central server we can get away with
| this model.
|
| [1] https://thymer.com
| rubyist5eva wrote:
| "You might not (probably need) need $FANCY_TRENDY_THING"
| acrefoot wrote:
| Whatever Clickup uses for their Clickup docs, it reliably gets
| into a confused state and data is lost, during which not all
| editors show the same state locally.
| hardwaregeek wrote:
| I really loved the Signals and Thread episode[0] on state machine
| replication. It basically said "you could do all of these fancy
| distributed systems techniques that assume a non total ordering
| of events, ooooorr you could have a main server that determines a
| total ordering for events and make your life infinitely easier"
|
| [0]: https://signalsandthreads.com/state-machine-replication-
| and-...
| Joeri wrote:
| It breaks down as soon as you have an offline scenario. You can
| only assume the server is authorative if there will not be
| network partitions or if writes are not accepted on the other
| side of the network partition (meaning on the client). As soon
| as the client accepts writes while offline those writes become
| authorative and will need to be merged with whatever changes
| the server saw.
| layer8 wrote:
| It also breaks down when you have a passive/non-transactional
| server, like an app syncing its state via files on Dropbox.
| derefr wrote:
| Or even just have a simple central presence server hand out
| randomized-per-epoch node-IDs; have all nodes in the system use
| node-IDs as the prefix of the event temporal ordering key per
| "tick"; and trust your nodes to self-report their node-ID
| correctly in all events they send. Then your data protocol can
| still be p2p rather than all events having to flow through the
| server (if that benefits you in your problem domain) while
| getting all the same advantages of total ordering.
|
| And this works great for anything enterprise-y, where all the
| nodes are operated by one party or a trusted consortium of
| parties, and no nodes are are intentionally trying to cheat by
| misreporting their assigned node-IDs. You only _need_ central
| servers (or virtual central servers, a.k.a. blockchains) when
| you get to things like MMO games, or HFT trading bots talking
| to an exchange, where every client has a reason to want to
| pretend it 's the one that "should" go earliest in the per-tick
| transaction ordering.
| lifeisstillgood wrote:
| But if I have a conflict (ie one node changes a column
| datatype, another adds data to column) how do i know which
| one went first _inside the same tick?_
|
| If I understand you, there will be a tick between 12.01 and
| 12.02, and node A gets token 1234, and node B gets 5678.
| Events are published p2p, and the ordering is 5678_+9seconds
| and 1234_+12seconds - node A (1234) admits it has done the
| action 3 seconds after B. But this still depends on
| synchronised clocks I think?
|
| Also you still need to resolve each type of conflict (hence
| having CRDTs really helps) - we know there is a conflict
| between A and B changes, but how do we _resolve_ them?
|
| Maybe I dont get it :-)
| meheleventyone wrote:
| Yeah using an event stream like this requires synchronising
| the clock of participants. It's similar to rollback
| networking in games. Peers can recover from receiving late
| updates but if all the updates are late from one peer the
| situation still degrades. To avoid that participants run
| time per tick slightly shorter or longer to keep within a
| desired window.
| pookha wrote:
| I thought James Long's 200 or three hundred lines of code to
| implement a real-world CRDT solution (Actual) was pretty cool.
| Seemed to solve a lot of issues revolving around heavy duty
| servers.
| [deleted]
| mathgladiator wrote:
| Another reason to not need a CRDT is privacy. A CRDT assumes that
| every participate has equal rights to all data.
|
| I'm biased to simple WebSockets with OT using JSON deltas, and
| I'm building a platform that greatly simplifies how to
| efficiently build collaborative apps: https://www.adama-
| platform.com/
| CGamesPlay wrote:
| These are orthogonal to one another. For read-protection,
| encryption is a viable solution: specific paths into your data
| structure (e.g. certain keys in a map) can be encrypted with a
| key that isn't provided to all participants. For write-
| protection, the solution is the same as with any syncing
| service: when receiving an update, if you don't authorize the
| sender of the update, you don't accept it.
| samwillis wrote:
| > A CRDT assumes that every participate has equal rights to all
| data.
|
| Within a single data structure yes, so within a "room" or
| "document". If you need to partition right to the data
| (read/update) then you should use separate structures for each
| area.
|
| Yjs implements this as as "sub documents".
| charles_f wrote:
| That sounds like a weak counter argument to me, 1) crdt is
| focusing on the conflict resolution part, access control is not
| in scope so you need to implement on top. 2) if you are already
| implementing access control, then filtering out what people
| don't have access to doesn't seem much more complexity. If you
| go the full client-side way, you can also add a cryptographic
| layer to whatever needs hiding 3) one thing I don't see in your
| solution is the offline aspect to it. With a central authority
| in the middle and online connectivity, conflicts become
| unlikely and much smaller, but with offline support, documents
| et can evolve in drastic ways where having a formalized
| strategy like what crdt offers makes sense
| pookha wrote:
| CRDT's can be encrypted end-to-end. Privacy seems to be one of
| the selling points.
| pattrn wrote:
| Is this really the case? While CRDT's are designed to work
| peer-to-peer, they don't need to be fully connected to all
| clients. Forcing the synchronization through controlled nodes
| (a server or cluster) allows adding read/write permissions.
| Depending on the use case, it may require additional logic for
| reversing operations before propagating to other clients, or in
| some cases forcing a client to revert to a snapshot (this can
| be a bit complex). That's an approach I've used in the past.
|
| Have I overlooked something (highly likely)?
| dxchester wrote:
| We have used Automerge a bunch, but found that there is a
| threshold where beyond a given document size, performance gets
| exponentially worse, until even trivial updates take many
| seconds' worth of CPU. That is often how it works when the
| document end state is exclusively the sum of all the edits that
| have ever happened.
|
| Our answer was to reimplement the Automerge API with different
| mechanics underneath that allows for a "snapshots + recent
| changes" paradigm, instead of "the doc is the sum of all
| changes". That way performance doesn't have to degrade over time
| as changes accumulate.
|
| Project is here: https://github.com/frameable/pigeon, with some
| benchmarks: https://github.com/frameable/pigeon/wiki/Benchmarks
| in the wiki...
| supermatt wrote:
| > Conflict-free replicated datasets (CRDTs)
|
| Conflict-free Replicated Data Types, surely?
| paulgb wrote:
| Oof, in the first sentence, too, that's embarrassing. Fixed,
| thanks!
| supermatt wrote:
| No problem :)
| nobu-mori wrote:
| CRDTs have been on HN a lot recently. I'm working on a database
| that deals in events rather than raw data. Application developers
| specify event handlers in JavaScript. The database layer then
| takes the event handlers and distributes them to the clients.
| Clients receive the raw stream of events and reconcile the final
| data state independently. The key aspect is all the event
| handlers are reversible. This allows clients to insert locally
| generated events immediately. If any remote events are received
| out-of-order, the client can undo events, insert the new events,
| and reapply everything on top of the new state.
|
| I'm curious how many people have a need for solving the real-time
| multi-player use case. The database I'm working on was inspired
| by the networking code used in fighting games (rollback netcode),
| but I'm always curious to learn about additional use cases.
| no_wizard wrote:
| I think the most common outside of gaming is collaboration
| software. Think whiteboards, docs (e.g. Google Docs) and other
| collaborative tools.
___________________________________________________________________
(page generated 2022-12-05 23:00 UTC)