[HN Gopher] Homomorphically Encrypting CRDTs
___________________________________________________________________
Homomorphically Encrypting CRDTs
Author : jakelazaroff
Score : 181 points
Date : 2025-06-18 12:59 UTC (10 hours ago)
(HTM) web link (jakelazaroff.com)
(TXT) w3m dump (jakelazaroff.com)
| qualeed wrote:
| It doesn't actually say it anywhere, so: CRDT = Conflict-free
| replicated data type.
|
| https://en.wikipedia.org/wiki/Conflict-free_replicated_data_...
| Xeoncross wrote:
| e.g. commonly used by applications that need to deal with
| multiple people editing a document.
| MangoToupe wrote:
| ...or, really, any kind of conflict-free syncing.
| Joker_vD wrote:
| > Now, however, the server can no longer understand the changes
| you send. If you want to see your friend's latest changes, you'll
| need to both be online at the same time.
|
| What? No, the server sends you the changes you've not seen yet,
| you decrypt and merge them, and so you get the latest version of
| the document. Right?
|
| The homomorphic encryption is a fascinating topic, but it's
| almost never an answer if you need anything resembling reasonable
| performance and/or reasonable bandwidth.
|
| I've seen a paper that ingeniously uses homomorphic encryption
| to implement arbitrary algorithmic computations, totally secret,
| by encoding a (custom-crafted) CPU together with RAM and then
| running a "tick the clock" algorithm on them. And it works, so
| you can borrow some huge AWS instance and run your super-important
| calculations there -- at 1 Hz. I am not kidding: it's literally 1
| virtual CPU instruction per second. Well, if you are okay with
| such speed and costs, you either have very small data -- at which
| point just run your computation locally -- or you're really,
| really rich -- at which point just buy your own goddamn hardware
| and, again, run it locally.
| zawaideh wrote:
| If the server can't operate on the content, it can't merge it
| into the CRDT documents. Which means it would need to send and
| receive the entire state of the CRDT with each change.
|
| If the friend is online then sending operations is possible,
| because they can be decrypted and merged.
| Joker_vD wrote:
| I... still can't make heads or tails out of this description.
| Let me restate how I understand the scheme in TFA: there are
| two people, editing on the same document using CRDTs. When
| one person makes an edit, they push an encrypted CRDT to the
| sync server. Periodically, each of them pulls edits made by
| the other from the sync server, apply them to their own copy,
| and push the (encrypted) result back. Because of CRDT's
| properties, they both end up with the same document.
|
| This scheme doesn't require the two people to be online
| simultaneously -- all updates are mediated via the sync
| server, after all. So, where am I wrong?
| eightys3v3n wrote:
| I think the difference in understanding is that the article
| implies, as I understand it, that the server is applying
| the changes to the document when it receives a change
| message, not the clients. If the clients were applying the
| changes then we wouldn't need homomorphic encryption in the
| first place. The server would just store a log of all
| changes, cleaning it up once it was sure everyone had
| replayed them, if that is possible. Without homomorphic
| encryption, the server must store all changes since some
| full snapshot, plus the full snapshot itself. Whereas
| with it, the server only ever stores the most recent copy
| of the document.
|
| This could be done to reduce the time required for a client
| to catch up once it comes online (because it would need to
| replay all changes that have happened since it last
| connected to achieve conflict-free modification). But
| the article also mentions something about keeping the
| latest version quickly accessible.
| Joker_vD wrote:
| Article literally starts with One way
| to solve this is end-to-end encryption. You and your
| friend agree on a secret key, known only to each
| other. You each use that key to encrypt your
| changes before sending them, decrypt them upon receipt,
| and no one in the middle is able to listen in.
| Because the document is a CRDT, you can each
| still get the latest document without the sync server
| merging the updates. That is indeed
| a solution,
|
| but then for some reason claims that this scheme
| requires both parties to be online simultaneously. No,
| it doesn't, unless this scheme is (tacitly) supposed to
| be directly peer-to-peer which I find unlikely: if it
| were P2P, there would be no need for "the sync server" in
| the first place, and the description clearly states that
| in this scheme it doesn't do anything with document
| updates except for relaying them.
| jakelazaroff wrote:
| Hi, author here! The scenario was meant to be high level
| -- I guess I should have gotten more into the various
| architectures and tradeoffs, but the article is already
| pretty long.
|
| The way I see it there are a couple of ways this can
| shake out:
|
| 1. If you have a sync server that only relays the updates
| between peers, then you can of course have it work
| asynchronously -- just store the encrypted updates and
| send them when a peer comes back online. The problem is
| that there's no way for the server to compress any of the
| updates; if a peer is offline for an extended period of
| time, they might need to download a ton of data.
|
| 2. If your sync server can merge updates, it can send
| compressed updates to each peer when it comes online. The
| downside, of course, is that the server can see
| everything.
|
| Ink & Switch's Keyhive (which I link to at the end)
| proposes a method for each peer to independently agree on
| how updates should be compressed [1] which attempts to
| solve the problems with #1.
|
| [1] https://github.com/inkandswitch/keyhive/blob/main/design/sed...
| killerstorm wrote:
| There's another option: let the clients do the
| compression. I.e. a client would sign & encrypt a message
| "I applied messages 0..1001 and got document X". Then
| this can be a starting point, perhaps after it's signed
| by multiple clients.
|
| That introduces a communication overhead, but is still
| likely to be orders of magnitude cheaper than homomorphic
| encryption.
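| A rough sketch of that idea (hypothetical Python; the message
| format, key handling, and quorum rule are all assumptions, not
| from the comment): each client signs a compaction claim of the
| form "I applied ops 0..N and got state X", and the server can
| drop the op log once enough clients have co-signed.

```python
# Hypothetical sketch of client-side compaction via signed snapshot
# attestations. HMAC stands in for real signatures; in practice each
# client would use an asymmetric key pair.
import hashlib
import hmac
import json

def attest(key: bytes, upto: int, state: str) -> bytes:
    """Sign the claim 'I applied ops 0..upto and got state'."""
    claim = json.dumps({"upto": upto, "state": state}).encode()
    return hmac.new(key, claim, hashlib.sha256).digest()

def verified(key: bytes, upto: int, state: str, sig: bytes) -> bool:
    return hmac.compare_digest(attest(key, upto, state), sig)

alice_key, bob_key = b"alice-secret", b"bob-secret"

# Both clients independently reach the same state after op 1001.
sigs = [attest(alice_key, 1001, "hello world"),
        attest(bob_key, 1001, "hello world")]

# Server side: once every peer has co-signed the snapshot, the op
# log up to 1001 can be discarded and the snapshot becomes the new
# starting point.
quorum = (verified(alice_key, 1001, "hello world", sigs[0]) and
          verified(bob_key, 1001, "hello world", sigs[1]))
print(quorum)  # True
```

| As the reply below this comment notes, agreeing on which
| snapshot wins is where the real difficulty hides.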
| vhcr wrote:
| Now you need a consensus algorithm.
| nixpulvis wrote:
| I would really love this to be added to the article (or a
| followup), since it was my conclusion as well, and most
| readers are going to be thinking the same thing.
|
| I'd also love to know how balancing the trade off of
| compute time between FHE and the bloat of storing large
| change sets affects latency for online and offline cases.
|
| Perhaps, as with many things, a hybrid approach would be
| best suited for online responsiveness and offline network
| and storage use?
|
| Admittedly, I haven't read the linked research papers at
| the end. Perhaps they have nice solutions. Thanks for
| that.
| jakelazaroff wrote:
| Okay, I updated it to hopefully clarify!
| josephg wrote:
| > The problem is that there's no way for the server to
| compress any of the updates; if a peer is offline for an
| extended period of time, they might need to download a
| ton of data.
|
| There are ways to solve this, using deterministic message
| chunking. Essentially clients compress and encrypt
| "chunks" of messages. You can use metadata tags to tell
| the server which chunks are being superseded. This is
| fast and efficient.
|
| Alex Good gave a talk about how he's implementing this in
| automerge at the local first conference a few weeks ago:
|
| https://www.youtube.com/watch?v=neRuBAPAsE0
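| The shape of that scheme might look something like this
| (hypothetical Python; the chunk-ID and supersedes metadata
| design is an illustration, not Automerge's actual format):
| chunk IDs are content hashes, so every client derives the same
| ID for the same ops, and a "supersedes" tag lets the blind
| server garbage-collect without reading anything.

```python
# Hypothetical sketch of deterministic message chunking: clients pack
# runs of ops into chunks named by content hash and tag each new chunk
# with the chunk IDs it supersedes. The server stores opaque payloads
# and drops superseded chunks without decrypting anything.
import hashlib
import json

def chunk_id(ops) -> str:
    # Deterministic: every client derives the same ID for the same ops.
    return hashlib.sha256(json.dumps(ops).encode()).hexdigest()[:12]

server = {}  # chunk_id -> (opaque_payload, supersedes)

def upload(ops, supersedes=()):
    cid = chunk_id(ops)
    # In practice the payload would be compressed and encrypted.
    server[cid] = (json.dumps(ops), tuple(supersedes))
    for old in supersedes:  # server-side GC, guided only by metadata
        server.pop(old, None)
    return cid

c1 = upload(["ins a", "ins b"])
c2 = upload(["ins c"])
# A compacted chunk replaces both earlier ones:
c3 = upload(["ins a", "ins b", "ins c"], supersedes=[c1, c2])
print(list(server) == [c3])  # True
```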
| jason_oster wrote:
| The protocol half of most real-world CRDTs does not send
| the raw stream of changes. It prefers to compress
| changes into a minimal patch set. Each patch set is
| specific to individual peers, based on the state of their
| local CRDT at merge time.
|
| The naive raw stream of changes is far too inefficient
| due to the immense amount of overhead required to
| indicate relationships between changes. Changing a single
| character in a document needs to include the peer ID
| (e.g., a 128-bit UUID, or a public key), a change ID
| (like a commit hash - also about 128-bit), and the
| character's position in the document (usually a reference
| to the parent's ID and relative marker indicating the
| insert is either before or after the parent).
|
| The other obvious compression is deletions. They will be
| compressed to tombstones so that the original change
| messages for deleted content do not need to be relayed.
|
| And I know it is only implied, but peer to peer
| independent edits are the point of CRDTs. The "relay
| server" is there only for the worst case scenario
| described: when peers are not simultaneously available to
| perform the merge operation.
| ath92 wrote:
| Generally, this is not really true. The point of CRDTs is
| that as long as all parties receive all messages (in any
| order), they should be able to recreate the same state.
|
| So instead of merging changes on the server, all you need is
| some way of knowing which messages you haven't received yet.
| Importantly this does not require the server to be able to
| actually read those messages. All it needs is some metadata
| (basically just an id per message), and when reconnecting, it
| needs to send all the not-yet-received messages to the
| client, so it's probably useful to keep track of which client
| has received which messages, to prevent having to figure that
| out every time a client connects.
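| A minimal sketch of that relay (hypothetical Python; the API
| names are made up): the server handles only message IDs and
| opaque ciphertexts, and remembers which IDs each client has
| already been sent.

```python
# Hypothetical sketch of a blind relay: it never sees plaintext, only
# per-message IDs (metadata) and opaque ciphertexts, and it tracks
# which IDs each client has already received.
import itertools

_ids = itertools.count(1)
log = {}        # msg_id -> opaque ciphertext
delivered = {}  # client name -> set of msg_ids already sent

def publish(ciphertext: bytes) -> int:
    mid = next(_ids)
    log[mid] = ciphertext
    return mid

def sync(client: str) -> dict:
    """Return every message this client hasn't received yet."""
    seen = delivered.setdefault(client, set())
    pending = {mid: log[mid] for mid in log if mid not in seen}
    seen.update(pending)
    return pending

publish(b"<encrypted op 1>")
publish(b"<encrypted op 2>")
first = sync("bob")   # bob gets both messages on reconnect
again = sync("bob")   # nothing new on the next connect
print(len(first), len(again))  # 2 0
```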
| charcircuit wrote:
| If it takes 1 second per merge as per the article, it
| sounds like a poor user experience: when new people join,
| they have to wait hundreds or thousands of seconds to get
| the doc.
| ctde wrote:
| it was 0.5 ns .. 1s was the FHE case
| josephg wrote:
| You're talking past each other. These are both valid
| descriptions of CRDTs - just different types of CRDTs.
|
| Generally there's two categories of CRDTs: state based and
| operation based CRDTs.
|
| State based CRDTs are like a variable which is set to a new
| value each time it changes. (Think CouchDB if you've used
| it). In that case, yes, you generally do update the whole
| value each time.
|
| Operation based CRDTs - used in things like text editing -
| are more complex, but like the parent said, deal with
| editing events. So long as a peer eventually gets all the
| events, they can merge them together into the resulting
| document state. CRDTs have a correctness criterion: the
| same set of operations always merges into the same
| document, on all peers, regardless of the order you get the
| messages.
|
| Anyway, I think the parent comment is right here. If you
| want efficient E2E encryption, using an operation based
| crdt is probably a better choice.
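| The two flavors can be sketched side by side with a grow-only
| counter (hypothetical Python, for illustration): state-based
| peers exchange whole count vectors and merge by pointwise max,
| while op-based peers exchange increment events that commute.

```python
# Hypothetical sketch of the two CRDT flavors, using a grow-only
# counter (G-counter).

# State-based: the state is a per-replica count vector and the merge
# is a pointwise max, which is commutative, associative, idempotent.
def merge_state(a: dict, b: dict) -> dict:
    return {r: max(a.get(r, 0), b.get(r, 0)) for r in a | b}

# Op-based: ops are ("inc", replica) events; applying them is
# commutative, so delivery order doesn't matter.
def apply_ops(state: dict, ops) -> dict:
    for _, replica in ops:
        state[replica] = state.get(replica, 0) + 1
    return state

# Both roads lead to the same total:
s = merge_state({"alice": 2}, {"alice": 1, "bob": 3})
o = apply_ops({}, [("inc", "alice")] * 2 + [("inc", "bob")] * 3)
print(sum(s.values()), sum(o.values()))  # 5 5
```

| One caveat the sketch glosses over: op-based counters need
| exactly-once delivery, since the increment op is commutative
| but not idempotent, whereas the state-based merge tolerates
| duplicates.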
| clawlor wrote:
| There are variants of CRDTs where each change is only a state
| delta, or each change is described in terms of operations
| performed, which don't require sending the entire state for
| each change.
| blamestross wrote:
| > Which means it would need to sending and receiving the
| entire state of the CRDT with each change.
|
| > If the friend is online then sending operations is
| possible, because they can be decrypted and merged.
|
| Or the user's client can flatten un-acked changes and tell
| the server to store that instead.
|
| It can just always flatten until it hears back from a peer.
|
| The entire scenario is over-contrived. I wish they had just
| shown it off instead of making the lie of a justification.
| killerstorm wrote:
| It's very common for CS and cryptography-adjacent papers to
| describe something impractical. Even more impractical than what
| you described - e.g. complexity of an attack is reduced from
| 2^250 to 2^230.
|
| The purpose of these papers is to map out what's possible, etc,
| which might at some point help with actual R&D.
| teleforce wrote:
| > Runtime performance is also -- to put it lightly -- lacking.
| I benchmarked the unencrypted and encrypted versions of the
| last write wins register on an M4 MacBook Pro. The unencrypted
| one averaged a merge time of 0.52 nanoseconds. The encrypted
| one? 1.06 seconds. That's not a typo: the homomorphically
| encrypted merge is two billion times slower.
|
| Ouch!
| thrance wrote:
| Another cool use of FHE, allowing you to browse Wikipedia
| privately while online: https://news.ycombinator.com/item?id=31668814
| plopilop wrote:
| As the article mentions, fully homomorphic encryption is insanely
| slow and inefficient. But I have to say that it is a relatively
| new field (the first FHE scheme was discovered in 2009), and that
| the field has immensely progressed over the last decade and a
| half.
|
| The first FHE schemes required keys of several TB/PB, and
| bootstrapping (an operation pivotal to FHE schemes, performed
| when too many multiplications have been computed) would take
| thousands of hours. We are now down to keys of "only" 30 MB,
| and bootstrapping in less than 0.1 second.
|
| Hopefully progress will continue and FHE will become more
| practical.
| 6r17 wrote:
| CRDTs are also crazy slow due to their architecture ; even the
| best alg out there are costly by design ; so adding homomorphic
| encryption is even more of a challenge ; tough it really is
| impressing I'm curious if this can be usable at all;
|
| edit so i bring some "proof" of my claim: from this very page :
| `To calculate the new map, the server must go through and merge
| every single key. After that, it needs to transfer the full map
| to each peer -- because remember, as far as it knows, the
| entire map is different.`
| __MatrixMan__ wrote:
| Is it the CRDT that's slow there, or is the problem that
| they've made it one party's job to update everybody?
|
| By having a server in the mix it feels like we're forcing a
| hub/spoke model on something that wants to be a partial mesh.
| Not surprising that the hub is stressed out.
| svieira wrote:
| The whole point of Conflict-free Replicated Data Types is
| that you don't need an authoritative server. You're
| thinking of Operational Transform which does require an
| authority.
| __MatrixMan__ wrote:
| I may have worded that poorly but my intent was to
| suggest that it would work better if there wasn't a
| server.
|
| I didn't know the name for the serverful alternative
| though, so thanks for that.
| dtkav wrote:
| While it is true that CRDTs don't require an
| authoritative server, the hub and spoke model (which
| could also be thought of as having a well-known always-
| online super peer) is more efficient and provides a
| better user experience. In practice most products that
| are built with CRDTs today use this model.
| Asraelite wrote:
| > CRDTs are also crazy slow due to their architecture
|
| What kinds of CRDTs specifically are you referring to? On its
| own this statement sounds far too broad to be meaningful.
| It's like saying "nested for loops are crazy slow".
| jason_oster wrote:
| CRDTs are not inherently "crazy slow". Researchers just don't
| succumb to the appeal of premature optimization.
|
| See: https://josephg.com/blog/crdts-go-brrr/
|
| (And even these optimizations are nascent. It can still get
| so much better.)
|
| The section you quoted describes an effect of homomorphic
| encryption alone.
|
| There is the problem that both CRDTs and encryption add some
| overhead, and the overhead is additive when used together. But
| I can't tell if that is the point you are trying to make.
| hansvm wrote:
| > additive
|
| The overhead is usually multiplicative per-item. Let's say
| you're doing N things. CRDTs make that O(Nk) for some
| scaling factor k, and adding encryption makes it O(Nkj) for
| some scaling factor j.
|
| Give or take some multiplicative log (or worse) factors
| depending on the implementation.
| josephg wrote:
| Yep. Author here - that article is out of date now. I
| should really do a followup. Performance of CRDTs has
| improved again through a new grab bag of tricks. I've also
| been told the beta of automerge 3 uses a lot of the
| optimisations in that post, and it's now much faster as a
| result.
|
| A crdt library should be able to handle millions of changes
| per second. If it's the bottleneck, something somewhere has
| gone wrong.
| motorest wrote:
| > CRDTs are also crazy slow due to their architecture ;
|
| You must back up your extraordinary claim with some
| extraordinary evidence. There is nothing inherently slow in
| CRDTs.
|
| Also, applying changes is hardly on anyone's hot path.
|
| The only instance where I saw anyone complaining about CRDT
| performance, it turned out to be from very naive
| implementations that tried to spam changes with overly chatty
| implementations. If you come up with any code that requires a
| full HTTPS connection to send a single character down the
| wire, the problem is not the algorithm.
| MangoToupe wrote:
| > CRDTs are also crazy slow
|
| compared to what? c'mon
| westurner wrote:
| Should students trust and run FHE encrypted WASM or JS grading
| code that contains the answers on their own Chromebooks; for
| example with JupyterLite and ottergrader?
|
| On code signing and the SETI@home screensaver
| gritzko wrote:
| The first CRDTs were remarkably impractical, e.g. WOOT[0].
| These days, state-of-the-art CRDT databases are not much
| different from your regular LSM performance-wise. For example,
| RDX CRDTs[1,2] are all implemented by a merge-sort-like
| algorithm, pure O(N). Metadata overheads have been tamed in
| most implementations.
|
| [0]: https://github.com/el10savio/woot-crdt
|
| [1]: https://github.com/gritzko/librdx
|
| [2]: https://github.com/gritzko/go-rdx
| 867-5309 wrote:
| _Conflict-Free Replicated Data Type_
| scyclow wrote:
| Encrypt the diffs for the server and write the hash to a
| blockchain to manage the ordering. Boom, problem solved without
| HME.
| ergl wrote:
| You don't need ordering guarantees for diffs: apply them out of
| order and the final result should be the same (that's one of
| the key properties of CRDTs).
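| That property is easy to check with a toy example
| (hypothetical Python): for a grow-only set, each diff is just
| the set of added elements, and applying diffs in any order
| converges because set union is commutative, associative, and
| idempotent.

```python
# Quick demonstration that CRDT diffs can be applied out of order:
# a grow-only set whose apply operation is set union.
from functools import reduce

def apply_diff(state: frozenset, diff: frozenset) -> frozenset:
    return state | diff  # commutative, associative, idempotent

diffs = [frozenset({"a"}), frozenset({"b", "c"}), frozenset({"d"})]

forward = reduce(apply_diff, diffs, frozenset())
backward = reduce(apply_diff, list(reversed(diffs)), frozenset())
print(forward == backward)  # True
```

| So the relay needs durability and eventual delivery, not an
| ordering service like a blockchain.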
| NetRunnerSu wrote:
| FHE is indeed slow, but the progress since 2009 is truly
| remarkable. Bootstrapping speed alone improved by tens of
| millions of times, and tfhe-rs already demonstrates homomorphic
| AES-128. Real-time FHE for AI inference/training feels
| increasingly plausible.
|
| > https://github.com/sharkbot1/tfhe-aes-128
| ProofHouse wrote:
| Can I double click on 'plausible' ?
| ketralnis wrote:
| Sure but it will just select the word in your browser
| throitallaway wrote:
| Someone's a Chamath fan.
| mihau wrote:
| Sorry for going off-topic, but kudos for UI/UX on your blog!
|
| To name a few: Nice style (colors, font, style), "footnotes"
| visible on the margin, always on table of contents, interactivity
| and link previews on hover.
|
| Nice. What's your tech stack?
| somezero wrote:
| FHE is simply the wrong tool here. FHE is for a central server
| operating on data held/known by another party. They want MPC --
| multiple parties jointly computing on distributed data -- and
| that's considerably more efficient.
| hoppp wrote:
| I agree. I also often want to use FHE only to realize there are
| better tools for the job and better ways to do things.
|
| The use-case is pretty niche
| meindnoch wrote:
| I like it! The slowness and inefficiency of homomorphic
| encryption is nicely complemented by the storage bloat of CRDTs.
| yusina wrote:
| I'm not sure I get the premise of the article. I know what a CRDT
| is and how homomorphic encryption works. But why do both parties
| have to be online at the same time to sync? They could send the
| updates asynchronously, store-and-forward style, and everything
| in flight is encrypted. Why does this need a server that keeps
| state (which is kept encrypted and modified, as per the
| proposal)?
| noam_k wrote:
| I think the issue here is that the server would have to store a
| copy of the register per peer, as it can't calculate which one
| is the most recent. Using FHE allows the server to hold a
| single copy.
|
| In other words the server could forward and not store if all
| parties are always online (at the same time).
| yusina wrote:
| So it's "just" a storage optimization?
| neon_me wrote:
| The server will store an encrypted blob and its hash/etag.
|
| Before uploading data, the client checks the hash/etag of the
| blob it originally fetched. If the blob on the server has a
| different one, the client downloads it, decrypts it, patches
| the new data onto the existing state, encrypts, and reuploads.
|
| What's the catch?
|
| AES is hardware accelerated on most devices -- so even with
| all those ops it will be significantly faster than any
| homomorphic encryption nowadays.
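| That scheme is essentially optimistic compare-and-swap on one
| blob. A minimal sketch (hypothetical Python; encryption is
| stubbed out, and the merge is a stand-in for a real CRDT
| merge):

```python
# Hypothetical sketch of etag-based sync: the server holds one opaque
# blob plus its etag; a client re-fetches, merges locally, and retries
# whenever its etag is stale. In practice the blob would be AES-GCM
# ciphertext under a key the server never sees.
import hashlib

server = {"blob": b"", "etag": hashlib.sha256(b"").hexdigest()}

def fetch():
    return server["blob"], server["etag"]

def cas_put(blob: bytes, expected_etag: str) -> bool:
    """Compare-and-swap: fail if someone else uploaded first."""
    if server["etag"] != expected_etag:
        return False
    server["blob"] = blob
    server["etag"] = hashlib.sha256(blob).hexdigest()
    return True

def sync_change(my_change: bytes) -> bytes:
    while True:
        blob, etag = fetch()          # download + (decrypt)
        merged = blob + my_change     # stand-in for a real CRDT merge
        if cas_put(merged, etag):     # (encrypt +) upload
            return merged             # else: stale etag, loop and retry

sync_change(b"[alice:1]")
sync_change(b"[bob:2]")
print(server["blob"])  # b'[alice:1][bob:2]'
```

| The catch, such as it is, is that each sync round-trips the
| whole blob, and under heavy contention clients can livelock on
| retries.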
| nixpulvis wrote:
| I too was wondering the same thing. FHE is cool tech, but
| this seems to me to be a bad application of it since it
| will undoubtedly be less efficient.
|
| FHE is useful when trying to compute on data from various
| sources who all mutually want to keep some information
| secret. For example, Apple's use of FHE to categorize
| photos [1]. In this case, all the server is really doing is
| "compressing" the change sets, for lack of a better word,
| so each offline client doesn't need to sync every message;
| they are already merged by the server.
|
| If all you want is to keep a synchronizing server in the
| dark, but all clients can be trusted with the unencrypted
| data, traditional key exchange and symmetric encryption
| should suffice.
|
| [1]:
| https://machinelearning.apple.com/research/homomorphic-
| encry...
| drob518 wrote:
| Love the presentation.
| hoppp wrote:
| Don't use TFHE-rs, because Zama requires a commercial license
| for anything other than prototyping. Their license sucks.
|
| Instead use openFHE (C++) or lattigo (golang) libraries which are
| both free to use commercially.
___________________________________________________________________
(page generated 2025-06-18 23:00 UTC)