[HN Gopher] Sync Engines Are the Future
       ___________________________________________________________________
        
       Sync Engines Are the Future
        
       Author : GarethX
       Score  : 288 points
       Date   : 2025-03-18 10:18 UTC (3 days ago)
        
 (HTM) web link (www.instantdb.com)
 (TXT) w3m dump (www.instantdb.com)
        
       | theamk wrote:
       | TL/DR:
       | 
       | > If your database is smart enough and capable enough, why would
       | you even need a server? Hosted database saves you from the
       | horrors of hosting and lets your data flow freely to the
       | frontend.
       | 
       | (this is a blog of one such hosted database provider)
        
         | Sytten wrote:
         | That quote is why security people will always be employed.
         | 
         | Jokes aside, Firebase access control is a nightmare, and all
         | those database-as-an-API things have the same problem.
        
         | SuperNinKenDo wrote:
         | Apropos of the other reply to you about security. Maybe some
         | security people could let me know their thoughts on this.
         | 
         | It seems like generally, the best way to expose your database
         | to the internet is considered to be not doing so in the first
         | place, i.e., have your webserver query and cache a hosted
         | database that isn't directly exposed.
         | 
         | Is my understanding correct? It seems that almost all data
         | breaches we hear about are directly exposed databases or their
         | cloud equivalents.
         | 
         | Is doing this in the era of "cloud" being made impossible?
        
           | worthless-trash wrote:
           | I don't think most large-scale breaches are directly exposed
           | databases; they are just the ones that summon the largest
           | face palms.
        
           | TeMPOraL wrote:
           | That's in some sense a "Swiss cheese security model". It's
           | not that databases should, _in principle_, never be directly
           | exposed. It's that they rarely are designed for it security-
           | wise[0]; meanwhile, adding whatever complex assembly of
           | containers and applications written in random languages and
           | frameworks, to sit between users and the database, introduces
           | a swamp of better-secured systems that attackers also need
           | to get through. The more cruft you pile on, the more annoying
           | it gets for attackers and users alike.
           | 
           | In fact, there are many benefits of directly exposed
           | databases - many of which would remove the need for
           | applications normally sitting on top of those databases,
           | which are strictly inferior and less ergonomic and overall
           | more shitty than a generic database browsing interface. But
           | that's another reason for why things are the way they are:
           | people wanna make money, and having your application be a
           | toll booth between useful data you own and the rest of the
           | world is a tried-and-true way of making money.
           | 
           | --
           | 
           | [0] - Because they're not normally exposed, because they're
           | not designed for it, because... it's a self-reinforcing loop.
        
         | dvrp wrote:
         | What do you propose as a solution for companies to be able to
         | embrace more "liberating" philosophies such as anti-lock-in
         | measures or copyleft-friendly measures?
         | 
         | It seems that solving that is a cultural/economic problem, not
         | a technical one, and that's a shame.
        
       | tbrownaw wrote:
       | > _decoupled from the horrors of an unreliable network_
       | 
       | The first rule of network transparency is: the network is not
       | transparent.
       | 
       | > _Or: I've yet to see a code base that has maintained a separate
       | in-memory index for data they are querying_
       | 
       | Is boost::multi_index_container no longer a thing?
       | 
       | Also there's SQLite with the :memory: database.
       | 
       | And this ancient 4gl we use at work has in-memory tables (as in
       | database tables, with typed columns and any number of unique or
       | not indexes) as a basic language feature.
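
A minimal sketch of the SQLite `:memory:` idea mentioned above, using Python's standard `sqlite3` module. The table name `unit`, its columns, and the helper names are hypothetical, chosen only to show typed columns with one unique and one non-unique index.

```python
import sqlite3


def build_units_table() -> sqlite3.Connection:
    """Create an in-memory table with typed columns and two indexes.

    The schema (unit, name, x, y) is an illustrative assumption.
    """
    conn = sqlite3.connect(":memory:")  # lives only as long as the connection
    conn.execute(
        "CREATE TABLE unit ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL, x REAL, y REAL)"
    )
    conn.execute("CREATE UNIQUE INDEX unit_name ON unit (name)")  # unique index
    conn.execute("CREATE INDEX unit_pos ON unit (x, y)")  # non-unique index
    conn.executemany(
        "INSERT INTO unit (name, x, y) VALUES (?, ?, ?)",
        [("scout", 1.0, 2.0), ("tank", 5.0, 5.0), ("medic", 1.5, 2.5)],
    )
    return conn


def units_near(conn, x_min, x_max, y_min, y_max):
    """Return the names of units inside a bounding box, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM unit "
        "WHERE x BETWEEN ? AND ? AND y BETWEEN ? AND ? ORDER BY name",
        (x_min, x_max, y_min, y_max),
    )
    return [row[0] for row in rows]
```

Calling `units_near(build_units_table(), 0, 2, 0, 3)` queries the in-memory table like any other database, and inserting a second row named `scout` would be rejected by the unique index.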
        
         | anonyfox wrote:
         | In Elixir/Erlang that's quite common I think, at least I do this
         | when performance matters. Put the specific subset of
         | commonly used data into an ETS table (= in-memory cache,
         | allowing concurrent reads) and have a GenServer (which owns that
         | table) listen to certain database change events to update the
         | data in the table as needed.
         | 
         | Helps a lot with high read situations and takes considerable
         | load off the database with probably 1 hour of coding effort if
         | you know what you're doing.
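
The pattern described above (one owner applies database change events to an in-memory table that many readers query) can be sketched in Python as a rough analogue. This is not ETS or GenServer code; the `ChangeEvent` shape and the `ReadCache` class are assumptions made for illustration.

```python
import threading
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical shape of a database change notification."""
    op: str  # "upsert" or "delete"
    key: str
    value: Optional[Any] = None


class ReadCache:
    """In-memory copy of a hot subset of rows, kept fresh by change events."""

    def __init__(self) -> None:
        self._rows = {}
        self._lock = threading.Lock()  # guards the dict across threads

    def apply(self, event: ChangeEvent) -> None:
        """Called by the single owner that listens to database changes."""
        with self._lock:
            if event.op == "upsert":
                self._rows[event.key] = event.value
            elif event.op == "delete":
                self._rows.pop(event.key, None)
            else:
                raise ValueError(f"unknown op: {event.op}")

    def get(self, key: str, default: Any = None) -> Any:
        """Called by readers; never touches the database."""
        with self._lock:
            return self._rows.get(key, default)
```

Reads are served from memory, and the database is only involved when a change event arrives.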
        
         | TeMPOraL wrote:
         | > _Is boost::multi_index_container no longer a thing?_
         | 
         | Depends on the shop. I haven't seen one in production so far,
         | but I don't doubt some people use it.
         | 
         | > _Also there's SQLite with the :memory: database._
         | 
         | Ah, now that's cheating. I know, because I did that too. I did
         | that because of the realization that half the members I'm
         | stuffing into classes to store my game state are effectively
         | poor man's hand-rolled tables, indices and spatial indices, so
         | _why not just use a proper database for this?_.
         | 
         | > _And this ancient 4gl we use at work has in-memory tables (as
         | in database tables, with typed columns and any number of unique
         | or not indexes) as a basic language feature._
         | 
         | Which one is this? I've argued in the past that this is a basic
         | feature _missing from 4GL languages_, and a lot of work in
         | every project is wasted on hand-rolling in-memory databases
         | left and right, without realizing it. It would seem I've missed
         | a language that recognized this fact?
         | 
         | (But then, so did most of the industry.)
        
           | phyrex wrote:
           | ABAP, the SAP language, has that, if I remember correctly.
        
           | tbrownaw wrote:
           | > _Which one is this? I've argued in the past that this is a
           | basic feature missing from 4GL languages, and a lot of work
           | in every project is wasted on hand-rolling in-memory
           | databases left and right, without realizing it. It would seem
           | I've missed a language that recognized this fact?_
           | 
           | https://en.wikipedia.org/wiki/OpenEdge_Advanced_Business_Lan...
           | 
           | Dates back to 1981, called "Progress 4GL" until 2006.
           | 
           | https://docs.progress.com/bundle/abl-reference/page/DEFINE-T...
        
       | ximm wrote:
       | > I have a theory that every major technology shift happened when
       | one part of the stack collapsed with another.
       | 
       | If that was true, we would ultimately end up with a single layer.
       | Instead I would say that major shifts happen when we move the
       | boundaries between layers.
       | 
       | The author here proposes to replace servers by synced client-side
       | data stores.
       | 
       | That is certainly a good idea for some applications, but it also
       | comes with drawbacks. For example, it would be easier to avoid
       | stale data, but it would be harder to enforce permissions.
        
         | worthless-trash wrote:
         | I feel like this is the "serverless" discussion all over again.
         | 
         | There was still a server, it's just not YOUR server. In this
         | case, there will still be servers, just maybe not something
         | that you need to manage state on.
         | 
         | This misnaming creates endless conflict when trying to
         | communicate this with hyper excited management who want to get
         | on the latest trend.
         | 
         | Can't wait to be in the meeting and hear: "We don't need
         | servers when we migrate to client side data stores".
        
           | TeMPOraL wrote:
           | I think the management isn't hyper excited about naming - in
           | fact, they couldn't care less for what the name means (it's
           | just a buzzword). What they're excited about is what the
           | thing does - which is, turn more capex into opex. With
           | "cloud", we can subscribe to servers instead of owning them.
           | With "serverless", we can subscribe directly to what servers
           | do, without managing servers themselves. Etc.
        
           | Diederich wrote:
           | Recently, something quite rare happened. I needed to Xerox
           | some paper documents. Well, such actions are rare today, but
           | years ago, it was quite common to Xerox things.
           | 
           | Over time, the meaning of the word 'Xerox' changed. More
           | specifically, it gained a new meaning. For a long time, Xerox
           | only referred to a company named in 1961. Some time in the
           | late 60s, it started to be used as a verb, and as I was
           | growing up in the 70s and 80s, the word 'Xerox' was
           | overwhelmingly used in its verb form.
           | 
           | Our society decided as a whole that it was ok for the noun
           | Xerox to be used as a verb. That's a normal and natural part of
           | language development.
           | 
           | As others have noted, management doesn't care whether the
           | serverless thing you want to use is running on servers or
           | not. They care that they don't have to maintain servers
           | themselves. CapEx vs OpEx and all that.
           | 
           | I agree that there could be some small hazard with the idea
           | that, if I run my important thing in a 'serverless' fashion,
           | then I don't have to associate all of the
           | problems/challenges/concerns I have with 'servers' to my
           | important thing.
           | 
           | It's an abstraction, and all abstractions are leaky.
           | 
           | If we're lucky, this abstraction will, on average, leak very
           | little.
        
       | slifin wrote:
       | I'm surprised to see Tonsky here
       | 
       | Mostly because I consider the state of the art on this to be
       | Clojure Electric and he presumably is aware of it at least to
       | some degree but does not mention it
        
         | profstasiak wrote:
         | thank you for mentioning! I have been reading a lot about sync
         | engines and never saw Clojure Electric being mentioned here on
         | HN!
        
         | tonsky wrote:
         | Clojure Electric is different. It's not really a sync, it's
         | more of a thin client. It relies on having a fast connection to
         | the server at all times, and re-fetches everything all the time.
         | Their innovation is that they found a really, really ergonomic
         | way to do it.
        
           | quotemstr wrote:
           | Clojure Electric is proprietary software, which disqualifies
           | it immediately no matter its other purported benefits
        
           | dustingetz wrote:
           | Electric's network state distribution is fully incremental.
           | I'm not sure what you mean by "re-fetches everything all the
           | time", but that is not how I would describe it.
           | 
           | If you are referring to virtual scroll over large collections
           | - yes, we use the persistent connection to stream the window
           | of _visible records_ from the server in realtime as the user
           | scrolls, affording approximately realtime virtual scroll over
           | arbitrarily large views (we target collections of size
           | 500-50,000 records and test at 100ms artificial RT latency,
           | my actual prod latency to the Fly edge network is 6ms RT
           | ping), and the Electric client retains in memory precisely
           | the state needed to materialize the current DOM state, no
           | more no less. Which means the client process performance is
           | decoupled from the size of the dataset - which is NOT the
           | case for sync engines, which put high memory and compute
           | pressure on the end user device for enterprise scale
           | datasets. It also inherits the traditional backend-for-
           | frontend security model, which all enterprise apps require,
           | including consumer apps like Notion that make the bulk of
           | their revenue from enterprise citizen devs and therefore are
           | exposed to enterprise data security compliance. And this is
           | in an AI-focused world where companies want to defend against
           | AI scrapers so they can sell their data assets to foundation
           | model providers for use in training!
           | 
           | Which IMO is the real problem with sync engines: they are not
           | a good match for enterprise applications, nor are they a good
           | match for hyper scale consumer saas that aspire to sell into
           | enterprise. So what market are they for exactly?
        
         | mananaysiempre wrote:
         | I'm also surprised, but more because I remember very vividly
         | his previous post on sync[1] which described a much more
         | user-friendly (and much less startup-friendly) system.
         | 
         | [1] https://tonsky.me/blog/crdt-filesync/
        
       | ForTheKidz wrote:
       | > You'll get your data synced for you
       | 
       | How does this happen without an interface for conflict
       | resolution? That's the hard part.
        
         | Sammi wrote:
         | All this recent hype about sync engines and local first
         | applications completely disregards conflict resolution. It's
         | the reason why syncing isn't mainstream already and it isn't
         | solved and arguably cannot be.
         | 
         | Imagine if git just on its own picked what to keep and what to
         | throw away when there's a conflict. You fundamentally need the
         | user to make the choice.
        
           | porridgeraisin wrote:
           | Precisely. The hype articles write all about the journey to
           | The Wall, and then leave out the bit where you smash
           | headfirst into it.
        
           | lifty wrote:
           | Very good point. The local-sync ecosystem is still in a young
           | phase, and conflict resolution hasn't been tackled or solved
           | yet. Most systems have a "last write wins" approach.
        
           | sgt wrote:
           | > All this recent hype about sync engines and local first
           | applications completely disregards conflict resolution.
           | 
           | Not really true though. I've used a couple of local sync
           | engines, one internally built and another one which is both
           | commercial and now open source called PowerSync[1]. Conflict
           | resolution is definitely on the agenda, and a developer is
           | definitely going to be mindful of conflicts when designing
           | the application.
           | 
           | [1] https://www.powersync.com/
        
             | Sammi wrote:
             | My unfortunate point is that the dev cannot know what the
             | user is doing, and so cannot in principle know what choice
             | to make on behalf of the user in case of a conflict. This
             | is not a code problem. It cannot be solved with code.
        
               | sgt wrote:
               | I've found that in almost all cases - the latest update
               | "wins" strategy is fine. You could have two sessions
               | working with conventional API calls and still have a
               | conflict. As a dev you need to restrict what the user
               | _can_ do.
        
           | Jyaif wrote:
           | > All this recent hype about sync engines and local first
           | applications completely disregards conflict resolution
           | 
           | The main concern of sync engines is precisely the conflict
           | resolution! Everything else is simple in comparison.
           | 
           | The good news is that under some circumstances it _is_
           | possible to solve conflicts without user intervention. The
           | simplest example is a counter that can only be incremented.
           | More advanced data structures that resolve conflicts
           | automatically exist, for example for strings, and those are
           | good enough for a text editor.
           | 
           | I agree that there will be conflicts that are resolved in a
           | way that yields nonsensical text, for example if there are 2
           | edits of the sentence "One cat":
           | 
           | One cat => Two cats
           | 
           | One cat => One dog
           | 
           | The resulting merge may be something like "Two cats dog".
           | Something else (the user, an LLM...) will then have to fix
           | it.
           | 
           | But that's totally OK, because in practice this will happen
           | extremely rarely, only when the user would have been offline
           | for a long time. That user will be happy to have been able to
           | work offline, largely compensating the fact that they have to
           | proof read the text again.
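
The increment-only counter mentioned above is usually called a grow-only counter (G-Counter). A minimal sketch, assuming each replica has a unique id; the class and method names are illustrative and not taken from any particular library:

```python
from typing import Dict, Optional


class GCounter:
    """Grow-only counter: each replica only ever bumps its own slot."""

    def __init__(self, replica_id: str,
                 counts: Optional[Dict[str, int]] = None) -> None:
        self.replica_id = replica_id  # assumed unique per device/session
        self.counts: Dict[str, int] = dict(counts or {})

    def increment(self, n: int = 1) -> None:
        if n < 0:
            raise ValueError("a grow-only counter cannot be decremented")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        """Take the per-replica maximum, so merging is commutative,
        associative and idempotent, and no user input is needed."""
        for rid, count in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), count)
```

Two replicas that increment while disconnected and then merge in either order end up with the same total, which is why this case needs no conflict UI. Text merges like the "Two cats dog" example lack that property at the semantic level.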
        
             | SkiFire13 wrote:
             | This doesn't "solve" conflict resolution, it just picks one
             | of the possible answers and then doesn't care whether it
             | was the correct one or not.
             | 
             | It can be acceptable for some usecases, but not for others
             | where you're still concerned about stuff that happens
             | "extremely rarely" and is not under your direct control.
             | 
             | > Something else (the user, an LLM...) will then have to
             | fix it.
             | 
             | This assumes that user/llm knows the conflict was
             | automatically solved and might need to be fixed, so the
             | conflict is still there! You just made the manual part
             | delayed and non-mandatory, but if you want correctness it
             | will still have to be there.
        
             | brulard wrote:
             | > in practice this will happen extremely rarely, only when
             | the user would have been offline for a long time.
             | 
             | I don't think it would happen "extremely rarely". Drops in
             | connectivity happen a lot, especially on cellular
             | connections, and this can absolutely happen a lot for some
             | applications. Especially when talking about "offline first"
             | apps.
        
               | Jyaif wrote:
               | You have to use another device during that drop of
               | connectivity on cellular connection, and edit the same
               | content. That doesn't happen often.
        
           | aboodman wrote:
           | Zero (zerosync.dev) uses transactional conflict resolution,
           | which is what our prior products Replicache and Reflect both
           | used. It is very similar to what multiplayer games have done
           | for decades.
           | 
           | It is described here:
           | 
           | https://rocicorp.dev/blog/ready-player-two
           | 
           | It works really well and we and our customers have found it
           | to be quite general.
           | 
           | It allows you to run an arbitrary transaction on the server
           | side to decide what to do in case of conflicts. It is the
           | software equivalent of git asking the user what to do. Zero
           | asks your code what to do.
           | 
           | But it asks it in the form of the question "please run the
           | function named x with these inputs on the current backend db
           | state". Which is a much more ergonomic way to ask it than
           | "please do a 3-way merge between these three states".
           | 
           | Conflict resolution is not the reason why there has not been
           | a general-purpose sync engine. None of our customers have
           | ~ever complained about conflict resolution.
           | 
           | The reason there has not been a general-purpose sync engine
           | is actually on the _read_ side:
           | 
           | - Previous sync engines really want you to sync all data.
           |   This is impractical for most apps.
           | - Previous sync engines do not have practical approaches to
           |   permissions.
           | 
           | These problems are being solved in the next generation of sync
           | engines.
           | 
           | For more on this, I talk about it some here:
           | 
           | https://www.youtube.com/watch?v=rqOUgqsWvbw
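
A rough sketch of the "run the function named x with these inputs on the current backend state" idea, contrasted with key-wise last-write-wins. This is not Zero's or Replicache's API; `MUTATORS`, `replay_on_server` and `keywise_lww` are hypothetical names, and the state is a plain dict standing in for a database.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, int]
Mutation = Tuple[str, dict]  # (mutator name, arguments), as sent by a client


def increment(state: State, args: dict) -> None:
    """Mutator: expresses intent ("add N"), not a final value."""
    state[args["key"]] = state.get(args["key"], 0) + args["by"]


# Registry of named mutators the server is willing to run (assumption).
MUTATORS: Dict[str, Callable[[State, dict], None]] = {"increment": increment}


def replay_on_server(state: State, pending: List[Mutation]) -> State:
    """Re-run each named mutation against the current server state."""
    for name, args in pending:
        MUTATORS[name](state, args)
    return state


def keywise_lww(state: State, client_snapshots: List[State]) -> State:
    """Key-wise last-write-wins: later snapshots overwrite earlier ones."""
    for snapshot in client_snapshots:
        state.update(snapshot)
    return state
```

With a server value of `{"likes": 10}` and two clients that each add one like concurrently, `keywise_lww` receives two snapshots of `{"likes": 11}` and ends at 11, while `replay_on_server` runs `increment` twice and ends at 12.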
        
             | probabletrain wrote:
             | I think with good presence (being able to see what other
             | users are doing) and an app that isn't used offline,
             | conflicts are essentially not a problem. As long as
             | whatever is resolving the conflicts resolves them in a way
             | that doesn't break the app, e.g. making sure there aren't
             | cycles in some multiplayer app with a tree datastructure.
             | Sounds like Zero has the right idea here, I'll build
             | something on it imminently to try it out.
        
               | Sammi wrote:
               | Agree that if you don't have offline support, then
               | conflict resolution is such a minor issue that you can
               | just do "last write wins" and call it a day.
        
             | probabletrain wrote:
             | > Previous sync engines really want you to sync all data
             | 
             | Linear had to do all sorts of shenanigans to be able to
             | sync all data, for orgs with lots of it - there's a talk on
             | that here:
             | 
             | https://www.youtube.com/watch?v=Wo2m3jaJixU&t=1473s
        
             | Sammi wrote:
             | "It is the software equivalent of git asking the user what
             | to do. Zero asks your code what to do."
             | 
             | You are asking the dev what to do. You are _not_ asking the
             | user what to do. This is akin to the git devs baking in a
             | choice into git on what to keep in a merge conflict.
             | 
             | It's hard to trust you guys when you misrepresent like
             | this. I thought long and hard on whether to respond
             | confrontationally like this, but decided you really need to
             | hear the push back on this.
        
               | aboodman wrote:
               | lol wut?
               | 
               | I represented that we ask the dev what to do:
               | 
               | > Zero asks your code what to do
               | 
               | You agree that's what we do:
               | 
               | > You are asking the dev what to do. You are _not_ asking
               | the user what to do.
               | 
               | I get that your actual issue is you don't think that what
               | we do is "the software equivalent of git asking the user
               | what to do". But like, I also said what we do concretely
               | in the same paragraph. It's not like I was trying to hide
               | something. This is a metaphor for how to understand our
               | approach to conflict resolution that works for most
               | developers. Like all metaphors it is not perfect.
               | 
               | FWIW, there is nothing stopping a developer from having
               | this function just save off a forked copy and ask the
               | user what to do. Some developers do this.
               | 
               | Also FWIW, Zero does not allow offline writes
               | specifically because we want to educate people how to
               | properly handle conflicts before we do. I see down-thread
               | this is the majority of your concern.
        
               | Sammi wrote:
               | I assumed you were doing offline support yeah. I've heard
               | a lot about local first development lately, so I guessed
               | this was what you guys are tackling too.
               | 
               | Without offline support AND you're doing real time
               | updating of data, then conflict resolution is not a real
               | world practical concern. Users will be looking at the
               | same data at the same time anyways, so they generally see
               | what data won out in case of a conflict, as they are
               | looking at real time data as they are editing.
               | 
               | IF you had offline support, and for other sync engines
               | that do: There is a real and meaningful difference
               | between a backend dev and an end user of the application
               | choosing what to do in case of a conflict. A backend dev
               | cannot make a general case algorithm that knows what two
               | end users want to keep or throw away in a conflict,
               | because this is completely situational - users could be
               | doing whatever. And if you push the conflict resolution
               | to the end users, then you are asking a lot of those
               | users. They need to be technically inclined and motivated
               | people in order to take the time to understand and
               | resolve the conflict. Like with git users.
        
               | aboodman wrote:
               | > Without offline support AND you're doing real time
               | updating of data, then conflict resolution is not a real
               | world practical concern.
               | 
               | I disagree with this. There are many real-world cases
               | where keywise lww does the wrong thing. The article I
               | linked up-thread covers many of them. Even a simple
               | counter does the wrong thing.
               | 
               | This is where robust conflict resolution really matters
               | in these systems, not the long-time offline case people
               | often ask about.
               | 
               | You need robust conflict resolution to make correct
               | software and maintain invariants in the face of
               | write/write systems.
               | 
               | > A backend dev cannot make a general case algorithm that
               | knows what two end users want to keep or throw away in a
               | conflict, because this is completely situational - users
               | could be doing whatever. And if you push the conflict
               | resolution to the end users, then you are asking a lot of
               | those users. They need to be technically inclined and
               | motivated people in order to take the time to understand
               | and resolve the conflict. Like with git users.
               | 
               | I agree completely. In my opinion the ideal offline-first
               | write/write UI has never been built, but the team at Ink
               | & Switch are closest:
               | 
               | https://www.inkandswitch.com/patchwork/notebook/
               | 
               | I think the perfect UX in many cases is that sync goes
               | ahead and tries to land the offline writes, but the user
               | has a history UI where they can see what happened. Like
               | how many collaborative apps do today.
               | 
               | But importantly in this UI the app would represent
               | branches and merges. But unlike Git's fine grained
               | branch/merge points, in this UI it would literally
               | represent points where people went offline and made
               | changes.
               | 
               | Users could then go back and recover the version of their
               | data from when they were offline, or compare (probably
               | manually in two tabs) the two different versions of the
               | data and recover.
               | 
               | This does still ask users to compare and resolve
               | conflicts in the worst case, but it is not a blocking
               | operation or one that is final. The more common case is
               | the user will go ahead with the merge and sometimes find
               | some corruption. They can always go back and see what
               | went wrong after the fact and fix. This seems like the
               | right tradeoff to me of making the common case (no
               | conflict) easy and automatic but making the uncommon but
               | scary case at least not dangerous.
               | 
               | There also needs to be clear first-class UX telling users
               | that they're going offline and what will happen when they
               | come online.
               | 
               | I'm looking forward to someday working on this, but it's
               | not what our users ask about most often so we're just
               | disabling offline writes for now.
        
           | jamil7 wrote:
           | > All this recent hype about sync engines and local first
           | applications
           | 
           | Kind of but only really in the web world, it was the default
           | on desktop for a long time and is pretty common on mobile.
        
         | phito wrote:
         | Right, first thing I did after opening the article was
         | CTRL-F'ing for "conflict", and I got zero results. How are they
         | not talking about the only real problem with the local-first
         | approach? The rest is just boilerplate code.
        
         | tonsky wrote:
         | Ah, no. Not really. People sometimes think about conflict
         | resolution as a problem that needs to be solved. But it's not
         | solvable, not really. It's part of the domain, it's not going
         | anywhere, it's irreducible complexity.
         | 
         | You _will_ have conflicts (because your app is distributed and
         | there are concurrent writes). They will happen on semantic
         | level, so only you (app developer) _will_ be able to solve
         | them. Database (or any other magical tool) can't do it for you.
         | 
         | Another misconception is that conflict resolution needs to be
         | "solved" perfectly before any progress can be made. That is not
         | true either. You might have unhandled conflicts in your system
         | and still have a working, useful, successful product. Conflicts
         | might be rare, insignificant, or people (your users) will just
         | correct for/work around them.
         | 
         | I am not saying "drop data on the floor", of course, if you can
         | help it. But try not to overthink it, either.
        
           | DaiPlusPlus wrote:
           | > But it's not solvable, not really. It's part of the domain,
           | it's not going anywhere, it's irreducible complexity. You
           | _will_ have conflicts (because your app is distributed and
           | there are concurrent writes). [...] Another misconception is
           | that conflict resolution needs to be "solved" perfectly
           | before any progress can be made. That is not true as well.
           | You might have unhandled conflicts in your system and still
           | have a working, useful, successful product. Conflicts might
           | be rare, insignificant, or people (your users) will just
           | correct for/work around them.
           | 
           | I can't speak for whatever application-level problems you
           | were trying to solve, but many problem-cases can be massaged
           | into being conflict-free by adding constraints (or rather:
           | discovering constraints inherent in the business-domain you
           | can use). For example (and the best example, too) is to use
           | an append-only logical model: then the synchronization
           | problem reduces down to merge-sort. Another kind of
           | constraint might be to simply disallow "edit" access to local
           | data when working-offline (without a prior lock or lease
           | being taken) but still allowing "create".
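           | 
           | A minimal sketch of the append-only idea, assuming every
           | entry carries a timestamp plus a replica id as a tie-breaker
           | (the Entry type and field names are made up for
           | illustration). Each replica's log is already sorted, so
           | syncing is the merge step of merge-sort plus de-duplication:

```python
import heapq
from typing import Iterable, NamedTuple


class Entry(NamedTuple):
    timestamp: int  # logical or wall-clock time assigned at creation
    replica: str    # hypothetical replica id, breaks timestamp ties
    payload: str


def merge_logs(*logs: Iterable[Entry]) -> list[Entry]:
    """Merge already-sorted append-only logs, dropping duplicate entries."""
    merged: list[Entry] = []
    for entry in heapq.merge(*logs):
        # Identical entries sort next to each other, so comparing with the
        # last emitted entry is enough to de-duplicate.
        if not merged or merged[-1] != entry:
            merged.append(entry)
    return merged


# Both devices already share the first entry, then diverge while offline.
laptop = [Entry(1, "a", "create note"), Entry(4, "a", "add tag")]
phone = [Entry(1, "a", "create note"), Entry(2, "b", "create todo")]
merged = merge_logs(laptop, phone)
```

           | Because entries are never edited, the merge order does not
           | depend on which replica syncs first.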
           | 
           | > Database (or any other magical tool) can't do it for you.
           | 
           | Yes-and-no.
           | 
           | While I'm no fan of CORBA and COM+ (...or SOAP, or WS-
           | OhGodMakeItStop), but being "enterprise-y" it meant they
           | brought distributed-transactions to any application, and that
           | includes RDBMS-mediated distributed transactions (let's
           | agree, an RDBMS is in a far greater position to be a better
           | canonical transaction-server than an application-server
           | running in-front of it). For distributed systems needing
           | transient distributed locks to prevent conflicts in the first
           | place (so only used by interactive users in the same LAN,
           | really) this worked just-as-well as a local-only solution -
           | and made it fault-tolerant too.
           | 
           | ...so it is unfortunate that with the (absolutely justified)
           | back-to-basics approach with REST[1] that we lose built-in
           | support for distributed transactions (even some of the more
           | useful and legitimate parts of WebDAV (and so, piggy-backing
           | on our web-servers' built-in support for WebDAV verbs) seem
           | to be going-away) - this all raises the barrier-to-entry for
           | doing distributed-transactions _right_, which means the next
           | set of college-hires won't have been exposed to it, which
           | means it won't be a standard expected feature in the next
           | major internal application they'll write for your org, which
           | means you'll either have a race-condition impacting a multi-
           | billion-dollar business thing that no-one knows how to fix or
           | more likely, just a crappy UX where you have to tell your
           | users not to reload the page too quickly "just in case". Yes,
           | I see advisories like that in the Zendesk pages of the next
           | line-of-business SaaS you'll be voluntold to integrate into
           | your org.
           | 
           | (I think today, the "best" way to handle distributed-locking
           | between interactive-users in a web-app would necessitate
           | using a ServiceWorker using WebRTC, SSE, or a highly-reliable
           | WebSocket - which itself is a load of work right there - and
           | don't forget to do _all_ your JS feature-checks because
           | eventually someone will try to use your app on an old Safari
           | edition because they want to keep on using their vintage Mac)
           | - or anyone using Incognito mode, _gah_.
           | 
           | [1]: https://devblast.com/b/calling-your-web-api-restful-
           | youre-do...
        
         | avodonosov wrote:
         | They elaborate on the conflicts in the "80/20 for Multiplayer"
         | section of this essay:
         | https://www.instantdb.com/essays/next_firebase
         | 
         | (make sure to also read the footnote [28] there).
        
       | DeathArrow wrote:
       | I've solved data sync in distributed apps long time ago. I send
       | outgoing data to /dev/null and receive incoming data from
       | /dev/zero. This way data is always consistent. That also helps
       | with availability and partition tolerance.
        
       | paduc wrote:
       | Before I write anything to the DB, I validate with business
       | logic.
       | 
       | Should I write this logic in the DB itself ? Seems impractical.
        
         | scotty79 wrote:
         | I think that's the main issue. It's not enough to have a
         | database that can automatically sync between frontend and
         | backend. It would also need to be complex enough to keep some
         | logic just on the backend (because you don't want to reveal it
         | and entrust adherence to the client) and reject some changes
         | done on frontend if they are invalid. Database would become the
         | app itself.
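         | 
         | A rough sketch of what that would involve, with made-up names
         | (Mutation, server_apply, client_reconcile) and a stock check
         | standing in for the backend-only rule. The client applies
         | changes optimistically; the server rejects the invalid ones and
         | returns a reason the client can show:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mutation:
    id: str
    item: str
    quantity: int


def server_apply(stock: dict[str, int], mutations: list[Mutation]):
    """Backend-only rule: never sell more than is in stock."""
    accepted: list[str] = []
    rejected: dict[str, str] = {}
    for m in mutations:
        available = stock.get(m.item, 0)
        if 0 < m.quantity <= available:
            stock[m.item] = available - m.quantity
            accepted.append(m.id)
        else:
            rejected[m.id] = f"only {available} x {m.item} available"
    return accepted, rejected


def client_reconcile(optimistic: list[Mutation], rejected: dict[str, str]):
    """Drop rejected mutations from local state and collect UI messages."""
    kept = [m for m in optimistic if m.id not in rejected]
    notices = [rejected[m.id] for m in optimistic if m.id in rejected]
    return kept, notices


stock = {"widget": 3}
pending = [Mutation("m1", "widget", 2), Mutation("m2", "widget", 2)]
accepted, rejected = server_apply(stock, pending)
kept, notices = client_reconcile(pending, rejected)
```

         | The rule and the stock numbers never leave the server; only the
         | verdict and a message travel back over the sync channel.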
        
           | acac10 wrote:
           | Which many DBs allow:
           | 
           | - stored procedures
           | 
           | - Oracle PL/SQL
           | 
           | I used to work for Oracle but never liked that approach.
        
             | Sammi wrote:
             | The issue with stored procedures is testing and code
             | maintenance. How do I run unit tests? How do I version
             | control and code review?
        
               | TeMPOraL wrote:
               | It's the same issue that killed the image-based
               | programming in favor of edit-compile-run cycle we're all
               | doing. "How do I test? How do I do version control? How
               | do I migrate?".
               | 
               | These are valid concerns, but $deity I wish we focused on
               | finding solutions for them, because the current paradigm
               | of edit/compile/run + plaintext single source of truth
               | codebase, is already severely limiting our ability to
               | build and maintain complex software.
        
               | brulard wrote:
               | While I don't like the idea of putting logic in the RDBMS
               | (if not for a really good reason), you can do unit tests
               | and code reviews. In a serious project you already should
               | have a way to make migrations and versioning of the DB
               | itself (for example using prisma, drizzle, etc.).
               | Procedures would be just another entry in the migrations
               | and unit tests can create testing temporary DB, run the
               | procedures and compare the results. I agree tooling is
               | (AFAIK) not good and there will be much more work around
               | that, but it is possible.
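               | 
               | A small sketch of that workflow, under loud assumptions:
               | SQLite has no stored procedures, so a trigger stands in
               | for the database-side logic, and MIGRATIONS is a plain
               | hypothetical list rather than a real prisma or drizzle
               | setup. The test builds a throwaway in-memory DB, applies
               | the migrations, and checks the behaviour:

```python
import sqlite3

# Versioned alongside the application code, reviewed like any other change.
MIGRATIONS = [
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)",
    """
    CREATE TRIGGER no_overdraft BEFORE UPDATE OF balance ON accounts
    FOR EACH ROW WHEN NEW.balance < 0
    BEGIN
        SELECT RAISE(ABORT, 'insufficient funds');
    END
    """,
]


def fresh_db() -> sqlite3.Connection:
    """Create a temporary database and apply every migration in order."""
    conn = sqlite3.connect(":memory:")
    for statement in MIGRATIONS:
        conn.execute(statement)
    return conn


def overdraft_attempt() -> tuple[bool, int]:
    """Try to withdraw more than the balance; report (rejected, balance)."""
    conn = fresh_db()
    conn.execute("INSERT INTO accounts (id, balance) VALUES (1, 100)")
    try:
        conn.execute("UPDATE accounts SET balance = balance - 150 WHERE id = 1")
        rejected = False
    except sqlite3.DatabaseError:
        rejected = True
    row = conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()
    return rejected, row[0]


rejected, balance = overdraft_attempt()
```

               | With a server RDBMS the shape is the same, only the
               | temporary database is a scratch schema or container
               | instead of ":memory:".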
        
               | x0x0 wrote:
               | The other issue, from experience, is needing to
               | reimplement logic as well -- you end up with stored
               | procedures that duplicate logic that also must be run
               | either in your server or on your client. eg given the
               | state of the system, is this mutation valid.
               | 
               | Then those multiple implementations inevitably suffer
               | different bugs and drift, leading to really ugly bugs.
        
             | scotty79 wrote:
             | I don't think a stored procedure that operates only on the
             | master copy of the database can reject an update coming from
             | a second copy and nicely communicate that this happened, so
             | that the other copy can inform the user through some UI.
        
         | Terr_ wrote:
         | > logic in the DB
         | 
         | Something similar but in the opposite direction of lessening
         | DB-responsibilities in favor of logic-layer ones: Driving
         | everything from an event log. (Related to CQRS, Event-
         | Sourcing.)
         | 
         | It means a bit less focus on "how do I ensure this data-
         | situation never ever ever happens" logic, and a bit more "how
         | shall I model escalation and intervention when weird stuff
         | happens anyway."
         | 
         | This isn't as bad as it sounds, because any sufficiently
         | old/large software tends to accrue a bunch of informal
         | tinkering processes anyway. It's what drives the unfortunate
         | popularity of DB rows with a soft-deleted mark (that often
         | require manual tinkering to selectively restore) because
         | somebody always wants a special undo which is never really just
         | one-time-only.
        
         | TeMPOraL wrote:
         | > _Should I write this logic in the DB itself ?_
         | 
         | Yes?
         | 
         | If it sounds impractical, it's because the whole industry got
         | used to _not learning databases_ beyond most basic SQL, and
         | doing everything by hand in application code itself. But given
         | how much of code in most applications is just ad-hoc
         | reimplementation of databases, and then how much of the
         | business logic is tied to data and not application-specific
         | things, I can't help but wonder - maybe a better way would be
         | to treat RDBMS as an application framework and have application
         | itself be a thin UI layer on top?
         | 
         | On paper it definitely sounds like grouping concerns better.
        
           | lloeki wrote:
           | > treat RDBMS as an application framework and have
           | application itself be a thin UI layer on top?
           | 
           | Stored procedures have been a thing. I've seen countless apps
           | that had a thin VB UI and a MSSQL backend where most of the
           | logic is implemented. Or, y'know, Access. Or spreadsheets
           | even!
           | 
           | And before that AS/400&al.
           | 
           | But ORMs came in and the impedance mismatch is then too
           | great. Splitting data wrangling across two completely
           | differing points of views makes it extremely hard to reason
           | about.
        
           | brulard wrote:
           | While stored procedures/triggers etc. can be powerful, it has
           | been taught for decades now that it is an antipattern to put
           | business logic to the RDBMS (for more or less valid reasons).
           | Some concerns I would have would be vendor lock-in and limits
           | of the provided language.
        
           | Tobani wrote:
           | In very simple systems that makes sense. But as soon as your
           | validation requires talking to a third party, or you have
           | side effects like sending emails you have to suddenly move
           | all that logic back out. You end up with system that isn't
           | very easy to iterate on.
        
             | Nextgrid wrote:
             | You can model external system interactions with tables
             | representing "mailboxes" - so for example if a DB stored
             | procedure needs to call a third-party API to create a
             | resource, it writes a row in the "outbox" table for that
             | API, then application-level code picks that up, makes the
             | API call, parses the response (extracts the required
             | fields) and stores it in an "inbox" table so now the
             | database has access to the response (and a trigger can run
             | the remainder of the business process upon insertion of
             | that row).
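             | 
             | A minimal sketch of the mailbox pattern using SQLite, with
             | triggers standing in for stored procedures. The table names,
             | the orders example and fake_third_party_api are all
             | assumptions made up for illustration:

```python
import sqlite3


def fake_third_party_api(order_id: int) -> str:
    """Stand-in for the real HTTP call; hypothetical, returns a resource id."""
    return f"res-{order_id}"


db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT NOT NULL);
    CREATE TABLE outbox (id INTEGER PRIMARY KEY, order_id INTEGER NOT NULL,
                         sent INTEGER NOT NULL DEFAULT 0);
    CREATE TABLE inbox  (id INTEGER PRIMARY KEY, order_id INTEGER NOT NULL,
                         resource_id TEXT NOT NULL);
    -- Database side, step 1: a new order requests an external call.
    CREATE TRIGGER request_resource AFTER INSERT ON orders
    BEGIN
        INSERT INTO outbox (order_id) VALUES (NEW.id);
    END;
    -- Database side, step 2: the stored response resumes the process.
    CREATE TRIGGER resume_order AFTER INSERT ON inbox
    BEGIN
        UPDATE orders SET status = 'confirmed' WHERE id = NEW.order_id;
    END;
""")


def drain_outbox(conn: sqlite3.Connection) -> int:
    """Application side: perform pending calls and record the responses."""
    pending = conn.execute(
        "SELECT id, order_id FROM outbox WHERE sent = 0 ORDER BY id"
    ).fetchall()
    for outbox_id, order_id in pending:
        resource_id = fake_third_party_api(order_id)
        with conn:  # mark as sent and store the response in one transaction
            conn.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (outbox_id,))
            conn.execute(
                "INSERT INTO inbox (order_id, resource_id) VALUES (?, ?)",
                (order_id, resource_id),
            )
    return len(pending)


db.execute("INSERT INTO orders (id, status) VALUES (1, 'pending')")
processed = drain_outbox(db)
status = db.execute("SELECT status FROM orders WHERE id = 1").fetchone()[0]
```

             | Note that the API call happens outside the transaction, so a
             | crash between the call and the commit means the call is
             | repeated on the next run; the third-party endpoint would
             | need to tolerate that.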
        
               | Tobani wrote:
               | Yes, but then you've removed parent comments' assertion
               | that everything should be done by the RDBMS. And you've
               | changed the contract of the action.
        
               | TeMPOraL wrote:
               | Surely some RDBMS has the ability to run REST queries,
               | possibly via SQL by pretending it's a table or something.
               | 
               | I can imagine that working on a good day. I don't dare
               | imagine _error handling_ (though would love to look at
               | examples).
               | 
               | Ultimately, it probably makes no sense to do everything
               | in the database, but I still believe we're doing _way_
               | too much in the application, and too little in the DB.
               | Some of the logic really belongs to data (and needs to be
               | duplicated for any program using the same data, or
               | else...; probably why people don't like to share
               | databases between programs).
               | 
               | And, at a higher level, I wonder how far we could go if
               | we pushed all data-specific logic into the DB, and the
               | rest (like REST calls) into dedicated components, _and_
               | used a generic orchestrator to glue the parts together?
               | What of the "application code" would remain then, and
               | where would it sit?
        
         | tonsky wrote:
         | If you think of an existing database, like Postgres, sure. It's
         | not very convenient.
         | 
         | What I am saying is, in a perfect world, database and server
         | will be the one and run code _and_ data at the same time.
         | There's really no good reason why they are separated, and it
         | causes a lot of inconveniences right now.
        
           | Tobani wrote:
           | Sure in an ideal world we don't need to worry about resources
           | and everything is easy. There are very good reasons why they
           | are separated now. There have been systems like 4th dimension
           | and K that combine them for decades. They're great for
           | systems of a certain size. They do struggle once their
           | workload is heavy enough, and seem to struggle to scale out.
           | Being able to update my application without updating the
           | storage engine reduces the risk. Having standardized backup
           | solutions for my RDBMS means a whole level of effort I
           | don't have to worry about. Data storage can even be optimized
           | without my application having to be updated.
        
       | zx8080 wrote:
       | > decoupled from the horrors of an unreliable network
       | 
       | There's no such thing as reliable network in the world. The world
       | is network connected, there's almost no local-only systems
       | anymore (for a long long time now).
       | 
       | Some engineers dream that there's some cases when network is
       | reliable, like when a system fully lives in the same region and
       | single AZ. But even then it's actually not reliable and can have
       | some glitches quite frequently (like once per month or so,
       | depending on some luck).
        
         | 01HNNWZ0MV43FF wrote:
         | True. Even the network between the CPU and an SD card or USB
         | drive is not reliable
        
         | tonsky wrote:
         | > There's no such thing as reliable network in the world
         | 
         | I'm not saying there is
        
         | jimbokun wrote:
         | I believe the point is that given an unreliable network, it's
         | nice to have access to all the data available locally up to the
         | point when you had a network issue. And then when the network
         | is working again, your data comes up to date with no extra work
         | on the application developer's part.
        
       | myflash13 wrote:
       | Locally synced databases seem to be a new trend. Another example
       | is Turso, which works by maintaining a sort of SQLite-DB-per-
       | tenant architecture. Couple that with WASM and we've basically
       | come full circle back to old school desktop apps (albeit with
       | sync-on-load). Fat client thin client blah blah.
        
       | arkh wrote:
       | The future of webapps: wasm in the browser, direct SQL for the
       | API.
       | 
       | Main problem? No result caching but that's "just" a middleware to
       | implement.
        
         | TeMPOraL wrote:
         | Also the past of webapps. We don't have that because doing this
         | properly, in a way that's maximally useful and ergonomic for
         | the users, pretty much kills the entire business of the web. If
         | you give direct SQL access to the underlying data, you can no
         | longer seek rent by putting a bloated, barely-functional app in
         | front of the database, nor can you use it to funnel users or
         | upsell them stuff. Most of the money in this industry is made
         | from rent-seeking.
        
       | hyperbolablabla wrote:
       | How does this compare to supabase?
        
         | jFriedensreich wrote:
         | that question comes up all the time for some reason but
         | supabase does not support offline or sync, only some form of
         | subscription updating, but this has nothing to do with having
         | sync or local data.
        
       | zareith wrote:
       | I think an underappreciated library in this space is Logux [1]
       | 
       | It requires deeper (and more) integration work compared to
       | solutions that sync your state for you, but is a lot more
       | flexible wrt. the backend technology choices.
       | 
       | At its core, it is an action synchronizer. You manage both your
       | local state and remote state through redux-style actions, and the
       | library takes care of syncing and resequencing them (if needed)
       | so that all clients converge at the same state.
       | 
       | [1] https://logux.org/
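       | 
       | To illustrate the general idea (this is not Logux's API, just a
       | sketch with made-up names): if every action carries a timestamp
       | and a client id, each client can sort the actions it knows about
       | into one agreed order and replay them through the same reducer,
       | so arrival order stops mattering:

```python
from typing import Iterable, NamedTuple


class Action(NamedTuple):
    time: int    # logical timestamp attached when the action is created
    client: str  # tie-breaker so the ordering is total
    type: str
    value: int


def reducer(state: int, action: Action) -> int:
    """Redux-style reducer for a single counter."""
    if action.type == "add":
        return state + action.value
    if action.type == "set":
        return action.value
    return state


def converge(actions: Iterable[Action]) -> int:
    """Resequence all known actions into one order, then replay them."""
    state = 0
    for action in sorted(set(actions)):
        state = reducer(state, action)
    return state


a1 = Action(1, "alice", "add", 5)
b1 = Action(2, "bob", "set", 10)
a2 = Action(3, "alice", "add", 1)

# The two clients receive the same actions in a different arrival order.
alice_view = converge([a1, a2, b1])
bob_view = converge([b1, a1, a2])
```

       | Replaying from scratch is the simplest form of resequencing; a
       | real library would presumably avoid redoing all the work.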
        
       | mike_hearn wrote:
       | I recently took a part time role at Oracle Labs and have been
       | learning PL/SQL as part of a project. Seeing as Niki is shilling
       | for his employer, perhaps it's OK for me to do the same here :)
       | [1]. HN discourse could use a bit of a shakeup when it comes to
       | databases anyway. This may be of only casual interest to most
       | readers, but some HN readers work at places with Oracle licenses
       | and others might be surprised to discover it can be cheaper than
       | an AWS managed Postgres [2].
       | 
       | It has a couple of features relevant to this blog post.
       | 
       | The first: Niki points out that in standard SQL producing JSON
       | documents from relational tables is awkward and the syntax is
       | terrible. This is true, so there's a better syntax:
       |     CREATE JSON RELATIONAL DUALITY VIEW dept_w_employees_dv AS
       |     SELECT JSON {'_id'            : d.deptno,
       |                  'departmentName' : d.dname,
       |                  'location'       : d.loc,
       |                  'employees'      :
       |                    [ SELECT JSON {'employeeNumber' : e.empno,
       |                                   'name'           : e.ename}
       |                      FROM employee e
       |                      WHERE e.deptno = d.deptno ]
       |                 }
       |     FROM department d WITH UPDATE INSERT DELETE;
       | 
       | It makes compound JSON documents from data stored relationally.
       | This has three advantages: (1) JSON documents get materialized on
       | demand by the database instead of requiring frontend code to do
       | it, (2) the ORDS proxy server can serve these over HTTP via
       | generic authenticated endpoints (e.g. using OAuth or cookie based
       | auth) so you may not need to write any code beyond SQL to get
       | data to the browser, and (3) the JSON documents produced can be
       | written to, not only read.
       | 
       | The second feature is query change notifications. You can issue a
       | command on a connection that starts recording the queries issued
       | on it and then get a callback or a message posted to an MQ when
       | the results change (without polling). The message contains some
       | info about what changed. So by wiring this up to a web socket,
       | which is quite easy, the work of an hour or two in most web
       | frameworks, then you can stream changes to the client directly
       | from the database without needing much logic or third party
       | integrations. You either use the notification to trigger a full
       | requery and send the entire result json back to the browser, or
       | you can get fancier and transform the deltas to json subsets.
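       | 
       | As a sketch of the simpler end of that, assuming the server
       | requeries on each notification and keeps the previous result
       | around (the notification API itself is not shown, and the rows
       | and function names below are made up), the delta sent to the
       | browser can be computed by diffing the two results by key:

```python
def diff_results(old: dict[str, dict], new: dict[str, dict]) -> dict:
    """Compare two query results keyed by primary key."""
    return {
        # Rows that are new or whose contents changed.
        "upsert": {k: row for k, row in new.items() if old.get(k) != row},
        # Keys that disappeared from the result.
        "delete": sorted(old.keys() - new.keys()),
    }


def apply_delta(rows: dict[str, dict], delta: dict) -> dict[str, dict]:
    """Client side: reconstitute the new result from the old one."""
    updated = {k: v for k, v in rows.items() if k not in delta["delete"]}
    updated.update(delta["upsert"])
    return updated


old = {
    "7369": {"name": "SMITH", "dept": 20},
    "7499": {"name": "ALLEN", "dept": 30},
}
new = {
    "7369": {"name": "SMITH", "dept": 10},
    "7521": {"name": "WARD", "dept": 30},
}
delta = diff_results(old, new)
client_rows = apply_delta(old, delta)
```

       | This trades server memory (one cached result per subscription)
       | for smaller messages.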
       | 
       | It'd be neat if there was a way to join these two features
       | together out of the box, but AFAIK if you want full streaming of
       | document deltas to the browser and reconstituting them there, it
       | would need a bit more on top.
       | 
       | Again, you may feel this is irrelevant because doesn't every
       | self-respecting HN reader use Postgres for everything, but it's
       | worth knowing what's out there. Especially as the moment you
       | decide to pay a cloud for hosting your DB you have crossed the
       | Rubicon anyway (all the hosted DBs are proprietary forks of
       | Postgres), so you might as well price out alternatives.
       | 
       | [1] and you know the drill, views are my own and nobody has
       | reviewed this post.
       | 
       | [2] https://news.ycombinator.com/item?id=42855546
        
         | vessenes wrote:
         | Lots of good people work at Oracle, and I am sure you're one of
         | them.
         | 
         | HOWEVER. There is no world where lifetime costs of using
         | Postgres for _any_ successful company _anywhere_ in the world
         | are greater than using Oracle. I understand that's a key
         | message for your sales team to get out, but only one of the
         | CEOs at Oracle and Percona has flown a fighter jet underneath
         | the Golden Gate Bridge.
         | 
         | Oracle licensing is famously, famously sticky. Extremely.
         | Incredibly. It's how the company was built and is maintained.
        
           | mike_hearn wrote:
           | Great, let's debate!
           | 
           | I've never talked to database sales people and have no idea
           | what messages they have or care about. Actually I'm 99% sure
           | they don't care about the HN/startup crowd at all - did you
           | see anyone except me talk about this stuff here? Me neither.
           | I'm making this argument basically because I like making
           | arguments early that are surprising but correct, and
           | databases feel like fertile ground for such arguments.
           | There's a lot of groupthink in this space. And you know all
           | about my history with surprising technology arguments, Peter
           | ;)
           | 
           | Anyway I'd be interested to see a spreadsheet with a worked
           | set of scenarios for both cost and "stickiness" however it's
           | defined (genuinely). I think it's going to depend heavily on:
           | 
           | a) Whether you cloud host or not. The cost of a small
           | Postgres that you run yourself is pretty much whatever your
           | own time is valued at, as self-hosted hardware is cheap. The
           | costs of a Postgres you outsource can be really
           | unintuitively high. I already showed that a cloud hosted
           | elastic Oracle DB can be cheaper than the same-spec
           | AWS-managed Postgres despite a massive feature disparity on one
           | side. Costs here aren't dominated by hardware nor software
           | purchase costs.
           | 
           | b) What features and scaling level you need, combined with
           | cost of labour in your area. If you want to scale up a
           | Postgres based operation very fast then that's going to take
           | a ton of skilled engineering effort, devs will be slowed down
           | a lot as they spend time on implementing custom sharding
           | schemes etc. At some point the cost of rolling your own ad-
           | hoc solutions to these things will cross with the cost of
           | just buying a system that already solves them all out of the
           | box. Where that cross-point is will depend on all kinds of
           | things like opportunity cost, cost of hiring, cost of
           | developer productivity....
           | 
           | c) Whether you consider unique features to be "stickiness".
           | You're claiming the licensing is sticky here but companies
           | negotiate all kinds of licenses so what does that mean? By
           | default it's charged per core like any other commercial db
           | (or in the cloud by core seconds/storage). If unique features
           | are the problem then that's an aspect of choosing any tech
           | platform. If you're taking advantage of full SQL joins on a
           | 50-node horizontally scaled multi-master cluster then yeah,
           | trying to migrate to something else is going to be sticky
           | because there aren't many other products that offer that.
           | That's tech for you. Still, these days I guess it must be
           | less sticky because there are other people selling very
           | scalable SQL-speaking databases like Spanner.
           | 
           | As for Larry Ellison's stunts, that's great but if you're
           | deciding what platform to use on the basis of executive
           | horsepower then you can pick between fighter jets, Jeff
           | Bezos's rockets, Bill Gates' yachts or Larry Page's flying
           | cars. Selling databases seems to go hand in hand with high
           | tech vehicles, which is probably a sign there's some actual
           | value being delivered there, somewhere.
        
             | vessenes wrote:
             | :)
             | 
             | I referenced Larry as a proxy for his extreme wealth.
             | Although it is true he's one of the great businessmen of
             | the late 20th century. Just not the sort you want to be in
             | a business deal with in general.
             | 
             | Oracle has always been good at both adding helpful
             | functions that developers rely on, making switching
             | difficult, and also at teasing companies into using more
             | licenses than they've purchased, then smacking them with
             | audits and fees as a stick, and a 'cheaper' larger license
             | as a carrot to avoid the audit fees.
             | 
             | In the 90s, this was tech like PL/SQL and Materialized
             | views - I'm long out of the Oracle game, so I have no idea
             | where they compete on features now vis-a-vis open source --
             | but I will say that I have owned companies where the Oracle
             | license was both HATED -- and outlived all original owners
             | of the company. It's hard to replace once it's in your
             | workflow, and that is 100% by design.
        
               | mike_hearn wrote:
               | I guess audits are fading away as more people move to the
               | cloud. Audits are used by other enterprise tech sellers
               | as well because you don't want DRM or telemetry in
               | something like a mission critical HA DB that runs behind
               | a firewall. So audits it is. Cloud solves all that
               | (admittedly, whilst trading off against data privacy).
        
               | vessenes wrote:
               | Makes sense. And metering levels the playing field in
               | terms of cost assessments (if you discount tail costs to
               | $0, that is).
        
       | avodonosov wrote:
       | Why he hasn't implemented a full Datomic Peer for his DataScript
       | I never understood.
       | 
       | Having a datalog query engine, supplying it with data from
       | Datomic indexes - b-tree like collections storing entity-
       | attribute-value records - seems simple. Updating the local index
       | cache from log is also simple.
       | 
       | And that gets you a db in browser.
        
         | tonsky wrote:
         | It's not as simple as you make it sound:
         | 
         | - Reliable communication is hard
         | 
         | - Optimistic writes on the client are hard
         | 
         | - Tracking subsets of data is hard (you don't want the
         | entirety of Datomic on the client, do you?)
         | 
         | - Permissions are hard in this model
         | 
         | Why didn't I implement it? Mostly comes down to free time. It's
         | a hobby project and it's hard to find time for it. I also
         | stopped writing web apps so immediate pressure for this went
         | away.
        
       | onion2k wrote:
       | Isn't this what CouchDB/PouchDB solves in quite a nice way?
        
         | paul_h wrote:
         | I always found the documentation lacking and it not 100% clear
         | what was in couchbase (commercial & OSS) vs couchdb and which I
         | really wanted
        
         | fridder wrote:
         | That was my first thought! https://couchdb.apache.org/ is
         | pretty good though is it still the incremental views with JS?
        
       | joeeverjk wrote:
       | If sync really is the future, do you think devs will finally stop
       | pretending local-first apps are some niche thing and start
       | building around sync as the core instead of the afterthought? Or
       | are we doomed to another decade of shitty conflict resolution
       | hacks?
        
         | Zanfa wrote:
         | > Or are we doomed to another decade of shitty conflict
         | resolution hacks?
         | 
         | Conflict resolution is never going away. It's important to
         | distinguish between syntactical and semantical conflicts
         | though, the first of which can be solved, but the other will
         | always require manual intervention.
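         | 
         | One way to read that distinction, as a sketch with made-up
         | names: a field-level three-way merge resolves edits that don't
         | overlap mechanically, and hands back the fields both sides
         | changed differently for a person (or app-specific logic) to
         | decide:

```python
_MISSING = object()  # marks a field that is absent in one version


def three_way_merge(base: dict, ours: dict, theirs: dict):
    """Return (merged, conflicts) for two edits of the same base record."""
    merged: dict = {}
    conflicts: dict = {}
    for key in sorted(base.keys() | ours.keys() | theirs.keys()):
        b = base.get(key, _MISSING)
        o = ours.get(key, _MISSING)
        t = theirs.get(key, _MISSING)
        if o == t:        # both sides agree (including both deleting it)
            value = o
        elif o == b:      # only "theirs" touched the field
            value = t
        elif t == b:      # only "ours" touched the field
            value = o
        else:             # both changed it differently: needs a decision
            conflicts[key] = (o, t)
            continue
        if value is not _MISSING:
            merged[key] = value
    return merged, conflicts


base = {"title": "Draft", "status": "open"}
ours = {"title": "Draft v2", "status": "open"}

# Non-overlapping edits merge cleanly.
merged, conflicts = three_way_merge(
    base, ours, {"title": "Draft", "status": "closed"})

# Overlapping edits to "title" are reported instead of guessed at.
merged2, conflicts2 = three_way_merge(
    base, ours, {"title": "Final", "status": "open"})
```

         | A clean mechanical merge can still be wrong at the semantic
         | level (e.g. a renamed document that was also closed), which is
         | the part no generic tool can decide.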
        
         | Tobani wrote:
         | I think this makes sense for applications that are
         | just managing data maybe? But if your application needs to do
         | things when you change that data (like call to a third party
         | system)... Syncing is maybe not the solution. What happens when
         | the total dataset is large, do you need to download 6gb of data
         | every time you log in? Now you've blown up the quota on local
         | storage. How do you make sure the appropriate data is
         | downloaded or enough data? How do you prioritize the data you
         | need NOW instead of waiting for that last byte of the 6gb to
         | download?
         | 
         | It is likely a useful tool, but not the only future.
        
       | keizo wrote:
       | didn't know that about roam research. I was a user, but also that
       | app convinced me that front-end went in the wrong direction for a
       | decade...
       | 
       | Rocicorp Zero Sync, instantdb, linear app like trend is great --
       | sync will be big. I hope a lot of the spa slop gets fixed!
        
       | mentalgear wrote:
       | Honourable mentions of some more excellent fully open-source sync
       | engines:
       | 
       | - Zero Sync: https://github.com/rocicorp/mono
       | 
       | - Triplit: https://github.com/aspen-cloud/triplit
        
         | guappa wrote:
         | > - Zero Sync: https://github.com/rocicorp/mono
         | 
         | Doesn't even have a readme :D Raise the bar a bit maybe.
        
           | thruflo wrote:
           | https://zero.rocicorp.dev/docs/introduction
           | 
           | Hard to raise the bar on Zero. It's a brilliant system.
        
             | profstasiak wrote:
             | can you share how are you using this? Production / side
             | projects?
             | 
             | Would you recommend it for side projects?
        
               | jakelazaroff wrote:
               | GP is probably not himself using Zero because he's the
               | CEO of Electric, which also makes a sync engine:
               | https://electric-sql.com
        
           | thunderbong wrote:
           | "Website and Docs" is the second line I see
        
           | daveguy wrote:
           | It does have a readme. Click the "View all files" button.
           | 
           | But you don't have to. GitHub shows the readme just below the
           | partial file list. That's what all the same-page docs on
           | GitHub/GitLab repositories are.
           | 
           | Full docs are linked from the readme.
        
         | mentalgear wrote:
         | if you know of other honourable mentions, reply with their
         | source link!
        
           | aboodman wrote:
           | There are so many:
           | 
           | - https://github.com/electric-sql/electric
           | 
           | - https://github.com/powersync-ja
           | 
           | - https://github.com/get-convex
           | 
           | - https://github.com/tinyplex/tinybase
           | 
           | - https://github.com/garden-co/jazz
        
             | mentalgear wrote:
             | Convex I didn't know yet - looks really crisp (even has
             | svelte support) ! Do you have experience with it? Does it
             | support (decentralized) E2E?
        
               | aboodman wrote:
               | No, Convex is a client/server system like zero, electric,
               | instant, powersync.
               | 
               | If you want a fully decentralized system, check out jazz.
               | It is the best of these currently IMO.
        
             | sergioisidoro wrote:
             | I've been very curious about electric -- the idea of giving
             | your application a replicated subset of your database, using
             | your api as a proxy, is quite interesting for apps where
             | the business layer between the db and the client is thin
             | (our case).
             | 
             | edit: Also their decision to make it just one way sync
             | makes a LOT of sense. Write access brings a lot of scary
             | cases, so making it read-only sync eases some of my
             | anxieties. I can still use REST / RPC for updating the data.
        
           | bushido wrote:
           | https://rxdb.info/ is a good one.
        
         | ochiba wrote:
         | Useful directory of tools here: https://localfirstweb.dev/
        
       | mackopes wrote:
       | I'm not convinced that there is one generalised solution to sync
       | engines. To make them truly performant at large scale, engineers
       | need to have deep understanding of the underlying technology,
       | their query performance, database, networking, and build a custom
       | sync engine around their product and their data.
       | 
       | Abstracting all of this complexity away in one general
       | tool/library and pretending that it will always work is snake
       | oil. There are no shortcuts to building truly high quality
       | product at a large scale.
        
         | tonsky wrote:
         | - You can have many sync engines
         | 
         | - Sync engines might only solve small and medium scale, that
         | would be a huge win even without large scale
        
         | wim wrote:
         | We've built a sync engine from scratch. Our app is a
         | multiplayer "IDE" but for tasks/notes [1], so it's important to
         | have a fast local first/office experience like other editors,
         | and have changes sync in the background.
         | 
         | I definitely believe sync engines are the future as they make
         | it so much easier to enable things like no-spinners browsing
         | your data, optimistic rendering, offline use, real-time
         | collaboration and so on.
         | 
         | I'm also not entirely convinced yet though that it's possible
         | to get away with something that's not custom-built, or at least
         | large parts of it. There were so many micro decisions and
         | trade-offs going into the engine: what is the granularity of
         | updates (characters, rows?) that we need and how does that
         | affect the performance. Do we need a central server for things
         | like permissions and real-time collaboration? If so do we want
         | just deltas or also state snapshots for speedup. How much
         | versioning do we need, what are implications of that? Is there
         | end-to-end-encryption, how does that affect what the server can
         | do. What kind of data structure is being synced, a simple
         | list/map, or a graph with potential cycles? What kind of
         | conflict resolution business logic do we need, where does that
         | live?
         | 
         | It would be cool to have something general purpose so you don't
         | need to build any of this, but I wonder how much time it will
         | save in practice. Maybe the answer really is to have all kinds
         | of different sync engines to pick from and then you can decide
         | whether it's worth the trade-off not having everything custom-
         | built.
         | 
         | [1] https://thymer.com
        
           | mentalgear wrote:
           | Optimally, a sync engine would have the ability to be
           | configured to have the best settings for the project (e.g.
           | central server or completely decentralised). It'd be great if
           | one engine would be so performant/configurable, but having a
           | lot of sync engines to choose from for your project is the
           | best alternative.
           | 
           | btw: excellent questions to ask / insights - about the same I
           | also came across in my lo-fi ventures.
           | 
           | Would be great if someone could assemble all these questions
           | in a "walkthrough" step-by-step interface and in the end, the
           | user gets a list of the best matching engines.
           | 
           | Edit: Mh ... maybe something small enough to vibe code ... if
           | someone is interested to help let me know!
        
             | jdvh wrote:
             | Completely decentralized is cool, but I think there are two
             | key problems with it.
             | 
             | 1) in a decentralized system who is responsible for
             | backups? What happens when you restore from a backup?
             | 
             | 2) in a decentralized system who sends push notifications
             | and syncs with mobile devices?
             | 
             | I think that in an age of $5/mo cloud vms and free SSL
             | having a single coordination server has all the advantages
             | and none of the downsides.
        
         | thr0w wrote:
         | > Abstracting all of this complexity away in one general
         | tool/library and pretending that it will always work is snake
         | oil.
         | 
         | Remember Meteor?
        
         | xg15 wrote:
         | That might be true, but you might not have those engineers or
         | they might be busy with higher-priority tasks:
         | 
         | > _It's also ill-advised to try to solve data sync while also
         | working on a product. These problems require patience,
         | thoroughness, and extensive testing. They can't be rushed. And
         | you already have a problem on your hands you don't know how to
         | solve: your product. Try solving both, fail at both._
         | 
         | Also, you might not have that "large scale" yet.
         | 
         | (I get that you could also make the opposite case, that the
         | individual requirements for your product are _so special_ that
         | you cannot factor out any common behavior. I'd see that as a
         | hypothesis to be tested.)
        
       | Phelinofist wrote:
       | The largest feature my team develops is a sync engine. We have a
       | distributed speech assistant app (multiple embeddeds [think car
       | and smartphone] & cloud) that utilizes the Blackboard pattern.
       | The sync engine keeps the blackboards on all instances in sync.
       | 
       | It is based on gRPC and uses a state machine on all instances
       | that transitions through different states for connection setup,
       | "bulk sync", "live sync" and connection wind down.
       | 
       | Bulk sync is the state that is used when an instance comes online
       | and needs to catch up on any missed changes. It is also the self-
       | heal mechanism if something goes wrong.
       | 
       | Unfortunately some embedded instances have super unreliable
       | clocks that drift quite a bit (in both directions). We consider
       | switching to a logical clock.
       | 
       | We have quite a bit of code that deals with conflicts.
       | 
       | I inherited this from my predecessor. Nowadays I would probably
       | not implement something like this again, as it is quite complex.
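       | 
       | A minimal sketch (an illustration, not the commenter's code) of
       | the logical clock mentioned above, in the Lamport style. The
       | class name, the node names and the (counter, node_id) stamp are
       | assumptions made for the example; a real system would carry the
       | counter inside its sync messages.

```python
from dataclasses import dataclass


@dataclass
class LamportClock:
    """Logical clock: orders events without trusting wall-clock time."""
    node_id: str
    counter: int = 0

    def tick(self) -> tuple[int, str]:
        # Local event (e.g. a write to the local blackboard).
        self.counter += 1
        return (self.counter, self.node_id)

    def observe(self, remote_counter: int) -> tuple[int, str]:
        # On receiving a remote change, jump past whatever the sender had seen.
        self.counter = max(self.counter, remote_counter) + 1
        return (self.counter, self.node_id)


car = LamportClock("car")
phone = LamportClock("phone")

t1 = car.tick()              # local write on the car
t2 = phone.observe(t1[0])    # phone receives the car's change
t3 = phone.tick()            # local write on the phone
t4 = car.observe(t3[0])      # car receives the phone's change
```

       | Comparing the stamps as tuples gives every instance the same
       | order for causally related events regardless of clock drift;
       | the node id only breaks ties between equal counters.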
        
         | exceptione wrote:
         | I believe the idea of a Blackboard is that there is a single
         | blackboard for all processes to asynchronously scribble and
         | read from.
         | 
         | Syncing blackboards sounds like going straight against the
         | spirit of that design pattern.
        
       | sreekanth850 wrote:
       | We use indexedDB and signalr for real time sync. What is new
       | about this?
        
       | rockmeamedee wrote:
       | Idk man. It's a nice idea, but it has to be 10x better than what
       | we currently have to overcome the ecosystem advantages of the
       | existing tech. In practice, people in the frontend world already
       | use Apollo/Relay/Tanstack Query to do data caching and querying,
       | and don't worry too much about the occasional
       | overfetching/unoptimized-ness of the setup. If they need to do a
       | complex join they write a custom API endpoint for it. It works
       | fine. Everyone here is very wary of a "magic data access layer"
       | that will fix all of our problems. Serverless turned out to be a
       | nightmare because it only partially solves the problem.
       | 
       | At the same time, I had a great time developing on Meteorjs a
       | decade ago, which used Mongo on the backend and then synced the
       | DB to the frontend for you. It was really fluid. So I look
       | forward to things like this being tried. In the end though,
       | Meteor is essentially dead today, and there's nothing to replace
       | it. I'd be wary of depending so fully on something so important.
       | Recently Faunadb (a "serverless database") went bankrupt and is
       | closing down after only a few years.
       | 
       | I see the product being sold is pitched as a "relational version
       | of firebase", which I think is a good idea. It's a good idea for
       | starter projects/demos all the way up to medium-sized apps, (and
       | might even scale further than firebase by being relational), but
       | it's not "The Future" of all app development.
       | 
       | Also, I hate to be that guy but the SQL in the example could be
       | simpler, when aggregating into JSON it's nice to use a LATERAL
       | join which essentially turns the join into a for loop and
       | synthesises rows "on demand":
       | 
       |     SELECT g.*, COALESCE(t.todos, '[]'::json) as todos
       |     FROM goals g
       |     LEFT JOIN LATERAL (
       |       SELECT json_agg(t.*) as todos
       |       FROM todos t
       |       WHERE t.goal_id = g.id
       |     ) t ON true
       | 
       | That still proves the author's point that SQL is a very
       | complicated tool, but I will say the query itself looks simpler
       | (only 1 join vs 2 joins and a group by) if you know what you're
       | doing.
        
         | timita wrote:
         | > Meteor is essentially dead today
         | 
         | Care to explain what you mean by "dead"? Just today v3.2 came
         | out, and the company, the community, and their paid-for hosting
         | service seem pretty alive to me.
        
       | asdffdasy wrote:
       | > Such a library would be called a database.
       | 
       | bold of them to assume a database can manage even the most
       | trivial of conflicts.
       | 
       | There's a reason you bombard all your writes to a
       | "main/master/etc"
        
       | profstasiak wrote:
       | so... what do people that want to have sync engines do?
       | 
       | I want to try it for hobby project and I think I will go the
       | route of just one way sync (from database to clients) using
       | electric sql and I will have writes done in a traditional way
       | (POST requests).
       | 
       | I like the idea of having server db and local db in sync, but
       | what happens with writes? I know people say CRDT etc... but they
       | are solving conflicts in unintuitive ways...
       | 
       | I know I probably sound uneducated, but I think the biggest part
       | of this is still solving conflicts in a good way, and I don't
       | really see how you can solve those in a way that works for all
       | different domains and have it "collapsed" as the author says
        
       | codeulike wrote:
       | I've been thinking about this a lot - nearly every problem these
       | days is a synchronisation problem. You're regularly downloading
       | something from an API? That's a sync. You've got a distributed
       | database? Sync problem. Cache Invalidation? Basically a sync
       | problem. You want online and offline functionality? sync problem.
       | Collaborative editing? sync problem.
       | 
       | And 'synchronisation' as a practice gets very little attention or
       | discussion. People just start with naive approaches like
       | 'download what's marked as changed' and then get stuck in the
       | quagmire of known problems and known edge cases (handling
       | deletions, handling transport errors, handling changes that
       | didn't get marked with a timestamp, how to repair after a bad
       | sync, dealing with conflicting updates etc).
       | 
       | The one piece of discussion or attempt at a systematic approach
       | I've seen to 'synchronisation' recently is to do with Conflict-
       | free Replicated Data Types https://crdt.tech which is essentially
       | restricting your data and the rules for dealing with conflicts to
       | situations that are known to be resolvable and then packaging it
       | all up into an object.
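       | 
       | A minimal sketch (an illustration, not the commenter's code) of
       | the 'download what's marked as changed' approach with two of
       | the listed edge cases handled: deletions travel as tombstones,
       | and a server-assigned version number replaces timestamps so no
       | change goes unmarked. The row shape and the function names are
       | assumptions made for the example.

```python
def changes_since(server_rows: dict, cursor: int) -> list[dict]:
    """Return every row (including tombstones) with a version above the cursor."""
    return sorted(
        (row for row in server_rows.values() if row["version"] > cursor),
        key=lambda row: row["version"],
    )


def apply_changes(local: dict, changes: list[dict], cursor: int) -> int:
    """Apply a batch to the local copy and return the new cursor."""
    for row in changes:
        if row["deleted"]:
            local.pop(row["id"], None)   # tombstone: remove locally
        else:
            local[row["id"]] = row["value"]
        cursor = max(cursor, row["version"])
    return cursor


server = {
    "a": {"id": "a", "value": "apple", "version": 1, "deleted": False},
    "b": {"id": "b", "value": "banana", "version": 2, "deleted": False},
}
local: dict = {}
cursor = apply_changes(local, changes_since(server, 0), 0)

# The server deletes "a" by writing a tombstone instead of dropping the row.
server["a"] = {"id": "a", "value": None, "version": 3, "deleted": True}
cursor = apply_changes(local, changes_since(server, cursor), cursor)
```

       | Repairing after a bad sync then amounts to resetting the cursor
       | to 0 and pulling again, at the cost of a full download.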
        
         | mrkeen wrote:
         | I've looked at CRDTs, and the concept really appeals to me in
         | the general case, but in the specific cases, my design always
         | ends up being "keep-all-the-facts" about a particular item. But
         | then you defer the problem of 'which facts can I throw away?'.
         | It's like inventing a domain-specific GC.
         | 
         | I'd love to hear about any success cases people have had with
         | CRDTs.
        
           | yccs27 wrote:
           | For me the main issue with CRDTs is that they have a fixed
           | merge algorithm baked in - if you want to change how
           | conflicts get resolved, you have to change the whole data
           | structure.
        
             | WorldMaker wrote:
             | I feel like the state-of-the-art here is slowly starting to
             | change. I think CRDTs for too many years got too caught up
             | in "conflict-free" as a "manifest destiny" sort of thing
             | more than "hope and prayer" and thought they'd keep finding
             | the right fixed merge algorithm for every situation. I
             | started watching CRDTs from the perspective of source
             | control and having a strong inkling that "data is always
             | messy" and "conflicts are _human_" (conflicts are kind of
             | inevitable in any structure trying to encode data made by
             | people).
             | 
             | I've been thinking for a bit that it is probably about time
             | the industry renamed that first C to something other than
             | "conflict-free". There is no freedom from conflicts.
             | There's _conflict resistance_, sure, and CRDTs can provide
             | in their various data structures a lot of conflict
             | resistance. But at the end of the day if the data structure
             | is meant to encode an application for humans, it needs
             | every merge tool and review tool and audit tool it can
             | offer to deal with those.
             | 
             | I think we're finally starting to see some of the light in
             | the tunnel in the major CRDT efforts and we're finally
             | leaving the detour of "no it _must_ be conflict-free, we
             | named it that so it must be true". I don't think any one
             | library is yet delivering it at a good high level, but I
             | have that feeling that "one of the next libraries" is maybe
             | going to start getting the ergonomics of conflict handling
             | right.
        
               | dtkav wrote:
               | This seems right to me -- imagine being able to tag
               | objects or sub-objects with conflict-resolution semantics
               | in a more supported way (like LWW, edits from a human,
               | edits from automation, human resolution required (with or
               | without optimistic application of defaults), etc.).
               | 
               | Throwing small language models into the mix could make
               | merging less painful too -- like having the system take
               | its best guess at what you meant, apply it, and flag it
               | for later review.
        
               | satvikpendem wrote:
               | I just want some structure where it is conflict-free most
               | of the time but I can write custom logic in certain
               | situations that is used, sort of like an automated git
               | merge conflict resolution function.
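               | 
               | A minimal sketch (an illustration, not any library's
               | API) of that idea: last-write-wins by default, with a
               | custom resolver registered for the fields that need
               | one. The (value, timestamp, client_id) shape, the
               | RESOLVERS table and the "tags" rule are assumptions
               | made for the example.

```python
from typing import Any, Callable

# A versioned value: (value, timestamp, client_id). Names are illustrative.
Versioned = tuple[Any, int, str]
Resolver = Callable[[Versioned, Versioned], Versioned]


def last_write_wins(a: Versioned, b: Versioned) -> Versioned:
    # Compare (timestamp, client_id) so ties break the same way on every client.
    return a if (a[1], a[2]) >= (b[1], b[2]) else b


def union_tags(a: Versioned, b: Versioned) -> Versioned:
    # Custom rule: tag lists are merged instead of overwritten.
    newer = last_write_wins(a, b)
    return (sorted(set(a[0]) | set(b[0])), newer[1], newer[2])


# Fields without an entry here fall back to last-write-wins.
RESOLVERS: dict[str, Resolver] = {"tags": union_tags}


def merge(mine: dict[str, Versioned], theirs: dict[str, Versioned]) -> dict[str, Versioned]:
    merged = dict(mine)
    for field, their_value in theirs.items():
        if field in merged:
            resolve = RESOLVERS.get(field, last_write_wins)
            merged[field] = resolve(merged[field], their_value)
        else:
            merged[field] = their_value
    return merged


mine = {"title": ("Draft", 5, "a"), "tags": (["todo"], 5, "a")}
theirs = {"title": ("Final", 7, "b"), "tags": (["urgent"], 6, "b")}
result = merge(mine, theirs)
```

               | A custom resolver still has to give the same answer
               | whichever side it is called from, otherwise two
               | clients can end up with different merged states.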
        
             | dtkav wrote:
             | I've been running into this with automated regex edits. Our
             | product (Relay [0]) makes Obsidian real-time collaborative
             | using yjs, but I've been fighting with the automated
             | process that rewrites markdown links within notes.
             | 
             | The issue happens when a file is renamed by one client, and
             | then all other clients pick up the rename and make the
             | change to the local files on disk. Since every edit is
             | broken down into delete/keep/insert runs, the automated
             | process runs rapidly in all clients and can break the
             | links.
             | 
             | I could limit the edits to just one client, but it feels
             | clunky. Another thought I've had is to use ytext
             | annotations, or just also store a ymap of the link metadata
             | and only apply updates if they can meet some kind of check
             | (kind of like schema validation for objects).
             | 
             | If anyone has a good mental model for modeling automated
             | operations (especially find/replace) in ytext please let me
             | know! (email in bio).
             | 
             | [0] https://system3.md/relay
        
           | jdvh wrote:
           | It's still early, but we have a checkpointing system that
           | works very well for us. And once you have checkpoints you can
           | start dropping inconsequential transactions in between
           | checkpoints, which you're right, can be considered GC.
           | However, checkpointing is desirable anyway otherwise new
           | users have to replay the transaction log from T=0 when they
           | join, and that's impractical.
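           | 
           | A minimal sketch (an illustration, not the commenter's
           | system) of checkpointing a transaction log: fold a prefix
           | of the log into a snapshot, keep only the tail, and let
           | new clients start from the snapshot. The operation shapes
           | ("set"/"del") are assumptions made for the example.

```python
def apply(state: dict, op: tuple) -> dict:
    """Apply one ("set", key, value) or ("del", key) operation to a copy of state."""
    new_state = dict(state)
    if op[0] == "set":
        new_state[op[1]] = op[2]
    elif op[0] == "del":
        new_state.pop(op[1], None)
    return new_state


def replay(checkpoint: dict, log: list[tuple]) -> dict:
    """Rebuild the current state from a checkpoint plus the operations after it."""
    state = checkpoint
    for op in log:
        state = apply(state, op)
    return state


log = [("set", "x", 1), ("set", "x", 2), ("set", "y", 3), ("del", "y"), ("set", "z", 4)]

# Fold the first four operations into a checkpoint and truncate the log.
checkpoint = replay({}, log[:4])
tail = log[4:]

full_state = replay({}, log)            # replaying from T=0
fast_state = replay(checkpoint, tail)   # what a newly joined client does
```

           | The overwritten write to "x" and the whole lifetime of "y"
           | leave no trace in the checkpoint, which is the
           | "domain-specific GC" effect described in this subthread.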
        
             | dtkav wrote:
             | I've also had success with this method. "domain-specific
             | GC" is a fitting term.
        
           | FjordWarden wrote:
           | There was an article on this website not so long ago about
           | using CRDTs for collaborative editing and there was this
           | silly example to show how leaky this abstraction can be. What
           | if you have the word "color" and one user replaces it with
           | "colour" and another deletes the word, what does the CRDT do
           | in this case? Well it merges these two edits into "u". This
           | sort of makes me skeptical of using CRDTs for user facing
           | applications.
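           | 
           | A toy reproduction (an illustration, not any particular
           | CRDT library) of how that "u" comes about, assuming the
           | replacement is recorded as a single-character insert and
           | deletions are recorded per character as tombstones. The
           | Char type and the ids are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Char:
    id: str
    value: str
    deleted: bool = False


def insert_after(doc: list[Char], after_id: str, new: Char) -> None:
    # Position is found by identity, so it survives the neighbour being tombstoned.
    index = next(i for i, c in enumerate(doc) if c.id == after_id)
    doc.insert(index + 1, new)


def delete(doc: list[Char], char_id: str) -> None:
    # Deletion only hides the character; the element stays as a tombstone.
    next(c for c in doc if c.id == char_id).deleted = True


def text(doc: list[Char]) -> str:
    return "".join(c.value for c in doc if not c.deleted)


doc = [Char(f"base{i}", ch) for i, ch in enumerate("color")]

# User A turns "color" into "colour" by inserting "u" after the second "o" (base3).
insert_after(doc, "base3", Char("a0", "u"))
# User B, concurrently, deletes the five characters they could see.
for i in range(5):
    delete(doc, f"base{i}")

merged = text(doc)
```

           | Both edits are applied faithfully at the character level;
           | the surprise comes from the merge having no notion of
           | "the word" that either user was acting on.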
        
             | jakelazaroff wrote:
             | There isn't a monolithic "CRDT" in the way you're
             | describing. CRDTs are, broadly, a kind of data structure
             | that allows clients to eventually agree on a final state
             | without coordination. An integer `max` function is a simple
             | example of a CRDT.
             | 
             | The behavior the article found is peculiar to the
             | particular CRDT algorithms they looked at. But they're
             | probably right that it's impossible for all conflicting
             | edits to "just work" (in general, not just with CRDTs).
             | That doesn't mean CRDTs are pointless; you could imagine an
             | algorithm that attempts to detect such semantic conflicts
             | so the application can present some sort of resolution UI.
             | 
             | Here's the article, if interested (it's very good):
             | https://www.moment.dev/blog/lies-i-was-told-pt-1
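             | 
             | A minimal sketch of the integer `max` example above:
             | replicas merge by keeping the larger value, and the
             | result is the same whatever order the states arrive in.
             | The replica values are made up for the illustration.

```python
from functools import reduce
from itertools import permutations


def merge(a: int, b: int) -> int:
    """State-based merge for a "max register": keep the larger value."""
    return max(a, b)


# Three replicas that each saw a different value.
replica_states = [3, 7, 5]

# Merge the states in every possible arrival order and collect the outcomes.
results = {reduce(merge, order) for order in permutations(replica_states)}
```

             | `results` holds a single value, so every replica
             | converges without coordinating on an order.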
        
               | debugnik wrote:
               | > There isn't a monolithic "CRDT" in the way you're
               | describing.
               | 
               | I can't blame people for thinking otherwise, pretty much
               | every self-called "CRDT library" I've come across
               | implements exactly one such data structure, maybe
               | parameterized.
               | 
               | It's like writing a "semiring library" and it's simply
               | (min, +).
        
         | josephg wrote:
         | I agree! Lots more things are sync. Also: the state of my
         | source files -> my compiler (in watch mode), about 20 different
         | APIs in the kernel - from keyboard state to filesystem watching
         | to process monitoring to connected USB devices.
         | 
         | Also, http caching is sort of a special case of sync - where
         | the cache (say, nginx) is trying to keep a synchronised copy of
         | a resource from the backend web server. But because there's no
         | way for the web server to notify nginx that the resource has
         | changed, you get both stale reads and unnecessary polling.
         | Doing fan-out would be way more efficient than a keep alive
         | header if we had a way to do it!
         | 
         | CRDTs are cool tech. (I would know - I've been playing with
         | them for years). But I think it's worth dividing data
         | interfaces into two types: owned data and shared data. Owned
         | data has a single owner (eg the database, the kernel, the web
         | server) and other devices live down stream of that owner.
         | Shared data sources have more complex systems - eg everyone in
         | the network has a copy of the data and can make changes, then
         | it's all eventually consistent. Or raft / paxos. Think git, or
         | a distributed database. And they can be combined - eg, the app
         | server is downstream of a distributed database. GitHub actions
         | is downstream of a git repo.
         | 
         | I've been meaning to write a blog post about this for years.
         | Once you realise how ubiquitous this problem is, you see it
         | absolutely everywhere.
        
           | jkaptur wrote:
           | I can't wait to read that blog post. I know you're an expert
           | in this and respect your views.
           | 
           | One thing I think that is missing in the discussion about
           | shared data (and maybe you can correct me) is that there are
           | two ways of looking at the problem:
           | 
           | * The "math/engineering" way, where once state is identical
           | you are done!
           | 
           | * The "product manager" way where you have reasonable-sounding
           | requests like "I was typing in the middle of a paragraph,
           | then someone deleted that paragraph, and my text was gone! It
           | should be its own new paragraph in the same place."
           | 
           | Literally having identical state (or even identical state
           | that adheres to a schema) is hard enough, but I'm not aware
           | of techniques to ensure 1) identical state 2) adhering to a
           | schema 3) that anyone on the team can easily modify in
           | response to "PM-like" demands without being a sync expert.
        
           | miki123211 wrote:
           | And then there's the third super-special category of shared
           | data with no central server, and where only certain users
           | should be allowed to perform certain operations. This comes
           | up most often in p2p networks, censorship resistance etc.
           | 
           | In most cases, the easiest approach there is just "slap a
           | blockchain on it", as a good and modern (think Ethereum, not
           | Bitcoin) blockchain essentially "abstracts away" the
           | decentralization and mostly acts like a centralized computer
           | to higher layers.
           | 
           | That is certainly not the only viable approach, and I wish we
           | looked at others more. For example, a decentralized DNS-like
           | system, without an attached cryptocurrency, but with global
           | consensus on what a given name points to, would be extremely
           | useful. I'm not convinced that such a thing is possible, you
           | need some way of preventing one bad actor from grabbing all
           | the names, and monetary compensation seems like the easiest
           | one, but we should be looking in this direction a lot more.
        
         | danielvaughn wrote:
         | CRDTs work well for linear data structures, but there are known
         | issues with hierarchical ones. For instance, if you have a
         | tree, then two clients could send a transaction that would
         | cause a node to be a parent of itself.
         | 
         | That said, there's work that has been done towards fixing some
         | of those issues.
         | 
         | Evan Wallace (I think he's the CTO of Figma) has written about
         | a few solutions he tried for Figma's collaborative features.
         | And then Martin Kleppmann has a paper proposing a solution:
         | 
         | https://martin.kleppmann.com/papers/move-op.pdf
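         | 
         | A simplified sketch of the cycle problem (an illustration
         | only, not the algorithm from the linked paper): two
         | concurrent moves are applied in one agreed order, and a move
         | that would make a node its own ancestor is skipped. The
         | (timestamp, client, child, new_parent) shape is an
         | assumption made for the example.

```python
from typing import Optional


def is_ancestor(parent_of: dict[str, Optional[str]], ancestor: str, node: Optional[str]) -> bool:
    """True if `ancestor` is `node` itself or appears above it in the tree."""
    while node is not None:
        if node == ancestor:
            return True
        node = parent_of.get(node)
    return False


def apply_moves(parent_of: dict, moves: list[tuple[int, str, str, str]]) -> dict:
    """Apply (timestamp, client, child, new_parent) moves in a single global order."""
    tree = dict(parent_of)
    for _, _, child, new_parent in sorted(moves):
        if is_ancestor(tree, child, new_parent):
            continue  # would make `child` its own ancestor: skip the move
        tree[child] = new_parent
    return tree


start = {"root": None, "a": "root", "b": "root"}
# Concurrently, one client moves a under b and another moves b under a.
moves = [(1, "client1", "a", "b"), (1, "client2", "b", "a")]

final = apply_moves(start, moves)
same = apply_moves(start, list(reversed(moves)))
```

         | Sorting by (timestamp, client) is what lets every client
         | skip the same move; the sketch assumes each client has the
         | full set of moves before applying any of them.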
        
           | jdvh wrote:
           | As long as all clients agree on the order of CRDT operations
           | then cycles are no problem. It's just an invalid transaction
           | that can be dropped. Invalid or contradictory updates can
           | always happen (regardless of sync mechanism) and the
           | resolution is a UX issue. In some cases you might want to
           | inform the user, in other cases the user can choose how to
           | resolve the conflict, in other cases quiet failure is fine.
        
             | jakelazaroff wrote:
             | Unfortunately, a hard constraint of (state-based) CRDTs is
             | that merging causally concurrent changes must be
             | commutative. ie it is possible that clients will _not_ be
             | able to agree on the order of CRDT operations, and they
             | must be able to arrive at the same state after applying
             | them in _any_ order.
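             | 
             | A minimal sketch of that constraint using a grow-only
             | counter, a standard state-based CRDT: each replica only
             | bumps its own slot and merge takes the per-replica
             | maximum, so merging in either order gives the same
             | state. The replica names are made up for the example.

```python
def increment(counter: dict[str, int], replica: str, amount: int = 1) -> dict[str, int]:
    """Each replica only ever bumps its own slot."""
    updated = dict(counter)
    updated[replica] = updated.get(replica, 0) + amount
    return updated


def merge(a: dict[str, int], b: dict[str, int]) -> dict[str, int]:
    """Per-replica maximum: commutative, associative and idempotent."""
    return {r: max(a.get(r, 0), b.get(r, 0)) for r in a.keys() | b.keys()}


def value(counter: dict[str, int]) -> int:
    return sum(counter.values())


# Two replicas increment concurrently, without coordinating.
on_phone = increment(increment({}, "phone"), "phone")
on_laptop = increment({}, "laptop", 3)

merged_one_way = merge(on_phone, on_laptop)
merged_other_way = merge(on_laptop, on_phone)
```

             | Neither replica needs to know which increment happened
             | "first"; the merge never has to pick an order.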
        
               | jdvh wrote:
               | I don't think that's required, unless you definitionally
               | believe otherwise.
               | 
               | When clients disagree about the order of events and a
               | conflict results then clients can be required to roll
               | back (apply the inverse of each change) to the last point
               | in time where all clients were in agreement about the
               | world state. Then, all clients re-apply all changes in
               | the new now-agreed-upon order. Now all changes have been
               | applied and there is agreement about the world state and
               | the process starts anew.
               | 
               | This way multiple clients can work offline for extended
               | periods of time and then reconcile with other clients.
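               | 
               | A minimal sketch (an illustration, not the
               | commenter's implementation) of that
               | rollback-and-reapply loop. It restores a saved
               | snapshot of the last agreed state instead of applying
               | inverse operations, which is a simplification, and
               | the (timestamp, client_id, key, value) shape is an
               | assumption made for the example.

```python
Op = tuple[int, str, str, int]  # (timestamp, client_id, key, value); illustrative shape


class Replica:
    def __init__(self) -> None:
        self.agreed: dict[str, int] = {}   # last state every client agreed on
        self.pending: list[Op] = []        # ops applied since that point
        self.state: dict[str, int] = {}

    def receive(self, op: Op) -> None:
        self.pending.append(op)
        # Roll back to the agreed state, then replay everything in the agreed order.
        self.state = dict(self.agreed)
        for _, _, key, value in sorted(self.pending):
            self.state[key] = value


op_a = (1, "a", "title", 10)
op_b = (2, "b", "title", 20)

first = Replica()
first.receive(op_a)
first.receive(op_b)

second = Replica()   # sees the same ops in the opposite order
second.receive(op_b)
second.receive(op_a)
```

               | Both replicas end in the same state because they
               | replay in (timestamp, client_id) order, not in
               | arrival order; the cost is the repeated replay.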
        
               | dboreham wrote:
               | That's not how the CRDT concept works.
        
               | jdvh wrote:
               | You're free to argue that this isn't "pure" CRDT, but the
               | CRDT algorithm still runs normally, just a bit later than
               | it otherwise would.
        
               | satvikpendem wrote:
               | Eg-walker seems similar to what you're proposing [0]. A
               | more in-depth video by the creator [1].
               | 
               | [0] https://loro.dev/docs/advanced/event_graph_walker
               | 
               | [1] https://www.youtube.com/watch?v=rjbEG7COj7o
        
           | rapnie wrote:
           | Martin Kleppmann in one of his recent talks about the future
           | of local-first, mentions the need for a generic sync service
           | for the 'local-first end-game' [0] as he calls it.
           | Standardization is needed. Right now everyone and their
           | mother is doing sync differently and building production
           | platforms around their own protocols and mechanisms.
           | 
           | [0] https://www.youtube.com/watch?v=NMq0vncHJvU&t=1016s
        
             | tmpfs wrote:
             | The problem is that the requirements can be vastly
             | different. A collaborative editor is very different to say
             | syncing encrypted blobs. Perhaps there is a one size fits
             | all but I doubt it.
             | 
             | I've been working on sync for the latter use case for a
             | while and CRDTs would definitely be overkill.
        
           | layer8 wrote:
           | Automatic conflict resolution will always be limited. For
           | example, who seriously believes that we'll ever be able to
           | fully automate the handling of merge conflicts in version
           | control? (Even if we recorded every single edit operation on the
           | syntax-tree level.) And in regular documents the situation is
           | worse, because you don't have formal parsers and type
           | checkers and unit tests for them. Even for schematized
           | structured data, there are similar issues on the semantic
           | level, that a mere "it conforms to the schema" doesn't solve.
        
         | klabb3 wrote:
         | > The one piece of discussion or attempt at a systematic
         | approach I've seen to 'synchronisation' recently is to do with
         | Conflict-free Replicated Data Types https://crdt.tech
         | 
         | I will go against the grain and say CRDTs have been a
         | distraction and the overfocus on them has been delaying real
         | progress. They are immature and highly complex and thus hard to
         | debug and understand, and have extremely limited cross-language
         | support in practice - let alone any indexing or storage engine
         | support.
         | 
         | Yes, they are fascinating and yes they solve real problems but
         | they are absolute overkill to your problems (except collab
         | editing), at least currently. Why? Because they are all about
         | _conflict resolution_. You can get very far without addressing
         | this problem: for instance a cache, like you mentioned, has no
         | need for conflict resolution. The main data store owns the
         | data, and the cache follows. If you can have single ownership,
         | (single writer) or last write wins, or similar, you can drop a
         | massive pile of complexity on the floor and not worry about it.
         | (In the rare cases it's necessary like Google Docs or Figma I
         | would be very surprised if they use off-the-shelf CRDT libs - I
         | would bet they have extremely bespoke and domain-specific
         | data structures that are _inspired by_ CRDTs.)
         | 
         | Instead, what I believe we need is end-to-end bidirectional
         | stream based data communication, simple patch/replace data
         | structures to efficiently notify of updates, and standard
         | algorithms and protocols for processing it all. Basically
         | adding async reactivity on the read path of existing data
         | engines like SQL databases. I believe even this is a massive
         | undertaking, but feasible, and delivers lasting tangible value.
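The "last write wins" option klabb3 mentions above can be sketched in a few lines of Python. This is only an illustration: the class name `LwwRegister`, the `(timestamp, writer)` stamp, and the tie-break on writer id are assumptions made for the example, not something taken from the comment or from any particular library.

```python
from dataclasses import dataclass


@dataclass
class LwwRegister:
    """A single value where the write with the highest stamp wins."""

    value: object = None
    # (logical timestamp, writer id); the writer id only breaks ties
    # so that every replica picks the same winner.
    stamp: tuple = (0, "")

    def write(self, value, timestamp, writer):
        """Apply a write; return True if it replaced the stored value."""
        stamp = (timestamp, writer)
        if stamp > self.stamp:
            self.value, self.stamp = value, stamp
            return True
        # Older (or tie-losing) writes are dropped without any merging.
        return False
```

The trade-off is visible in `write`: a losing update is simply discarded, which is acceptable for a cache or a single-owner field and wrong for collaborative text.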
        
           | mweidner wrote:
           | Indeed, the simple approach of "send your operations to the
           | server and it will apply them in the order it receives them"
           | gives you good-enough conflict resolution in many cases.
           | 
           | It is still tempting to turn to CRDTs to solve the next
           | problem: how to apply server-side changes to a client when
           | the client has its own pending local operations. But this can
           | be solved in a fully general way using server reconciliation,
           | which doesn't restrict your operations or data structures
           | like a CRDT does. I wrote about it here:
           | https://mattweidner.com/2024/06/04/server-
           | architectures.html...
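A minimal Python sketch of the server reconciliation idea mweidner describes: the server applies operations in arrival order, and the client re-applies its own pending operations on top of whatever state the server last sent. The names (`Server`, `Client`, `increment`, `acked_ids`) and the choice to model operations as pure functions on a dict are assumptions for illustration; the linked post is the authoritative description.

```python
from dataclasses import dataclass, field


def increment(key):
    """Hypothetical operation: a pure function from state to new state."""
    return lambda state: {**state, key: state.get(key, 0) + 1}


@dataclass
class Server:
    state: dict = field(default_factory=dict)

    def receive(self, op):
        # Operations are applied strictly in the order they arrive.
        self.state = op(self.state)
        return dict(self.state)


@dataclass
class Client:
    server_state: dict = field(default_factory=dict)
    pending: list = field(default_factory=list)  # [(op_id, op), ...]

    def local_op(self, op_id, op):
        # Optimistic: remember the op locally until the server confirms it.
        self.pending.append((op_id, op))

    def on_server_update(self, new_state, acked_ids):
        # Adopt the server's state and drop ops the server has applied.
        self.server_state = dict(new_state)
        self.pending = [(i, op) for i, op in self.pending
                        if i not in acked_ids]

    @property
    def view(self):
        # What the UI shows: server state plus pending local ops on top.
        state = dict(self.server_state)
        for _, op in self.pending:
            state = op(state)
        return state
```

Nothing here constrains what an operation may do, which is the point of the approach: the operations do not need to commute the way CRDT operations do.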
        
           | ochiba wrote:
           | > Yes, they are fascinating and yes they solve real problems
           | but they are absolute overkill to your problems (except
           | collab editing), at least currently. Why? Because they are
           | all about conflict resolution. You can get very far without
           | addressing this problem: for instance a cache, like you
           | mentioned, has no need for conflict resolution. The main data
           | store owns the data, and the cache follows. If you can have
           | single ownership, (single writer) or last write wins, or
           | similar, you can drop a massive pile of complexity on the
           | floor and not worry about it. (In the rare cases it's
           | necessary like Google Docs or Figma I would be very surprised
           | if they use off-the-shelf CRDT libs - I would bet they have
           | extremely bespoke and domain-specific data structures that
           | are inspired by CRDTs.)
           | 
           | I agree with this. CRDTs are cool tech but I think in
           | practice most folks would be surprised by the high percentage
           | of use cases that can be solved with much simpler conflict
           | resolution mechanisms (perhaps combined with server
           | reconciliation as Matt mentioned). I also agree that
           | collaborative document editing is a niche where CRDTs are
           | indeed very useful.
        
             | satvikpendem wrote:
             | You might not need a CRDT [0]. But also, CRDTs are the
             | future [1].
             | 
             | [0] https://news.ycombinator.com/item?id=33865672
             | 
             | [1] https://news.ycombinator.com/item?id=24617542
        
           | 9rx wrote:
           | _> In the rare cases it's necessary like Google Docs or Figma
           | I would be very surprised if they use off-the-shelf CRDT
           | libs_
           | 
           | Or CRDTs at all. Google Docs is based on operational
           | transforms and Figma on what they call multiplayer
           | technology.
        
         | pwdisswordfishz wrote:
         | > Cache Invalidation? Basically a sync problem.
         | 
         | Does naming things and off-by-one errors also count?
        
         | ochiba wrote:
         | > And 'synchronisation' as a practice gets very little
         | attention or discussion. People just start with naive
         | approaches like 'download whats marked as changed' and then get
         | stuck in the quagmire of known problems and known edge cases
         | (handling deletions, handling transport errors, handling
         | changes that didn't get marked with a timestamp, how to repair
         | after a bad sync, dealing with conflicting updates etc).
         | 
         | I've spent 16 years working on a sync engine and have worked
         | with hundreds of enterprises on sync use cases during this
         | time. I've seen countless cases of developers underestimating
         | the complexity of sync. In most cases it happens exactly as you
         | said: start with a naive approach and then the fractal
         | complexity spiral starts. Even if the team is able to do the
         | initial implementation, maintaining it usually turns into a
         | burden that they eventually find too big to bear.
        
         | jbmsf wrote:
         | Absolutely. My current product relies heavily on a handful of
         | partner systems, adds an opinionated layer on top of these
         | systems, and propagates data to CRM, DW, and other analytical
         | systems.
         | 
         | One early insight was that we needed a representation of
         | partner data in our database (and the downstream systems need a
         | representation of our opinionated view as well). This is
         | clearly an (eventually consistent) synchronization problem.
         | 
         | We also realized that we often fail to sync (due to
         | bugs, timing, or whatever) and need a regular process to resync
         | data.
         | 
         | We've ended up with a homegrown framework that does both
         | things, such that the same business logic gets used in both
         | cases. This also makes it easy to backfill data if a chosen
         | representation changes.
         | 
         | We're now on the third or fourth iteration of this system and
         | I'm pretty happy with it.
        
           | delusional wrote:
           | Once you add a periodic resync you have moved the true
           | synchronization away from the online "(eventually consistent)
           | synchronization" and into the batch resync. At that point the
           | online synchronization is just a performance optimization on
           | top of the batch resync.
           | 
           | I've been in that situation a lot, and I'd always carefully
           | consider if you even need the online synchronization at that
           | point. It's pretty rarely required.
        
             | jbmsf wrote:
             | In our case it absolutely is. There are user facing flows
             | that require data from partner systems to complete. Waiting
             | for the next sync cycle isn't a good UX.
        
         | mattnewport wrote:
         | UI is also a sync problem if you squint a bit. React-like
         | systems are an attempt to be a sync engine between model and
         | view in a sense.
         | 
         | Multiplayer games too.
        
       | iansinnott wrote:
       | Have been using Instant for a few side projects recently and it
       | has been a phenomenal experience. 10/10, would build with it
       | again. I suspect this is also at least partially true of client-
       | server sync engines in general.
        
         | kenrick95 wrote:
         | I concur with this. Been using it on my side project that only
         | has a front-end. The "back-end" is 100% InstantDB. Although
         | for me, I found the permissions part a bit hard to
         | understand, especially when it involves linking to other
         | namespaces. Haven't checked them for a while, maybe they've
         | improved on this...
        
       | skybrian wrote:
       | This is also a tricky UI problem. Live updates, where web pages
       | move around on you while you're reading them, aren't always
       | desirable. When you're collaborating with someone you know on the
       | same document, you want to see edits immediately, but what about
       | a web forum? Do you really need to see the newest responses, or
       | is this a distraction? You might want a simple indicator that a
       | reload will show a change, though.
       | 
       | A white paper showing how Instant solves synchronization problems
       | might be nice.
        
       | qudat wrote:
       | The problem with sync engines is needing full-stack buy-in in
       | order for it to work properly. Having a separate backend-for-
       | frontend service defeats the purpose in my mind. So what do you
       | do when a company already has an API and other clients beyond a
       | web app? The web app has to accommodate. I see this as the major
       | downside with sync engines.
       | 
       | I've been using `starfx` which is able to "sync" with APIs using
       | structured concurrency: https://github.com/neurosnap/starfx
        
       | wslh wrote:
       | Sync, in general, is a very complex topic. There are past
       | examples, such as just trying to sync contacts across different
       | platforms where no definitive solution emerged. One fundamental
       | challenge is that you can't assume all endpoints behave fairly or
       | consistently, so error propagation becomes a core issue to
       | address.
       | 
       | Returning to the contacts example, Google Contacts attempts to
       | mitigate error propagation by introducing a review stage, where
       | users can decide how to handle duplicates (e.g., merge contacts
       | that contain different information).
       | 
       | In the broader context of sync, this highlights the need for
       | policies to handle situations where syncing is simply not
       | possible beyond all the smart logic we may implement.
        
       | zelon88 wrote:
       | Here's an idea.... Stop putting your critical business data on
       | disparate third party systems that you don't have access to.
       | Problem solved!
        
       | voidpointer wrote:
       | Probably a silly question, but if you take this all the way and
       | treat everything as a DB that is synchronized in the background,
       | how do you manage access control where not every user/client is
       | supposed to have access to every object represented in the DB?
       | Where does that logic go? If you do it on the document level like
       | figma or canvas, every document is a DB and you sync the changes
       | that happen to the document but first you need access to the
       | document/DB. But doesn't this whole idea break apart if you need
       | to do access control on individual parts of what you treat as the
       | DB because you would need to have that logic on the client which
       | could never be secure...
        
       | PaulHoule wrote:
       | Lotus Notes was a product far ahead of its time (nearly forgotten
       | today) which was an object database with synchronization
       | semantics. They made a lot of decisions that seem really strange
       | today, like building an email system around it, but that
       | empowered it for long-running business workflows. It's something
       | everybody in the low-code/no-code space really needs to think
       | about.
        
       | spankalee wrote:
       | The problem I have with "moving the database to the client" is
       | the same one I have in practice with CRDTs: In my apps, I need to
       | preserve the history of changes to documents, and I need to
       | validate and authenticate based on high-level change
       | descriptions, not low-level DB access.
       | 
       | This always leads me back to operational transforms. Operations
       | being reified changes function as undo records; a log of changes;
       | and a narrower, semantically-meaningful API, amenable to
       | validation and authz.
       | 
       | For the Roam Firebase example: this only works if you can either
       | trust the client to always perform valid actions, or you can
       | fully validate with Firebase's security rules.
       | 
       | OT has critiques, but almost all of them fall away in my
       | experience when you have a star topology with a central service
       | that mediates everything - defining the canonical order of
       | operations, performing validation & auth, and recording the
       | operation log.
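The star-topology arrangement spankalee describes (a central service that orders, validates, authorizes, and logs operations) might look roughly like the Python sketch below. It deliberately leaves out the transform step of operational transformation and shows only the central log, validation, and undo records. All names (`Operation`, `DocumentService`, the `set_title` operation) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Operation:
    user: str
    kind: str      # high-level change description, e.g. "set_title"
    payload: dict


class Rejected(Exception):
    """Raised when an operation fails authorization or validation."""


@dataclass
class DocumentService:
    editors: set
    doc: dict = field(default_factory=dict)
    log: list = field(default_factory=list)  # [(seq, op, previous), ...]

    def submit(self, op):
        # Authz and validation work on the semantic operation,
        # not on low-level database writes.
        if op.user not in self.editors:
            raise Rejected("not authorized")
        if op.kind != "set_title":
            raise Rejected("unknown operation")
        title = op.payload.get("title")
        if not isinstance(title, str) or not title.strip():
            raise Rejected("title must be a non-empty string")
        previous = self.doc.get("title")
        self.doc["title"] = title
        # The service alone assigns the canonical order.
        seq = len(self.log) + 1
        self.log.append((seq, op, previous))
        return seq

    def undo_last(self):
        # The log entry doubles as an undo record.
        _, _, previous = self.log.pop()
        if previous is None:
            self.doc.pop("title", None)
        else:
            self.doc["title"] = previous
```

Because every change passes through `submit`, the history, the validation rules, and the ordering all live in one place, which is the property the comment argues for.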
        
         | jimbokun wrote:
         | > This always leads me back to operational transforms.
         | Operations being reified changes function as undo records; a
         | log of changes; and a narrower, semantically-meaningful API,
         | amenable to validation and authz.
         | 
         | Sounds like another kind of synchronization database.
        
           | spankalee wrote:
           | I think it's only a database if you come down on the "logs
           | are the source of truth, not tables" side of the logs vs
           | tables debate. And if you do, any log is a database, I
           | guess...
        
       | VikingCoder wrote:
       | There are two hard problems:
       | 
       | 1. Naming things
       | 
       | 2. Caching
       | 
       | 3. Off-by-one errors
        
       | curtisblaine wrote:
       | Related:
       | 
       | - https://news.ycombinator.com/item?id=43436645
       | 
       | - https://greenvitriol.com/posts/sync-engine-for-everyone
        
       | loquisgon wrote:
       | The local first people (https://localfirstweb.dev/) have some
       | cool ideas about how to solve the data synch problem. Check it
       | out.
        
       | shikhar wrote:
       | We have had interest in using our serverless stream API
       | (https://s2.dev/) to power sync engines. Very excited about these
       | kinds of use cases, email in profile if anyone wants to chat.
        
       | finolex wrote:
       | If anyone could be kind enough to give feedback on the local-first x
       | data ownership db we're building, would really appreciate it!
       | https://docs.basic.tech/
       | 
       | Will do my best to take action on any feedback I receive here
        
       | Nelkins wrote:
       | Discussion of sync engines typically goes hand in hand with
       | local-first software. But it seems to be limited to use cases
       | when the amount of data is on the smaller side. For example, can
       | anyone imagine how there might be a local-first version of a
       | recommendation algorithm (I'm thinking something TikTok-esque)?
       | This would be a case where the determination of the
       | recommendation relies on a large amount of data.
       | 
       | Or think about any kind of large-ish scale enterprise SaaS. One
       | of the clients I'm working with currently sells a Transportation
       | Management Software system (think logistics, truck loads, etc).
       | There are very small portions of the app that I can imagine
       | relying on a sync engine, but being able to search over hundreds
       | of thousands of truck loads, their contents, drivers, etc seems
       | like it would be infeasible to do via a sync engine.
       | 
       | I mention this because it seems that sync engines get a lot of
       | hype and interest these days, but they apply to a relatively
       | small subset of applications. Which may still be a lot, but it's
       | a bit much to say they're the future (I'm inferring "of
       | application development"--which is what I'm getting from this
       | article).
        
         | ochiba wrote:
         | I think that is where sync engines come in that allow doing
         | arbitrary hybrid queries (across local and remote data) and
         | then keeping the results of those hybrid queries in sync on the
         | client.
         | 
         | This is one of the ideas that appears to be central to the
         | genesis of Zero [1]
         | 
         | ElectricSQL allows for a similar pattern and PowerSync is also
         | working on this [2]
         | 
         | [1] https://www.youtube.com/watch?v=rqOUgqsWvbw
         | 
         | [2] https://www.powersync.com/blog/powersync-2025-roadmap-
         | sqlite...
        
           | Nelkins wrote:
           | Interesting! I'll give these a look.
           | 
           | Edit: I watched the presentation (which I really enjoyed) and
           | also read the blog post. For anyone with less time, the
           | answer is essentially: don't sync everything, treat the local
           | data like a cache. Sync as much as you can into that cache,
           | and then reach out to the server for other things.
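The "treat the local data like a cache" summary above can be sketched as follows in Python. This is an assumption-laden illustration, not how Zero, ElectricSQL, or PowerSync work internally: `LoadCache`, `apply_sync`, and the `fetch_remote` callback are made-up names.

```python
class LoadCache:
    """Local store that is filled by sync and falls back to the server."""

    def __init__(self, fetch_remote):
        self._rows = {}
        # fetch_remote(load_id) is a hypothetical server call that
        # returns a row dict, or None if the row does not exist.
        self._fetch_remote = fetch_remote

    def apply_sync(self, rows):
        # Called by the sync layer with whatever subset it replicates.
        self._rows.update(rows)

    def get(self, load_id):
        if load_id in self._rows:
            return self._rows[load_id]      # served locally, no network
        row = self._fetch_remote(load_id)   # cache miss: ask the server
        if row is not None:
            self._rows[load_id] = row
        return row
```

Only the hot subset has to fit on the client; searches over the full data set still go to the server.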
        
       | beders wrote:
       | I found it quite disappointing to find a marketing piece from
       | Nikki.
       | 
       | It is full of general statements that are only true for a subset
       | of solutions. Enterprise solutions in particular are vastly more
       | complex and can't be magically made simple by a syncing database.
       | (no solution comes even close to "99% business code". Not unless
       | you re-define what business code is)
       | 
       | It is astounding how many senior software engineers or architects
       | don't understand that their stack contains multiple data models
       | and even in a greenfield project you'll end up with 3 or more.
       | Reducing this to one is possible for simple cases - it won't
       | scale up. (Rama's attempt is interesting and I hope it proves me
       | wrong)
       | 
       | From: "yeah, now you don't need to think about the network too
       | much" to "humbug, who even needs SQL"
       | 
       | I've seen much bigger projects fail because they fell for one or
       | both of these ideas.
       | 
       | While I appreciate some magic on the front-end/back-end gap,
       | being explicit (calling endpoints, receiving server-side-events)
       | is much easier to reason about. If we have calls failing, we know
       | exactly where and why. Sprinkle enough magic over this gap and
       | you'll end up in debugging hell.
       | 
       | Make this a laser focused library and I might still be interested
       | because it might remove actual boilerplate. Turn it into a full-
       | stack and your addressable market will be tiny.
        
       | hamilyon2 wrote:
       | I am feeling a bit confused. Is not the stated problem solved
       | 99.9% with decades-old battle-proven optimistic locking and some
       | careful retries?
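For readers unfamiliar with the pattern hamilyon2 refers to, here is a minimal in-memory Python sketch of optimistic locking with retries. The store, its version counter, and `update_with_retry` are hypothetical stand-ins; a real system would typically do the version check inside the database (for example with a conditional UPDATE).

```python
import threading


class VersionConflict(Exception):
    """Raised when a write is based on a stale version."""


class VersionedStore:
    def __init__(self):
        self._lock = threading.Lock()
        self._rows = {}  # key -> (version, value)

    def read(self, key):
        with self._lock:
            return self._rows.get(key, (0, None))

    def write(self, key, expected_version, value):
        # Compare-and-set: succeed only if nobody wrote since our read.
        with self._lock:
            current_version, _ = self._rows.get(key, (0, None))
            if current_version != expected_version:
                raise VersionConflict(key)
            self._rows[key] = (current_version + 1, value)
            return current_version + 1


def update_with_retry(store, key, fn, max_attempts=5):
    """Read, compute fn(value), write; re-read and retry on conflict."""
    for _ in range(max_attempts):
        version, value = store.read(key)
        try:
            return store.write(key, version, fn(value))
        except VersionConflict:
            continue
    raise VersionConflict(f"gave up on {key!r} after {max_attempts} attempts")
```

This handles contention on a single row well; it does not by itself address offline edits or pushing changes to other clients, which is the part sync engines try to cover.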
        
       | delusional wrote:
       | > I've yet to see a code base that has maintained a separate in-
       | memory index for data they are querying
       | 
       | Define "separate" but in my old X11 compositor project neocomp I
       | did something like that with a series of AOS arrays and bitfields
       | that combined to make a sort of entity manager. Each index in the
       | arrays was an entity, and each array held data associated with
       | a "type" of entity. An entity could hold multiple types that
       | would combine to specify behavior. The bitfield existed to make
       | it quick to query.
       | 
       | It was waaay too complicated for what it was, but it was fun to code
       | and worked well enough. I called it a "swiss" (because it was
       | full of holes). It's still online on github
       | (https://github.com/DelusionalLogic/NeoComp/blob/master/src/s...)
       | even though I don't use it much anymore.
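A rough Python sketch of the kind of structure described above: one array per "type", indexed by entity, plus a bitfield per entity that makes queries cheap. This is not the NeoComp code; the `Kind` flags, the `Swiss` class, and its methods are invented here to illustrate the idea.

```python
from enum import IntFlag


class Kind(IntFlag):
    # Hypothetical entity types; an entity may carry several at once.
    WINDOW = 1
    SHADOW = 2
    FADING = 4


class Swiss:
    """Fixed-capacity entity store; unused slots are the 'holes'."""

    def __init__(self, capacity):
        # One bitfield per entity index records which types it holds.
        self.kinds = [Kind(0)] * capacity
        # One array per type, indexed by entity.
        self.data = {kind: [None] * capacity for kind in Kind}

    def add(self, entity, kind, value):
        """Attach a single type (one Kind member) to an entity."""
        self.kinds[entity] |= kind
        self.data[kind][entity] = value

    def get(self, entity, kind):
        return self.data[kind][entity]

    def query(self, mask):
        """Return all entity indices that hold every type in mask."""
        return [i for i, kinds in enumerate(self.kinds)
                if (kinds & mask) == mask]
```

The bitfield list acts as the in-memory index: a query is a single pass over integers and never touches the per-type data arrays.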
        
       | Pamar wrote:
       | Maybe I am just dumb but I really cannot see how data synch could
       | solve what (in my kind of business) is a real problem.
       | 
       | Example: you develop a web app to book flights online.
       | 
       | My browser points to it and I login. Should synchronization start
       | right now? Before I even input my departure point and date?
       | 
       | Ok, no. I write NYC -> BER, and a dep date.
       | 
       | Should I start synching now?
       | 
       | Let's say I do. Is this really more efficient than querying a
       | webservice?
       | 
       | Ok, now all data are synched. Even potentially the ones for
       | business class, even if I just need economy.
       | 
       | You know, I could always change my mind later. Or find out that
       | on the day I need to travel no economy seats are available
       | anymore.
       | 
       | Whatever. I have all the inventory data that I need. Raw.
       | 
       | Guess what? As a LH frequent flyer I get special treatment in
       | terms of price. Not just for LH, but most Business Alliance
       | airlines.
       | 
       | This logic is usually on the server, because airlines want
       | maximum creativity and flexibility in handling inventory.
       | 
       | Should we just synch data and make the offer selection algorithm
       | run on the webserver instead?
       | 
       | Let's say it does not matter... I have somehow in front of me all
       | the options for my trip. So I call my wife to confirm she agrees
       | with my choice. I explain the alternatives to her... this takes 5
       | minutes.
       | 
       | In this period, 367 other people are buying/cancelling trips to
       | Europe. So I either see my selection constantly change (yay!
       | Synchronization!!!) or I press confirm, and if my choice is gone
       | I get a warning message and I repeat my query.
       | 
       | Now add two elements: - airlines prefer not to show real numbers
       | of available seats - they will usually send you a single digit
       | from 1 to 9 or a "*" to mean "10 or more".
       | 
       | So just synching raw data and letting the combinatorial engine
       | work in the browser is not a very good idea.
       | 
       | Also, I see the potential to easily mount DDOS attacks if every
       | client is constantly being synchronized by copying high
       | contention tables in RT.
       | 
       | What am I missing here?
        
         | earthnail wrote:
         | Your use case doesn't benefit from your own data. There's
         | nothing you can do that doesn't require a direct interaction
         | from the server.
         | 
         | I write an audio recording app, and in my app, users have most
         | to gain from their own data. For most people, syncing is
         | basically an afterthought. In this use case, the ability to
         | have your recordings on your phone is the most important
         | thing.
         | 
         | The difference here is that in my app, the user generates all
         | the valuable data themselves. In your app, nothing valuable can
         | happen without communication with the airline.
        
       | quantadev wrote:
       | IPFS is a technology very helpful for syncing. One way it's being
       | used in a modern context (although only sub-parts of IPFS stack)
       | is how BlueSky engineers, during their design process a few years
       | ago, accepted my proposal that for a new Social Media protocol,
       | each user should have his own "Repository" (Basically a Merkle
       | Tree) of everything he's ever posted. Then there's just a "Sync"
       | up to some master service provider node (decentralized set of
       | nodes/servers) for the rest of the world to consume.
       | 
       | Merkle-Tree based synching is as performant as you can possibly
       | get (used by Git protocol too I believe) because you can tell if
       | a root of a tree-structure is identical to some other remote tree
       | structure just by comparing the Hash Strings. And this can be
       | recursively applied down any "changed branches" of a tree to
       | implement very fast syncing mechanisms.
       | 
       | I think we need a NEW INTERNET (i.e. Web3, and dare I say
       | Semantic Web built in) where everyone's basically got their own
       | personal "Tree of Stuff" they can publish to the world, all
       | natively built into some new kind of tree structure-based killer
       | app. Like imagine having Jupyter Notebooks in Tree form, where
       | everything on it (that you want to be) is published to the web.
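The hash-comparison trick described above can be sketched in Python. This is a toy model under stated assumptions: trees are nested dicts with string leaves, both trees are in memory (a real protocol would exchange hashes over the network), and hashes are recomputed on every call instead of being stored in the nodes. The function names are made up for the example.

```python
import hashlib


def node_hash(node):
    """Hash a leaf (str) or an internal node (dict of name -> child)."""
    h = hashlib.sha256()
    if isinstance(node, dict):
        h.update(b"tree")
        for name in sorted(node):
            encoded = name.encode()
            # Length prefix keeps the name/hash concatenation unambiguous.
            h.update(len(encoded).to_bytes(4, "big"))
            h.update(encoded)
            h.update(node_hash(node[name]))
    else:
        h.update(b"leaf")
        h.update(node.encode())
    return h.digest()


def changed_paths(local, remote, prefix=""):
    """List paths that differ, descending only into changed subtrees."""
    if node_hash(local) == node_hash(remote):
        return []  # identical subtree: skip it entirely
    if not (isinstance(local, dict) and isinstance(remote, dict)):
        return [prefix or "/"]
    paths = []
    for name in sorted(set(local) | set(remote)):
        path = f"{prefix}/{name}"
        if name not in local or name not in remote:
            paths.append(path)  # added or removed on one side
        else:
            paths.extend(changed_paths(local[name], remote[name], path))
    return paths
```

With hashes stored per node, comparing two large repositories that differ in one post costs a number of comparisons proportional to the depth of the tree rather than its size.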
        
       | ativzzz wrote:
       | I've always wondered, how do applications with more stringent
       | security requirements handle this?
       | 
       | Assume that permissions to any row in the DB can be removed at
       | any time. If we store the data offline, this security measure is
       | already violated. If you don't care about a user potentially
       | storing data they no longer have access to, when they come
       | online, any operations they make are invalid and that's fine.
       | 
       | But, if security access is part of your business logic, and is
       | complex enough to the point where it lives in your app and not in
       | your DB (other than using DB tools like RLS), how do you verify
       | that the user still has access to all cached data? Wouldn't you
       | need to re-query every row every time?
       | 
       | I'm still uncertain how these sync engines can be secured
       | properly.
        
       | fxnn wrote:
       | The author would be excited to learn that CouchDB has solved this
       | problem for 20 years.
       | 
       | The use case the article describes is exactly the idea behind
       | CouchDB: a database that is at the same time the server, and
       | that's made to be synced with the client.
       | 
       | You can even put your frontend code into it and it will happily
       | serve it (aka CouchApp).
       | 
       | https://couchdb.apache.org
        
       | ltbarcly3 wrote:
       | This has been solved every 5 years or so, and along the way
       | people learn why this solution doesn't actually work.
        
       ___________________________________________________________________
       (page generated 2025-03-21 23:01 UTC)