[HN Gopher] Sync Engines Are the Future
___________________________________________________________________
Sync Engines Are the Future
Author : GarethX
Score : 288 points
Date : 2025-03-18 10:18 UTC (3 days ago)
(HTM) web link (www.instantdb.com)
(TXT) w3m dump (www.instantdb.com)
| theamk wrote:
| TL/DR:
|
| > If your database is smart enough and capable enough, why would
| you even need a server? Hosted database saves you from the
| horrors of hosting and lets your data flow freely to the
| frontend.
|
| (this is a blog of one such hosted database provider)
| Sytten wrote:
| That quote is why security people will always be employed.
|
| Jokes aside, Firebase access control is a nightmare, and all
| those database-as-an-API things have the same problem.
| SuperNinKenDo wrote:
| Apropos of the other reply to you about security. Maybe some
| security people could let me know their thoughts on this.
|
| It seems like generally, the best way to expose your database
| to the internet is considered to be not doing so in the first
| place, i.e., have your webserver query and cache a hosted
| database that isn't directly exposed.
|
| Is my understanding correct? It seems that almost all data
| breaches we hear about are directly exposed databases or their
| cloud equivalents.
|
| Is the era of "cloud" making this impossible to do?
| worthless-trash wrote:
| I don't think most large-scale breaches are directly exposed
| databases, they are just the ones that summon the largest
| face palms.
| TeMPOraL wrote:
| That's in some sense a "Swiss cheese security model". It's
| not that databases should, _in principle_, never be directly
| exposed. It's that they are rarely designed for it security-
| wise[0]; meanwhile, adding whatever complex assembly of
| containers and applications written in random languages and
| frameworks, to sit between users and the database, introduces
| a swamp of better-secured systems that attackers also need
| to get through. The more cruft you pile on, the more annoying
| it gets for attackers and users alike.
|
| In fact, there are many benefits of directly exposed
| databases - many of which would remove the need for
| applications normally sitting on top of those databases,
| which are strictly inferior and less ergonomic and overall
| more shitty than a generic database browsing interface. But
| that's another reason for why things are the way they are:
| people wanna make money, and having your application be a
| toll booth between useful data you own and the rest of the
| world is a tried-and-true way of making money.
|
| --
|
| [0] - Because they're not normally exposed, because they're
| not designed for it, because... it's a self-reinforcing loop.
| dvrp wrote:
| What do you propose as a solution for companies to be able to
| embrace more "liberating" philosophies such as anti-lock-in
| measures or copyleft-friendly measures?
|
| It seems that solving that is a cultural/economic problem, not
| a technical one, and that's a shame.
| tbrownaw wrote:
| > _decoupled from the horrors of an unreliable network_
|
| The first rule of network transparency is: the network is not
| transparent.
|
| > _Or: I've yet to see a code base that has maintained a separate
| in-memory index for data they are querying_
|
| Is boost::multi_index_container no longer a thing?
|
| Also there's SQLite with the :memory: database.
|
| And this ancient 4gl we use at work has in-memory tables (as in
| database tables, with typed columns and any number of unique or
| not indexes) as a basic language feature.
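The in-memory-tables idea above is easy to approximate with SQLite's :memory: database. A minimal Python sketch (table and index names are illustrative, not from any commenter's code):

```python
import sqlite3

# An in-memory table with typed columns and both unique and
# non-unique indexes, standing in for hand-rolled dicts and lists.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE entities (id INTEGER PRIMARY KEY, x REAL, y REAL, kind TEXT)")
db.execute("CREATE INDEX idx_kind ON entities (kind)")        # non-unique
db.execute("CREATE UNIQUE INDEX idx_pos ON entities (x, y)")  # unique
db.executemany("INSERT INTO entities (x, y, kind) VALUES (?, ?, ?)",
               [(1.0, 2.0, "tree"), (3.0, 4.0, "rock"), (5.0, 6.0, "tree")])
# Indexed query instead of a hand-maintained secondary structure.
trees = db.execute("SELECT x, y FROM entities WHERE kind = ?", ("tree",)).fetchall()
print(trees)  # [(1.0, 2.0), (5.0, 6.0)]
```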
| anonyfox wrote:
| In Elixir/Erlang that's quite common I think; at least I do this
| for when performance matters. Put the specific subset of
| commonly used data into a ETS table (= in memory cache,
| allowing concurrent reads) and have a GenServer (who owns that
| table) listen to certain database change events to update the
| data in the table as needed.
|
| Helps a lot with high read situations and takes considerable
| load off the database with probably 1 hour of coding effort if
| you know what you're doing.
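The pattern described (one owner process holding an in-memory copy, refreshed by database change events rather than per-request queries) is language-agnostic. A hedged Python sketch with illustrative names:

```python
class ChangeFedCache:
    """In-memory copy of hot rows, kept fresh by DB change events
    (the ETS-table-plus-GenServer pattern sketched above)."""

    def __init__(self, load_row):
        self.load_row = load_row  # function: key -> fresh row from the DB
        self.table = {}

    def get(self, key):
        # Read path: serve from memory, lazily loading on first miss.
        if key not in self.table:
            self.table[key] = self.load_row(key)
        return self.table[key]

    def on_change(self, key):
        # A trigger / logical-replication event says this row changed;
        # refresh just that entry.
        self.table[key] = self.load_row(key)

# Usage, with a dict standing in for the real database:
db = {"user:1": "Ada"}
cache = ChangeFedCache(lambda k: db[k])
assert cache.get("user:1") == "Ada"
db["user:1"] = "Grace"               # write lands in the database...
assert cache.get("user:1") == "Ada"  # ...cache is stale until notified
cache.on_change("user:1")
assert cache.get("user:1") == "Grace"
```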
| TeMPOraL wrote:
| > _Is boost::multi_index_container no longer a thing?_
|
| Depends on the shop. I haven't seen one in production so far,
| but I don't doubt some people use it.
|
| > _Also there's SQLite with the :memory: database._
|
| Ah, now that's cheating. I know, because I did that too. I did
| that because of the realization that half the members I'm
| stuffing into classes to store my game state are effectively
| poor man's hand-rolled tables, indices and spatial indices, so
| _why not just use a proper database for this?_
|
| > _And this ancient 4gl we use at work has in-memory tables (as
| in database tables, with typed columns and any number of unique
| or not indexes) as a basic language feature._
|
| Which one is this? I've argued in the past that this is a basic
| feature _missing from 4GL languages_, and a lot of work in
| every project is wasted on hand-rolling in-memory databases
| left and right, without realizing it. It would seem I've missed
| a language that recognized this fact?
|
| (But then, so did most of the industry.)
| phyrex wrote:
| ABAP, the SAP language, has that, if I remember correctly.
| tbrownaw wrote:
| > _Which one is this? I've argued in the past that this is a
| basic feature missing from 4GL languages, and a lot of work
| in every project is wasted on hand-rolling in-memory
| databases left and right, without realizing it. It would seem
| I've missed a language that recognized this fact?_
|
| https://en.wikipedia.org/wiki/OpenEdge_Advanced_Business_Lan.
| ..
|
| Dates back to 1981, called "Progress 4GL" until 2006.
|
| https://docs.progress.com/bundle/abl-
| reference/page/DEFINE-T...
| ximm wrote:
| > have a theory that every major technology shift happened when
| one part of the stack collapsed with another.
|
| If that were true, we would ultimately end up with a single layer.
| Instead I would say that major shifts happen when we move the
| boundaries between layers.
|
| The author here proposes to replace servers by synced client-side
| data stores.
|
| That is certainly a good idea for some applications, but it also
| comes with drawbacks. For example, it would be easier to avoid
| stale data, but it would be harder to enforce permissions.
| worthless-trash wrote:
| I feel like this is the "serverless" discussion all over again.
|
| There was still a server, it's just not YOUR server. In this
| case, there will still be servers, just maybe not something
| that you need to manage state on.
|
| This misnaming creates endless conflict when trying to
| communicate this to hyper-excited management who want to get
| on the latest trend.
|
| Can't wait to be in a meeting and hear: "We don't need
| servers when we migrate to client-side data stores".
| TeMPOraL wrote:
| I think the management isn't hyper excited about naming - in
| fact, they couldn't care less about what the name means (it's
| just a buzzword). What they're excited about is what the
| thing does - which is, turn more capex into opex. With
| "cloud", we can subscribe to servers instead of owning them.
| With "serverless", we can subscribe directly to what servers
| do, without managing servers themselves. Etc.
| Diederich wrote:
| Recently, something quite rare happened. I needed to Xerox
| some paper documents. Well, such actions are rare today, but
| years ago, it was quite common to Xerox things.
|
| Over time, the meaning of the word 'Xerox' changed. More
| specifically, it gained a new meaning. For a long time, Xerox
| only referred to a company named in 1961. Some time in the
| late 60s, it started to be used as a verb, and as I was
| growing up in the 70s and 80s, the word 'Xerox' was
| overwhelmingly used in its verb form.
|
| Our society decided as a whole that it was ok for the noun
| Xerox to be used as a verb. That's a normal and natural part of
| language development.
|
| As others have noted, management doesn't care whether the
| serverless thing you want to use is running on servers or
| not. They care that they don't have to maintain servers
| themselves. CapEx vs OpEx and all that.
|
| I agree that there could be some small hazard with the idea
| that, if I run my important thing in a 'serverless' fashion,
| then I don't have to associate all of the
| problems/challenges/concerns I have with 'servers' to my
| important thing.
|
| It's an abstraction, and all abstractions are leaky.
|
| If we're lucky, this abstraction will, on average, leak very
| little.
| slifin wrote:
| I'm surprised to see Tonsky here
|
| Mostly because I consider the state of the art on this to be
| Clojure Electric and he presumably is aware of it at least to
| some degree but does not mention it
| profstasiak wrote:
| thank you for mentioning! I have been reading a lot about sync
| engines and never saw Clojure Electric being mentioned here on
| HN!
| tonsky wrote:
| Clojure Electric is different. It's not really sync; it's
| more of a thin client. It relies on having a fast connection
| to the server at all times, and re-fetches everything all the
| time. Their innovation is that they found a really, really
| ergonomic way to do it.
| quotemstr wrote:
| Clojure Electric is proprietary software, which disqualifies
| it immediately no matter its other purported benefits
| dustingetz wrote:
| Electric's network state distribution is fully incremental,
| i'm not sure what you mean by "re-fetches everything all the
| time" but that is not how i would describe it.
|
| If you are referring to virtual scroll over large collections
| - yes, we use the persistent connection to stream the window
| of _visible records_ from the server in realtime as the user
| scrolls, affording approximately realtime virtual scroll over
| arbitrarily large views (we target collections of size
| 500-50,000 records and test at 100ms artificial RT latency,
| my actual prod latency to the Fly edge network is 6ms RT
| ping), and the Electric client retains in memory precisely
| the state needed to materialize the current DOM state, no
| more no less. Which means the client process performance is
| decoupled from the size of the dataset - which is NOT the
| case for sync engines, which put high memory and compute
| pressure on the end user device for enterprise scale
| datasets. It also inherits the traditional backend-for-
| frontend security model, which all enterprise apps require,
| including consumer apps like Notion that make the bulk of
| their revenue from enterprise citizen devs and therefore are
| exposed to enterprise data security compliance. And this is
| in an AI-focused world where companies want to defend against
| AI scrapers so they can sell their data assets to foundation
| model providers for use in training!
|
| Which IMO is the real problem with sync engines: they are not
| a good match for enterprise applications, nor are they a good
| match for hyper scale consumer saas that aspire to sell into
| enterprise. So what market are they for exactly?
| mananaysiempre wrote:
| I'm also surprised, but more because I remember very vividly
| his previous post on sync[1] which described a much more user-
| friendly (and much less startup-friendly) system.
|
| [1] https://tonsky.me/blog/crdt-filesync/
| ForTheKidz wrote:
| > You'll get your data synced for you
|
| How does this happen without an interface for conflict
| resolution? That's the hard part.
| Sammi wrote:
| All this recent hype about sync engines and local first
| applications completely disregards conflict resolution. It's
| the reason syncing isn't mainstream already; it isn't solved
| and arguably cannot be.
|
| Imagine if git just on its own picked what to keep and what to
| throw away when there's a conflict. You fundamentally need the
| user to make the choice.
| porridgeraisin wrote:
| Precisely. The hype articles write all about the journey to
| The Wall, and then leave out the bit where you smash
| headfirst into it.
| lifty wrote:
| Very good point. The local-sync ecosystem is still in a young
| phase, and conflict resolution hasn't been tackled or solved
| yet. Most systems have a "last write wins" approach.
| sgt wrote:
| > All this recent hype about sync engines and local first
| applications completely disregards conflict resolution.
|
| Not really true though. I've used a couple of local sync
| engines, one internally built and another one which is both
| commercial and now open source called PowerSync[1]. Conflict
| resolution is definitely on the agenda, and a developer is
| definitely going to be mindful of conflicts when designing
| the application.
|
| [1] https://www.powersync.com/
| Sammi wrote:
| My unfortunate point is that the dev cannot know what the
| user is doing, and so cannot in principle know what choice
| to make on behalf of the user in case of a conflict. This
| is not a code problem. It cannot be solved with code.
| sgt wrote:
| I've found that in almost all cases, the "latest update
| wins" strategy is fine. You could have two sessions
| working with conventional API calls and still have a
| conflict. As a dev you need to restrict what the user
| _can_ do.
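A minimal sketch of what "latest update wins" amounts to, assuming each write carries a timestamp and a writer id for deterministic tie-breaking (the tuple shape is illustrative):

```python
def lww_merge(a, b):
    # Writes are (timestamp, writer_id, value) tuples; Python's tuple
    # comparison gives "latest timestamp wins, ties broken by id".
    return max(a, b)

local  = (1710750000.0, "alice", "One dog")
remote = (1710750005.0, "bob",   "Two cats")
assert lww_merge(local, remote)[2] == "Two cats"
# Merge order doesn't matter, so replicas converge:
assert lww_merge(remote, local) == lww_merge(local, remote)
```

The simplicity is the appeal; the thread's caveat is that the losing write is silently discarded.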
| Jyaif wrote:
| > All this recent hype about sync engines and local first
| applications completely disregards conflict resolution
|
| The main concern of sync engines is precisely the conflict
| resolution! Everything else is simple in comparison.
|
| The good news is that under some circumstances it _is_
| possible to solve conflicts without user intervention. The
| simplest example is a counter that can only be incremented.
| More advanced data structures that automatically resolve
| conflicts exist, for example for strings, and those are good
| enough for a text editor.
|
| I agree that there will be conflicts that are resolved in a
| way that yields non-sensical text, for example if there are 2
| edits of the sentence "One cat":
|
| One cat => Two cats
|
| One cat => One dog
|
| The resulting merge may be something like "Two cats dog".
| Something else (the user, an LLM...) will then have to fix
| it.
|
| But that's totally OK, because in practice this will happen
| extremely rarely, only when the user would have been offline
| for a long time. That user will be happy to have been able to
| work offline, largely compensating for the fact that they have
| to proofread the text again.
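The increment-only counter mentioned above is the classic grow-only counter (G-Counter) CRDT; a sketch of why it merges without user intervention:

```python
class GCounter:
    """Grow-only counter: each replica increments only its own slot,
    and merging takes the element-wise maximum of the slots."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # replica_id -> highest count seen from it

    def increment(self, n=1):
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Element-wise max is commutative, associative, and idempotent,
        # so replicas converge regardless of merge order or repetition.
        for rid, c in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), c)

a, b = GCounter("a"), GCounter("b")
a.increment(2)   # concurrent increments on two replicas
b.increment(3)
a.merge(b)
b.merge(a)
assert a.value() == b.value() == 5  # no increments lost, no conflict
```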
| SkiFire13 wrote:
| This doesn't "solve" conflict resolution, it just picks one
| of the possible answers and then doesn't care whether it
| was the correct one or not.
|
| It can be acceptable for some use cases, but not for others
| where you're still concerned about stuff that happens
| "extremely rarely" and is not under your direct control.
|
| > Something else (the user, an LLM...) will then have to
| fix it.
|
| This assumes that the user/LLM knows the conflict was
| automatically solved and might need to be fixed, so the
| conflict is still there! You just made the manual part
| delayed and non-mandatory, but if you want correctness it
| will still have to be there.
| brulard wrote:
| > in practice this will happen extremely rarely, only when
| the user would have been offline for a long time.
|
| I don't think it would happen "extremely rarely". Drops in
| connectivity happen a lot, especially on cellular
| connection and this can absolutely happen a lot for some
| applications. Especially when talking about "offline first"
| apps.
| Jyaif wrote:
| You have to use another device during that drop of
| connectivity on cellular connection, and edit the same
| content. That doesn't happen often.
| aboodman wrote:
| Zero (zerosync.dev) uses transactional conflict resolution,
| which is what our prior products Replicache and Reflect both
| used. It is very similar to what multiplayer games have done
| for decades.
|
| It is described here:
|
| https://rocicorp.dev/blog/ready-player-two
|
| It works really well and we and our customers have found it
| to be quite general.
|
| It allows you to run an arbitrary transaction on the server
| side to decide what to do in case of conflicts. It is the
| software equivalent of git asking the user what to do. Zero
| asks your code what to do.
|
| But it asks it in the form of the question "please run the
| function named x with these inputs on the current backend db
| state". Which is a much more ergonomic way to ask it than
| "please do a 3-way merge between these three states".
|
| Conflict resolution is not the reason why there has not been
| a general-purpose sync engine. None of our customers have
| ~ever complained about conflict resolution.
|
| The reason there has not been a general-purpose sync engine
| is actually on the _read_ side:
|
| - Previous sync engines really want you to sync all data.
| This is impractical for most apps.
|
| - Previous sync engines do not have practical approaches
| to permissions.
|
| These problems are being solved in the next generation of
| sync engines.
|
| For more on this, I talk about it some here:
|
| https://www.youtube.com/watch?v=rqOUgqsWvbw
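The "please run the function named x with these inputs on the current backend db state" idea can be sketched as follows. This is not Zero's actual code, just an illustration of server-side mutation replay with hypothetical mutator names:

```python
def apply_mutation(state, mutation):
    # The client records named mutations; the server replays each one
    # as a transaction against the *current* backend state, letting
    # application code decide the outcome of concurrent writes.
    name, args = mutation
    handler = MUTATORS[name]
    return handler(state, **args)

def add_item(state, item, max_items):
    # A mutator can enforce invariants that naive key-wise merging
    # cannot, e.g. a capacity limit.
    if len(state["items"]) < max_items:
        state["items"].append(item)
    return state

MUTATORS = {"add_item": add_item}

state = {"items": ["a"]}
# Two clients concurrently add; the server serializes the replays, and
# the second replay sees the first one's effect.
for m in [("add_item", {"item": "b", "max_items": 2}),
          ("add_item", {"item": "c", "max_items": 2})]:
    state = apply_mutation(state, m)
assert state["items"] == ["a", "b"]  # third item rejected, invariant held
```

With naive last-write-wins on the `items` key, both adds would apply and the capacity invariant would break; replaying the named mutation preserves it.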
| probabletrain wrote:
| I think with good presence (being able to see what other
| users are doing) and an app that isn't used offline,
| conflicts are essentially not a problem. As long as
| whatever is resolving the conflicts resolves them in a way
| that doesn't break the app, e.g. making sure there aren't
| cycles in some multiplayer app with a tree datastructure.
| Sounds like Zero has the right idea here, I'll build
| something on it imminently to try it out.
| Sammi wrote:
| Agree that if you don't have offline support, then
| conflict resolution is such a minor issue that you can
| just do "last write wins" and call it a day.
| probabletrain wrote:
| > Previous sync engines really want you to sync all data
|
| Linear had to do all sorts of shenanigans to be able to
| sync all data, for orgs with lots of it - there's a talk on
| that here:
|
| https://www.youtube.com/watch?v=Wo2m3jaJixU&t=1473s
| Sammi wrote:
| "It is the software equivalent of git asking the user what
| to do. Zero asks your code what to do."
|
| You are asking the dev what to do. You are _not_ asking the
| user what to do. This is akin to the git devs baking in a
| choice into git on what to keep in a merge conflict.
|
| It's hard to trust you guys when you misrepresent like
| this. I thought long and hard on whether to respond
| confrontationally like this, but decided you really need to
| hear the push back on this.
| aboodman wrote:
| lol wut?
|
| I represented that we ask the dev what to do:
|
| > Zero asks your code what to do
|
| You agree that's what we do:
|
| > You are asking the dev what to do. You are _not_ asking
| the user what to do.
|
| I get that your actual issue is you don't think that what
| we do is "the software equivalent of git asking the user
| what to do". But like, I also said what we do concretely
| in the same paragraph. It's not like I was trying to hide
| something. This is a metaphor for how to understand our
| approach to conflict resolution that works for most
| developers. Like all metaphors it is not perfect.
|
| FWIW, there is nothing stopping a developer from having
| this function just save off a forked copy and ask the
| user what to do. Some developers do this.
|
| Also FWIW, Zero does not allow offline writes
| specifically because we want to educate people how to
| properly handle conflicts before we do. I see down-thread
| this is the majority of your concern.
| Sammi wrote:
| I assumed you were doing offline support yeah. I've heard
| a lot about local first development lately, so I guessed
| this is what you guys are tackling too.
|
| Without offline support AND you're doing real time
| updating of data, then conflict resolution is not a real
| world practical concern. Users will be looking at the
| same data at the same time anyways, so they generally see
| what data won out in case of a conflict, as they are
| looking at real time data as they are editing.
|
| IF you had offline support, and for other sync engines
| that do: There is a real and meaningful difference
| between a backend dev and an end user of the application
| choosing what to do in case of a conflict. A backend dev
| cannot make a general case algorithm that knows that two
| end users want to keep or throw away in a conflict,
| because this is completely situational - users could be
| doing whatever. And if you push the conflict resolution
| to the end users, then you are asking a lot of those
| users. They need to be technically inclined and motivated
| people in order to take the time to understand and
| resolve the conflict. Like with git users.
| aboodman wrote:
| > Without offline support AND you're doing real time
| updating of data, then conflict resolution is not a real
| world practical concern.
|
| I disagree with this. There are many real-world cases
| where key-wise LWW does the wrong thing. The article I
| linked up-thread covers many of them. Even a simple
| counter does the wrong thing.
|
| This is where robust conflict resolution really matters
| in these systems, not the long-time offline case people
| often ask about.
|
| You need robust conflict resolution to make correct
| software and maintain invariants in the face of
| write/write conflicts.
|
| > A backend dev cannot make a general case algorithm that
| knows that two end users want to keep or throw away in a
| conflict, because this is completely situational - users
| could be doing whatever. And if you push the conflict
| resolution to the end users, then you are asking a lot of
| those users. They need to be technically inclined and
| motivated people in order to take the time to understand
| and resolve the conflict. Like with git users.
|
| I agree completely. In my opinion the ideal offline-first
| write/write UI has never been built, but the team at Ink
| & Switch are closest:
|
| https://www.inkandswitch.com/patchwork/notebook/
|
| I think the perfect UX in many cases is that sync goes
| ahead and tries to land the offline writes, but the user
| has a history UI where they can see what happened. Like
| how many collaborative apps do today.
|
| But importantly in this UI the app would represent
| branches and merges. But unlike Git's fine grained
| branch/merge points, in this UI it would literally
| represent points where people went offline and made
| changes.
|
| Users could then go back and recover the version of their
| data from when they were offline, or compare (probably
| manually in two tabs) the two different versions of the
| data and recover.
|
| This does still ask users to compare and resolve
| conflicts in the worst case, but it is not a blocking
| operation or one that is final. The more common case is
| the user will go ahead with the merge and sometimes find
| some corruption. They can always go back and see what
| went wrong after the fact and fix. This seems like the
| right tradeoff to me of making the common case (no
| conflict) easy and automatic but making the uncommon but
| scary case at least not dangerous.
|
| There also needs to be clear first-class UX telling users
| that they're going offline and what will happen when they
| come online.
|
| I'm looking forward to someday working on this, but it's
| not what our users ask about most often so we're just
| disabling offline writes for now.
| jamil7 wrote:
| > All this recent hype about sync engines and local first
| applications
|
| Kind of, but only really in the web world; it was the default
| on desktop for a long time and is pretty common on mobile.
| phito wrote:
| Right, first thing I did after opening the article was
| CTRL-F'ing for "conflict", and got zero results. How are they
| not talking about the only real problem with the local-first
| approach? The rest is just boilerplate code.
| tonsky wrote:
| Ah, no. Not really. People sometimes think about conflict
| resolution as a problem that needs to be solved. But it's not
| solvable, not really. It's part of the domain, it's not going
| anywhere, it's irreducible complexity.
|
| You _will_ have conflicts (because your app is distributed and
| there are concurrent writes). They will happen on semantic
| level, so only you (app developer) _will_ be able to solve
| them. Database (or any other magical tool) can't do it for you.
|
| Another misconception is that conflict resolution needs to be
| "solved" perfectly before any progress can be made. That is not
| true either. You might have unhandled conflicts in your system
| and still have a working, useful, successful product. Conflicts
| might be rare, insignificant, or people (your users) will just
| correct for/work around them.
|
| I am not saying "drop data on the floor", of course, if you can
| help it. But try not to overthink it, either.
| DaiPlusPlus wrote:
| > But it's not solvable, not really. It's part of the domain,
| it's not going anywhere, it's irreducible complexity. You
| _will_ have conflicts (because your app is distributed and
| there are concurrent writes). [...] Another misconception is
| that conflict resolution needs to be "solved" perfectly
| before any progress can be made. That is not true either.
| You might have unhandled conflicts in your system and still
| have a working, useful, successful product. Conflicts might
| be rare, insignificant, or people (your users) will just
| correct for/work around them.
|
| I can't speak for whatever application-level problems you
| were trying to solve, but many problem-cases can be massaged
| into being conflict-free by adding constraints (or rather:
| discovering constraints inherent in the business-domain you
| can use). For example (and the best example, too) is to use
| an append-only logical model: then the synchronization
| problem reduces down to merge-sort. Another kind of
| constraint might be to simply disallow "edit" access to local
| data when working-offline (without a prior lock or lease
| being taken) but still allowing "create".
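The append-only constraint really does reduce synchronization to a merge of sorted logs; a short Python sketch (the entry shape is illustrative):

```python
import heapq

def sync_logs(log_a, log_b):
    # Entries are (timestamp, replica_id, payload) tuples; each
    # replica only appends, so both logs are already sorted.
    merged = list(heapq.merge(log_a, log_b))
    # Drop exact duplicates (entries both replicas already had).
    out = []
    for e in merged:
        if not out or out[-1] != e:
            out.append(e)
    return out

a = [(1, "a", "create x"), (3, "a", "create y")]
b = [(1, "a", "create x"), (2, "b", "create z")]
assert sync_logs(a, b) == [(1, "a", "create x"),
                           (2, "b", "create z"),
                           (3, "a", "create y")]
```

Because nothing is ever edited in place, there is no conflicting write to resolve, only an ordering to agree on.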
|
| > Database (or any other magical tool) can't do it for you.
|
| Yes-and-no.
|
| While I'm no fan of CORBA and COM+ (...or SOAP, or WS-
| OhGodMakeItStop), being "enterprise-y" meant they
| brought distributed-transactions to any application, and that
| includes RDBMS-mediated distributed transactions (let's
| agree, an RDBMS is in a far greater position to be a better
| canonical transaction-server than an application-server
| running in-front of it). For distributed systems needing
| transient distributed locks to prevent conflicts in the first
| place (so only used by interactive users in the same LAN,
| really) this worked just as well as a local-only solution -
| and made it fault-tolerant too.
|
| ...so it is unfortunate that with the (absolutely justified)
| back-to-basics approach with REST[1] that we lose built-in
| support for distributed transactions (even some of the more
| useful and legitimate parts of WebDAV (and so, piggy-backing
| on our web-servers' built-in support for WebDAV verbs) seem
| to be going-away) - this all raises the barrier-to-entry for
| doing distributed-transactions _right_, which means the next
| set of college-hires won't have been exposed to it, which
| means it won't be a standard expected feature in the next
| major internal application they'll write for your org, which
| means you'll either have a race-condition impacting a multi-
| billion-dollar business thing that no-one knows how to fix or
| more likely, just a crappy UX where you have to tell your
| users not to reload the page too quickly "just in case". Yes,
| I see advisories like that in the Zendesk pages of the next
| line-of-business SaaS you'll be voluntold to integrate into
| your org.
|
| (I think today, the "best" way to handle distributed-locking
| between interactive-users in a web-app would necessitate
| using a ServiceWorker using WebRTC, SSE, or a highly-reliable
| WebSocket - which itself is a load of work right there - and
| don't forget to do _all_ your JS feature-checks because
| eventually someone will try to use your app on an old Safari
| edition because they want to keep on using their vintage Mac)
| - or anyone using Incognito mode, _gah_.
|
| [1]: https://devblast.com/b/calling-your-web-api-restful-
| youre-do...
| avodonosov wrote:
| They elaborate on the conflicts in the "80/20 for Multiplayer"
| section of this essay:
| https://www.instantdb.com/essays/next_firebase
|
| (make sure to also read the footnote [28] there).
| DeathArrow wrote:
| I solved data sync in distributed apps a long time ago. I send
| outgoing data to /dev/null and receive incoming data from
| /dev/zero. This way data is always consistent. That also helps
| with availability and partition tolerance.
| paduc wrote:
| Before I write anything to the DB, I validate with business
| logic.
|
| Should I write this logic in the DB itself? Seems impractical.
| scotty79 wrote:
| I think that's the main issue. It's not enough to have a
| database that can automatically sync between frontend and
| backend. It would also need to be complex enough to keep some
| logic just on the backend (because you don't want to reveal it
| and entrust adherence to the client) and reject some changes
| done on the frontend if they are invalid. The database would
| become the app itself.
| acac10 wrote:
| Which many DBs allow:
|
| - stored procedures
| - Oracle PL/SQL
|
| I used to work for Oracle but never liked that approach.
| Sammi wrote:
| The issue with stored procedures is testing and code
| maintenance. How do I run unit tests? How do I version
| control and code review?
| TeMPOraL wrote:
| It's the same issue that killed image-based programming
| in favor of the edit-compile-run cycle we're all doing.
| "How do I test? How do I do version control? How do I
| migrate?"
|
| These are valid concerns, but $deity I wish we focused on
| finding solutions for them, because the current paradigm
| of edit/compile/run + plaintext single-source-of-truth
| codebase is already severely limiting our ability to
| build and maintain complex software.
| brulard wrote:
| While I don't like the idea of putting logic into the RDBMS
| (if not for a really good reason), you can do unit tests
| and code reviews. In a serious project you already should
| have a way to make migrations and versioning of the DB
| itself (for example using prisma, drizzle, etc.).
| Procedures would be just another entry in the migrations
| and unit tests can create testing temporary DB, run the
| procedures and compare the results. I agree tooling is
| (AFAIK) not good and there will be much more work around
| that, but it is possible.
| x0x0 wrote:
| The other issue, from experience, is needing to
| reimplement logic as well -- you end up with stored
| procedures that duplicate logic that also must be run
| either in your server or on your client, e.g. given the
| state of the system, is this mutation valid?
|
| Then those multiple implementations inevitably suffer
| different bugs and drift, leading to really ugly bugs.
| scotty79 wrote:
| I don't think a stored procedure that operates only on the
| master copy of the database can reject an update coming from
| a second copy and nicely communicate that this happened so
| that the other copy can inform the user through some UI.
| Terr_ wrote:
| > logic in the DB
|
| Something similar but in the opposite direction of lessening
| DB-responsibilities in favor of logic-layer ones: Driving
| everything from an event log. (Related to CQRS, Event-
| Sourcing.)
|
| It means a bit less focus on "how do I ensure this data-
| situation never ever ever happens" logic, and a bit more "how
| shall I model escalation and intervention when weird stuff
| happens anyway."
|
| This isn't as bad as it sounds, because any sufficiently
| old/large software tends to accrue a bunch of informal
| tinkering processes anyway. It's what drives the unfortunate
| popularity of DB rows with a soft-deleted mark (that often
| require manual tinkering to selectively restore) because
| somebody always wants a special undo which is never really just
| one-time-only.
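The event-log approach described here can be shown in miniature (illustrative Python; the event kinds and field names are mine). State is derived by replaying the log, so the "special undo" of a soft delete is just another event rather than a manual tinkering process:

```python
from dataclasses import dataclass, field

@dataclass
class Event:
    kind: str              # e.g. "created", "deleted", "restored"
    item_id: int
    data: dict = field(default_factory=dict)

def replay(events):
    """Fold the event log into current state."""
    items = {}
    for e in events:
        if e.kind == "created":
            items[e.item_id] = {**e.data, "deleted": False}
        elif e.kind == "deleted":
            items[e.item_id]["deleted"] = True
        elif e.kind == "restored":     # undo is just another event
            items[e.item_id]["deleted"] = False
    return items

log = [
    Event("created", 1, {"name": "todo A"}),
    Event("deleted", 1),
    Event("restored", 1),
]
state = replay(log)
assert state[1] == {"name": "todo A", "deleted": False}
```

Because the log is append-only, "weird stuff happening anyway" leaves an audit trail instead of a corrupted row.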
| TeMPOraL wrote:
| > _Should I write this logic in the DB itself?_
|
| Yes?
|
| If it sounds impractical, it's because the whole industry got
| used to _not learning databases_ beyond most basic SQL, and
| doing everything by hand in application code itself. But given
| how much of code in most applications is just ad-hoc
| reimplementation of databases, and then how much of the
| business logic is tied to data and not application-specific
| things, I can't help but wonder - maybe a better way would be
| to treat RDBMS as an application framework and have application
| itself be a thin UI layer on top?
|
| On paper it definitely sounds like grouping concerns better.
| lloeki wrote:
| > treat RDBMS as an application framework and have
| application itself be a thin UI layer on top?
|
| Stored procedures have been a thing. I've seen countless apps
| that had a thin VB UI and a MSSQL backend where most of the
| logic is implemented. Or, y'know, Access. Or spreadsheets
| even!
|
| And before that AS/400&al.
|
| But ORMs came in and the impedance mismatch is then too
| great. Splitting data wrangling across two completely
| differing points of views makes it extremely hard to reason
| about.
| brulard wrote:
| While stored procedures/triggers etc. can be powerful, it has
| been taught for decades now that it is an antipattern to put
| business logic to the RDBMS (for more or less valid reasons).
| Some concerns I would have would be vendor lock-in and limits
| of the provided language.
| Tobani wrote:
| In very simple systems that makes sense. But as soon as your
| validation requires talking to a third party, or you have
| side effects like sending emails you have to suddenly move
| all that logic back out. You end up with system that isn't
| very easy to iterate on.
| Nextgrid wrote:
| You can model external system interactions with tables
| representing "mailboxes" - so for example if a DB stored
| procedure needs to call a third-party API to create a
| resource, it writes a row in the "outbox" table for that
| API, then application-level code picks that up, makes the
| API call, parses the response (extracts the required
| fields) and stores it in an "inbox" table so now the
| database has access to the response (and a trigger can run
| the remainder of the business process upon insertion of
| that row).
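This outbox/inbox pattern can be sketched end to end (illustrative Python + SQLite; table and column names are made up, and the third-party HTTP call is faked):

```python
import sqlite3, json

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE outbox (id INTEGER PRIMARY KEY, api TEXT, payload TEXT,
                     sent INTEGER DEFAULT 0);
CREATE TABLE inbox  (id INTEGER PRIMARY KEY, api TEXT, response TEXT);
""")

# 1. business logic inside the DB queues an API call as a row
db.execute("INSERT INTO outbox (api, payload) VALUES (?, ?)",
           ("create_resource", json.dumps({"name": "demo"})))

# 2. an application-level worker picks it up, makes the real call
#    (faked here), and writes the parsed response to the inbox
pending = db.execute(
    "SELECT id, api, payload FROM outbox WHERE sent = 0").fetchall()
for row_id, api, payload in pending:
    response = {"ok": True, "resource_id": 42}   # pretend HTTP call
    db.execute("INSERT INTO inbox (api, response) VALUES (?, ?)",
               (api, json.dumps(response)))
    db.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (row_id,))

# 3. the database can now see the response; in a real RDBMS a
#    trigger on the inbox could resume the business process here
(resp,) = db.execute("SELECT response FROM inbox").fetchone()
assert json.loads(resp)["resource_id"] == 42
```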
| Tobani wrote:
| Yes, but then you've removed parent comments' assertion
| that everything should be done by the RDBMS. And you've
| changed the contract of the action.
| TeMPOraL wrote:
| Surely some RDBMS has the ability to run REST queries,
| possibly via SQL by pretending it's a table or something.
|
| I can imagine that working on a good day. I don't dare
| imagine _error handling_ (though would love to look at
| examples).
|
| Ultimately, it probably makes no sense to do everything
| in the database, but I still believe we're doing _way_
| too much in the application, and too little in the DB.
| Some of the logic really belongs to data (and needs to be
| duplicated for any program using the same data, or
| else...; probably why people don't like to share
| databases between programs).
|
| And, at a higher level, I wonder how far we could go if
| we pushed all data-specific logic into the DB, and the
| rest (like REST calls) into dedicated components, _and_
| used a generic orchestrator to glue the parts together?
| What of the "application code" would remain then, and
| where would it sit?
| tonsky wrote:
| If you think of an existing database, like Postgres, sure. It's
| not very convenient.
|
| What I am saying is, in a perfect world, database and server
| will be the one and run code _and_ data at the same time.
| There's really no good reason why they are separated, and it
| causes a lot of inconveniences right now.
| Tobani wrote:
| Sure in an ideal world we don't need to worry about resources
| and everything is easy. There are very good reason why they
| are separated now. There have been systems like 4th dimension
| and K that combine them for decades. They're great for
| systems of a certain size. They do struggle once their
| workload is heavy enough, and seem to struggle to scale out.
| Being able to update my application without updating the
| storage engine reduces the risk. Having standardized backup
| solutions for my RDBMS means there's a whole level of effort I
| don't have to worry about. Data storage can even be optimized
| without my application having to be updated.
| zx8080 wrote:
| > decoupled from the horrors of an unreliable network
|
| There's no such thing as reliable network in the world. The world
| is network-connected; there are almost no local-only systems
| anymore (for a long long time now).
|
| Some engineers dream that there's some cases when network is
| reliable, like when a system fully lives in the same region and
| single AZ. But even then it's actually not reliable and can have
| some glitches quite frequently (like once per month or so,
| depending on some luck).
| 01HNNWZ0MV43FF wrote:
| True. Even the network between the CPU and an SD card or USB
| drive is not reliable
| tonsky wrote:
| > There's no such thing as reliable network in the world
|
| I'm not saying there is
| jimbokun wrote:
| I believe the point is that given an unreliable network, it's
| nice to have access to all the data available locally up to the
| point when you had a network issue. And then when the network
| is working again, your data comes up to date with no extra work
| on the application developer's part.
| myflash13 wrote:
| Locally synced databases seem to be a new trend. Another example
| is Turso, which works by maintaining a sort of SQLite-DB-per-
| tenant architecture. Couple that with WASM and we've basically
| come full circle back to old school desktop apps (albeit with
| sync-on-load). Fat client thin client blah blah.
| arkh wrote:
| The future of webapps: wasm in the browser, direct SQL for the
| API.
|
| Main problem? No result caching but that's "just" a middleware to
| implement.
| TeMPOraL wrote:
| Also the past of webapps. We don't have that because doing this
| properly, in a way that's maximally useful and ergonomic for
| the users, pretty much kills the entire business of the web. If
| you give direct SQL access to the underlying data, you can no
| longer seek rent by putting a bloated, barely-functional app in
| front of the database, nor can you use it to funnel users or
| upsell them stuff. Most of the money in this industry is made
| from rent-seeking.
| hyperbolablabla wrote:
| How does this compare to supabase?
| jFriedensreich wrote:
| that question comes up all the time for some reason but
| supabase does not support offline or sync, only some form of
| subscription updating, but this has nothing to do with having
| sync or local data.
| zareith wrote:
| I think an underappreciated library in this space is Logux [1]
|
| It requires deeper (and more) integration work compared to
| solutions that sync your state for you, but is a lot more
| flexible wrt. the backend technology choices.
|
| At its core, it is an action synchronizer. You manage both your
| local state and remote state through redux-style actions, and the
| library takes care of syncing and resequencing them (if needed)
| so that all clients converge at the same state.
|
| [1] https://logux.org/
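The action-synchronizer idea can be illustrated in miniature (this is not Logux's actual API, just the shape of the idea): each client applies actions optimistically, the server assigns every action a global sequence number, and clients rebuild state by replaying actions in that order, so all replicas converge:

```python
def apply(state, action):
    """Pure redux-style reducer for a toy key/value document."""
    kind, key, value = action
    new = dict(state)
    if kind == "set":
        new[key] = value
    elif kind == "del":
        new.pop(key, None)
    return new

def converge(actions_with_seq):
    """Replay actions in server-assigned order, not arrival order."""
    state = {}
    for _, action in sorted(actions_with_seq):
        state = apply(state, action)
    return state

# two clients saw the same actions arrive in different local order,
# but both resequence by the server-assigned number and agree
client_a = [(2, ("set", "title", "B")), (1, ("set", "title", "A"))]
client_b = [(1, ("set", "title", "A")), (2, ("set", "title", "B"))]
assert converge(client_a) == converge(client_b) == {"title": "B"}
```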
| mike_hearn wrote:
| I recently took a part time role at Oracle Labs and have been
| learning PL/SQL as part of a project. Seeing as Niki is shilling
| for his employer, perhaps it's OK for me to do the same here :)
| [1]. HN discourse could use a bit of a shakeup when it comes to
| databases anyway. This may be of only casual interest to most
| readers, but some HN readers work at places with Oracle licenses
| and others might be surprised to discover it can be cheaper than
| an AWS managed Postgres [2].
|
| It has a couple of features relevant to this blog post.
|
| The first: Niki points out that in standard SQL producing JSON
| documents from relational tables is awkward and the syntax is
| terrible. This is true, so there's a better syntax:
              CREATE JSON RELATIONAL DUALITY VIEW dept_w_employees_dv AS
                SELECT JSON {'_id'            : d.deptno,
                             'departmentName' : d.dname,
                             'location'       : d.loc,
                             'employees'      :
                               [ SELECT JSON {'employeeNumber' : e.empno,
                                              'name'           : e.ename}
                                 FROM employee e
                                 WHERE e.deptno = d.deptno ] }
                FROM department d
                WITH UPDATE INSERT DELETE;
|
| It makes compound JSON documents from data stored relationally.
| This has three advantages: (1) JSON documents get materialized on
| demand by the database instead of requiring frontend code to do
| it, (2) the ORDS proxy server can serve these over HTTP via
| generic authenticated endpoints (e.g. using OAuth or cookie based
| auth) so you may not need to write any code beyond SQL to get
| data to the browser, and (3) the JSON documents produced can be
| written to, not only read.
|
| The second feature is query change notifications. You can issue a
| command on a connection that starts recording the queries issued
| on it and then get a callback or a message posted to an MQ when
| the results change (without polling). The message contains some
| info about what changed. So by wiring this up to a web socket,
| which is quite easy, the work of an hour or two in most web
| frameworks, then you can stream changes to the client directly
| from the database without needing much logic or third party
| integrations. You either use the notification to trigger a full
| requery and send the entire result json back to the browser, or
| you can get fancier and transform the deltas to json subsets.
|
| It'd be neat if there was a way to join these two features
| together out of the box, but AFAIK if you want full streaming of
| document deltas to the browser and reconstituting them there, it
| would need a bit more on top.
|
| Again, you may feel this is irrelevant because doesn't every
| self-respecting HN reader use Postgres for everything, but it's
| worth knowing what's out there. Especially as the moment you
| decide to pay a cloud for hosting your DB you have crossed the
| Rubicon anyway (all the hosted DBs are proprietary forks of
| Postgres), so you might as well price out alternatives.
|
| [1] and you know the drill, views are my own and nobody has
| reviewed this post.
|
| [2] https://news.ycombinator.com/item?id=42855546
| vessenes wrote:
| Lots of good people work at Oracle, and I am sure you're one of
| them.
|
| HOWEVER. There is no world where lifetime costs of using
| Postgres for _any_ successful company _anywhere_ in the world
| are greater than using Oracle. I understand that's a key
| message for your sales team to get out, but only one of the
| CEOs at Oracle and Percona has flown a fighter jet underneath
| the Golden Gate Bridge.
|
| Oracle licensing is famously, famously sticky. Extremely.
| Incredibly. It's how the company was built and is maintained.
| mike_hearn wrote:
| Great, let's debate!
|
| I've never talked to database sales people and have no idea
| what messages they have or care about. Actually I'm 99% sure
| they don't care about the HN/startup crowd at all - did you
| see anyone except me talk about this stuff here? Me neither.
| I'm making this argument basically because I like making
| arguments early that are surprising but correct, and
| databases feel like fertile ground for such arguments.
| There's a lot of groupthink in this space. And you know all
| about my history with surprising technology arguments, Peter
| ;)
|
| Anyway I'd be interested to see a spreadsheet with a worked
| set of scenarios for both cost and "stickiness" however it's
| defined (genuinely). I think it's going to depend heavily on:
|
| a) Whether you cloud host or not. The cost of a small
| Postgres that you run yourself is pretty much whatever your
| own time is valued at, as self-hosted hardware is cheap. The
| costs of a Postgres you outsource can be really un-
| intuitively high. I already showed that a cloud hosted
| elastic Oracle DB can be cheaper for the same-spec AWS-
| managed Postgres despite a massive feature disparity on one
| side. Costs here aren't dominated by hardware nor software
| purchase costs.
|
| b) What features and scaling level you need, combined with
| cost of labour in your area. If you want to scale up a
| Postgres based operation very fast then that's going to take
| a ton of skilled engineering effort, devs will be slowed down
| a lot as they spend time on implementing custom sharding
| schemes etc. At some point the cost of rolling your own ad-
| hoc solutions to these things will cross with the cost of
| just buying a system that already solves them all out of the
| box. Where that cross-point is will depend on all kinds of
| things like opportunity cost, cost of hiring, cost of
| developer productivity....
|
| c) Whether you consider unique features to be "stickiness".
| You're claiming the licensing is sticky here but companies
| negotiate all kinds of licenses so what does that mean? By
| default it's charged per core like any other commercial db
| (or in the cloud by core seconds/storage). If unique features
| are the problem then that's an aspect of choosing any tech
| platform. If you're taking advantage of full SQL joins on a
| 50-node horizontally scaled multi-master cluster then yeah,
| trying to migrate to something else is going to be sticky
| because there aren't many other products that offer that.
| That's tech for you. Still, these days I guess it must be
| less sticky because there are other people selling very
| scalable SQL-speaking databases like Spanner.
|
| As for Larry Ellison's stunts, that's great but if you're
| deciding what platform to use on the basis of executive
| horsepower then you can pick between fighter jets, Jeff
| Bezos' rockets, Bill Gates' yachts or Larry Page's flying
| cars. Selling databases seems to go hand in hand with high
| tech vehicles, which is probably a sign there's some actual
| value being delivered there, somewhere.
| vessenes wrote:
| :)
|
| I referenced Larry as a proxy for his extreme wealth.
| Although it is true he's one of the great businessmen of
| the late 20th century. Just not the sort you want to be in
| a business deal with in general.
|
| Oracle has always been good at both adding helpful
| functions that developers rely on, making switching
| difficult, and also at teasing companies into using more
| licenses than they've purchased, then smacking them with
| audits and fees as a stick, and a 'cheaper' larger license
| as a carrot to avoid the audit fees.
|
| In the 90s, this was tech like PL/SQL and Materialized
| views - I'm long out of the Oracle game, so I have no idea
| where they compete on features now vis-a-vis open source --
| but I will say that I have owned companies where the Oracle
| license was both HATED -- and outlived all original owners
| of the company. It's hard to replace once it's in your
| workflow, and that is 100% by design.
| mike_hearn wrote:
| I guess audits are fading away as more people move to the
| cloud. Audits are used by other enterprise tech sellers
| as well because you don't want DRM or telemetry in
| something like a mission critical HA DB that runs behind
| a firewall. So audits it is. Cloud solves all that
| (admittedly, whilst trading off against data privacy).
| vessenes wrote:
| Makes sense. And metering is a leveling playing field in
| terms of cost assessments (if you discount tail costs to
| $0 that is)
| avodonosov wrote:
| Why he hasn't implemented a full Datomic Peer for his DataScript
| I never understood.
|
| Having a datalog query engine, supplying it with data from
| Datomic indexes - b-tree like collections storing entity-
| attribute-value records - seems simple. Updating the local index
| cache from log is also simple.
|
| And that gets you a db in browser.
| tonsky wrote:
| It's not as simple as you make it sound:
|
| - Reliable communication is hard
|
| - Optimistic writes on the client are hard
|
| - Tracking subsets of data is hard (you don't want the
| entirety of Datomic on the client, do you?)
|
| - Permissions are hard in this model
|
| Why didn't I implement it? Mostly comes down to free time. It's
| a hobby project and it's hard to find time for it. I also
| stopped writing web apps so immediate pressure for this went
| away.
| onion2k wrote:
| Isn't this what CouchDB/PouchDB solves in quite a nice way?
| paul_h wrote:
| I always found the documentation lacking; it was never 100%
| clear what was in Couchbase (commercial & OSS) vs CouchDB, or
| which one I really wanted
| fridder wrote:
| That was my first thought! https://couchdb.apache.org/ is
| pretty good though is it still the incremental views with JS?
| joeeverjk wrote:
| If sync really is the future, do you think devs will finally stop
| pretending local-first apps are some niche thing and start
| building around sync as the core instead of the afterthought? Or
| are we doomed to another decade of shitty conflict resolution
| hacks?
| Zanfa wrote:
| > Or are we doomed to another decade of shitty conflict
| resolution hacks?
|
| Conflict resolution is never going away. It's important to
| distinguish between syntactical and semantical conflicts
| though, the first of which can be solved, but the other will
| always require manual intervention.
| Tobani wrote:
| I think this makes sense for applications that are just
| managing data, maybe? But if your application needs to do
| things when you change that data (like call to a third party
| system)... Syncing is maybe not the solution. What happens when
| the total dataset is large, do you need to download 6gb of data
| every time you log in? Now you've blown up the quota on local
| storage. How do you make sure the appropriate data is
| downloaded or enough data? How do you prioritize the data you
| need NOW instead of waiting for that last byte of the 6gb to
| download?
|
| It is a useful tool, but not the only future.
| keizo wrote:
| didn't know that about roam research. I was a user, but also that
| app convinced me that front-end went in the wrong direction for a
| decade...
|
| Rocicorp Zero Sync, instantdb, linear app like trend is great --
| sync will be big. I hope a lot of the spa slop gets fixed!
| mentalgear wrote:
| Honourable mentions of some more excellent fully open-source sync
| engines:
|
| - Zero Sync: https://github.com/rocicorp/mono
|
| - Triplit: https://github.com/aspen-cloud/triplit
| guappa wrote:
| > - Zero Sync: https://github.com/rocicorp/mono
|
| Doesn't even have a readme :D Raise the bar a bit maybe.
| thruflo wrote:
| https://zero.rocicorp.dev/docs/introduction
|
| Hard to raise the bar on Zero. It's a brilliant system.
| profstasiak wrote:
| can you share how are you using this? Production / side
| projects?
|
| Would you recommend it for side projects?
| jakelazaroff wrote:
| GP is probably not himself using Zero because he's the
| CEO of Electric, which also makes a sync engine:
| https://electric-sql.com
| thunderbong wrote:
| "Website and Docs" is the second line I see
| daveguy wrote:
| It does have a readme. Click the "View all files" button.
|
| But you don't have to. GitHub shows the readme just below the
| partial file list. That's what all the same-page docs on
| GitHub/GitLab repositories are.
|
| Full docs are linked from the readme.
| mentalgear wrote:
| if you know of other honourable mentions, reply with their
| source link!
| aboodman wrote:
| There are so many:
|
| - https://github.com/electric-sql/electric
|
| - https://github.com/powersync-ja
|
| - https://github.com/get-convex
|
| - https://github.com/tinyplex/tinybase
|
| - https://github.com/garden-co/jazz
| mentalgear wrote:
| Convex I didn't know yet - looks really crisp (even has
| svelte support)! Do you have experience with it? Does it
| support (decentralized) E2E?
| aboodman wrote:
| No, Convex is a client/server system like zero, electric,
| instant, powersync.
|
| If you want a fully decentralized system, check out jazz.
| It is the best of these currently IMO.
| sergioisidoro wrote:
| I've been very curious about electric -- the idea of giving
| your application a replicated subset of your databse, using
| your api as a proxy, is quite interesting for apps where
| the business layer between the db and the client is thin
| (our case).
|
| edit: Also their decision to make it just one way sync
| makes a LOT of sense. Write access brings a lot of scary
| cases, so making it read-only sync eases some of my
| anxieties. I can still use Rest / RPC for updating the data
| bushido wrote:
| https://rxdb.info/ is a good one.
| ochiba wrote:
| Useful directory of tools here: https://localfirstweb.dev/
| mackopes wrote:
| I'm not convinced that there is one generalised solution to sync
| engines. To make them truly performant at large scale, engineers
| need to have deep understanding of the underlying technology,
| their query performance, database, networking, and build a custom
| sync engine around their product and their data.
|
| Abstracting all of this complexity away in one general
| tool/library and pretending that it will always work is snake
| oil. There are no shortcuts to building truly high quality
| product at a large scale.
| tonsky wrote:
| - You can have many sync engines
|
| - Sync engines might only solve small and medium scale, that
| would be a huge win even without large scale
| wim wrote:
| We've built a sync engine from scratch. Our app is a
| multiplayer "IDE" but for tasks/notes [1], so it's important to
| have a fast local-first/offline experience like other editors,
| and have changes sync in the background.
|
| I definitely believe sync engines are the future as they make
| it so much easier to enable things like no-spinners browsing
| your data, optimistic rendering, offline use, real-time
| collaboration and so on.
|
| I'm also not entirely convinced yet though that it's possible
| to get away with something that's not custom-built, or at least
| large parts of it. There were so many micro decisions and
| trade-offs going into the engine: what is the granularity of
| updates (characters, rows?) that we need and how does that
| affect the performance. Do we need a central server for things
| like permissions and real-time collaboration? If so do we want
| just deltas or also state snapshots for speedup. How much
| versioning do we need, what are implications of that? Is there
| end-to-end-encryption, how does that affect what the server can
| do. What kind of data structure is being synced, a simple
| list/map, or a graph with potential cycles? What kind of
| conflict resolution business logic do we need, where does that
| live?
|
| It would be cool to have something general purpose so you don't
| need to build any of this, but I wonder how much time it will
| save in practice. Maybe the answer really is to have all kinds
| of different sync engines to pick from and then you can decide
| whether it's worth the trade-off not having everything custom-
| built.
|
| [1] https://thymer.com
| mentalgear wrote:
| Optimally, a sync engine would have the ability to be
| configed to have the best settings for the project (e.g.
| central server or completely decentralised). It'd be great if
| one engine would be so performant/configurable, but having a
| lot of sync engines to choose from for your project is the
| best alternative.
|
| btw: excellent questions to ask / insights - about the same I
| also came across in my lo-fi ventures.
|
| Would be great if someone could assemble all these questions
| in a "walkthrough" step-by-step interface and in the end, the
| user gets a list of the best matching engines.
|
| Edit: Mh ... maybe something small enough to vibe code ... if
| someone is interested to help let me know!
| jdvh wrote:
| Completely decentralized is cool, but I think there are two
| key problems with it.
|
| 1) in a decentralized system who is responsible for
| backups? What happens when you restore from a backup?
|
| 2) in a decentralized system who sends push notifications
| and syncs with mobile devices?
|
| I think that in an age of $5/mo cloud vms and free SSL
| having a single coordination server has all the advantages
| and none of the downsides.
| thr0w wrote:
| > Abstracting all of this complexity away in one general
| tool/library and pretending that it will always work is snake
| oil.
|
| Remember Meteor?
| xg15 wrote:
| That might be true, but you might not have those engineers or
| they might be busy with higher-priority tasks:
|
| > _It's also ill-advised to try to solve data sync while also
| working on a product. These problems require patience,
| thoroughness, and extensive testing. They can't be rushed. And
| you already have a problem on your hands you don't know how to
| solve: your product. Try solving both, fail at both._
|
| Also, you might not have that "large scale" yet.
|
| (I get that you could also make the opposite case, that the
| individual requirements for your product are _so special_ that
| you cannot factor out any common behavior. I'd see that as a
| hypothesis to be tested.)
| Phelinofist wrote:
| The largest feature my team develops is a sync engine. We have a
| distributed speech assistant app (multiple embeddeds [think car
| and smartphone] & cloud) that utilizes the Blackboard pattern.
| The sync engine keeps the blackboards on all instances in sync.
|
| It is based on gRPC and uses a state machine on all instances
| that transitions through different states for connection setup,
| "bulk sync", "live sync" and connection wind down.
|
| Bulk sync is the state that is used when an instance comes online
| and needs to catch up on any missed changes. It is also the self-
| heal mechanism if something goes wrong.
|
| Unfortunately some embedded instances have super unreliable
| clocks that drift quite a bit (in both directions). We're considering
| switching to a logical clock.
|
| We have quite a bit of code that deals with conflicts.
|
| I inherited this from my predecessor. Nowadays I would probably
| not implement something like this again, as it is quite complex.
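For reference, the logical-clock alternative mentioned above (a Lamport clock) fits in a few lines. This is an illustrative Python sketch, not the actual engine code: each instance keeps a counter, bumps it on local events, and fast-forwards it past any timestamp it receives, so causal order survives drifting wall clocks:

```python
class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):
        """Local event, e.g. a blackboard write."""
        self.time += 1
        return self.time

    def receive(self, remote_time):
        """Message arrived from another instance: jump past its clock."""
        self.time = max(self.time, remote_time) + 1
        return self.time

car, phone = LamportClock(), LamportClock()
t1 = car.tick()           # car instance writes an entry
t2 = phone.receive(t1)    # phone sees it; its clock jumps past t1
assert t2 > t1            # causal order holds regardless of wall clocks
```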
| exceptione wrote:
| I believe the idea of a Blackboard is that there is a single
| blackboard for all processes to asynchronously scribble and
| read from.
|
| Syncing blackboards sounds like going straight against the
| spirit of that design pattern.
| sreekanth850 wrote:
| We use indexedDB and signalr for real time sync. What is new
| about this?
| rockmeamedee wrote:
| Idk man. It's a nice idea, but it has to be 10x better than what
| we currently have to overcome the ecosystem advantages of the
| existing tech. In practice, people in the frontend world already
| use Apollo/Relay/Tanstack Query to do data caching and querying,
| and don't worry too much about the occasional
| overfetching/unoptimized-ness of the setup. If they need to do a
| complex join they write a custom API endpoint for it. It works
| fine. Everyone here is very wary of a "magic data access layer"
| that will fix all of our problems. Serverless turned out to be a
| nightmare because it only partially solves the problem.
|
| At the same time, I had a great time developing on Meteorjs a
| decade ago, which used Mongo on the backend and then synced the
| DB to the frontend for you. It was really fluid. So I look
| forward to things like this being tried. In the end though,
| Meteor is essentially dead today, and there's nothing to replace
| it. I'd be wary of depending so fully on something so important.
| Recently Faunadb (a "serverless database") went bankrupt and is
| closing down after only a few years.
|
| I see the product being sold is pitched as a "relational version
| of firebase", which I think good idea. It's a good idea for
| starter projects/demos all the way up to medium-sized apps, (and
| might even scale further than firebase by being relational), but
| it's not "The Future" of all app development.
|
| Also, I hate to be that guy but the SQL in example could be
| simpler, when aggregating into JSON it's nice to use a LATERAL
| join which essentially turns the join into a for loop and
| synthesises rows "on demand":
|           SELECT g.*,
|                  COALESCE(t.todos, '[]'::json) as todos
|           FROM goals g
|           LEFT JOIN LATERAL (
|             SELECT json_agg(t.*) as todos
|             FROM todos t
|             WHERE t.goal_id = g.id
|           ) t ON true
|
| That still proves the author's point that SQL is a very
| complicated tool, but I will say the query itself looks simpler
| (only 1 join vs 2 joins and a group by) if you know what you're
| doing.
| timita wrote:
| > Meteor is essentially dead today
|
| Care to explain what you mean by "dead"? Just today v3.2 came
| out, and the company, the community, and their paid-for hosting
| service seem pretty alive to me.
| asdffdasy wrote:
| > Such a library would be called a database.
|
| bold of them to assume a database can manage even the most
| trivial of conflicts.
|
| There's a reason you bombard all your writes to a
| "main/master/etc"
| profstasiak wrote:
| so... what do people that want to have sync engines do?
|
| I want to try it for hobby project and I think I will go the
| route of just one way sync (from database to clients) using
| electric sql and I will have writes done in a traditional way
| (POST requests).
|
| I like the idea of having server db and local db in sync, but
| what happens with writes? I know people say CRDT etc... but they
| are solving conflicts in unintuitive ways...
|
| I know I probably sound uneducated, but I think the biggest part
| of this is still solving conflicts in a good way, and I don't
| really see how you can solve those in a way that works for all
| different domains and have it "collapsed" as the author says
| codeulike wrote:
| I've been thinking about this a lot - nearly every problem these
| days is a synchronisation problem. You're regularly downloading
| something from an API? Thats a sync. You've got a distributed
| database? Sync problem. Cache Invalidation? Basically a sync
| problem. You want online and offline functionality? sync problem.
| Collaborative editing? sync problem.
|
| And 'synchronisation' as a practice gets very little attention or
| discussion. People just start with naive approaches like
| 'download whats marked as changed' and then get stuck in the
| quagmire of known problems and known edge cases (handling
| deletions, handling transport errors, handling changes that
| didn't get marked with a timestamp, how to repair after a bad
| sync, dealing with conflicting updates etc).
|
| The one piece of discussion or attempt at a systematic approach
| I've seen to 'synchronisation' recently is to do with Conflict-
| free Replicated Data Types https://crdt.tech which is essentially
| restricting your data and the rules for dealing with conflicts to
| situations that are known to be resolvable and then packaging it
| all up into an object.
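The CRDT idea can be shown with the simplest such structure, a grow-only counter (illustrative Python; the names are mine). Each replica increments only its own slot, and merge takes the per-slot maximum, which makes merging commutative, associative, and idempotent, so replicas converge without coordination:

```python
def increment(counter, replica_id):
    """Each replica only ever bumps its own slot."""
    c = dict(counter)
    c[replica_id] = c.get(replica_id, 0) + 1
    return c

def merge(a, b):
    """Per-slot max: order of merging never matters."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}

def value(counter):
    return sum(counter.values())

a = increment({}, "a")                    # replica a counts one event offline
b = increment(increment({}, "b"), "b")    # replica b counts two
assert value(merge(a, b)) == value(merge(b, a)) == 3
```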
| mrkeen wrote:
| I've looked at CRDTs, and the concept really appeals to me in
| the general case, but in the specific cases, my design always
| ends up being "keep-all-the-facts" about a particular item. But
| then you defer the problem of 'which facts can I throw away?'.
| It's like inventing a domain-specific GC.
|
| I'd love to hear about any success cases people have had with
| CRDTs.
| yccs27 wrote:
| For me the main issue with CRDTs is that they have a fixed
| merge algorithm baked in - if you want to change how
| conflicts get resolved, you have to change the whole data
| structure.
| WorldMaker wrote:
| I feel like the state-of-the-art here is slowly starting to
| change. I think CRDTs for too many years got too caught up
| in "conflict-free" as a "manifest destiny" sort of thing
| more than "hope and prayer" and thought they'd keep finding
| the right fixed merged algorithm for every situation. I
| started watching CRDTs from the perspective of source
| control and having a strong inkling that "data is always
| messy" and "conflicts are _human_ " (conflicts are kind of
| inevitable in any structure trying to encode data made by
| people).
|
| I've been thinking for a bit that it is probably about time
| the industry renamed that first C to something other than
| "conflict-free". There is no freedom from conflicts.
| There's _conflict resistance_, sure, and CRDTs can provide
| in their various data structures a lot of conflict
| resistance. But at the end of the day if the data structure
| is meant to encode an application for humans, it needs
| every merge tool and review tool and audit tool it can
| offer to deal with those.
|
| I think we're finally starting to see some of the light in
| the tunnel in the major CRDT efforts and we're finally
| leaving the detour of "no it _must_ be conflict-free, we
| named it that so it must be true". I don't think any one
| library is yet delivering it at a good high level, but I
| have that feeling that "one of the next libraries" is maybe
| going to start getting the ergonomics of conflict handling
| right.
| dtkav wrote:
| This seems right to me -- imagine being able to tag
| objects or sub-objects with conflict-resolution semantics
| in a more supported way: LWW, edits from a human, edits
| from automation, human resolution required (with or
| without optimistic application of defaults), and so on.
|
| Throwing small language models into the mix could make
| merging less painful too -- like having the system take
| its best guess at what you meant, apply it, and flag it
| for later review.
| satvikpendem wrote:
| I just want some structure where it is conflict-free most
| of the time, but where I can plug in custom logic for
| certain situations - sort of like an automated git
| merge-conflict resolution function.
| dtkav wrote:
| I've been running into this with automated regex edits. Our
| product (Relay [0]) makes Obsidian real-time collaborative
| using yjs, but I've been fighting with the automated
| process that rewrites markdown links within notes.
|
| The issue happens when a file is renamed by one client, and
| then all other clients pick up the rename and make the
| change to the local files on disk. Since every edit is
| broken down into delete/keep/insert runs, the automated
| process runs rapidly in all clients and can break the
| links.
|
| I could limit the edits to just one client, but it feels
| clunky. Another thought I've had is to use ytext
| annotations, or just also store a ymap of the link metadata
| and only apply updates if they can meet some kind of check
| (kind of like schema validation for objects).
|
| If anyone has a good mental model for modeling automated
| operations (especially find/replace) in ytext please let me
| know! (email in bio).
|
| [0] https://system3.md/relay
| jdvh wrote:
| It's still early, but we have a checkpointing system that
| works very well for us. And once you have checkpoints you can
| start dropping inconsequential transactions in between
| checkpoints, which, you're right, can be considered GC.
| However, checkpointing is desirable anyway otherwise new
| users have to replay the transaction log from T=0 when they
| join, and that's impractical.
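A toy illustration of the checkpointing idea (hypothetical - the transaction format and variable names are invented for the example, not the commenter's actual system): a new client starts from the latest checkpoint instead of replaying the log from T=0, and the compacted prefix can be garbage-collected.

```python
# Sketch of log compaction via checkpoints. A transaction here is a
# toy (key, value) assignment with last-write-wins semantics.

def apply(state, tx):
    key, value = tx
    return {**state, key: value}

log = [("a", 1), ("b", 2), ("a", 3), ("c", 4)]

# Take a checkpoint after the first two transactions.
checkpoint_index = 2
checkpoint = {}
for tx in log[:checkpoint_index]:
    checkpoint = apply(checkpoint, tx)

# A new client replays only the tail after the checkpoint --
# the compacted prefix never needs to be transferred.
state = dict(checkpoint)
for tx in log[checkpoint_index:]:
    state = apply(state, tx)

# Same result as replaying the full log from T=0.
full = {}
for tx in log:
    full = apply(full, tx)
assert state == full == {"a": 3, "b": 2, "c": 4}
```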
| dtkav wrote:
| I've also had success with this method. "domain-specific
| GC" is a fitting term.
| FjordWarden wrote:
| There was an article on this website not so long ago about
| using CRDTs for collaborative editing and there was this
| silly example to show how leaky this abstraction can be. What
| if you have the word "color" and one user replaces it with
| "colour" and another deletes the word - what does the CRDT do
| in this case? Well, it merges these two edits into "u". This
| sort of makes me skeptical of using CRDTs for user-facing
| applications.
| jakelazaroff wrote:
| There isn't a monolithic "CRDT" in the way you're
| describing. CRDTs are, broadly, a kind of data structure
| that allows clients to eventually agree on a final state
| without coordination. An integer `max` function is a simple
| example of a CRDT.
|
| The behavior the article found is peculiar to the
| particular CRDT algorithms they looked at. But they're
| probably right that it's impossible for all conflicting
| edits to "just work" (in general, not just with CRDTs).
| That doesn't mean CRDTs are pointless; you could imagine an
| algorithm that attempts to detect such semantic conflicts
| so the application can present some sort of resolution UI.
|
| Here's the article, if interested (it's very good):
| https://www.moment.dev/blog/lies-i-was-told-pt-1
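The integer `max` example above can be spelled out in a few lines (an illustrative Python sketch): merge is `max()`, which is commutative, associative, and idempotent, so any merge order converges.

```python
# A max-register: possibly the smallest useful CRDT.

class MaxRegister:
    def __init__(self, value=0):
        self.value = value

    def merge(self, other):
        # max() is commutative, associative and idempotent,
        # so replicas converge regardless of merge order.
        self.value = max(self.value, other.value)

a, b, c = MaxRegister(3), MaxRegister(7), MaxRegister(5)
a.merge(b)
a.merge(c)   # a saw b then c
c.merge(a)   # c catches up in one step; re-merging changes nothing
assert a.value == c.value == 7
```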
| debugnik wrote:
| > There isn't a monolithic "CRDT" in the way you're
| describing.
|
| I can't blame people for thinking otherwise, pretty much
| every self-called "CRDT library" I've come across
| implements exactly one such data structure, maybe
| parameterized.
|
| It's like writing a "semiring library" and it's simply
| (min, +).
| josephg wrote:
| I agree! Lots more things are sync. Also: the state of my
| source files -> my compiler (in watch mode), about 20 different
| APIs in the kernel - from keyboard state to filesystem watching
| to process monitoring to connected USB devices.
|
| Also, http caching is sort of a special case of sync - where
| the cache (say, nginx) is trying to keep a synchronised copy of
| a resource from the backend web server. But because there's no
| way for the web server to notify nginx that the resource has
| changed, you get both stale reads and unnecessary polling.
| Doing fan-out would be way more efficient than a keep alive
| header if we had a way to do it!
|
| CRDTs are cool tech. (I would know - I've been playing with
| them for years). But I think it's worth dividing data
| interfaces into two types: owned data and shared data. Owned
| data has a single owner (eg the database, the kernel, the web
| server) and other devices live downstream of that owner.
| Shared data sources have more complex systems - eg everyone in
| the network has a copy of the data and can make changes, then
| it's all eventually consistent. Or raft / paxos. Think git, or
| a distributed database. And they can be combined - eg, the app
| server is downstream of a distributed database. GitHub actions
| is downstream of a git repo.
|
| I've been meaning to write a blog post about this for years.
| Once you realise how ubiquitous this problem is, you see it
| absolutely everywhere.
| jkaptur wrote:
| I can't wait to read that blog post. I know you're an expert
| in this and respect your views.
|
| One thing I think that is missing in the discussion about
| shared data (and maybe you can correct me) is that there are
| two ways of looking at the problem:
|
| * The "math/engineering" way, where once state is identical
| you are done!
|
| * The "product manager" way, where you have reasonable-
| sounding requests like "I was typing in the middle of a
| paragraph, then someone deleted that paragraph, and my text
| was gone! It should be its own new paragraph in the same
| place."
|
| Literally having identical state (or even identical state
| that adheres to a schema) is hard enough, but I'm not aware
| of techniques to ensure 1) identical state 2) adhering to a
| schema 3) that anyone on the team can easily modify in
| response to "PM-like" demands without being a sync expert.
| miki123211 wrote:
| And then there's the third super-special category of shared
| data with no central server, and where only certain users
| should be allowed to perform certain operations. This comes
| up most often in p2p networks, censorship resistance etc.
|
| In most cases, the easiest approach there is just "slap a
| blockchain on it", as a good and modern (think Ethereum, not
| Bitcoin) blockchain essentially "abstracts away" the
| decentralization and mostly acts like a centralized computer
| to higher layers.
|
| That is certainly not the only viable approach, and I wish we
| looked at others more. For example, a decentralized DNS-like
| system, without an attached cryptocurrency, but with global
| consensus on what a given name points to, would be extremely
| useful. I'm not convinced that such a thing is possible - you
| need some way of preventing one bad actor from grabbing all
| the names, and monetary compensation seems like the easiest
| one - but we should be looking in this direction a lot more.
| danielvaughn wrote:
| CRDTs work well for linear data structures, but there are known
| issues with hierarchical ones. For instance, if you have a
| tree, two clients could concurrently send transactions that,
| combined, would cause a node to become a parent of itself.
|
| That said, there's work that has been done towards fixing some
| of those issues.
|
| Evan Wallace (I think he's the CTO of Figma) has written about
| a few solutions he tried for Figma's collaborative features.
| And then Martin Kleppmann has a paper proposing a solution:
|
| https://martin.kleppmann.com/papers/move-op.pdf
| jdvh wrote:
| As long as all clients agree on the order of CRDT operations
| then cycles are no problem. It's just an invalid transaction
| that can be dropped. Invalid or contradictory updates can
| always happen (regardless of sync mechanism) and the
| resolution is a UX issue. In some cases you might want to
| inform the user, in other cases the user can choose how to
| resolve the conflict, in other cases quiet failure is fine.
| jakelazaroff wrote:
| Unfortunately, a hard constraint of (state-based) CRDTs is
| that merging causally concurrent changes must be
| commutative - i.e. it is possible that clients will _not_ be
| able to agree on the order of CRDT operations, and they
| must be able to arrive at the same state after applying
| them in _any_ order.
| jdvh wrote:
| I don't think that's required, unless you definitionally
| believe otherwise.
|
| When clients disagree about the order of events and a
| conflict results then clients can be required to roll
| back (apply the inverse of each change) to the last point
| in time where all clients were in agreement about the
| world state. Then, all clients re-apply all changes in
| the new now-agreed-upon order. Now all changes have been
| applied and there is agreement about the world state and
| the process starts anew.
|
| This way multiple clients can work offline for extended
| periods of time and then reconcile with other clients.
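The rollback-and-replay scheme described above might look roughly like this (an illustrative sketch; the op format and `reconcile` helper are invented for the example): every client rolls back to the last agreed state, then re-applies all operations in the newly agreed order, so all clients land on the same world state without a CRDT merge function.

```python
# Toy rollback-and-replay reconciliation. An op is a dict that sets
# one key; "rolling back" here just means restarting from the last
# state everyone agreed on.

def apply(state, op):
    state = dict(state)
    state[op["key"]] = op["value"]
    return state

last_agreed_state = {"title": "draft"}

# Two clients made concurrent edits while offline.
op1 = {"key": "title", "value": "v1"}
op2 = {"key": "title", "value": "v2"}

# Some ordering mechanism (e.g. a server) fixes one order: op1, op2.
agreed_order = [op1, op2]

def reconcile(last_agreed, ops_in_agreed_order):
    state = dict(last_agreed)        # roll back to the agreed point
    for op in ops_in_agreed_order:   # replay in the agreed order
        state = apply(state, op)
    return state

client_a = reconcile(last_agreed_state, agreed_order)
client_b = reconcile(last_agreed_state, agreed_order)
assert client_a == client_b == {"title": "v2"}
```

Invalid transactions (like a move that would create a cycle) can simply be skipped during replay, since every client skips the same ones in the same order.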
| dboreham wrote:
| That's not how the CRDT concept works.
| jdvh wrote:
| You're free to argue that this isn't "pure" CRDT, but the
| CRDT algorithm still runs normally, just a bit later than
| it otherwise would.
| satvikpendem wrote:
| Eg-walker seems similar to what you're proposing [0]. A
| more in-depth video by the creator [1].
|
| [0] https://loro.dev/docs/advanced/event_graph_walker
|
| [1] https://www.youtube.com/watch?v=rjbEG7COj7o
| rapnie wrote:
| Martin Kleppmann, in one of his recent talks about the future
| of local-first, mentions the need for a generic sync service
| for the 'local-first end-game' [0], as he calls it.
| Standardization is needed. Right now everyone and their
| mother is doing sync differently and building production
| platforms around their own protocols and mechanisms.
|
| [0] https://www.youtube.com/watch?v=NMq0vncHJvU&t=1016s
| tmpfs wrote:
| The problem is that the requirements can be vastly
| different. A collaborative editor is very different from,
| say, syncing encrypted blobs. Perhaps there is a one-size-
| fits-all solution, but I doubt it.
|
| I've been working on sync for the latter use case for a
| while and CRDTs would definitely be overkill.
| layer8 wrote:
| Automatic conflict resolution will always be limited. For
| example, who seriously believes that we'll ever be able to
| fully automate the handling of merge conflicts in version
| control? (Even if we recorded every single edit operation at
| the syntax-tree level.) And in regular documents the situation is
| worse, because you don't have formal parsers and type
| checkers and unit tests for them. Even for schematized
| structured data, there are similar issues at the semantic
| level that a mere "it conforms to the schema" doesn't solve.
| klabb3 wrote:
| > The one piece of discussion or attempt at a systematic
| approach I've seen to 'synchronisation' recently is to do with
| Conflict-free Replicated Data Types https://crdt.tech
|
| I will go against the grain and say CRDTs have been a
| distraction, and the overfocus on them has been delaying real
| progress. They are immature and highly complex and thus hard to
| debug and understand, and have extremely limited cross-language
| support in practice - let alone any indexing or storage engine
| support.
|
| Yes, they are fascinating and yes they solve real problems but
| they are absolute overkill to your problems (except collab
| editing), at least currently. Why? Because they are all about
| _conflict resolution_. You can get very far without addressing
| this problem: for instance a cache, like you mentioned, has no
| need for conflict resolution. The main data store owns the
| data, and the cache follows. If you can have single ownership,
| (single writer) or last write wins, or similar, you can drop a
| massive pile of complexity on the floor and not worry about it.
| (In the rare cases it's necessary like Google Docs or Figma I
| would be very surprised if they use off-the-shelf CRDT libs - I
| would bet they have extremely bespoke, domain-specific
| data structures that are _inspired by_ CRDTs.)
|
| Instead, what I believe we need is end-to-end bidirectional
| stream-based data communication, simple patch/replace data
| structures to efficiently notify of updates, and standard
| algorithms and protocols for processing it all. Basically
| adding async reactivity on the read path of existing data
| engines like SQL databases. I believe even this is a massive
| undertaking, but feasible, and delivers lasting tangible value.
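For reference, the last-write-wins fallback mentioned above fits in a few lines (illustrative sketch; tagging each write with a `(timestamp, writer_id)` pair is one common formulation): concurrent writes simply lose, which is often acceptable for non-collaborative fields and avoids general conflict resolution entirely.

```python
# Last-write-wins register. Each value carries (timestamp, writer_id);
# tuple comparison breaks timestamp ties deterministically by id, so
# every replica picks the same winner.

def lww_merge(a, b):
    # a and b are (payload, (timestamp, writer_id)) pairs.
    return a if a[1] >= b[1] else b

x = ("blue",  (100, "client-a"))
y = ("green", (105, "client-b"))

# Symmetric: the same winner regardless of argument order.
assert lww_merge(x, y) == lww_merge(y, x) == y
```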
| mweidner wrote:
| Indeed, the simple approach of "send your operations to the
| server and it will apply them in the order it receives them"
| gives you good-enough conflict resolution in many cases.
|
| It is still tempting to turn to CRDTs to solve the next
| problem: how to apply server-side changes to a client when
| the client has its own pending local operations. But this can
| be solved in a fully general way using server reconciliation,
| which doesn't restrict your operations or data structures
| like a CRDT does. I wrote about it here:
| https://mattweidner.com/2024/06/04/server-
| architectures.html...
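A rough sketch of the server-reconciliation pattern as described (my paraphrase in toy Python, not the linked article's exact formulation): the server is authoritative, the client applies its own ops optimistically, and whenever a new server state arrives the client rebuilds its view as server state plus still-pending local ops.

```python
# Toy server reconciliation. An op is a (key, value) tuple; the
# client's visible state is always: authoritative server state with
# unacknowledged local ops replayed on top.

def apply(state, op):
    return {**state, op[0]: op[1]}

server_state = {"count": 1}
pending = [("count", 2), ("color", "red")]   # unacknowledged local ops

def client_view(server_state, pending):
    view = dict(server_state)
    for op in pending:
        view = apply(view, op)
    return view

assert client_view(server_state, pending) == {"count": 2, "color": "red"}

# Server acknowledges (and possibly validates/transforms) the first
# op; the client drops it from pending and re-derives the view.
server_state = {"count": 2}
pending = pending[1:]
assert client_view(server_state, pending) == {"count": 2, "color": "red"}
```

Note that unlike a CRDT, nothing here constrains what an op may do - the server is free to reject or rewrite ops before they become authoritative.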
| ochiba wrote:
| > Yes, they are fascinating and yes they solve real problems
| but they are absolute overkill to your problems (except
| collab editing), at least currently. Why? Because they are
| all about conflict resolution. You can get very far without
| addressing this problem: for instance a cache, like you
| mentioned, has no need for conflict resolution. The main data
| store owns the data, and the cache follows. If you can have
| single ownership, (single writer) or last write wins, or
| similar, you can drop a massive pile of complexity on the
| floor and not worry about it. (In the rare cases it's
| necessary like Google Docs or Figma I would be very surprised
| if they use off-the-shelf CRDT libs - I would bet they have
| extremely bespoke, domain-specific data structures that
| are inspired by CRDTs.)
|
| I agree with this. CRDTs are cool tech but I think in
| practice most folks would be surprised by the high percentage
| of use cases that can be solved with much simpler conflict
| resolution mechanisms (and perhaps combined with server
| reconciliation as Matt mentioned). I also agree that
| collaborative document editing is a niche where CRDTs are
| indeed very useful.
| satvikpendem wrote:
| You might not need a CRDT [0]. But also, CRDTs are the
| future [1].
|
| [0] https://news.ycombinator.com/item?id=33865672
|
| [1] https://news.ycombinator.com/item?id=24617542
| 9rx wrote:
| _> In the rare cases it's necessary like Google Docs or Figma
| I would be very surprised if they use off-the-shelf CRDT
| libs_
|
| Or CRDTs at all. Google Docs is based on operational
| transforms and Figma on what they call multiplayer
| technology.
| pwdisswordfishz wrote:
| > Cache Invalidation? Basically a sync problem.
|
| Do naming things and off-by-one errors also count?
| ochiba wrote:
| > And 'synchronisation' as a practice gets very little
| attention or discussion. People just start with naive
| approaches like 'download what's marked as changed' and then get
| stuck in the quagmire of known problems and known edge cases
| (handling deletions, handling transport errors, handling
| changes that didn't get marked with a timestamp, how to repair
| after a bad sync, dealing with conflicting updates etc).
|
| I've spent 16 years working on a sync engine and have worked
| with hundreds of enterprises on sync use cases during this
| time. I've seen countless cases of developers underestimating
| the complexity of sync. In most cases it happens exactly as you
| said: start with a naive approach and then the fractal
| complexity spiral starts. Even if the team is able to do the
| initial implementation, maintaining it usually turns into a
| burden that they eventually find too big to bear.
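One classic escape from the naive approach, sketched for illustration (the row schema and `pull_changes` helper are invented for the example): soft-delete tombstones plus a monotonic version cursor let an incremental pull catch deletions as well as updates - the first edge case the naive "download what changed" design trips over.

```python
# Incremental pull with tombstones. Deleted rows are kept as
# tombstones with a bumped version, so clients syncing from an older
# cursor still learn about the deletion.

rows = {
    "a": {"version": 1, "deleted": False, "data": "alpha"},
    "b": {"version": 3, "deleted": True,  "data": None},   # tombstone
    "c": {"version": 4, "deleted": False, "data": "gamma"},
}

def pull_changes(rows, since):
    """Return all rows changed after `since`, plus the new cursor."""
    changes = {k: v for k, v in rows.items() if v["version"] > since}
    cursor = max((v["version"] for v in rows.values()), default=since)
    return changes, cursor

changes, cursor = pull_changes(rows, since=2)
# The client learns both the deletion of "b" and the update of "c",
# and "a" (unchanged since its cursor) is not re-sent.
assert set(changes) == {"b", "c"}
assert changes["b"]["deleted"] is True
assert cursor == 4
```

Even this sketch hides real-world issues (versions assigned out of commit order, tombstone GC, repairing after a bad sync), which is where the fractal complexity the comment describes begins.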
| jbmsf wrote:
| Absolutely. My current product relies heavily on a handful of
| partner systems, adds an opinionated layer on top of them,
| and propagates data to CRM, DW, and other analytical
| systems.
|
| One early insight was that we needed a representation of
| partner data in our database (and the downstream systems need a
| representation of our opinionated view as well). This is
| clearly an (eventually consistent) synchronization problem.
|
| We also realized that we often either fail to sync (due to
| bugs, timing, or whatever) and need a regular process to resync
| data.
|
| We've ended up with a homegrown framework that does both
| things, such that the same business logic gets used in both
| cases. This also makes it easy to backfill data if a chosen
| representation changes.
|
| We're now on the third or fourth iteration of this system and
| I'm pretty happy with it.
| delusional wrote:
| Once you add a periodic resync you have moved the true
| synchronization away from the online "(eventually consistent)
| synchronization" and into the batch resync. At that point the
| online synchronization is just a performance optimization on
| top of the batch resync.
|
| I've been in that situation a lot, and I'd always carefully
| consider if you even need the online synchronization at that
| point. It's pretty rarely required.
| jbmsf wrote:
| In our case it absolutely is. There are user facing flows
| that require data from partner systems to complete. Waiting
| for the next sync cycle isn't a good UX.
| mattnewport wrote:
| UI is also a sync problem if you squint a bit. React-like
| systems are an attempt to be a sync engine between model and
| view in a sense.
|
| Multiplayer games too.
| iansinnott wrote:
| Have been using Instant for a few side projects recently and it
| has been a phenomenal experience. 10/10, would build with it
| again. I suspect this is also at least partially true of client-
| server sync engines in general.
| kenrick95 wrote:
| I concur with this. Been using it on my side project that only
| has a front-end. The "back-end" is 100% InstantDB. Although
| for me, I found the permissions part a bit hard to
| understand, especially when it involves linking to other
| namespaces. Haven't checked them for a while, maybe they've
| improved on this...
| skybrian wrote:
| This is also a tricky UI problem. Live updates, where web pages
| move around on you while you're reading them, aren't always
| desirable. When you're collaborating with someone you know on the
| same document, you want to see edits immediately, but what about
| a web forum? Do you really need to see the newest responses, or
| is this a distraction? You might want a simple indicator that a
| reload will show a change, though.
|
| A white paper showing how Instant solves synchronization problems
| might be nice.
| qudat wrote:
| The problem with sync engines is needing full-stack buy-in in
| order for it to work properly. Having a separate backend-for-
| frontend service defeats the purpose in my mind. So what do you
| do when a company already has an API and other clients beyond a
| web app? The web app has to accommodate. I see this as the major
| downside with sync engines.
|
| I've been using `starfx` which is able to "sync" with APIs using
| structured concurrency: https://github.com/neurosnap/starfx
| wslh wrote:
| Sync, in general, is a very complex topic. There are past
| examples, such as just trying to sync contacts across different
| platforms where no definitive solution emerged. One fundamental
| challenge is that you can't assume all endpoints behave fairly or
| consistently, so error propagation becomes a core issue to
| address.
|
| Returning to the contacts example, Google Contacts attempts to
| mitigate error propagation by introducing a review stage, where
| users can decide how to handle duplicates (e.g., merge contacts
| that contain different information).
|
| In the broader context of sync, this highlights the need for
| policies to handle situations where syncing is simply not
| possible beyond all the smart logic we may implement.
| zelon88 wrote:
| Here's an idea.... Stop putting your critical business data on
| disparate third party systems that you don't have access to.
| Problem solved!
| voidpointer wrote:
| Probably a silly question, but if you take this all the way and
| treat everything as a DB that is synchronized in the background,
| how do you manage access control where not every user/client is
| supposed to have access to every object represented in the DB?
| Where does that logic go? If you do it on the document level like
| figma or canvas, every document is a DB and you sync the changes
| that happen to the document but first you need access to the
| document/DB. But doesn't this whole idea break apart if you need
| to do access control on individual parts of what you treat as the
| DB because you would need to have that logic on the client which
| could never be secure...
| PaulHoule wrote:
| Lotus Notes was a product far ahead of its time (nearly forgotten
| today) which was an object database with synchronization
| semantics. They made a lot of decisions that seem really strange
| today, like building an email system around it, but that
| empowered it for long-running business workflows. It's something
| everybody in the low-code/no-code space really needs to think
| about.
| spankalee wrote:
| The problem I have with "moving the database to the client" is
| the same one I have in practice with CRDTs: In my apps, I need to
| preserve the history of changes to documents, and I need to
| validate and authenticate based on high-level change
| descriptions, not low-level DB access.
|
| This always leads me back to operational transforms. Operations,
| being reified changes, function as undo records; a log of
| changes; and a narrower, semantically-meaningful API amenable
| to validation and authz.
|
| For the Roam Firebase example: this only works if you can either
| trust the client to always perform valid actions, or you can
| fully validate with Firebase's security rules.
|
| OT has critiques, but almost all of them fall away in my
| experience when you have a star topology with a central service
| that mediates everything - defining the canonical order of
| operations, performing validation & auth, and recording the
| operation log.
| jimbokun wrote:
| > This always leads me back to operational transforms.
| Operations, being reified changes, function as undo records; a
| log of changes; and a narrower, semantically-meaningful API
| amenable to validation and authz.
|
| Sounds like another kind of synchronization database.
| spankalee wrote:
| I think it's only a database if you come down on the "logs
| are the source of truth, not tables" side of the logs vs
| tables debate. And if you do, any log is a database, I
| guess...
| VikingCoder wrote:
| There are two hard problems:
|
| 1. Naming things
|
| 2. Caching
|
| 3. Off-by-one errors
| curtisblaine wrote:
| Related:
|
| - https://news.ycombinator.com/item?id=43436645
|
| - https://greenvitriol.com/posts/sync-engine-for-everyone
| loquisgon wrote:
| The local first people (https://localfirstweb.dev/) have some
| cool ideas about how to solve the data synch problem. Check it
| out.
| shikhar wrote:
| We have had interest in using our serverless stream API
| (https://s2.dev/) to power sync engines. Very excited about these
| kinds of use cases, email in profile if anyone wants to chat.
| finolex wrote:
| If anyone could be so kind as to give feedback on the local-first
| x data ownership db we're building, I would really appreciate it!
| https://docs.basic.tech/
|
| Will do my best to take action on any feedback I receive here
| Nelkins wrote:
| Discussion of sync engines typically goes hand in hand with
| local-first software. But it seems to be limited to use cases
| where the amount of data is on the smaller side. For example, can
| anyone imagine how there might be a local-first version of a
| recommendation algorithm (I'm thinking something TikTok-esque)?
| This would be a case where the determination of the
| recommendation relies on a large amount of data.
|
| Or think about any kind of large-ish scale enterprise SaaS. One
| of the clients I'm working with currently sells a Transportation
| Management Software system (think logistics, truck loads, etc).
| There are very small portions of the app that I can imagine
| relying on a sync engine, but being able to search over hundreds
| of thousands of truck loads, their contents, drivers, etc seems
| like it would be infeasible to do via a sync engine.
|
| I mention this because it seems that sync engines get a lot of
| hype and interest these days, but they apply to a relatively
| small subset of applications. Which may still be a lot, but it's
| a bit much to say they're the future (I'm inferring "of
| application development"--which is what I'm getting from this
| article).
| ochiba wrote:
| I think that is where sync engines come in that allow doing
| arbitrary hybrid queries (across local and remote data) and
| then keeping the results of those hybrid queries in sync on the
| client.
|
| This is one of the ideas that appears to be central to the
| genesis of Zero [1]
|
| ElectricSQL allows for a similar pattern and PowerSync is also
| working on this [2]
|
| [1] https://www.youtube.com/watch?v=rqOUgqsWvbw
|
| [2] https://www.powersync.com/blog/powersync-2025-roadmap-
| sqlite...
| Nelkins wrote:
| Interesting! I'll give these a look.
|
| Edit: I watched the presentation (which I really enjoyed) and
| also read the blog post. For anyone with less time, the
| answer is essentially: don't sync everything, treat the local
| data like a cache. Sync as much as you can into that cache,
| and then reach out to the server for other things.
| beders wrote:
| I found it quite disappointing to find a marketing piece from
| Nikki.
|
| It is full of general statements that are only true for a subset
| of solutions. Enterprise solutions in particular are vastly more
| complex and can't be magically made simple by a syncing database.
| (no solution comes even close to "99% business code". Not unless
| you re-define what business code is)
|
| It is astounding how many senior software engineers or architects
| don't understand that their stack contains multiple data models
| and even in a greenfield project you'll end up with 3 or more.
| Reducing this to one is possible for simple cases - it won't
| scale up. (Rama's attempt is interesting and I hope it proves me
| wrong)
|
| From: "yeah, now you don't need to think about the network too
| much" to "humbug, who even needs SQL"
|
| I've seen much bigger projects fail because they fell for one or
| both of these ideas.
|
| While I appreciate some magic on the front-end/back-end gap,
| being explicit (calling endpoints, receiving server-side-events)
| is much easier to reason about. If we have calls failing, we know
| exactly where and why. Sprinkle enough magic over this gap and
| you'll end up in debugging hell.
|
| Make this a laser-focused library and I might still be interested
| because it might remove actual boilerplate. Turn it into a
| full-stack framework and your addressable market will be tiny.
| hamilyon2 wrote:
| I am feeling a bit confused. Isn't the stated problem 99.9%
| solved by decades-old, battle-proven optimistic locking and
| some careful retries?
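For readers unfamiliar with the technique, optimistic locking amounts to a version check plus retry (illustrative sketch with an in-memory dict standing in for a database row; a real implementation would use a conditional UPDATE on a version column):

```python
# Optimistic locking: read a version, make the write conditional on
# that version, and retry from the read if someone else wrote first.

class Conflict(Exception):
    pass

db = {"balance": 100, "version": 7}

def conditional_update(expected_version, new_balance):
    if db["version"] != expected_version:
        raise Conflict                  # another writer got in first
    db["balance"] = new_balance
    db["version"] += 1

def add_with_retry(amount, max_retries=3):
    for _ in range(max_retries):
        v, bal = db["version"], db["balance"]    # read
        try:
            conditional_update(v, bal + amount)  # conditional write
            return
        except Conflict:
            continue                             # re-read and retry
    raise Conflict("gave up after retries")

add_with_retry(50)
assert db["balance"] == 150 and db["version"] == 8
```

This handles concurrent writers against one database well; the sync-engine articles are mostly about a different problem - clients that stay disconnected and keep writing locally.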
| delusional wrote:
| > I've yet to see a code base that has maintained a separate in-
| memory index for data they are querying
|
| Define "separate" but my old X11 compositor project neocomp I did
| something like that with a series of AOS arrays and bitfields
| that combined to make a sort of entity manager. Each index in the
| arrays was an entity, and each array held a data associated with
| a "type" of entity. An entity could hold multiple types that
| would combine to specify behavior. The bitfield existed to make
| it quick to query.
|
| It was waaay too complicated for what it was, but it was fun to
| code and worked well enough. I called it a "swiss" (because it was
| full of holes). It's still online on github
| (https://github.com/DelusionalLogic/NeoComp/blob/master/src/s...)
| even though I don't use it much anymore.
| Pamar wrote:
| Maybe I am just dumb but I really cannot see how data synch could
| solve what (in my kind of business) is a real problem.
|
| Example: you develop a web app to book flights online.
|
| My browser points to it and I login. Should synchronization start
| right now? Before I even input my departure point and date?
|
| Ok, no. I write NYC -> BER, and a dep date.
|
| Should I start synching now?
|
| Let's say I do. Is this really more efficient than querying a
| webservice?
|
| Ok, now all data are synched. Even potentially the ones for
| business class, even if I just need economy.
|
| You know, I could always change my mind later. Or find out that
| on the day I need to travel no economy seats are available
| anymore.
|
| Whatever. I have all the inventory data that I need. Raw.
|
| Guess what? As a LH frequent flyer I get special treatment in
| terms of price. Not just for LH, but most Business Alliance
| airlines.
|
| This logic is usually on the server, because airlines want
| maximum creativity and flexibility in handling inventory.
|
| Should we just synch data and make the offer selection algorithm
| run on the webserver instead?
|
| Let's say it does not matter... I have somehow in front of me all
| the options for my trip. So I call my wife to confirm she agrees
| with my choice. I explain the alternatives to her... this takes 5
| minutes.
|
| In this period, 367 other people are buying/cancelling trips to
| Europe. So I either see my selection constantly change (yay!
| Synchronization!!!) or I press confirm, and if my choice is gone
| I get a warning message and I repeat my query.
|
| Now add two elements: airlines prefer not to show real numbers
| of available seats - they will usually send you a single digit
| from 1 to 9 or a "*" to mean "10 or more".
|
| So just synching raw data and letting the combinatorial engine
| work in the browser is not a very good idea.
|
| Also, I see the potential to easily mount DDOS attacks if every
| client is constantly being synchronized by copying high
| contention tables in RT.
|
| What am I missing here?
| earthnail wrote:
| Your use case doesn't benefit from your own data. There's
| nothing you can do that doesn't require a direct interaction
| from the server.
|
| I write an audio recording app, and in my app, users have the
| most to gain from their own data. For most people, syncing is
| basically an afterthought. In this use case, the ability to
| have your recordings on your phone is the most important
| thing.
|
| The difference is that in my app, users generate all the
| valuable data themselves. In your app, nothing valuable can
| happen without communication with the airline.
| quantadev wrote:
| IPFS is a technology that is very helpful for syncing. One way
| it's being used in a modern context (although only sub-parts of
| the IPFS stack) is by the Bluesky engineers: during their design
| process a few years ago, they accepted my proposal that for a new
| social media protocol, each user should have his own "repository"
| (basically a Merkle tree) of everything he's ever posted. Then
| there's just a "sync" up to some master service-provider node (a
| decentralized set of nodes/servers) for the rest of the world to
| consume.
|
| Merkle-tree-based syncing is about as performant as you can
| possibly get (the Git protocol uses it too, I believe) because
| you can tell if the root of a tree structure is identical to some
| remote tree structure just by comparing the hash strings. And
| this can be applied recursively down any "changed branches" of a
| tree to implement very fast syncing mechanisms.
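| (A toy illustration of that recursion - not the Bluesky repo
| format or IPFS itself, just the core idea: equal hashes mean a
| whole subtree can be skipped during sync:)

```python
import hashlib

# Minimal Merkle-style diff: each node's hash covers its children,
# so equal hashes mean identical subtrees that sync can skip.

def h(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def node_hash(tree) -> str:
    if isinstance(tree, dict):  # inner node: hash the child hashes
        return h("".join(node_hash(c) for _, c in sorted(tree.items())).encode())
    return h(tree.encode())     # leaf: hash the value itself

def changed_paths(local, remote, path=()):
    """Recurse only into branches whose hashes differ."""
    if node_hash(local) == node_hash(remote):
        return []                               # identical subtree: skip
    if not (isinstance(local, dict) and isinstance(remote, dict)):
        return [path]                           # differing leaf
    out = []
    for key in sorted(set(local) | set(remote)):
        if key in local and key in remote:
            out += changed_paths(local[key], remote[key], path + (key,))
        else:
            out.append(path + (key,))           # added or removed branch
    return out

a = {"posts": {"1": "hi", "2": "old"}, "likes": {"x": "1"}}
b = {"posts": {"1": "hi", "2": "new"}, "likes": {"x": "1"}}
print(changed_paths(a, b))  # only posts/2 differs
```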
|
| I think we need a NEW INTERNET (i.e. Web3, and dare I say with
| the Semantic Web built in) where everyone's basically got their
| own personal "tree of stuff" they can publish to the world, all
| natively built into some new kind of tree-structure-based killer
| app. Like imagine having Jupyter Notebooks in tree form, where
| everything on them (that you want to be) is published to the web.
| ativzzz wrote:
| I've always wondered, how do applications with more stringent
| security requirements handle this?
|
| Assume that permission to any row in the DB can be revoked at
| any time. If we store the data offline, this security measure is
| already violated. If you don't care about a user potentially
| retaining data they no longer have access to, then when they come
| online, any operations they make are invalid and that's fine.
| But, if security access is part of your business logic, and is
| complex enough to the point where it lives in your app and not in
| your DB (other than using DB tools like RLS), how do you verify
| that the user still has access to all cached data? Wouldn't you
| need to re-query every row every time?
|
| I'm still uncertain how these sync engines can be secured
| properly
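| (One common answer - sketched here with hypothetical names, not
| any particular engine's API - is to make the sync protocol itself
| the enforcement point: on reconnect the server re-evaluates the
| ACL for the rows the client holds and tells it which ones to
| evict. This naive version really does re-check every cached row;
| real engines try to invalidate incrementally instead:)

```python
# Sketch: on reconnect, the server re-checks permissions for every row
# id the client claims to hold and returns the ids it must evict.
# All names here are hypothetical.

def can_read(user: str, row_id: str, acl: dict) -> bool:
    return user in acl.get(row_id, set())

def reconcile(user: str, client_row_ids: set, acl: dict) -> set:
    """Return the row ids the client must delete from its local cache."""
    return {rid for rid in client_row_ids if not can_read(user, rid, acl)}

acl = {"doc1": {"alice"}, "doc2": {"alice", "bob"}}
# Alice cached doc1 and doc2 while offline; meanwhile doc1 was revoked:
acl["doc1"].discard("alice")
assert reconcile("alice", {"doc1", "doc2"}, acl) == {"doc1"}
```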
| fxnn wrote:
| The author would be excited to learn that CouchDB has been
| solving this problem for 20 years.
|
| The use case the article describes is exactly the idea behind
| CouchDB: a database that doubles as the server and is designed
| to be synced with the client.
|
| You can even put your frontend code into it and it will happily
| serve it (aka CouchApp).
|
| https://couchdb.apache.org
| ltbarcly3 wrote:
| This has been solved every 5 years or so, and along the way
| people learn why this solution doesn't actually work.
___________________________________________________________________
(page generated 2025-03-21 23:01 UTC)