https://rxdb.info/downsides-of-offline-first.html

[                    ]

  * Follow @twitter
  * Chat @gitter
  * Star @github
  * Donate @github
  * 
  * Introduction
  * Install
      + npm
      + import
  * RxDatabase
      + Creation
          o name
          o adapter
          o password
          o multiInstance
          o eventReduce
          o ignoreDuplicate
          o pouchSettings
      + Functions
          o $
          o exportJSON()
          o importJSON()
          o server()
          o waitForLeadership()
          o requestIdlePromise()
          o destroy()
          o remove()
          o checkAdapter()
          o isRxDatabase()
  * RxSchema
      + Example
      + Create a collection with the schema
      + version
      + keyCompression
      + indexes
      + attachments
      + default
      + final
      + encryption
  * RxCollection
      + Creation
          o name
          o schema
          o pouchSettings
          o ORM-functions
          o Migration
      + Functions
          o $
          o insert()
          o bulkInsert()
          o bulkRemove()
          o newDocument()
          o upsert()
          o atomicUpsert()
          o find()
          o findOne()
          o findByIds()
          o findByIds$()
          o exportJSON()
          o importJSON()
          o sync()
          o remove()
          o destroy()
          o isRxCollection()
  * RxDocument
      + Insert
      + Find
      + Functions
          o get()
          o get$()
          o proxy-get
          o update()
          o atomicUpdate()
          o atomicPatch()
          o $
          o remove()
          o deleted$
          o toJSON()
          o set()
          o save()
          o isRxDocument()
  * RxQuery
      + find()
      + findOne()
      + exec()
      + $
      + update()
      + remove()
      + doesDocumentDataMatch()
      + Examples
      + isRxQuery()
  * RxAttachment
  * Middleware-hooks
  * ORM/DRM
  * Population
  * DataMigration
  * LeaderElection
  * Replication CouchDB
  * Replication GraphQL
  * InMemory
  * Backup
  * QueryCache
  * LocalDocuments
  * Custom Build
  * Creating Plugins
  * Adapters
  * 
  * Tutorials
      + Use RxDB with Typescript
      + Using the Server Plugin
  * Opinions
      + About Offline First
      + Downsites of Offline First
  * Questions & Answers
  * Contribute
  * 
  * Published with GitBook

Downsites of Offline First

Downsides of Offline First

So you have read all theses things about how the offline first
paradigm makes it easy to create realtime web applications that even
work when the user has no internet connection. But there is no free
lunch. The offline first paradigm is not the perfect approach for all
kinds of apps.

    You fully understood a technology when you know when not to use
    it

In the following I will point out the limitations you need to know
before you decide to use RxDB or even before you decide to create an
offline first application.

It only works with small datasets

Making data available offline means it must be loaded from the server
and then stored at the clients device. You need to load the full
dataset on the first pageload and on every ongoing load you need to
download the new changes to that set. While in theory you could
download in infinite amount of data, in practice you have a limit how
long the user can wait before having an up-to-date state. You want to
display chat messages like Whatsapp? No problem. Syncing all the
messages a user could write, can be done with a few HTTP requests.
Want to make a tool that displays server logs? Good luck downloading
terabytes of data to the client just to search for a single string.
This will not work.

Besides the network usage, there is another limit for the size of
your data. In browsers you have some options for storage: Cookies,
Localstorage, WebSQL and IndexedDB.

Because Cookies and Localstorage is slow and WebSQL is deprecated,
you will use IndexedDB. The limit of how much data you can store in
IndexedDB depends on two factors: Which browser is used and how much
disc space is left on the device. You can assume that at least a
couple of hundred megabytes are available at least. The maximum is
potentially hundreds of gigabytes or more, but the browser
implementations vary. Chrome allows the browser to use up to 60% of
the total disc space per origin. Firefox allows up to 50%. But on
safari you can only store up to 1GB and the browser will prompt the
user on each additional 200MB increment.

The problem is, that you have no chance to really predict how much
data can be stored. So you have to make assumptions that are
hopefully true for all of your users. Also, you have no way to
increase that space like you would add another hard drive to your
backend server. Once your clients reach the limit, you likely have to
rewrite big parts of your applications.

Browser storage is not really persistent

When data is stored inside IndexedDB or one of the other storage
APIs, it cannot be trusted to stay there forever. Apple for example
deletes the data when the website was not used in the last 7 days.
The other browsers also have logic to clean up the stored data, and
in the end the user itself could be the one that deletes the browers
local data.

The most common way to handle this, is to replicate everything from
the backend to the client again. Of course, this does not work for
state that is not stored at the backend. So if you assume you can
store the users private data inside the browser in a secure way, you
are wrong.

                           safari database

There can be conflicts

Imagine two of your users modify the same JSON document, while both
are offline. After they go online again, their clients replicate the
modified document to the server. Now you have two conflicting
versions of the same document, and you need a way to determine how
the correct new version of that document should look like. This
process is called conflict resolution.

 1. The default in many offline first databases is a deterministic
    conflict resolution strategy. Both conflicting versions of the
    document are kept in the storage and when you query for the
    document, a winner is determined by comparing the hashes of the
    document and only the winning document is returned. Because the
    comparison is deterministic, all clients and servers will always
    pick the same winner. This kind of resolution only works when it
    is not that important that one of the document changes gets
    dropped. Because conflicts are rare, this might be a viable
    solution for some use cases.

 2. A better resolution can be applied by listening to the
    changestream of the database. The changestream emits an event
    each time a write happens to the database. The event contains
    information about the written document and also a flag if there
    is a conflicting version. For each event with a conflict, you
    fetch all versions for that document and create a new document
    that contains the winning state. With that you can implement
    pretty complex conflict resolution strategies, but you have to
    manually code it for each collection of documents.

 3. Instead of the solving conflict once at every client, it can be
    made a bit easier by solely relying on the backend. This can be
    done when all of your clients replicate with the same single
    backend server. With RxDB&apos;s Graphql Replication each client
    side change is sent to the server where conflicts can be resolved
    and the winning document can be sent back to the clients.

 4. Sometimes there is no way to solve a conflict with code. If your
    users edit text based documents or images, often only the user&
    apos;s themself can decide how the winning revision has to look.
    For these cases, you have to implement complex UI parts where the
    users can inspect the conflict and manage its resolution.

 5. You do not have to handle conflicts if they cannot happen in the
    first place. You can achieve that by designing a write only
    database where existing documents cannot be touched. Instead of
    storing the current state in a single document, you store all the
    events that lead to the current state. Sometimes called the
    "everything is a delta" strategy, others would call it Event
    Sourcing. Like an accountant that does not need an eraser, you
    append all changes and afterwards aggregate the current state at
    the client.

    // create one new document for each change to the users balance
    {id: new Date().toJSON(), change: 100} // balance increased by $100
    {id: new Date().toJSON(), change: -50} // balance decreased by $50
    {id: new Date().toJSON(), change: 200} // balance increased by $200

 6. There is this thing called conflict-free replicated data type,
    short CRDT. Using a CRDT library like automerge will magically
    solve all of your conflict problems. Until you use it in
    production where you observe that implementing CRDTs has
    basically the same complexity as implementing conflict resolution
    strategies.

Realtime is a lie

So you replicate stuff between the clients and your backend. Each
change on one side directly changes the state of the other sides in
realtime. But this "realtime" is not the same as in realtime
computing. In the offline first world, the word realtime was
introduced by firebase and is more meant as a marketing slogan than a
technical description. There is an internet between your backend and
your clients and everything you do on one machine takes at least once
the latency until it can affect anything on the other machines. You
have to keep this in mind when you develop anything where the timing
is important, like a multiplayer game or a stock trading app.

                    latency london san franzisco

Eventual consistency

An offline first app does not have a single source of truth. There is
a source on the backend, one on the own client, and also each other
client has its own definition of truth. At the moment your user
starts the app, the local state is hopefully already replicated with
the backend and all other clients. But this does not have to be true,
the states can have converged and you have to plan for that. The user
could update a document based on wrong assumptions because it was not
fully replicated at that point in time because the user is offline. A
good way to handle this problem is to show the replication state in
the UI and tell the user when the replication is running, stopped,
paused or finished.

And some data is just too important to be "eventual consistent".
Create a wire transfer in your online banking app while you are
offline. You keep the smartphone laying at your night desk and when
you use again in the next morning, it goes online and replicates the
transaction. No thank you, do not use offline first for these kind of
things, or at least you have to display the replication state of each
document in the UI.

                             CAP theorem

Permissions and authentication

Every offline first app that goes beyond a prototype, does likely not
have the same global state for all of its users. Each user has a
different set of documents that are allowed to be replicated or seen
by the user. So you need some kind of authentication and permission
handling to divide the documents. The easy way is to just create one
database for each user on the backend and only allow to replicate
that one. Creating that many databases is not really a problem with
for example CouchDB, and it makes permission handling easy. But as
soon as you want to query all of your data in the backend, it will
bite back. Your data is not at a single place, it is distributed
between all of the user specific databases. This becomes even more
complex as soon as you store information together with the documents
that is not allowed to be seen by outsiders. You not only have to
decide which documents to replicate, but also which fields of them.

So what you really want is a single datastore in the backend and then
replicate only the allowed document parts to each of the users. This
always requires you to implement your custom replication endpoint
like what you do with RxDBs GraphQL Replication.

You have to migrate the client database

While developing your app, sooner or later you want to change the
data layout. You want to add some new fields to documents or change
the format of them. So you have to update the database schema and
also migrate the stored documents. With &apos;normal&apos;
applications, this is already hard enough and often dangerous. You
wait until midnight, stop the webserver, make a database backup,
deploy the new schema and then you hope that nothing goes wrong while
it updates that many documents.

With offline first applications, it is even more fun. You do not only
have to migrate your local backend database, you also have to provide
a migration strategy for all of these client databases out there. And
you also cannot migrate everything at the same time. The clients can
only migrate when the new code was updated from the appstore or the
user visited your website again. This could be today or in a few
weeks.

Performance is not native

When you create a web based offline first app, you cannot store data
directly on the users filesystem. In fact there are many layers
between your JavaScript code and the filesystem of the operation
system. Let&apos;s say you insert a document in RxDB:

  * You call the RxDB API to validate and store the data
  * RxDB calls the underlying RxStorage, which is PouchDB for most
    cases
  * Pouchdb calls its underlying storage adapter
  * The storage adapter calls IndexedDB
  * The browser runs its internal handling of the IndexedDB API
  * In most browsers IndexedDB is implemented on top of SQLite
  * SQLite calls the OS to store the data in the filesystem

All these layers are abstractions. They are not build for exactly
that one use case, so you lose some performance to tunnel the data
through the layer itself, and you also lose some performance because
the abstraction does not exactly provide the functions that are
needed by the layer above and it will overfetch data.

You will not find a benchmark comparison between how many
transactions per second you can run on the browser compared to a
server based database. Because it makes no sense to compare them.
Browsers are slower, JavaScript is slower.

    Is it fast enough?

What you really care about is "Is it fast enough?". For most use
cases, the answer is yes. Offline first apps are UI based and you do
not need to process a million transactions per second, because your
user will not click the save button that often. "Fast enough" means
that the data is processed in under 16 milliseconds so that you can
render the updated UI in the next frame. This is of course not true
for all use cases, so you better think about the performance limit
before starting with the implementation.

Nothing is predictable

You have a PostgreSQL database and run a query over 1000ths of rows,
which takes 200 milliseconds. Works great, so you now want to do
something similar at the client device in your offline first app. How
long does it take? You cannot know because people have different
devices, and even equal devices have different things running in the
background that slow the CPUs. So you cannot predict performance and
as described above, you cannot even predict the storage limit. So if
your app does heavy data analytics, you might better run everything
on the backend and just send the results to the client.

There is no relational data

I started creating RxDB many years ago and while still maintaining
it, I often worked with all these other offline first databases out
there. RxDB and all of these other ones, are based on some kind of
document databases similar to NoSQL. Often people want to have a
relational database like the SQL one they use at the backend.

So why are there no real relations in offline first databases? I
could answer with these arguments like how JavaScript works better
with document based data, how performance is better when having no
joins or even how NoSQL queries are more composable. But the truth
is, everything is NoSQL because it makes replication easy. An SQL
query that mutates data in different tables based on some selects and
joins, cannot be partially replicated without breaking the client.
You have foreign keys that point to other rows and if these rows are
not replicated yet, you have a problem. To implement a robust
replication protocol for relational data, you need some stuff like a
reliable atomic clock and you have to block queries over multiple
tables while a transaction replicated. Watch this guy implementing
offline first replication on top of SQLite or read this discussion
about implementing offline first in supabase.

So creating replication for an SQL offline first database is way more
work then just adding some network protocols on top of PostgreSQL. It
might not even be possible for clients that have no reliable clock.

                         no relational data

results matching ""

No results matching ""