[HN Gopher] How we sped up Notion in the browser with WASM SQLite
___________________________________________________________________
How we sped up Notion in the browser with WASM SQLite
Author : jFriedensreich
Score : 171 points
Date : 2024-07-12 21:11 UTC (5 days ago)
(HTM) web link (www.notion.so)
(TXT) w3m dump (www.notion.so)
| gnabgib wrote:
| Article title: _How we sped up Notion in the browser with WASM
| SQLite_
|
| Discussion (19 points, yesterday)
| https://news.ycombinator.com/item?id=40931957
| DANmode wrote:
| Slightly off-topic: surprised to see no Remote roles for Notion.
| jFriedensreich wrote:
| from my own experience what notion did is pretty much required to
| use wasm sqlite with persistence in production. on the one hand
| being used at notions scale really points to wasm sqlite being
| ready for primetime, on the other hand there are still a LOT of
| ugly parts that need to be worked around:
|
| - tooling, support and debugging for opfs
|
| - only really works with a single worker using sync api, but this
| is not available from service workers
|
| - the official js releases are really not idiomatic js and
| require in depth understanding to pick out the right parts and
| then build a usable layer on top, afaik there is no existing npm
| package that just provides what is required
| nafey wrote:
| I believe it's a missed opportunity not to bundle sqlite in major
| browsers. I understand that making SQLite part of the browser
| spec introduces all sorts of complications [1] down the line. But
| nevertheless, it would have proved to be quite a nifty tool.
|
| [1] https://hacks.mozilla.org/2010/06/beyond-html5-database-
| apis...
| compressedgas wrote:
| They already did that. It was called WebSQL. It was removed I
| think in favor of IndexedDB.
| slightwinder wrote:
| [..] Web SQL Database was a prior API developed by Apple.[12]
| But Firefox refused to add support for it and argued against
| it becoming a standard because it would codify the quirks of
| SQLite.[13][14] It was thus deprecated in favor of
| IndexedDB.[..]
|
| Mozilla again.. The real joke is, Firefox is now the only(?)
| browser using sqlite out-of-the-box for internal databases.
| r0ks0n wrote:
| nope, chromium too. look in your cookies file
| slightwinder wrote:
| I did, but seems I missed the actual subfolder where the
| databases are saved. But this makes it even worse. They
| all have sqlite installed, but offer not API for making
| use of it. Really missed chance.
| simonw wrote:
| I'm so glad they didn't. We would all be stuck on some 14 year
| old baseline version of SQLite at this point.
|
| SQLite as WASM is a much better solution IMO.
| AlexErrant wrote:
| > OPFS doesn't come with graceful handling of concurrency out of
| the box. Developers should be aware of this and design around it.
|
| There's a multiple readers and writers proposal [0]. It's been
| "position: positive" by Firefox [1], implemented in Chrome [2],
| and ignored by Webkit [3] (of course). 0: https
| ://github.com/whatwg/fs/blob/main/proposals/MultipleReadersWriter
| s.md 1: https://github.com/mozilla/standards-
| positions/issues/861 2:
| https://chromestatus.com/feature/5172892632875008 3:
| https://github.com/WebKit/standards-positions/issues/238
|
| Love the shoutout to Roy Hashimoto. He's been writing VFSs for
| SQLite-on-the-browser and perf testing them. He's recently wrote
| "IDBMirrorVFS", which "is a new example VFS that keeps all SQLite
| files in memory while persisting to IndexedDB". It has remarkable
| performance, of course. https://github.com/rhashimoto/wa-
| sqlite/discussions/189
|
| Also, looks like Roy takes advantage of the aforementioned
| proposal when he wrote `OPFSPermutedVFS`, which takes 2nd pace
| behind `IDBMirrorVFS` w/r/t perf.
| https://github.com/rhashimoto/wa-sqlite/blob/master/src/exam...
| jauntywundrkind wrote:
| It sucks that even the newest make-wadm-sqlite-fast
| FileAccessHandles are still an intermediated virtual file
| system hosted by the browser. I don't get why we have three
| different file system APIs on the web and none that just use
| files.
|
| Meanwhile WASI has their own server-side file system APIs but
| neither WASI nor the browser side seem to have any effort to
| get on the same page. It'll be 2035 before component-model ends
| up being usable on the web, at this rate, before modules are
| modular.
| jitl wrote:
| The profusion of underpowered interfaces is frustrating but I
| don't see how browsers could
|
| "just use files"
|
| What does this mean? Providing the POSIX filesystem C
| language APIs with file descriptor numbers, etc? I guess the
| browser makers could go Cosmopolitan-C style and emulate the
| Linux filesystem interface on all platforms, but I think
| you'd find there's still a weird intermediary layer in
| between your code and the underlying OS filesystem. Even on
| Linux the browser would have a virtual filesystem to provide
| sandboxing & quality-of-service governance. On other systems
| you'd need a virtual filesystem to paper over those
| differences between operating systems. Windows exists!
| slightwinder wrote:
| They have an actual sql-datebase on the client as cache, but
| still no offline-mode? Damn it. This must be their most demanded
| feature for years.
| bearjaws wrote:
| Overall I think using SQLite locally to offload database work is
| incredibly powerful. Given most laptops have an SSD now a days,
| you can scan an entire 30-100MiB SQLite db in miliseconds.
|
| Meanwhile in AWS you would pay $27k a month to have the same IOPS
| as a Lenovo Thinkpad X1.
|
| I just got done with a side project using WASM SQLite as well,
| it's incredibly powerful, even supports full text search.
|
| My project "cluttr" is a local first site that focuses on
| cleaning up your screen shots folder and making it searchable via
| OCR. All in browser.
|
| It also supports using Ollama to run a vision model against your
| images to provide more context.
|
| Really helpful if you take a lot of screenshots for clients or
| PMs and need to find them later. The best part is 99% of the
| hosting cost is offloaded to the client, so I am not really
| worried about a large server bill.
|
| https://cluttr.pages.dev/
|
| (working on deployment to https://cluttr.ai later today)
| NortySpock wrote:
| If you think those read speeds are great, try DuckDB (which has
| many SIMD improvements) if you want to blow your socks off.
| codetrotter wrote:
| Do these SIMD improvements still apply if you're compiling
| DuckDB to WASM and run it in a browser?
| nilslice wrote:
| Yes, if the browser enables SIMD! https://caniuse.com/wasm-
| simd
| bearjaws wrote:
| This solutions scales to tens of thousands of images in my
| testing (probably ~200k or more before searching is too
| slow). Why use a nuclear bomb when a stick of dynamite is
| good enough :)
| drozycki wrote:
| Isn't one OLTP and the other OLAP? I don't understand why
| DuckDB is often suggested as a drop-in replacement.
| NortySpock wrote:
| Yes, DuckDB is OLAP, SQLite is OLTP. I should have called
| that out.
|
| But if you are doing aggregate or skip-scan analysis and if
| they're talking about read speeds and in-memory processing,
| well, SQLite leaves some performance on the table by being
| single-threaded, as far as I can tell.
| bigbones wrote:
| > Meanwhile in AWS you would pay $27k a month to have the same
| IOPS as a Lenovo Thinkpad X1.
|
| Curious about how this number was arrived at and what products
| were involved
| bearjaws wrote:
| RDS with 256,000 IOPS (the max) on a single primary instance
| of MariaDB.
|
| Of course its not realistic to run a primary DB on a laptop,
| but the IOPS are extremely expensive in AWS was the point.
|
| Offloading expensive queries to a browser is a viable
| solution. If I had decided to make this some sort of SaaS
| offering I am certain running full text search at scale would
| cost thousands of dollars per month long term, with the data
| becoming increasingly irrelevant over time, and I would still
| be forced to host it.
| bastawhiz wrote:
| > Meanwhile in AWS you would pay $27k a month to have the same
| IOPS as a Lenovo Thinkpad X1.
|
| This is kind of an unfair comparison. Essentially nobody needs
| a million iops for their database. Even an extremely busy
| database doesn't need to scan all of the data it holds (or at
| least, if it does, you're using it very wrong--that's why we
| have indexes).
|
| A fast disk is possible on a laptop because it's a tiny hop to
| RAM. And it's desirable because you probably have nowhere near
| 2TB of RAM handy, so you need it to be fast.
|
| In the cloud you can get 2TB of _ram_ for $11k /mo (4x
| r6g.16xlarge). Not that you need anything like that to run your
| database. Most of that data is never being queried.
|
| It's also the case that a laptop workload is very different
| than a server workload. If I run a steam game, I want it open
| fast. My laptop isn't crunching numbers on all the bytes _at
| that moment_. When I run a table scan on a Postgres table,
| processing needs to happen on every single tuple. A million
| iops isn 't useful if your CPU immediately becomes the
| bottleneck. A Thinkpad would simply never match the response
| times of a server with a tenth of the iops under load (if the
| workload required scanning huge amounts of data).
|
| So yes, the iops are more expensive, but that's really not a
| metric that anyone in the target market is hurting over.
| stackskipton wrote:
| I'd also point out about AWS cost, which is pretty high,
| getting IOPS is easy. Getting Redundant IOPS is hard part.
| Spivak wrote:
| Huh? Not really. AWS is a marvel of engineering in that it
| can be _everyone 's_ redundant IOPS but being _your_ IOPS
| isn 't such a big deal. Most folks are single region and
| rent-a-datacenter colo operations can get you two racks
| with separate power/uplink no issue.
|
| I hope everyone at least once in their career gets to
| experience just how god damn fast hardware (especially
| networking speeds between your own servers) is. Sweet lord
| loading up three bare-metal dbs with 1TB ram each and
| bonded 10G nics where the app servers didn't even have to
| hit a router to talk to them. We initially thought the
| replication lag being pinned at 0 was a mistake.
|
| Don't take this as any sort of condemnation of cloud
| offerings, being able to spin up comparable infra in my
| underwear and not having to think about purchasing and
| dealing with hardware vendors are truly a gift.
| bearjaws wrote:
| "Nobody needs IOPS until they need IOPS"
|
| I've had to scrub a multi terabyte database of PII before
| moving to a staging environment, it hurts. With modern data
| architectures, you may write the same data 4-5 times in its
| life cycle, staging data, data warehouses, marketing,
| PowerBI, Looker, etc...
|
| Especially reporting solutions, where they may aggregate
| massive amounts of data and write it to temp tables.
|
| It will require IOPS, and you will pay handsomely for it.
| bastawhiz wrote:
| I don't know about your specific example, but doing an
| operation where you rewrite most of a multi terabyte
| database online is almost certainly not best accomplished
| with SELECT/UPDATE. Even if you need multiple passes,
| that's N terabytes times M passes times two. That's... not
| a lot of reads and writes. Dumping the database to files on
| blob storage, rewriting them, then reading them into your
| destination is almost certainly the fastest and cheapest
| way to go about that.
|
| And that's not an iops avoidance thing, that's a "this
| isn't what your database is built to do with the
| configuration your running it in" sort of thing.
| nikeee wrote:
| Sadly, SharedWorkers still are not supported on Chrome+Android.
| Hope that changes soon because it would be a game-changer for
| many applications to have this API broadly available.
| stpn wrote:
| Pretty neat! If notion folks are in this thread, what do you do
| about lack of SharedWorker in Android Chrome?
| jitl wrote:
| This is an optional cache so Notion continues to work the way
| it did before in browsers that don't support WASM SQLite.
|
| As an alternative we have apps for iOS, Android, Mac, Windows.
| In those apps we already run SQLite in a native thread talking
| to the normal filesystem for years.
| RcouF1uZ4gsC wrote:
| Honestly, we should just have native SQLite supported by the
| browser.
|
| SQLite has become the defacto local single file database
| standard.
|
| It has a very permissive license.
|
| It has support contracts out to the 2050s (service lifetime of
| Airbus A350 airframe).
|
| SQLite will probably have a longer lifetime than whatever browser
| standard we have now.
| biftek wrote:
| Unfortunately WebSQL was ahead of its time, I have no doubt it
| would have caught on more today with the resurgent popularity
| of SQLite.
|
| IndexedDB got to ride the NoSQL hype of the 2010's
| jakubmazanec wrote:
| Will the tables in the Notion mobile app still be so painfully
| slow? Every time I scroll few rows down (in a table with few than
| 100 rows with dozen columns), the app hangs for several seconds
| while trying to fetch/render (who knows what it's actually doing)
| few additional rows. When I edit a row and return back to the
| table view, scroll position is lost and I have to suffer through
| the fetching once again.
| nailer wrote:
| They're a very un-user focused company. They used the | cursor
| for any horizontally scrolling content for years, making a
| kanban board with non default states is nightmarishly complex,
| and there's a billion odd UI slownesses and niggles.
|
| But they're recently added AI.
| ilrwbwrkhv wrote:
| I think their real innovation was the community and getting
| "cosy productivity" during Covid going. They are more of a
| social phenomenon than a technical one.
| threetonesun wrote:
| Notion is a great example of my hatred of the current
| "upgrade or die" delivery method of software. Notion 1.0,
| when it was mostly text with a unique way to view tabular
| data, was fast and enjoyable to use. Once it shifted to
| competing with Atlassian et al, it acquired all the bloat
| those products already have.
| nxobject wrote:
| Sadly, I don't think basing a business (with the tech
| investor desire for growth! growth! growth!) off one
| product, is best for an app that's best when predictable.
| satvikpendem wrote:
| Notion was never fast, and I've been using it on and off
| for the past 10 years or so.
| mrtksn wrote:
| Seeing things like that excite me a lot, I can't wait for a
| future where all the code is written in whatever language you
| like and runs through WASM in the browser. The HTML and JS can go
| back doing that thing they are built for: Displaying the UI and
| handling the interactions.
| jitl wrote:
| displaying ui and handling interactions ends up being most of
| the app though
| jillesvangurp wrote:
| Not necessarily. It's just that a lot of SPAs don't contain a
| lot of business logic because it's too tedious to do without
| a client side database; which means you end up offloading a
| lot of business logic to server side databases behind some
| kind of API.
|
| Adding local persistence in the form of an actual database,
| allows you to do more of these things client side. Which
| means you end up with more business logic that you can
| cleanly separate from your data rendering logic and other
| cruft needed for e.g. form filling, data validation, and what
| not. You still need that of course but it makes web
| applications more similar to full desktop applications in the
| sense that the server might be a lot lighter or even be
| optional (other than serving the code and other assets).
|
| I've actually been toying with building a Google Reader style
| application that stores its data in the browser recently. I
| use a few minimal server scripts to work around cors issues
| for fetching feeds and html previews. But aside from that,
| there is no need for a server. I can save local state (stored
| in indexdb) to a file and download it and then restore it
| from the same file as a backup strategy.
|
| I've been adding search capabilities with tf/idf ranking,
| phrase matching. I'm using OpenAI to help summarize and tag
| content. And I'm currently adding a light weight vector
| search implementation. This is all running in the browser
| (except for openAI).
|
| My goal with this is experimenting with RAG against news
| content. So, I've been piecing together things I need for
| this and raising the ambition level as I've progressed in the
| last few weeks. Most of this is probably not optimal and a
| big motivation for me is to just wrap my head around all the
| bits and pieces I need. But there's no good reason why most
| of those things could not be optimized with e.g. some wasm
| code that uses web-gpu for doing math and less memory
| intensive ways of storing stuff.
|
| BTW, I'm using kotlin-js and kotlin-multiplatform which makes
| it easy to forget that I'm dealing with Javascript and very
| limited browser APIs under the hood. UI is still tedious to
| do but I have a growing amount of code that is pure business
| logic, algorithms, or other stuff you'd normally run on a
| server and implement in a language like Kotlin. Which is why
| it's nice to be using that in the browser.
| mortenjorck wrote:
| All this wonderful optimization, and yet there's still no UI to
| warn that (perhaps due to a websocket connection being lost?) a
| comment won't be saved. I lost two Notion comments this way
| yesterday!
| born-jre wrote:
| after conquering all smartphones, browser, edge db and what not
| now every web page will have own instance of SQLite running in
| wasm, lol
| solarkraft wrote:
| This is cool because I haven't seen WASM SQlite in production. I
| wasn't sure whether it'd end up being more performant than
| something built in like IndexedDB. Seeing that it's practically
| feasible is definitely cool.
| DylanSp wrote:
| Same - I've seen plenty of exploratory/experimental posts, but
| this is the most information I've seen about it being used in
| production.
| NelsonMinar wrote:
| Why is IndexedDB so bad that you have to load a second database
| in WASM instead? I've used the API successfully but never
| measured performance.
| kevingadd wrote:
| It has a weird, verbose API that's more like a simple document
| store than a SQL database, so it's naturally going to be
| outperformed by SQLite, especially if you're performing complex
| operations. It's also heavily async which means operations have
| to spin the event loop.
| NelsonMinar wrote:
| yeah I gave up on the direct API, I used Dexie for my
| product.
|
| I think it's funny you say a problem is that it's async while
| another commentor says a problem is it blocks the UI. Maybe
| it's the worst of both?
| dbalatero wrote:
| It's buggy depending on browser implementation, Firefox will
| corrupt DBs, it's a herculean task to debug things with users,
| it blocks the main thread for reads or writes (maybe wasm stuff
| does too).
| jauntywundrkind wrote:
| Wow this is so embarrassing as hell. Apparently even using a
| WebWorker can still cause the UI to block when talking to
| indexeddb. https://nolanlawson.com/2015/09/29/indexeddb-
| websql-localsto...
| https://issues.chromium.org/issues/41204713#comment28
|
| It's crazy how much effort goes into adding new things to the
| web but stuff like this is almost 10 years old.
| remram wrote:
| Notion takes 15s to load to an empty page. Then another 5 to
| dismiss the popup about new AI features and the like.
|
| I'm glad they are making their app faster, in the meantime I
| (browser user) have cancelled my team's subscription and will be
| using something else.
| jamil7 wrote:
| I was about to say, does Notion actually feel any faster to
| anyone? I stopped using it years ago but the last place I
| contracted at was invested in it heavily and the thing was
| crazy slow.
| alex_lav wrote:
| Curious where you're moving? About to begin finding a tool
| similar to Notion (but not Notion).
| ivanjermakov wrote:
| I felt like a grandpa when using Notion, but haven't found a
| good alternative yet.
|
| As of now, I use google sheets for collaborative work and
| plain text/md files for notetaking.
| vevoe wrote:
| I also moved away from Notion a while ago for being crazy
| slow. For personal note taking I use Obsidian, big fan of it.
|
| For my dev team I recently moved us to eraser.io. Still in
| the early stages with it but it's working well enough for us
| atm.
| arrowsmith wrote:
| I second the Obsidian recommendation.
| nxobject wrote:
| I'd love advice too - I've become heavily reliant on Notion's
| database-lite features, which is what a lot of the markdown-
| based apps don't invest in by default. I'm beginning to get
| frustrated with loading speeds that feel like spinning disks.
| dawnerd wrote:
| Stay away from Clickup, it's just as slow if not worse. Also
| same AI notices.
| ggregoire wrote:
| > Using SQLite improved page navigation times by 20 percent in
| all modern browsers.
|
| Couldn't you cache the data in LocalStorage and get similar speed
| improvements?
|
| I mean, how to run SQLite in the browser is a great topic and
| surely has a lot of very good use cases. But attributing the
| performance improvements to it sounds a bit misleading, when you
| had no client caching strategy before and you do now. Obviously
| loading data from a local cache is faster than loading data from
| a remote server.
|
| Comparing WASM SQLite vs. other client caching solutions (LS,
| IndexDB, etc) could be interesting. In terms of perf, complexity,
| compatibility, storage capacity, query language, etc.
| jitl wrote:
| (I work at Notion, but didn't build the WASM sqlite thingy)
|
| We've implemented this same cache using LocalStorage,
| IndexedDB, and SQLite. The Android & iOS apps used SQLite for
| this since 2020 (I built the Android version IIRC), and the
| desktop app used SQLite running on a native thread since 2021.
|
| We migrated away from LocalStorage for two reasons:
|
| 1. LocalStorage is limited to 10mb file size. We also use
| LocalStorage for a bunch of less-durable state like "is this
| toggle open?" or "which view in this database was open last",
| and as our customer workspaces grew, we faced mounting errors
| as the record cache competed with the rest of the app for that
| space. We'd have a bunch of slowdowns under contention as we
| tried to delete keys in LocalStorage synchronously, which
| manifested as major UI lag. No good!
|
| - LocalStorage loves to lose writes if you're writing from
| multiple tabs. It's just not a reliable or trustworthy API. For
| a cache that doesn't matter as much, but we'd still end up with
| cache misses for power users for pages that should really be
| totally locally loadable.
|
| I implemented the IndexedDB version in 2019, to replace the
| earlier LocalStorage option. We used the IDB record cache in
| the desktop app until we switched over to native SQLite there
| in 2021 (https://www.notion.so/blog/faster-page-load-
| navigation) but we never shipped it for browser users for a few
| reasons:
|
| - Performance and reliability problems with IDB in browsers is
| hard to debug; in the native app we can trust the version of
| Chromium we ship and remediate issues using Electron APIs,
| where as in the browser wild we're at the mercy of the user-
| agent
|
| - Our testing in the browser showed limited performance
| improvements across all device categories: faster devices &
| scenarios were even faster with IndexedDB, but slower devices &
| scenarios could be even slower.
|
| The reason I'd attribute for the performance challenge is that
| IndexedDB pays a high tax per row written and row read. It can
| be fine in terms of total throughput for a cache if you have
| large, coarse-grained cache rows, like caching all of a
| document as a single object, and you update the cache
| infrequently.
|
| Notion's data model is tree/graph of very fine-grained records;
| each paragraph is its own database row. Our cache on IndexedDB
| would perform great for smaller workspace sizes and for a
| single tab, but with multiple tabs and medium-to-large
| workspaces, we'd hit contention in IndexedDB and get major
| slowdowns.
|
| We should improve our cache architecture to have another layer
| of cache that does whole-pages, but need to weigh the
| improvement/complexity there versus other performance
| opportunities.
| jdougan wrote:
| Sounds like NLS/Augment on a machine with enouh core and
| modern support APIs.
| bedatadriven wrote:
| This honestly sounds like a nightmare: multimegabyte wasm
| downloads, data corruption in production and byzantine hacks to
| coordinate writes between tabs. I am very grateful that we chose
| IndexedDB instead for our application!
| jillesvangurp wrote:
| We've been using wasm sqlite with kotlin-js and the kotlin
| sqldelight framework. This is admittedly a bit of an exotic stack
| to be running in a browser. But it actually works surprisingly
| well. Unfortunately the failure modes with opfs are kind of ugly
| and need some attention. A key issue is controlling what happens
| when you have multiple tabs open interacting with the same DB.
|
| One interesting thing is that opfs maximum disk usage is kind of
| browser specific but tends to be a percentage of the available
| diskspace measured in GB rather than some lowish number like 5MB
| as is typical for e.g. browser local storage. This makes it
| suitable for locally caching and syncing large amounts of remote
| database content.
|
| We've had a few challenges with opfs and particularly limited
| support for this on Safari it kind of works but with some
| caveats. Chrome/Firefox are fine. Our web app is packaged up as a
| PWA for use on mobile.
|
| Another issue is that opfs creates issues with loading resources
| from external websites. E.g. we have HTML previews that may
| include images on external domains that the browser will render
| fine without opfs enabled but will refuse to render when using
| opfs, unless you set crossOrigin=anonymous on the img tag.
|
| Relative to indexed-db, which is supported by most browsers, you
| gain a more sane API to access data and more flexible querying
| and support for things like joins. I've had some exposure to
| indexed-db and IMHO it's a case study in bad API design gone
| horribly wrong. Querying is very limited and the APIs are poorly
| documented and have weird failure modes. Sqlite is an absolute
| pleasure to use in comparison and probably faster and way more
| capable.
___________________________________________________________________
(page generated 2024-07-17 23:06 UTC)