[HN Gopher] Marmot: Multi-writer distributed SQLite based on NATS
       ___________________________________________________________________
        
       Marmot: Multi-writer distributed SQLite based on NATS
        
       Author : summarity
       Score  : 100 points
       Date   : 2023-12-11 14:09 UTC (8 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | simonw wrote:
       | Something that wasn't clear to me from the README: how does this
       | handle duplicate IDs?
       | 
       | If I have a table with a string primary key and I insert a row in
       | one node with ID "hello" and do the same thing (but with
       | different column data) on another node, what happens?
        
         | nlh wrote:
         | I'm not 100% sure of this but from the README section on race
         | conditions:
         | 
         | "the last writer will always win. This means there is NO
         | serializability guarantee of a transaction spanning multiple
         | tables. This is a design choice, in order to avoid any sort of
         | global locking, and performance."
         | 
         | So it sounds like in your example, whichever node writes last
         | with a given primary key will be the data you'll end up with.
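          | 
          | A toy sketch of those last-writer-wins semantics using plain
          | sqlite3 (an illustration of the behavior only, not Marmot's
          | actual replication code; the table and change events here
          | are made up):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id TEXT PRIMARY KEY, name TEXT)")

# Hypothetical change events from two nodes, both targeting the
# primary key "hello"; the second one arrives (and applies) later.
events = [
    ("hello", "from-node-a"),
    ("hello", "from-node-b"),
]
for pk, name in events:
    # Last writer wins: a later event simply overwrites the row.
    db.execute(
        "INSERT OR REPLACE INTO users (id, name) VALUES (?, ?)",
        (pk, name),
    )

print(db.execute("SELECT name FROM users WHERE id = 'hello'").fetchone()[0])
# prints "from-node-b"
```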
        
           | dilyevsky wrote:
            | Based on their description of how it works with JetStream,
            | I think the last insert will fail because it will see the
            | row already exists in JetStream.
           | 
           | Within a single table they have tx serializability, just not
           | with multiple tables
        
         | governmentdude wrote:
         | In what context would you use a single string as the ID that
         | would be subject to collisions (instead of including a user id
         | as part of the ID)?
        
           | simonw wrote:
           | People do all sorts of weird things with databases, and if
           | you want to run existing software against a database (as
           | opposed to greenfield development deliberately targeting
           | Marmot) you need to understand what will happen in different
           | cases.
        
             | meepmorp wrote:
             | > if you want to run existing software against a database
             | (as opposed to greenfield development deliberately
             | targeting Marmot) you need to understand what will happen
             | in different cases.
             | 
             | A great point, but from the readme in the Marmot repo:
             | 
             | > It does not require any changes to your existing SQLite
             | application logic for reading/writing.
             | 
             | I suspect you're probably still right, but that's not what
             | the author claims.
        
           | spenczar5 wrote:
           | A better example might be a field that must be unique, like a
           | URL slug. Suppose you have a database of products and the
           | product table has a field used for URLs, like "/tshirt" and
           | "/sweater". You need those to be unique within the table.
           | 
           | The question remains - how does Marmot enforce a uniqueness
           | constraint? If you don't like the product example, fine, but
           | it is easy to think of others. It would be unfortunate if
           | marmot is incapable of supporting uniqueness.
        
             | MobiusHorizons wrote:
             | As I understand it, transactions are still serialized for
             | any given table, just not across tables. Wouldn't that
             | solve this uniqueness constraint issue?
        
         | maxpert wrote:
          | Glad you asked the question. I never recommend using auto-
          | incrementing IDs in production; always generate your own
          | (e.g. with Twitter Snowflake). With ID generators you get
          | rid of collisions.
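          | 
          | A minimal sketch of a Snowflake-style generator (a toy
          | version of the idea, not Twitter's or Marmot's code; the bit
          | layout is the commonly cited 41/10/12 split):

```python
import threading
import time

class SnowflakeGenerator:
    """Toy Snowflake-style IDs: 41 bits of milliseconds since a custom
    epoch, 10 bits of node id, 12 bits of per-millisecond sequence."""

    EPOCH = 1288834974657  # Twitter's published custom epoch (ms)

    def __init__(self, node_id: int):
        assert 0 <= node_id < 1024  # node id must fit in 10 bits
        self.node_id = node_id
        self.sequence = 0
        self.last_ms = -1
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = int(time.time() * 1000)
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF
                if self.sequence == 0:
                    # Sequence exhausted in this millisecond; spin
                    # until the clock advances.
                    while now <= self.last_ms:
                        now = int(time.time() * 1000)
            else:
                self.sequence = 0
            self.last_ms = now
            return ((now - self.EPOCH) << 22) \
                | (self.node_id << 12) | self.sequence

gen = SnowflakeGenerator(node_id=1)
a, b = gen.next_id(), gen.next_id()
print(a < b)  # IDs generated on one node are strictly increasing
```

Because each node bakes its own node id into every ID, two nodes can generate concurrently without ever colliding.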
        
       | Mortiffer wrote:
       | I wonder if this kind of a setup would be useful for federated
       | protocols. Mastodon and Farcaster both maintain websocket
       | connections open from all servers to all servers.
        
       | latchkey wrote:
       | I can totally see some junior dev who doesn't fully grok
       | transactions (and not reading/understanding the "What happens
       | when there is a race condition?" section), trying to deploy
       | something like this and scratching their head why it falls over
       | completely in production. The site worked fine when it was just
       | me testing it!
       | 
       | This is a neat proof of concept and I encourage experimentation.
       | But, if you're developing something, please just use postgres,
       | and don't try to cobble things together with something like this.
       | 
       | Edit: already seeing the downvotes. Yes... classic HN... anything
       | that goes against plain old sanity is punished.
        
         | eatonphil wrote:
         | Postgres transactions are not SERIALIZABLE by default either.
         | Though, yes, Postgres defaults will catch some more conditions
         | than this project will.
         | 
         | I think the important thing is to encourage devs to understand
         | transactions in the first place.
        
         | taywrobel wrote:
         | There's a reason that this is called "hacker news" and not
         | "just use the industry standard for the last 3 decades news".
         | 
         | Won't downvote you for giving pragmatic advice, but I
         | appreciate projects like this that slap together disparate
         | technologies for an interesting goal, even if it isn't the best
         | choice for your usual Fortune 500 company.
        
           | latchkey wrote:
           | > _I appreciate projects like this that slap together
           | disparate technologies for an interesting goal_
           | 
           | That's exactly why I said: "This is a neat proof of concept
           | and I encourage experimentation."
        
         | FredPret wrote:
         | +1 for Postgres. It's actually ridiculous how much good
         | software you can simply start using for free.
        
         | dilyevsky wrote:
          | If you let juniors design and deploy a DB layer to prod,
          | it's your fault, not the DB's. Transaction isolation is a
          | generally complicated topic that a lot of _senior_ devs have
          | a pretty tenuous grasp on IME, and distributed Postgres
          | solutions don't solve this particularly well either, last I
          | checked.
        
           | latchkey wrote:
            | > _If you let juniors design and deploy a DB layer to
            | prod it's your fault not the DBs._
           | 
           | Some companies only have juniors. 28 years ago, I was the
           | junior at my company, and the first/only engineer.
           | 
           | Let's also be real here, most applications don't need
           | distributed Postgres either and those that do, will have
           | senior engineers on staff.
        
         | maxpert wrote:
         | I completely agree with your analysis. Understanding the
         | complexities of achieving convergence with basic auto-increment
         | counters without advanced CRDT types is 101 IMO. Those familiar
         | with these issues inherently comprehend the challenges
         | involved. While it's plausible for someone to leverage a
         | library atop Marmot to construct and synchronize such types,
         | it's important to note that this tool isn't tailored for junior
         | developers grappling with transactional intricacies. I've
         | witnessed instances where inexperienced developers initiate
         | transactions and make HTTP calls while holding locks, resulting
         | in system outages. Marmot isn't intended for individuals
         | lacking a solid understanding of distributed systems. My
         | recommendation aligns with advising entry-level individuals to
         | explore these tools only when they reach a scale where such
         | complexities become pertinent.
        
           | latchkey wrote:
           | > _My recommendation aligns with advising entry-level
           | individuals to explore these tools only when they reach a
           | scale where such complexities become pertinent._
           | 
           | I posit that if you're at that scale, you're figuring out how
           | to get distributed postgres to work and not messing with
           | things like Marmot.
        
       | summarity wrote:
       | If you're interested in this, here are some related projects that
       | all take slightly different approaches:
       | 
       | - LiteSync directly competes with Marmot and supports DDL sync,
       | but is closed source commercial (similar to SQLite EE):
       | https://litesync.io
       | 
       | - dqlite is Canonical's distributed SQLite that depends on c-raft
       | and kernel-level async I/O: https://dqlite.io
       | 
       | - cr-sqlite is a Rust-based loadable extension that adds CRDT
       | changeset generation and reconciliation to SQLite:
       | https://github.com/vlcn-io/cr-sqlite
       | 
       | Slightly related but not really (no multi writer, no C-level
       | SQLite API or other restrictions):
       | 
        | - comdb2 (Bloomberg's multi-homed RDBMS using SQLite as the
        | frontend)
       | 
       | - rqlite: RDBMS with HTTP API and SQLite as the storage engine,
       | used for replication and strong consistency (does not scale
       | writes)
       | 
       | - litestream/LiteFS: disaster recovery replication
       | 
       | - liteserver: active read-only replication (predecessor of
       | LiteSync)
        
         | anovick wrote:
         | - SQLSync: collaborative offline-first wrapper around SQLite
        
         | KRAKRISMOTT wrote:
         | Don't forget Turso's libsql which uses a local
         | node+reconciliation
         | 
         | https://medium.com/chiselstrike/introducing-embedded-replica...
        
         | spiffytech wrote:
         | Also Expensify's Bedrock, which powers their widely-circulated
         | "Scaling SQLite to 4M QPS" article:
         | 
         | https://bedrockdb.com/
         | 
         | https://use.expensify.com/blog/scaling-sqlite-to-4m-qps-on-a...
        
         | otoolep wrote:
         | rqlite[1] creator here, happy to answer any questions.
         | 
         | [1] https://www.rqlite.io
        
       | stocknear wrote:
        | I'm using Marmot for my own website in production. To date
        | there have been no problems.
        | 
        | Whenever I had technical issues (i.e. questions, optimizations,
        | etc.) I asked the developer maxpert and he gave me in-depth
        | answers that helped me personally a lot.
        | 
        | In my case I have much love for Marmot; hopefully it grows and
        | helps a bigger community.
        
       | liuliu wrote:
       | > In Marmot every row is uniquely mapped to a JetStream. This
       | guarantees that for any node to publish changes for a row it has
       | to go through same JetStream as everyone else.
       | 
       | and
       | 
       | > This means there is NO serializability guarantee of a
       | transaction spanning multiple tables. This is a design choice, in
       | order to avoid any sort of global locking, and performance.
       | 
       | But since the serialization happens at per row level, does this
       | also mean no serializability guarantee of a transaction within a
       | table too, not only spanning multiple tables?
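        | 
        | A hypothetical sketch of that per-row mapping via stable
        | hashing (the shard count, stream names, and hashing scheme
        | below are invented for illustration, not Marmot's actual
        | implementation):

```python
import hashlib

NUM_STREAMS = 8  # invented shard count, not Marmot's actual config

def stream_for_row(table: str, primary_key: str) -> str:
    """Map (table, primary key) to one of NUM_STREAMS stream names.
    Every node hashes the same way, so all changes for a given row
    are published through the same stream and get a per-row total
    order, with no ordering guarantee across rows or tables."""
    digest = hashlib.sha256(f"{table}:{primary_key}".encode()).digest()
    shard = int.from_bytes(digest[:8], "big") % NUM_STREAMS
    return f"marmot-changes-{shard}"

# Every node computes the same stream for the same row:
print(stream_for_row("users", "hello") == stream_for_row("users", "hello"))
# prints "True"
```

Under a sketch like this, a multi-row transaction's changes can land on different streams, which would match the README's caveat about transactions spanning multiple tables.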
        
       | jensneuse wrote:
       | This is really cool. We've just created a POC that bridges
       | Federated GraphQL Subscriptions and NATS, so this could maybe
       | work together? Here's a small video of combining Federated
       | Subscriptions and event driven architecture through NATS:
       | https://twitter.com/TheWorstFounder/status/17341349261133783...?
        
       | maxpert wrote:
       | Author of Marmot here. Marmot was born out of my own use-case
       | (was building replicated SQLite based cache). While working on
       | various problems I realized how well suited it might be for read
        | heavy sites/workloads. If I take a typical CMS site, 90% of the
        | time it's just reading and SQLite is perfect for that, but then
        | how do I get independently deployed nodes to replicate data? The
        | philosophy I am sticking to so far:
       | 
       | - Sidecar! I would avoid any kind of in process library at any
       | cost. Call me biased but I don't trust someone's code in my
       | process space causing it to crash.
       | 
       | - No master - each node should be able to make progress on its
       | own, if these processes go down your own process will keep
       | functioning. They will converge once everything is back.
       | 
        | - Easy to start, yet hard to master - You can get up and running
        | pretty quickly, but make no mistake: this tool is not for a
        | rookie who doesn't understand why incremental primary keys are
        | bad, or how to keep things conflict-free.
       | 
       | I am far from getting everything I need in there, and again my
       | philosophy might evolve over time as well. Talking to people on
       | Discord has helped me think through use-cases a lot, so keep the
       | good feedback coming. Would love to answer any questions people
       | might have here.
        
       | wtatum wrote:
       | I see a lot of projects started in this space and all of them
        | appear to have multi-writer as a goal. I've long been interested
        | in (and have started and stopped building) a solution for
        | single-writer multi-reader with eventual consistency. I chatted
        | with
       | @benbjohnson on a LiteStream ticket about the possibility of
       | adding a mobile client to receive the replicas to mobile devices
       | but I think that option isn't really consistent with the new
       | direction of that work for LiteFS at Fly.
       | 
       | To me the multi-writer "collaborative" use case is super powerful
       | but also has a lot of challenges. I personally would see a lot of
       | value in a solution for eventually consistent read-replicas that
       | are available for end-client (WASM or mobile native) replication
       | but still funnel updates through traditional APIs with store-and-
       | forward for the offline case.
       | 
       | Is anybody aware of an open-source project pursuing that goal
       | that maybe I haven't come across?
        
       | sigmonsays wrote:
        | I'm surprised by this... no schema changes?
       | 
       | Nobody pushes an application out just once.
        
       ___________________________________________________________________
       (page generated 2023-12-11 23:01 UTC)