[HN Gopher] SQLite-based databases on the Postgres protocol
       ___________________________________________________________________
        
       SQLite-based databases on the Postgres protocol
        
       Author : matesz
       Score  : 191 points
       Date   : 2023-01-25 12:31 UTC (10 hours ago)
        
 (HTM) web link (blog.chiselstrike.com)
 (TXT) w3m dump (blog.chiselstrike.com)
        
       | _448 wrote:
        | A noob question, how does one specify a unique constraint on a
        | column in SQLite?
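        | For illustration (table/column names are just placeholders),
        | the UNIQUE keyword on the column definition does it:
        | 
        |     import sqlite3
        | 
        |     conn = sqlite3.connect(":memory:")
        |     conn.execute(
        |         "CREATE TABLE users ("
        |         "  id INTEGER PRIMARY KEY,"
        |         "  email TEXT UNIQUE)")
        |     conn.execute(
        |         "INSERT INTO users (email) VALUES ('a@x.com')")
        |     try:
        |         conn.execute(
        |             "INSERT INTO users (email) VALUES ('a@x.com')")
        |     except sqlite3.IntegrityError as e:
        |         print("duplicate rejected:", e)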
        
       | chasil wrote:
       | This is quite an accomplishment. It is unfortunate that it could
       | not be done in the main project.
        
       | bawolff wrote:
        | I don't get why you would want this. It seems like it's taking
        | away all the key advantages of the sqlite approach and leaving
        | only the disadvantages.
        
       | newaccount2021 wrote:
       | [dead]
        
       | houqp wrote:
       | Very cool and well executed project. Love the sprinkle of Rust in
       | all the other companion projects as well :)
       | 
        | The ROAPI (https://github.com/roapi/roapi) project I built also
        | happens to support a similar feature set, i.e. exposing sqlite
        | through a variety of remote query interfaces, including the pg
        | wire protocol, REST APIs and GraphQL.
        
       | ronanyeah wrote:
       | I've been using render.com to host Rust+sqlite stacks recently,
       | and I'd like to leverage this.
       | 
        | Should my Rust server be running sqld, with a passthrough
        | endpoint added to its existing http api for reaching the sqlite?
       | 
       | Or alternatively, should I be using sqld locally to access my
       | sqlite instance through SSH or something?
        
       | cutler wrote:
       | Excuse my ignorance but isn't the whole point of SQLite that it's
        | embedded, not client-server?
        
         | bob1029 wrote:
         | For us, a major point of SQLite is that we can execute queries
          | in ~100µs when deployed on top of NVMe storage. Everything
         | being in the same process is 50% of the value proposition for
         | our product. Once you involve a network and another computer,
         | it gets a lot slower unless you are doing RDMA-style
         | shenanigans with very expensive hardware.
         | 
         | The other 50% of the value proposition is simplified operations
         | in a B2B setting.
        
           | MuffinFlavored wrote:
           | > For us, a major point of SQLite is that we can execute
            | queries in ~100µs when deployed on top of NVMe storage.
           | 
           | Postgres can't achieve these query times?
        
             | irq-1 wrote:
             | A Dev once told me that running SQL Server and IIS
             | (Microsoft webserver) on the same machine would allow
             | shared memory. Just pass a pointer from db to web; no
             | copying of the data. Postgres doesn't do this AFAIK.
        
               | JohnBooty wrote:
               | Purely FYI for those wondering: yeah, this is the default
               | when connecting to MSSQL from the same box. It's not an
               | IIS thing per se. Any app can do this with the
               | appropriate connection string.
               | 
                | https://learn.microsoft.com/en-us/sql/tools/configuration-ma...
                | 
                | https://learn.microsoft.com/en-us/sql/tools/configuration-ma...
               | 
               | It's been ages since I worked in Microsoft land, but this
               | is one of the reasons why MS shops often scale up instead
                | of out. Imagine a single box with dozens of beefy cores
               | and a couple hundred gigs of RAM running both IIS and
               | SQL. That's... a significant amount of power.
               | 
               | It's also a potentially massive blast radius for security
               | oopsies, but I don't know the practical history of how
               | secure it has or hasn't been, or how it integrates with
               | things like clustering etc.
        
             | bob1029 wrote:
             | How long does it take to round trip through the network
             | stack of 2 machines in a typical datacenter?
        
               | MuffinFlavored wrote:
               | I thought we'd be comparing apples-to-apples with
               | Postgres running locally in this case?
        
               | bob1029 wrote:
               | Ok - How many microseconds does it take to round trip
               | SELECT 1 to localhost on a typical cloud VM?
               | 
               | You are still using the network stack and involving
               | multiple processes.
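                | 
                | For a rough comparison (psycopg2 and the connection
                | details below are placeholders for whatever local
                | Postgres you have):
                | 
                |     import sqlite3, time
                |     import psycopg2  # assumes a local Postgres
                | 
                |     lite = sqlite3.connect(":memory:")
                |     pg = psycopg2.connect(
                |         host="127.0.0.1", dbname="postgres",
                |         user="postgres")
                |     cur = pg.cursor()
                | 
                |     def bench(label, fn, n=1000):
                |         t0 = time.perf_counter()
                |         for _ in range(n):
                |             fn()
                |         us = (time.perf_counter() - t0) / n * 1e6
                |         print(f"{label}: {us:.1f} us per SELECT 1")
                | 
                |     bench("in-process sqlite",
                |           lambda: lite.execute("SELECT 1").fetchone())
                | 
                |     def pg_roundtrip():
                |         cur.execute("SELECT 1")
                |         cur.fetchone()
                | 
                |     bench("postgres over localhost TCP", pg_roundtrip)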
        
               | Spivak wrote:
               | People really do be forgetting how unbelievably fast
               | doing things in-process is. It's why people suffer the
                | complexity of threads instead of multiple processes with
                | IPC - like the network.
        
               | bob1029 wrote:
               | Agreed.
               | 
               | In-process is where you get things like instruction-level
               | parallelism.
               | 
                | I think our university programs have failed us the most
                | in this regard. ILP should be drilled into every
                | student's brain as _the_ ultimate performance hack.
                | Once you start separating everything into a bunch of
                | different buckets/systems/processes, none of these
                | advantages can be exploited anymore.
               | 
               | In some cases, ILP can give you a 100x improvement in
               | performance. That's _serial throughput_. This is the
                | reason many developers consider bit manipulation/hacking
               | to be essentially free.
        
               | justinclift wrote:
               | > Ok - How many microseconds does it take to round trip
               | SELECT 1 to localhost on a typical cloud VM?
               | 
               | As a data point, when PG is on the same box as the
               | querying app, you can run the connection over a unix
               | domain socket instead of going over localhost.
               | 
               | In theory (!), that's supposed to have reduced latency +
               | higher bandwidth too. It's been a bunch of years since I
               | last saw benchmarks about it though, so things might have
               | changed a bit in the meantime (no idea).
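                | 
                | E.g. with psycopg2, pointing "host" at the socket
                | directory makes libpq use the Unix socket instead of
                | TCP (the path below is the Debian default; adjust for
                | your install):
                | 
                |     import psycopg2
                | 
                |     conn = psycopg2.connect(
                |         dbname="postgres", user="postgres",
                |         host="/var/run/postgresql")
                |     print(conn.get_dsn_parameters())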
        
               | linuxdude314 wrote:
               | There is no "round trip" and no network stacked involved
               | in querying a local Postgres instance (web app on same VM
               | as db).
               | 
               | Attention is better spent designing an appropriate schema
               | than worrying about the network being too slow to run a
               | single query (if it is, you've already failed).
        
               | bawolff wrote:
               | But this article is about using sqlite over a web socket
               | so both would pay that price.
        
               | jbverschoor wrote:
               | Use a unix file socket for the connection, or just embed
               | postgresql :-)
               | 
               | The nice thing about sqlite, i.m.o., is mainly that it is
               | just one file.
        
         | Kinrany wrote:
          | Ideally you want all of these: an embedded database, a library
          | for manipulating requests understood by the database, and a few
          | libraries for exposing the same interface over different
          | networks.
        
           | [deleted]
        
       | Thaxll wrote:
        | Basically re-inventing MySQL / PG but worse. Next step, we don't
        | have auth over the network, so let's bake RBAC into SQLite.
        
         | vidarh wrote:
         | From what else they're doing, it appears the point is to be
         | able to use sqlite in serverless setups where MySQL/Postgres
         | would be way too heavy to deploy on a per-customer or per-
         | function basis.
        
           | matesz wrote:
           | Honestly I don't get the point of edge. Do people really care
           | whether their website loads in 50ms vs 300ms?
        
             | maxmcd wrote:
             | Sure it's on a bit of a hype wave at the moment, but yes,
             | most internet users are mobile users that don't live in
             | major metropolitan areas with robust internet
             | infrastructure? I think at least having the option to serve
             | some things at the edge can dramatically improve the
             | browsing experience for those users (and many others!).
        
             | mirzap wrote:
              | Of course they care. It directly impacts conversion rates.
              | I'm still stunned by the simple fact that most people think
              | 100ms doesn't make a difference (for their business).
              | 
              | https://www.globaldots.com/resources/blog/how-website-speed-...
        
               | billythemaniam wrote:
               | While it matters, +-100ms doesn't matter in the vast
               | majority of web use cases. There is even such a thing as
               | too fast. If a user doesn't notice the page updated due
               | to speed, that is also a poor experience. 300ms would
               | actually feel incredibly fast to most web users for most
               | websites. Less than 1 second is a good guideline for page
               | load though as shown by your link.
        
               | jamil7 wrote:
               | > There is even such a thing as too fast. If a user
               | doesn't notice the page updated due to speed, that is
               | also a poor experience.
               | 
               | If you're relying on your app's execution time to
               | demonstrate this to a user, then you have poor UX.
               | Instant changes and updates have been the norm in mobile
               | apps forever and optimistic UI is becoming pretty
               | standard on the web too, see Linear for example.
        
               | matesz wrote:
                | Can't they just cache, let's say, those 1000 marketing,
                | sales and customer support pages in a CDN then?
        
             | VWWHFSfQ wrote:
             | One request being 50ms or 300ms probably doesn't really
             | matter.
             | 
             | But 10,000 requests _per second_ being 50 or 300ms matters
             | a lot
        
             | vidarh wrote:
              | To take a different tack than the other two (at time of
             | writing) replies to this: It's not just at the edge. Being
             | able to add databases with wild abandon and just provide a
             | key to an S3 compatible bucket to replicate it to, and not
             | have to worry about any server setup or replication is
             | appealing whether or not your setup is otherwise
             | centralised. For some setups you do want to share data
             | across user accounts, but for others, isolation along
             | customer or individual user boundaries is not just fine but
             | an advantage (no accidentally writing a query that returns
              | too much data). And then, it's a plus when having a million
              | databases and selectively spreading them out over your
              | servers is trivial.
        
         | matesz wrote:
          | I never thought people took restricting access from within the
          | database seriously. Like row security policies [1] in postgres.
         | But now as I look at it, it must be taken seriously, just
         | because those features exist.
         | 
         | Is anybody here using it in production with success?
         | 
          | [1] https://www.postgresql.org/docs/current/ddl-rowsecurity.html
        
           | sally_glance wrote:
            | Yep, my company uses Postgres RLS for a pretty big project
           | for a client in the finance sector. It's the foundation of
           | our multi-tenant setup there. We connect with a single DB
           | user, but set a session variable at the start of each
           | transaction. This session variable is then used by the RLS
           | policies to filter the data.
           | 
           | Works like a charm, you basically get to build your app like
           | it was dealing with a single tenant DB. Just make sure it's
           | as hard as humanly possible for application developers to
           | forget setting that tenant ID when they query... In our case
           | we have a custom JPA base repository which takes care of
           | that, and some PMD rules to catch rogue repos/queries.
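            | 
            | Roughly like this (the table, policy and GUC name below are
            | illustrative placeholders, not our actual schema):
            | 
            |     import psycopg2
            | 
            |     # One-time DDL, e.g.:
            |     #   ALTER TABLE invoices
            |     #       ENABLE ROW LEVEL SECURITY;
            |     #   CREATE POLICY tenant_isolation ON invoices
            |     #       USING (tenant_id =
            |     #           current_setting('app.tenant_id')::int);
            | 
            |     conn = psycopg2.connect(dbname="appdb",
            |                             user="app_user")
            |     with conn, conn.cursor() as cur:
            |         # scoped to this transaction (is_local=true),
            |         # so every transaction has to set it again
            |         cur.execute(
            |             "SELECT set_config("
            |             "  'app.tenant_id', %s, true)", ("42",))
            |         cur.execute("SELECT * FROM invoices")
            |         rows = cur.fetchall()  # only tenant 42's rows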
        
           | pphysch wrote:
           | > But now as I look at it, it must be taken seriously, just
           | because those features exist.
           | 
           | You _can_ do pretty much everything within Postgres, from ETL
           | to Authz to serving HTML templates and GIS apps. However,
            | that doesn't mean you should, or that anyone is seriously
           | using it in production on a large scale (after evaluating
           | alternatives).
        
           | giraffe_lady wrote:
            | Yes, I've used it a bunch in production; it's great. It is
           | highly valued in some PII-conscious fields for compliance
           | reasons that I don't fully understand, but becoming competent
           | with postgres RLS has been a huge benefit for my career.
           | 
           | The main practical downside is that it is more precise and
           | rigorous than your app- or business-level auth logic is
           | likely to be, and has no real escape hatches. If you're
           | trying to drop it on an existing system you're going to spend
           | a _lot_ of time going back and forth with whoever to shake
            | out edge case ambiguities in your RBAC policy that weren't
            | obvious or relevant before.
        
           | aobdev wrote:
           | If I understand correctly, it seems to be a cornerstone of
           | Supabase's Authorization features.
           | https://supabase.com/docs/guides/auth/row-level-security
        
       | Existenceblinks wrote:
        | Enlighten me: if it needs to be hosted at a different network
        | location than the app, what is the advantage over postgres? Is
        | it because it's easier to "set up" and "maintain"?
        
         | vidarh wrote:
         | Given their webassembly sqlite function support and other
         | changes, the advantage would seem to be that you're most of the
         | way there to being able to provide "serverless" databases. Have
         | a proxy handle auth, spin up an instance if it's not
         | running/shut it down after a while, and add syncing of changes
         | to/from object storage, and you're there.
        
           | Existenceblinks wrote:
           | > syncing of changes to/from object storage
           | 
            | So you are suggesting having both the "serverless" db and
           | the storage? If so, you now have 3 problems.
        
             | vidarh wrote:
             | I'm describing capabilities that already exist. Reliably
             | streaming sqlite data to object storage is not a new thing
              | and is supported by multiple separate implementations at
              | this point.
        
               | Existenceblinks wrote:
                | The problem of having multiple sources of data is
                | Consistency, Availability, Partition tolerance. It also
                | depends on the IO characteristics of the app: who the
                | readers and writers are, how long stale data is
                | acceptable. Basically all the distributed systems
                | problems. In a web app, the frontend + backend + a db
                | sitting next to it is already enough of a problem (e.g.
                | the SPA vs MPA state problem).
        
               | vidarh wrote:
               | Did not at any point suggest multiple sources of data.
               | 
               | EDIT: While there are things in their repos that suggests
               | they _might_ be thinking about moving towards allowing
               | multiple writers, what 's _currently_ there suggests a
               | single current active instance of each database, with the
               | WAL being sync 'ed to object storage so that _in the case
               | of failure_ and /or when doing a cold-start, the database
               | can be brought back from object storage.
        
               | Existenceblinks wrote:
                | Ok sorry, I think I misinterpreted your architecture. I
               | think you mean something like HA with Litestream.
        
               | vidarh wrote:
               | Yes, similar to that.
               | 
               | So you'd put up a proxy to handle auth, and match an
               | incoming request either to a running database or to a
               | cold database. If it's for a running database, it'd
               | replicate to S3 or similar. If it's for a cold database,
               | you sync from S3 or similar and start the server side
               | process.
               | 
               | To your point, you absolutely need to be able to reliably
               | grant a lease of some sort to whichever frontend pulls
               | down the database and starts and endpoint, or you're
               | absolutely right you'll have huge problems.
               | 
               | Absolutely won't be suitable for every kind of workload,
               | but if you've already committed to running your stuff in
               | a serverless setup, having your database(s) handled that
               | way might be appealing.
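                | 
                | A purely hypothetical sketch of that routing/lease step
                | (none of this is sqld's actual code; the S3 restore is
                | stubbed out):
                | 
                |     import sqlite3, time
                | 
                |     leases = {}  # db -> (owner, expiry); in reality
                |                  # this lives in etcd/DynamoDB/etc.
                | 
                |     def acquire_lease(db, owner, ttl=30):
                |         now = time.time()
                |         held = leases.get(db)
                |         if (held and held[1] > now
                |                 and held[0] != owner):
                |             raise RuntimeError("db owned elsewhere")
                |         leases[db] = (owner, now + ttl)
                | 
                |     def restore_from_s3(db, path):
                |         # stand-in for pulling snapshot + WAL from S3
                |         open(path, "ab").close()
                | 
                |     def open_cold_database(db, owner):
                |         acquire_lease(db, owner)
                |         path = f"/tmp/{db}.db"
                |         restore_from_s3(db, path)
                |         return sqlite3.connect(path)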
        
               | Existenceblinks wrote:
               | > but if you've already committed to running your stuff
               | in a serverless setup
               | 
                | Sounds good, but I'm curious what the criteria (need)
                | would be to architect like this in the first place.
        
               | vidarh wrote:
               | Let's say you want to run a huge number of databases for
                | customers; too many to run on an individual server. Now
               | you have to shard. You can either try to split them
               | between multiple MySQL/Postgres etc. servers, that are
               | now each individually major risk factors, or you can
               | design your system so you can just hook up more servers
               | at will as long as the largest individual customer
               | database can run on a single instance.
               | 
               | I've run large numbers of Postgres databases, and it's
               | not hard to automate, but it's hard to optimise for a
               | setup where the usage patterns of individual databases
               | are hard to estimate. Is your customer using it for batch
               | jobs, or for persistent streams of data? Who can you
               | colocate with whom? When the cost of shutting a database
               | down one place and migrating it elsewhere becomes very
               | low, this kind of scenario can potentially become a lot
               | easier.
               | 
                | From the user perspective, I'd expect you really wouldn't
                | care, other than in terms of cold start times and cost.
                | Except perhaps for batch jobs etc., where being able to
                | write apps that check state, obtain a lease, download the
                | most recent version, do their job and upload the result
                | to durable storage might well be convenient, rather than
                | having to e.g. keep a bunch of databases constantly
                | running.
        
               | Existenceblinks wrote:
               | > a huge number of databases for customers
               | 
                | Is this one .db per customer? If so, how do you deal with
               | schema migration?
        
               | vidarh wrote:
               | Carefully ;) One option is to build your app to check for
               | migrations on startup. I've run systems that'd do schema
               | migrations on a _per user_ basis for data stored in
               | objects per user, and it worked just fine across a
               | userbase of a couple of million accounts, on the basis
                | that the only clients that connect directly also own the
               | schema, so nothing ever connects without immediately
               | checking if a migration needs to run.
               | 
               | But consider that for setups like this the database might
                | well be _the customer's own database_, so their schema
               | might not be something it's your job to touch at all.
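                | 
                | One common way to do the check-on-startup part (not
                | necessarily what they do) is SQLite's built-in
                | user_version pragma:
                | 
                |     import sqlite3
                | 
                |     MIGRATIONS = [  # placeholder schema
                |         "CREATE TABLE notes ("
                |         "  id INTEGER PRIMARY KEY, body TEXT)",
                |         "ALTER TABLE notes ADD COLUMN created_at TEXT",
                |     ]
                | 
                |     def connect(path):
                |         conn = sqlite3.connect(path)
                |         v = conn.execute(
                |             "PRAGMA user_version").fetchone()[0]
                |         for i, stmt in enumerate(MIGRATIONS[v:], v):
                |             conn.execute(stmt)
                |             conn.execute(
                |                 f"PRAGMA user_version = {i + 1}")
                |         conn.commit()
                |         return conn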
        
           | jimperio wrote:
           | Interesting, hadn't thought about it like that but it makes
           | sense. Used SQLite for a desktop app but never thought it
           | would make any sense on the server.
        
         | glommer wrote:
          | * extremely easy to get started.
          | 
          | * unmatched testability, since you can now run the same code in
          | CI and production and pass .db files to your tests.
          | 
          | * extremely cheap and lightweight replication.
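          | 
          | For the testability point, a minimal pytest-style sketch (the
          | fixture path and table names are made-up placeholders):
          | 
          |     import shutil, sqlite3
          | 
          |     def test_orders_count(tmp_path):
          |         # copy a checked-in fixture .db so the test
          |         # can't corrupt the original
          |         db = tmp_path / "app.db"
          |         shutil.copy("tests/fixtures/app.db", db)
          |         conn = sqlite3.connect(db)
          |         (n,) = conn.execute(
          |             "SELECT count(*) FROM orders").fetchone()
          |         assert n == 3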
        
           | Existenceblinks wrote:
            | What kind of code runs in CI AND production? You mean
            | passing a prod-clone.db to test suites instead of fixtures?
        
       | drej wrote:
       | Note that this already exists on top of SQLite proper - authored
       | by Ben Johnson (Litestream, Fly.io etc.) -
       | https://github.com/benbjohnson/postlite
        
         | maxmcd wrote:
          | I think they are quite different. Postlite expects you to
          | connect over the postgres wire protocol. Sqld is compiled into
          | your application, so your application behaves like it's talking
          | to an embedded sqlite; the calls are then made over the network
          | (using one of three available transports) before results are
          | returned to your application.
        
           | MuffinFlavored wrote:
           | Dumb question, with all of this newfangled WASM stuff, why
           | couldn't we also bake the Postgres server (and then client)
           | into the code? I know the WASM runtime would need to expose
           | the low-level ability to do filesystem operations, mmap()ing,
           | network sockets, etc.
        
             | maxmcd wrote:
             | Then you'd need to run and maintain Postgres, which is much
             | more complicated, not just a single database file.
             | 
             | Postgres also can't be embedded (according to some brief
             | googling), so you'd need to run it as a separate process.
        
             | stuaxo wrote:
              | This would be handy for apps that use Postgres features
              | that would nonetheless work in WASM.
        
       | philipwhiuk wrote:
       | "libSQL" is a silly name. It's not a library is it?
       | 
       | I wish people would at least vaguely try to be helpful naming
       | products.
        
         | glommer wrote:
          | Of course it is a library. An embedded database is essentially
          | a library that you add to your application.
          | 
          | That you can build stuff around it doesn't invalidate the fact
          | that the core of it is a library.
        
         | Spivak wrote:
         | It seems like it's a library you're meant to embed in an
         | application and that sqld is just one application built on
         | libSQL.
         | 
          | It does seem like it makes more sense to call it something that
          | alludes to it speaking postgres.
        
         | zffr wrote:
         | From the SQLite homepage:
         | 
         | "SQLite is a C-language library that implements a small, fast,
         | self-contained, high-reliability, full-featured, SQL database
         | engine."
         | 
         | Wouldn't a fork also be a library?
        
       | maxmcd wrote:
       | Wild stuff
       | 
       | - Fork of SQLite with new features
       | https://github.com/libsql/libsql
       | 
       | - And you can embed it in your application and it will really
       | talk to your SQLite/libsql database over the network (subject of
       | this blogpost): https://github.com/libsql/sqld
       | 
       | - Oh, and if you're wondering about backup to S3, they have that
       | too: https://github.com/libsql/bottomless
       | 
        | - Uh, sqld can be integrated with this
       | https://github.com/losfair/mvsqlite, so now your SQLite is backed
       | by FoundationDB!?
       | 
       | --
       | 
       | - Meanwhile Litestream exists
       | https://github.com/benbjohnson/litestream/
       | 
       | - Ben is developing https://github.com/superfly/litefs at Fly so
       | that you can talk to your SQLite through a FUSE filesystem. (and
       | work has stopped on the streaming replication effort
       | https://github.com/benbjohnson/litestream/issues/8)
       | 
       | - And, of course, SQLite has announced a new backend that hopes
       | to support concurrent writes and streaming replication:
       | https://sqlite.org/hctree/doc/hctree/doc/hctree/index.html
       | 
       | What a time for SQLite
       | 
       | Presumably all of these things provide a wildly different
       | experience in terms of consistency, availability, latency,
       | complexity, and concurrent use. Would be so nice to read a
       | lengthy blogpost comparing all or some of these with their pros
       | and cons.
        
         | stuaxo wrote:
          | Related: michaelc @ uktrade did some interesting work streaming
         | sqlite with python in this repo -
         | https://github.com/uktrade/stream-sqlite
        
         | maxmcd wrote:
         | Questions for libsql:
         | 
         | - Looks like bottomless does automatic restoring of the
         | database? Litestream seems to avoid this, I assume because of
         | concerns about accidentally creating multiple writers on a
         | rolling-deploy (or similar mistake). Any concerns about this
         | possible footgun?
         | 
         | - Bottomless is a sqlite extension, not a separate process?
         | Pros and cons compared to Litestream?
         | 
         | - Similar questions for sqld. How does sqld handle the
         | lifecycle of deploys and multiple readers/writers talking to
         | the database? Anything a new user should be concerned about?
        
           | psarna wrote:
           | First of all, let me start by reiterating the first sentence
           | from the bottomless repo - it's work in heavy progress (and
           | we'll move it under the sqld/ repo soon, to keep everything
           | in one place). Now, answers:
           | 
           | > - Looks like bottomless does automatic restoring of the
           | database? Litestream seems to avoid this, I assume because of
           | concerns about accidentally creating multiple writers on a
           | rolling-deploy (or similar mistake). Any concerns about this
           | possible footgun?
           | 
           | It's a valid concern, but what always happens on boot is
           | starting a new generation (a generation is basically a
           | snapshot of the main file + its continuously replicated WAL),
           | distinguished by a uuid v7, the timestamped one. So even if a
           | collision happens, it would be recoverable - e.g. one of the
           | stray generations should be deleted.
           | 
           | > - Bottomless is a sqlite extension, not a separate process?
           | Pros and cons compared to litestream?
           | 
           | The only con I see is that a bug in the extension could
           | interfere with the database. As for pros: way less
           | maintenance work, because everything is already embedded,
           | we're also hooked into the database via a virtual WAL
           | interface (libSQL-only), so we have full control over when to
           | replicate, without having to observe the .wal file and reason
           | about it from a separate process.
           | 
           | > - Similar questions for sqld. How does sqld handle the
           | lifecycle of deploys and multiple readers/writers talking to
           | the database? Anything a new user should be concerned about?
           | 
           | There are going to be multiple flavors of sqld, but the rough
           | idea would be to trust the users to only launch a single
           | primary. In the current state of the code, replicas contact
           | the primary in order to register themselves, so the model is
           | centralized. Once we build something on top of a consensus
           | algorithm, leader election will be pushed to the algorithm
           | itself.
        
             | maxmcd wrote:
             | Very cool, thank you for the insights
        
         | houqp wrote:
         | bottomless looks really nice, thanks for sharing!
        
         | aaviator42 wrote:
         | I wrote a small PHP library that gives you a key-value storage
          | interface to SQLite files: https://github.com/aaviator42/StorX
         | 
          | I've been dogfooding it for a while by using it in my side
          | projects.
         | 
         | And there's a basic API too, to use it over a network:
         | https://github.com/aaviator42/StorX-API
        
           | vdm wrote:
           | ICYMI, similar concept for python
           | https://dataset.readthedocs.io/
        
       ___________________________________________________________________
       (page generated 2023-01-25 23:01 UTC)