[HN Gopher] SQLite-based databases on the Postgres protocol
___________________________________________________________________
SQLite-based databases on the Postgres protocol
Author : matesz
Score : 191 points
Date : 2023-01-25 12:31 UTC (10 hours ago)
(HTM) web link (blog.chiselstrike.com)
(TXT) w3m dump (blog.chiselstrike.com)
| _448 wrote:
| A noob question, how does one specify unique constraint on a
| column in SQlite?
| chasil wrote:
| This is quite an accomplishment. It is unfortunate that it could
| not be done in the main project.
| bawolff wrote:
| I don't get why you would want this. It seems like its taking
| away all the key advantages of the sqlite approach and leaving
| only the disadvantages.
| newaccount2021 wrote:
| [dead]
| houqp wrote:
| Very cool and well executed project. Love the sprinkle of Rust in
| all the other companion projects as well :)
|
| The ROAPI(https://github.com/roapi/roapi) project I built also
| happened to support a similar feature set, i.e. to expose sqlite
| through a variety of remote query interfaces including pg wire
| protocols, rest apis and graphqls.
| ronanyeah wrote:
| I've been using render.com to host Rust+sqlite stacks recently,
| and I'd like to leverage this.
|
| Should my Rust server be running sqld, and I would add a
| passthrough endpoint through its existing http api to the sqlite?
|
| Or alternatively, should I be using sqld locally to access my
| sqlite instance through SSH or something?
| cutler wrote:
| Excuse my ignorance but isn't the whole point of SQLite that it's
| embedded, not clint-server?
| bob1029 wrote:
| For us, a major point of SQLite is that we can execute queries
| in ~100uS when deployed on top of NVMe storage. Everything
| being in the same process is 50% of the value proposition for
| our product. Once you involve a network and another computer,
| it gets a lot slower unless you are doing RDMA-style
| shenanigans with very expensive hardware.
|
| The other 50% of the value proposition is simplified operations
| in a B2B setting.
| MuffinFlavored wrote:
| > For us, a major point of SQLite is that we can execute
| queries in ~100uS when deployed on top of NVMe storage.
|
| Postgres can't achieve these query times?
| irq-1 wrote:
| A Dev once told me that running SQL Server and IIS
| (Microsoft webserver) on the same machine would allow
| shared memory. Just pass a pointer from db to web; no
| copying of the data. Postgres doesn't do this AFAIK.
| JohnBooty wrote:
| Purely FYI for those wondering: yeah, this is the default
| when connecting to MSSQL from the same box. It's not an
| IIS thing per se. Any app can do this with the
| appropriate connection string.
|
| https://learn.microsoft.com/en-
| us/sql/tools/configuration-ma...
|
| https://learn.microsoft.com/en-
| us/sql/tools/configuration-ma...
|
| It's been ages since I worked in Microsoft land, but this
| is one of the reasons why MS shops often scale up instead
| of out. Imagine a single box with a dozens of beefy cores
| and a couple hundred gigs of RAM running both IIS and
| SQL. That's... a significant amount of power.
|
| It's also a potentially massive blast radius for security
| oopsies, but I don't know the practical history of how
| secure it has or hasn't been, or how it integrates with
| things like clustering etc.
| bob1029 wrote:
| How long does it take to round trip through the network
| stack of 2 machines in a typical datacenter?
| MuffinFlavored wrote:
| I thought we'd be comparing apples-to-apples with
| Postgres running locally in this case?
| bob1029 wrote:
| Ok - How many microseconds does it take to round trip
| SELECT 1 to localhost on a typical cloud VM?
|
| You are still using the network stack and involving
| multiple processes.
| Spivak wrote:
| People really do be forgetting how unbelievably fast
| doing things in-process is. It's why people suffer the
| complexity of threads instead multiple processes with ipc
| - like the network.
| bob1029 wrote:
| Agreed.
|
| In-process is where you get things like instruction-level
| parallelism.
|
| I think our university programs have failed us the most
| in this regard. ILP should be drilled into every
| student's brain as _the_ penultimate performance hack.
| Once you start separating everything into a bunch of
| different buckets /systems/processes, none of these
| advantages can be explored anymore.
|
| In some cases, ILP can give you a 100x improvement in
| performance. That's _serial throughput_. This is the
| reason many developers consider bit manipulation /hacking
| to be essentially free.
| justinclift wrote:
| > Ok - How many microseconds does it take to round trip
| SELECT 1 to localhost on a typical cloud VM?
|
| As a data point, when PG is on the same box as the
| querying app, you can run the connection over a unix
| domain socket instead of going over localhost.
|
| In theory (!), that's supposed to have reduced latency +
| higher bandwidth too. It's been a bunch of years since I
| last saw benchmarks about it though, so things might have
| changed a bit in the meantime (no idea).
| linuxdude314 wrote:
| There is no "round trip" and no network stacked involved
| in querying a local Postgres instance (web app on same VM
| as db).
|
| Attention is better spent designing an appropriate schema
| than worrying about the network being too slow to run a
| single query (if it is, you've already failed).
| bawolff wrote:
| But this article is about using sqlite over a web socket
| so both would pay that price.
| jbverschoor wrote:
| Use a unix file socket for the connection, or just embed
| postgresql :-)
|
| The nice thing about sqlite, i.m.o., is mainly that it is
| just one file.
| Kinrany wrote:
| Ideally you want both an embedded database, a library for
| manipulating requests understood by the database, and a few
| libraries for exposing the same interface over different
| networks.
| [deleted]
| Thaxll wrote:
| Basically re-inventing MySQL / PG but worse. Next step, we don't
| have auth over the network, let's bake in RBAC into SQLite.
| vidarh wrote:
| From what else they're doing, it appears the point is to be
| able to use sqlite in serverless setups where MySQL/Postgres
| would be way too heavy to deploy on a per-customer or per-
| function basis.
| matesz wrote:
| Honestly I don't get the point of edge. Do people really care
| whether their website loads in 50ms vs 300ms?
| maxmcd wrote:
| Sure it's on a bit of a hype wave at the moment, but yes,
| most internet users are mobile users that don't live in
| major metropolitan areas with robust internet
| infrastructure? I think at least having the option to serve
| some things at the edge can dramatically improve the
| browsing experience for those users (and many others!).
| mirzap wrote:
| Of course they care. It directly impacts on conversion
| rates. I'm still stunned with simple fact that most people
| think 100ms doesn't make a difference (for their business).
|
| https://www.globaldots.com/resources/blog/how-website-
| speed-...
| billythemaniam wrote:
| While it matters, +-100ms doesn't matter in the vast
| majority of web use cases. There is even such a thing as
| too fast. If a user doesn't notice the page updated due
| to speed, that is also a poor experience. 300ms would
| actually feel incredibly fast to most web users for most
| websites. Less than 1 second is a good guideline for page
| load though as shown by your link.
| jamil7 wrote:
| > There is even such a thing as too fast. If a user
| doesn't notice the page updated due to speed, that is
| also a poor experience.
|
| If you're relying on your app's execution time to
| demonstrate this to a user, then you have poor UX.
| Instant changes and updates have been the norm in mobile
| apps forever and optimistic UI is becoming pretty
| standard on the web too, see Linear for example.
| matesz wrote:
| Can't they just cache let's say these 1000 marketing,
| sales and customer support pages into cdn then?
| VWWHFSfQ wrote:
| One request being 50ms or 300ms probably doesn't really
| matter.
|
| But 10,000 requests _per second_ being 50 or 300ms matters
| a lot
| vidarh wrote:
| To take a different tack then the other two (at time of
| writing) replies to this: It's not just at the edge. Being
| able to add databases with wild abandon and just provide a
| key to an S3 compatible bucket to replicate it to, and not
| have to worry about any server setup or replication is
| appealing whether or not your setup is otherwise
| centralised. For some setups you do want to share data
| across user accounts, but for others, isolation along
| customer or individual user boundaries is not just fine but
| an advantage (no accidentally writing a query that returns
| too much data). And then, it's a plus if having a million
| databases and selectively spread them out over your servers
| is trivial.
| matesz wrote:
| I never thought people take restricting access from within
| database seriously. Like row security policies [1] in postgres.
| But now as I look at it, it must be taken seriously, just
| because those features exist.
|
| Is anybody here using it in production with success?
|
| [1] https://www.postgresql.org/docs/current/ddl-
| rowsecurity.html
| sally_glance wrote:
| Jep, my company uses Postgres RLS for a pretty big project
| for a client in the finance sector. It's the foundation of
| our multi-tenant setup there. We connect with a single DB
| user, but set a session variable at the start of each
| transaction. This session variable is then used by the RLS
| policies to filter the data.
|
| Works like a charm, you basically get to build your app like
| it was dealing with a single tenant DB. Just make sure it's
| as hard as humanly possible for application developers to
| forget setting that tenant ID when they query... In our case
| we have a custom JPA base repository which takes care of
| that, and some PMD rules to catch rogue repos/queries.
| pphysch wrote:
| > But now as I look at it, it must be taken seriously, just
| because those features exist.
|
| You _can_ do pretty much everything within Postgres, from ETL
| to Authz to serving HTML templates and GIS apps. However,
| that doesn 't mean you should, or that anyone is seriously
| using it in production on a large scale (after evaluating
| alternatives).
| giraffe_lady wrote:
| Yes I've used it a bunch in production it's great. It is
| highly valued in some PII-conscious fields for compliance
| reasons that I don't fully understand, but becoming competent
| with postgres RLS has been a huge benefit for my career.
|
| The main practical downside is that it is more precise and
| rigorous than your app- or business-level auth logic is
| likely to be, and has no real escape hatches. If you're
| trying to drop it on an existing system you're going to spend
| a _lot_ of time going back and forth with whoever to shake
| out edge case ambiguities in your rbac policy that weren 't
| obvious or relevant before.
| aobdev wrote:
| If I understand correctly, it seems to be a cornerstone of
| Supabase's Authorization features.
| https://supabase.com/docs/guides/auth/row-level-security
| Existenceblinks wrote:
| Enlighten me, if it needs a host different network location than
| the app, what is the advantage over postgres, is it because it's
| easier to "setup" and "maintain?
| vidarh wrote:
| Given their webassembly sqlite function support and other
| changes, the advantage would seem to be that you're most of the
| way there to being able to provide "serverless" databases. Have
| a proxy handle auth, spin up an instance if it's not
| running/shut it down after a while, and add syncing of changes
| to/from object storage, and you're there.
| Existenceblinks wrote:
| > syncing of changes to/from object storage
|
| So you are suggesting to have both the "serverless" db and
| the storage? If so, you now have 3 problems.
| vidarh wrote:
| I'm describing capabilities that already exist. Reliably
| streaming sqlite data to object storage is not a new thing
| and supported by multiple separate implementations at this
| point.
| Existenceblinks wrote:
| The problem of having multiple sources of data is
| Consistency, Availability, Partition tolerance. It also
| depends on IO characteristic of apps. Who are readers,
| who writers, how long stale data is acceptable. Basically
| all the distribute system problems. In web app, frontend
| + backend + db sit next to it is enough of problem (e.g.
| SPA vs MPA state problem)
| vidarh wrote:
| Did not at any point suggest multiple sources of data.
|
| EDIT: While there are things in their repos that suggests
| they _might_ be thinking about moving towards allowing
| multiple writers, what 's _currently_ there suggests a
| single current active instance of each database, with the
| WAL being sync 'ed to object storage so that _in the case
| of failure_ and /or when doing a cold-start, the database
| can be brought back from object storage.
| Existenceblinks wrote:
| Ok sorry, I think I misinterpret your architecture. I
| think you mean something like HA with Litestream.
| vidarh wrote:
| Yes, similar to that.
|
| So you'd put up a proxy to handle auth, and match an
| incoming request either to a running database or to a
| cold database. If it's for a running database, it'd
| replicate to S3 or similar. If it's for a cold database,
| you sync from S3 or similar and start the server side
| process.
|
| To your point, you absolutely need to be able to reliably
| grant a lease of some sort to whichever frontend pulls
| down the database and starts and endpoint, or you're
| absolutely right you'll have huge problems.
|
| Absolutely won't be suitable for every kind of workload,
| but if you've already committed to running your stuff in
| a serverless setup, having your database(s) handled that
| way might be appealing.
| Existenceblinks wrote:
| > but if you've already committed to running your stuff
| in a serverless setup
|
| Sounds good but I'm curious what's criteria (need) to
| architect like this in the first place.
| vidarh wrote:
| Let's say you want to run a huge number of databases for
| customers; too much to run on an individual server. Now
| you have to shard. You can either try to split them
| between multiple MySQL/Postgres etc. servers, that are
| now each individually major risk factors, or you can
| design your system so you can just hook up more servers
| at will as long as the largest individual customer
| database can run on a single instance.
|
| I've run large numbers of Postgres databases, and it's
| not hard to automate, but it's hard to optimise for a
| setup where the usage patterns of individual databases
| are hard to estimate. Is your customer using it for batch
| jobs, or for persistent streams of data? Who can you
| colocate with whom? When the cost of shutting a database
| down one place and migrating it elsewhere becomes very
| low, this kind of scenario can potentially become a lot
| easier.
|
| In terms of from the user perspective, I'd expect you
| really wouldn't care, other than in terms of cold start
| times and cost. Except perhaps for batch jobs etc., where
| being able to write apps that checks state, obtains a
| lease, downloads the most recent version does it's job
| and uploads the result to durable storage might well be
| convenient rather than having to e.g. keep a bunch of
| databases constantly running.
| Existenceblinks wrote:
| > a huge number of databases for customers
|
| Is this one.db per customer? If so, how do you deal with
| schema migration?
| vidarh wrote:
| Carefully ;) One option is to build your app to check for
| migrations on startup. I've run systems that'd do schema
| migrations on a _per user_ basis for data stored in
| objects per user, and it worked just fine across a
| userbase of a couple of million accounts, on the basis
| that the only clients that connect directly also owns the
| schema, so nothing ever connects without immediately
| checking if a migration needs to run.
|
| But consider that for setups like this the database might
| well be _the customers own database_ , so their schema
| might not be something it's your job to touch at all.
| jimperio wrote:
| Interesting, hadn't thought about it like that but it makes
| sense. Used SQLite for a desktop app but never thought it
| would make any sense on the server.
| glommer wrote:
| * extremely easy to get started. * unmatched testability, since
| you can now run the same code in CI and production and pass .db
| files to your tests. * extremely cheap and lightweight
| replication.
| Existenceblinks wrote:
| What's kind of codes that run on CI AND production? You mean
| passing a prod-clone.db to test suits instead of fixtures?
| drej wrote:
| Note that this already exists on top of SQLite proper - authored
| by Ben Johnson (Litestream, Fly.io etc.) -
| https://github.com/benbjohnson/postlite
| maxmcd wrote:
| I think they are quite different it seems. Postlite expects you
| to connect over the postgres wire protocal. Sqld is compiled
| into your application so your application behaves like it's
| talking to an embedded sqlite, the calls are then made over the
| network (using one of three available transports) before being
| returning to your application.
| MuffinFlavored wrote:
| Dumb question, with all of this newfangled WASM stuff, why
| couldn't we also bake the Postgres server (and then client)
| into the code? I know the WASM runtime would need to expose
| the low-level ability to do filesystem operations, mmap()ing,
| network sockets, etc.
| maxmcd wrote:
| Then you'd need to run and maintain Postgres, which is much
| more complicated, not just a single database file.
|
| Postgres also can't be embedded (according to some brief
| googling), so you'd need to run it as a separate process.
| stuaxo wrote:
| This would be handy for apps that use Postgres features,
| that would; nonetheless work in WASM.
| philipwhiuk wrote:
| "libSQL" is a silly name. It's not a library is it?
|
| I wish people would at least vaguely try to be helpful naming
| products.
| glommer wrote:
| of course it is a library. An embedded database is essentially
| a library that you add to your application.
|
| That you can build stuff around it, doesn't invalidate the fact
| that the core of it is a library.
| Spivak wrote:
| It seems like it's a library you're meant to embed in an
| application and that sqld is just one application built on
| libSQL.
|
| It does seem like it makes more sense to call it something that
| alludes that it's postgres
| zffr wrote:
| From the SQLite homepage:
|
| "SQLite is a C-language library that implements a small, fast,
| self-contained, high-reliability, full-featured, SQL database
| engine."
|
| Wouldn't a fork also be a library?
| maxmcd wrote:
| Wild stuff
|
| - Fork of SQLite with new features
| https://github.com/libsql/libsql
|
| - And you can embed it in your application and it will really
| talk to your SQLite/libsql database over the network (subject of
| this blogpost): https://github.com/libsql/sqld
|
| - Oh, and if you're wondering about backup to S3, they have that
| too: https://github.com/libsql/bottomless
|
| - Uh, sqld can integrated with this
| https://github.com/losfair/mvsqlite, so now your SQLite is backed
| by FoundationDB!?
|
| --
|
| - Meanwhile Litestream exists
| https://github.com/benbjohnson/litestream/
|
| - Ben is developing https://github.com/superfly/litefs at Fly so
| that you can talk to your SQLite through a FUSE filesystem. (and
| work has stopped on the streaming replication effort
| https://github.com/benbjohnson/litestream/issues/8)
|
| - And, of course, SQLite has announced a new backend that hopes
| to support concurrent writes and streaming replication:
| https://sqlite.org/hctree/doc/hctree/doc/hctree/index.html
|
| What a time for SQLite
|
| Presumably all of these things provide a wildly different
| experience in terms of consistency, availability, latency,
| complexity, and concurrent use. Would be so nice to read a
| lengthy blogpost comparing all or some of these with their pros
| and cons.
| stuaxo wrote:
| Related: michaelc @ uktrade did some intresting work streaming
| sqlite with python in this repo -
| https://github.com/uktrade/stream-sqlite
| maxmcd wrote:
| Questions for libsql:
|
| - Looks like bottomless does automatic restoring of the
| database? Litestream seems to avoid this, I assume because of
| concerns about accidentally creating multiple writers on a
| rolling-deploy (or similar mistake). Any concerns about this
| possible footgun?
|
| - Bottomless is a sqlite extension, not a separate process?
| Pros and cons compared to Litestream?
|
| - Similar questions for sqld. How does sqld handle the
| lifecycle of deploys and multiple readers/writers talking to
| the database? Anything a new user should be concerned about?
| psarna wrote:
| First of all, let me start by reiterating the first sentence
| from the bottomless repo - it's work in heavy progress (and
| we'll move it under the sqld/ repo soon, to keep everything
| in one place). Now, answers:
|
| > - Looks like bottomless does automatic restoring of the
| database? Litestream seems to avoid this, I assume because of
| concerns about accidentally creating multiple writers on a
| rolling-deploy (or similar mistake). Any concerns about this
| possible footgun?
|
| It's a valid concern, but what always happens on boot is
| starting a new generation (a generation is basically a
| snapshot of the main file + its continuously replicated WAL),
| distinguished by a uuid v7, the timestamped one. So even if a
| collision happens, it would be recoverable - e.g. one of the
| stray generations should be deleted.
|
| > - Bottomless is a sqlite extension, not a separate process?
| Pros and cons compared to litestream?
|
| The only con I see is that a bug in the extension could
| interfere with the database. As for pros: way less
| maintenance work, because everything is already embedded,
| we're also hooked into the database via a virtual WAL
| interface (libSQL-only), so we have full control over when to
| replicate, without having to observe the .wal file and reason
| about it from a separate process.
|
| > - Similar questions for sqld. How does sqld handle the
| lifecycle of deploys and multiple readers/writers talking to
| the database? Anything a new user should be concerned about?
|
| There are going to be multiple flavors of sqld, but the rough
| idea would be to trust the users to only launch a single
| primary. In the current state of the code, replicas contact
| the primary in order to register themselves, so the model is
| centralized. Once we build something on top of a consensus
| algorithm, leader election will be pushed to the algorithm
| itself.
| maxmcd wrote:
| Very cool, thank you for the insights
| houqp wrote:
| bottomless looks really nice, thanks for sharing!
| aaviator42 wrote:
| I wrote a small PHP library that gives you a key-value storage
| interface to SQlite files: https://github.com/aaviator42/StorX
|
| I've been dogfooding for a while by using it in my side
| projects.
|
| And there's a basic API too, to use it over a network:
| https://github.com/aaviator42/StorX-API
| vdm wrote:
| ICYMI, similar concept for python
| https://dataset.readthedocs.io/
___________________________________________________________________
(page generated 2023-01-25 23:01 UTC)