[HN Gopher] Go and SQLite in the Cloud
___________________________________________________________________
Go and SQLite in the Cloud
Author : subomi
Score : 116 points
Date : 2022-12-07 14:56 UTC (8 hours ago)
(HTM) web link (www.golang.dk)
(TXT) w3m dump (www.golang.dk)
| sakopov wrote:
| I've become really interested in SQLite when I heard about
| Litestream. What kind of deployment strategies do folks use to
| deploy SQLite and Litestream for production use? Also, AFAIK
| writes can only be done via master instance, so how do you
| control read/writes from multiple app instances?
| markusw wrote:
| Have a look at the followup article that touches on distributed
| SQLite with LiteFS. :)
| 0cf8612b2e1e wrote:
| Only had a chance to skim the LiteFS article, but one use
| case closer to my heart would be blue-green deployments
| against the same SQLite database for zero-downtime
| deployments. Conceptually, I think I could use LiteFS to the
| same effect as distributed, but would love to see a write-up
| or any gotchas likely to occur.
| markusw wrote:
| I don't think there's too much difference to "regular"
| distributed SQLite, but I'll have to think about that.
| simscitizen wrote:
| It doesn't strike me as the best language for embedding SQLite
| given the need to constantly cross the Cgo boundary. But I'm sure
| it still works fine.
| iansinnott wrote:
| Although not mentioned in the article, there is a CGo-free port
| of SQLite which can be used as an alternative to the usual
| driver: https://pkg.go.dev/modernc.org/sqlite
| Cwizard wrote:
| Not relevant if you are interested in performance. This go
| only version is much slower than the cgo overhead. (At least
| it was a year ago, do your own benchmarks)
| Loic wrote:
| It works very well. We are running a search engine for chemical
| properties in Go+SQLite and the speed is simply incredible to
| the point people think it is a static website.
|
| Here is a page with quite some queries to render:
| https://www.chemeo.com/cid/58-801-8/Pentane
| markusw wrote:
| Haha, I've heard that comment before as well. If it's _too_
| fast, it feels suspicious, like it's not doing anything.
|
| I think we've collectively forgotten how fast dynamic
| websites can be on modern hardware!
| mappu wrote:
| _> A benchmark performed on go 1.15 showed 60ns of overhead for
| calls into C_
|
| This concern is a bit overblown, the Cgo boundary is heavier
| than an ordinary function call but still a zillion times faster
| than doing disk I/O on the database.
| mlangenberg wrote:
| I feel that I am spoiled with a great graphical user interface
| being available to explore MySQL databases in the form of Sequel
| Ace for macOS, to the point that I sometimes load an sqlite
| database in MySQL just to be able to browse through it with
| Sequel Ace.
|
| Any recommendations for a macOS GUI for sqlite that comes close
| to Sequel Ace? I have tried DB Browser for SQLite, but that feels
| a bit outdated to be honest.
| markusw wrote:
| I'm pretty happy with TablePlus. Does all the big dbs.
| hapidjus wrote:
| Also happy user of TablePlus. There is a free version that is
| limited to two tabs and some other stuff so you can try
| before you buy.
| markusw wrote:
| Hey! I'm the author of the article. Saw a sudden influx of
| traffic from HN and found the post here.
|
| Happy to answer any questions. Although SQLite is so dead-simple-
| but-awesome that you probably don't have any. :D
|
| Anyway, hope you enjoy it and learn a thing or two.
|
| Markus
| yandrypozo wrote:
| Nice article, what is your opinion on pocketbase.io which match
| your title really well, it would be great a follow-up article
| for LiteFS and pocketbase :)
| markusw wrote:
| I don't know pocketbase, will have a look.
| marktangotango wrote:
| I suggest you run some simple load tests that do inserts,
| you'll find WAL is not sufficient for concurrency. Multiple
| reader, single writers handle this well though.
| infamia wrote:
| I was surprised at the amount of performance SQLite provided.
| I get about 700 inserts/sec with the WAL enabled sitting
| behind a Django app.
|
| https://imgur.com/a/a3U41Zo
|
| Without the WAL enabled, I get a around 400req/sec.
|
| edit: clarity
| sharps_xp wrote:
| You can put all your writes in a queue to be processed by a
| finite set of writers. I think most CRUD applications can
| afford the delay and if not you can pair it with a write
| through cache.
| belmont_sup wrote:
| It sounds simpler to setup postgres and never have to set
| up a queue (unless you're using an in memory queue, and
| that has its drawbacks).
|
| Although I definitely would like to use SQLite just for the
| cost savings for something. Litefs/litestream looks great.
| thomascgalvin wrote:
| Concurrent writes are the bane of SQLite's existence, but for
| a one-person blog, you shouldn't run into any issues.
|
| You _would_ have trouble scaling up to fives of authors,
| though, which would be a deal breaker for any serious
| production app.
| markusw wrote:
| @levelsio famously runs a dozen websites all backed by
| SQLite, with 200M reqs / month:
| https://twitter.com/levelsio/status/1520356430800617472
|
| Expensify got 4 million request _per second_ out of a
| custom SQLite-based setup (on a huge machine, but still):
| https://blog.expensify.com/2018/01/08/scaling-sqlite-
| to-4m-q...
|
| I would call those serious production apps.
| longcommonname wrote:
| Would you provide a summary of how you noticed the increased
| traffic and how you found this post?
| markusw wrote:
| Saw the request spike, searched HN for "sqlite".
| nileshtrivedi wrote:
| I'd like to avoid writing boilerplate CRUD code. Is there an
| equivalent of PostgREST but for SQLite? Essentially, a standard
| binary would read the schema metadata, and generate standard
| CRUD APIs: https://postgrest.org/en/stable/api.html
|
| The APIs can even support authentication and authorization with
| the help of JWT tokens. SQLite may not have row-level security,
| but even a convention (eg: if a row has user_id column, the JWT
| must have the same user_id value to get access to a row) would
| go a long way.
| 0cf8612b2e1e wrote:
| A long way off from PostgREST, but you might be able to hack
| together something sort of similar with Datasette [0] and the
| new JSON api. Not out of the box at all, but the potential is
| there to get a similar product.
|
| [0]: https://datasette.io/
| nileshtrivedi wrote:
| AFAIK, Datasette only offers read-only APIs.
| 0cf8612b2e1e wrote:
| Not anymore [0]. The announcement [1]
|
| [0]:
| https://docs.datasette.io/en/latest/json_api.html#the-
| json-w...
|
| [1]: https://simonwillison.net/2022/Dec/2/datasette-
| write-api/
| [deleted]
| LVB wrote:
| I've not use this myself, but Ben Johnson's
| https://github.com/benbjohnson/postlite in front of SQlite
| might allow you to use PostgREST? I recall him saying on a
| podcast that his goal was to be able to point the large
| ecosystem of PG tools at SQlite.
| Cwizard wrote:
| Have you tried pocketbase?
| iccananea wrote:
| Not exactly what you want, but I've been using
| https://sqlc.dev to generate hydration code from SQL queries
| and loving it!
| maxpert wrote:
| Hey, I am author of Marmot (https://github.com/maxpert/marmot)
| would love to get in touch and explore a few things with you.
| I've been using Marmot in production in a while scaling couple
| of services, and since you mentioned LiteFS I would love to
| pick your brain, and explore couple of ideas with you. Multiple
| ways to get in touch including Github discussions or discord
| channel. Hope to see you around!
| siliconc0w wrote:
| Still not convinced sqlite is a good default, you can also run
| mysql/postgres locally which allows you similarly eliminate
| network hops and gives you the option to separate out things down
| the road. I'm not sure how Sqlite handles concurrent
| readers/writers these days but you'll probably at least end up
| scaling concurrency even in the early stages for things like
| async tasks.
| markusw wrote:
| It's not just extra hops: there's no network! (In distributed
| SQLite for reads at least.) Great for read-heavy workloads, and
| no n+1 query problems.
|
| Concurrent readers is no problem. No concurrent writes right
| now.
| fyresala wrote:
| You won't gain much by the combination of golang, sqlite and the
| cloud. You can't scale out and the cloud layer only makes sqlite
| slower. I can't see any reason not using RDS other than it's more
| expensive.
|
| I would only say this is a quick way to run up an application
| with a cheap VPS for a beginner.
| iveqy wrote:
| Depends on your application.
|
| On the opposite I've seen many cases where sqlite3 scales
| better than postgres, depending on the scale and the
| application.
|
| If you're doing horizontal scaling, sqlite3 is a very good
| alternative.
| ketralnis wrote:
| This is a pretty strong claim without any numbers, to be
| honest. You were probably already running your single database
| instance so being limited to a single sqlite instance isn't
| terribly different. This does limit you to a single app server,
| which unless you're CPU bound is also fine. Even if you are,
| you can get a lot of cores in a single instance these days.
|
| Most things will never need to be scaled up
| aynyc wrote:
| AFAIK, sqlite3 doesn't support multi-core, so you can only do
| vertical scaling. In cloud, people rather do horizontal
| scaling. That being said, I ran a django based webapp for
| internal users and we saw core being pegged by sqlite3 when
| the queries were complex. However, under normal usage, it was
| fine for about 100 concurrent users including machine API
| calls.
| saila wrote:
| Not saying SQLite wasn't the issue here (impossible to tell
| based on the information provided), but I've seen a lot of
| webapps with highly inefficient queries because developers
| either didn't understand what the ORM was doing or didn't
| bother to optimize. In a Django app, _prefetch_related_ can
| make a huge difference (or _joinedload_ when using
| SQLAlchemy or better initial filtering when using raw SQL).
| ketralnis wrote:
| SQLite actually scales far better than the memes would
| generally tell you and that "lite" implies. It's
| frustrating how easily these memes spread by people just
| repeating what they heard once. If there's anything HN has
| taught me it's to check the docs instead of believing these
| ambient notions
|
| It's more complicated than "doesn't support multi-core"
| which you can see in the other replies here: generally
| unlimited concurrent readers are allowed with single or at
| least finite writers. Depending on a bunch of settings
| (e.g. WAL) you might also get concurrent writers, multicore
| sorting, and some other things that can cross cores too.
| Those things do have their own tradeoffs.
|
| But my actual claim is that most things don't need to be
| "scaled" at all. And _if_ you do get there, and again
| statistically you won 't, then you're definitely going to
| be doing some other rearchitecting anyway and moving
| sqlite->postgres might as well be part of that.
| aynyc wrote:
| I never said it's "lite". I can say for certain that in
| my experience, when writes start to go up, such as stock
| market event stream coming in, sqlite3 doesn't perform as
| well as postgres, and scaling writes is hard in sqlite3.
| But I have no issue scaling Postgres for those data.
|
| If you say, sqlite3 is good enough for most webapps, then
| I probably agree. But I would not use it for a lot of
| things that I use Postgres for such as pubsub, or really
| high transaction rate.
| cldellow wrote:
| > doesn't support multi-core
|
| Can you expand on that? I thought it supported multiple
| readers, but only a single writer. They do all have to be
| on the same host, though.
|
| Since you mentioned Django - I think there are some
| complexities where the Python driver can't really do
| concurrent access if using threads instead of processes,
| but this is due to Python/GIL limitations, not sqlite
| limitations.
| ketralnis wrote:
| > Python driver can't really do concurrent access if
| using threads instead of processes, but this is due to
| Python/GIL limitations
|
| Confusingly you can't run 2 lines of Python code at the
| same time, but you can run 2 SQL queries (sqlite, remote
| server, etc) at the same time. Python threads are true
| pthreads, they just need to hold the GIL while running
| Python code. They release it when running C code or doing
| I/O. So you can have 2 queries running at the same time
| no problem, but you'll only get a single core of compute
| processing their results.
| aynyc wrote:
| sqlite3 is a single core application, so even if you have
| multiple cores on a box, sqlite3 isn't going to take
| advantage of that. If you want sqlite3 to go faster, you
| need better single core box. That being said, I do
| believe sqlite3 allow multiple processes to read the same
| database file.
|
| Our django app sits behind Gunicorn that will spin up
| multiple django instances, as long as Sqlite3 has WAL
| mode on, concurrent access (read) isn't a problem. For
| write, we basically use a lock.
| ketralnis wrote:
| This is confusingly worded but for onlookers: sqlite as
| you think of it isn't an application at all, it's a C
| library. There does exist a binary called sqlite3 which
| is an interface to that library, and that binary is
| (mostly, though not exclusively) single-threaded.
|
| But unless you're using that specific tool, and you're
| probably not unless you yourself are typing "sqlite3"
| into a bash shell, you're using the library rather than
| this query tool. And that library interacts in a similar
| way to a socket connect or a fopen call. You can have
| multiple connections open to the same sqlite database,
| and those connections can operate concurrently (subject
| to various limitations, but "you need better single core
| box" is not the summary)
| markusw wrote:
| When you embed it in Go, you can have concurrent reads.
| Each HTTP handler request gets a separate goroutine, so
| you can definitely take advantage of multiple cores.
| stonemetal12 wrote:
| It is even a little better than that these days.
| Beginning a transaction normally locks the whole
| database, but with Wal mode they have added a Begin
| Concurrent command so there can be multiple transactions
| going at once. However when you get to the commit it
| still requires the exclusive lock. I bet after a few
| years even that restriction will be gone.
|
| https://www.sqlite.org/cgi/src/doc/begin-
| concurrent/doc/begi...
| SPBS wrote:
| https://sqlite.org/forum/info/2ad28cb1e0356816e7b0bd2ab45
| 8f7...
|
| I don't think BEGIN CONCURRENT is out yet. AFAIK it's
| only available if you build the experimental wal2 branch
| yourself.
| stonemetal12 wrote:
| I guess I miss read the branch list, I thought the begin-
| concurrent-3.39 branch said it had merged to trunk but
| that is the line above it.
| 0cf8612b2e1e wrote:
| > I can't see any reason not using RDS other than it's more
| expensive.
|
| We don't all operate with unlimited VC money.
| markusw wrote:
| See my followup-article "Distributed SQLite with LiteFS" for
| the scale-out part: https://www.golang.dk/articles/distributed-
| sqlite-with-litef...
|
| And IMO, what you lose in the cloud layer you gain by having a
| deployment really close to your user. Speed of light and all
| that.
|
| Plus SQLite is super cool and fun! :D
| benbjohnson wrote:
| Litestream/LiteFS author here. I agree that "cloud" is a bit
| ambiguous but Go & SQLite are quite powerful together and I
| don't think it's only for beginners. Both are fast and have low
| overhead. In addition to lower cost versus RDS, there's near-
| zero query latency which eliminates a lot of performance
| problems. You can comfortably run tens or hundreds of requests
| per second on minimal hardware (e.g. 256MB or 512MB instances).
| There's a lot of room for scaling up before you hit a
| performance ceiling.
| tacitusarc wrote:
| Do you mean tens or hundreds of thousands of requests per
| second?
| benbjohnson wrote:
| There aren't many apps that have 100,000+ req/sec and you
| certainly can't run them on modest hardware.
|
| If your app averages 100 req/sec then that's 8.4M requests
| per day. That's more than most applications out there.
| ehutch79 wrote:
| Tens of requests a second?
| randomdata wrote:
| He's being generous. Indeed, your service will be lucky if
| it sees tens of requests _per day_.
| ehutch79 wrote:
| :-|
|
| Normally I'm on the side of 'do you actually get that
| much traffic?' but yeah, a dashboard view can generate
| dozens of requests alone, each with multiple queries.
| benbjohnson wrote:
| A request can have a large range of queries within it so I
| was trying to account for that. If your requests are
| lightweight and mostly reads, you can do 1,000+ req/sec on
| a 256MB instance. YMMV.
| msolberg wrote:
| I just watched your gophercon video on SQLite in production
| after seeing it in the article. Great talk. Anyone else who's
| interested can watch here:
| https://www.youtube.com/watch?v=XcAYkriuQ1o
| simonw wrote:
| At what point would you expect to need to scale out?
| randomdata wrote:
| You gain low latency, which allows you to write your code to a
| 'pure' relational model (at least as close as SQL allows),
| without having to rely on code complicating hacks to deal with
| round-trip issues. Less complicated code can improve delivery
| time and reduce faults. While there are abstractions that can
| be used to help with those hacks, they come with their own
| tradeoffs. Pick your poison, as always.
|
| Conceptually, RDS is little more than SQLite with a clever
| networking layer built on top. In context, your application is
| also just a clever networking layer, so the middleman doesn't
| necessarily add any value. Of course it depends on exactly what
| you are trying to do and what tradeoffs you are willing to
| accept. There is no free lunch.
| infamia wrote:
| Once you need to graduate from one large server (which will
| take you a long way in many cases), there are tools like
| [rqlite](https://github.com/rqlite/rqlite) that can handle
| clustering. With WAL mode enabled, SQLite can handle a
| surprising amount of traffic that would fit a lot of use cases.
| If latency is important to you, it's going to be hard to beat
| SQLite for many workloads.
| hackerbrother wrote:
| FYI-- it appears your Mastodon link needs "op" changed to "io"!
| markusw wrote:
| Ooops, thanks! Deploy on the way with a fix. :D
| leg100 wrote:
| How does one perform deployments with go+sqlite? With a client-
| server database such as postgres your app and database are on
| separate servers, and you can perform a blue-green, canary, etc,
| deployment of the app, spinning up new servers running the new
| version alongside the servers running the old version, before
| shutting down the servers running the old version.
|
| But with sqlite you'd have to perform a hot-upgrade, surely? i.e.
| shut down the old version and quickly fire up the new version,
| with a small window of downtime in between.
|
| Note: I see Litestream/LiteFS allows distributed deployment (both
| in beta).
| markusw wrote:
| Essentially yes, with a single SQLite instance, you have a
| small window of downtime on every deploy. But depending on your
| setup, that window could be very small, and unnoticeable if you
| have something in front that just delays incoming requests
| (like fly.io does with a load balancer in front of that single
| instance).
|
| And yes, LiteFS just selects a new leader and does a rolling
| deploy (or whatever else you want).
| ngrilly wrote:
| Another, more hacky option, would be to teach the binary
| executable how to download a new version, and start it to
| replace itself, while staying in the same container. Nginx
| does this for example. Of course, that's some additional
| complexity, but then it is possible to have zero downtime
| upgrades (by basically upgrading the executable in the
| container but not the container itself). It is also possible,
| and simpler to upgrade the container, if it's possible for
| different containers to share the same volume, but I don't
| think this is possible on Fly.io.
| markusw wrote:
| I believe systemd can do something similar, essentially
| replacing the binary and queueing network calls while doing
| so.
| ngrilly wrote:
| Yes, systemd does that. But as you wrote, it will queue
| network calls. The nice thing with the method I suggest
| above is that both Go processes, the old one and the new
| one, can be both serving requests in parallel, as they
| can share the same SQLite file, which makes it really
| zero-downtime.
| cube2222 wrote:
| It's also a really nice combo for simple automation lambdas that
| need some state and you want an ergonomic DB without paying for
| full RDS.
|
| Go + Lambda + EFS + SQLite work great for that.
| davidjfelix wrote:
| Do you limit concurrency to 1 with this setup?
| lormayna wrote:
| It's not relational, but MongoDB has a generous free tier. Or
| maybe you can use Aurora Serverless, it's in the free tier as
| well (but it's designed for different use cases).
| rtukpe wrote:
| This is nice, I've used Litestream for a personal project. I
| wonder how it compares to something like rqlite [1] with larger
| datasets
|
| [1] https://github.com/rqlite/rqlite
| otoolep wrote:
| rqlite author here, happy to answer any questions. The rqlite
| FAQ[1] might be useful to you.
|
| [1] https://github.com/rqlite/rqlite/blob/master/DOC/FAQ.md
| LVB wrote:
| This is a good article that neatly covers a lot of the tips I've
| seen spread across various posts, talks, etc. Thanks!
|
| I'm curious what experience you or others have dealing
| specifically with the write concurrency elements of this setup,
| should you find it is actually an issue. I've occasionally seen
| mention of restructuring things to queue writes within the app
| (e.g. having a single writer goroutine fed with a channel), but I
| be interested in more details about when folks hit the point of
| needing to do that, and what they did (and did it help?)
| mappu wrote:
| Go + SQLite user at $DAYJOB here. WAL is essential, and the
| next essential trick is to open a single db.Conn up-front and
| hide that behind a mutex for any transactions/queries that will
| write. All callers can pick this.db_rw or this.db_ro as
| required.
| uxdx wrote:
| Can I ask what $DAYJOB is? I'm interested to learn Go and
| SQLite professionally.
| markusw wrote:
| What's the advantage of the mutexed rw connection compared to
| the busy timeout and built-in locking?
| mappu wrote:
| The built-in one actually sleeps and polls to check if the
| write lock is free again. It's a disaster for high write
| throughput.
|
| https://sqlite.org/forum/info/00312b3d02bc0583
|
| Sqlite's internal one has to work this way to support their
| multi-process guarantees. In comparison, an in-process Go
| mutex will be readied immediately.
| LVB wrote:
| Very interesting. I'm now keen to write a very simple
| benchmark comparing the results of N goroutines leaning
| on the WAL lock vs in-app serialization.
| philosopher1234 wrote:
| would love to see the results of that, if you decide to
| do this
| markusw wrote:
| Ooooh, I didn't know that detail. Thank you for sharing!
| simonw wrote:
| If you turn on WAL mode SQLite will queue the writes for you,
| and since most writes complete in under 1ms you'll likely find
| that this isn't really worth worrying about at all.
|
| I did some trivial benchmarking around this recently:
| https://simonwillison.net/2022/Oct/23/datasette-gunicorn/
| dinosaurdynasty wrote:
| WAL mode does have some _slight_ decrease in durability by
| default. If you pull the power immediately after a commit the
| commit may be reverted when you come back up.
|
| But yeah most of the time it isn't an issue.
| markusw wrote:
| This is news to me. What's your source for that?
|
| AFAIK, SQLite is fully and completely ACID also in WAL
| mode.
| [deleted]
| dinosaurdynasty wrote:
| https://www.sqlite.org/pragma.html#pragma_synchronous
|
| When synchronous is NORMAL (1), the SQLite database
| engine will still sync at the most critical moments, but
| less often than in FULL mode. There is a very small
| (though non-zero) chance that a power failure at just the
| wrong time could corrupt the database in
| journal_mode=DELETE on an older filesystem. WAL mode is
| safe from corruption with synchronous=NORMAL, and
| probably DELETE mode is safe too on modern filesystems.
| WAL mode is always consistent with synchronous=NORMAL,
| but WAL mode does lose durability. A transaction
| committed in WAL mode with synchronous=NORMAL might roll
| back following a power loss or system crash. Transactions
| are durable across application crashes regardless of the
| synchronous setting or journal mode. The
| synchronous=NORMAL setting is a good choice for most
| applications running in WAL mode.
| markusw wrote:
| Interesting, thanks! I didn't know this was a
| configurable tradeoff. Might update the article with a
| little bonus info.
| markusw wrote:
| Thank you for the kind words! :)
|
| The built-in busy timeout basically takes care of write
| queueing, so a standard setup like this will take you very,
| very far.
___________________________________________________________________
(page generated 2022-12-07 23:02 UTC)