[HN Gopher] Litestream v0.5.0
___________________________________________________________________
Litestream v0.5.0
Author : emschwartz
Score : 344 points
Date : 2025-10-02 19:02 UTC (1 day ago)
(HTM) web link (fly.io)
(TXT) w3m dump (fly.io)
| dave78 wrote:
| Interesting tidbit regarding LiteFS/Litestream:
|
| > But the market has spoken! Users prefer Litestream. And
| honestly, we get it: Litestream is easier to run and to reason
| about. So we've shifted our focus back to it.
| simonw wrote:
| That makes sense to me. LiteFS used FUSE, which meant figuring
| out how to run and mount a custom filesystem. Litestream is a
| single compiled Go binary that you point at the SQLite database
| file (and accompanying WAL file).
| thewisenerd wrote:
| relevant: Litestream: Revamped (99 comments)
|
| https://fly.io/blog/litestream-revamped/
| https://news.ycombinator.com/item?id=44045292
| danw1979 wrote:
| Can't wait for Kyle to set Jepsen loose on this.
| tptacek wrote:
| Litestream is pretty boring for the testing Kyle does. It's not
| a distributed consensus system, but something more like a
| storage engine.
| Aurornis wrote:
| Jepsen is primarily for testing distributed systems and
| consensus systems. SQLite has neither of those.
| dastbe wrote:
| litestream makes very few consistency guarantees compared to
| other datastores, and so I would expect most any issues found
| would be "working as intended".
|
| at the end of the day with litestream, when you respond back to
| a client with a successful write you are only guaranteeing a
| replication factor of 1.
| simonw wrote:
| By "replication factor of 1" you mean your data is stored on
| local disk only, right? That matches my understanding:
| Litestream replication is asynchronous, so there's usually a
| gap of a second or two between your write being accepted and
| the resulting updated page being pushed off to S3 or similar.
| dastbe wrote:
| Yes. the acknowledgement you're getting in your application
| code is that the data was persisted in sqlite on that host.
| There's no mechanism to delay acknowledgement until the
| write has been asynchronously persisted elsewhere.
| simonw wrote:
| An interesting comparison here is Cloudflare's Durable
| Objects, which provide a blocking SQLite write that
| doesn't return until the data has been written to
| multiple network replicas:
| https://blog.cloudflare.com/sqlite-in-durable-objects/#trick...
| simonw wrote:
| I wonder if it would be possible to achieve this using a
| SQLite VFS extension - maybe that could block
| acknowledgment of a write until the underlying page has
| been written to S3?
| sighansen wrote:
| Really love litestream. Easy to use and never crashes on me. I
| still recommend using it as a systemd unit service. I'm not only
| using it as a backup tool but also to mirror databases. Looking
| forward to their read-replica feature.
| TemptedMuse wrote:
| Maybe I misunderstand what this is, but why would I use this and
| not MySQL, Postgres, or any other proper database? Seems like a
| hack to get SQLite to do what those do by design.
| victorbjorklund wrote:
| Why use Postgres if all you need is sqlite? Postgres is way
| overkill for a simple app with few users and no advanced
| database functionality.
| TemptedMuse wrote:
| I'd argue that anything larger than a desktop app should not
| use SQLite. If you need Litestream for replication and backup
| it is probably better to just use Postgres. There are a ton
| of one-click deployment offerings for proper databases,
| Fly.io actually offers managed Postgres.
| victorbjorklund wrote:
| Why would you argue that? Do you have some benchmarks
| backing it up or is it more a personal preference?
| crazygringo wrote:
| It's literally what they're designed for.
|
| SQLite is designed for one local client at a time.
| Client-server relational databases are designed for many
| clients at a time.
| simonw wrote:
| That's not entirely true. SQLite is designed to support
| many processes reading the same file on disk at once. It
| only allows one process to write at a time, using locks -
| but since most writes finish in under a millisecond, having
| one process wait for another to finish its write usually
| isn't a problem.
|
| If you have lots of concurrent writes SQLite isn't the
| right solution. For concurrent reads it's fine.
|
| SQLite also isn't a network database out-of-the-box. If
| you want to be able to access it over the network you
| need to solve that separately.
|
| (Don't try and use NFS:
| https://sqlite.org/howtocorrupt.html#_filesystems_with_broke... )
| skeeter2020 wrote:
| the reality is very few workloads have access patterns
| that SQLite can't support. I would much rather start with
| a strategy like 1. use sqlite for my beta / single
| client, 2. duplicate the entire environment for the next
| n clients, 3. solve the "my application is wildly
| successful" and SQLite is no longer appropriate problem
| at a future date. Spoiler: you're never going to get to
| step #3.
| crazygringo wrote:
| > _2. duplicate the entire environment for the next n
| clients_
|
| That becomes an instant problem if users ever write to
| your database. You can't duplicate the environment unless
| it's read-only.
|
| And even if the database is read-only for users, the fact
| that every time _you_ update it you need to redeploy the
| database to every client, is pretty annoying.
|
| That's why it's usually better to start with Postgres or
| MySQL. A single source of truth for data makes everything
| vastly easier.
| victorbjorklund wrote:
| Not true. Can you back up your claim that the developers
| of SQLite say they don't recommend it for webservers?
| (Hint: they recommend it.)
|
| If you have a read-heavy app (99% of SaaS) that runs on
| one server and doesn't have millions of users, then SQLite
| is a great option.
| crazygringo wrote:
| I didn't say that. I said one local client at a time. If
| you're running on one server then your webserver is the
| one local client.
|
| Usually you want to be able to run multiple webservers
| against a single database though, since that's the first
| thing you'll usually need to scale.
| vmg12 wrote:
| Let's say I'm building a small app that I'm hosting on some
| shared VPS. If I weigh the effort of setting up SQLite with
| Litestream against just getting a $5 (or free) Postgres
| provider, I don't think SQLite makes my life easier.
|
| Now if I'm building a local app then absolutely sqlite makes
| the most sense but I don't see it otherwise.
| bccdee wrote:
| Litestream is dead simple to set up. You make an S3 bucket
| (or any compatible storage bucket), paste the access keys
| and the path to your db file in /etc/litestream, and then
| run:
|     dpkg -i litestream.deb
|     systemctl enable litestream
|     systemctl start litestream
|
| The fact that it's so simple is my favourite thing about it.
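| For reference, a minimal Litestream config in that spirit (the
| default location is /etc/litestream.yml; the bucket name, db path,
| and keys below are placeholders, not real values):
|
```yaml
# Replicate one SQLite database to an S3-compatible bucket.
access-key-id: AKIAEXAMPLE          # placeholder
secret-access-key: EXAMPLEKEY       # placeholder

dbs:
  - path: /var/lib/myapp/app.db
    replicas:
      - url: s3://my-bucket/app.db
```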
| indigodaddy wrote:
| Are there any use cases/documentation about how
| litestream can be used within a docker based deployment?
| (Eg where systemctl wouldn't be used)
| bccdee wrote:
| You'd probably want to put the sqlite db in a volume &
| run litestream in a separate container that restarts
| automatically on failure.
|
| Systemctl's only in there to restart it if it crashes;
| litestream itself is (iirc) a single cli binary.
| airblade wrote:
| This is documented on the Litestream website.
| simonw wrote:
| Here's their docs on running in a Docker container:
| https://litestream.io/guides/docker/
| victorbjorklund wrote:
| Effort of setting up litestream and sqlite is less time
| than you spend signing up for supabase. And you can have
| 100 apps with their own databases for almost free (just a
| few cents of storage) vs $5 x 100 for Postgres.
|
| I love postgres but in no way is it as simple to run as
| sqlite (pretty sure even postgres core team would agree
| that postgres is more complex than sqlite).
| yellow_lead wrote:
| Because Postgres is mature, works, and has a version number
| above v1.0?
| zwnow wrote:
| Version numbers don't mean anything, as the whole Elixir
| ecosystem shows :D
| teeray wrote:
| If v1.0 is your North Star, you should re-evaluate a whole
| lot of software in your stack: https://0ver.org/#notable-
| zerover-projects
| 8organicbits wrote:
| I think you're focusing on the wrong parts of the
| comment.
|
| People care about things like long-term support. Postgres
| 13, from 2020, is still officially supported. Litestream
| 0.1.0 was the first release, also from 2020, but I can't
| tell if it is still supported. Worrying about the
| maturity, stability, and support of an application
| database is very reasonable in risk-averse projects.
| victorbjorklund wrote:
| Litestream is just a backup solution. Should probably be
| compared to a backup solution for postgres that does
| automated backups over the network etc. That isn't part of
| Postgres.
|
| Besides, the question wasn't Litestream vs Postgres backup
| apps. It was SQLite vs Postgres.
| threatofrain wrote:
| The original response at least concerned litestream
| because the not-1.0 comment only applies to that.
| victorbjorklund wrote:
| I'm guessing this is a joke?
| johnfn wrote:
| Why would saying that Postgres is a mature database -
| more mature than SQLite - be a joke?
| jchanimal wrote:
| SQLite is probably more mature than Postgres in terms of
| service hours.
| hchdifnfbgbf wrote:
| I understood the comparison to be with Litestream, not
| SQLite in general.
| victorbjorklund wrote:
| Why would we compare Litestream (a backup solution) with
| Postgres (a database)? Would be like comparing Linux with
| NGINX.
| immibis wrote:
| Postgres has a similar replication solution built-in.
| It's like comparing Samba with Windows Server if your
| purpose is to run a SMB file server.
| victorbjorklund wrote:
| Both are mature. There are way more SQLite databases
| running than Postgres ones. The code base is smaller and
| has fewer new features added to it every year. What is
| unstable with SQLite?
| skeeter2020 wrote:
| this is essentially the "no one ever got fired for buying
| IBM" statement. One counter is why buy & manage a rack-
| mounted server when all you need is a raspberry Pi?
| immibis wrote:
| Postgres can also run on the Pi.
|
| More than once I've started a project with sqlite and
| then had to migrate to postgres. In my experience it's
| because of the way sqlite has to lock the whole database
| file while writing to it, blocking concurrent reads -
| this isn't a problem in postgres. (There's WAL mode, but
| it still serialises all writes, and requires periodic
| explicit checkpointing IME)
|
| You may also find you really want a feature postgres has,
| for example more advanced data types (arrays, JSON), more
| advanced indices (GIN inverted index on array members),
| replication...
| danenania wrote:
| For a cloud service, I think it comes down to whether you'll
| ever want more than one app server.
|
| If you're building something as a hobby project and you know
| it will always fit on one server, sqlite is perfect.
|
| If it's meant to be a startup and grow quickly, you don't
| want to have to change your database to horizontally scale.
|
| Deploying without downtime is also much easier with multiple
| servers. So again, it depends whether you're doing something
| serious enough that you can't tolerate dropping any requests
| during deploys.
| victorbjorklund wrote:
| 99.99% of apps don't need more than one app server. You can
| serve a lot of traffic on the largest instances.
|
| For sure, zero-downtime deploys are easier with Kubernetes
| etc., but again overkill for 99.99% of apps.
| danenania wrote:
| Right, but if your goal is to have a lot of users (and
| minimal downtime), there's no point in putting a big
| avoidable obstacle in your path when the alternative is
| just as easy.
| victorbjorklund wrote:
| If your goal is to serve billions of users you should
| probably use cassandra etc. Why limit yourself to
| postgres if your goal is to have a billion users online
| at the same time?
| danenania wrote:
| Because cassandra isn't easy to set up and has all kinds
| of tradeoffs on consistency, transactions, et al compared
| to an SQL db.
|
| On the other side, why not just store everything in
| memory and flush to a local json file if you won't have
| any users? sqlite is overkill!
| skeeter2020 wrote:
| your goal should be to solve this problem when you have a
| million or 10M concurrent users. YAGNI
| Too wrote:
| Serving users is one thing. Then you want to run some
| interactive analytics or cronjobs for cleanup etc on the
| db. Even if the load can manage it, how would the admin
| jobs connect to the database? I've never seen a db with
| only one client. There is always some auxiliary thing,
| even when you don't consider yourself a microservice
| shop.
| victorbjorklund wrote:
| For analytics the solution is very simple. Just copy the
| database and run the queries on the copy.
| skeeter2020 wrote:
| agree - with SQLite and DuckDB I've really switched my
| mindset from one perfect, pristine DB to multiple copies
| and a medallion architecture that looks more like
| participation ribbons for everyone! The resources
| required are so cheap & lightweight when you avoid the
| big "BI focused" tech stacks.
| skeeter2020 wrote:
| you can also scale out across unlimited tiny servers,
| because the entire stack is so lightweight and cheap.
| This will also force you to focus on devops, which
| otherwise can become a grind with this approach. The only
| challenge is when you have cross-DB concerns, either data
| or clients.
| tptacek wrote:
| This is the idea behind LiteFS --- to transparently scale
| out SQLite (in some very common configurations and
| workloads) to support multiple app servers. It's still
| there and it works! It's just a little ahead of its time.
| :)
| danenania wrote:
| That makes sense, and it seems really cool from a tech
| perspective. I guess I'm just inherently skeptical about
| using something shiny and new vs. battle hardened
| databases that were designed from the beginning to be
| client-server.
|
| It's definitely really nice though that if you _do_
| choose sqlite initially to keep things as small and
| simple as possible, you don't immediately need to switch
| databases if you want to scale.
| tptacek wrote:
| I think that's very fair. But the use case for Litestream
| is much simpler and you can get your head around it
| immediately. It also doesn't ask you to do anything that
| would commit you to SQLite rather than switching to
| Postgres later. It's just a way of _very easily_ getting
| a prod caliber backend up for an app without needing a
| database server.
| kblissett wrote:
| One of the big advantages people enjoy is the elimination of
| the network latency between the application server and the DB.
| With SQLite your DB is right there often directly attached over
| NVME. This improves all access latencies and even enables
| patterns like N+1 queries which would typically be considered
| anti-patterns in other DBs.
| turnsout wrote:
| Real talk, how do you actually avoid N+1? I realize you can
| do complicated JOINs, but isn't that almost as bad from a
| performance perspective? What are you really supposed to do
| if you need to, e.g. fetch a list of posts along with the
| number of comments on each post?
| vhcr wrote:
| No, JOINs are pretty much always faster than performing N+1
| queries.
| erpellan wrote:
| You do indeed use JOINS. The goal is to retrieve exactly
| the data you require in a single query. Then you get the DB
| to `EXPLAIN VERBOSE` or similar and ensure that full table
| scans aren't happening and that you have indexed the
| columns the query is being filtered on.
| simjnd wrote:
| AFAIK the problem of N+1 isn't necessarily one more DB
| query, but one more network roundtrip. So if for each page
| of your app you have an API endpoint that provides exactly
| all of the data required for that page, it doesn't matter
| how many DB queries your API server makes to fulfill that
| request (provided that the API server and the DB are on the
| same machine).
|
| This is essentially what GraphQL does instead of crafting
| each of these super tailored API endpoints for each of your
| screens, you use their query language to ask for the data
| you want, it queries the DB for you and get you the data
| back in a single network roundtrip from the user
| perspective.
|
| (Not an expert, so I trust comments to correct what I got
| wrong)
| jerrygenser wrote:
| You still have to write the resolver for GraphQL. I've
| seen N+1 with GraphQL if you don't actually use the data
| loader+batch pattern OR if you use it incorrectly.
| jamie_ca wrote:
| Either joins for a fat query, or aggregate the subqueries.
|
| For the latter, it's along the lines of `select * from
| posts where ...` and `select * from authors where id in
| {posts.map(author_id)}`. And then once it's in memory you
| manually work out the associations (or rely on your ORM to
| do it).
| t0mas88 wrote:
| A JOIN is fast, fetching the whole list in one extra query
| with "WHERE id IN (...)" is also pretty fast and results in
| less complex queries if you have several of these. Doing all
| queries separate is slow because of the network round-trip
| for each query.
| simonw wrote:
| Often you can use joins to get the data in a single complex
| SQL query. Number of comments for a post is relatively
| straight-forward, but you can also do increasingly complex
| associated data fetches with modern databases.
|
| In particular, JSON aggregations mean you can have a single
| query that does things like fetch a blog entry and the
| earliest 10 comments in a single go. I wrote up some
| patterns for doing that in SQLite and PostgreSQL here:
| https://github.com/simonw/til/blob/main/sqlite/related-rows-...
|
| Here's an example PostgreSQL query that does this to fetch
| tags for posts:
| https://simonwillison.net/dashboard/json-agg-example

    select
      blog_entry.id, title, slug, created,
      coalesce(
        json_agg(json_build_object(blog_tag.id, blog_tag.tag))
          filter (where blog_tag.tag is not null),
        json_build_array()
      ) as tags
    from blog_entry
    left join blog_entry_tags on blog_entry.id = blog_entry_tags.entry_id
    left join blog_tag on blog_entry_tags.tag_id = blog_tag.id
    group by blog_entry.id
    order by blog_entry.id desc
|
| The alternative, more common path is the pattern that
| Django calls "prefetch_related". Effectively it looks like
| this:

    select id, title, created from posts
    order by created desc limit 20

    -- Now extract the id values from that and run:

    select blog_entry.id, blog_tag.tag
    from blog_entry
    join blog_entry_tags on blog_entry.id = blog_entry_tags.entry_id
    join blog_tag on blog_entry_tags.tag_id = blog_tag.id
    where blog_entry.id in (?, ?, ?, ...)

    -- Now you can re-assemble the list of tags for
    -- each entry in your application logic
|
| Once you have a list of e.g. 20 IDs you can run a bunch of
| cheap additional queries to fetch extra data about all 20
| of those items.
| crazygringo wrote:
| > _I realize you can do complicated JOINs, but isn't that
| almost as bad from a performance perspective?_
|
| No, JOINs should be orders of magnitude faster.
|
| > _What are you really supposed to do if you need to, e.g.
| fetch a list of posts along with the number of comments on
| each post?_
|
| You're really supposed to do a JOIN, together with a GROUP
| BY and a COUNT(). This is elementary SQL.
| joevandyk wrote:
| it gets more complicated when you need to also display
| something like "last comment: <author> <3 days ago>" for
| each post, or if the comment counts need to be filtered
| by various flags/states/etc.
|
| of course, it's all possible with custom SQL but it gets
| complicated quick.
| crazygringo wrote:
| That's like saying it gets more complicated when you have
| to use loops with break statements in programming. It's
| just what programming is.
|
| The filtering you describe is trivial with COUNT(*)
| FILTER (WHERE flag IN (...) AND state = ...) etc.
|
| If you want to retrieve data on the last comment, as
| opposed to an aggregate function of all comments, you can
| do that with window functions (or with JOIN LATERAL for
| something idiomatic specifically to Postgres).
|
| Learning how to do JOIN's in SQL is like learning
| pointers in C -- in that it's a basic building block of
| the language. Learning window functions is like learning
| loops. These are just programming basics.
| joevandyk wrote:
| Yes, all that's possible. But it's not straightforward in
| the majority of ORMs.
| crazygringo wrote:
| Right, and ORMs are the main cause of the N+1 problem,
| and atrocious database performance in general.
|
| For anything that isn't just a basic row lookup from a
| single table, you should really just be writing the SQL
| yourself.
| ComputerGuru wrote:
| A well-written JOIN against a well-designed database
| (regardless if we're talking postgres, SQLite,
| MySQL/MariaDB, or MS SQL) should not be slow. If it's slow,
| you're using it wrong.
| bccdee wrote:
| The actual thing that we're getting N+1 of is network
| round-trips. An additional network round-trip is way, way
| slower than an extra JOIN clause. That's why N+1 query
| patterns aren't a problem when you're using a local
| database: There's no round-trip.
| toast0 wrote:
| A proper join is the right answer. But, it's not always
| possible to make those run well. [1] A "client side join"
| in the right situation can be much better, but then you
| probably want to do a 1+1 rather than N+1. Do the first
| query to get the ids for the second query, and then
| construct the second query with IN or UNION depending on
| what works best for you database. UNION likely bloats your
| query string, but I've seen plenty of situations where
| UNION is gobs faster than IN.
|
| Alternately, if you can separate query issuance from result
| parsing, you can make N+1 palatable. Ex, do your query to
| get the ids, wait for the results, loop and issue the N
| queries, then loop and wait for results in order. That will
| be two-ish round trips rather than N+1 round trips. But you
| have to search to find database apis that allow that kind
| of separation.
|
| [1] You can almost always express the query you want in
| SQL, but that doesn't mean it will have a reasonable
| runtime. Sometimes server side join and client side join
| have about the same runtime... if it's significant and you
| have the usual case where clients are easier to scale than
| database servers, it _might_ be worthwhile to have the join
| run on the client to reduce server load.
| immibis wrote:
| The performance problem in N+1 is (mostly) not in fetching
| the N rows from disk, but rather from multiplying the
| network latency by a factor of N. Joins solve this; so do
| stored procedures.
|
| In general, you want to ask the remote server _once_ for
| all the data you need, then read all the results. It
| applies to databases as well as APIs.
|
| Pipelined requests also solve the problem and can be more
| flexible.
|
| Also, joins can be optimised in different ways. Sometimes
| the optimal way to do a join isn't to query each row one-
| by-one, but to do something like (when the rows you want
| are a large fraction of the rows that exist) making a bloom
| filter of the rows you want and then sequentially reading
| all the rows in the table.
| asalahli wrote:
| Avoiding N+1 doesn't have to mean limiting yourself to 1
| query. You can still fetch the posts in one query and the
| comments of _all_ posts in a separate query, just don't
| issue a query for _each_ post.
|
| More formally, the number of queries should be constant and
| not linearly scaling with the number of rows you're
| processing.
| pbowyer wrote:
| > One of the big advantages people enjoy is the elimination
| of the network latency between the application server and the
| DB. With SQLite your DB is right there often directly
| attached over NVME.
|
| You can install MySQL/PostgreSQL on the application server,
| connect over a unix socket and get the same benefits as if
| you'd used SQLite on the application server (no network
| latency, fast queries). Plus the other benefits that come
| from using these database servers (Postgres extensions,
| remote connections, standard tooling etc). I'm guessing more
| RAM is required on the application server than if you used
| SQLite but I haven't benchmarked it.
| bob1029 wrote:
| Unix sockets don't actually give you the same benefit.
| You're still doing IPC which can incur substantial memory
| subsystem utilization. SQLite is on the same thread/core as
| whatever is using it.
| manishsharan wrote:
| I have a branch office in the boondocks with a limited
| internet connection. The branch office cannot manage an
| RDBMS or access cloud services. They can use a SQLite app
| on the LAN and we can do reconciliation at the end of the
| business day.
| skeeter2020 wrote:
| they can also run the entire application in these scenarios
| on the resources of a 10-yr-old phone.
| andrewmutz wrote:
| I'm not sure, I've never done it, but I think the idea is to
| have many tiny customer-specific databases, powered by
| SQLite, running very close to the customer.
|
| But I'd love to hear more from someone more well-versed in
| the use cases for reliable SQLite.
| sauercrowd wrote:
| It's a good question, and I don't think answered sufficiently
| in the recent sqlite hype.
|
| In my opinion if you have an easy way to run postgres,MySQL,...
| - just run that.
|
| There's usually a lot of quirks in the details of DB usage
| (even when it doesn't immediately seem like it - got bitten by
| it a few times). Features not supported, different semantics,
| ...
|
| IMO every project has an "experimental stuff" budget, and if
| you go over it, the project is too broken to recover. For
| most projects there's just not that much to win by spending
| that budget on a new database thing.
| skeeter2020 wrote:
| >> the recent sqlite hype.
|
| This is an interesting take; why do you see recent hype
| around the most boring and stone-age of technologies, SQLite?
| sauercrowd wrote:
| The Rails creator DHH has been hyping it up a lot in the
| first 6 months of this year, and quite a few of the "dev
| influencers" scene followed. Fly's Litestream came out
| around that time, and there have been more SQLite-in-the-
| cloud companies/discussions, in particular with the AI
| agent use-case.
|
| Not super sure who followed who but there was all of a
| sudden a lot of excitement
| simonw wrote:
| Litestream's first release was February 2021:
| https://news.ycombinator.com/item?id=26103776
|
| SQLite's "buzz" isn't new, type "sqlite" into my
| https://tools.simonwillison.net/hacker-news-histogram
| tool and you'll see interest (on HN at least) has been
| pretty stable since 2021.
| nirvdrum wrote:
| Maybe it's a local bump, but it sure seems like SQLite has
| become a fair bit more popular topic in the Rails world. I
| wouldn't expect to find it in a HN search tool. SQLite has
| gone from the little database you might use to bootstrap
| or simplify local development to something
| products are shipping with in production. Functionality
| like solid_cable, solid_cache, and solid_queue allow
| SQLite to be used in more areas of Rails applications and
| is pitched as a way to simplify the stack.
|
| While I don't have stats about every conference talk for
| the last decade, my experience has been that SQLite has
| been featured more in Rails conference talks. There's a
| new book titled "SQLite on Rails: The Workbook" that I
| don't think would have had an audience five years ago.
| And I've noticed more blog posts and more discussion in
| Rails-related discussion platforms. Moreover, I expect
| we'll see SQLite gain even more in popularity as it
| simplifies multi-agent development with multiple git
| worktrees.
| sauercrowd wrote:
| My bad, I remember vaguely that fly io had a "litefs
| released" post but I seem to have confused timelines
| simonw wrote:
| Yeah LiteFS was more recent - September 2022,
| https://fly.io/blog/introducing-litefs/ - and the Cloud
| hosted version was July 2023:
| https://fly.io/blog/litefs-cloud/
| WorldMaker wrote:
| The common answer (especially from Fly.io) is "at-the-edge"
| computing/querying. There is network latency involved in
| sending a query to MySQL or Postgres and getting the data
| returned, whereas with Litestream you could put a read replica
| of the entire SQLite DB at every edge. Queries go only to
| the local read replica, so they are fast and efficient.
| There's still network
| latency associated with _updating_ that read replica over time,
| but it is amortized based on the number of overall writes
| rather than the number of queries, is more fault tolerant in
| "eventually consistent" workflows (you can answer queries from
| the read replica at the edge in the state that you have it
| while you wait for the network to reconnect and replay the
| writes you missed during the fault), and with SQLite backing it
| still has much of the same full relational DB query power of
| SQL you would expect from a larger (or "proper") database like
| MySQL or Postgres.
| SchwKatze wrote:
| Law Theorem[1] fits perfectly for this scenario
|
| 1- https://law-theorem.com/
| tptacek wrote:
| It's significantly faster and incurs less ops overhead. That's
| it.
|
| But most apps should just use a classic n-tier database
| architecture like Postgres. We mostly do too (though Litestream
| does back some stuff here like our token system).
| bob1029 wrote:
| I find myself mostly in this camp now.
|
| In every case where I had a SQLite vertical that required
| resilience, the customer simply configured the block storage
| device for periodic snapshots. Litestream is approximately the
| same idea, except you get block device snapshots implicitly as
| part of being in the cloud. There is no extra machinery to
| worry about and you won't forget about a path/file/etc.
|
| Also, streaming replication to S3 is not that valuable an idea
| to me when we consider the recovery story. All other solutions
| support hot & ready replicas within seconds.
| canadiantim wrote:
| To enable local-first or offline-first design. I prefer having
| data stored on-device and only optionally backed up to cloud
| bccdee wrote:
| Whatever database you end up using, you'll need some sort of
| backup solution. Litestream is a streamed backup solution which
| effectively doubles as replication for durability purposes.
|
| MySQL, Postgres, etc. have a much greater overhead for setup,
| unless you want to pay for a managed database, which is not
| going to be worth the price for small quantities of data.
| arbll wrote:
| To avoid operating a database by yourself and dealing with
| incidents, backups, replicas, failovers, etc... You can use
| cheap commoditised S3-like storage and run your application
| statelessly.
|
| If you have access to a database that is well managed on your
| behalf I would definitely still go with that for many usecases.
| Quarrelsome wrote:
| this is infra for a single-user app. SQLite is THE
| replacement for file databases like MSAccess, but if the
| box goes down, your database dies with all your data.
|
| So this fills that gap by giving you a database as a service
| level of QOL without needing to provision a database as a
| service backend. Otherwise you're dicking about maintaining a
| service with all that comes with that (provisioning, updating,
| etc) when really all you need is a file that is automagically
| backed up or placed somewhere on the web to avoid the drawbacks
| of the local file system.
| trallnag wrote:
| But aren't many single-user apps still multi-platform? For
| example as an Android application but also as a web app the
| user might access from his desktop device?
| lordofmoria wrote:
| A small warning for folks.
|
| I once was responsible for migrating a legacy business app to
| Azure, and the app had a local MSSQL server co-running with the
| app (the same pattern that Litestream is using).
|
| As has been mentioned below, the app had been developed assuming
| local access (and thus <1ms latency), so it had a ton of N+1
| queries everywhere.
|
| This made it almost impossible to migrate/transition to another
| configuration.
|
| So, if this style of app hosting doesn't take off and you're at
| all worried about this being a dead end storage once you reach a
| certain scale, I'd recommend not doing this, otherwise your
| options will be very limited.
|
| Then again - I bet you could get very very far on a single box,
| so maybe it'd be a non factor! :)
| inerte wrote:
| I used to work on a product where the app server and database
| were in the same rack - so similar low latency. But the product
| was successful, so our N+1 would generate thousands of queries
| and 1ms would become >500ms or more easily. Every other month
| we would look at New Relic and find some slow spot.
|
| It was a Rails app, therefore easy to get into the N+1 but also
| somewhat easy to fix.
| 1-more wrote:
| For our Rails app we actually added tests asserting no N+1s
| in our controller tests. Think of a test setup with 1 post vs
| 10 posts (via factory_bot): you assert that the DB query
| count is the same in both cases. A useful
| technique for any Railsheads reading this!
| simonw wrote:
| That's a good trick. In Django world I like pytest-django's
| django_assert_max_num_queries fixture: https://pytest-
| django.readthedocs.io/en/latest/helpers.html#...
|     def test_homepage_queries(django_assert_max_num_queries, client):
|         with django_assert_max_num_queries(10):
|             assert client.get("/").status_code == 200
|
| Or django_assert_num_queries to assert an exact number.
| jasonwatkinspdx wrote:
| Way back in the prehistoric era of Rails I just wrote a
| like 5 line monkey punch to ActiveRecord that would kill
| mongrel if queries per request went above a limit.
|
| Probably some of the most valuable code I've ever written
| on a per LOC basis lol.
|
| But anyhow, merging that into a new project was always a
| fun day. But on the other side of the cleanup the app stops
| falling down due to memory leaks.
| metadat wrote:
| What is N+1?
| simonw wrote:
| The thing where your app displays 20 stories in the homepage,
| but for each story it runs an extra query to fetch the
| author, and another to fetch the tags.
|
| It's usually a big problem for database performance because
| each query carries additional overhead for the network round
| trip to the database server.
|
| SQLite queries are effectively a C function call accessing
| data on local disk so this is much less of an issue - there's
| an article about that in the SQLite docs here:
| https://www.sqlite.org/np1queryprob.html
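The shape of the problem is easy to see in a few lines of Python's built-in sqlite3 module; the authors/stories schema below is invented purely for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE stories (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO stories VALUES (1, 1, 'First'), (2, 2, 'Second');
""")

# N+1 shape: one query for the stories, then one more per story for its author.
stories = conn.execute(
    "SELECT id, title, author_id FROM stories ORDER BY id").fetchall()
n_plus_1 = []
for _sid, title, author_id in stories:
    (author,) = conn.execute(
        "SELECT name FROM authors WHERE id = ?", (author_id,)).fetchone()
    n_plus_1.append((title, author))

# Single-query shape: one JOIN returns the same rows in one round trip.
joined = conn.execute("""
    SELECT s.title, a.name
    FROM stories s JOIN authors a ON a.id = s.author_id
    ORDER BY s.id
""").fetchall()

print(n_plus_1)  # [('First', 'alice'), ('Second', 'bob')]
print(joined)    # same rows, one query instead of three
```

With an in-process SQLite file the per-query overhead is a function call, so the first shape is tolerable; over a network link it is N round trips.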
| Scubabear68 wrote:
| Object Relational Mapping (ORM) tools, which focus on mapping
| between code based objects and SQL tables, often suffer from
| what is called the N+1 problem.
|
| A naive ORM setup will often end up doing 1 query to get the
| list of objects it needs, and then performing N queries, one
| per object, usually fetching each object individually by ID
| or key.
|
| So for example, if you wanted to see "all TVs by Samsung" on
| a consumer site, it would do 1 query to figure out the set of
| items that match, and then if say 200 items matched, it would
| do 200 queries to get those individual items.
|
| ORMs are better at avoiding it these days, depending on the
| ORM or language, but it still can happen.
| OJFord wrote:
| I dislike ORMs as much as the next ORM disliker, but people
| who are more comfortable in whatever the GP programming
| language is than SQL will write N+1 queries with or without
| an ORM.
| whizzter wrote:
| Yep, people who think OOP is all you need will just
| "abstract away the database".
| iamflimflam1 wrote:
| Very true. But ORMs did make it particularly easy to
| trigger N+1 selects.
|
| It used to be a very common pitfall - and often not at
| all obvious. You'd grab a collection of objects from the
| ORM, process them in a loop, and everything looked fine
| because the objects were already rehydrated in memory.
|
| Then later, someone would access a property on a child
| object inside that loop. What looked like a simple
| property access would silently trigger a database query.
| The kicker was that this could be far removed from any
| obvious database access, so the person causing the issue
| often had no idea they were generating dozens (or
| hundreds) of extra queries.
| Spivak wrote:
| This problem is associated with ORMs but the moment
| there's a get_user(id) function which does a select and
| you need to display a list of users someone will run it
| in a loop to generate the list and it will look like it's
| working until the user list gets long.
|
| I really wish there was a way to compose SQL so you can
| actually write the dumb/obvious thing and it will run a
| single query. I talked with a dev once who seemed to have
| the beginnings of a system that could do this. It
| leveraged async and put composable queryish objects into
| a queue and kept track of which callers needed which
| results, merged and executed the single query, and then
| returned the results. Obviously far from generalizable
| for arbitrary queries but it did seem to work.
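A toy version of that idea can be sketched with asyncio and sqlite3: callers write the obvious per-item lookup, and pending lookups get coalesced into one IN (...) query. Everything here (UserLoader, the users table) is invented for illustration, not taken from the system described above:

```python
import asyncio
import sqlite3

class UserLoader:
    """Coalesces individual load(id) calls into one batched SELECT."""

    def __init__(self, conn):
        self.conn = conn
        self.pending = {}  # user_id -> Future awaiting that user's name

    async def load(self, user_id):
        loop = asyncio.get_running_loop()
        if user_id not in self.pending:
            self.pending[user_id] = loop.create_future()
            if len(self.pending) == 1:
                # Flush after the current batch of callers has registered.
                loop.call_soon(self.flush)
        return await self.pending[user_id]

    def flush(self):
        batch, self.pending = self.pending, {}
        placeholders = ",".join("?" * len(batch))
        rows = self.conn.execute(
            f"SELECT id, name FROM users WHERE id IN ({placeholders})",
            list(batch),
        ).fetchall()
        by_id = dict(rows)
        for user_id, fut in batch.items():
            fut.set_result(by_id.get(user_id))

async def main():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)",
                     [(1, "ada"), (2, "grace"), (3, "edsger")])
    loader = UserLoader(conn)
    # Looks like N independent lookups; resolves as a single query.
    return await asyncio.gather(*(loader.load(i) for i in (1, 2, 3)))

names = asyncio.run(main())
print(names)  # ['ada', 'grace', 'edsger']
```

This is essentially the DataLoader pattern from the GraphQL world; the event loop provides the "end of the current batch" signal that lets the obvious loop compile down to one query.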
| riffraff wrote:
| I think many ORMs can solve (some of) this these days.
|
| e.g. for ActiveRecord there's ar_lazy_preloader[0] or
| goldiloader[1] which fix many N+1s by keeping track of a
| context: you load a set of User in one go, and when you
| do user.posts it will do a single query for all, and when
| you then access post.likes it will load all likes for
| those and so on. Or, if you get the records some other
| way, you add them to a shared context and then it works.
|
| Doesn't solve everything, but helps quite a bit.
|
| [0] https://github.com/DmitryTsepelev/ar_lazy_preload
|
| [1] https://github.com/salsify/goldiloader
| skeeter2020 wrote:
| In defense of the application developer, it is very
| difficult to adopt the set-theory thinking that helps with
| SQL when you've never had any real education in this
| area, and for almost everyone it's tough to switch between
| it and the loop-oriented processing you're likely using in
| your application code. ORMs bridge this divide, which is
| why they fall into the trap consistently.
| Often it's an acceptable trade-off for the value you get
| from the abstraction, but then you pay the price when you
| need to address the leak!
| cbm-vic-20 wrote:
| There's a common access pattern with object-relational
| mapping frameworks where an initial query will be used to get
| a list of ids, then individual queries are emitted for
| each item to get the details of the items. For example, if
| you have a database table full of stories, and you want to
| see only the stories written by a certain author, it is
| common for a framework to have a function like
|
|     stories = get_stories(query)
|
| which results in a SQL query like
|
|     SELECT id FROM stories WHERE author = ?
|
| with the '?' being bound to some concrete value like "Jim".
|
| Then, the framework will be used to do something like this
|
|     for id in stories {
|         story = get_story_by_id(id)
|         // do something with story
|     }
|
| which results in N SQL queries with
|
|     SELECT title, author, date, content
|     FROM stories WHERE id = ?
|
| and there's your N+1
| metadat wrote:
| Oh yeah, the ORM thing (common side-effect with DB query
| abstractions) - I must not have been fully awake. Cheers
| and thank you for humoring me, @cbm-vic-20!
| tehlike wrote:
| With orms, it can be easy, but also often fixed with
| eager fetching too.
| ComputerGuru wrote:
| This plagues (plagued?) pretty much everything to do with
| WordPress, from core to every theme and plugin developed.
| genpfault wrote:
| https://stackoverflow.com/questions/97197
| upmostly wrote:
| The N+1 problem basically means instead of making one
| efficient query, you end up making N separate queries inside
| a loop. For example, fetching a list of tables, then for each
| table fetching its columns individually -- that's N+1
| queries. It works, but it's slow.
|
| We ran into this while building, funnily enough, a database
| management app called DB Pro (https://dbpro.app) At first we
| were doing exactly that: query for all schemas, then for each
| schema query its tables, and then for each table query its
| columns. On a database with hundreds of tables it took ~3.8s.
|
| We fixed it by flipping the approach: query all the schemas,
| then all the tables, then all the columns in one go, and join
| them in memory. That dropped the load time to ~180ms.
|
| N+1 is one of those things you only really "get" when you hit
| it in practice.
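In sqlite3 terms, the batched fix described above looks something like this (schema invented for illustration; the actual schemas/tables/columns code in DB Pro obviously differs):

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE stories (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'Jim'), (2, 'Pam');
    INSERT INTO stories VALUES (1, 1, 'A'), (2, 1, 'B'), (3, 2, 'C');
""")

# Instead of one query per author (N+1), fetch each table once...
authors = conn.execute("SELECT id, name FROM authors ORDER BY id").fetchall()
stories_by_author = defaultdict(list)
for _sid, author_id, title in conn.execute(
        "SELECT id, author_id, title FROM stories ORDER BY id"):
    stories_by_author[author_id].append(title)

# ...and join in memory: two queries total, regardless of row count.
result = {name: stories_by_author[aid] for aid, name in authors}
print(result)  # {'Jim': ['A', 'B'], 'Pam': ['C']}
```

The query count is now constant in the number of rows, which is why the load time drops from seconds to milliseconds once the tables grow.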
| Quarrelsome wrote:
| I mean, that's not much of a trade off given that it seems that
| what you're saying is that using such a service might just show
| you how shit your code actually is.
|
| Its not its fault. :)
| rco8786 wrote:
| Bad query practices are always going to bite you eventually. I
| would not call that a shortcoming of this approach
| laurencerowe wrote:
| It's not a bad query practice in SQLite!
| https://www.sqlite.org/np1queryprob.html
| masterj wrote:
| > I bet you could get very very far on a single box,
|
| With single instances topping out at 20+ TBs of RAM and
| hundreds of cores, I think this is likely very under-explored
| as an option
|
| Even more if you combine this with cell-based architecture,
| splitting on users / tenants instead of splitting the service
| itself.
| immibis wrote:
| Single instance is underappreciated in general. There's a
| used server reseller near me, and sometimes I check their
| online catalogue out of curiosity. For only $1000ish I could
| have some few generations old box with dual socket 32-core
| chips and 1TB of RAM. I don't have any purpose for which I'd
| need that, but it's surprisingly cheap if I did. And things
| can scale up from there. AWS will charge you the same _per
| month_ that it costs to get one of your own _forever_ - not
| counting electricity or hard drives.
| switz wrote:
| I run my entire business on a single OVH box that costs
| roughly $45/month. It has plenty of headroom for growth.
| The hardest part is getting comfortable with k8s (still
| worth it for a single node!) but I've never had more uptime
| and resiliency than I do now. I was spending upwards of
| $800/mo on AWS a few years ago with way less stability and
| speed. I could set up two nodes for availability, but it
| wouldn't really gain me much. Downtime in my industry is
| expected, and my downtime is rarely related to my web
| services (externalities). In a worst case scenario, I could
| have the whole platform back up in under 6 hours on a new
| box. Maybe even faster.
| ahoog42 wrote:
| any notes or pointers on how to get comfortable with k8?
| For a simple nodejs app I was looking down the pm2 route
| but I wonder of learning k8 is just more future proof.
| carlhjerpe wrote:
| Use K3s in cluster mode and start doing. Cluster mode uses
| etcd instead of kine; kine is not good.
|
| Configure the init flags to disable all controllers and
| other doodads, deploy them yourself with Helm. Helm sucks
| to work with but someone has already gone through the
| pain for you.
|
| AI is GREAT at K8s since K8s has GREAT docs which has
| been trained on.
|
| A good mental model is good: It's an API with a bunch of
| control loops
| tehlike wrote:
| I'd say rent a hetzner vps and use hetzner-k3s
| https://github.com/vitobotta/hetzner-k3s
|
| Then you are off to races. you can add more nodes etc
| later to give it a try.
| likium wrote:
| What's the benefit of using K3 on a single node?
| tehlike wrote:
| Deployments are easy. You define a bunch of yamls for
| what things are running, who mounts what, and what
| secrets they have access to etc.
|
| If you need to deploy it elsewhere, you just install
| k3s/k8s or whatever and apply the yamls (except for
| stateful things like db).
|
| It also handles name resolution with service names,
| restarts, etc.
|
| It's amazing.
| sciencesama wrote:
| Is this vanilla k8 or any flavor?
| Fnoord wrote:
| I guess you got cheap power. Me too, but not 24/7 and not a
| whole lot (solar). So old enterprise hardware is a no-go
| for me. I do like ECC, but DDR5 is a step in the right
| direction.
| mtlynch wrote:
| Very excited to see Fly restart development on Litestream after a
| 2ish year freeze!
|
| I love Litestream and use it in every app I build now.
|
| They advertise it as costing "pennies per day," but it's even
| less expensive than that. It obviously varies depending on how
| much storage you need, but I had a real app in production, and
| Litestream replication to S3 only cost me 2-3 cents ($0.02-$0.03)
| per month.[0]
|
| [0] https://mtlynch.io/litestream/#using-logpaste-in-production
| atombender wrote:
| What I'd like to see is a system where a single-writer SQLite
| database is replicated to object storage so that you can spin up
| really cheap read replicas. Is anyone working on something like
| that?
|
| Such a system would also require a side channel propagating WAL
| updates (over Kafka or similar) to replicas, so that the running
| replicas can update themselves incrementally and stay fresh
| without loading anything from object storage.
| anentropic wrote:
| Litestream is basically that, though the OP article has "The
| next major feature we're building out is a Litestream VFS for
| read replicas" as a What's Next todo at the bottom.
|
| https://litestream.io/guides/s3/
|
| I think this is also roughly what Turso is, although it's
| becoming a SQLite-compatible db rather than vanilla
|
| https://docs.turso.tech/features/embedded-replicas/introduct...
|
| https://docs.turso.tech/cloud/durability
| atombender wrote:
| I don't think Litestream does that yet? It appears to be for
| backing up to S3, and you manually "restore" the image to a
| file. You can't point an SQLite client _at_ the S3 bucket,
| and there's no provision for getting low-latency updates.
| But it sounds like they're working on this.
|
| Turso looks cool and is a project I will keep an eye on, but
| its replica mode seems to be designed for apps rather than
| a mechanism to rapidly scale out read replicas in a server
| cluster. Also, the web site warns that it's not ready for
| production use.
| simonw wrote:
| LiteFS can do that, but you need to run a custom FUSE
| filesystem for it - hence why Litestream remained more popular.
| https://fly.io/docs/litefs/how-it-works/#capturing-sqlite-tr...
|
| Litestream is working on that now - the code is already in
| https://github.com/benbjohnson/litestream/blob/v0.5.0/vfs.go
| but it's not yet a working, documented feature.
| ncruces wrote:
| I mean, that's literally their "What's next?" from the OP:
| https://fly.io/blog/litestream-v050-is-here/#whats-next
|
| They already have a prototype, and... it's pretty rough
| around the edges.
|
| I'm porting it to my Go SQLite driver and already ran into a
| bunch of issues. But it seems at least _feasible_ to get it
| into a working shape.
|
| https://github.com/benbjohnson/litestream/issues/772
|
| https://github.com/ncruces/go-sqlite3/compare/main...litestr...
| koeng wrote:
| Interesting! I really like your wasm compiled SQLite more
| than the pure Go one, so it is what I use most of the time.
|
| What exactly are you trying to port?
| ncruces wrote:
| A VFS that allows you to directly open a Litestream
| replication target (e.g. S3) as a read-only database and
| run queries against it without ever having to download the
| entire database (e.g. to an ephemeral instance that doesn't
| even have the disk space for it).
| koeng wrote:
| Interesting. Any plans for a cache? Or would that just be
| a litestream thing
| ncruces wrote:
| I... honestly don't know. I saw the announcement some
| time ago (the revamped post) and started following their
| repo. They did mention caching to hide latency.
|
| When I saw the v0.5.0 tag, I dived into just porting it.
| It's just over a couple hundred lines, and I have more
| experience with SQLite VFSes than most, so why not.
|
| But it's still pretty shaky.
| Nelkins wrote:
| I'm curious about the same, but also am wondering if there can
| be an automatic election of a new primary through the use of
| conditional writes (or as Fly.io say, CASAAS: Compare-and-Swap
| as a Service).
| mmaunder wrote:
| Is this a transcript?
| svat wrote:
| I don't even care about the topic (much) (right now), but this is
| a really well written blog post / announcement.
| tptacek wrote:
| Thanks, what'd you like about it?
| LVB wrote:
| For me, the introspection and meeting (us) where we're at:
|
| >But the market has spoken! Users prefer Litestream.
|
| >Yeah, we didn't like those movies much either.
| JaggerFoo wrote:
| Interesting information that they chose modernc.org/sqlite over
| mattn/go-sqlite3 as a Quality-of-Life improvement. Going forward
| I guess I'll do the same for new projects.
|
| The NATS Jetstream use case is something I'm curious about.
|
| Cheers and keep up the great work on Litestream.
| threemux wrote:
| All the public benchmarks (and my own applications) indicate
| that there is only a small (often unnoticeable) performance
| penalty associated with modernc.org/sqlite and this is far
| outweighed by the ability to eliminate CGO. I'd use it on
| future projects without hesitation.
| wodenokoto wrote:
| We have an in house application installed on a remote fleet with
| spotty internet access. Because of the spotty internet we have a
| lot of trouble setting up a reliable system for getting data
| home.
|
| How does litestream handle backing up through a spotty
| connection, and can we consolidate the backups into a central
| db and query against it?
| keeganpoppen wrote:
| i am not a big fly.io fan per se, but their blog posts,
| especially in this sort of sphere, are delightful, and i
| absolutely commend them for this. fly has a fascinating
| combination of "dev-forward" and "we made this complicated
| because it shits us" that continues to confound me, but i must
| confess i am a big fan of this project, and a number of other
| projects (e.g. svelte) that they seem to genuinely treat in an
| arms-length, "let a thousand flowers bloom" sort of way.
| taejavu wrote:
| I'm curious what you don't like about fly.io. Also what's the
| connection to svelte?
| tptacek wrote:
| Just so we're all clear: Litestream is just an open source
| project, not a feature of our platform. This isn't about
| Fly.io.
| barefootford wrote:
| I look forward to trying this out. Any benchmarks or demos on how
| long it actually takes to restore? I ended up cooking my own
| boring S3 backup because previously litestream took 20 minutes to
| restore something like 1000 rows. It felt extremely unoptimized.
| How long does restoration take today?
| umajho wrote:
| It seems Litestream will soon support arbitrary s3-compatible
| destinations.[^1] Neat.
|
| So far I've stuck with the SFTP solution, since I don't use any
| of the cloud object storage services that are hardcoded into the
| tool.[^2]
|
| Big thanks to the developers.
|
| [^1]: https://github.com/benbjohnson/litestream/pull/731
|
| [^2]: https://litestream.io/guides/#replica-guides
| geenat wrote:
| Advantages of Litestream over https://sqlite.org/rsync.html ?
| thelibrarian wrote:
| Litestream gives you point in time recovery - you can restore
| to any snapshot time, not just have a current replica.
| ncruces wrote:
| The other advantage (or difference) is that you don't need a
| "server" on the other end, just object storage. Which may come
| out cheaper.
| wg0 wrote:
| Would it be a good fit for an application with the user base
| and size of, let's say, FreshBooks?
| bradgessler wrote:
| The DX for deploying SQLite apps to Fly.io is rough. I'm a few
| hours into trying to get a production Rails app booting, but
| running into issues getting the database to initialize, migrate,
| and become writable. The root of my problem was the eager loading
| of a gem I wrote, but there were several layers of runners above
| it that made it hard to diagnose.
|
| I wish they'd put a bit more effort into the DX here, but it
| probably doesn't make much sense from a biz PoV since big
| customers aren't going to be running these kinds of workloads.
|
| Curious if anybody here is deploying SQLite apps to production
| and what host they're using?
___________________________________________________________________
(page generated 2025-10-03 23:00 UTC)