[HN Gopher] Fly.io buys Litestream
___________________________________________________________________
Fly.io buys Litestream
Author : dpeck
Score : 383 points
Date : 2022-05-09 19:35 UTC (3 hours ago)
(HTM) web link (fly.io)
(TXT) w3m dump (fly.io)
| swaraj wrote:
 | Looks v cool, but I feel like I'm missing a big part of the
 | story: how do 2 app servers/processes connect to the same
 | sqlite/litestream db?
|
| Do you 'init' (restore) the db from each app process? When one
| app makes a write, is it instantly reflected on the other app's
| local sqlite?
| judofyr wrote:
 | Each server would have one copy of the SQLite database. Only
 | one of the servers would support writes -- and those writes
 | will be replicated to the other servers. Reads on the other
 | servers will be transactionally safe, but might be slightly
 | out of date.
| swaraj wrote:
| This is my main q: are the writes replicated in real-time? Do
| the apps that just need read access have to repeatedly call
| 'restore'?
| tptacek wrote:
| https://litestream.io/getting-started/#continuous-
| replicatio...
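For reference, continuous replication is driven by a litestream.yml along these lines (the database path and bucket name here are placeholders, not from the thread):

```yaml
# /etc/litestream.yml -- continuously replicate one SQLite file to S3
dbs:
  - path: /var/lib/app/db.sqlite
    replicas:
      - url: s3://my-bucket/db
```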
| thruflo wrote:
 | Also, how does the WAL page-based replication maintain
| consistency / handle concurrent updates?
| infogulch wrote:
| It doesn't, this gives you a read-only replica only.
| johnrrk wrote:
| I also investigated SQLite and it's not clear how we can use it
| with multiple servers.
|
| The WAL documentation [1] says "The wal-index greatly improves
| the performance of readers, but the use of shared memory means
| that all readers must exist on the same machine. This is why
| the write-ahead log implementation will not work on a network
| filesystem."
|
| So it seems that we can't have 2 Node.js servers accessing the
| same SQLite file on a shared volume.
|
 | I'm not sure how to do zero-downtime deployment (starting
 | server 2, checking it works, then shutting down server 1
 | seems risky, since we'll have 2 servers accessing the same
 | SQLite file temporarily).
|
| [1] https://sqlite.org/wal.html
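As a concrete illustration of the WAL mode the docs describe (a sketch; the database filename is hypothetical), switching a file-backed database into write-ahead logging from Python is one pragma:

```python
import sqlite3

# Switch a file-backed database into WAL journaling. The shared-memory
# wal-index this creates (the -shm file) is exactly why all readers must
# live on the same machine, per the quoted documentation.
con = sqlite3.connect("app.db")
mode = con.execute("PRAGMA journal_mode=WAL").fetchone()[0]
print(mode)  # prints "wal"
```

The mode is persistent: subsequent connections to the same file reopen it in WAL mode.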
| tptacek wrote:
| The point of Litestream is that you don't have multiple
| servers accessing the same SQLite file. They all have their
| own SQLite databases. Of course, you only write to one of
| them, but that's a common constraint for database clusters.
| mwcampbell wrote:
| Congratulations to Ben on getting a well-funded player like Fly
| to buy into this vision. I'm looking forward to seeing a
| complete, ready-to-deploy sample app, when the upcoming
| Litestream enhancements are ready.
|
| I know that Fly also likes Elixir and Phoenix; they hired Chris
| McCord, after all. So would it make sense for Phoenix
| applications deployed in production on Fly to use SQLite and
| Litestream? Is support for SQLite in the Elixir ecosystem,
| particularly Ecto, good enough for this?
| warmwaffles wrote:
| > Is support for SQLite in the Elixir ecosystem, particularly
| Ecto, good enough for this?
|
| Why yes it is. I maintain the `exqlite` and `ecto_sqlite3`
 | libraries, and it was just integrated with `kino_db`, which is
| used by `livebook`.
|
| https://github.com/elixir-sqlite/exqlite
| swlkr wrote:
| The reduction in complexity from using sqlite + litestream as a
| server side database is great to see!
| netcraft wrote:
| This is similar to what I hoped websql had eventually grown into.
| sqlite in the browser, but let me sync it up and down with a
 | server. Every user gets their own database: the first time they
 | visit the app they "install" the control and system data, then
 | their data; after that, writes are synced to the server. If it
 | became standard, it
| could be super easy - conflict resolution notwithstanding.
| bambax wrote:
| You can make webapps using exactly this approach, with json in
 | localstorage as the client db, and occasional, asynchronous,
| writes to the server. I'm now building a simple webapp exactly
| like this, and the server db is sqlite. So far it works
| perfectly fine.
| tyingq wrote:
| Dqlite is also interesting, and in a similar space. It seems to
| have evolved from the LXC/LXD team wanting a replacement for
| Etcd. It's Sqlite with raft replication and also a networked
| client protocol.
|
| https://dqlite.io/docs/architecture
| tptacek wrote:
| There's also rqlite. There's definitely a place for this kind
| of stuff. But we already use a bunch of stuff that does
| distributed consensus in our stack, and the experience has left
| us wary of it, especially for global distribution. We almost
| used rqlite for a statekeeping feature internally, but today
| we'd certainly just use sqlite+litestream for the same kinds of
| features, just because it's easier to reason about and to deal
 | with operationally when there are problems.
|
| https://fly.io/blog/a-foolish-consistency/
| otoolep wrote:
| rqlite author here. Anything else you can tell me about why
| you decided against it? Just simpler, as you say, to avoid a
| distributed system when you can (something I understand).
| tptacek wrote:
| We like rqlite a lot. There's some comments in your issue
| tracker from Jerome about it at the time. The decision
| wasn't against rqlite as a piece of software so much as it
| was us deliberately deciding not to introduce more Raft
| into our architecture; any place there is Raft, we're
| concerned we'll essentially need to train our whole on-call
| rotation on how to handle issues.
|
| The annoying thing about global consensus is that the
| operational problems tend to be global as well; we had an
| outage last night (correlated disk failure on 3 different
| machines!) in Chicago, and it slowed down deploys all the
| way to Sydney, essentially because of invariants maintained
| by a global Raft consensus and fed in part from
| malfunctioning machines.
|
| I think rqlite would make a lot of sense for us for
| applications where we run multiple regional clusters; it's
| just that our problems today tend to be global. We're not
| just looking for opportunities to rip Raft out of our
| stack; we're also trying to build APIs that regionalize
| nicely. In nicely-regionalized, contained settings, rqlite
| might work a treat for us.
| RcouF1uZ4gsC wrote:
| I love Litestream! It is so simple and it just works!
|
| Congratulations, Ben, on making a great product and on the sale!
|
| One thing I have had in the back of my mind, but have not had the
| time to pursue is using SQLite replication to make something
| similar to CloudFlare's durable objects but more open.
|
| A "durable object" would be an SQLite database and some program
| that processes requests and accesses the SQLite database. There
| would be a runtime that transparently replicates the (database,
| program) pair where they are needed and routes to them.
|
| That way, I can just start out locally developing my program with
| an SQLite database, and then run a command and have it available
| globally. At the same time, since it is just accessing an SQLite
| database, there would be much less risk of lockin.
| krts- wrote:
| A great project with awesome implications. Well deserved, and the
| fly.io team are very pragmatic.
|
| This will be even more _brilliant_ than it already is when fly.io
 | can get some slick sidecar/multi-process stuff.
|
| I ended up back with Postgres after my misconfigs left me a bit
| burned with S3 costs and data stuff. But I think a master VM
| backed by persistent storage on fly with read replicas as
| required is maybe the next step: I love the simplicity of SQLite.
| foodstances wrote:
| Just curious, is there any financial compensation/support going
| to Richard Hipp with all of this money changing hands?
|
| When I see these startups making a business that is so heavily
| based on open-source software (like Tailscale on top of
| Wireguard), I have to wonder what these companies do to actually
| support the author(s) of the software that so much of their
| company is based on.
| mrkurt wrote:
| Yes. We (Fly.io) are buying a sqlite support agreement. We also
| send money WireGuard's way. I'm pretty sure Tailscale does too.
|
| We have also given OSS authors advisor equity. A couple of
| folks wrote libraries that were important to keeping us going,
| and we've granted them shares the same way some startups would
| to MBA advisors.
| foodstances wrote:
| That's great to hear, thank you!
| qbasic_forever wrote:
| I agree Richard Hipp should be compensated but he explicitly
| licensed and releases SQLite under a public domain license:
| https://www.sqlite.org/copyright.html Not Apache, not MIT, not
| GPL... public domain. You can do almost anything with it and
| not be beholden to any demands. You can tell people you built
| your business on SQLite... or not. It's public domain.
|
 | That said, SQLite has a business model of selling support and
| premium features like encryption:
| https://www.sqlite.org/prosupport.html
| foodstances wrote:
| Sure, but Apache, MIT, and GPL licenses don't require payment
| to the author either. That's why it's up to the company to
| decide to offer compensation without being required to, and
| why I'm curious which companies actually do it.
|
 | It's like when RedHat went public and offered pre-IPO stock
| to open source developers.
| otoolep wrote:
| Congratulations to Ben! This project has been like a rocket ship.
| benbjohnson wrote:
| Thanks, Philip!
| no_wizard wrote:
 | This is a great and interesting offering! I think this fits well
| with fly.io and their model of computing.
|
 | I now wish that I had pursued the idea, very similar to
 | Litestream, that I had about a year and a half ago. I always
 | thought SQLite just needed a distribution layer to be extremely
 | effective as a distributed database of sorts. Its flat-file
 | architecture means it's easy to provision, restore, and back
 | up. SQLite also has incremental snapshotting and reproducible
 | WAL logs that can be used to do incremental backups, restores,
 | writes, etc. It just needs a "frontend" to handle those bits.
 | Latency has gotten to the point where you can replicate a
 | database by having its continued snapshots (which is, on a high
 | level, what Litestream appears to be doing) propagated out to
 | object/blob storage. You could even achieve brute-force
 | consensus with this approach if you ran it in a truly
 | distributed way (though Raft is probably more efficient).
|
| Reason I didn't do this? I thought to myself - why in the world
| in 2020 would someone choose to use SQLite at scale instead of
| something like Firebase, Spanner, Fauna, or even Postgres? So
| after I did an initial prototype (long gone, never pushed it to
| GitHub) I just felt like...there was no appetite for it.
|
| Now I regret!
|
| Just a long winded way of saying, congrats! This is awesome!
| Thanks for doing exactly what I wanted to do but didn't have the
| guts to follow through with.
| epilys wrote:
| I implemented exactly this setup, in Rust, last year for a
| client. Distributed WAL with write locks on a RAFT scheme.
| Custom VFS in Rust for sqlite3 to handle the IO. I asked the
| client to opensource it but it's probably not gonna happen...
| It's definitely doable though.
| ComputerGuru wrote:
| Did you write your own rust raft implementation or reuse
| something already available?
| epilys wrote:
| Reused a well known library that uses raft. I don't know if
| I should mention any more details since it was a private
| project.
| Serow225 wrote:
| there's some stuff out there:
|
| - https://github.com/rqlite/rqlite -
| https://github.com/chiselstrike/chiselstore -
| https://dqlite.io/
|
| I'm sure there's more, those are just the ones I remember.
| mrcwinn wrote:
| I have really enjoyed using Fly. Great service and support.
| scwoodal wrote:
| > According to the conventional wisdom, SQLite has a place in
| this architecture: as a place to run unit tests.
|
| Be careful with this approach. Frameworks like Django have DB
| engine specific features[1]. When you start using them in your
| application you can no longer use a different DB (SQLite) to run
| your unit tests.
|
| [1]
| https://docs.djangoproject.com/en/4.0/ref/contrib/postgres/f...
| seanwilson wrote:
| SQLite uses dynamic types? Is this an issue in practice,
| especially for large apps? Don't you lose guarantees about your
| data which makes it messy to handle on the backend?
|
| Context from https://www.sqlite.org/datatype3.html: "SQLite uses
| a more general dynamic type system. In SQLite, the datatype of a
| value is associated with the value itself, not with its
| container. The dynamic type system of SQLite is backwards
| compatible with the more common static type systems of other
| database engines in the sense that SQL statements that work on
| statically typed databases work the same way in SQLite. However,
| the dynamic typing in SQLite allows it to do things which are not
| possible in traditional rigidly typed databases. Flexible typing
| is a feature of SQLite, not a bug."
| aliswe wrote:
| This sounds like schemalessness to me? serious "question".
| jamie_ca wrote:
| Not schemaless, but typeless. SQLite will let you declare a
| column to be an integer and then dump a string into it, but
| you're still defining a table with specific columns.
|
| It's like the opposite problem Mysql has when you try to
| write data larger than the field definition - Mysql will
| truncate, Sqlite will store the data you gave it.
| seanwilson wrote:
| Typeless is the default though? Why wouldn't you want the
| types to be reliable when you're reading/writing from the
| backend in the general case?
| ripley12 wrote:
| You can use SQLite in strict mode if you prefer.
| https://www.sqlite.org/stricttables.html
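To make the flexible-vs-strict distinction concrete, here's a small sketch using Python's stdlib sqlite3 module (table names are made up; the STRICT branch needs SQLite 3.37+):

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Default "flexible typing": the declared column type is only an
# affinity, so a TEXT value survives in an INTEGER column.
con.execute("CREATE TABLE loose (n INTEGER)")
con.execute("INSERT INTO loose VALUES ('not a number')")
print(con.execute("SELECT n, typeof(n) FROM loose").fetchone())
# prints ('not a number', 'text')

# STRICT tables (SQLite 3.37+) reject the same insert instead.
if sqlite3.sqlite_version_info >= (3, 37, 0):
    con.execute("CREATE TABLE tight (n INTEGER) STRICT")
    try:
        con.execute("INSERT INTO tight VALUES ('not a number')")
    except sqlite3.IntegrityError as err:
        print("rejected:", err)
```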
| rco8786 wrote:
| All of the action around SQLite recently is very exciting!
| bob1029 wrote:
| > SQLite isn't just on the same machine as your application, but
| actually built into your application process. When you put your
| data right next to your application, you can see per-query
 | latency drop to 10-20 microseconds. That's micro, with a µ. A
| 50-100x improvement over an intra-region Postgres query.
|
| This is the #1 reason my exuberant technical mind likes that we
| use SQLite for all the things. Latency is the exact reason you
| would have a problem scaling any large system in the first place.
| Forcing it all into one cache-coherent domain is a really good
| way to begin eliminating entire universes of bugs.
|
| Do we all appreciate just how much more throughput you can get in
| the case described above? A 100x latency improvement doesn't
| translate _directly_ into the same # of transactions per second,
 | but it's pretty damn close if your I/O subsystem is up to the
| task.
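A rough way to see the in-process latency for yourself (a sketch; Python interpreter overhead dominates here, so compiled callers get much closer to the 10-20µs figure the post quotes):

```python
import sqlite3
import time

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
con.execute("INSERT INTO kv VALUES ('a', '1')")

# Time a tight loop of point queries: no network round-trip involved,
# just an in-process function call into the SQLite library.
n = 10_000
start = time.perf_counter()
for _ in range(n):
    con.execute("SELECT v FROM kv WHERE k = 'a'").fetchone()
micros = (time.perf_counter() - start) / n * 1e6
print(f"{micros:.1f} microseconds per query")
```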
| throwaway894345 wrote:
| If you're pushing the database up into the application layer,
| do you have to route all write operations through a single
| "master" application instance? If not, is there some multi-
| master scheme, and if so, is it cheaper to propagate state all
| the time than it is to have the application write to a master
| database instance over a network? Moreover, how does it affect
| the operations of your application? Are you still as
| comfortable bouncing an application instance as you would
| otherwise be?
| closeparen wrote:
| This is a large part of what Rich Hickey emphasizes about
| Datomic, too. We're so used to the database being "over there"
| but it's actually very nice to have it locally. Datomic solves
| this in the context of a distributed database by having the
| read-only replicas local to client applications while the
| transaction-running parts are remote.
| abraxas wrote:
| Only trouble with that particular implementation is that the
 | Datomic Transactor is a single-threaded single process that
| serializes every transaction going through it. As long as you
| don't need to scale writes it works like a charm. However,
| the workloads I somehow always end up working with are write
| heavy or at best 50/50 between read and write.
| vmception wrote:
| > SQLite isn't just on the same machine as your application,
| but actually built into your application process.
|
 | How is that different than what's commonly happening? Android
 | and iOS do this... right? ... but it's still accessing the
 | filesystem to use it.
|
 | Am I missing something, or is what they are describing just
 | completely commonplace, and only interesting to people who use
 | microservices and never knew what was normal?
| tlb wrote:
| It's normal (and HN does something similar, working from in-
| process data) for systems that don't have to scale beyond one
| server. If you need multiple servers you have to do
| something, such as Litestream.
| mrkurt wrote:
| This is how client apps use sqlite, yes. Single instance
| client apps. Litestream is one method of making sqlite work
| for server side apps. The hard part on the server is solving
| for multiple processes/vms/containers writing to one sqlite
| db.
| nicoburns wrote:
| > the hard part on the server is solving for multiple
| processes/vms/containers writing to one sqlite db.
|
| I feel like if you have multiple apps writing to the
| database then you shouldn't be using SQLite. That's where
| Postgres etc completely earn their place in the stack.
| Where litestream is really valuable is when you have a
| single writer, but you want point-in-time backups like you
| can get with postgres.
| vmception wrote:
| interesting, such a weird way to describe it then. but I
| guess some people are more familiar with that problem.
| funstuff007 wrote:
| This is exactly the reason I am so skeptical of the cloud. I
| don't care how easy it is to stand up VMs, containers, k8s,
| etc. What I need to know is how hard is it to lug my data to my
| application and vice a versa. My feelings on this are so strong
| as I work mostly on database read-heavy applications.
| WJW wrote:
| How do any writes end up on other horizontally scaled machines
| though? To me the whole point of a database on another machine
| is that it is the single point of truth that many horizontally
| scaled servers can write to and read each others' updates from.
| If you don't need that, you might as well read the entire
| dataset into memory and be done with it.
|
| I know TFA says that you can "soon" automagically replicate
| your sqlite db to another server, but it only allows writes on
 | a single server and all others will be readers. Now you need to
| think about how to move all write traffic to a single app
| server. All writes to that server will still take several
| milliseconds (possibly more, since S3 is eventually consistent)
| to propagate around all replicas.
|
| In short, 100x latency improvement for reads is great but a bit
| of a red herring since if you have read-only traffic you don't
| need sqlite replication. If you do have write traffic, then
| routing it through S3 will definitely not give you a 100x
| latency improvement over Postgres or MySQL anymore. Litestream
| is definitely on my radar, but as a continuous backup system
| for small apps ("small" meaning it runs and will always run on
| a single box) rather than a wholesale replacement of
| traditional client-server databases.
|
| PS: Congrats Ben!
| jolux wrote:
| S3 is strongly consistent now:
| https://aws.amazon.com/s3/consistency/
| bob1029 wrote:
| What if, due to ridiculous latency reductions, your business
| no longer requires more than 1 machine to function at scale?
|
| I'm talking more about sqlite itself than any given product
| around it at this point, but I still think it's an
| interesting thought experiment in this context.
| WJW wrote:
| I'll point out that the ridiculous latency reductions don't
| apply to replicating the writes to S3 and/or any replica
| servers, that still takes as long as it would to any other
| server across a network. The latency reductions are _only_
| for pure read traffic. Also, every company I ever worked at
| had a policy to run at least two instances of a service in
 | case of hardware failure. (Is it reasonable to
 | extrapolate this policy to a company which might want to
 | run on a single sqlite instance? I don't know, but just as
 | a datapoint I don't think any business should strive to run
 | on a single instance.)
|
| This write latency _might_ be fine, although more than one
| backend app I know renewed the expiry time of a user
| session on every hit and would thus do at least one DB
 | write per HTTP call. I don't think this is optimal, but it
 | does happen, and simply going "well don't do write traffic
 | then" does not always line up with how apps are actually
 | built. Replicated sqlite over litestream is very cool, but
 | definitely something you need to build your app around, and
 | also definitely something that costs you one of your
 | innovation tokens.
| tptacek wrote:
| There's no magic here (that there is no magic is part of
| the point). You have the same phenomenon in n-tier
| Postgres deployments: to be highly available, you need
| multiple instances; you're going to have a write leader,
 | because you don't realistically want to run a Raft
| consensus for every write; etc.
|
| The point of the post is just that if you can get rid of
| most of the big operational problems with using server-
| side SQLite in a distributed application --- most
| notably, failing over and snapshotting --- then SQLite
| can occupy a much more interesting role in your stack
| than it's conventionally been assigned. SQLite has some
| very attractive properties that have been largely ignored
| because people assume they won't be able to scale it out
| and manage it. Well, you can scale it out and manage it.
| Now you've got an extremely simple database layer that's
| easy to reason about, doesn't require you to run a
| database server (or even a cache server) next to all your
| app instances, and happens to be extraordinarily fast.
|
| Maybe it doesn't make sense for your app? There are
| probably lots of apps that really want Postgres and not
| SQLite. But the architecture we're proposing is one
| people historically haven't even considered. Now, they
| should.
|
| I'm not sure "litestream replicate <file>" really costs a
| whole innovation token. It's just SQLite. You should get
| an innovation rebate for using it. :)
| toolz wrote:
| I have to imagine having your service highly available
| (i.e. you need a failover machine) is far more likely to be
| the reason to need multiple machines than exhausting the
| resources on some commodity tier machine.
| ok_dad wrote:
| With Postgres, you might have one server, or one cluster of
| servers that are coordinated, and then inside there you have
| tables with users and the users' data with foreign keys tying
| them together.
|
| With SQLite, you would instead have one database (one file)
| per user as close to the user as possible that has all of the
| user's data and you would just read/write to that database.
| If your application needs to aggregate multiple user's data,
| then you use something like Litestream to routinely back it
| up to S3, then when you need to aggregate data you can just
| access it all there and use a distributed system to do the
| aggregation on the SQLite database files.
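A minimal sketch of that database-per-user layout (the `data/` directory and the `notes` schema are hypothetical, just for illustration):

```python
import pathlib
import sqlite3

def user_db(user_id: str) -> sqlite3.Connection:
    # One SQLite file per user/tenant; a tool like Litestream would then
    # replicate each file to object storage independently.
    path = pathlib.Path("data") / f"{user_id}.db"
    path.parent.mkdir(exist_ok=True)
    con = sqlite3.connect(path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, body TEXT)"
    )
    return con

# Used as a context manager, the connection commits on success.
with user_db("alice") as con:
    con.execute("INSERT INTO notes (body) VALUES ('hello')")
```

In a real app you'd validate `user_id` before using it in a filename.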
| danappelxx wrote:
 | Hold on, doesn't one-database-per-user totally forfeit all
| ACID guarantees? You can't do cross-database transactions
| (to my knowledge), which means you can end up with
| corrupted data during aggregations. What am I missing?
| mwcampbell wrote:
| One database per tenant only makes sense in multi-tenant
| applications that don't have any cross-tenant actions. I
| imagine there are many B2B applications that fall into
| this category.
| nicoburns wrote:
| > If you don't need that, you might as well read the entire
| dataset into memory and be done with it.
|
 | Over in-memory data structures, SQLite gives you:
|
| - Persistence
|
| - Crash tolerance
|
| - Extremely powerful declarative querying capabilities
|
| > if you have read-only traffic you don't need sqlite
| replication.
|
| I agree with you that the main use-case here is backup and
| data durability for small apps. Which is pretty big deal, as
| a database server is often the most expensive part of running
| a small app. That said, there are definitely systems where
| latency of returning a snapshot of the data is important, but
| which snapshot isn't (if updates take a while to percolate
| that's fine).
| mrkurt wrote:
| Litestream does a couple of things. It started as a way to
| continuously back sqlite files up to s3. Then Ben added read
| replicas - you can configure Litestream to replicate from a
| "primary" litestream server. It's still limited to a single
| writer, but there's no s3 in play. You get async replication
| to other VMs: https://github.com/fly-apps/litestream-base
|
| We have a feature for redirecting HTTP requests that perform
| writes to a single VM. This makes Litestream + replicas
| workable for most fullstack apps:
| https://fly.io/blog/globally-distributed-postgres/
|
| It's not a perfect setup, though. You have to take the writer
| down to do a deploy. The next big Litestream release should
| solve that, and is part of what's teased in the post.
| throwoutway wrote:
| > We have a feature for redirecting HTTP requests that
| perform writes to a single VM. This makes Litestream +
| replicas workable for most fullstack apps:
| https://fly.io/blog/globally-distributed-postgres/
|
 | Thereby making it a constraint and (without failover) a
 | single point of failure? What's the upper limit here?
| tptacek wrote:
| This constraint is common to most n-tier architectures
| (with Postgres or MySQL) as well. Obviously, part of
| what's interesting about Litestream is that it simplifies
| fail-over with SQLite.
| a-dub wrote:
| if you can tolerate eventual consistency and have the disk/ram
| on the application vms, then sure, keeping the data and the
| indices close to the code has the added benefit of keeping
| request latency down.
|
| downside of course is the complexity added in synchronization,
| which is what they're tackling here.
|
| personally i like the idea of per-tenant databases with
| something like this to scale out for each tenant. it encourages
| architectures that are more conducive for e2ee or procedures
| that allow for better guarantees around customer privacy than
| big central databases with a customer id column.
| judofyr wrote:
| > Latency is the exact reason you would have a problem scaling
| any large system in the first place.
|
 | Let's not forget why we started using a separate database
 | server in the first place...
|
| A web server does quite a lot of things: Parsing/formatting
| HTTP/JSON/HTML, restructuring data, calculating stuff. This is
| typically very separate from the data loading aspect and as you
| get more requests you'll have to put more CPU in order to keep
| up (regardless of the language).
|
| By separating the web server from the database server you
| introduce more latency in favor of enabling scalability. Now
| you can spin up hundreds of web servers which all talk to a
| single database server. This is a typical strategy for
| scalability: _decouple_ the logic and _scale up individually_.
|
| If you couple them together it's more difficult to scale. First
 | of all, in order to spin up a server you need a full copy of
| the database. Good luck autoscaling on-demand! Also, now every
 | write will have to be replicated to _all_ the readers. That's
| a lot more bandwidth.
|
 | There are _definitely_ use cases for Litestream, but it's far
| from a replacement for your typical Node + PostgreSQL stack. I
| can see it being useful as a lower-level component: You can use
| Litestream to build your "own" database server with customized
| logic which you can talk to using an internal protocol (gRPC?)
| from your web servers.
| nicoburns wrote:
| > There are definitely use cases for Litestream, but it's far
| from a replacement for your typical Node + PostgreSQL stack
|
 | If you're using a language like Node.js then horizontal scaling
| makes a lot of sense, but I've been working with Rust a lot
| recently. And Rust is so efficient that you typically end up
| in a place where a single application server can easily
| saturate the database. At that point moving them both onto
| the same box can start to make sense.
|
 | This is especially true for low-traffic apps. I could
| probably run most of my Rust apps on a VM with 128MB RAM (or
| even less) and not even a whole CPU core and still get
| excellent performance. In that context, sticking a SQLite
| database that backs up to object storage on the same box
| becomes very attractive from a cost perspective.
| judofyr wrote:
| This is "vertical scaling" and that is indeed a very valid
| approach! You just have to be aware that vertical scaling
| has some fundamental limits and it's going to suck big time
 | if it comes as a surprise to you.
| mwcampbell wrote:
| Considering that more powerful machines continue to
| become more affordable, it's a safe bet that most of us
| will never hit those limits.
| Karrot_Kream wrote:
| Not sure about that. It would be smarter to just failure
| test your apps. Once you cross some threshold, you scale.
| Lots of companies build formulas costing out their cloud
| spend based on infra needs and failure tests.
| judofyr wrote:
| Alternatively, instead of just betting on it, you could
| do a benchmark, figure out the limits of your system and
| check if your current implementation is capable of
| handling the future needs.
| [deleted]
| tptacek wrote:
| I don't think anyone's seriously arguing that the n-tier
| database architecture is, like, intrinsically bankrupt. Most
| applications are going to continue to be built with Postgres.
| We like Postgres; we have a Postgres offering; we're friends
| with Postgres-providing services; our product uses Postgres.
|
| The point the post is making is that we think people would be
| surprised how far SQLite can get a typical application.
| There's a clear win for it in the early phases of an
| application: managing a database server is operationally (and
| capitally) expensive, and, importantly, it tends to pin you
| to a centralized model where it really only makes sense for
| your application to run in Ashburn --- every request is
 | getting backhauled there anyways.
|
| As the post notes, there's a whole ecosystem of bandaids ---
| err, tiers --- that mitigate this problem; it's one reason
| you might sink a lot of engineering work into a horizontally-
| scaling sharded cache tier, for instance.
|
| The alternative the post proposes is: just use SQLite. Almost
| all of that complexity melts away, to the point where even
| your database access code in your app gets simpler (N+1 isn't
| a game-over problem when each query takes microseconds). Use
| Litestream and read-only replicas to scale read out
| horizontally; scale the write leader vertically.
|
| Eventually you'll need to make a decision: scale "out" of
| SQLite into Postgres (or CockroachDB or whatever), or start
| investing engineering dollars into making SQLite scale (for
| instance: by using multiple databases, which is a SQLite
| feature people sleep on). But the bet this post is making is
| that the actual value of "eventually" is "surprisingly far
| into the future", "far enough that it might not make sense to
| prematurely optimize for it", especially early on when all
| your resources, cognitively and financially and temporally,
| are scarce.
|
| We might be very wrong about this! There isn't an interesting
| blog post (or technical bet) to make about "I'm all in on the
| n-tier architecture of app servers and database servers".
| We're just asking people to think about the approach, not
| saying you're crazy if you don't adopt it.
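The "multiple databases" feature mentioned above is `ATTACH DATABASE`; a quick sketch (the file and table names are made up):

```python
import sqlite3

# One connection can address several database files at once, which is
# one way to shard or partition data across SQLite files.
con = sqlite3.connect("main.db")
con.execute("ATTACH DATABASE 'analytics.db' AS analytics")
con.execute("CREATE TABLE IF NOT EXISTS main.users (id INTEGER PRIMARY KEY)")
con.execute(
    "CREATE TABLE IF NOT EXISTS analytics.events (user_id INTEGER, name TEXT)"
)

# Cross-database joins just work, using the schema-qualified names.
con.execute("""
    SELECT u.id, e.name
    FROM main.users AS u
    JOIN analytics.events AS e ON e.user_id = u.id
""")
```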
| ithrow wrote:
| As they say, "you are not twitter" ;)
|
| Access to monstrous machines is easy today and you have very
| fast runtimes like Go and the JVM that can leverage this
| hardware.
| plesiv wrote:
| I absolutely love this. I think the so-called n-tier
| architecture as a pattern should be aggressively battled in an
| attempt to reduce the n. Software is so much more reliable when
| the communication between different computational modules of the
| system is function calls as opposed to IPC calls. Why does
| everything that computes something or provides some data need to
| be a process? It doesn't.
|
| PostgreSQL and every other server/process should have first-class
| support for a single CLI command that: spins up the DB that
| slurps up the config and the data storage, takes the SQL command
| provided through the CLI arguments, runs it, returns results and
| terminates. Effectively, every server/process software should be
| a library first, since it's easy to make a server out of a
| library and the reverse is anything but.
| jjeaff wrote:
| If you want to maintain much of the data in memory, wouldn't
| that require a process?
| plesiv wrote:
| Sure. If you need your software to be a process I think you
| should build it to be both: a library first and a process
| second. Libraries are so much easier to use, test and reason
| about.
| beck5 wrote:
| I have found it easy to overload SQLite with too many write
| operations (20+ concurrently). Is this the typical behaviour
| referred to in the post, or is this a write-heavy workload?
| Scarbutt wrote:
| How big are the writes? are you storing blobs?
| benbjohnson wrote:
| It can depend on a lot of factors, such as the journaling mode
| you're using and your hardware. SQLite has a
| single-writer-at-a-time restriction, so it's important to
| manage the size of your writes. I typically see very good
| write throughput using WAL mode and synchronous=normal on
| modern SSDs.
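For the curious, a hedged sketch of the settings described above, using Python's stdlib sqlite3 (the busy_timeout pragma is an extra assumption, not mentioned in the comment):

```python
import sqlite3

# A sketch of the settings mentioned above. WAL mode lets readers
# proceed while a write is in flight; synchronous=NORMAL drops one
# fsync per transaction (under WAL a crash can lose the newest
# transactions but cannot corrupt the database file).
conn = sqlite3.connect("app.db")
conn.execute("PRAGMA journal_mode=WAL")
conn.execute("PRAGMA synchronous=NORMAL")
# busy_timeout (an assumption, not from the comment) makes writers
# wait up to 5s instead of failing immediately with SQLITE_BUSY
# when the single writer slot is taken.
conn.execute("PRAGMA busy_timeout=5000")
print(conn.execute("PRAGMA journal_mode").fetchone()[0])  # wal
```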
| NeutralForest wrote:
| There's something I don't understand: it says that the "data is
| next to the application". What does that mean? Where is it
| stored and how is it accessed by the application?
| tptacek wrote:
| The data lives in a file the application reads/writes directly
| (and in a cache that the sqlite libraries can park inside the
| application itself). The point is that you're not calling out
| over the network to a "database server"; your app server is the
| database server.
| ledauphin wrote:
| it means the data is stored in a file on the local drive of a
| computer that is also running the application.
|
| it also means that it is the application itself (via the SQLite
| library) that reads and modifies that database file. There is
| no separate database process.
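A minimal illustration of the in-process model described above, using Python's stdlib sqlite3 module; the file and table names are made up:

```python
import sqlite3

# "The app server is the database server": the sqlite3 module links
# the database engine into this process, so reads and writes go
# straight to the local file with no network hop and no separate
# database process.
conn = sqlite3.connect("local.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, body TEXT)"
)
conn.execute(
    "INSERT INTO notes (body) VALUES (?)", ("hello from in-process SQL",)
)
conn.commit()
count = conn.execute("SELECT COUNT(*) FROM notes").fetchone()[0]
print(count)
conn.close()
```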
| anyfactor wrote:
| Story time!
|
| A client told me that they will use a DigitalOcean droplet for a
| web app. Because the database was very small I chose to use
| SQLite3.
|
| After delivery, the client said their devops guy wasn't
| available and they would like to deploy to Heroku instead.
| Heroku, being an ephemeral cloud service, couldn't handle the
| same-directory SQLite3 db I had there. The only solution was
| to use their Postgres database service.
|
| For some reason, it was infuriating that I had to use a
| database like that to store a few thousand rows of data.
| Moreover, I would have had to rewrite a ton of stuff to
| accommodate the change to Postgres.
|
| I ended up using firestore.
|
| ---
|
| I think something like this could have saved me a ton of hassle
| that day.
| luhn wrote:
| It was too much work to migrate from SQLite to PostgreSQL, so
| you migrated to... a NoSQL DB?
| pjot wrote:
| I think they're referring to the trade from managing one
| system (DO + SQLite) to two (Heroku + pg), and choosing
| Firestore instead as it's only one system to manage.
| [deleted]
| szundi wrote:
| He wrote it was a "day" at the end. This guy is fast.
| me_me_mu_mu wrote:
| Please let me know if you've ever had to move data out of
| firestore. I'm currently using firestore for some real time
| requirements but the data is written to Postgres before the
| relevant data for real time needs (client needs to show some
| data updating constantly) is written to firestore.
|
| Just curious if you've ever had to migrate data out of
| firestore.
| ilrwbwrkhv wrote:
| For how much?
| benbjohnson wrote:
| Litestream author here. I've been on the fence about disclosing
| the amount. I'm generally open about everything but I know some
| people get weird about money stuff. I'm also autistic so I tend
| to not navigate social norms very well. That all being said,
| the project was acquired for $500k.
| scottlamb wrote:
| Thanks for sharing that. I've never really looked at open
| source projects as acquisition targets. I see in another
| comment that you're going to continue releasing it under the
| Apache license. It's easy for me to see why fly.io would want
| to hire you, with an agreed percentage (anywhere from
| 0%-100%) of your time continuing to go into Litestream. If
| you forgive the blunt question, what more do they get for the
| $500k (acquisition cost / signing bonus)? (Part of me is
| wondering if an open source project of mine, which various
| startups have shown some degree of interest in, is holding a
| significant payday I hadn't realized. Probably not, but it
| seems more possible than a moment ago.)
| tartakovsky wrote:
| I would also be interested in understanding whether there
| is a proper pricing model for such things. Wordle comes to
| mind. Or a friend who has an iPad app that took 2 years to
| build that is something novel but not released. Some
| projects are open-source and some aren't. Some are acquired
| for users and some are acqui-hired for continued
| development. Any interesting advice or links here for folks
| that don't want to be founders but want to make a solid
| chunk of cash, have an expertise of value and love the
| development work?
| benbjohnson wrote:
| There's not any real pricing model that I know of. I
| think it comes down to a question of what value an
| acquisition brings and that's always kinda fuzzy. If you
| want specific numbers, the project was at ~5k GitHub
| stars at the time of acquisition so I guess it's a
| hundred bucks per star. :)
| benbjohnson wrote:
| Good question. I think the folks at Fly realize that they
| get a lot of benefit from enabling open source projects
| that work well on their platform. They have a somewhat
| similar approach with the Phoenix project in that they
| hired Chris McCord to work on it full-time.
|
| Litestream has a lot of potential in being a lightweight,
| fast, globally-distributed database and that aligns really
| well with Fly. Continuing to release it as open source
| means more folks can benefit from it and give feedback --
| even if they don't use it on Fly.
| [deleted]
| mtlynch wrote:
| Super cool! Congrats, Ben!
|
| I've been building all of my projects for the last year with
| SQLite + fly.io + Litestream. It's already such a great
| experience, but I'm excited to see what develops now that
| Litestream is part of fly.
| learndeeply wrote:
| Since both Fly.io and Litestream founders are here - why not
| disclose the price?
| benbjohnson wrote:
| Litestream author here. I just posted it as a reply here:
| https://news.ycombinator.com/item?id=31319556
| tiffanyh wrote:
| @dang, the actual title is " I'm All-In on Server-Side SQLite"
|
| Maybe I missed it but where in the article does it say Fly
| acquired Litestream?
|
| EDIT: Ben Johnson says he just joined Fly. Nothing about Fly
| "acquiring" Litestream.
|
| https://mobile.twitter.com/benbjohnson/status/15237489883352...
| gamblor956 wrote:
| "Litestream has a new home at Fly.io, but it is and always will
| be an open-source project"
|
| Very bottom of the post. Technically, Litestream remains an
| open-source project, so it's more accurate to say that Fly.io
| acquired the brand IP and the owner of that IP.
| lnsp wrote:
| > Litestream has a new home at Fly.io, but it is and always
| will be an open-source project. My plan for the next several
| years is to keep making it more useful, no matter where your
| application runs, and see just how far we can take the SQLite
| model of how databases can work.
|
| As far as I understood it, Fly.io hired the person working on
| Litestream and pays them to keep working on Litestream.
| tiffanyh wrote:
| That's how I understood it and that's radically different
| than how this HN post got titled.
|
| Ben Johnson confirms how you framed it here:
|
| https://mobile.twitter.com/benbjohnson/status/15237489883352.
| ..
| tptacek wrote:
| We wrote a different title for this blog post, and we did
| in fact buy Litestream (to the extent that anyone can "buy"
| a FOSS project, of course).
| bussetta wrote:
| The tweet[1] links the blog post and says Litestream is part of
| fly.io now.
|
| [1]https://twitter.com/flydotio/status/1523743433109692416
| jrochkind1 wrote:
| While the title is about a business acquisition, the article is
| mostly about the technology itself -- replicating SQLite,
| suggested as a superior option to a more traditional separate-
| process rdbms, for real large-scale production workloads.
|
| I'd be curious to hear reactions to/experiences with that
| suggestion/technology, inside or outside the context of fly.io.
| LunaSea wrote:
| I wonder if we'll ever see an embedded version of PostgreSQL?
| nicoburns wrote:
| That's basically what SQLite is (notably, SQLite makes an
| effort to be compatible with Postgres's SQL syntax). If you
| mean based off the actual PostgreSQL codebase, then I highly
| doubt it.
| LunaSea wrote:
| I doubt it as well.
|
| That's sad though because SQLite is really missing a lot of
| features that PostgreSQL has.
| nicoburns wrote:
| > That's sad though because SQLite is really missing a lot
| of features that PostgreSQL has.
|
| It is, but luckily it's not standing still. It's added JSON
| support and window functions in recent years for example.
| melony wrote:
| Note that the popular Node.js ORM Prisma does not support WAL.
|
| https://github.com/prisma/prisma/issues/3303
| tylergetsay wrote:
| It also crashes if you try to write to the DB while it's open
| https://github.com/prisma/prisma/issues/2955
| quintes wrote:
| What's the use case here, a single web app with inproc db?
|
| More complex use cases?
|
| I remember I could do this on Azure at one point with App
| Services; not sure if it's still a thing. But heavy writes and
| scaling of those types of apps would lead you to rethink this
| approach, right?
| paulhodge wrote:
| Wow, Litestream sounds really interesting to me. I was just
| starting on an architecture (either stupid or genius) of
| using many SQLite databases on the server. Each user's account
| gets their own SQLite file. So the service's horizontal scaling
| is good (similar to the horizontal scaling of a document DB), and
| it naturally mitigates data leaks/injections. Also opens up a few
| neat tricks like the ability to do blue/green rollouts for schema
| changes. Anyway Litestream seems pretty ideal for that, will be
| checking it out!
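A hypothetical sketch of the per-account layout described here; all names are illustrative, and the id check is just one obvious precaution:

```python
import os
import re
import sqlite3

# Hypothetical sketch of the architecture above: one SQLite file
# per user account, opened on demand. DB_DIR and the schema are
# made-up names, not from the comment.
DB_DIR = "user_dbs"

def db_for_user(user_id: str) -> sqlite3.Connection:
    # Reject ids that could escape the directory (path traversal).
    if not re.fullmatch(r"[A-Za-z0-9_-]+", user_id):
        raise ValueError("bad user id")
    os.makedirs(DB_DIR, exist_ok=True)
    conn = sqlite3.connect(os.path.join(DB_DIR, f"{user_id}.db"))
    conn.execute(
        "CREATE TABLE IF NOT EXISTS docs (id INTEGER PRIMARY KEY, body TEXT)"
    )
    return conn

conn = db_for_user("alice")
conn.execute("INSERT INTO docs (body) VALUES ('only alice sees this')")
conn.commit()
```

Because each file is an independent database, a schema migration can be rolled out account by account, which is what makes the blue/green trick above workable.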
| freedomben wrote:
| I actually did something very similar to this for an app that
| produced _a lot_ of data. I wrote a small middleware that
| automatically figured out which shard to use so the app logic
| could pretend that it was all just one big db. The app
| ultimately ended up in the can so it never needed to scale, but
| I always wonder how it would have gone.
| Scarbutt wrote:
| _Each user 's account gets their own SQLite file._
|
| So now you need one database connection per user...
| robertlagrant wrote:
| If by connection you mean in-process database.
| freedomben wrote:
| Without knowing details about the app, it's hard to know if
| that would matter. If a small number of concurrent users
| would ever be using it, I would think it would be NBD.
| tptacek wrote:
| And? It's SQLite; it's a file handle and some cache, not a
| connection pool.
| mwcampbell wrote:
| Depending on how you define "account", that can be quite
| reasonable. In a B2B application, each business customer
| could get their own SQLite database, and the number of SQLite
| connections would likely be quite manageable, even though
| some customers have many users.
| mwcampbell wrote:
| An architecture like yours has certainly been done before,
| though AFAIK it never went mainstream. In particular, check out
| this post from Glyph Lefkowitz of Twisted Python fame,
| particularly the section about the (apparently dead) Mantissa
| application server:
|
| https://glyph.twistedmatrix.com/2008/06/this-word-scaling.ht...
| [deleted]
| ok_dad wrote:
| I was just about to start using this for a project, I hope the
| license won't change.
|
| Congrats to the author though, no matter what! I wish everyone
| could be so successful.
| [deleted]
| benbjohnson wrote:
| Litestream author here. It'll continue to be open source under
| an Apache 2 license.
| wasd wrote:
| Fly is putting together a pretty great team and interesting tech
| stack. It's the service I see as a true disruptor to Heroku
| because it's doing something novel (not just cheaper).
|
| I'm still a little murky on the tradeoffs with Fly (and
| litestream). @ben / @fly, you should write a tutorial on hosting
| a todo app using rails with litestream and any expected hurdles
| at different levels of scale (maybe comparing to Heroku).
| pbowyer wrote:
| Not surprised. Congratulations Ben!
| endisneigh wrote:
| What's an example of a popular app (more than 100K users) that
| uses Litestream? Curious to see what this looks like in
| production.
| [deleted]
| jkaplowitz wrote:
| Tailscale: https://tailscale.com/blog/database-for-2022/
|
| I don't know their user count, but they are growing well and
| just raised their Series B.
| benbjohnson wrote:
| Litestream author here. That's a good question. There's not
| very good visibility into open source usage so it's hard to say
| unless folks write blog posts about it. For example, I know
| Tailscale runs part of their infrastructure with SQLite &
| Litestream[1].
|
| I wrote a database called BoltDB before and I have no idea how
| widespread it is exactly. It's used in a lot of open source
| projects like Consul & etcd but I don't know anything about
| non-public usage.
|
| [1]: https://tailscale.com/blog/database-for-2022/
| gfd wrote:
| For non-public usages, I remember Boltdb being named as one
| of the root causes that took down Roblox for three days!
| https://blog.roblox.com/2022/01/roblox-return-to-
| service-10-...
| benbjohnson wrote:
| Yep! That's usually how I find out usage inside companies.
| :)
| [deleted]
| kall wrote:
| I am as obsessed with sub 100ms responses as the people at
| fly.io, so I think the one writer and many, many readers
| architecture is smart and fits quite a few applications. When
| litestream adds actual replication it will get really exciting.
|
| > it won't work well on ephemeral, serverless platforms or when
| using rolling deployments
|
| That's... a lot of new applications these days.
| mwcampbell wrote:
| > it won't work well on ephemeral, serverless platforms or when
| using rolling deployments
|
| I assumed that was what Fly was hiring Ben to work on.
| emptysea wrote:
| Yeah the rolling deployments gotcha really stuck out to me. I
| think most PaaS will provide that by default anyways because
| who wants downtime during deploys?
| mwcampbell wrote:
| mrkurt specifically mentioned that a solution for that is in
| the works. https://news.ycombinator.com/item?id=31319544
| thdxr wrote:
| in practice how do you make a single application node the writer?
|
| do you now need your nodes to be clustered + electing a leader
| and shipping writes there?
|
| I know fly.io did this with PG + Elixir, but BEAM makes this
| type of stuff pretty easy.
___________________________________________________________________
(page generated 2022-05-09 23:00 UTC)