[HN Gopher] In Search of a Faster SQLite
___________________________________________________________________
In Search of a Faster SQLite
Author : avinassh
Score : 164 points
Date : 2024-12-16 16:46 UTC (6 hours ago)
(HTM) web link (avi.im)
(TXT) w3m dump (avi.im)
| high_byte wrote:
| "The benefits become noticeable only at p999 onwards; for p90 and
| p99, the performance is almost the same as SQLite."
|
| I hate to be a hater, and I love sqlite and optimizations, but
| this is true.
| bawolff wrote:
| The benchmark seems a bit weird. Fetch 100 results from a table
| with no filtering, sorting, or anything? That feels like the IO
| is going to be really small anyway.
| tsegratis wrote:
| They compare threads and coroutines for Limbo. Threads have
| much worse p90 latencies since they context switch... I'm not
| sure they can draw any conclusions except that coroutines are
| faster (of course).
| efitz wrote:
| The article discusses the specific use case of serverless
| computing, e.g. AWS Lambda, and how a central database doesn't
| always work well with apps constructed in a serverless fashion.
|
| I was immediately interested in this post because 6-7 years ago I
| worked on this very problem - I needed to ingest a set of complex
| hierarchical files that could change at any time, and I needed to
| "query" them to extract particular information. FaaS is costly
| for computationally expensive tasks, and it also didn't make
| sense to load big XML files and parse them every time I needed to
| do a lookup in any instance of my Lambda function.
|
| My solution was to have a central function on a timer that read
| and parsed the files every couple of minutes, loaded the data
| into a SQLite database, indexed it, and put the file in S3.
|
| Now my functions just downloaded the file from S3, if it was
| newer than the local copy or on a cold start, and did the lookup.
| Blindingly fast and no duplication of effort.
|
| One of the things that is not immediately obvious from Lambda is
| that it has a local /tmp directory that you can read from and
| write to. Also the Python runtime includes SQLite; no need to
| upload code besides your function.
|
| I'm excited that work is going on that might make such solutions
| even faster; I think it's a very useful pattern for distributed
| computing.
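|
| A minimal sketch of the pattern (assumes boto3 and the stdlib
| sqlite3; the bucket, key, and table names are hypothetical):
|
|     import os
|     import sqlite3
|     import boto3
|
|     # Hypothetical names - adjust for your setup.
|     BUCKET, KEY = "my-bucket", "lookup.db"
|     LOCAL = "/tmp/lookup.db"
|     s3 = boto3.client("s3")
|
|     def refresh_db():
|         # Download only on a cold start or when S3 has a
|         # newer copy than the one cached in /tmp.
|         head = s3.head_object(Bucket=BUCKET, Key=KEY)
|         remote = head["LastModified"].timestamp()
|         if (not os.path.exists(LOCAL)
|                 or os.path.getmtime(LOCAL) < remote):
|             s3.download_file(BUCKET, KEY, LOCAL)
|
|     def lookup(name):
|         refresh_db()
|         with sqlite3.connect(
|                 f"file:{LOCAL}?mode=ro", uri=True) as conn:
|             return conn.execute(
|                 "SELECT value FROM items WHERE name = ?",
|                 (name,)).fetchone()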
| rmbyrro wrote:
| > Now my functions just downloaded the file from S3, if it was
| newer than the local copy
|
| If you have strong consistency requirements, this doesn't work.
| Synchronizing clocks reliably between different servers is
| surprisingly hard. You might end up working with stale data. It
| might work for use cases that can accept eventual consistency.
| Spivak wrote:
| I don't really know if that matters for this use case. Just
| by the very nature of source_data -> processing -> dest_data
| taking nonzero time, anything consuming dest_data must already
| be tolerant of some amount of lag. And the way it's coded
| guarantees you can never observe dest_data going new -> old
| -> new.
| jedberg wrote:
| One of the announcements from AWS this year at re:Invent is
| that they can now guarantee that instances' clocks are
| synced within microseconds of each other. Close enough that
| you can rely on them for distributed timekeeping.
| Dylan16807 wrote:
| If you have strong consistency requirements, then it doesn't
| work by the very nature of making multiple copies of the
| database. Even if the clocks are perfect. (Though the clocks
| are probably close enough that it doesn't matter.)
| efitz wrote:
| I versioned the databases and my use case didn't require
| strong consistency.
| iskela wrote:
| Wouldn't e-tag version numbers also work? Or just having a .json
| with version metadata next to the db blob? No need to sync
| clocks. Just GET the small db-ver.json and compare version
| details?
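|
| Something like this, perhaps (a rough boto3 sketch; the names
| are made up). On a warm invocation the cached ETag turns the
| GET into a cheap 304:
|
|     import boto3
|     from botocore.exceptions import ClientError
|
|     s3 = boto3.client("s3")
|     _etag = None  # survives across warm invocations
|
|     def fetch_if_changed(bucket, key, local_path):
|         global _etag
|         kwargs = {"Bucket": bucket, "Key": key}
|         if _etag:
|             kwargs["IfNoneMatch"] = _etag
|         try:
|             obj = s3.get_object(**kwargs)
|         except ClientError as e:
|             status = e.response["ResponseMetadata"][
|                 "HTTPStatusCode"]
|             if status == 304:
|                 return False  # unchanged, keep local copy
|             raise
|         with open(local_path, "wb") as f:
|             f.write(obj["Body"].read())
|         _etag = obj["ETag"]
|         return True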
| viccis wrote:
| > One of the things that is not immediately obvious from Lambda
| is that it has a local /tmp directory that you can read from
| and write to.
|
| The other big thing a lot of people don't know about Python on
| Lambda is that your global scope is also persisted for that
| execution context's lifetime like /tmp is. I ran into issues at
| one point with Lambdas that processed a high volume of data
| getting intermittent errors connecting to S3. An AWS engineer
| told me to cache my boto3 stuff (session, client, resources,
| etc.) in the global namespace, and that solved the problem
| overnight.
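|
| Concretely, something like this (a minimal sketch; the handler
| body is hypothetical):
|
|     import boto3
|
|     # Module scope: created once per execution context on
|     # cold start, then reused by every warm invocation.
|     session = boto3.session.Session()
|     s3 = session.client("s3")
|
|     def handler(event, context):
|         # Reuses the cached client and its connection pool
|         # instead of rebuilding them per invocation.
|         obj = s3.get_object(Bucket=event["bucket"],
|                             Key=event["key"])
|         return {"size": obj["ContentLength"]}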
| moduspol wrote:
| We have that issue at work, though I solved it by including the
| sqlite database within the container image that we use. We then
| deploy the new container image (with the same code as before,
| but with a revised database file) at most every fifteen
| minutes.
|
| This gives you an atomic point at which you are 100% confident
| all instances are using the same database, and by provisioning
| concurrency, you can also avoid a "thundering herd" of
| instances all fetching from the file on S3 at startup (which
| can otherwise lead to throttling).
|
| Of course, that's only feasible if it's acceptable that your
| data can be stale for some number of minutes, but if you're
| caching the way you are, and periodically checking S3 for an
| updated database, it probably is.
| efitz wrote:
| I actually versioned my database file - I had a small
| metadata table with version number and creation time.
|
| Then in the output from each of my other functions, I
| included the database version number. So all my output could
| be subsequently normalized by re-running the same input
| versus an arbitrary version of the database file.
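|
| A minimal sketch of that metadata table (the schema here is
| illustrative, not the exact one I used):
|
|     import sqlite3
|
|     def build_db(path, version):
|         conn = sqlite3.connect(path)
|         conn.executescript("""
|             CREATE TABLE meta (
|                 version INTEGER NOT NULL,
|                 created_at TEXT NOT NULL
|             );
|         """)
|         conn.execute(
|             "INSERT INTO meta VALUES (?, datetime('now'))",
|             (version,))
|         conn.commit()
|         return conn
|
|     def db_version(conn):
|         # Stamp every function's output with this value so
|         # results can be re-normalized later against the
|         # same database version.
|         return conn.execute(
|             "SELECT version FROM meta").fetchone()[0]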
| jrochkind1 wrote:
| > "thundering herd" of instances all fetching from the file
| on S3 at startup (which can otherwise lead to throttling).
|
| Have any "thundering herd" problems with S3, including
| throttling, actually been seen?
|
| I think S3 is advertised to have no concurrent connection
| limit, and to support at least 5,500 GETs per second (per
| "prefix", though I'm not sure what that means exactly in
| practice). I don't think S3 ever applies intentional
| throttling, although of course if you exceed its capacity to
| deliver data you will see "natural" throttling.
|
| Do you have a fleet big enough that you might be exceeding
| those limits, or have people experienced problems even well
| under these limits, or is it just precautionary?
| moduspol wrote:
| Sorry--the throttling was at the AWS Lambda layer, not S3.
| We were being throttled because we'd deploy a new container
| image and suddenly thousands of them are all simultaneously
| trying to pull the database file from S3.
|
| We aim to return a response in the single digit
| milliseconds and sometimes get tens of thousands of
| requests per second, so even if it only takes a second or
| two to fetch that file from S3, the request isn't getting
| served while it's happening, and new requests are coming
| in.
|
| You very quickly hit your Lambda concurrency limit and get
| throttled just waiting for your instances to fetch the
| file, even though logically you're doing exactly what you
| planned to.
|
| By having the file exist already in the container image,
| you lean on AWS's existing tools for a phased rollout to
| replace portions of your deployment at a time, and every
| one is responding in single digit milliseconds from its
| very first request.
|
| EDIT: The same technique could be applied for other
| container management systems, but for stuff like Kubernetes
| or ECS, it might be simpler to use OP's method with a
| readiness check that only returns true if you fetched the
| file successfully. And maybe some other logic to do
| something if your file gets too stale, or you're failing to
| fetch updates for some reason.
| up2isomorphism wrote:
| It is so painful to see so many people make wrong use of S3.
| lcnPylGDnU4H9OF wrote:
| Is there a list of correct uses of s3 we can all follow?
| akira2501 wrote:
| Yep. Right here. [0].
|
| Generally people ignore the per-PUT and per-GET pricing on S3,
| along with the higher latency since it's a "global"
| service. If your objects are small then you almost
| always benefit from using DynamoDB, as the GET pricing and
| latency are far more favorable, as long as you don't mind
| the region dependency or the multi-region setup.
|
| [0]: https://docs.aws.amazon.com/AmazonS3/latest/userguide/
| optimi...
| compootr wrote:
| storing massive amounts of unstructured data
| refulgentis wrote:
| Are we sure edge computing providers have io_uring enabled? It is
| disabled in, inter alia, ChromeOS and Android, because it's been a
| significant source of vulnerabilities. Seems deadly in a multi-
| tenant environment.
| ncruces wrote:
| Their goal is to run this on their own cloud.
|
| Despite their lofty claims about community building, their
| projects are very much about advancing their own use case.
|
| Given that SQLite is public domain, they're not required to
| give anything back. So, it's very cool that they're making
| parts of their tech FOSS.
|
| But I've yet to see anything coming from them that isn't "just
| because we need it, and SQLite wouldn't do it for us."
|
| There's little concern about making things useful to others,
| and very little community consensus about any of it.
|
| https://turso.tech/
| tracker1 wrote:
| Of course they are scratching their own itch, so to speak.
| That's what companies do. I think the fact that they are doing
| so much in the open is an indication of good stewardship
| itself. I'm not sure what else they would do or release that
| they didn't need internally. For that matter, I'm not really
| aware of many significant contributions to FLOSS at all that
| aren't initially intended for company use, that's kinda how
| it works. Where I'm surprised here is how much secret sauce
| Turso is sharing at all.
| ncruces wrote:
| I have no problem with them scratching their itch. That's
| par for the course.
|
| I'm salty about them describing the SQLite licensing,
| development model, and code of ethics as almost toxic,
| setting up a separate entity with a website and a manifesto
| promising to do better, and then folding "libSQL into the
| Turso family" within the year.
|
| They forked, played politics, added a few features (with
| some ill-considered incompatibilities), and properly
| documented _zero_ of them.
|
| And I'm salty because I'm actually interested in some of
| those features, and they're impossible to use without
| proper documentation.
|
| I've had _much_ better luck interacting with SQLite
| developers in the SQLite forum.
| nindalf wrote:
| I think you've taken the most cynical view possible.
|
| SQLite is open source but not open contribution. So if they
| "won't do it for us" and "we need it", what else are they
| supposed to do? They're offering SQLite in the cloud,
| obviously they need to offer a streaming backup solution. Is
| there something wrong in building that?
|
| Alternatively, do you want them to reimplement features in
| SQLite already built out by the SQLite team?
|
| Really sounds like you're complaining about getting an open
| source, MIT-licensed, open-contribution bit of software _for
| free_ that you're under no obligation to use. And here you
| are complaining about "community consensus".
| refulgentis wrote:
| I may be able to shed some light.
|
| It seems they proposed a way to resolve the contradiction I
| raised (io_uring isn't supported on cloud providers) with
| the ground reality in the company's blog post.
|
| Namely, that:
|
| * it is intended for edge function deployment.
|
| * the paper they mention that is informing their decision
| to rewrite is based on perf improvements in the longtail by
| using io_uring.
|
| I framed it as "Are we sure...?", but the truth is, I know
| providers don't have io_uring support. This is relatively
| straightforward to derive from the fact that edge functions
| are well known to be multi-tenant.
|
| A cursory search shows unresolved AWS tickets re: this, and
| multiple announcements from Google about how it has been
| disabled in gCloud.
|
| Thus, it is likely they understood I was framing it
| politely, and that there was a clear contradiction here,
| hence their reply to me, raising a potential resolution for
| that contradiction, a resolution I hadn't considered.
|
| I don't see anything complaining, or bringing up, or
| implicitly denying, all the other stuff ("open source",
| "MIT licensed", "open contribution", "software for free"
| "under no obligation to use."). In fact, they explicitly
| indicated they completely agree with that view ("it's very
| cool that they're making parts of their tech FOSS.")
| adamrezich wrote:
| The problem is that SQLite already exists, and is public
| domain. It's a fantastic piece of software engineering that
| has had a lot of time and effort put into making it great.
|
| The pitch for these SQLite alternatives is:
|
| - SQLite is public domain so there's no problem with us
| rewriting it
|
| - We're going to rewrite it in Rust because that's going to
| make it inherently better (and don't question this)
|
| - We're going to MIT license our rewrite because there's
| various reasons why people would rather use MIT-licensed
| code than public domain code (but SQLite being public
| domain means we get to do all of this rewriting and re-
| licensing to begin with)
|
| - Also we're going to extend SQLite to be "cloud-capable"
| because that's our business use-case, even though it's
| completely at odds with SQLite's intended use-case
|
| - Also we're going to foster a "community" around our
| rewrite(-in-progress) (because people still think this is
| something desirable for some reason, even though a nonzero
| part of SQLite's greatness is that it operates entirely
| without having to deal with the "nerd drama" that such
| "communities" inevitably breed)
|
| - Also, we replaced that pesky, unsavory-to-our-
| sensibilities "Code of Ethics" with a bog-standard "Code of
| Conduct"--because, again: "community"
|
| - But we're not going to even _attempt_ to replicate the
| rigorous, arduous, and closed-source testing that goes into
| developing SQLite (which really makes up the bulk of its
| engineering effort)
|
| - We've made some progress toward all of this, but it's
| nowhere near done yet
|
| - But we're announcing it now anyway, because "hype" and
| "community" and "modern"
|
| - Also, look at our microbenchmarks that show that our
| unfinished SQLite rewrite is already faster than SQLite
| (because we haven't reimplemented everything from SQLite
| yet) (and also we don't plan to reimplement _everything_
| from SQLite anyway)
|
| I find it really odd that I can only seem to find
| criticism of any of this here on HN, and in a couple of
| reddit threads. It's kind of like when there was that Godot
| controversy a while back, and some people made a fork and
| announced it, despite being identical aside from the Code
| of Conduct. Merely _announcing_ a fork/rewrite of existing
| open-source software as a means of "building hype" and
| "creating a community", while benefiting from untold man-
| hours of work done by others, strikes me personally as
| rather untoward behavior--regardless of whether the
| reasoning behind said fork/rewrite is ideological (Redot)
| or business (libSQL/"Limbo").
|
| Software--especially software used to create other software
| --should be lauded for its engineering efforts, rather than
| its ability to "build hype" or "build and foster online
| 'community' 'engagement'". If SQLite was abandonware, then
| it would be a somewhat different story, but SQLite is an
| amazing piece of software--perhaps the most widely-used _on
| the planet_ --that is still being actively developed and
| supported. So, piggybacking on its success, without having
| concrete, demonstrable improvements to its functionality,
| comes across as... well, I can't think of a more polite,
| applicable adjective than "untoward".
| chambers wrote:
| Not an expert at all: I wonder if getting the perf requires
| trading off some (advertised) safety. io_uring has been noted
| to be confusing to combine with async Rust:
| https://news.ycombinator.com/item?id=41992975
|
| I'm reminded of how Confluent advertised Kafka as a database.
| They quietly externalized key guarantees of an RDBMS onto their
| customers, who were then saddled with implementing those
| guarantees in application level logic. By obscuring the trade-
| offs, Confluent made developers feel they could have their cake
| and eat it too.
| toast0 wrote:
| IMHO, most of io_uring's performance should come from
| reducing transition between kernel and userspace. There
| doesn't need to be a safety tradeoff there (although in
| practice, there have been several safety issues). There may
| be a trade-off against having a simple concurrency model;
| without io_uring you can't really request an accept and not
| handle the response that comes back, because a syscall is
| necessarily synchronous from the point of view of the caller;
| the calling thread can't continue until it gets a response,
| even if that's EINTR.
| refulgentis wrote:
| Out of my league / knowledge, but a tidbit that you might
| understand better: last time I mentioned this, someone said
| something about how SELinux can't do security checks on
| io_uring, and implied it's some fundamental mismatch rather
| than just work SELinux could do.
| eyberg wrote:
| Anything new is going to have people poking and prodding at it.
| It doesn't mean that the concept is 'bad'.
|
| Linux has desperately needed an interface like io_uring for
| decades. Async IO was in Windows NT 3.1 in 1993.
| refulgentis wrote:
| Might've replied to the wrong comment: I don't think io_uring
| is bad, and the comment doesn't contain 'bad', and I
| certainly don't think async IO is bad :)
| saurik wrote:
| Amazon runs every Lambda function in its own microVM.
| chambers wrote:
| One small comment: it may be worth disclosing that one of the
| two cited researchers is the author's boss.
|
| It's a small detail, but I mistakenly thought the author and the
| researchers were unrelated until I read a bit more.
| bawolff wrote:
| So silly question - if I understand right, the idea is you can do
| other stuff while the I/O is running asynchronously.
|
| When working on a database, don't you want to wait for the
| transaction to complete before continuing on? How does this
| affect durability of transactions? Or do I just have the wrong
| mental model for this?
| bjornsing wrote:
| I think the OP is about a runtime that runs hundreds of
| programs concurrently. When one program is waiting for a
| transaction, other programs can execute.
| graemep wrote:
| From the paper it looks like this is for read heavy workloads
| (testing write performance is "future work") and I think for
| network file systems which will add latency.
| mamcx wrote:
| The complex thing with a transactional db is that _many_
| concurrent transactions (should be) executed simultaneously,
| and that mix that single query tx and the one that loads 1
| million rows.
| bawolff wrote:
| The sqlite model is that only one write transaction can be
| run at a time. That's kind of a defining trade-off because it
| allows simplifying locking.
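|
| A quick way to see that trade-off (a Python sketch; with
| timeout=0 the second writer fails fast instead of waiting):
|
|     import sqlite3
|
|     path = "/tmp/demo.db"
|     a = sqlite3.connect(path, timeout=0)
|     b = sqlite3.connect(path, timeout=0)
|     a.execute("CREATE TABLE IF NOT EXISTS t (x)")
|     a.commit()
|
|     a.execute("BEGIN IMMEDIATE")  # a takes the write lock
|     a.execute("INSERT INTO t VALUES (1)")
|     try:
|         b.execute("BEGIN IMMEDIATE")  # second writer
|     except sqlite3.OperationalError as e:
|         print(e)  # "database is locked"
|     a.commit()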
| scheme271 wrote:
| One of the nice things about SQLite is its very extensive test
| suite. The question is whether the rewrite will have something
| similar, or get similar testing - especially if it uses fast
| but hard-to-write and potentially buggy features like io_uring.
| malkia wrote:
| ^^^ - this was my first reaction too. I wonder how they would
| ensure the same level of quality (e.g. not just safe code due
| to Rust).
| 01HNNWZ0MV43FF wrote:
| I wonder why Limbo has an installer script and isn't just `cargo
| install limbo`
| 01HNNWZ0MV43FF wrote:
| Update: Checked out the script and it seems to just be for
| convenience and maybe compatibility with OSes that Cargo can
| compile for but not run on.
|
| Seeing a curl pipe script makes me worry it's going to ask for
| odd permissions, if I don't also see something simpler like a
| binary download or cargo install. There is a zip for Windows so
| maybe the script is just for getting the binary.
| zeroq wrote:
| Much better framing than the previous "yet another library
| rewritten in Rust"
| egeozcan wrote:
| sqlite is open source, but an important test harness is not. How
| does any alternative ensure compatibility?
|
| https://www.sqlite.org/th3.html#th3_license
| krossitalk wrote:
| I argue it's not Open Source (Freedom, not Free Beer) because
| PRs are locked and only Hipp and close contributors can merge
| code. It's openly developed, but not by the community.
| jmcqk6 wrote:
| sqlite is actually public domain.
| https://sqlite.org/copyright.html. This is also the reason
| why they are closed contribution.
|
| It's a strange combination in the free software world, but
| I'm grateful for it.
| ijlx wrote:
| You can certainly argue that, but that's not what Open Source
| or Free Software has ever been. It's about your freedoms as a
| user; you are always free to fork with a different model. I
| think the expectation of "open contributions" is quite
| damaging, to the point where people/organizations are hesitant
| to release their software as open source at all.
| samatman wrote:
| This is a case of you deciding that open source means
| something which it does not, never has, and will not mean.
|
| I consider this an empty exercise, but if it pleases you to
| use language that way, no one can stop you.
| jefftk wrote:
| That's not what Open Source means. The development team not
| being willing to review your pull requests does not limit
| your freedom to use sqlite in any way.
| jppope wrote:
| > "However, the authors argue that KV doesn't suit all problem
| domains. Mapping table-like data into a KV model leads to poor
| developer experience and (de)serialization costs. SQL would be
| much better, and SQLite being embedded solves this--it can be
| directly embedded in the serverless runtime."
|
| The lengths people will go to so that they can use SQL never
| cease to astound me.
| IshKebab wrote:
| > Mapping table-like data into a KV model leads to poor
| developer experience
|
| This is definitely true in my experience. Unless you are
| literally storing a hashmap, KV databases are a pain to use
| directly. I think they're meant to be building blocks for other
| databases.
| liontwist wrote:
| Relations are one of the most efficient and flexible ways to
| represent arbitrary graphs.
|
| In my experience everyone goes to incredible lengths to avoid
| SQL, in ignorance of this fact.
|
| They store (key, value) tables that they then extract into an
| object graph.
| LudwigNagasena wrote:
| Relations are cool, but SQL DBs either prohibit or make it
| hard to present relations inside relations, which is one of
| the most common ways of structuring data in everyday
| programming life. You can see people suggesting writing SQL
| functions that convert rows to JSON, or using an ORM, simply
| to query a one-to-many relationship; that's crazy:
| https://stackoverflow.com/questions/54601529/efficiently-
| map...
| bawolff wrote:
| Any tool can be used incorrectly...
|
| I'm not sure what "relations in relations" means. Do you
| just mean M:N?
| LudwigNagasena wrote:
| I mean 'tables' inside 'tables', 0NF. If I have a list of
| restaurants with their reviews, naturally, I have a table
| of restaurants, each of which contains a table of
| reviews. If I have a table of nested comments, naturally,
| I have a table of comments, each of which recursively
| contains a table of child comments.
|
| Those are some of the most common use cases for data
| presentation; and SQL-based DBs are not flexible enough
| to handle them in a straightforward way.
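|
| For what it's worth, SQLite's JSON1 functions can produce
| that nesting in one query. A sketch (hypothetical schema;
| assumes a build with JSON1, which Python's bundled SQLite
| normally has):
|
|     import json
|     import sqlite3
|
|     conn = sqlite3.connect(":memory:")
|     conn.executescript("""
|         CREATE TABLE restaurants (
|             id INTEGER PRIMARY KEY, name TEXT);
|         CREATE TABLE reviews (
|             restaurant_id INTEGER, stars INTEGER,
|             body TEXT);
|         INSERT INTO restaurants VALUES (1, 'Chez Test');
|         INSERT INTO reviews VALUES (1, 5, 'great'),
|                                    (1, 3, 'fine');
|     """)
|
|     # One row per restaurant, reviews nested as JSON.
|     rows = conn.execute("""
|         SELECT r.name,
|                json_group_array(json_object(
|                    'stars', v.stars, 'body', v.body))
|         FROM restaurants r
|         LEFT JOIN reviews v ON v.restaurant_id = r.id
|         GROUP BY r.id
|     """).fetchall()
|     for name, reviews in rows:
|         print(name, json.loads(reviews))
|
| It works, but I'd still call that a workaround rather than
| first-class support for nesting.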
| toast0 wrote:
| I mean, if SQL is a good match for the data, embedding a
| database engine designed to be embedded doesn't seem like too
| far of a reach?
| bawolff wrote:
| Trying to put relational data into a K-V store is always going
| to be a crazy mess. Even if the language used were not SQL, it
| would still obviously be a good idea to use the right type of
| database for your problem space.
| conradev wrote:
| I wonder if using a different allocator in SQLite
| (https://www.sqlite.org/malloc.html) would improve performance in
| their workload to a greater degree than any amount of Rust or
| io_uring.
|
| I can understand how io_uring increases server utilization, but I
| fail to see how it will make any individual query faster.
| TheRealPomax wrote:
| So... did they talk to the SQLite maintainer to see how much of
| this can be taken on board? Because it seems weird to omit that
| if they did, and it seems even weirder if they didn't after
| benchmarking showed two orders of magnitude improvement.
|
| (Even if that information should only be a line item in the
| paper, I don't see one... and a post _about_ the paper should
| definitely have something to link to?)
| IshKebab wrote:
| They're rewriting SQLite. They're going to put their effort
| into that, surely? Also, SQLite explicitly states that they do
| not accept outside contributions, so there's no point trying.
| meneer_oke wrote:
| Just this weekend I had the perfect problem for SQLite;
| unfortunately, at 200MB and above it became unwieldy.
| djsnoopy wrote:
| 200MB? Really? Do you really mean 200TB?
| samwillis wrote:
| This is a great article.
|
| There was a previous attempt to bring async IO to Postgres, but
| sadly it went dormant: https://commitfest.postgresql.org/34/3316/
|
| A more recent proposal was to make it possible to swap out the
| storage manager for a custom one without having to fork the
| codebase. I.e. extensions can provide an alternative.
| https://commitfest.postgresql.org/49/4428/
|
| This would allow for custom ones that do async IO to any custom
| storage layer.
|
| There are a lot of interested parties in the new proposal (it's
| come out of Neon, as they run a fork with a custom storage
| manager). With the move to separate compute from storage this
| becomes something many Postgres orgs will want to be able to do.
|
| A change of core to use async io becomes slightly less relevant
| when you can swap out the whole storage manager.
|
| (Note that the storage manager only handles pages in the heap
| tables, not the WAL. There is more exploration needed there to
| make the WAL extendable/replaceable)
___________________________________________________________________
(page generated 2024-12-16 23:00 UTC)