[HN Gopher] In Search of a Faster SQLite
       ___________________________________________________________________
        
       In Search of a Faster SQLite
        
       Author : avinassh
       Score  : 164 points
       Date   : 2024-12-16 16:46 UTC (6 hours ago)
        
 (HTM) web link (avi.im)
 (TXT) w3m dump (avi.im)
        
       | high_byte wrote:
       | "The benefits become noticeable only at p999 onwards; for p90 and
       | p99, the performance is almost the same as SQLite."
       | 
       | I hate to be a hater, and I love sqlite and optimizations, but
       | this is true.
        
         | bawolff wrote:
          | The benchmark seems a bit weird. Fetch 100 results from a table
          | with no filtering, sorting, or anything? That feels like the IO
          | is going to be really small anyway.
        
           | tsegratis wrote:
            | They compare threads and coroutines for Limbo. Threads have
            | much worse p90 latencies since they context switch... I'm not
            | sure they can draw any conclusions except that coroutines are
            | faster (of course).
        
       | efitz wrote:
       | The article discusses the specific use case of serverless
       | computing, e.g. AWS Lambda, and how a central database doesn't
       | always work well with apps constructed in a serverless fashion.
       | 
       | I was immediately interested in this post because 6-7 years ago I
       | worked on this very problem- I needed to ingest a set of complex
       | hierarchical files that could change at any time, and I needed to
       | "query" them to extract particular information. FaaS is expensive
       | for computationally expensive tasks, and it also didn't make
       | sense to load big XML files and parse them every time I needed to
       | do a lookup in any instance of my Lambda function.
       | 
       | My solution was to have a central function on a timer that read
       | and parsed the files every couple of minutes, loaded the data
       | into a SQLite database, indexed it, and put the file in S3.
       | 
       | Now my functions just downloaded the file from S3, if it was
       | newer than the local copy or on a cold start, and did the lookup.
       | Blindingly fast and no duplication of effort.
       | 
       | One of the things that is not immediately obvious from Lambda is
       | that it has a local /tmp directory that you can read from and
       | write to. Also the Python runtime includes SQLite; no need to
       | upload code besides your function.
       | 
       | I'm excited that work is going on that might make such solutions
       | even faster; I think it's a very useful pattern for distributed
       | computing.
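A minimal sketch of the pattern described above (table and path names are illustrative; the actual S3 fetch is stubbed out, since in a real Lambda it would be a boto3 `get_object` call guarded by an If-Modified-Since or ETag check):

```python
import os
import sqlite3

DB_PATH = "/tmp/lookup.db"  # Lambda's writable scratch space

def get_conn(download, db_path=DB_PATH):
    # Cold start: no local copy yet, so fetch one. In production this
    # would also re-download when S3 reports a newer object.
    if not os.path.exists(db_path):
        download(db_path)
    return sqlite3.connect(db_path)

def lookup(conn, name):
    # The central builder has already indexed the table, so a point
    # lookup is just a local B-tree probe.
    row = conn.execute(
        "SELECT value FROM items WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None
```

The `download` callable is injected so the refresh policy (timestamp, ETag, or version check) stays separate from the query path.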
        
         | rmbyrro wrote:
         | > Now my functions just downloaded the file from S3, if it was
         | newer than the local copy
         | 
          | If you have strong consistency requirements, this doesn't work.
          | Synchronizing clocks reliably between different servers is
          | surprisingly hard. You might end up working with stale data. It
          | might work for use cases that can accept eventual consistency.
        
           | Spivak wrote:
           | I don't really know if that matters for this use case. Just
           | by the very nature of source_data -> processing -> dest_data
           | taking nonzero time anything consuming dest_data must already
           | be tolerant of some amount of lag. And how it's coded
           | guarantees you can never observe dest_data going new -> old
           | -> new.
        
           | jedberg wrote:
           | One of the announcements from AWS this year at Re:invent is
           | that they now can guarantee that the instances clocks are
           | synced within microseconds of each other. Close enough that
           | you can rely on it for distributed timekeeping.
        
           | Dylan16807 wrote:
           | If you have strong consistency requirements, then it doesn't
           | work by the very nature of making multiple copies of the
           | database. Even if the clocks are perfect. (Though the clocks
           | are probably close enough that it doesn't matter.)
        
           | efitz wrote:
           | I versioned the databases and my use case didn't require
           | strong consistency.
        
           | iskela wrote:
            | Wouldn't ETag version numbers also work? Or just having a
            | .json with version metadata next to the db blob? No need to
            | sync clocks. Just GET the small db-ver.json and compare
            | version details?
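The sidecar-metadata idea could look something like this (a sketch; it assumes the writer assigns monotonically increasing integer versions, so wall clocks never enter into it):

```python
import json

def needs_refresh(local_meta, remote_meta_json):
    # Compare writer-assigned version counters. A missing local
    # version (cold start) always triggers a download.
    remote = json.loads(remote_meta_json)
    return remote["version"] > local_meta.get("version", -1)
```

Each instance GETs only the tiny db-ver.json, and downloads the full database blob only when `needs_refresh` returns True.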
        
         | viccis wrote:
         | > One of the things that is not immediately obvious from Lambda
         | is that it has a local /tmp directory that you can read from
         | and write to.
         | 
         | The other big thing a lot of people don't know about Python on
         | Lambda is that your global scope is also persisted for that
         | execution context's lifetime like /tmp is. I ran into issues at
         | one point with Lambdas that processed a high volume of data
         | getting intermittent errors connecting to S3. An AWS engineer
         | told me to cache my boto3 stuff (session, client, resources,
         | etc.) in the global namespace, and that solved the problem
         | overnight.
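The fix amounts to caching clients at module scope, which Lambda preserves between warm invocations of the same execution context. A sketch (the `factory` parameter stands in for `boto3.client` so the caching behavior can be shown without AWS credentials):

```python
# Module scope: initialized once per execution context (cold start)
# and reused by every invocation routed to this instance.
_clients = {}

def get_client(service, factory=None):
    if service not in _clients:
        if factory is None:
            import boto3  # assumed available in the Lambda runtime
            factory = boto3.client
        _clients[service] = factory(service)
    return _clients[service]

def handler(event, context):
    s3 = get_client("s3")  # same client/connection pool on warm calls
    ...
```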
        
         | moduspol wrote:
         | We have that issue at work, though I solved it by including the
         | sqlite database within the container image that we use. We then
         | deploy the new container image (with the same code as before,
         | but with a revised database file) at most every fifteen
         | minutes.
         | 
         | This gives you an atomic point at which you are 100% confident
         | all instances are using the same database, and by provisioning
         | concurrency, you can also avoid a "thundering herd" of
         | instances all fetching from the file on S3 at startup (which
         | can otherwise lead to throttling).
         | 
         | Of course, that's only feasible if it's acceptable that your
         | data can be stale for some number of minutes, but if you're
         | caching the way you are, and periodically checking S3 for an
         | updated database, it probably is.
        
           | efitz wrote:
           | I actually versioned my database file - I had a small
           | metadata table with version number and creation time.
           | 
           | Then in the output from each of my other functions, I
           | included the database version number. So all my output could
           | be subsequently normalized by re-running the same input
           | versus an arbitrary version of the database file.
        
           | jrochkind1 wrote:
           | > "thundering herd" of instances all fetching from the file
           | on S3 at startup (which can otherwise lead to throttling).
           | 
           | Have any "thundering herd" problems with S3, including
           | throttling, actually been seen?
           | 
           | I think S3 is advertised to have no concurrent connection
           | limit, and support up to at least 5,500 GETs per second (per
           | "prefix", which I'm confused about what that means exactly in
            | practice). I don't think S3 ever applies intentional
            | throttling, although of course if you exceed its capacity to
            | deliver data you will see "natural" throttling.
           | 
           | Do you have a fleet big enough that you might be exceeding
           | those limits, or have people experienced problems even well
           | under these limits, or is it just precautionary?
        
             | moduspol wrote:
             | Sorry--the throttling was at the AWS Lambda layer, not S3.
             | We were being throttled because we'd deploy a new container
             | image and suddenly thousands of them are all simultaneously
             | trying to pull the database file from S3.
             | 
             | We aim to return a response in the single digit
             | milliseconds and sometimes get tens of thousands of
             | requests per second, so even if it only takes a second or
             | two to fetch that file from S3, the request isn't getting
             | served while it's happening, and new requests are coming
             | in.
             | 
             | You very quickly hit your Lambda concurrency limit and get
             | throttled just waiting for your instances to fetch the
             | file, even though logically you're doing exactly what you
             | planned to.
             | 
             | By having the file exist already in the container image,
             | you lean on AWS's existing tools for a phased rollout to
             | replace portions of your deployment at a time, and every
             | one is responding in single digit milliseconds from its
             | very first request.
             | 
             | EDIT: The same technique could be applied for other
             | container management systems, but for stuff like Kubernetes
             | or ECS, it might be simpler to use OP's method with a
             | readiness check that only returns true if you fetched the
             | file successfully. And maybe some other logic to do
             | something if your file gets too stale, or you're failing to
             | fetch updates for some reason.
        
         | up2isomorphism wrote:
         | It is so painful to see so many people make wrong use of S3.
        
           | lcnPylGDnU4H9OF wrote:
           | Is there a list of correct uses of s3 we can all follow?
        
             | akira2501 wrote:
             | Yep. Right here. [0].
             | 
             | Generally people ignore the per PUT and GET pricing on S3
             | along with the higher latency since it's a "global"
             | service. If your objects are small then you're almost
             | always benefited from using DynamoDB as the GET pricing and
             | latency are far more favorable, as long as you don't mind
             | the region dependency or the multi region setup.
             | 
             | [0]: https://docs.aws.amazon.com/AmazonS3/latest/userguide/
             | optimi...
        
             | compootr wrote:
             | storing massive amounts of unstructured data
        
       | refulgentis wrote:
        | Are we sure edge computing providers have io_uring enabled? It is
        | disabled in, inter alia, ChromeOS and Android, because it's been a
        | significant source of vulnerabilities. It seems deadly in a multi-
        | tenant environment.
        
         | ncruces wrote:
         | Their goal is to run this on their own cloud.
         | 
         | Despite their lofty claims about community building, their
         | projects are very much about forwarding their use case.
         | 
         | Given that SQLite is public domain, they're not required to
         | give anything back. So, it's very cool that they're making
         | parts of their tech FOSS.
         | 
         | But I've yet to see anything coming from them that isn't "just
         | because we need it, and SQLite wouldn't do it for us."
         | 
         | There's little concern about making things useful to others,
         | and very little community consensus about any of it.
         | 
         | https://turso.tech/
        
           | tracker1 wrote:
           | Of course they are scratching their own itch, so to speak.
            | That's what companies do. I think the fact that they are doing
           | so much in the open is the indication of good stewardship
           | itself. I'm not sure what else they would do or release that
           | they didn't need internally. For that matter, I'm not really
           | aware of many significant contributions to FLOSS at all that
           | aren't initially intended for company use, that's kinda how
           | it works. Where I'm surprised here is how much secret sauce
           | Turso is sharing at all.
        
             | ncruces wrote:
             | I have no problem with them scratching their itch. That's
             | par for the course.
             | 
             | I'm salty about them describing the SQLite licensing,
             | development model, and code of ethics as almost toxic,
             | setting up a separate entity with a website and a manifesto
             | promising to do better, and then folding "libSQL into the
             | Turso family" within the year.
             | 
             | They forked, played politics, added a few features (with
             | some ill-considered incompatibilities), and properly
             | documented _zero_ of them.
             | 
             | And I'm salty because I'm actually interested in some of
             | those features, and they're impossible to use without
             | proper documentation.
             | 
             | I've had _much_ better luck interacting with SQLite
             | developers in the SQLite forum.
        
           | nindalf wrote:
           | I think you've taken the most cynical view possible.
           | 
           | SQLite is open source but not open contribution. So if they
           | "won't do it for us" and "we need it", what else are they
           | supposed to do? They're offering SQLite in the cloud,
           | obviously they need to offer a streaming backup solution. Is
           | there something wrong in building that?
           | 
           | Alternatively, do you want them to reimplement features in
           | SQLite already built out by the SQLite team?
           | 
           | Really sounds like you're complaining about getting an open
           | source, MIT licensed, open contribution bit of software _for
              | free_ that you're under no obligation to use. And here you
           | are complaining about "community consensus".
        
             | refulgentis wrote:
             | I may be able to shed some light.
             | 
             | It seems they proposed a way to resolve the contradiction I
             | raised (io_uring isn't supported on cloud providers) with
             | the ground reality in the company's blog post.
             | 
             | Namely, that:
             | 
             | * it is intended for edge function deployment.
             | 
             | * the paper they mention that is informing their decision
             | to rewrite is based on perf improvements in the longtail by
             | using io_uring.
             | 
              | I framed it as "Are we sure...?", but truth is, I know
              | providers don't have io_uring support. This is relatively
              | straightforward to derive from the fact that edge functions
              | are well known to be multi-tenant.
             | 
             | A cursory search shows unresolved AWS tickets re: this, and
             | multiple announcements from Google about how it has been
             | disabled in gCloud.
             | 
             | Thus, it is likely they understood I was framing it
             | politely, and that there was a clear contradiction here,
             | hence their reply to me, raising a potential resolution for
             | that contradiction, a resolution I hadn't considered.
             | 
             | I don't see anything complaining, or bringing up, or
             | implicitly denying, all the other stuff ("open source",
             | "MIT licensed", "open contribution", "software for free"
             | "under no obligation to use."). In fact, they explicitly
             | indicated they completely agree with that view ("it's very
             | cool that they're making parts of their tech FOSS.")
        
             | adamrezich wrote:
             | The problem is that SQLite already exists, and is public
             | domain. It's a fantastic piece of software engineering that
             | has had a lot of time and effort put into making it great.
             | 
             | The pitch for these SQLite alternatives is:
             | 
             | - SQLite is public domain so there's no problem with us
             | rewriting it
             | 
             | - We're going to rewrite it in Rust because that's going to
             | make it inherently better (and don't question this)
             | 
             | - We're going to MIT license our rewrite because there's
             | various reasons why people would rather use MIT-licensed
             | code than public domain code (but SQLite being public
             | domain means we get to do all of this rewriting and re-
             | licensing to begin with)
             | 
             | - Also we're going to extend SQLite to be "cloud-capable"
             | because that's our business use-case, even though it's
             | completely at odds with SQLite's intended use-case
             | 
             | - Also we're going to foster a "community" around our
             | rewrite(-in-progress) (because people still think this is
             | something desirable for some reason, as though a nonzero
             | part of SQLite's greatness is that it operates entirely
              | without having to deal with "nerd drama" that such
             | "communities" inevitably breed)
             | 
             | - Also, we replaced that pesky, unsavory-to-our-
             | sensibilities "Code of Ethics" with a bog-standard "Code of
             | Conduct"--because, again: "community"
             | 
             | - But we're not going to even _attempt_ to replicate the
             | rigorous, arduous, and closed-source testing that goes into
             | developing SQLite (which really makes up the bulk of its
             | engineering effort)
             | 
             | - We've made some progress toward all of this, but it's
             | nowhere near done yet
             | 
             | - But we're announcing it now anyway, because "hype" and
             | "community" and "modern"
             | 
             | - Also, look at our microbenchmarks that show that our
             | unfinished SQLite rewrite is already faster than SQLite
             | (because we haven't reimplemented everything from SQLite
             | yet) (and also we don't plan to reimplement _everything_
             | from SQLite anyway)
             | 
              | I find it really odd that I can only seem to find
              | criticism of any of this here on HN, and in a couple of
              | reddit threads. It's kind of like when there was that Godot
              | controversy a while back, and some people made a fork and
             | announced it, despite being identical aside from the Code
             | of Conduct. Merely _announcing_ a fork /rewrite of existing
             | open-source software as a means of "building hype" and
             | "creating a community", while benefiting from untold man-
             | hours of work done by others, strikes me personally as
             | rather untoward behavior--regardless of whether the
             | reasoning behind said fork/rewrite is ideological (Redot)
             | or business (libSQL/"Limbo").
             | 
             | Software--especially software used to create other software
             | --should be lauded for its engineering efforts, rather than
             | its ability to "build hype" or "build and foster online
             | 'community' 'engagement'". If SQLite was abandonware, then
             | it would be a somewhat different story, but SQLite is an
             | amazing piece of software--perhaps the most widely-used _on
             | the planet_ --that is still being actively developed and
             | supported. So, piggybacking on its success, without having
             | concrete, demonstrable improvements to its functionality,
             | comes across as... well, I can't think of another polite,
             | applicable adjective than "untoward".
        
         | chambers wrote:
          | Not an expert at all: I wonder if getting the perf requires
          | trading off some (advertised) safety. io_uring has been noted
          | to be confusing with async in Rust:
          | https://news.ycombinator.com/item?id=41992975
         | 
         | I'm reminded of how Confluent advertised Kafka as a database.
         | They quietly externalized key guarantees of an RDBMS onto their
         | customers, who were then saddled with implementing those
         | guarantees in application level logic. By obscuring the trade-
         | offs, Confluent made developers feel they could have their cake
         | and eat it too.
        
           | toast0 wrote:
           | IMHO, most of io_uring's performance should come from
           | reducing transition between kernel and userspace. There
           | doesn't need to be a safety tradeoff there (although in
           | practice, there have been several safety issues). There may
           | be a trade off against having a simple concurrency model;
           | without io_uring you can't really request an accept and not
           | handle the response that comes back, because a syscall is
           | necessarily synchronous from the point of view of the caller;
           | the calling thread can't continue until it gets a response,
           | even if that's EINTR.
        
             | refulgentis wrote:
             | Out of my league / knowledge, but a tidbit that you might
             | understand better: last time I mentioned this, someone said
             | something about SELinux can't do security checks and
             | implied it's some fundamental mismatch, rather than some
             | work SELinux can do
        
         | eyberg wrote:
         | Anything new is going to have people poking and prodding at it.
         | It doesn't mean that the concept is 'bad'.
         | 
         | Linux has desperately needed an interface like io_uring for
         | decades. Async IO was in Windows NT 3.1 in 1993.
        
           | refulgentis wrote:
           | Might've replied to the wrong comment: I don't think io_uring
           | is bad, and the comment doesn't contain 'bad', and I
           | certainly don't think async IO is bad :)
        
         | saurik wrote:
          | Amazon runs every Lambda function in its own microVM.
        
       | chambers wrote:
       | One small comment: it may be worth disclaiming that one of the
       | two cited researchers is the author's boss.
       | 
       | It's a small detail, but I mistakenly thought the author and the
       | researchers were unrelated until I read a bit more
        
       | bawolff wrote:
        | So silly question - if I understand right, the idea is you can do
       | other stuff while i/o is working async.
       | 
       | When working on a database, don't you want to wait for the
       | transaction to complete before continuing on? How does this
        | affect durability of transactions? Or do I just have the wrong
       | mental model for this.
        
         | bjornsing wrote:
         | I think the OP is about a runtime that runs hundreds of
         | programs concurrently. When one program is waiting for a
         | transaction other programs can execute.
        
         | graemep wrote:
         | From the paper it looks like this is for read heavy workloads
         | (testing write performance is "future work") and I think for
         | network file systems which will add latency.
        
         | mamcx wrote:
          | The complex thing with a transactional db is that _many_
          | concurrent transactions (should be) executed simultaneously,
          | and that mix includes the single-query tx alongside the one
          | that loads 1 million rows.
        
           | bawolff wrote:
           | The sqlite model is that only one write transaction can be
           | run at a time. That's kind of a defining trade-off because it
           | allows simplifying locking.
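That trade-off is easy to observe from Python's bundled sqlite3 module (a sketch using a short busy timeout so the second writer fails fast instead of waiting):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")
writer = sqlite3.connect(path, timeout=0.1)
other = sqlite3.connect(path, timeout=0.1)
writer.execute("CREATE TABLE t (x)")
writer.commit()

writer.execute("BEGIN IMMEDIATE")  # take the single write lock
writer.execute("INSERT INTO t VALUES (1)")

blocked = False
try:
    other.execute("BEGIN IMMEDIATE")  # second writer: SQLITE_BUSY
except sqlite3.OperationalError:
    blocked = True  # surfaced as "database is locked"

writer.commit()  # lock released; `other` can now retry
```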
        
       | scheme271 wrote:
        | One of the nice things about sqlite is that it has a very
        | extensive test suite. The question is whether the rewrite has
        | something similar or will get similar testing, especially since
        | it uses fast but hard-to-write and potentially buggy features
        | like io_uring.
        
         | malkia wrote:
         | ^^^ - this was my first reaction too. I wonder how they would
         | ensure the same level of quality (e.g. not just safe code due
         | to Rust)
        
       | 01HNNWZ0MV43FF wrote:
       | I wonder why Limbo has an installer script and isn't just `cargo
       | install limbo`
        
         | 01HNNWZ0MV43FF wrote:
         | Update: Checked out the script and it seems to just be for
         | convenience and maybe compatibility with OSes that Cargo can
         | compile for but not run on.
         | 
         | Seeing a curl pipe script makes me worry it's going to ask for
         | odd permissions, if I don't also see something simpler like a
         | binary download or cargo install. There is a zip for Windows so
         | maybe the script is just for getting the binary.
        
       | zeroq wrote:
       | Much better framing than the previous "yet another library
       | rewritten in Rust"
        
       | egeozcan wrote:
       | sqlite is open source, but an important test harness is not. How
       | does any alternative ensure compatibility?
       | 
       | https://www.sqlite.org/th3.html#th3_license
        
         | krossitalk wrote:
         | I argue it's not Open Source (Freedom, not Free Beer) because
         | PRs are locked and only Hipp and close contributors can merge
         | code. It's openly developed, but not by the community.
        
           | jmcqk6 wrote:
           | sqlite is actually public domain.
           | https://sqlite.org/copyright.html. This is also the reason
           | why they are closed contribution.
           | 
           | It's a strange combination in the free software world, but
           | I'm grateful for it.
        
           | ijlx wrote:
           | You can certainly argue that, but that's not what Open Source
           | or Free Software has ever been. It's about your freedoms as a
           | user, you are always free to fork with a different model. I
           | think the expectation of "open contributions" is quite
            | damaging, to the point where people/organizations are hesitant
           | to release their software as open source at all.
        
           | samatman wrote:
           | This is a case of you deciding that open source means
           | something which it does not, never has, and will not mean.
           | 
           | I consider this an empty exercise, but if it pleases you to
           | use language that way, no one can stop you.
        
           | jefftk wrote:
           | That's not what Open Source means. The development team not
           | being willing to review your pull requests does not limit
           | your freedom to use sqlite in any way.
        
       | jppope wrote:
       | > "However, the authors argue that KV doesn't suit all problem
       | domains. Mapping table-like data into a KV model leads to poor
       | developer experience and (de)serialization costs. SQL would be
       | much better, and SQLite being embedded solves this--it can be
       | directly embedded in the serverless runtime."
       | 
        | The lengths people will go to so that they can use SQL never
        | cease to astound me.
        
         | IshKebab wrote:
         | > Mapping table-like data into a KV model leads to poor
         | developer experience
         | 
         | This is definitely true in my experience. Unless you are
         | literally storing a hashmap, KV databases are a pain to use
         | directly. I think they're meant to be building blocks for other
         | databases.
        
         | liontwist wrote:
         | Relations are one of the most efficient and flexible ways to
         | represent arbitrary graphs.
         | 
          | In my experience everyone goes to incredible lengths to avoid
          | SQL, in ignorance of this fact.
          | 
          | They store (key, value) tables that they then extract into an
          | object graph.
        
           | LudwigNagasena wrote:
           | Relations are cool, but SQL DBs either prohibit or make it
           | hard to present relations inside relations, which is one of
           | the most common ways of structuring data in everyday
           | programming life. You can see people suggesting writing SQL
           | functions that convert rows to json or using ORM simply to
           | query a one-to-many relationship, that's crazy:
           | https://stackoverflow.com/questions/54601529/efficiently-
           | map...
        
             | bawolff wrote:
             | Any tool can be used incorrectly...
             | 
             | Im not sure what relations in relations mean. Do you just
             | mean M:N?
        
               | LudwigNagasena wrote:
               | I mean 'tables' inside 'tables', 0NF. If I have a list of
               | restaurants with their reviews, naturally, I have a table
               | of restaurants, each of which contains a table of
               | reviews. If I have a table of nested comments, naturally,
               | I have a table of comments, each of which recursively
               | contains a table of child comments.
               | 
               | Those are some of the most common use cases for data
               | presentation; and SQL-based DBs are not flexible enough
               | to handle them in a straightforward way.
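For what it's worth, modern SQLite's JSON1 functions give a workable answer to the one-to-many case: fold the many side into a JSON array per parent row in a single query. A sketch (assumes the JSON1 functions, which the SQLite bundled with Python normally includes; the restaurant/review schema is illustrative):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE restaurants (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE reviews (restaurant_id INTEGER, stars INTEGER, body TEXT);
    INSERT INTO restaurants VALUES (1, 'Noodle Bar'), (2, 'Taqueria');
    INSERT INTO reviews VALUES (1, 5, 'great'), (1, 3, 'ok'), (2, 4, 'good');
""")

# One row per restaurant, its reviews nested as a JSON array.
rows = conn.execute("""
    SELECT r.name,
           json_group_array(json_object('stars', v.stars, 'body', v.body))
    FROM restaurants r JOIN reviews v ON v.restaurant_id = r.id
    GROUP BY r.id
    ORDER BY r.id
""").fetchall()

nested = [(name, json.loads(reviews)) for name, reviews in rows]
```

Whether that counts as "straightforward" is exactly the complaint above, but it does avoid both the N+1 query pattern and an ORM.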
        
         | toast0 wrote:
         | I mean, if SQL is a good match for the data, embedding a
         | database engine designed to be embedded doesn't seem like too
         | far of a reach?
        
         | bawolff wrote:
         | Trying to put relational data into K-V store is always going to
         | be a crazy mess. Even if the language used was not SQL, it
         | would still obviously be a good idea to use the right type of
         | database for your problem space.
        
       | conradev wrote:
       | I wonder if using a different allocator in SQLite
       | (https://www.sqlite.org/malloc.html) would improve performance in
       | their workload to a greater degree than any amount of Rust or
       | io_uring.
       | 
       | I can understand how io_uring increases server utilization, but I
       | fail to see how it will make any individual query faster.
        
       | TheRealPomax wrote:
       | So... did they talk to the SQLite maintainer to see how much of
       | this can be taken on board? Because it seems weird to omit that
       | if they did, and it seems even weirder if they didn't after
       | benchmarking showed two orders of magnitude improvement.
       | 
       | (Even if that information should only be a line item in the
       | paper, I don't see one... and a post _about_ the paper should
       | definitely have something to link to?)
        
         | IshKebab wrote:
         | They're rewriting SQLite. They're going to put their effort
         | into that surely? Also SQLite explicitly state that they do not
         | accept outside contributions, so there's no point trying.
        
       | meneer_oke wrote:
        | Just this weekend I had the perfect problem for sqlite;
        | unfortunately, at 200MB and above it became unwieldy.
        
         | djsnoopy wrote:
          | 200MB? Really? Do you really mean 200TB?
        
       | samwillis wrote:
       | This is a great article.
       | 
       | There was a previous attempt to bring async io to Postgres, but
       | sadly it went dormant: https://commitfest.postgresql.org/34/3316/
       | 
       | A more recent proposal was to make it possible to swap out the
       | storage manager for a custom one without having to fork the
       | codebase. I.e. extensions can provide an alternative.
       | https://commitfest.postgresql.org/49/4428/
       | 
       | This would allow for custom ones that do async IO to any custom
       | storage layer.
       | 
       | There are a lot of interested parties in the new proposal (it's
       | come out of Neon, as they run a fork with a custom storage
       | manager). With the move to separate compute from storage this
       | becomes something many Postgres orgs will want to be able to do.
       | 
       | A change of core to use async io becomes slightly less relevant
       | when you can swap out the whole storage manager.
       | 
       | (Note that the storage manager only handles pages in the heap
       | tables, not the WAL. There is more exploration needed there to
       | make the WAL extendable/replaceable)
        
       ___________________________________________________________________
       (page generated 2024-12-16 23:00 UTC)