[HN Gopher] Show HN: SQLite Transaction Benchmarking Tool
       ___________________________________________________________________
        
       Show HN: SQLite Transaction Benchmarking Tool
        
        I wanted to make my own evaluation of what kind of performance I
        could expect from SQLite on a server, and to investigate the
        experimental `BEGIN CONCURRENT` branch vs. the built-in `DEFERRED`
        and `IMMEDIATE` behaviors. Explanatory blog post:
       https://reorchestrate.com/posts/sqlite-transactions/
        
       Author : seddonm1
       Score  : 114 points
        Date   : 2024-07-17 21:44 UTC (1 day ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | leononame wrote:
        | Thanks for the interesting article. Lots of things seem to be
        | happening in SQLite land at the moment, and I appreciate that the
        | SQLite team documents their quirks so openly; it inspires great
        | confidence.
       | 
       | Since I don't know where else to ask, maybe this is a good place:
        | How do async wrappers around SQLite (e.g. for Node or Python)
       | work? SQLite only uses synchronous I/O if I'm not mistaken. Is it
       | just a pretend async function with only synchronous code?
       | 
       | And, as a follow-up: If I have a server with say 100 incoming
       | connections that will all read from the database, I've got 100
        | readers. No problem in WAL mode. However, I could still get
       | congested by file I/O, right? Because every time a reader is
       | waiting for data from disk, I can't execute the application code
       | of another connection in a different thread since execution is
       | blocked on my current thread. Is there any benefit to having a
       | thread pool with a limit of more than $NUM_CPU readers?
       | 
       | And one more: Would you recommend actually pooling connections or
       | just opening/closing the database for each request as needed?
       | Could keeping a file handle open prevent SQLite from
       | checkpointing under certain conditions?
        
         | seddonm1 wrote:
         | Thanks.
         | 
         | All good and valid questions.
         | 
          | 1. I work mostly in Rust, so I'll answer in terms of Rust
          | async. This library [0] uses queues to manage the workload. I
          | run a modified version [1] which creates 1 writer and n reader
          | connections to a WAL-backed SQLite database and dispatches
          | async transactions against them. The n readers pull work from
          | a shared common queue (rough sketch at the end of this
          | comment).
         | 
          | 2. Yes, there is not much you can do about file I/O, but SQLite
          | is still a full database engine with caching. You could use
          | this benchmarking tool to help understand where your limits
          | would be (do a run against a ramdisk and then against your real
          | storage).
         | 
         | 3. As per #1, I keep connections open and distribute
         | transactions across them myself. Checkpointing will only be a
          | problem under considerable sustained write load, but you should
         | be able to simulate your load and observe the behavior. The
         | WAL2 branch of SQLite is intended to prevent sustained load
         | problems.
         | 
          | [0]: https://github.com/programatik29/tokio-rusqlite
          | [1]: https://github.com/seddonm1/s3ite/blob/0.5.0/src/database.rs
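          | 
          | To make the shape of this concrete, here is a rough sketch of
          | the 1-writer / n-reader queue pattern (illustrative only, not
          | the actual s3ite or tokio-rusqlite code; it assumes rusqlite,
          | tokio and crossbeam-channel as dependencies, and all names are
          | made up):
          | 
          |   use rusqlite::Connection;
          |   use tokio::sync::oneshot;
          | 
          |   // A job is a closure that runs on a dedicated connection thread.
          |   type Job = Box<dyn FnOnce(&mut Connection) + Send>;
          | 
          |   #[derive(Clone)]
          |   struct Pool {
          |       writer: crossbeam_channel::Sender<Job>,  // single writer queue
          |       readers: crossbeam_channel::Sender<Job>, // shared reader queue
          |   }
          | 
          |   impl Pool {
          |       fn open(path: &str, n_readers: usize) -> rusqlite::Result<Self> {
          |           let (writer_tx, writer_rx) = crossbeam_channel::unbounded::<Job>();
          |           let (reader_tx, reader_rx) = crossbeam_channel::unbounded::<Job>();
          | 
          |           // One writer connection on its own thread.
          |           let mut conn = Connection::open(path)?;
          |           conn.query_row("PRAGMA journal_mode=WAL", [], |_| Ok(()))?;
          |           std::thread::spawn(move || {
          |               while let Ok(job) = writer_rx.recv() {
          |                   job(&mut conn);
          |               }
          |           });
          | 
          |           // n reader connections, all pulling from one shared queue.
          |           for _ in 0..n_readers {
          |               let rx = reader_rx.clone();
          |               let mut conn = Connection::open(path)?;
          |               std::thread::spawn(move || {
          |                   while let Ok(job) = rx.recv() {
          |                       job(&mut conn);
          |                   }
          |               });
          |           }
          | 
          |           Ok(Self { writer: writer_tx, readers: reader_tx })
          |       }
          | 
          |       // Queue a read-only closure onto some reader connection and
          |       // await its result. write() is identical except that it sends
          |       // to self.writer instead of self.readers.
          |       async fn read<T, F>(&self, f: F) -> T
          |       where
          |           T: Send + 'static,
          |           F: FnOnce(&mut Connection) -> T + Send + 'static,
          |       {
          |           let (tx, rx) = oneshot::channel();
          |           self.readers
          |               .send(Box::new(move |conn: &mut Connection| {
          |                   let _ = tx.send(f(conn));
          |               }))
          |               .expect("reader threads have stopped");
          |           rx.await.expect("reader dropped the job")
          |       }
          |   }
          | 
          | The async callers only ever await the oneshot result, so the
          | blocking SQLite calls stay on those dedicated threads rather
          | than on the tokio workers.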
        
           | pdimitar wrote:
           | Valuable info and links, instant bookmarks, thank you!
           | 
            | If you don't mind me asking, why did you go with rusqlite + a
            | tokio wrapper for it rather than sqlx?
        
             | seddonm1 wrote:
              | Whilst I love the idea of SQLx's compile-time checked
              | queries, in my experience it is not always practical to
              | need a database connection to compile the code. If it works
              | for you then that's great, but we had a few tricky edge
              | cases when dealing with migrations etc.
             | 
             | Also, and more fundamentally, your application state is the
             | most valuable thing you have. Do whatever you feel makes
             | you most comfortable to make sure that state (and state
             | transitions) is as well understood as possible. rusqlite is
             | that for me.
        
               | pdimitar wrote:
               | Thank you, good perspective.
               | 
                | Weren't the compile-time DB connections optional, btw?
                | They could be turned off, I think (last I checked, which
                | was admittedly last year).
               | 
               | My question was more about the fact that sqlx is
               | integrated with tokio out of the box and does not need an
               | extra crate like rusqlite does. But I am guessing you
               | don't mind that.
        
               | seddonm1 wrote:
                | SQLx has an offline mode where it saves the metadata of
                | the SQL database structure, but then you run the risk of
                | that being out of sync with the database?
               | 
                | Yeah, I just drop this one file [0] into my Tokio
                | projects and I have a SQLite single-writer/multi-reader
                | pool done in a few seconds.
               | 
                | [0]: https://github.com/seddonm1/s3ite/blob/0.5.0/src/database.rs
        
               | pdimitar wrote:
               | Thanks again!
               | 
               | I'll be resuming my effort to build an Elixir <-> Rust
               | SQLite bridge in the next several months. Hope you won't
               | mind some questions.
        
           | leononame wrote:
           | Thanks for your answer.
           | 
           | For 1, what is a good n? More than NUM_CPU probably does not
           | make sense, right? But would I want to keep it lower?
           | 
            | Also, you dispatch transactions into your queue? You define
           | your whole workload upfront, send it to the queue and wait
           | for it to finish?
        
             | seddonm1 wrote:
              | I went through the same mental process as you and also use
              | num_cpus [0], but this is based only on intuition and is
              | likely wrong. More benchmarking is needed, as my benchmarks
              | show that more parallelism only helps up to a point.
              | 
              | You can see how the transactions work in this example [1].
              | I have connection `.write()` and `.read()` methods which
              | decide which queue to use. As a result of this benchmarking
              | I am working on a PR [2] against rusqlite to set the
              | default transaction behavior, so hopefully `write()` will
              | default to IMMEDIATE and `read()` will remain DEFERRED.
             | 
              | [0] https://docs.rs/num_cpus/latest/num_cpus/
              | [1] https://github.com/seddonm1/s3ite/blob/0.5.0/src/s3.rs#L147
              | [2] https://github.com/rusqlite/rusqlite/pull/1532
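              | 
              | For anyone curious what that split looks like with today's
              | rusqlite API, here is a small sketch (the `events` table
              | and queries are made up; the PR linked above aims to make
              | this a connection-level default rather than something you
              | pass per transaction):
              | 
              |   use rusqlite::{Connection, TransactionBehavior};
              | 
              |   // Writes: take the write lock up front (BEGIN IMMEDIATE) so the
              |   // transaction cannot fail part-way through when upgrading from a
              |   // read lock to a write lock.
              |   fn record_event(conn: &mut Connection) -> rusqlite::Result<()> {
              |       let tx = conn.transaction_with_behavior(TransactionBehavior::Immediate)?;
              |       tx.execute("INSERT INTO events (kind) VALUES (?1)", ["click"])?;
              |       tx.commit()
              |   }
              | 
              |   // Reads: stay DEFERRED; under WAL they never need the write lock.
              |   fn count_events(conn: &mut Connection) -> rusqlite::Result<i64> {
              |       let tx = conn.transaction_with_behavior(TransactionBehavior::Deferred)?;
              |       tx.query_row("SELECT count(*) FROM events", [], |row| row.get(0))
              |   }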
        
         | rmbyrro wrote:
          | If you have a server with 100 cores to serve 100 connections
          | simultaneously - and really need this setup - you should
          | probably be using Postgres or something else.
        
           | leononame wrote:
            | It's a made-up example to clarify whether I understand
            | potential congestion scenarios and limitations correctly, not
            | my actual situation.
           | 
           | If I had a server with 100 cores to serve 100 connections,
           | but each query took only 5ms, SQLite might be totally viable.
           | There's no blanket solution.
           | 
            | Edit: More importantly, SQLite async limitations come into
            | play when I have only 12 cores but 100 incoming connections,
            | and on top of querying data from SQLite, I do have other
            | CPU-bound work to do with the results. If I had 100 cores,
            | 100 connections to the database would be no problem at all,
            | since each core could hold a connection and block without
            | issue.
        
             | Moehassan wrote:
              | You can make SQLite scale way beyond the limitations of WAL
              | mode or even BEGIN CONCURRENT mode, all while doing
              | synchronous writes.
             | 
             | https://oldmoe.blog/2024/07/08/the-write-stuff-concurrent-
             | wr...
        
               | leononame wrote:
                | If synchronous I/O is blocking your CPU-bound application
                | code, this won't help you. My made-up example was not
                | about concurrent writes, and the concurrent reads I
                | mentioned were not my main point. For all I care, you
                | could have 100 different databases or even plain files
                | that you read in this scenario.
                | 
                | I was wondering how the async wrappers around SQLite work
                | when SQLite itself only has synchronous I/O. At least in
                | the Rust example by OP, the async part is only used when
                | awaiting a queue, but the I/O itself still has the
                | potential of blocking all your application code while it
                | idles.
        
             | rmbyrro wrote:
             | > 12 cores but 100 incoming connections
             | 
             | Especially when using a modern storage medium, which most
             | servers nowadays use, I doubt that filesystem I/O will be a
             | bottleneck for the vast majority of use cases.
             | 
              | I/O is extremely fast and will be negligible compared to
              | the other work involved in serving those requests, even
              | running the queries themselves.
              | 
              | The CPU work done by SQLite will vastly outweigh the time
              | it takes to read from/write to disk.
              | 
              | It might be a bottleneck for reads if you have a very large
              | database file, though.
        
         | matharmin wrote:
         | You get concurrency in SQLite by using multiple connections -
         | and typically a dedicated thread per connection.
         | 
         | When using async wrappers, a good solution is connection
         | pooling like you mentioned - exactly the same concept as used
         | by client->server database drivers. So you can have 5 or 10
         | read connections serving those 100 connections, with a
         | statement/transaction queue to manage spikes in load. It's
         | probably not worth having more connections than CPUs, but it
         | depends a little on whether your queries are limited by I/O or
         | CPU, and whether you have other delays in your transactions
         | (each transaction requires exclusive use of one connection
         | while it's running).
         | 
          | Each SQLite connection maintains an in-memory cache of
          | recently accessed pages of data. However, this gets cleared on
          | all other connections whenever you write to the database, so it
          | is not that efficient when you have high write loads. But the
          | OS filesystem cache will still make a massive difference here -
          | in many cases your connections will just read from the
          | filesystem cache, which is much faster than the underlying
          | storage.
          | 
          | Open connections don't block checkpointing in SQLite. The main
          | case I'm aware of that does block it is always having one or
          | more active transactions. I believe that's quite rare in
          | practice unless you have really high and continuous load, but
          | if you do then the WAL2 branch may be for you.
         | 
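          | If you want to check whether checkpointing is keeping up, one
          | option is to run the checkpoint pragma yourself and look at
          | what it reports (rusqlite sketch; the same pragma works from
          | any binding):
          | 
          |   use rusqlite::Connection;
          | 
          |   // PRAGMA wal_checkpoint returns one row: (busy, wal_pages, checkpointed).
          |   // If `checkpointed` stays well below `wal_pages`, something (usually a
          |   // long-lived read transaction) is pinning the WAL and it will keep growing.
          |   fn checkpoint_stats(conn: &Connection) -> rusqlite::Result<(i64, i64, i64)> {
          |       conn.query_row("PRAGMA wal_checkpoint(PASSIVE)", [], |row| {
          |           Ok((row.get(0)?, row.get(1)?, row.get(2)?))
          |       })
          |   }
          | 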
         | I feel connection pooling is much more rare in SQLite libraries
         | than it should be. I'm maintaining one implementation
         | (sqlite_async for Dart), but feel like this should be the
         | standard for all languages with async/await support.
        
           | pdimitar wrote:
           | > _I feel connection pooling is much more rare in SQLite
            | libraries than it should be. I'm maintaining one
           | implementation (sqlite_async for Dart), but feel like this
           | should be the standard for all languages with async/await
           | support._
           | 
            | I completely agree. But I simply have no reference / good-
            | practice implementations to take inspiration from. I'd be
            | more than willing to have an Elixir FFI bridge to a Rust
            | library (and write both in the process) that actually makes
            | full use of parallelism to fully utilize SQLite's strengths,
            | but again, I have nowhere to steal from. :) Or I am not aware
            | of where to look.
        
         | simonw wrote:
         | I wrote an async wrapper around SQLite in Python - I'm using a
         | thread pool:
         | https://github.com/simonw/datasette/blob/main/datasette/data...
         | 
         | I have multiple threads for reads and a single dedicated thread
         | for writes, which I send operations to via a queue. That way I
         | avoid ever having two writes against the same connection at the
         | same time.
        
       ___________________________________________________________________
       (page generated 2024-07-18 23:10 UTC)