[HN Gopher] Show HN: SQLite Transaction Benchmarking Tool
___________________________________________________________________
Show HN: SQLite Transaction Benchmarking Tool
I wanted to make my own evaluation of what kind of performance I
could expect from SQLite on a server and investigate the
experimental `BEGIN CONCURRENT` branch vs the inbuilt `DEFERRED`
and `IMMEDIATE` behaviors. Explanatory blog post:
https://reorchestrate.com/posts/sqlite-transactions/
Author : seddonm1
Score : 114 points
Date : 2024-07-17 21:44 UTC (1 day ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| leononame wrote:
| Thanks for the interesting article. Lots of things seem to be
| happening in SQLite land at the moment, and I appreciate that the
| SQLite team documents its quirks so openly; it gives great confidence.
|
| Since I don't know where else to ask, maybe this is a good place:
| How do async wrappers around SQLite (e.g. for node or python)
| work? SQLite only uses synchronous I/O if I'm not mistaken. Is it
| just a pretend async function with only synchronous code?
|
| And, as a follow-up: If I have a server with say 100 incoming
| connections that will all read from the database, I've got 100
| readers. No problem in WAL mode. However, I still could get
| congested by file I/O, right? Because every time a reader is
| waiting for data from disk, I can't execute the application code
| of another connection in a different thread since execution is
| blocked on my current thread. Is there any benefit to having a
| thread pool with a limit of more than $NUM_CPU readers?
|
| And one more: Would you recommend actually pooling connections or
| just opening/closing the database for each request as needed?
| Could keeping a file handle open prevent SQLite from
| checkpointing under certain conditions?
| seddonm1 wrote:
| Thanks.
|
| All good and valid questions.
|
| 1. I work mostly in Rust so I'll answer in Rust terms. This
| library [0] uses queues to manage workload. I run a modified
| version [1] which creates 1 writer and n reader connections to a
| WAL-backed SQLite database and dispatches async transactions
| against them. The n readers pull work from a shared common queue.
|
| 2. Yes, there is not much you can do about file I/O, but SQLite is
| still a full database engine with caching. You could use this
| benchmarking tool to help understand where your limits would be
| (you can do a run against a ramdisk then against your real
| storage).
|
| 3. As per #1, I keep connections open and distribute
| transactions across them myself. Checkpointing will only be a
| problem under considerable sustained write load but you should
| be able to simulate your load and observe the behavior. The
| WAL2 branch of SQLite is intended to prevent sustained load
| problems.
|
| [0]: https://github.com/programatik29/tokio-rusqlite [1]:
| https://github.com/seddonm1/s3ite/blob/0.5.0/src/database.rs
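The 1-writer / n-reader queue pattern described in point 1 can be sketched with Python's stdlib `sqlite3` (the linked libraries are Rust; this `SqlitePool` class and its method names are illustrative, not the author's API):

```python
import queue
import sqlite3
import threading

class SqlitePool:
    """One writer connection plus n reader connections, each fed by a queue."""

    def __init__(self, path, n_readers=2):
        self.write_q = queue.Queue()
        self.read_q = queue.Queue()  # shared by all readers
        threading.Thread(target=self._worker, args=(path, self.write_q),
                         daemon=True).start()
        for _ in range(n_readers):
            threading.Thread(target=self._worker, args=(path, self.read_q),
                             daemon=True).start()

    def _worker(self, path, q):
        # Each worker thread owns its connection; WAL lets readers run
        # concurrently with the single writer.
        conn = sqlite3.connect(path)
        conn.execute("PRAGMA journal_mode=WAL")
        while True:
            fn, reply = q.get()
            try:
                reply.put(("ok", fn(conn)))
            except Exception as exc:
                reply.put(("err", exc))

    def _submit(self, q, fn):
        reply = queue.Queue()
        q.put((fn, reply))
        status, value = reply.get()
        if status == "err":
            raise value
        return value

    def write(self, fn):
        return self._submit(self.write_q, fn)

    def read(self, fn):
        return self._submit(self.read_q, fn)
```

A caller submits closures, e.g. `pool.read(lambda c: c.execute("SELECT x FROM t").fetchone())`; because all writes funnel through one queue and one connection, two writes can never race on the same connection.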
| pdimitar wrote:
| Valuable info and links, instant bookmarks, thank you!
|
| If you don't mind me asking, why did you go with rusqlite + a
| tokio wrapper for it and not go with sqlx?
| seddonm1 wrote:
| Whilst I love the idea of SQLX's compile-time checked queries,
| in my experience it is not always practical to need a database
| connection to compile the code. If it works for you then that's
| great, but we had a few tricky edge cases when dealing with
| migrations etc.
|
| Also, and more fundamentally, your application state is the
| most valuable thing you have. Do whatever you feel makes
| you most comfortable to make sure that state (and state
| transitions) is as well understood as possible. rusqlite is
| that for me.
| pdimitar wrote:
| Thank you, good perspective.
|
| Weren't the compile-time connections to DB optional btw?
| They could be turned off I think (last I checked, which
| was last year admittedly).
|
| My question was more about the fact that sqlx is
| integrated with tokio out of the box and does not need an
| extra crate like rusqlite does. But I am guessing you
| don't mind that.
| seddonm1 wrote:
| SQLX has an offline mode where it saves the metadata of
| the SQL database structure, but then you run the risk of
| that being out of sync with the database?
|
| Yeah I just drop this one file [0] into my Tokio projects
| and I have a SQLite with single writer/multi reader pool
| done in a few seconds.
|
| [0]: https://github.com/seddonm1/s3ite/blob/0.5.0/src/database.rs
| pdimitar wrote:
| Thanks again!
|
| I'll be resuming my effort to build an Elixir <-> Rust
| SQLite bridge in the next several months. Hope you won't
| mind some questions.
| leononame wrote:
| Thanks for your answer.
|
| For 1, what is a good n? More than NUM_CPU probably does not
| make sense, right? But would I want to keep it lower?
|
| Also, you dispatch transactions in your queue? You define
| your whole workload upfront, send it to the queue and wait
| for it to finish?
| seddonm1 wrote:
| I went through the same mental process as you and also use
| num_cpus [0], but this is based only on intuition and is
| likely wrong. More benchmarking is needed, as my benchmarks
| show that more parallelism only helps up to a point.
|
| You can see how the transactions work in this example[1]. I
| have a connection `.write()` or `.read()` which decides
| which queue to use. I am in the process [2] of trying to do
| a PR against rusqlite to set the default transaction
| behavior as a result of this benchmarking so hopefully
| `write()` will default to IMMEDIATE and `read()` remains
| DEFERRED.
|
| [0] https://docs.rs/num_cpus/latest/num_cpus/ [1]
| https://github.com/seddonm1/s3ite/blob/0.5.0/src/s3.rs#L147
| [2] https://github.com/rusqlite/rusqlite/pull/1532
| rmbyrro wrote:
| If you have a server with 100 cores to serve 100 connections
| simultaneously (and really need this setup), you should
| probably be using Postgres or something else.
| leononame wrote:
| It's a made up example to clarify whether I understand
| potential congestion scenarios and limitations correctly, not
| my actual situation.
|
| If I had a server with 100 cores to serve 100 connections,
| but each query took only 5ms, SQLite might be totally viable.
| There's no blanket solution.
|
| Edit: More importantly, SQLite async limitations come into
| play when I have only 12 cores but 100 incoming connections,
| and on top of querying data from SQLite, I do have other CPU
| bound work to do with the results. If I had 100 cores, 100
| connections to the database would be no problem at all since
| each core could hold a connection and block without problem.
| Moehassan wrote:
| You can make SQLite scale way beyond the limitations of WAL
| mode or even Begin Concurrent mode, all while doing
| synchronous writes
|
| https://oldmoe.blog/2024/07/08/the-write-stuff-concurrent-
| wr...
| leononame wrote:
| If synchronous IO is blocking your CPU bound application
| code, this won't help you. My made up example was not
| about concurrent writes, and the concurrent reads I
| mentioned were not my main point. For all I care, you
| could be reading from 100 different databases or even
| plain files in this scenario.
|
| I was wondering how the async wrappers around SQLite work
| when SQLite itself only has synchronous IO. At least for
| the Rust example by Op, the async part is only used when
| awaiting a queue, but the IO itself still has the
| potential of blocking all your application code while
| idling.
| rmbyrro wrote:
| > 12 cores but 100 incoming connections
|
| Especially when using a modern storage medium, which most
| servers nowadays use, I doubt that filesystem I/O will be a
| bottleneck for the vast majority of use cases.
|
| I/O is extremely fast and will be negligible compared to
| other stuff going on to serve those requests, even running
| queries themselves.
|
| The CPU work done by SQLite will vastly outshine the time
| it takes to read/write to disk.
|
| It might be a bottleneck to reading if you have a very
| large database file, though.
| matharmin wrote:
| You get concurrency in SQLite by using multiple connections -
| and typically a dedicated thread per connection.
|
| When using async wrappers, a good solution is connection
| pooling like you mentioned - exactly the same concept as used
| by client->server database drivers. So you can have 5 or 10
| read connections serving those 100 connections, with a
| statement/transaction queue to manage spikes in load. It's
| probably not worth having more connections than CPUs, but it
| depends a little on whether your queries are limited by I/O or
| CPU, and whether you have other delays in your transactions
| (each transaction requires exclusive use of one connection
| while it's running).
|
| SQLite maintains an in-memory cache of recently-accessed pages
| of data. However, this gets cleared on all other connections
| whenever you write to the database, so it is not that efficient
| when you have high write loads. But the OS filesystem cache
| will still make a massive difference here - in many cases your
| connections will just read from the filesystem cache, which is
| much faster than the underlying storage.
|
| Open connections don't block checkpointing in SQLite. The main
| case I'm aware of that does block it is always having one or
| more active transactions. I believe that's quite rare in
| practice unless you have really high and continuous load, but
| if you do then the WAL2 branch may be for you.
|
| I feel connection pooling is much more rare in SQLite libraries
| than it should be. I'm maintaining one implementation
| (sqlite_async for Dart), but feel like this should be the
| standard for all languages with async/await support.
| pdimitar wrote:
| > _I feel connection pooling is much more rare in SQLite
| libraries than it should be. I'm maintaining one
| implementation (sqlite_async for Dart), but feel like this
| should be the standard for all languages with async/await
| support._
|
| I completely agree. But I simply have no reference or good-
| practice implementations to take inspiration from. I'd be
| more than willing to have an Elixir FFI bridge to a Rust
| library (and write both in the process) that actually makes
| full use of parallelism to fully utilize SQLite's strengths,
| but again, I have nowhere to steal from. :) Or I'm not aware
| of where to look.
| simonw wrote:
| I wrote an async wrapper around SQLite in Python - I'm using a
| thread pool:
| https://github.com/simonw/datasette/blob/main/datasette/data...
|
| I have multiple threads for reads and a single dedicated thread
| for writes, which I send operations to via a queue. That way I
| avoid ever having two writes against the same connection at the
| same time.
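In asyncio terms, such a wrapper is "async" only at the hand-off: the query itself runs synchronously on a pool thread while the event loop awaits the result. A minimal sketch of the same shape (hypothetical helper names, not datasette's API; an `asyncio.Lock` stands in for the dedicated write thread and queue):

```python
import asyncio
import sqlite3

async def main():
    # check_same_thread=False because asyncio.to_thread may run the
    # closure on any thread in the default executor pool.
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    write_lock = asyncio.Lock()  # never two writes on the connection at once

    async def write(sql, *params):
        async with write_lock:
            def run():
                conn.execute(sql, params)
                conn.commit()
            await asyncio.to_thread(run)

    async def read(sql, *params):
        # The SELECT is still synchronous I/O; only the await is async.
        return await asyncio.to_thread(
            lambda: conn.execute(sql, params).fetchall())

    await write("CREATE TABLE t(x)")
    await write("INSERT INTO t VALUES (?)", 7)
    return await read("SELECT x FROM t")

rows = asyncio.run(main())
```

The event loop stays free while a query runs, but each in-flight query still occupies one pool thread, which is the blocking behavior discussed upthread.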
___________________________________________________________________
(page generated 2024-07-18 23:10 UTC)