[HN Gopher] Compiling and running sqlite3-rsync from a branch
___________________________________________________________________
Compiling and running sqlite3-rsync from a branch
Author : tosh
Score : 35 points
Date : 2024-10-05 11:59 UTC (1 day ago)
(HTM) web link (til.simonwillison.net)
(TXT) w3m dump (til.simonwillison.net)
| isoprophlex wrote:
| > It took me quite a few iterations to get to this recipe for
| compiling the tool itself
|
| Hehe, I really feel this remark, having recently built a sqlite3
| binary with Lua support[*]
|
| The LLMs were no help here, either. Which is maybe a good thing
| for our collective future employment.
|
| That said, I found the quality of the documentation and
| legibility of the sqlite source code to be absolutely fantastic.
| When you work with the code, you can almost feel the programming
| enlightenment rub off on you.
|
| [*] why, for heaven's sake? so you can decouple your business
| logic from the database and test it a bit outside the DB, whilst
| still keeping the data all in one place without sending it over
| the wire. It lets you SELECT lua('some lua code that operates on
| your sqlite data') FROM some-table;, so you can put gnarly nested
| conditionals or whatever into Lua, build some comprehensive
| testing around that, and execute it on your data as if everything
| was a single, plain old SQL statement.
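The Lua extension described above isn't shown here, but the same pattern can be sketched with Python's built-in sqlite3 module, where Connection.create_function plays the role of the lua() scalar function. The table and the risk_band logic are made up for illustration:

```python
import sqlite3

# Hypothetical business logic you'd otherwise bury in nested SQL CASEs;
# here it's plain Python, so it can be unit-tested on its own.
def risk_band(balance, overdue_days):
    if overdue_days > 90 or balance < -1000:
        return "high"
    if overdue_days > 30:
        return "medium"
    return "low"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (balance REAL, overdue_days INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [(500.0, 0), (-2000.0, 10), (100.0, 45)])

# Registering the function makes it callable from SQL, like lua(...) above.
conn.create_function("risk_band", 2, risk_band)
rows = conn.execute(
    "SELECT balance, risk_band(balance, overdue_days) FROM accounts"
).fetchall()
print(rows)  # [(500.0, 'low'), (-2000.0, 'high'), (100.0, 'medium')]
```

The logic stays next to the data and runs inside one plain SQL statement, but it can be tested without a database at all.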
| simonw wrote:
| I got a little bit of assistance from Claude in figuring this
| out, but it pointed me in the right direction more than
| actually giving me the right sequence of commands. Claude got
| me as far as:
|
|     gcc -o sqldiff sqldiff.c ../sqlite3.c -I.. -ldl -lpthread
|
| Which was enough for me to figure out I needed to get the
| sqlite3.c amalgamation to build and then run gcc in the tool/
| directory. I landed on this after a bit more poking:
|     gcc -o sqlite3-rsync sqlite3-rsync.c ../sqlite3.c -DSQLITE_ENABLE_DBPAGE_VTAB
|
| I did have a much more useful interaction with an LLM later on:
| I was curious about the protocol used over SSH, so I copied the
| whole of this 1800 line C file:
|
| https://github.com/sqlite/sqlite/blob/sqlite3-rsync/tool/sql...
|
| And pasted it into OpenAI o1-preview with the prompt "Explain
| the protocol over SSH part of this" - and got back a genuinely
| excellent high-level explanation of how that worked:
| https://chatgpt.com/share/6701450c-bc9c-8006-8c9e-468ab6f67e...
| someone13 wrote:
| That share link 404s for me, FWIW. I'd be interested in
| seeing it!
| simonw wrote:
| Weird - https://chatgpt.com/share/6701450c-bc9c-8006-8c9e-468ab6f67e...
| is working for me in a Chrome Incognito window.
|
| Here's a copy of the Markdown answer it gave me:
| https://gist.github.com/simonw/ffbf90e0602df04c2f6b387de42ac...
| tosh wrote:
| I wonder how this compares to litestream and other existing
| sqlite replication strategies.
| simonw wrote:
| I guess this is probably fast enough that you could run it on a
| schedule every 10 seconds or so (at least for databases
| measured in GBs, not TBs) to keep a warm copy updated
| elsewhere. [UPDATE: maybe not, it has to hash everything, see
| comment here
| https://news.ycombinator.com/item?id=41749288#41760395]
|
| Litestream gives you a backup in S3 within less than a second
| of data being committed though, plus since it records the full
| WAL archive you get point-in-time recovery for previous
| timestamps too.
| ralferoo wrote:
| I had a quick scan through the source: it computes a hash of
| each page on both sides and sends the page if the hashes differ
| (so conceptually very similar to rsync). I don't think the
| hashes are cached or indexed anywhere normally, so that implies
| a full read of the database on both sides, which will be slow
| and will probably trash your disk cache on a large database.
|
| In theory, it should be able to use the WAL file directly if it
| copies the WAL checksums, but that information would get lost
| every time the WAL is flushed and the data is moved into the
| main db file.
|
| Litestream knows exactly which pages have been written via the
| fuse fs, so it doesn't need to do all that work for unchanged
| pages.
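The page-hash comparison described above can be sketched as follows. This is a toy model, not sqlite3-rsync's actual wire protocol or hash function (the real tool uses its own hashing over SQLite pages and streams results over SSH):

```python
import hashlib

PAGE_SIZE = 4096  # SQLite's default page size; real databases may differ

def page_hashes(db_bytes):
    # Hash every fixed-size page, as sqlite3-rsync conceptually does
    # on both sides before deciding which pages to send.
    return [hashlib.sha1(db_bytes[i:i + PAGE_SIZE]).digest()
            for i in range(0, len(db_bytes), PAGE_SIZE)]

def changed_pages(origin, replica):
    # Indices of pages whose hashes differ; only these cross the wire.
    # Note the full scan of *both* files -- the cost described above.
    oh, rh = page_hashes(origin), page_hashes(replica)
    diff = [i for i, (a, b) in enumerate(zip(oh, rh)) if a != b]
    diff += range(len(rh), len(oh))  # pages that only exist on the origin
    return diff

origin  = b"a" * PAGE_SIZE + b"b" * PAGE_SIZE + b"c" * PAGE_SIZE
replica = b"a" * PAGE_SIZE + b"x" * PAGE_SIZE + b"c" * PAGE_SIZE
print(changed_pages(origin, replica))  # [1]
```

Only one page differs, so only one page would be transferred, but computing the hashes still required reading every byte of both files.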
| simonw wrote:
| Ugh, yeah that's a good point: it looks like each time you
| run sqlite3-rsync it calculates a hash for every database page
| on both the source and the replica side of things - that's a
| lot of disk access and CPU, way more than you would want to
| run on an every-ten-seconds basis.
| gary_0 wrote:
| sqlite3-rsync is in the Makefile. I got sqlite3-rsync to build by
| just checking out the main branch and going:
|
|     ./configure
|     make sqlite3-rsync
|
| I think the rsync branch has been merged?
|
| I'm really excited for this new tool; I'm hoping to use it for
| nightly backups of a webapp I'm building that uses sqlite
| exclusively. Although I'm hesitant to rely on sqlite3-rsync right
| away because it's so new.
| simonw wrote:
| Thanks for that! I just updated my TIL to promote this recipe
| instead - I didn't realize it was A) in the Makefile and B)
| merged into the main branch.
| ralferoo wrote:
| > I'm hoping to use it for nightly backups of a webapp I'm
| building that uses sqlite exclusively
|
| The easiest recipe for this is:
|
|     sqlite3 $SRC "vacuum into '$DEST'"
|
| $DEST must not already exist; personally, I prefer to put a
| timestamp into the filename and keep the last n copies.
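That recipe, with the timestamp-and-keep-n policy mentioned above, can be sketched using Python's built-in sqlite3 module (the function name and file layout are made up; VACUUM INTO needs SQLite 3.27 or later):

```python
import sqlite3
import time
from pathlib import Path

def backup(src_path, dest_dir, keep=7):
    """Snapshot src_path with VACUUM INTO. The target file must not
    already exist, so each copy is timestamped, and everything but
    the newest `keep` copies is pruned."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = dest_dir / f"backup-{stamp}.db"
    conn = sqlite3.connect(src_path)
    try:
        # VACUUM INTO writes a compacted, consistent copy to a new file.
        conn.execute("VACUUM INTO ?", (str(dest),))
    finally:
        conn.close()
    # Lexicographic order matches chronological order for this stamp format.
    for old in sorted(dest_dir.glob("backup-*.db"))[:-keep]:
        old.unlink()
    return dest
```

Unlike sqlite3-rsync, this copies the whole database every time, so it suits local or nightly backups rather than frequent remote deltas.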
| gary_0 wrote:
| I'm aware of the existing backup methods, but AFAIK they
| don't do a delta of only the changed pages (unless I use
| separate software), and "vacuum into" doesn't work over SSH.
|
| I have no need to store a backup on the same physical server.
| Also, the backup server will keep snapshots, and with
| sqlite3-rsync I can just update a previous snapshot with the
| changed pages rather than redundantly transferring the whole
| database over every time.
| simonw wrote:
| Yeah, the unique benefit of sqlite3-rsync looks to be
| efficiently backing up medium to large databases that don't
| change enormously (so deltas are worthwhile) over an SSH
| connection. I particularly like that you don't need double
| the disk space to create the temporary backup copy before
| downloading it.
| pkhuong wrote:
| > I'm hesitant to rely on sqlite3-rsync right away because it's
| so new.
|
| If you want something less new that only needs an S3-compatible
| endpoint, there's
| https://gist.github.com/pkhuong/555be66f5e573a2d56186d658fd8...
___________________________________________________________________
(page generated 2024-10-06 23:00 UTC)