[HN Gopher] Compiling and running sqlite3-rsync from a branch
       ___________________________________________________________________
        
       Compiling and running sqlite3-rsync from a branch
        
       Author : tosh
       Score  : 35 points
       Date   : 2024-10-05 11:59 UTC (1 day ago)
        
 (HTM) web link (til.simonwillison.net)
 (TXT) w3m dump (til.simonwillison.net)
        
       | isoprophlex wrote:
       | > It took me quite a few iterations to get to this recipe for
       | compiling the tool itself
       | 
       | Hehe, I really feel this remark, having recently built a sqlite3
       | binary with Lua support[*]
       | 
       | The LLMs were no help here, either. Which is maybe a good thing
       | for our collective future employment.
       | 
       | That said, I found the quality of the documentation and
       | legibility of the sqlite source code to be absolutely fantastic.
       | When you work with the code, you can almost feel the programming
       | enlightenment rub off on you.
       | 
        | [*] why, for heaven's sake? so you can decouple your business
       | logic from the database and test it a bit outside the DB, whilst
       | still keeping the data all in one place without sending it over
       | the wire. It lets you SELECT lua('some lua code that operates on
       | your sqlite data') FROM some-table;, so you can put gnarly nested
       | conditionals or whatever into Lua, build some comprehensive
       | testing around that, and execute it on your data as if everything
       | was a single, plain old SQL statement.
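The decouple-the-logic idea described above can be sketched with Python's stdlib sqlite3 module, which lets you register a host-language function and call it from plain SQL. This is an analogy to the Lua extension, not the extension itself; the table, column, and function names are made up for illustration:

```python
import sqlite3

def risk_band(balance, overdue_days):
    # Nested conditionals you'd rather unit-test outside of SQL.
    if overdue_days is None or overdue_days <= 0:
        return "ok"
    if balance > 1000 and overdue_days > 30:
        return "high"
    return "watch"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (balance REAL, overdue_days INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [(50.0, 0), (2000.0, 45), (300.0, 10)])

# Register risk_band as a 2-argument SQL scalar function.
conn.create_function("risk_band", 2, risk_band)

rows = conn.execute(
    "SELECT risk_band(balance, overdue_days) FROM accounts ORDER BY rowid"
).fetchall()
print(rows)  # [('ok',), ('high',), ('watch',)]
```

The business rule lives in ordinary Python you can test in isolation, while the query still reads like a single plain SQL statement.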
        
         | simonw wrote:
         | I got a little bit of assistance from Claude in figuring this
         | out, but it pointed me in the right direction more than
          | actually giving me the right sequence of commands. Claude got
          | me as far as:
          | 
          |     gcc -o sqldiff sqldiff.c ../sqlite3.c -I.. -ldl -lpthread
         | 
         | Which was enough for me to figure out I needed to get the
         | sqlite3.c amalgamation to build and then run gcc in the tool/
          | directory. I landed on this after a bit more poking:
          | 
          |     gcc -o sqlite3-rsync sqlite3-rsync.c ../sqlite3.c \
          |         -DSQLITE_ENABLE_DBPAGE_VTAB
         | 
         | I did have a much more useful interaction with an LLM later on:
         | I was curious about the protocol used over SSH, so I copied the
         | whole of this 1800 line C file:
         | 
         | https://github.com/sqlite/sqlite/blob/sqlite3-rsync/tool/sql...
         | 
         | And pasted it into OpenAI o1-preview with the prompt "Explain
         | the protocol over SSH part of this" - and got back a genuinely
         | excellent high-level explanation of how that worked:
         | https://chatgpt.com/share/6701450c-bc9c-8006-8c9e-468ab6f67e...
        
           | someone13 wrote:
           | That share link 404s for me, FWIW. I'd be interested in
           | seeing it!
        
             | simonw wrote:
             | Weird - https://chatgpt.com/share/6701450c-bc9c-8006-8c9e-4
             | 68ab6f67e... is working for me in a Chrome Incognito
             | window.
             | 
             | Here's a copy of the Markdown answer it gave me: https://gi
             | st.github.com/simonw/ffbf90e0602df04c2f6b387de42ac...
        
       | tosh wrote:
       | I wonder how this compares to litestream and other existing
       | sqlite replication strategies.
        
         | simonw wrote:
         | I guess this is probably fast enough that you could run it on a
         | schedule every 10 seconds or so (at least for databases
         | measured in GBs, not TBs) to keep a warm copy updated
         | elsewhere. [UPDATE: maybe not, it has to hash everything, see
         | comment here
         | https://news.ycombinator.com/item?id=41749288#41760395]
         | 
         | Litestream gives you a backup in S3 within less than a second
         | of data being committed though, plus since it records the full
         | WAL archive you get point-in-time recovery for previous
         | timestamps too.
        
         | ralferoo wrote:
         | I had a quick scan through the source and it gets the hash of
         | each page on both sides and sends the page if it's different
          | (so conceptually very similar to rsync). I don't think the
          | page hashes are cached or indexed anywhere normally, so that
          | implies it needs to do a full read of the database on both
          | sides, meaning it'll be slow and will probably trash your disk
          | cache on a large database.
         | 
         | In theory, it should be able to use the WAL file directly if it
         | copies the WAL checksums, but that information would get lost
         | every time the WAL is flushed and the data is moved into the
         | main db file.
         | 
         | Litestream knows exactly which pages have been written via the
         | fuse fs, so it doesn't need to do all that work for unchanged
         | pages.
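The page-hash comparison described above can be sketched roughly as follows. This is a toy illustration of the rsync-like idea (hash every fixed-size page on both sides, transfer only the pages that differ), not sqlite3-rsync's actual hash function or wire protocol; SHA-256 and the 4096-byte page size are stand-ins:

```python
import hashlib

PAGE_SIZE = 4096  # SQLite's default page size; illustrative only

def page_hashes(data: bytes):
    # Hash every fixed-size page of the file.
    return [hashlib.sha256(data[i:i + PAGE_SIZE]).digest()
            for i in range(0, len(data), PAGE_SIZE)]

def pages_to_send(origin: bytes, replica: bytes):
    # Compare hashes page by page; send any page that differs or is
    # missing on the replica. Note both sides must read every page,
    # which is the full-scan cost discussed above.
    origin_h = page_hashes(origin)
    replica_h = page_hashes(replica)
    return [i for i, h in enumerate(origin_h)
            if i >= len(replica_h) or replica_h[i] != h]

origin = b"a" * PAGE_SIZE + b"b" * PAGE_SIZE + b"c" * PAGE_SIZE
replica = b"a" * PAGE_SIZE + b"X" * PAGE_SIZE
print(pages_to_send(origin, replica))  # [1, 2]
```

Only the changed page (1) and the page missing from the replica (2) would cross the wire, but computing the hashes still touches every byte on both sides.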
        
           | simonw wrote:
           | Ugh, yeah that's a good point: it looks like each time you
            | run sqlite3-rsync it calculates a hash for every database page
           | on both the source and the replica side of things - that's a
           | lot of disk access and CPU, way more than you would want to
           | run on an every-ten-seconds basis.
        
       | gary_0 wrote:
        | sqlite3-rsync is in the makefile. I got sqlite3-rsync to build
        | by just checking out the main branch and going:
        | 
        |     ./configure
        |     make sqlite3-rsync
       | 
       | I think the rsync branch has been merged?
       | 
       | I'm really excited for this new tool; I'm hoping to use it for
       | nightly backups of a webapp I'm building that uses sqlite
        | exclusively, although I'm hesitant to rely on sqlite3-rsync
        | right away because it's so new.
        
         | simonw wrote:
         | Thanks for that! I just updated my TIL to promote this recipe
         | instead - I didn't realize it was A) in the Makefile and B)
         | merged into the main branch.
        
         | ralferoo wrote:
         | > I'm hoping to use it for nightly backups of a webapp I'm
         | building that uses sqlite exclusively
         | 
          | The easiest recipe for this is to use
          | 
          |     sqlite3 $SRC "vacuum into '$DEST'"
          | 
          | $DEST must not already exist; personally, I prefer to put the
          | timestamp into the filename and keep the last n copies.
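That recipe (VACUUM INTO a timestamped destination, keep the last n copies) can be wrapped up with Python's stdlib sqlite3 module. The filename pattern and the keep=7 default are illustrative choices, not anything specified in the thread:

```python
import sqlite3
import time
from pathlib import Path

def backup_with_retention(src_db: str, backup_dir: str, keep: int = 7):
    # VACUUM INTO writes a compact, transactionally consistent copy.
    # The destination must not already exist, so a timestamped name
    # sidesteps that requirement.
    dest_dir = Path(backup_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = dest_dir / f"backup-{stamp}.db"

    conn = sqlite3.connect(src_db)
    try:
        conn.execute("VACUUM INTO ?", (str(dest),))
    finally:
        conn.close()

    # Prune: the timestamped names sort chronologically, so keep only
    # the newest `keep` copies.
    backups = sorted(dest_dir.glob("backup-*.db"))
    for old in backups[:-keep]:
        old.unlink()
    return dest
```

Run from cron, this gives nightly full copies with rotation; unlike sqlite3-rsync it re-copies the whole database each time and doesn't work over SSH, which is the trade-off discussed below.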
        
           | gary_0 wrote:
           | I'm aware of the existing backup methods, but AFAIK they
           | don't do a delta of only the changed pages (unless I use
           | separate software), and "vacuum into" doesn't work over SSH.
           | 
           | I have no need to store a backup on the same physical server.
           | Also, the backup server will keep snapshots, and with
           | sqlite3-rsync I can just update a previous snapshot with the
           | changed pages rather than redundantly transferring the whole
           | database over every time.
        
             | simonw wrote:
              | Yeah, the unique benefit of sqlite3-rsync looks to be
             | efficiently backing up medium to large databases that don't
             | change enormously (so deltas are worthwhile) over an SSH
             | connection. I particularly like that you don't need double
             | the disk space to create the temporary backup copy before
             | downloading it.
        
         | pkhuong wrote:
         | > I'm hesitant to rely on sqlite-rsync right away because it's
         | so new.
         | 
         | If you want something less new that only needs an S3-compatible
         | endpoint, there's
         | https://gist.github.com/pkhuong/555be66f5e573a2d56186d658fd8...
        
       ___________________________________________________________________
       (page generated 2024-10-06 23:00 UTC)