[HN Gopher] Cron-based backup for SQLite
___________________________________________________________________
Cron-based backup for SQLite
Author : davnicwil
Score : 329 points
Date : 2022-05-15 09:14 UTC (13 hours ago)
(HTM) web link (litestream.io)
(TXT) w3m dump (litestream.io)
| benbjohnson wrote:
| Litestream author here. The motivation for this docs page was
| two-fold. First, laying out the different options with their
| trade-offs helps people understand what Litestream is trying to
| be, which helps keep my focus narrow. For example, if someone is
| looking for very high durability guarantees then they can see
| that rqlite might be a better fit and they don't try to fit
| Litestream into that mold.
|
| Second, Litestream doesn't interfere with other backup methods so
| you can run it alongside a cron-based backup. I typically run
| both because I'm overly paranoid and because it's cheap.
| mattl wrote:
| crontab isn't a tool, cron is the tool. It reads a little
| strange because of that.
| benbjohnson wrote:
| Good call. I updated the docs repo and the site should update
| in a few minutes.
| mattl wrote:
| Awesome!
| wheels wrote:
| I'm not saying that's the logic here, but this is a common
| pattern because SEO. It's a useful way to end up on potential
| customers' radar.
| ctur wrote:
| Nice post -- one suggestion would be to use zstd instead of
| gzip though. Out of the box it is much faster and produces
| smaller files, and it can be tuned to fit whatever memory and
| CPU budget a given use case allows.
|
| Friends don't let friends use gzip :)
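|
| For instance, compressing the backup file with zstd instead of
| gzip (paths, level and thread count to taste) could look like:
|
| zstd -3 -T0 /path/to/backup.db -o /path/to/backup.db.zst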
| otterley wrote:
| Storage is so cheap nowadays that usability and compression
| speed are more important for most people.
| usefulcat wrote:
| For a given compression ratio, zstd compression is faster
| than gzip by a good margin (even without using multiple
| threads). And zstd decompression is pretty much always
| faster than gzip too.
| benbjohnson wrote:
| Thanks, I thought about using a different compression tool
| but everybody knows gzip so I figured I'd keep it simple.
| Litestream uses LZ4 internally (which is great) but it's not
| something a lot of developers come across too often.
| nikeee wrote:
| There is also VACUUM INTO:
| https://www.sqlite.org/lang_vacuum.html#vacuuminto
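|
| A minimal example from the shell (paths are placeholders; the
| target file must not already exist):
|
| sqlite3 /path/to/db.sqlite "VACUUM INTO '/path/to/backup.sqlite'"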
| nbevans wrote:
| Indeed, and in my experience this has been more
| reliable/predictable than the official backup API. The official
| backup API can get "stuck" on a very large and hot database
| because it, essentially, has to start from scratch if a page is
| modified. VACUUM INTO doesn't have this problem.
| charcircuit wrote:
| >Do not use `cp` to back up SQLite databases. It is not
| transactionally safe.
|
| I've read that this is true, but it has always confused me
| because I would expect that using cp would be equivalent to the
| application having crashed / lost power.
| fmajid wrote:
| No, because cp takes a non-zero amount of time, so by the time
| it has completed the first blocks may have been rewritten and
| the backup is inconsistent. If you take something like a
| ZFS snapshot, then, yes, that would be a consistent backup.
| amcvitty wrote:
| Not an expert but the transaction log is in another file and
| you'd need both to be in line with each other and backed up, so
| perhaps that's a factor
| benbjohnson wrote:
| I wrote up a bit longer explanation of the failure modes
| (which depend on your journaling mode) in this comment:
| https://news.ycombinator.com/item?id=31387447
| kijin wrote:
| `cp` is not instantaneous. Unless you lock the entire file, one
| block can change while you're reading another block. You end up
| with a Frankenstein of a file, compared with a consistent
| snapshot if you had simply lost power.
|
| If your filesystem supports snapshotting, it would be safe to
| create a snapshot and `cp` the database off of that snapshot.
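|
| A rough sketch with ZFS (dataset and paths are hypothetical):
|
| zfs snapshot tank/app@bk
| cp /app/.zfs/snapshot/bk/db.sqlite /backups/db.sqlite
| # in WAL mode, also copy db.sqlite-wal from the same snapshot
| zfs destroy tank/app@bk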
| kijin wrote:
| Another option is to use the `.dump` command that produces a
| series of SQL statements to recreate the database and populate it
| with data, just like good ol' mysqldump.
|
| I'm not sure whether this is faster or safer than `.backup` which
| produces an actual database file with indexes and whatnot, but a
| benefit of the plain text output is that it's very flexible. The
| output can be piped to gzip or any other program on the fly, even
| across the network, without waiting for an intermediate file to
| be written in full. It can also be imported to other types of
| RDBMS with minimal changes.
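|
| For example (paths are placeholders):
|
| sqlite3 /path/to/db.sqlite .dump | gzip -c > backup.sql.gz
| # restore by replaying the statements into a fresh database:
| gunzip -c backup.sql.gz | sqlite3 /path/to/restored.sqlite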
| benbjohnson wrote:
| The .dump command is a great option for smaller databases. The
| main downsides are that it produces a larger output, it'll take
| longer to compress that output, and the restore time is much
| longer since all the commands have to be replayed.
|
| The benefit to using the backup API is that the file is ready-
| to-go as the database and can be started up immediately.
| evoxmusic wrote:
| Fun to see that we share some very similar ideas with Replibyte -
| an open source tool to seed a dev database with real data.
| https://github.com/Qovery/replibyte
| cpach wrote:
| Nitpick: On a modern Linux system, I would probably use a systemd
| timer instead of cron.
| manjana wrote:
| > Do not use `cp` to back up SQLite databases. It is not
| transactionally safe.
|
| Anyone know if committing the sqlite db file to git is not safe
| either?
| benbjohnson wrote:
| Anything that only uses the file system copying mechanics is
| not safe (which includes a git check-in). I wrote a more
| detailed comment here:
| https://news.ycombinator.com/item?id=31387447
| Moto7451 wrote:
| If you're not actively writing to the database you can cp or
| commit it safely. SQLite uses a journal or a write ahead log
| (WAL) file to process transactions.
|
| If you cp a database mid-transaction you will lose the
| transaction as it is processed in the separate journal or WAL
| file until commit. If you copy mid commit then you will have
| incomplete data in the main database file and possibly data
| corruption.
| CraigJPerry wrote:
| Do the cron entries work as-is? I'm sitting here trying to decide
| if cron will break here:
|
| aws s3 cp /path/to/backup.gz s3://mybucket/backup-`date +%H`.gz
|
| The % is going to be interpreted as a newline, so it would need
| escaping, as in date +\%H
|
| But it's in backticks (subshell) so now I'm doubting myself and
| thinking maybe the subshell saves this but my brain is screaming
| NO NO BUG! :-)
|
| I'm going to have to spin up a VM to test it and seek inner peace
| here :-)
| CraigJPerry wrote:
| Yeah, this isn't used in the cron line itself; it's wrapped in
| a script further down the article and the script is run from
| cron, which makes sense, so no bug.
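|
| For reference, if the command did live directly in a crontab
| line, the % would need a backslash (bucket is a placeholder):
|
| 0 * * * * aws s3 cp /path/to/backup.gz s3://mybucket/backup-`date +\%H`.gz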
| gorgoiler wrote:
| I've always assumed sqlite database files are resilient to being
| copied, mid-write. Sure: maybe that last transaction will be
| corrupted, but everything up to that point will be intact,
| right? The next backup will capture whatever was in the half-
| complete transaction.
|
| Am I deluded in this?
| benbjohnson wrote:
| Litestream author here. Your backup can be corrupt if you
| simply use "cp" but the details depend on your journaling mode.
|
| If you're using the default rollback journal mode, it works by
| copying old pages to a "-journal" file and then updating the
| main database file with new pages. Your transaction finally
| commits when you delete the journal file. Copying the main
| database file during a write transaction can give you either a
| subset of transaction pages or half-written pages.
|
| If you're using the WAL journaling mode, it works by writing
| new pages to the "-wal" file and then periodically copying
| those pages back to the main database file in a process called
| "checkpointing". If you only "cp" the main database file then
| you'll be missing all transactions that are in the WAL. There's
| no time bound on the WAL so you could lose a lot of
| transactions. Also, you can still corrupt your database backup
| if you "cp" the main database file during the checkpoint.
|
| You could "cp" the main database file and the WAL file and
| _probably_ not corrupt it but there's still a race condition
| where you could copy the main file during checkpointing and
| then not reach the WAL file before it's deleted by SQLite.
|
| tl;dr is to just use the backup API and not worry about it. :)
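|
| With the sqlite3 shell that's a one-liner (paths are
| placeholders):
|
| sqlite3 /path/to/db.sqlite ".backup '/path/to/backup.sqlite'"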
| creatonez wrote:
| If you make an atomic copy -- something equivalent to a power
| failure -- then it's transactionally safe. One way this can
| be done is with a snapshot-capable filesystem like ZFS, btrfs,
| or XFS.
|
| But a traditional `cp` will go from left to right in the file;
| it won't capture the contents all at once. That is explicitly
| documented as a thing that will break SQLite -
| https://www.sqlite.org/howtocorrupt.html
| jchw wrote:
| Question:
|
| > Do not use `cp` to back up SQLite databases. It is not
| transactionally safe.
|
| I'm curious what happens. Does this mean that it may cause side
| effects from transactions that are not committed to become
| visible?
|
| I guess if you could snapshot the database file, WAL, etc.
| simultaneously this isn't an issue, right? Because otherwise it
| would be a problem for a database any time your program or
| machine crashed.
| simonw wrote:
| In terms of restoring from backups: "Be sure to stop your
| application before copying the database into place"
|
| This intuitively feels right to me, but I'd be interested to
| understand the mechanics behind it. What can go wrong if you
| ignore this advice and use "mv" to atomically switch out the
| SQLite file from underneath your application?
|
| My hunch is that this relates to journal and WAL files - I
| imagine bad things can happen if those no longer match the main
| database file.
|
| But how about if your database file is opened in read-only or
| immutable mode and doesn't have an accompanying journal/WAL?
| [deleted]
| zekica wrote:
| On Linux at least, if you switch the file, the old file (even
| if deleted) will still be used until the file descriptor is
| closed.
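|
| A quick way to see those deleted-but-still-open files, if lsof
| is available:
|
| lsof +L1   # lists open files whose on-disk link count is zero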
| ancientsofmumu wrote:
| The example uses Dead Man's Snitch; Healthchecks.io is my
| preference, and their plans offer more for less. Both offer code
| on Github for examination.
|
| https://deadmanssnitch.com/plans vs.
| https://healthchecks.io/pricing/
| WrtCdEvrydy wrote:
| Is there anything self hosted in this space?
| cuu508 wrote:
| Yes, Healthchecks is open-source and can be self-hosted
| https://github.com/healthchecks/healthchecks
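|
| The check-in itself is just an HTTP request at the end of the
| job; with the hosted service it looks something like this (the
| UUID is a placeholder):
|
| curl -fsS -m 10 --retry 3 https://hc-ping.com/your-check-uuid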
| jethro_tell wrote:
| Depending on your environment, i.e. if you have a metrics
| system in place already, I generally just post a success
| metric and alert on lack of success metrics.
| spencerbrian wrote:
| I do something similar: it's a relatively small but important
| SQLite database, so every five minutes I take a backup using
| that method and check the generated SQL files into remote git
| repositories.
|
| One on GitHub which is just a snapshot, a single commit in an
| otherwise empty repo, force pushed. This is for recovery
| purposes; I don't need the history and would probably run afoul
| of their service limits if I did so.
|
| And the other on Azure DevOps which has the entire commit history
| for the past few years. This one is a bit trickier because the
| pack files end up exhausting disk space if not cleaned up, and
| garbage collection interrupts the backups. So it clones just the
| latest commit (grafted), pushes the next commit, and wipes the
| local repo. No idea how this looks on the remote backend but it's
| still working without any size complaints, and it's good to know
| there's an entire snapshotted history there if needed. As well as
| being able to clone the most recent for recovery if GitHub fails.
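|
| A sketch of that clone-push-wipe cycle (host, repo and paths are
| hypothetical):
|
| git clone --depth 1 git@host:org/db-backups.git work
| cp /path/to/backup.sql work/ && cd work
| git add backup.sql && git commit -m "backup $(date -u +%F)" && git push
| cd .. && rm -rf work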
| Scaevolus wrote:
| I rolled my own terrible sqlite replication system once. I used
| execution tracing hooks
| (https://rogerbinns.github.io/apsw/execution.html#tracing) to
| record all mutation queries and their arguments sent to the
| database, then periodically copied that log to a replica system
| and executed the identical statements.
|
| It's rather primitive, but could be considered an intermediate
| step between this blog's full copying and litestream's WAL-based
| replication.
| encoderer wrote:
| If you use cron for anything important you should use Cronitor -
| there's a free for life plan that can cover your SQLite backups.
|
| The article mentions calling a "dead man" service and that is
| fine but Cronitor (built by 3 people, no VC dollars) is a proper
| monitoring solution for jobs, not just a "dead man" alert. Happy
| monitoring!
| zrail wrote:
| If Ben is reading the thread, tiny but important bug in the
| example script. It should have set -e at the top to ensure that a
| failed step will actually fail the script. Otherwise the snitch
| will always get called.
|
| Edit: filed an issue on the docs repo:
| https://github.com/benbjohnson/litestream.io/issues/54
| creativemonkeys wrote:
| If I'm using "-e", I also like to use "set -o pipefail", which
| fails the script if any command in a pipeline fails. In x | y |
| z, if 'y' fails but 'z' continues to process the output and
| returns 0, -e alone won't catch that (the example script
| doesn't use pipes though).
| zrail wrote:
| Yep, my standard bash header is set -eo pipefail with x if I
| need to see what's going on. This particular script didn't
| appear to be using pipes, as you say.
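|
| For reference, that header is just:
|
| #!/usr/bin/env bash
| set -eo pipefail
| # set -x  # uncomment to trace each command while debugging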
| benbjohnson wrote:
| Thanks for catching that! I fixed it up in a PR and merged in:
| https://github.com/benbjohnson/litestream.io/pull/55
| davnicwil wrote:
| This is so great - open source in action - I can't think of a
| more apt thing to come out of this post than an on-the-fly
| fix to the docs!
|
| Pun absolutely intended.
| TekMol wrote:
| Do not use `cp` to back up SQLite databases. It is not
| transactionally safe.
|
| What if you disable the journal like this:
|
| PRAGMA journal_mode=OFF;
|
| Can cp be used to backup the DB then?
|
| And do you still have to send "COMMIT" after each query or will
| each query be executed immediately then?
| ralferoo wrote:
| The main reason is that the backing file is random access, and
| transactional correctness is ensured using various locking
| mechanisms. Sqlite generally relies on fsync() ensuring that
| data is flushed to disk at certain points, so conceptually,
| data is written to an unused portion of the file, then fsync()
| is done, then various structures are updated so that this data
| is now live, leaving some unused space in the middle of the
| file, and this is fsync()'d again. Later that unused space
| might be reclaimed, but only when all current readers and
| writers have finished.
|
| If you cp the file, you might end up with different chunks from
| different points in the transaction history that don't make
| sense combined together. If you use ".backup", it guarantees
| that the data in the copy will be consistent with the data as
| it existed at one of the fsync() calls.
|
| Turning off the journalling will likely increase the chance
| that your copy will be inconsistent, as there will be more
| churn in the main data file.
| [deleted]
| KingOfCoders wrote:
| I was amazed by the Litestream post on fly.io - it wants to
| become a distributed database. This is huge.
| mwint wrote:
| > B-tree databases like SQLite compress well
|
| Curious, why is this?
| tptacek wrote:
| This question nerd-sniped me. I don't know either but my first
| guess, supported by hex dumping a medium-sized database, is
| that sqlite B-tree pages are collections of 32-bit page offsets
| that the database is designed to cluster (also, just lots and
| lots of zeroes). Also, the database I looked at was mostly
| UTF-8 text, which is obviously compressible, but I don't think
| that's what Ben meant.
| benbjohnson wrote:
| Litestream author here. When a page gets full it splits in
| two, so each new page has empty space for inserts and updates.
| That empty space compresses really well. Also, OLTP data can
| have duplication when you have a categorical column that you
| haven't moved into a separate table (e.g. a "type" column that
| has a couple of unique values).
| efrecon wrote:
| May I chime in with a project around more or less the same idea
| and use cases? https://github.com/efrecon/sqlite-backup. I am
| pretty sure there are dozens of similar-but-not-quite projects on
| GitHub/gitlab/etc.
| davnicwil wrote:
| Following the announcement of Fly.io acquiring the Litestream
| project and hiring its creator Ben Johnson, I've been looking
| into it (and am very impressed by what I see!)
|
| Reading through the docs I was delighted to find this full set of
| instructions including script commands for setting up a simpler
| regular snapshot backup using cron - for situations where
| Litestream is (quote) "overkill" - i.e. where the database is
| smaller and your durability requirements are lower.
|
| In fact, the docs have a whole alternatives _section_ for
| different requirements. I think this kind of thing is great, and
| something more projects should do! Wanted to share it because of
| this meta thing, but also because it's just pretty useful _per
| se_.
| dgb23 wrote:
| I agree. There are a bunch of benefits for everyone here. Users
| get a sharper view of the intended use case of the tool, which
| benefits both sides, and they learn more related bits along the
| way. It also builds trust.
|
| The best marketing for devs is no marketing. Just being honest,
| clear and helpful is what works best.
| kitd wrote:
| Agreed. In my mind, the fact that the developers understand
| limitations in their own tools means that future enhancements
| are likely to be more focussed and effective.
| mritchie712 wrote:
| if you run a saas, this is a good move for SEO too. People may
| be searching "backup sqlite cron" or something like it and
| realize they'd rather use litestream.
| swah wrote:
| I felt the same finding that section: by not "pushing their
| solution too much" they actually increase the chances that I'll
| use it someday.
| mattgreenrocks wrote:
| This is what a mature project looks like: it knows what it is, who
| it is for, and what it isn't.
|
| Contrast with so many open source projects that accumulate
| features and promise the world, so long as you give them a
| star.
| vanviegen wrote:
| > Contrast with so many open source projects that accumulate
| features and promise the world, so long as you give them a
| star.
|
| What a cynical take.
| pierregillesl wrote:
| Don't forget to stop writing to the database while doing the
| backup, otherwise you can run into an infinite loop if you write
| faster than sqlite3 .backup is doing the backup :D
|
| Learned that the hard way when implementing sqlite3 backups on
| Gladys Assistant (an open-source home automation platform:
| https://github.com/GladysAssistant/Gladys)
| davnicwil wrote:
| to be fair, this sounds like a situation where Litestream _is_
| the more appropriate solution!
| simonw wrote:
| This surprises me. I was under the impression that SQLite
| backups run inside a dedicated transaction in order to avoid
| this.
| simonw wrote:
| This comment cleared that up for me: VACUUM INTO backups are
| transactional, but the .backup mechanism is not:
| https://news.ycombinator.com/item?id=31387556
|
| Ben, I suggest updating that cron backups documentation page
| to recommend VACUUM INTO instead!
| ignoramous wrote:
| Could start an issue:
| https://github.com/benbjohnson/litestream.io/issues
| benbjohnson wrote:
| Good catch. I went ahead and added an issue on the docs.
| https://github.com/benbjohnson/litestream.io/issues/56
| oefrha wrote:
| See the SQLite online backup API documentation [1].
|
| You can finish the backup in one step, but a read-lock would
| be held during the entire duration, preventing writes. If you
| do the backup several pages at a time, then
|
| > If another thread or process writes to the source database
| while this function is sleeping, then SQLite detects this and
| usually restarts the backup process when
| sqlite3_backup_step() is next called. ...
|
| > Whether or not the backup process is restarted as a result
| of writes to the source database mid-backup, the user can be
| sure that when the backup operation is completed the backup
| database contains a consistent and up-to-date snapshot of the
| original. However: ...
|
| > If the backup process is restarted frequently enough it may
| never run to completion and the backupDb() function may never
| return.
|
| The CLI .backup command does non-blocking backup IIRC so is
| subject to restarts.
|
| [1] https://www.sqlite.org/backup.html
| littlecranky67 wrote:
| Can you elaborate more? I think the cron-solution will be
| unable to synchronize with your application code to determine
| when to stop writing, so more background and your solution
| would be of interest.
|
| As I understand it, while you do the backup, other writes
| should go to the WAL log and only get committed after the
| backup?
| prirun wrote:
| I did some reading, and there are 2 ways to use the SQLite
| backup API:
|
| 1. call backup_init, backup_step with a step size of -1, then
| backup_finish. This will lock the db the whole time the
| backup is taking place and back up the entire db.
|
| 2. call backup_init, backup_step in a loop until it returns
| SQLITE_DONE with a positive step size indicating how many
| pages to copy, then backup_finish.
|
| With method 2, no db lock is held between backup_step calls.
| If a write occurs between backup_step calls, the backup API
| automagically detects this and restarts. I don't know if it
| looks at the commit count and restarts the backup from the
| beginning or is smart enough to know the first changed page
| and restarts from there. Because the lock is released, a
| continuous stream of writes could prevent the backup from
| completing.
|
| I looked in the sqlite3 shell command source, and it uses
| method 2. So if using the .backup command with continuous
| concurrent writes, you have to take a read lock on the db
| before .backup to ensure it finishes. It would be nice if the
| .backup command took a -step option. That would enable the -1
| step size feature of method 1. The sqlite3 shell uses a step
| size of 100.
|
| Another option would be to check backup_remaining() and
| backup_pagecount() after each step, and if the backup isn't
| making progress, increase the step size. Once the step size
| is equal to backup_pagecount() it will succeed, though it may
| have to lock out concurrent writes for a long time on a large
| db. There's really no other choice unless you get into
| managing db logs.
| pierregillesl wrote:
| It's not what I've experienced!
|
| In my experience, as soon as there is some new data coming into
| the DB, the .backup command will restart, and if the writes
| never stop, the backup will never finish either :D
|
| In Gladys' case, we put in the application logic a blocking
| transaction to lock writes during the backup. I haven't found
| any other way to avoid infinite backups in case of write-
| heavy databases.
| 1958325146 wrote:
| I ran into the case of needing to back up a write-heavy
| database without blocking anything, and came up with a
| solution: writing a VFS (https://www.sqlite.org/vfs.html)
| that makes the application move the SQLite journal files to
| a directory for processing instead of deleting them.
| Another process reads them to see what pages were touched
| and can very quickly get the changes to update the backup.
| yawaramin wrote:
| I'm using VACUUM INTO, which does basically that:
| https://sqlite.org/lang_vacuum.html#vacuuminto
|
| > The VACUUM command with an INTO clause is an alternative
| to the backup API for generating backup copies of a live
| database....The VACUUM INTO command is transactional in the
| sense that the generated output database is a consistent
| snapshot of the original database.
|
| EDIT: Litestream docs will also recommend that:
| https://github.com/benbjohnson/litestream.io/issues/56
| Multicomp wrote:
| A service that says 'here is an alternative to our service'?
| Right on their own website? Even though cron backups are
| basically 'competition' to litestream?
|
| That's it. I'm giving fly.io a try on the next hobby app I start.
|
| I'm so impressed with their business decisions right now. I was
| afraid for litestream, but then I read that the creator has been
| hired to work on it full time. What a splendid development!
| tptacek wrote:
| Don't give us that much credit! Litestream is a FOSS project.
| Fly.io isn't competitive about it. We just bought it because
| its existence, like, in the universe makes Fly.io more
| valuable, and we wanted to make sure Ben could keep working on
| it. We actually already have offerings that compete with it
| (Fly.io Postgres). :)
|
| (Ben might feel competitive about Litestream though; it's his
| baby. So maybe give him some credit.)
| jjice wrote:
| > A service that says 'here is an alternative to our service'?
|
| I actually see this quite a bit. I think the reason for it is
| that it's a good way to show complexity or moving parts that
| can break in a process, where the marketing service is just a
| drop in and start. We can see the SQLite backup process here's
| and it's not too bad, but we can also see there are a few
| things that can go wrong that I'm sure litestream takes care of
| and would allow us to avoid any issues.
|
| For example, backing up the DB onto the local disk before
| copying it _could_ fail due to lack of disk space, and then we
| have to deal with the notification and fix for that. I'm sure
| litestream is a 10 minute setup that handles a lot of
| intricacies like that.
| er4hn wrote:
| It's also in the same spirit as SQLite's own page on if
| SQLite is a good fit: https://www.sqlite.org/whentouse.html
|
| There is something very powerful about knowing your strength,
| leaning into it, and building on it.
___________________________________________________________________
(page generated 2022-05-15 23:00 UTC)