[HN Gopher] Duplicity: Encrypted bandwidth-efficient backup
___________________________________________________________________
Duplicity: Encrypted bandwidth-efficient backup
Author : GTP
Score : 91 points
Date : 2024-01-24 13:37 UTC (9 hours ago)
(HTM) web link (duplicity.us)
(TXT) w3m dump (duplicity.us)
| AdmiralAsshat wrote:
| Brilliant name, if you think about it. If they ever decided to
| start doing shady shit, they'd have a perfect legal shield. No
| one would be able to convincingly argue in court that they were
| being duplicitous.
| mlyle wrote:
| If no one can argue they're duplicitous, then it's a case of
| false advertising...
| longwave wrote:
| I used this many, many years ago but switched to Borg[0] about
| five years ago. Duplicity required full backups with incremental
| deltas, which meant my backups ended up taking too long and using
| too much disk space. Borg lets you prune older backups at will;
| thanks to chunk tracking and deduplication, there is no such
| thing as an incremental backup.
|
| [0] https://www.borgbackup.org/
| giamma wrote:
| Same for me. Also, on macOS duplicity was consuming much more
| CPU than Borg and was causing my fan to spin loudly. Eventually
| I moved to Time Machine, but I still consider Borg a very good
| option.
| zzzeek wrote:
| I have an overnight cron job that flattens my duplicity
| backups from the many incremental backups made over the course
| of one day into a single full backup file, which becomes the
| new backup. Then subsequent backups over the course of the day
| do incrementals on that file. So I always have a full backup
| for each individual day with only a dozen or so incremental
| backups tacked onto it.
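|
| (A minimal sketch of such a nightly flatten, assuming a local
| file target at /backups/dup; the paths and retention count
| here are made up:)
|
|     # nightly cron: take a fresh full backup, then drop the
|     # older chains
|     duplicity full /home/me file:///backups/dup
|     duplicity remove-all-but-n-full 1 --force file:///backups/dup
|
|     # during the day: incrementals against that full
|     duplicity incremental /home/me file:///backups/dup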
|
| That said, I will give Borg a look.
| tussa wrote:
| I did the same. I had some weird path issues with Duplicity.
|
| Borg is now my holy backup grail. I wish I could back up
| incrementally to AWS Glacier storage, but that's just me
| sounding like an ungrateful beggar. I'm incredibly grateful
| and happy with Borg!
| sigio wrote:
| Agree completely... used duplicity many years ago, but switched
| to Borg and never looked back. Currently doing borg backups of
| quite a lot of systems, many every 6 hours, and some, like my
| main shell host, every 2 hours.
|
| It's quick, tiny and easy... and restores are the easiest, just
| mount the backup, browse the snapshot, and copy files where
| needed.
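|
| (For example, a restore-by-mount session looks roughly like
| this; the repo path and archive name are placeholders:)
|
|     # mount a snapshot read-only via FUSE, copy out, unmount
|     borg mount /backups/repo::myhost-2024-01-24 /mnt/borg
|     cp /mnt/borg/home/me/some_file ~/restored_file
|     borg umount /mnt/borg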
| stavros wrote:
| After Borg, I switched to Restic:
|
| https://restic.net/
|
| AFAIK, the only difference is that Restic doesn't need to be
| installed on the remote server, so you can efficiently back up
| to things like S3 or FTP. Other than that, both are
| fantastic.
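|
| (A minimal restic-to-S3 sketch; the bucket name is a
| placeholder and the credentials are elided:)
|
|     export AWS_ACCESS_KEY_ID=...
|     export AWS_SECRET_ACCESS_KEY=...
|     restic -r s3:s3.amazonaws.com/my-backup-bucket init
|     restic -r s3:s3.amazonaws.com/my-backup-bucket backup ~/Documents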
| pdimitar wrote:
| Technically Borg doesn't require it either: you can back up
| to a local directory and then use `rclone` to upload the
| repo wherever.
|
| Not practical for huge backups, but it works for me as I'm
| backing up only my machines' configuration and code
| directories. ~60MB, and that includes a lot of code and some
| data (SQL, JSON, et al.)
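|
| (Roughly, with made-up paths and remote name:)
|
|     borg create /backups/repo::{hostname}-{now} ~/code ~/.config
|     rclone sync /backups/repo b2:my-bucket/borg-repo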
| jwr wrote:
| Excellent piece of software, and relatively simple to use with
| gpg encryption. I've been using it for many years.
|
| My only complaint is that, like a lot of software written in
| Python, it has no regard for traditional UNIX behavior (keep
| quiet unless you have something meaningful to say), so I have to
| live with cron reporting stuff like:
|
| "/usr/lib/python2.7/dist-packages/paramiko/rsakey.py:99:
| DeprecationWarning: signer and verifier have been deprecated.
| Please use sign and verify instead. algorithm=hashes.SHA1()"
|
| along with stuff I actually do (or might) care about.
|
| Oh well.
| tynorf wrote:
| You can try setting `export PYTHONWARNINGS="ignore"` to
| suppress warnings.
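|
| (For example, prefixed onto the cron command itself; the
| schedule and duplicity arguments are placeholders:)
|
|     30 2 * * * PYTHONWARNINGS=ignore duplicity /home/me sftp://backup-host//srv/dup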
| 65 wrote:
| If you're using S3 to back up your files, it's easier to write a
| shell script with the AWS CLI. For example, here's a script I
| wrote that I run automatically to back up my computer to S3. I
| have an exclude array to skip certain folders. It's simpler
| than downloading software and more customizable.
|
|     # $1: local folder
|     # $2: bucket
|
|     declare -a exclude=( "node_modules" "Applications" "Public" )
|
|     args=""
|     for item in "${exclude[@]}"; do
|         args+=" --exclude '*/$item/*' --exclude '$item/*'"
|     done
|
|     cmd="aws s3 sync '$1' 's3://$2$1' --include '*' $args"
|     eval "$cmd"
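|
| (Hypothetical invocation, assuming the script is saved as
| s3backup.sh:)
|
|     ./s3backup.sh "$HOME" my-backup-bucket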
| rakoo wrote:
| Your script doesn't do the same thing as duplicity. Your script
| mirrors the local directory to your bucket, so it loses all
| history. Duplicity does backups (i.e. with history), and not
| just that: it does differential backups so it doesn't upload
| everything all the time.
| 65 wrote:
| S3 has bucket versioning if you want to keep multiple
| versions. The S3 sync command also does differential uploads:
| if you run the script over and over, it will only upload new
| or changed files.
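|
| (Versioning is off by default and can be enabled with the
| CLI; the bucket name is a placeholder:)
|
|     aws s3api put-bucket-versioning --bucket my-backup-bucket \
|         --versioning-configuration Status=Enabled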
| res0nat0r wrote:
| The major issue, versus any deduplicating backup software, is
| that S3 doesn't support deduplication out of the box. If you
| move or rename a 15GB file you're going to have to upload it
| again in full, and also store and pay for a second copy until
| your S3 bucket policy purges the previously uploaded file
| you've deleted. Also, aws s3 sync is much slower, since it has
| to iterate over all of the files to see if their
| size/timestamp has changed. Something like borgbackup is much
| faster, as it uses smarter caching to skip unchanged
| directories etc.
| 65 wrote:
| It's possible to find probable duplicate files with the
| S3 CLI based on size and tags - I was working on a script to
| do just that but I haven't finished it yet. Alternatively,
| if you want exact backups of your computer, you can use
| the --delete flag, which will delete files in the bucket
| that aren't in the source.
|
| I agree this is not the absolute most optimized solution
| but it does work quite well for me and is easily
| extendible with other scripts and S3 CLI commands.
| Theoretically if Borgbackup or Duplicity are backing up
| to S3 they're using all the same commands as the S3
| CLI/SDK.
|
| Besides, shell scripting is fun!
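|
| (A hedged sketch of that size-based duplicate hunt; the
| bucket name is a placeholder, and equal sizes only flag
| duplicate *candidates*:)
|
|     aws s3api list-objects-v2 --bucket my-backup-bucket \
|         --query 'Contents[].[Size, Key]' --output text \
|       | sort -n \
|       | awk '$1 == prev { print last; print } { prev = $1; last = $0 }' \
|       | uniq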
| sevg wrote:
| If I have to choose between hacking together a bunch of
| shell scripts to do my deduplicated, end-to-end encrypted
| backups, vs using a popular open source well-tested off
| the shelf solution, I know which one I'm picking!
| Scarbutt wrote:
| no encryption though
| 65 wrote:
| You can encrypt S3 buckets/files inside your buckets. By
| default buckets are encrypted.
| sevg wrote:
| Not end-to-end encryption.
| mrich wrote:
| If you don't need incremental backups (thus saving space for
| the signatures) and want to store to S3 Glacier Deep Archive,
| take a look at
| https://github.com/mrichtarsky/glacier_deep_archive_backup
| _flux wrote:
| I've moved to backup tools that use content-based IDs with
| rolling window hashes, which allow deduplicating content even
| between different hosts--and crucially handle moving content
| from one host to another efficiently--even though in other
| scenarios I'm guessing the rdiff algorithm can produce smaller
| backups.
|
| The problem I have with duplicity and backup tools of its kind
| is that you still need to create a full backup again
| periodically, unless you want an ever-growing sequence of
| increments from the day you started doing backups.
|
| Content-addressed backups avoid that, because all snapshots are
| complete (even if the backup process itself is incremental), but
| their content blobs are shared and eventually garbage collected
| when no references exist to them.
|
| My tool of choice is kopia. Borgbackup also does similar things
| (though borgbackup is still unable to back up to the same repo
| from multiple hosts at the same time; I haven't checked this
| for a while). Both do encryption, but it's symmetric, so the
| client will have the keys to open the backups as well. If you
| require asymmetric encryption then these tools are not for
| you--though I guess this is not a technical requirement for
| this approach, so maybe one day a content-addressed backup
| tool with asymmetric encryption will appear?
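|
| (For reference, a minimal kopia session looks something like
| this; the repo path and source directory are made up:)
|
|     kopia repository create filesystem --path /backups/kopia
|     kopia snapshot create ~/Documents
|     kopia snapshot list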
| mizzao wrote:
| Content-addressed backups sound something like how git stores
| data, is that the best way to think about them?
|
| And if so, what would be the main differences between just
| committing to a git repo for example?
| KMag wrote:
| The "rolling window hashes" from the comment suggests sub-
| file matching at any offset. (See the Bentley-McIlroy diff
| algorithm, or how rsync efficiently finds matches, for
| example.) I'm not aware that git performs this sort of
| deduplication.
|
| Better yet would be to use a rolling hash to decide where to
| cut the blocks, and then use a locality-aware hash (SimHash,
| etc.) to find similar blocks. Perform a topological sort to
| decide which blocks to store as diffs of others.
|
| Microsoft had some enterprise product that performed
| distribution somewhat like this, but also recursively using
| similarity hashes to see if the diffs were similar to
| existing files on the far machine.
| Linux-Fan wrote:
| > Content-addressed backups sound something like how git stores
| data, is that the best way to think about them?
|
| I think it is a valid way to consider them. Another option is
| to think of the backup as a special kind of file system
| snapshot that manifests itself as real files as opposed to data
| on a block device.
|
| > And if so, what would be the main differences between just
| committing to a git repo for example?
|
| The main difference is that good backup tools allow you to
| delete backups and free up the space whereas git is not really
| designed for this.
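|
| (In borg, for instance, deleting old backups and reclaiming
| space looks roughly like this; the repo path and retention
| policy are placeholders:)
|
|     borg prune --keep-daily 7 --keep-weekly 4 /backups/repo
|     borg compact /backups/repo   # actually frees space (borg >= 1.2)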
| nijave wrote:
| Also, git is not good with binary files and sort of bolts on
| git-lfs as a workaround.
| EuAndreh wrote:
| > though borgbackup is still unable to back up to the same repo
| from multiple hosts at the same time
|
| Wouldn't that mean that, when using encrypted backups, secrets
| would have to be shared across multiple clients?
|
| If I'm understanding it correctly, it sounds like an anti-
| feature. Do other backup tools do that?
| _flux wrote:
| Yes, it seems to be the case; only the data in the server is
| encrypted, while the key is shared between clients sharing
| the same repository.
|
| I'm not sure if content-addressed storage is feasible to
| implement otherwise. Maybe using the hash of the unencrypted
| (or shared-key-encrypted) data as the block id, and then
| encrypting the per-block keys with the keys of the clients
| who have the contents, would do it. In any case, I'm not
| aware of such backup tools (I imagine most just don't encrypt
| anything).
| formerly_proven wrote:
| CAS-based backup tools leak metadata like a sieve, so
| they're generally not the best choice for the most paranoid
| people, who should probably stick to uncompressed tar
| archives (or zips, which avoid compressing unrelated files
| together, which leaks data), padded to full 100 megs or so
| and then encrypted en bloc.
| AgentME wrote:
| Restic also works like this, and has the following benefits
| over Borg: multiple hosts can back up to the same repo, and it
| supports "dumb" remote file hosts that aren't running Borg,
| like S3 or plain SFTP servers.
| johnmaguire wrote:
| I really like restic, and am personally happy to use it via
| the command line. It's very fast and efficient! However, I do
| wish there was better tooling / wrappers around it. I'd love
| to be able to set something simple up on my partner's
| Macbook.
|
| For example, Pika Backup and Vorta are popular UIs for Borg
| for which no equivalent exists for Restic, while Borgmatic
| seems to be the de-facto standard for profile configuration.
|
| For my own purposes, I've been using a script I found on
| Github[0] for a while, but it only really supports Backblaze
| B2 AFAIK.[1] I've been meaning to try autorestic[2] and
| resticprofile[3], as they are potentially more flexible than
| the script I'm currently using, but the fact that there are so
| many competing tools - many of which are no longer maintained
| - makes it difficult to choose a specific one.
|
| Prestic[4] looks intriguing for my partner's use, although it
| seems to have very few users. :\ A fork of Vorta[5] seems to
| have fizzled out six years ago.
|
| [0] https://github.com/erikw/restic-automatic-backup-
| scheduler
|
| [1] https://github.com/erikw/restic-automatic-backup-
| scheduler/i...
|
| [2] https://github.com/cupcakearmy/autorestic
|
| [3] https://github.com/creativeprojects/resticprofile
|
| [4] https://github.com/ducalex/prestic
|
| [5] https://github.com/Mebus/restatic
| e12e wrote:
| > For example, Pika Backup, and Vorta are popular UIs for
| Borg of which no equivalent exists for Restic
|
| Have you considered:
|
| https://github.com/netinvent/npbackup
|
| Or (not FOSS, but restore-compatible):
|
| https://relicabackup.com/features
| JoshTriplett wrote:
| I'm a huge fan of restic as well. My only complaint is
| performance and memory usage. I'm looking forward to being
| able to use Rustic: https://rustic.cli.rs/
| nijave wrote:
| > borgbackup is still unable to back up to the same repo from
| multiple hosts at the same time
|
| Basically still an issue. The machine takes an exclusive lock,
| and it also adds overhead, since each machine has to update its
| local data cache (or whatever it's called) because the caches
| are constantly getting out of sync when another machine backs
| up.
|
| bupstash looks promising as a close-to-but-more-performant borg
| alternative, but it's still basically alpha quality.
|
| It's unfortunate that peer-to-peer CrashPlan died.
| totetsu wrote:
| Could this help dedupe 20 years of ad-hoc drive dumps from
| changing systems...
| rkagerer wrote:
| For those of us who prefer not to ship to the cloud, have you
| used Kopia Repository Server and is it any good? Does it run on
| Windows?
|
| The documentation refers to files and directories. Does the
| software let you take a consistent, point-in-time snapshot of a
| whole drive (or even multiple volumes), e.g. using something
| like VSS? Or if you want that, do you have to use other
| software (like Macrium Reflect) to produce a file?
|
| Where does the client cache your encryption password/key? (or
| do you have to enter it each session)
| dveeden2 wrote:
| https://apps.gnome.org/DejaDup/ uses this as a backend. It also
| has an experimental option to use
| https://github.com/restic/restic instead of duplicity.
| MallocVoidstar wrote:
| I've had issues with Duplicity not performing backups when a
| single backend is down (despite it being set to continue on
| error). In my case an SFTP server lost its IPv6 routing so
| Duplicity couldn't connect to it; rather than give up on that
| backend and only use the other server it just gave up _entirely_.
| darrmit wrote:
| I've found restic + rclone to be extremely stable and reliable
| for this same sort of differential backup. I back up to
| Backblaze B2 and have also used Google Drive with success, even
| for 1TB+ of
| data.
| monkey26 wrote:
| I've been using Restic since 2017 without issue. Tried Kopia
| for a while, but its backup size ballooned on me; maybe it
| wasn't quite ready.
| tmalsburg2 wrote:
| +1 for restic. I tried various solutions and restic is the best
| by far. So fast, so reliable.
|
| https://restic.net/
| muppetman wrote:
| I agree. restic, with its simple ability to mount a backup
| (using FUSE) so you can just copy out the file(s) you need,
| is so wonderful. A single binary that you just download and
| can SCP around the place to any device, etc.
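|
| (Roughly, with a placeholder repo URL and mount point:)
|
|     restic -r sftp:user@host:/srv/restic-repo mount /mnt/restic
|     cp /mnt/restic/snapshots/latest/home/me/some_file ~/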
|
| It's fantastic to have so many great open source backup
| solutions. I investigated many and settled on restic. It still
| brings me joy to actually use it; it's so simple and hassle
| free.
| baal80spam wrote:
| +1 for rclone. What a great piece of software.
| lgunsch wrote:
| Years ago I used a very simple bash script front-end for
| Duplicity called Duply. It worked very well for the half-dozen
| years or so I used it.
| diekhans wrote:
| I have been having great luck with incremental backups using
| the very similarly named Duplicacy: https://duplicacy.com/
| tzs wrote:
| Not to be confused with Duplicati [1] or Duplicacy [2]. There are
| too many backup programs whose names start with 'Duplic'.
|
| [1] https://www.duplicati.com/
|
| [2] https://duplicacy.com/
| jszymborski wrote:
| While we're on the topic of Duplicati, I feel the need to share
| my personal experience, one that's echoed by lots of folks
| online.
|
| Duplicati can take what seems like until the heat death of the
| universe to restore a repo as small as 500 GB. I've lost a
| laptop's worth of files to it. You can find tonnes of posts on
| the Duplicati forums which retell the same story [0].
|
| I've moved to Borg and backing up to a Hetzner Storage Box.
| I've restored many times with no issue.
|
| Remember folks, test your backups.
|
| [0] https://forum.duplicati.com/t/several-days-and-no-restore-
| fe...
| johnchristopher wrote:
| > Remember folks, test your backups.
|
| Since you mention it, I am seizing the opportunity to ask:
| how should borg backups be tested? Can it be automated?
| jszymborski wrote:
| It's actually pretty simple using the check command [0]:
|
|     borg check --verify-data REPOSITORY_OR_ARCHIVE
|
| You can add that to a cron job.
|
| Alternatively, I think the Vorta GUI also has a way to
| easily schedule it[1].
|
| I'll add that one thing I like to do once in a blue moon is
| to spin up a VM and try to recover a few random files.
| While the check command checks that the data is there and
| theoretically recoverable, nothing really beats proving to
| yourself that you can, in a clean environment, recover your
| files.
|
| [0] https://borgbackup.readthedocs.io/en/stable/usage/check
| .html
|
| [1] https://vorta.borgbase.com/
| pdimitar wrote:
| Yes, I have a bunch of scripts that allow me to pick a Borg
| snapshot and then do stuff with it. One such action is
| `borg export-tar` that just creates a TAR file containing a
| full self-sufficient snapshot of your stuff.
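|
| (Something like this, with a made-up repo path and archive
| name:)
|
|     borg export-tar /backups/repo::myhost-2024-01-24 /tmp/snapshot.tar
|     tar -tf /tmp/snapshot.tar > /dev/null && echo "archive lists OK"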
|
| Then just listing the files in the archive is a not-bad way
| to find an obvious problem. Or straight up unpacking it.
|
| But if you're asking about a separate parity file that can
| be used to check and correct errors -- I haven't done that.
| rabbitofdeath wrote:
| Duplicacy has been amazing for me - I use it to back up all of
| my machines nightly, consolidated into one repo that is copied
| to B2, and it works great. I've restored plenty of times and
| have not had any issues.
| BeetleB wrote:
| Duply is a good frontend for Duplicity.
|
| https://duply.net/Main_Page
| igtztorrero wrote:
| Try kopia.io, it's very good.
| green-salt wrote:
| Seconding this, it's saved me several times.
| thedanbob wrote:
| PSA for anyone else as stupid as me: when doing selective
| restores be _very_ careful about how you set target_directory.
| "duplicity restore --path-to-restore some_file source_url ~" does
| not mean "restore some_file to my home directory", it means
| "replace my home directory with some_file".
___________________________________________________________________
(page generated 2024-01-24 23:01 UTC)