[HN Gopher] Duplicity: Encrypted bandwidth-efficient backup
       ___________________________________________________________________
        
       Duplicity: Encrypted bandwidth-efficient backup
        
       Author : GTP
       Score  : 91 points
       Date   : 2024-01-24 13:37 UTC (9 hours ago)
        
 (HTM) web link (duplicity.us)
 (TXT) w3m dump (duplicity.us)
        
       | AdmiralAsshat wrote:
       | Brilliant name, if you think about it. If they ever decided to
       | start doing shady shit, they'd have a perfect legal shield. No
       | one would be able to convincingly argue in court that they were
       | being duplicitous.
        
         | mlyle wrote:
         | If no one can argue they're duplicitous, then it's a case of
         | false advertising...
        
       | longwave wrote:
       | I used this many, many years ago but switched to Borg[0] about
       | five years ago. Duplicity required full backups with incremental
       | deltas, which meant my backups ended up taking too long and using
        | too much disk space. Borg lets you prune older backups at will:
        | because of chunk tracking and deduplication, there is no such
        | thing as an incremental backup.
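        | 
        | For example, pruning to a retention policy is one command (repo
        | path hypothetical):
        | 
        | borg prune --keep-daily 7 --keep-weekly 4 \
        |     --keep-monthly 6 /backups/repo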
       | 
       | [0] https://www.borgbackup.org/
        
         | giamma wrote:
          | Same for me. Also, on macOS duplicity was consuming much more
          | CPU than Borg and was causing my fan to spin loudly. Eventually
          | I moved to Time Machine, but I still consider Borg a very good
         | option.
        
         | zzzeek wrote:
          | I have an overnight cron job that flattens my duplicity
          | backups from the many incrementals made over the course of one
          | day into a single full backup, which becomes the new baseline.
          | Subsequent backups over the course of the day are then
          | incremental on top of that. So I always have a full backup for
          | each individual day with only a dozen or so incremental
          | backups tacked onto it.
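          | 
          | (A sketch of the nightly flatten, assuming a plain file
          | target -- paths hypothetical:)
          | 
          | # take a fresh full backup, then drop the old chain
          | duplicity full /home/me file:///mnt/backup/dup
          | duplicity remove-all-but-n-full 1 --force file:///mnt/backup/dup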
         | 
          | That said, I will give Borg a look.
        
         | tussa wrote:
         | I did the same. I had some weird path issues with Duplicity.
         | 
          | Borg is now my holy backup grail. I wish I could back up
          | incrementally to AWS Glacier storage, but that's just me
          | sounding like an ungrateful beggar. I'm incredibly grateful
          | and happy with Borg!
        
         | sigio wrote:
         | Agree completely... used duplicity many years ago, but switched
          | to Borg and never looked back. Currently doing borg backups of
          | quite a lot of systems, many every 6 hours, and some, like my
          | main shell host, every 2 hours.
         | 
          | It's quick, tiny and easy... and restores are the easiest:
          | just mount the backup, browse the snapshot, and copy files
          | where needed.
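          | 
          | E.g. (repo and archive names hypothetical):
          | 
          | borg mount /backups/repo::shellhost-2024-01-24 /mnt/restore
          | cp /mnt/restore/home/me/some-file ~/
          | borg umount /mnt/restore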
        
           | stavros wrote:
           | After Borg, I switched to Restic:
           | 
           | https://restic.net/
           | 
            | AFAIK, the only difference is that Restic doesn't require
            | Restic installed on the remote server, so you can efficiently
            | back up to things like S3 or FTP. Other than that, both are
            | fantastic.
        
             | pdimitar wrote:
              | Technically Borg doesn't require it either: you can back up
              | to a local directory and then use `rclone` to upload the
              | repo wherever.
              | 
              | Not practical for huge backups, but it works for me as I'm
              | backing up my machines' configuration and code directories
              | only. ~60MB, and that includes a lot of code and some data
              | (SQL, JSON etc.)
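              | 
              | Roughly (remote name and paths hypothetical):
              | 
              | borg create /backups/repo::{hostname}-{now} \
              |     ~/.config ~/code
              | rclone sync /backups/repo remote:borg-repo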
        
       | jwr wrote:
       | Excellent piece of software, and relatively simple to use with
       | gpg encryption. I've been using it for many years.
       | 
       | My only complaint is that, like a lot of software written in
       | Python, it has no regard for traditional UNIX behavior (keep
       | quiet unless you have something meaningful to say), so I have to
       | live with cron reporting stuff like:
       | 
       | "/usr/lib/python2.7/dist-packages/paramiko/rsakey.py:99:
       | DeprecationWarning: signer and verifier have been deprecated.
       | Please use sign and verify instead. algorithm=hashes.SHA1()"
       | 
       | along with stuff I actually do (or might) care about.
       | 
       | Oh well.
        
         | tynorf wrote:
         | You can try setting `export PYTHONWARNINGS="ignore"` to
         | suppress warnings.
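          | 
          | Or, to silence only that category instead of everything:
          | 
          | export PYTHONWARNINGS="ignore::DeprecationWarning"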
        
       | 65 wrote:
        | If you're using S3 to back up your files, it's easier to write a
        | shell script with the AWS CLI. For example, here's a script I
        | wrote that I run automatically to back up my computer to S3. It
        | has an exclude array to skip certain folders. It's simpler than
        | downloading software and more customizable.
       | 
        | # $1: local folder
        | # $2: bucket
        | 
        | declare -a exclude=(
        |     "node_modules"
        |     "Applications"
        |     "Public"
        | )
        | 
        | args=""
        | for item in "${exclude[@]}"; do
        |     args+=" --exclude '*/$item/*' --exclude '$item/*'"
        | done
        | 
        | cmd="aws s3 sync '$1' 's3://$2$1' --include '*' $args"
        | eval "$cmd"
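        | 
        | Invoked like so (script name and bucket hypothetical):
        | 
        | ./s3-backup.sh /Users/me my-backup-bucket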
        
         | rakoo wrote:
          | Your script doesn't do the same thing as duplicity: it mirrors
          | the local directory to your bucket, which loses all history.
          | Duplicity does backups (i.e. with history), and not just that,
          | it does differential backups so it doesn't upload everything
          | every time.
        
           | 65 wrote:
            | S3 has bucket versioning if you want to keep multiple
            | backups. The S3 sync command also does differential backups:
            | if you run the script over and over, for example, it will
            | only upload new/different files.
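            | 
            | Versioning is a per-bucket switch (bucket name
            | hypothetical):
            | 
            | aws s3api put-bucket-versioning --bucket my-backup-bucket \
            |     --versioning-configuration Status=Enabled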
        
             | res0nat0r wrote:
              | The major issue out of the box, versus any deduping backup
              | software, is that S3 doesn't support any deduplication. If
              | you move or rename a 15GB file you're going to have to
              | upload it again in full, and also store a second copy and
              | pay for it until your S3 bucket policy purges the
              | previously uploaded file you've deleted. Also, aws s3 sync
              | is much slower since it has to iterate over all of the
              | files to see if their size/timestamp has changed. Something
              | like borgbackup is much faster as it uses smarter caching
              | to skip unchanged directories etc.
        
               | 65 wrote:
                | It's possible to find probable duplicate files with the
                | S3 CLI based on size and tags - I was working on a script
                | to do just that but haven't finished it yet.
                | Alternatively, if you want exact backups of your computer
                | you can use the --delete flag, which deletes files in the
                | bucket that aren't in the source.
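                | 
                | Since --delete is destructive, a --dryrun first is
                | cheap insurance:
                | 
                | aws s3 sync "$1" "s3://$2$1" --delete --dryrun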
               | 
                | I agree this is not the absolute most optimized solution,
                | but it works quite well for me and is easily extensible
                | with other scripts and S3 CLI commands. Theoretically, if
                | Borgbackup or Duplicity are backing up to S3 they're
                | using the same S3 API calls as the S3 CLI/SDK.
               | 
               | Besides, shell scripting is fun!
        
               | sevg wrote:
               | If I have to choose between hacking together a bunch of
               | shell scripts to do my deduplicated, end-to-end encrypted
               | backups, vs using a popular open source well-tested off
               | the shelf solution, I know which one I'm picking!
        
         | Scarbutt wrote:
         | no encryption though
        
           | 65 wrote:
           | You can encrypt S3 buckets/files inside your buckets. By
           | default buckets are encrypted.
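            | 
            | E.g. forcing server-side encryption on one upload (bucket
            | name hypothetical):
            | 
            | aws s3 cp somefile s3://my-backup-bucket/ --sse AES256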
        
             | sevg wrote:
             | Not end-to-end encryption.
        
       | mrich wrote:
       | If you don't need incremental backups (thus saving space for the
       | signatures) and want to store to S3 Deep Glacier, take a look at
       | https://github.com/mrichtarsky/glacier_deep_archive_backup
        
       | _flux wrote:
        | I've moved to backup tools that use content-based ids with
        | rolling window hashes, which allows deduplicating content even
        | between different hosts--and crucially handles moving content
        | from one host to another efficiently--even though in other
        | scenarios I'm guessing the rdiff algorithm can produce smaller
        | backups.
       | 
        | The problem I have with duplicity and backup tools of its kind
        | is that you still need to create a full backup again
        | periodically, unless you want an ever-growing sequence of
        | increments from the day you started doing backups.
       | 
       | Content-addressed backups avoid that, because all snapshots are
       | complete (even if the backup process itself is incremental), but
       | their content blobs are shared and eventually garbage collected
       | when no references exist to them.
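        | 
        | A toy shell sketch of the idea (fixed-size chunks for brevity;
        | real tools cut chunk boundaries with a rolling hash rather than
        | at fixed offsets):
        | 
        | mkdir -p blobs
        | split -b 4M bigfile /tmp/chunk.
        | for c in /tmp/chunk.*; do
        |     h=$(sha256sum "$c" | cut -d' ' -f1)
        |     # identical chunks land in the store exactly once
        |     [ -e "blobs/$h" ] || cp "$c" "blobs/$h"
        |     echo "$h"    # the snapshot is just an ordered hash list
        | done > manifest.txt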
       | 
        | My tool of choice is kopia. Borgbackup also does similar things
        | (though borgbackup is still unable to back up to the same repo
        | from multiple hosts at the same time--I haven't checked this for
        | a while). Both do encryption, but it's symmetric, so the client
        | also has the keys to open the backups. If you require asymmetric
        | encryption then these tools are not for you--though I guess this
        | is not a technical limitation of the approach, so maybe one day
        | a content-addressed backup tool with asymmetric encryption will
        | appear?
        
         | mizzao wrote:
         | Content-addressed backups sound something like how git stores
         | data, is that the best way to think about them?
         | 
         | And if so, what would be the main differences between just
         | committing to a git repo for example?
        
           | KMag wrote:
           | The "rolling window hashes" from the comment suggests sub-
            | file matching at any offset. (See the Bentley-McIlroy diff
            | algo, or how rsync efficiently finds matches, for example.)
            | I'm not aware that git performs this sort of deduplication.
           | 
           | Better yet would be to use a rolling hash to decide where to
           | cut the blocks, and then use a locality-aware hash (SimHash,
           | etc.) to find similar blocks. Perform a topological sort to
           | decide which blocks to store as diffs of others.
           | 
           | Microsoft had some enterprise product that performed
           | distribution somewhat like this, but also recursively using
           | similarity hashes to see if the diffs were similar to
           | existing files on the far machine.
        
         | Linux-Fan wrote:
         | > Content-addressed backups sound something like how git stores
         | data, is that the best way to think about them?
         | 
         | I think it is a valid way to consider them. Another option is
         | to think of the backup as a special kind of file system
         | snapshot that manifests itself as real files as opposed to data
         | on a block device.
         | 
         | > And if so, what would be the main differences between just
         | committing to a git repo for example?
         | 
         | The main difference is that good backup tools allow you to
         | delete backups and free up the space whereas git is not really
         | designed for this.
        
           | nijave wrote:
            | Also git is not good with binary files and sort of bolts on
            | git-lfs as a workaround.
        
         | EuAndreh wrote:
         | > though borgbackup is still unable to back up to the same repo
         | from multiple hosts at the same time
         | 
         | Wouldn't that mean that, when using encrypted backups, secrets
         | would have to be shared across multiple clients?
         | 
         | If I'm understanding it correctly, it sounds like an anti-
         | feature. Do other backup tools do that?
        
           | _flux wrote:
            | Yes, it seems to be the case; only the data on the server is
            | encrypted, while the key is shared between clients sharing
            | the same repository.
            | 
            | I'm not sure if content-addressed storage is feasible to
            | implement otherwise. Maybe using the hash of the unencrypted
            | (or shared-key-encrypted) content as the block id, and then
            | encrypting the per-block keys with the keys of the clients
            | that have the contents, would do it. In any case, I'm not
            | aware of such backup tools (I imagine most just don't
            | encrypt anything).
        
             | formerly_proven wrote:
              | CAS-based backup tools leak metadata like a sieve, so
              | they're generally not the best choice for the most paranoid
              | people, who should probably stick to uncompressed tar
              | archives (or zips, which avoid compressing unrelated files
              | together, since that leaks data) padded to a full 100 megs
              | or so and then encrypted en bloc.
        
         | AgentME wrote:
         | Restic also works like this, and has the following benefits
         | over Borg: multiple hosts can back up to the same repo, and it
          | supports "dumb" remote file hosts that aren't running the
          | backup software themselves, like S3 or plain SFTP servers.
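          | 
          | E.g. straight to a bucket (bucket name hypothetical,
          | credentials assumed in the environment):
          | 
          | restic -r s3:s3.amazonaws.com/my-bucket init
          | restic -r s3:s3.amazonaws.com/my-bucket backup ~/docs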
        
           | johnmaguire wrote:
           | I really like restic, and am personally happy to use it via
           | the command line. It's very fast and efficient! However, I do
           | wish there was better tooling / wrappers around it. I'd love
           | to be able to set something simple up on my partner's
           | Macbook.
           | 
            | For example, Pika Backup and Vorta are popular UIs for Borg
            | for which no equivalent exists for Restic, while Borgmatic
            | seems to be the de facto standard for profile configuration.
           | 
           | For my own purposes, I've been using a script I found on
           | Github[0] for a while, but it only really supports Backblaze
           | B2 AFAIK.[1] I've been meaning to try autorestic[2] and
           | resticprofile[3] as they are potentially more flexible than
           | the script I'm currently using but the fact that there are so
           | many competing tools - many of which are no longer maintained
           | - makes it difficult to choose a specific one.
           | 
           | Prestic[4] looks intriguing for my partner's use, although it
           | seems to have very few users. :\ A fork of Vorta[5] seems to
           | have fizzled out six years ago.
           | 
            | [0] https://github.com/erikw/restic-automatic-backup-scheduler
            | 
            | [1] https://github.com/erikw/restic-automatic-backup-scheduler/i...
           | 
           | [2] https://github.com/cupcakearmy/autorestic
           | 
           | [3] https://github.com/creativeprojects/resticprofile
           | 
           | [4] https://github.com/ducalex/prestic
           | 
           | [5] https://github.com/Mebus/restatic
        
             | e12e wrote:
             | > For example, Pika Backup, and Vorta are popular UIs for
             | Borg of which no equivalent exists for Restic
             | 
              | Have you considered:
             | 
             | https://github.com/netinvent/npbackup
             | 
             | Or (not FOSS, but restore-compatible):
             | 
             | https://relicabackup.com/features
        
           | JoshTriplett wrote:
           | I'm a huge fan of restic as well. My only complaint is
           | performance and memory usage. I'm looking forward to being
           | able to use Rustic: https://rustic.cli.rs/
        
         | nijave wrote:
         | >borgbackup is still unable to back up to the same repo from
         | multiple hosts at the same time,
         | 
          | Basically still an issue. The machine takes an exclusive lock,
          | and it also adds overhead since each machine has to update its
          | local data cache (or whatever it's called) because they're
          | constantly getting out of sync when another machine backs up.
         | 
         | bupstash looks promising as a close-to-but-more-performant borg
         | alternative but it's still basically alpha quality
         | 
         | It's unfortunate peer-to-peer Crashplan died
        
         | totetsu wrote:
          | Could this help dedupe 20 years of ad-hoc drive dumps from
          | changing systems...
        
         | rkagerer wrote:
         | For those of us who prefer not to ship to the cloud, have you
         | used Kopia Repository Server and is it any good? Does it run on
         | Windows?
         | 
         | The documentation refers to files and directories. Does the
         | software let you take a consistent, point-in-time snapshot of a
         | whole drive (or even multiple volumes), e.g. using something
          | like VSS? Or if you want that, do you have to use other
          | software (like Macrium Reflect) to produce a file?
         | 
         | Where does the client cache your encryption password/key? (or
         | do you have to enter it each session)
        
       | dveeden2 wrote:
        | https://apps.gnome.org/DejaDup/ uses this as its backend. It
        | also has an experimental option to use
        | https://github.com/restic/restic instead of duplicity.
        
       | MallocVoidstar wrote:
       | I've had issues with Duplicity not performing backups when a
       | single backend is down (despite it being set to continue on
       | error). In my case an SFTP server lost its IPv6 routing so
       | Duplicity couldn't connect to it; rather than give up on that
       | backend and only use the other server it just gave up _entirely_.
        
       | darrmit wrote:
        | I've found restic + rclone to be extremely stable and reliable
        | for this same sort of differential backup. I back up to
        | Backblaze B2 and have also used Google Drive with success, even
        | for 1TB+ of data.
        
         | monkey26 wrote:
         | I've been using Restic since 2017 without issue. Tried Kopia
         | for a while, but its backup size ballooned on me, maybe it
         | wasn't quite ready.
        
         | tmalsburg2 wrote:
         | +1 for restic. I tried various solutions and restic is the best
         | by far. So fast, so reliable.
         | 
         | https://restic.net/
        
         | muppetman wrote:
          | I agree. restic, with its simple ability to mount a backup
          | (using FUSE) so you can just copy out the file(s) you need,
          | is so wonderful. A single binary that you just download and
          | can scp around the place to any device etc.
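          | 
          | The mount really is a one-liner (repo path hypothetical):
          | 
          | restic -r /srv/restic-repo mount /mnt/restic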
         | 
         | It's fantastic to have so many great open source backup
         | solutions. I investigated many and settled on restic. It still
         | brings me joy to actually use it, it's so simple and hassle
         | free.
        
         | baal80spam wrote:
         | +1 for rclone. What a great piece of software.
        
       | lgunsch wrote:
        | Years ago I used a very simple bash script front-end for
        | Duplicity called Duply. It worked very well for the half-dozen
        | years or so I used it.
        
       | diekhans wrote:
        | I have been having great luck with incremental backups using the
        | very similarly named Duplicacy https://duplicacy.com/
        
       | tzs wrote:
       | Not to be confused with Duplicati [1] or Duplicacy [2]. There are
       | too many backup programs whose names start with 'Duplic'.
       | 
       | [1] https://www.duplicati.com/
       | 
       | [2] https://duplicacy.com/
        
         | jszymborski wrote:
          | While we're on the topic of Duplicati, I feel the need to
          | share my personal experience, one that's echoed by lots of
          | folks online.
         | 
          | Duplicati restores can take what seems like the heat death of
          | the universe for a repo as small as 500GB. I've lost a laptop's
          | worth of files to it. You can find tonnes of posts on the
          | Duplicati forums which retell the same story [0].
         | 
         | I've moved to Borg and backing up to a Hetzner Storage Box.
         | I've restored many times with no issue.
         | 
         | Remember folks, test your backups.
         | 
          | [0] https://forum.duplicati.com/t/several-days-and-no-restore-fe...
        
           | johnchristopher wrote:
           | > Remember folks, test your backups.
           | 
            | Since you mention it, I am seizing the opportunity to ask:
            | how should borg backups be tested? Can it be automated?
        
             | jszymborski wrote:
              | It's actually pretty simple using the check command [0]:
              | 
              | borg check --verify-data REPOSITORY_OR_ARCHIVE
             | 
             | You can add that to a cron job.
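              | 
              | E.g. a weekly crontab entry (repo path hypothetical):
              | 
              | 0 3 * * 0 borg check --verify-data /backups/repo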
             | 
             | Alternatively, I think the Vorta GUI also has a way to
             | easily schedule it[1].
             | 
              | I'll add that one thing I like to do once in a blue moon
              | is to spin up a VM and try to recover a few random files.
              | While the check command checks that the data is there and
              | theoretically recoverable, nothing really beats proving to
              | yourself that you can, in a clean environment, recover your
              | files.
             | 
              | [0] https://borgbackup.readthedocs.io/en/stable/usage/check.html
             | 
             | [1] https://vorta.borgbase.com/
        
             | pdimitar wrote:
             | Yes, I have a bunch of scripts that allow me to pick a Borg
             | snapshot and then do stuff with it. One such action is
             | `borg export-tar` that just creates a TAR file containing a
             | full self-sufficient snapshot of your stuff.
             | 
             | Then just listing the files in the archive is a not-bad way
             | to find an obvious problem. Or straight up unpacking it.
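              | 
              | E.g. (repo and archive names hypothetical):
              | 
              | borg export-tar /backups/repo::snap /tmp/snap.tar
              | tar -tf /tmp/snap.tar | head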
             | 
             | But if you're asking about a separate parity file that can
             | be used to check and correct errors -- I haven't done that.
        
         | rabbitofdeath wrote:
          | Duplicacy for me has been amazing - I use it to back up all of
          | my machines nightly, all consolidated into one repo that is
          | copied to B2, and it works amazingly. I've restored plenty and
          | have not had any issues.
        
       | BeetleB wrote:
       | Duply is a good frontend for Duplicity.
       | 
       | https://duply.net/Main_Page
        
       | igtztorrero wrote:
        | Try kopia.io, it's very good.
        
         | green-salt wrote:
          | Seconding this, it's saved me several times.
        
       | thedanbob wrote:
       | PSA for anyone else as stupid as me: when doing selective
       | restores be _very_ careful about how you set target_directory.
       | "duplicity restore --path-to-restore some_file source_url ~" does
       | not mean "restore some_file to my home directory", it means
       | "replace my home directory with some_file".
        
       ___________________________________________________________________
       (page generated 2024-01-24 23:01 UTC)