[HN Gopher] Borg - Deduplicating archiver with compression and e...
       ___________________________________________________________________
        
       Borg - Deduplicating archiver with compression and encryption
        
       Author : rubyn00bie
       Score  : 112 points
       Date   : 2025-07-20 02:36 UTC (20 hours ago)
        
 (HTM) web link (www.borgbackup.org)
 (TXT) w3m dump (www.borgbackup.org)
        
       | creamyhorror wrote:
       | I remember using Borg Backup before eventually switching to
       | Duplicati. It's been a while.
        
         | Snild wrote:
         | I currently use borg, and have never heard of Duplicati. What
         | made you switch?
        
         | racked wrote:
         | I've had an awful experience with Duplicati. Unstable,
         | incomplete, hell to install natively on Linux. This was 5 years
         | ago and development in Duplicati seemed slow back then. Not
         | sure how the situation is now.
        
           | creamyhorror wrote:
           | Interesting to hear. I use Duplicati on Windows and it's been
           | fine, though I haven't extensively used its features.
        
           | jszymborski wrote:
            | Likewise. The ETA for the restore of my 500 GB HDD was like
            | 100+ years or something. It's what caused me to ditch it for
            | borg.
        
       | toenail wrote:
        | Last time I checked, deduplication only works per host when
        | backups are encrypted, which makes sense. Anyway, borg is one of
        | the three backup systems I use; it's alright.
        
         | arendtio wrote:
         | Which are the others?
        
           | guerby wrote:
           | https://kopia.io/
        
           | toenail wrote:
            | BackupPC, and a shell script using rsync for backups to USB
            | sticks.
        
       | ElectronBadger wrote:
        | I use it via the Vorta (https://vorta.borgbase.com) frontend.
        | My favorite backup solution so far.
        
         | Kudos wrote:
         | Pika Backup (https://apps.gnome.org/PikaBackup/) pointed at
         | https://borgbase.com is my choice.
        
       | blablabla123 wrote:
        | I once met the Borg author at a conference; pretty chill guy. He
        | said that when people file bugs because of data corruption, it's
        | because his tool found the underlying disk to be broken. Sounds
        | quite reliable, although I'm mostly fine with tar...
        
         | vrighter wrote:
          | I used to work on backup software. I lost count of the number
          | of times this happened to us with our clients too.
        
           | ValentineC wrote:
            | I used CrashPlan in 2014. Back then, its use of Windows's
            | Volume Shadow Copy Service (VSS) was buggy, and I lost data
            | because of that. I doubt my underlying disk was broken.
        
         | im3w1l wrote:
          | While saying "hardware issue, not my fault, not my problem" is
          | a valid stance, I'm thinking that if you hear it again and
          | again from your users, maybe you should consider whether you
          | can do more. Verifying the file was written correctly is low-
          | hanging fruit. Other possibilities are running a S.M.A.R.T.
          | check and showing a warning, or adding redundancy to recover
          | from partial failures.
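          | 
          | For example, something like this in a backup script would
          | catch a bad write immediately (a rough sketch, not what any
          | of these tools actually does):
          | 
          |     cp -- "$src" "$dst"
          |     sync --file-system -- "$dst"  # flush to stable storage
          |     # re-read via O_DIRECT so we compare on-disk bytes, not
          |     # the page cache (Linux; needs bash for <(...))
          |     if ! cmp -s -- "$src" \
          |          <(dd if="$dst" iflag=direct bs=1M 2>/dev/null); then
          |         echo "verification failed: $dst" >&2
          |     fi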
        
           | ddtaylor wrote:
            | I think the failure mode happening for users/devs here is
            | bit rot. It's not that the device won't report back the same
            | bytes right after writing, even with caching disabled; it's
            | that after some amount of time T it will report the wrong
            | bytes. Some file systems have "scrubs" that automatically
            | find these and sometimes attempt to repair them (ZFS can do
            | this).
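            | 
            | For example, on ZFS (the pool name is made up):
            | 
            |     zpool scrub tank       # read and verify every block
            |     zpool status -v tank   # report any errors it found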
        
       | thangngoc89 wrote:
       | I switched to restic (https://restic.net/) and the backrest webui
       | (https://github.com/garethgeorge/backrest) for Windows support.
       | Files are deduplicated across machines with good compression
       | support.
        
         | sureglymop wrote:
         | I also use restic and do backups to append-only rest-servers in
         | multiple locations.
         | 
          | I also back up multiple hosts to the same repository, which
          | actually results in insane storage space savings. One thing
          | I'm missing, though, is being able to specify multiple
          | repositories for one snapshot such that I have consistency
          | across the multiple backup locations. For now the snapshots
          | just have different IDs.
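          | 
          | The append-only part is just a flag on the server side (a
          | sketch; host, port, and repo name are made up):
          | 
          |     # on the backup host:
          |     rest-server --path /srv/restic --append-only --listen :8000
          |     # on each client:
          |     restic -r rest:http://backup1.example:8000/myrepo backup /home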
        
           | linsomniac wrote:
           | >back up multiple hosts to the same repository
           | 
            | I haven't tried that recently (~3 years). Does that work
            | concurrently, or do you need to ensure only one backup is
            | running at a time? Back when I tried it I got the sense that
            | it wasn't really meant to have many machines accessing the
            | repo at once, and decided it was probably worth wasting some
            | space in exchange for potentially more robust backups,
            | especially for my home use case where I only have a couple
            | of machines to back up. But it'd be pretty cool if I could
            | replace my main backup servers (using rsync --inplace and
            | zfs snapshots) with restic and get deduplication.
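            | 
            | For reference, that rsync-and-snapshot setup is roughly
            | this (paths made up):
            | 
            |     rsync -a --inplace --delete client:/data/ /tank/backups/client/
            |     zfs snapshot tank/backups@$(date +%F)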
        
             | l33tman wrote:
              | The issue with this is that if someone hacks one of the
              | hosts, they now have access to the backups of all your
              | other hosts. That's the case with borg, at least in the
              | standard setup; would be cool if I were wrong though.
        
               | sureglymop wrote:
               | At least with restic that is not an issue. See my other
               | comment here:
               | https://news.ycombinator.com/item?id=44626515
               | 
               | Backups are append only and each host gets its own key,
               | the keys can be individually revoked.
               | 
               | Edit: I have to correct myself. After further research,
               | it seems that append-only != write-only. Thus you are
               | correct in that a single host could possibly access/read
               | data backed up by another host. I suppose it depends on
               | use-case whether that is a problem.
        
             | sureglymop wrote:
             | It works. In general, multiple clients can back up
             | to/restore from the same repository at the same time and do
             | writes/reads in parallel. However, restic does have a
             | concept of exclusive and non-exclusive locks and I would
             | recommend reading the manual/reference section on locks. It
             | has some smart logic to detect and clean up stale locks by
             | itself.
             | 
             | Locks are created e.g. when you want to forget/prune data
             | or when doing a check. The way I handle this is that I use
             | systemd timers for my backup jobs. Before I do e.g. a check
             | command I use an ansible ad-hoc command to pause the
             | systemd units on all hosts and then wait until their
             | operations are done. After doing my modifications to the
             | repos I enable the units again.
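              | 
              | Roughly, with ansible's systemd module (the unit name is
              | made up):
              | 
              |     ansible all -b -m ansible.builtin.systemd \
              |         -a 'name=restic-backup.timer state=stopped'
              |     # ...run forget/prune/check here...
              |     ansible all -b -m ansible.builtin.systemd \
              |         -a 'name=restic-backup.timer state=started'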
             | 
             | Another tip is that you can create individual keys for your
             | hosts for the same repository. Each host gets its own key
             | so that host compromise only leads to that key being
             | compromised which can then be revoked after the breach. And
             | as I said I use rest-servers in append-only mode so a
             | hacker can only "waste storage" in case of a breach. And I
             | also back up to multiple different locations (sequentially)
             | so if a backup location is compromised I could recover from
             | that.
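              | 
              | Key management is built into restic (a sketch; the -r
              | repo flag is omitted, and <ID> is a placeholder):
              | 
              |     restic key add          # create a key for a new host
              |     restic key list
              |     restic key remove <ID>  # revoke a compromised key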
             | 
             | I don't back up the full hosts, mainly application data. I
             | use tags to tag by application, backup type, etc. One pain
             | point is, as I mentioned, that the snapshot IDs in the
             | different repositories/locations are different. Also,
             | because I back up sequentially, data may have already
             | changed between writing to the different locations. But
             | this is still better than syncing them with another tool as
             | that would be bad in case one of the backup locations was
             | compromised. The tag combinations help me deal with this
             | issue.
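              | 
              | (Tags are just flags at backup time, e.g.
              | 
              |     restic backup --tag app:postgres --tag daily /var/lib/postgresql
              | 
              | with the repo flag omitted and the tag names made up.)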
             | 
             | Restic really is an insanely powerful tool and can do
             | almost everything other backup tools can!
             | 
             | The only major downside to me is that it is not available
             | in library form to be used in a Go program. But that may
             | change in the future.
             | 
              | Also, what would be even cooler for the multiple backup
              | locations is if the encrypted data could be distributed
              | using something like Shamir secret sharing, where you'd
              | need access to k of n backup locations to recreate the
              | secret data. That would also mean that you wouldn't have
              | to trust whatever provider you back up to (e.g. Amazon S3
              | or something).
        
         | jeltz wrote:
          | One big advantage of restic is that its append-only storage
          | actually works, unlike Borg's, where it is just a hack.
        
       | rollcat wrote:
       | I've been using it for ~10 years at work and at home. Fantastic
       | software.
        
       | kachapopopow wrote:
        | Restic is far better in terms of both usability and packaging
        | (borgmatic is pretty much a requirement for borg's usability).
        | I've used both extensively; you can argue that borg can just be
        | scripted instead and is a lot more versatile, but I had a much
        | better experience with restic in terms of set-it-and-forget-it.
        | I am not scared that restic will break; with borg I was.
        | 
        | Also, not sure why this was posted. Did a new version get
        | released or something?
        
         | mekster wrote:
          | How is the performance of both?
          | 
          | Last time I used restic, a few years ago, it choked on a
          | not-so-large data set with high memory usage. I read Borg
          | doesn't choke like that.
        
           | homebrewer wrote:
            | Depends on what you consider large; I looked at one of the
            | machines (at random), and it backs up about two terabytes of
            | data spread across about a million files. Most of them
            | aren't changing day to day. I ran another backup, and restic
            | rescanned them & created a snapshot in exactly 35 seconds,
            | using ~800 MiB of RAM at peak and about 600 on average.
            | 
            | The files are on an HDD and the machine doesn't have a lot
            | of RAM. Looking at the high I/O wait times and low CPU load
            | overall, I'm pretty sure the bottleneck is loading
            | filesystem metadata off disk.
            | 
            | I wouldn't back up billions of files or petabytes of data
            | with either restic or borg; stick to ZFS for anything of
            | that scale.
            | 
            | I don't remember what the initial scan time was (it was many
            | years ago), but it wasn't unreasonable -- pretty sure the
            | bottleneck there was also disk I/O.
        
         | kmarc wrote:
         | > you can argue that borg can just be scripted
         | 
          | And that's what I did myself. Organically it grew to ~200
          | lines, but it sits in the background (I created a systemd unit
          | for it, too) and does its job. I also use rclone to store the
          | encrypted backups in an AWS S3 bucket.
          | 
          | I forget about it so completely that sometimes I have to
          | remind myself to test whether it still works (it does):
          | 
          |                   Original size  Compressed size  Deduplicated size
          |     All archives:       2.20 TB          1.49 TB           52.97 GB
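          | 
          | The core of such a script is only a few lines (a sketch; the
          | repo path, bucket, and passphrase handling are made up):
          | 
          |     export BORG_REPO=/backups/borg
          |     export BORG_PASSCOMMAND='cat /root/.borg-passphrase'
          |     borg create --compression zstd ::'{hostname}-{now}' /home /etc
          |     borg prune --keep-daily 7 --keep-weekly 4 --keep-monthly 6
          |     rclone sync /backups/borg s3:my-backup-bucket/borg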
        
         | bjoli wrote:
         | Pika backup is pretty darn simple.
        
         | jszymborski wrote:
         | I use Vorta, which makes Borg use very easy.
         | 
         | https://vorta.borgbase.com/
        
         | johng wrote:
         | Emborg is also really cool:
         | https://emborg.readthedocs.io/en/stable/
        
       | sunaookami wrote:
        | Love borg, use it to back up all my servers and laptop to a
        | Hetzner Storage Box. Always impressed with the deduplication
        | stats!
        
         | stevekemp wrote:
         | Same story here, using Borg with a Hetzner storage box to give
         | me offsite backups.
         | 
         | Cheap, reliable, and almost trouble-free.
        
       | AnonC wrote:
       | I've been looking at this project occasionally for more than four
       | years. The development of version 2.0 started sometime in April
       | 2022 (IIRC) and there's still no release candidate yet. I'm
       | guessing that it'll be finished in a year from now.
       | 
       | What are the current recommendations here to do periodic backups
       | of a NAS with lower (not lowest) costs for about 1 TB of data
       | (mostly personal photos and videos), ease of use and robustness
       | that one can depend on (I know this sounds like a "pick two"
       | situation)? I also want the backup to be completely private.
        
         | homebrewer wrote:
         | You definitely should have checksumming in some form, even if
         | compression and deduplication are worthless in this particular
         | use case, so either use ZFS on both the sending and the
         | receiving side (most efficient, but probably will force you to
         | redo the NAS), or stick to restic.
         | 
          | I've been mostly using restic over the past five years to back
          | up two dozen servers + several desktops (one of them Windows),
          | no problems so far, and it's been very stable in both senses
          | of the word (absence of bugs & unchanging API -- both
          | "technical" and "user-facing").
         | 
         | https://github.com/restic/restic
         | 
          | The important thing is to run periodic scrubs with a full
          | data read to check that your data can actually be restored (I
          | do it once a week; once a month is probably the upper limit):
          | 
          |     restic check --read-data ...
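          | 
          | For example, from cron (repo and password paths are made up):
          | 
          |     # weekly full-read verification, Sundays at 03:00
          |     0 3 * * 0  restic -r /srv/restic-repo --password-file /root/.restic-pass check --read-data
          | 
          | restic also has --read-data-subset if a weekly full read is
          | too heavy.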
         | 
         | Some suggestions for the receiver unless you want to go for
         | your own hardware:
         | 
         | https://www.rsync.net/signup/order.html?code=experts
         | 
         | https://www.borgbase.com
         | 
         | (the code is NOT a referral, it's their own internal thingy
         | that cuts the price in half)
        
       | rjh29 wrote:
       | People like to recommend restic but I stay with Borg because it
       | is old, popular and battle tested. Very important when dealing
       | with backing up data!
        
         | muppetman wrote:
         | Restic is hardly new and untested? I don't think they're
         | dissimilar in age. Restic is certainly battle tested. Are you
         | thinking of rustic?
        
           | rjh29 wrote:
            | Borg is at least 5 years older; restic is not 1.0 yet and
            | seems to be still under heavy development. For example,
            | compression was only added in 2022, and people reported
            | severe performance issues / high RAM usage with larger
            | backups only a few years ago.
            | 
            | Fair point though, both have enough of a user base that
            | they could be considered safe at this point.
        
       | TacticalCoder wrote:
        | I'll die on this hill... If my files that are named like this:
        | 
        |     DSC009847.JPG
        | 
        | were actually named like this:
        | 
        |     DSC009847-b3-73ea2364d158.JPG
        | 
        | where "-b3-" means "what's coming before the extension are the
        | first x bits (choose as many hex digits as you want) of the
        | Blake3 cryptographic hash of the file"...
        | 
        | we'd be living in a better world.
       | 
        | I do that for _many_ of my files: notably family pictures and
        | family movies, but also _.iso_ files, tar/gzip'ed files, etc.
       | 
       | This makes detecting bitflips trivial.
       | 
        | I've created little shell scripts for verification, backups,
        | etc. that work with files having such a naming scheme.
       | 
       | It's bliss.
       | 
        | My world is a better place now. I moved to such a scheme after
        | a series of 20 pictures from a vacation with old friends were
        | corrupted (thankfully I had backups, but programmatically
        | "determining which one is the correct file" is not _that_
        | easy).
       | 
        | And, yes, it has detected one bitflip since I started using it.
       | 
        | I don't always verify all the checksums, but I've got a script
        | that does random sampling: it picks x% of the files with such a
        | naming scheme and verifies their checksums.
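        | 
        | The verification part is roughly this (a sketch assuming the
        | official b3sum CLI; my actual scripts differ):
        | 
        |     #!/bin/sh
        |     # verify files named NAME-b3-<hex>.EXT against the
        |     # embedded Blake3 prefix
        |     find . -type f -name '*-b3-*' | while read -r f; do
        |         want=$(printf '%s' "$f" |
        |             sed -E 's/.*-b3-([0-9a-f]+)\.[^.]+$/\1/')
        |         got=$(b3sum --no-names "$f" | cut -c1-${#want})
        |         [ "$want" = "$got" ] || echo "CORRUPT: $f" >&2
        |     done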
       | 
       | It's not incompatible with ZFS: I still run ZFS on my Proxmox
       | server. It's not incompatible with restic/borg/etc. either.
       | 
       | This solves so many issues, including the _" How do you know your
       | data is correct?"_ (answer is: _" Because I've already looked
       | that family movie after the cryptographic hash was added to its
       | name"_).
       | 
       | Not a panacea but doesn't hurt and it's _really_ not much work.
        
         | homebrewer wrote:
         | It's an old idea and is also how some anime fansub groups
         | prepare their releases: the filename of each episode contains
         | the CRC32 of the file inside [square brackets].
         | 
          | Doesn't really make much sense for BitTorrent uploads
          | (BitTorrent provides its own much stronger hashes); it's a
          | holdover from the era of IRC bots.
        
         | networked wrote:
          | I prefer
          | 
          |     DSC009847.JPG.b3sum
          | 
          | sidecar files [1] or per-directory checksum files like
          | 
          |     B3SUMS
          | 
          | because they can be verified with standard tools. This scheme
          | also lets you checksum files whose names you can't or don't
          | want to change. (Though in that situation you have the
          | alternative of using a symlink for either the original name
          | or the name with the checksum.) I have used the scheme less
          | since I adopted ZFS.
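          | 
          | For example (assuming the official b3sum CLI):
          | 
          |     # sidecar file:
          |     b3sum DSC009847.JPG > DSC009847.JPG.b3sum
          |     # per-directory (exclude the checksum file itself):
          |     b3sum *.JPG > B3SUMS
          |     # verify either one, sha256sum-style:
          |     b3sum --check B3SUMS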
         | 
          | I do use a very similar _example.com/foo/bar/b3-abcd0123.html_
          | for https://example.com/foo/bar in the archival tool for
          | outgoing links on my website. It avoids the need for a date
          | prefix like in the Wayback Machine while preventing
          | duplication.
         | 
          | Speaking of _.iso_ files: a recent PR [2] to my favorite
          | Linux USB-disk-image burning tool, Caligula, added support
          | for detecting and verifying sidecar files like
          | _foo.iso.sha256_ (albeit not Blake).
         | 
         | [1] https://en.wikipedia.org/wiki/Sidecar_file
         | 
         | [2] https://github.com/ifd3f/caligula/pull/186
        
       | bjoli wrote:
        | They are also a prominent user of AES-OCB, IIRC.
        
       | dxs wrote:
       | Also: Baqpaq
       | 
       | "Baqpaq takes snapshots of files and folders on your system, and
       | syncs them to another machine, or uploads it to your Google Drive
       | or Dropbox account. Set up any schedule you prefer and Baqpaq
       | will create, prune, sync, and upload snapshots at the scheduled
       | time.
       | 
       | "Baqpaq is a tool for personal data backups on Linux systems.
       | Powered by BorgBackup, RSync, and RClone it is designed to run on
       | Linux distributions based on Debian, Ubuntu, Fedora, and Arch
       | Linux."
       | 
       | At: https://store.teejeetech.com/product/baqpaq/
       | 
       | Though personally I use Borg, Rsync, and some scripts I wrote
       | based on Tar.
        
       | evulhotdog wrote:
       | Kopia is an awesome tool that checks the same boxes, and has a
       | wonderful GUI if you need that.
       | 
       | Not affiliated, just a happy user.
        
       | jszymborski wrote:
       | I've been using the Vorta GUI [0] and Hetzner's Storage Box
       | service for ages and it works great. Has saved me from some
       | headaches.
       | 
       | I switched over from Duplicati a long while back when my laptop's
       | sole HDD failed and Duplicati was giving me 143 year estimates
       | for the restore to complete. This was true whether I aimed to
       | restore the whole drive or just a single file.
       | 
       | https://vorta.borgbase.com/
        
       | johng wrote:
        | Plakar is a new project out there that is interesting... lots of
        | cool stuff happening.
       | 
       | https://plakar.io/
        
       ___________________________________________________________________
       (page generated 2025-07-20 23:02 UTC)