[HN Gopher] The various scripts I use to back up my home compute...
___________________________________________________________________
The various scripts I use to back up my home computers using SSH
and rsync
Author : tosh
Score : 138 points
Date : 2022-12-09 15:27 UTC (7 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| smm11 wrote:
| I gave up on this at home long ago, and just use Onedrive for
| everything. I don't even have "local" files. My stuff is there,
| and in the event my computer won't start up I lose what's open in
| the browser. I can handle that.
|
| At work I use Windows backup to write to empty SMB-mounted drives
| nightly, then write those daily to another drive on an offline
| Fedora box.
|
| My super critical files are on an encrypted SD card I sometimes
| put in my phone when cellular connection is off, and this is
| periodically backed up to Glacier. The phone (Galaxy) runs Dex
| and can be my computer when needed to work with these files.
| pmontra wrote:
| His backup rotation algorithm is very close to what rsnapshot
| does.
|
| https://rsnapshot.org/
| NelsonMinar wrote:
| I use rsnapshot still! It feels very old fashioned but it works
| reliably and is easy to understand.
| mekster wrote:
| It's good to keep multiple backups with different
| implementations local and remote.
|
| Rsnapshot is hard to break because it relies on very basic
| principles: plain files on the file system and hard links. If
| your file system isn't zfs, I think it's a viable backup
| strategy for the local copy, while you use other tools for
| remote backups.
| PopAlongKid wrote:
| >I don't use Windows at the moment and don't really mount network
| drives, either. That might be a good alternative to consider.
|
| Regarding Windows:
|
| I have successfully mirrored a notebook and a desktop[0] (single
| user) with Windows using _robocopy_ , which is a utility that
| comes with Windows (used to be part of the Resource Kit but I
| think it is now in the base product). When I say "mirror" I mean
| I can use either machine as my current workstation without any
| loss of data, as long as I run the "sync" script at each switch.
|
| I use "net use" to temporarily mount a few critical drives on the
| local network, then _robocopy_ does its work. It has maybe 85% of
| the same functionality as rsync (which I also used extensively
| when administering corporate servers and workstations). Back in
| the DOS days, I wrote my own very simple version of the same
| thing using C, but when _robocopy_ came along I was glad to stop
| maintaining my own effort.
|
| [0]or two desktops, using removable high-capacity media like
| Iomega zip drives.
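|
| A minimal sketch of that sync step, with hypothetical drive
| letters and share names (robocopy's /MIR flag mirrors deletions
| too, so test against scratch data first):
|
|     net use B: \\desktop\users /persistent:no
|     robocopy C:\Users\me B:\me /MIR /COPY:DAT /R:2 /W:5
|     net use B: /delete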
| gary_0 wrote:
| I use MSYS2 on Windows in order to run regular rsync and other
| such utilities. It's served me very well for years. I also have
| some bash scripts that I can conveniently run on either Linux
| or Windows via MSYS2.
| EvanAnderson wrote:
| Robocopy is very nice but has no delta compression
| functionality. For things like file server migrations (where I
| want to preserve ACLs, times, etc) robocopy is my go-to tool.
|
| I've used the cwRsync[0] binary distribution of rsync on
| Windows for backups. I found it worked very well for simple
| file backups. I never did get around to trying to combine it
| with Volume Shadow Copy to make consistent backups of the
| registry and applications like Microsoft SQL Server. (I
| wouldn't expect to get a bootable restore from such a backup,
| though.)
|
| [0] https://www.itefix.net/cwrsync
| rzzzt wrote:
| I used QtdSync, another frontend backed by a Windows rsync
| binary. A nice feature was that it supported the "duplicate
| entire target folder with hard links, then overwrite changes
| only"-style on NTFS volumes, so I could have lots of
| browseable point-in-time backup folders without consuming
| extra disk space:
| https://www.qtdtools.de/page.php?tool=0&sub=1&lang=en
| wereallterrrist wrote:
| I find it very, very hard to go wrong with Syncthing (for stuff I
| truly need replicated, code/photos/text-records) and ZFS +
| znapzend + rsync.net (automatic snapshots of `/home` and
| `/var/lib` on servers).
|
| The only thing missing is -> I'd like to stop syncing code with
| Syncthing and instead build some smarter daemon. The daemon would
| take a manifest of repositories, each with a mapping of
| worktrees->branches to be actualized and fsmonitored. The daemon
| would auto-commit changes on those worktrees into a shadow branch
| and push/pull it. Ideally this could leverage (the very amazing,
| you must try it) `jj` for continuous committing of the working
| copy and (in the future, with the native jj format) even handle the
| likely-never-to-happen conflict scenario. (I'd happily
| collaborate on a Rust impl and/or donate funds to one.)
|
| Given the number of worktrees I have of some huge repos (nixpkgs,
| linux, etc) it would likely mark a significant reduction in
| CPU/disk usage given what Syncthing is having to do now to
| monitor/rescan as much as I'm asking it to (given it has to dumb-
| sync .git, syncs gitignored content, etc, etc).
| hk1337 wrote:
| I use Syncthing between Mac, Windows (have included Linux in
| the mix at one point), and with my Synology NAS. Syncthing is
| more for my short term backup though. I will either commit it
| to a repo, save it to a Synology share, or delete it.
|
| *edit* my gitea server saves its backups to synology
| than3 wrote:
| I hate to be the one to point out the obvious, but replication
| isn't a backup. It's for resiliency, just like RAID; the two
| aren't the same.
| whalesalad wrote:
| What is the actual difference between a backup and
| replication? If the 1's and 0's are replicated to a different
| host, is that any different than "backing up" (replicating
| them) to a piece of external media?
| jjav wrote:
| > What is the actual difference between a backup and
| replication?
|
| Simplest way to think about it is that a backup must be an
| immutable snapshot in time. Any changes and deletions which
| happen after that point in time will never reflect back
| onto the backup.
|
| That way, any files you accidentally delete or corrupt (or
| other unwanted changes, like ransomware encrypting them for
| you) can be recovered by going back to the backup.
|
| Replication is very different, you intentionally want all
| ongoing changes to replicate to the multiple copies for
| availability. But it means that unwanted changes or data
| corruption happily replicates to all the copies so now all
| of them are corrupt. That's when you reach for the most
| recent backup.
|
| That's why you always need to backup and you'll usually
| want to replicate as well.
| chrishas35 wrote:
| When those 1s and 0s are deleted and that delete is
| replicated (or other catastrophic change, such as
| ransomware) you presumably don't have the ability to
| restore if all you're doing is replication. A strategy that
| layers replication + backup/versioning is the goal.
| natebc wrote:
| I'll add that _usually_ a backup strategy includes
| generational backups of some kind. That is daily, weekly,
| monthly, etc to hedge against individually impacted files
| as mentioned.
|
| Ideally there is also an offsite and inaccessible from
| the source component to this strategy. Usually this level
| of robustness isn't present in a "replication" setup.
| than3 wrote:
| Put more simply, backups account for and mitigate the
| common risks to data during storage while minimizing
| costs; ransomware is one of those common risks. It's
| organization-dependent, based on costs and available budget, so
| it varies.
|
| Long term storage usually has some form of Forward Error
| Correction (FEC) protection schemes (for bitrot), and
| often backups are segmented which may be a mix of full
| and iterative, or delta backups (to mitigate cost) with
| corresponding offline components (for ransomware
| resiliency), but that too is very dependent on the
| environment as well as the strategy being used for data
| minimization.
|
| > Usually this level of robustness isn't present in a
| "replication" setup.
|
| Exactly, and thinking about replication as a backup often
| also gives those using it a false sense of security in
| any BC/DR situations.
| NelsonMinar wrote:
| Syncthing has file versioning but I don't know for sure if
| it's suitable for backup.
| https://docs.syncthing.net/users/versioning.html
| reacharavindh wrote:
| Replication to another machine that has a COW file system
| with snapshots is backup though :-)
|
| We back up our data storage for an entire HPC cluster, about 2
| PiB of it, to a single machine with 4 disk shelves running ZFS
| with snapshots. It works very well. A simple rsync every night,
| and snapshotted.
|
| We use the backup as a sort of Time Machine should we need
| data from the past that we deleted in the primary. Plus, we
| don't need to wait for the tapes to load or anything.. it is
| pretty fast and intuitive
| jerf wrote:
| The person you're replying to said "Syncthing ... and ZFS +
| znapzend + rsync.net" though. You're ignoring the rsync.net
| part.
|
| I have something similar; it's Nextcloud + restic to AWS S3,
| but it's the same principle. You can give people the
| convenience and human-comprehensibility of sync-based
| sharing, but also back that up too, for the best of both
| worlds. Though in my case the odds of me needing "previous
| versions" of things approach zero, and a full sync is fairly
| close to a backup; even so, I do have a full solution here.
| jrm4 wrote:
| But, it makes things easy. I have e.g. a home computer, a
| server in the closet thing, a laptop and a work computer all
| with a shared Syncthing folder.
|
| So to bolster that other thing, I just have a simple bash
| script that reminds me every 7 days to make a copy of that
| folder somewhere else on that machine. It's not precise
| because I often don't know what machine I will be using, but
| that creates a natural staggering that I figure should be
| sufficient if something goes weird and I lose something; I'm
| likely to have an old copy somewhere, right?
| killingtime74 wrote:
| For code I just use a self hosted git server
| acranox wrote:
| Sparkleshare does something kind of similar. It uses git as the
| backend to automatically sync directories on a few computers.
| https://www.sparkleshare.org/
| JeremyNT wrote:
| > Given the number of worktrees I have of some huge repos
| (nixpkgs, linux, etc) it would likely mark a significant
| reduction in CPU/disk usage given what Syncthing is having to
| do now to monitor/rescan as much as I'm asking it to (given it
| has to dumb-sync .git, syncs gitignored content, etc, etc).
|
| Are you really hitting that much of a resource utilization
| issue with syncthing though? I use it on lots of small files
| and git repos and since it uses inotify there's not really much
| of a problem. I guess the worst case is switching to very
| different branches frequently, or committing very large
| (binary?) files where it may need to transfer them twice, but
| this hasn't been a problem in my own experience.
|
| I'm not sure you could really do a whole lot better than
| syncthing by being clever, and it strikes me as a lot of effort
| to optimize for a specific workflow.
|
| Edit: actually, I wonder if you could just exclude the working
| copies with a clever exclude list in syncthing, such that you'd
| ONLY grab .git so you wouldn't even need the double
| transfer/storage. You risk losing uncommitted work I suppose.
| fncivivue7 wrote:
| Sounds like you want Borg
|
| https://borgbackup.readthedocs.io/en/stable/
|
| My two 80% full 1tb laptops and 1tb desktop back up to around
| 300-400G after dedupe and compression. Currently have around
| 12tb of backups stored in that 300G.
|
| Incremental backups run in about 5 mins even against the
| spinning disks they're stored on.
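|
| For reference, a minimal sketch of that kind of setup (the
| repository path and prune policy here are hypothetical):
|
|     borg init --encryption=repokey /mnt/backup/borg-repo
|     borg create --stats --compression zstd \
|         /mnt/backup/borg-repo::'{hostname}-{now}' ~/
|     borg prune --keep-daily 7 --keep-weekly 4 --keep-monthly 12 \
|         /mnt/backup/borg-repo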
| _dain_ wrote:
| They work together. I use syncthing to keep things
| synchronized across devices, including to an always-on
| "master" device that has more storage. Then borg runs on the
| master device to create backups.
| 0cf8612b2e1e wrote:
| Python programmer here, but I actually prefer Restic [0].
| While more or less the same experience, the huge selling
| point to me is that the backup program is a single executable
| that can be easily stored alongside the backups. I do not
| want any dependency/environment issues to assert themselves
| when restoration is required (which is most likely on a
| virgin, unconfigured system).
|
| [0] https://restic.net/
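|
| A minimal sketch of that workflow, with a hypothetical
| repository path:
|
|     restic -r /mnt/backup/restic-repo init
|     restic -r /mnt/backup/restic-repo backup ~/Documents
|     restic -r /mnt/backup/restic-repo snapshots
|     restic -r /mnt/backup/restic-repo restore latest \
|         --target /tmp/restore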
| SomeoneOnTheWeb wrote:
| You can also take a look at Kopia (https://kopia.io/).
|
| I've been using Borg, Restic and Kopia for a long time and
| Kopia is my personal favorite - very fast, very efficient,
| runs in the background automatically without having to
| schedule a CRON or anything like that.
|
| Only downside is that the backups are made of a HUGE number
| of files, so when synchronizing it can sometimes take a bit
| of time to check the ~5k files.
| klodolph wrote:
| I've been using Kopia, I recommend it.
| wanderingmind wrote:
| Highly recommend Kopia that has a nice UI and can work
| with rclone (so any cloud back end)
| codethief wrote:
| I don't think GP was talking about backups (which is what
| Borg is good for) but about _synchronization_ between
| machines which is another issue entirely.
| wereallterrrist wrote:
| No, I distinctly don't want borg. It doesn't help or solve
| anything that Syncthing doesn't do. The obsession with borg
| and bup are pretty baffling to me. We deserve better in this
| space. (see: Asuran and another whose name I forget...)
|
| Critically, I'm specifically referring to code sync that
| needs to operate at a git-level to get the huge efficiencies
| I'm thinking of.
|
| Syncthing, or borg, scanning 8 copies of the Linux kernel is
| pretty horrific compared to something doing a "git commit &&
| git push" and "git pull --rebase" in the background (over-
| simplifying the shadow-branch process here for brevity.)
|
| re: 'we deserve better' -- case in point, see Asuran -
| there's no real reason that sync and backup have to be
| distinctly different tools. Given chunking and dedupe and
| append-logs, we really, really deserve better in this tooling
| space.
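|
| A very rough shell sketch of that shadow-branch idea (ref name,
| interval and remote are hypothetical; a real daemon would use
| fsmonitor/inotify instead of sleeping, and would skip pushing
| when the tree hash is unchanged):
|
|     cd ~/src/nixpkgs
|     # use a separate index so the user's real index is untouched
|     export GIT_INDEX_FILE=.git/shadow-index
|     while sleep 300; do
|         git add -A
|         tree=$(git write-tree)
|         parent=$(git rev-parse -q --verify refs/shadow/auto || true)
|         commit=$(git commit-tree ${parent:+-p $parent} \
|                  -m "auto-sync $(date -Is)" "$tree")
|         git update-ref refs/shadow/auto "$commit"
|         git push origin refs/shadow/auto || true
|     done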
| formerly_proven wrote:
| borg et al and "git commit" work in essentially the same
| way. Both scan the entire tree for changes using
| modification timestamps.
| dragonwriter wrote:
| > borg et al and "git commit" work in essentially the
| same way. Both scan the entire tree for changes using
| modification timestamps.
|
| But git commit _doesn't_ do that. If you want to do that
| in git, you typically do it before commit with "git add
| -A".
| [deleted]
| ww520 wrote:
| Yes. I just let Syncthing sync among devices, using it for
| creating copies of the backup. The daily backup scripts do
| their things and create one backup snapshot, then Syncthing
| picks up the new backup files and propagates them to multiple
| devices.
| blindriver wrote:
| I use Synology to back everything up, and then from there I use
| Hyperbackup to backup to 2 external hard drives every week. When
| the hard drives get full, I buy a new one that is larger and I
| put the old one into my closet and date it.
|
| Now that you reminded me, it might be best to buy a new larger
| hard drive if there are any pre-Christmas sales.
| kevstev wrote:
| Have you looked into backing up into the cloud? I used to do
| this way back in the day, but by using AWS I get legit offsite
| storage. It's really cheap if you use Glacier, and I was
| actually looking this week, and there is now an even cheaper
| option called Deep Archive. It costs me about $2 a month to
| store my stuff there. I just back up the irreplaceable things-
| my photos, documents, etc. All the other stuff is backed up on
| TPB or github for me.
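|
| A minimal sketch of that kind of upload (bucket name
| hypothetical; note Deep Archive retrievals need an explicit
| restore request and can take hours):
|
|     tar czf photos-2022.tar.gz ~/Photos
|     aws s3 cp photos-2022.tar.gz \
|         s3://my-archive-bucket/photos-2022.tar.gz \
|         --storage-class DEEP_ARCHIVE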
| blindriver wrote:
| I don't trust backing up to the cloud, I just do everything
| on site and hope there's nothing catastrophic!
| kkfx wrote:
| Oh, curious, it's the first backup script in Clojure I've seen :-)
|
| My personal recipe is less sophisticated:
|
| - znapzend on all home machines send to a homeserver regularly
| (with enough storage), partially replicated between
| desktops/laptop
|
| - homeserver backup itself via simple incremental zfs send +
| mbuffer with one snapshot per day (last 2 days), one per week
| (last 2 w) and one per month (last 1 month) offsite
|
| - manually triggered offline local backup of the homeserver on
| external USB drives and a physically mirrored home server,
| normally on weekly basis
|
| Nothing more, nothing less. On any major NixOS release update I
| rebuild one homeserver and a month or so later the second one.
| Desktops and homeserver custom iso are built automatically every
| Sunday and just left there (I know, it simply took too much time
| to keep checking, so...).
|
| Essentially, in case of a fault of a machine I still have data,
| config and a ready iso for a quick reinstall. In case of logical
| faults (like a direct attack that compromises my data AND zfs
| itself) there is not much protection besides different sync
| times (I do NOT use all desktops/laptops at once; when they are
| powered off they stay behind, and I normally have plenty of time
| to spot most casual potential attacks).
|
| Long story short, for anyone: when you talk about backups, talk
| about how you restore, or your backups will probably be just
| useless bits one day...
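|
| For reference, the incremental "zfs send + mbuffer" step above
| looks roughly like this (dataset names and buffer sizes are
| hypothetical):
|
|     zfs snapshot tank/home@$(date +%F)
|     zfs send -i tank/home@2022-12-08 tank/home@$(date +%F) \
|         | mbuffer -s 128k -m 1G \
|         | ssh offsite "zfs receive -F backup/home"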
| e1g wrote:
| Recent versions of rsync support zstd compression, which can
| improve speed and reduce the load on both sides. You can check if
| your rsync supports that with "rsync -h | grep zstd" and tell it
| to use it with "-z --zc=zstd".
|
| However, compression is useful in proportion to how crappy the
| network is and how compressible the content is (e.g., text
| files). This repo is about backing up user files to an external
| SSD with high bandwidth and low latency, and applying compression
| likely makes the process slower.
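|
| For example, against a network target (paths hypothetical; the
| --zc/--zl options need rsync >= 3.2 on both ends):
|
|     rsync -az --zc=zstd --zl=3 ~/Documents/ user@nas:/backup/docs/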
| greggyb wrote:
| Compression is useful even with directly attached storage
| devices. Disk IO is still slower than compression throughput
| unless you are running very fast storage.
|
| If your workload is IO-bound, then it is quite likely that
| compression will help. Most people, on their personal machines,
| would likely see IO performance "improve" with filesystem level
| compression.
| arichard123 wrote:
| I'm doing something similar but running a zfs pool off a usb dock
| and using zfs snapshot instead of hardlinks. USB is slow, but it's
| still faster than my network, so not the bottleneck.
| alchemist1e9 wrote:
| Let's not forget zbackup. Excellent useful low level tool.
| proactivesvcs wrote:
| If one uses software meant for backups, like restic, there are so
| many advantages. Independent snapshots, deduplication,
| compression, encryption, proper methods to verify the backup
| integrity and forgetting snapshots according to a more structured
| policy. Mount and read any backup by host or snapshot, multi-
| platform, single binary and one can even run its rest-server on
| the destination to allow for append-only backups. The importance
| of using the right tool for the job, for something as crucial as
| backup, cannot be overstated.
| aborsy wrote:
| ZFS send/receive is perfect, except there is almost no ZFS cloud
| storage on the receiving side. You have to set up a ZFS server
| offsite somewhere, like in a friend's house.
|
| Restic is darn good too! It has integration with many cloud
| storage providers.
| neilv wrote:
| You can combine this with _restricted_ SSH and server-side
| software, so that the client being backed up to the server can
| only add new incremental backups, not delete old ones.
|
| (So, less data loss, in event of a malicious intruder on the
| client, or some very broken code on the client that gets ahold of
| the SSH private key.)
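|
| One concrete way to get that property is an authorized_keys
| forced command on the backup server; as an example of such
| server-side software, borg's append-only mode (key and path are
| placeholders, and the entry is a single line):
|
|     command="borg serve --append-only --restrict-to-path
|       /srv/backups/laptop",restrict ssh-ed25519 AAAA... laptop-key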
| pjdesno wrote:
| I've got a solution that I've used to back up machines for my
| group, but never did the last 10% to make it something plug-and-
| play for other folks: https://github.com/pjd-nu/s3-backup
|
| Full and incremental backups of a directory tree to S3 objects,
| one per backup, and access to existing backups via FUSE mount.
| With a bit more scripting (mostly automount) and maybe shifting
| some cached data from RAM to the local file system it should be
| fairly comparable to Apple Time Machine - not designed to restore
| your disk as much as to be able to access its contents at
| different points in time.
|
| If you're interested in it, feel free to drop me a note - my
| email is in my Github profile I think.
| LelouBil wrote:
| Speaking about backups, I recently set up a back up process for
| my home server including a recovery plan, and that makes me sleep
| better at night !
|
| I have Duplicati [0], which does a backup of the data of my many
| self-hosted applications every day, encrypted and stored in a
| folder on the server itself.
|
| Only the password manager backup is not encrypted by Duplicati,
| because it's encrypted using my master password, and it stores
| all the encryption keys of the other backups.
|
| Then, I have a systemd service that runs rclone [1] every day,
| after the backups finish, to sync the backup folder to:
|
| - Backblaze B2
|
| - AWS S3 Glacier Deep Archive
|
| For now I only use the free tier of B2 as I have less than a GB
| to back up, but that's because I haven't installed Nextcloud yet!
|
| However, I still like using S3 because I am paying for it (even
| though Deep Archive is very cheap), and I'm pretty sure that if
| something happens with my account, the fact that I'm a paying
| customer will prevent AWS from unilaterally removing my data (I
| have seen posts about Google accounts being closed without any
| recourse; I hope I'm protected from that with AWS).
|
| Right now I only have CalDav/CardDav, my password manager and my
| configs being backed up, but I plan to use Syncthing to also
| back up other devices to the home server, to fit inside what
| I already configured.
|
| If anyone has advice on what I did/did not do/could have done
| better please tell me !
|
| [0] https://www.duplicati.com/
|
| [1] https://rclone.org/
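|
| For anyone wanting to replicate the systemd + rclone part, it
| can look roughly like this (unit names, paths and remote names
| are hypothetical):
|
|     # /etc/systemd/system/backup-sync.service
|     [Unit]
|     Description=Sync local backups to offsite storage
|
|     [Service]
|     Type=oneshot
|     ExecStart=/usr/bin/rclone sync /srv/backups b2:my-backups
|     ExecStart=/usr/bin/rclone sync /srv/backups s3-deep:my-backups
|
|     # /etc/systemd/system/backup-sync.timer
|     [Timer]
|     OnCalendar=*-*-* 03:30:00
|     Persistent=true
|
|     [Install]
|     WantedBy=timers.target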
| UI_at_80x24 wrote:
| ZFS snapshots + send/receive are an absolute game changer in this
| regard.
|
| I have my /home in a separate dataset that gets snapshotted every
| 30 minutes. The snapshots are sent to my primary file-server, and
| can be picked up by any system on my network. I do a variation of
| this with my dotfiles similar to STOW but with quicker snapshots.
| customizable wrote:
| ZFS is a game changer for quickly and reliably backing up large
| multi-terabyte PostgreSQL databases as well. In case anyone is
| interested, here is our experience with PostgreSQL on ZFS,
| complete with a short backup script:
| https://lackofimagination.org/2022/04/our-experience-with-po...
| GekkePrutser wrote:
| Zfs send/receive is nice but it does lack the toolchain to
| easily extract individual files from a backup. It's more of a
| disaster recovery thing in terms of backup.
| customizable wrote:
| You can actually extract individual files from a snapshot by
| using the hidden .zfs directory, like:
| /mnt-point/.zfs/snapshot/snapshot-name
|
| Another alternative is to create a clone from a snapshot,
| which also makes the data writable.
| pmarreck wrote:
| Came here to say this. Can you list your example commands for
| snapshotting, zfs send, restoring single files or entire
| snapshots, etc.? (Have you tested it out?) I am actually in the
| position of doing this (I use zfs on root as of recently and I
| have a TrueNAS) but am stuck at the bootstrapping problem (I
| haven't taken a single snapshot yet; presumably the first one
| is the only big one? and then how do I send incremental
| snapshots? and then how do I restore these to, say, a new
| machine? do I remotely mount a snapshot somehow, or zfs recv,
| or? Do you set up systemd/cron jobs for this?) Also, having
| auto-snapshotted on Ubuntu in the past, eventually things
| slowed to a crawl every time I did an apt update... Is this
| avoidable?
| customizable wrote:
| Yes, the first snapshot is the big one, the rest are
| incremental. Restoring a snapshot is just one line really.
| Something like ;)
|
| sudo zfs send -cRi db/data@2022-12-08T00-00
| db/data@2022-12-09T00-00 | ssh me@backup-server "sudo zfs
| receive -vF db/data"
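|
| And the one-time initial full send that seeds the backup side is
| roughly (same hypothetical dataset names; afterwards, individual
| files can be pulled out of the hidden .zfs/snapshot directory):
|
|     sudo zfs snapshot -r db/data@2022-12-08T00-00
|     sudo zfs send -cR db/data@2022-12-08T00-00 \
|         | ssh me@backup-server "sudo zfs receive -uF db/data"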
| Whatarethese wrote:
| I use rsync to backup my iCloud Photos from my local server to a
| NAS at my parents house. Works great.
| armoredkitten wrote:
| For anyone using btrfs on their system, I heartily recommend
| btrbk, which has served me very well for making incremental
| backups with a customizable retention period:
| https://github.com/digint/btrbk
| nickersonm wrote:
| I highly recommend this as well, although I just use it for
| managing snapshots on my NAS.
|
| For backup I use hourly & daily kopia backups that are then
| rcloned to an external drive and Backblaze.
| dawnerd wrote:
| I've been using borg + rsync to a google drive and s3. Works
| great. Used it a few weeks ago for recovery and it went smoothly.
| yehia2amer wrote:
| Did anyone try https://kopia.io/docs/features/ ?
|
| It is awesome!
|
| It's very fast; usually I struggle with backup tools on Windows
| clients, but it ticks all my needs: deduplication, end-to-end
| encryption, incremental snapshots with error correction,
| mounting snapshots as a drive to use normally or to restore
| specific files/folders, and caching. The only thing that could
| be better is the GUI, but it works.
| mekster wrote:
| Backup tools are nothing until they can prove their reliability,
| which can only be proven with many years of usage.
|
| In that regard, I don't trust anything but Borg and zfs.
| yehia2amer wrote:
| zfs is not an option with Windows clients, or even most Linux
| clients. Also, this set of features is really hard to find
| elsewhere, not sure why! I am using zfs on my server, though!
| falcolas wrote:
| So, a quick trick with rsync that means you don't have to copy
| everything and then hardlink:
|
|     --link-dest=DIR   hardlink to files in DIR when unchanged
|
| Basically, you list your previous backup dir as the link-dest
| directory, and if the file hasn't changed, it will be hardlinked
| from the previous directory into the current directory. Pretty
| nice for creating time-machine style backups with one command and
| no SSH.
|
| Also works a treat with incremental logical backups of databases.
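|
| The usual pattern looks something like this (paths hypothetical):
|
|     TODAY=$(date +%F)
|     rsync -a --delete --link-dest=/backups/latest \
|         /home/user/ /backups/"$TODAY"/
|     ln -sfn /backups/"$TODAY" /backups/latest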
| amelius wrote:
| This is good to know, I used an extra "cp -rl" step in my
| previous scripts.
| falcolas wrote:
| One thing of note - the file is not transferred, so backups
| happen faster and consume less bandwidth (important if your
| target is not network-local to you).
| rsync wrote:
| Yes - they accomplish the same thing.
|
| --link-dest is just an elegant, built-in way to create
| "hardlink snapshots" the same way that 'cp -al' always did.
|
| But note:
|
| A changed file - even the smallest of changes - breaks the
| link and causes you to consume (size of file) more space
| cascading through your snapshots. Depending on your file
| sizes and change frequency this can get rather expensive.
|
| We now recommend abandoning hardlink snapshots altogether and
| doing a "dumb mirror" rsync to your rsync.net account - with
| no retention or versioning - and letting the ZFS snapshots
| create your retention.
|
| As opposed to hardlink snapshots, ZFS snapshots diff on a
| block level, not a file level - so you can change some blocks
| of a file and not use (that entire file) more space. It can
| be much more efficient, depending on file sizes.
|
| The other big benefit is that ZFS snapshots are
| immutable/read-only so if your backup source is compromised,
| Mallory can't wipe out all of the offsite backups too.
| falcolas wrote:
| It also reduces the amount of data transferred, making the
| backup faster.
|
| > We now recommend
|
| Who's we?
| jwiz wrote:
| The poster to whom you replied is affiliated with
| rsync.net, a popular backup service.
| ndsipa_pomu wrote:
| I'm using BackupPC https://backuppc.github.io/backuppc/ to do
| these kinds of backups. It does all the deduplication so the
| total storage is smaller than you'd expect for multiple machines
| with lots of identical files.
| ysopex wrote:
| https://github.com/bcpierce00/unison
|
| I use this to keep a few machines synced up. Including a machine
| that does proper daily backups.
| litoE wrote:
| All my backups go, via rsync, to a dedicated backup server
| running Linux with a large hard disk. But I still lose sleep:
| what if someone hacks into my home network and encrypts the file
| systems, including the backup server? Other than taking the
| backup server offline, I don't see how I can protect myself from
| a full-blown intrusion. Any ideas?
| jerezzprime wrote:
| What about more copies? Have a copy or two in cloud storage,
| across providers. This protects against other failure modes
| too, like a house fire or theft.
| greggyb wrote:
| ZFS snapshots are immutable, rendering them quite resilient to
| encryption attacks. This may alleviate some of your concern.
| saltcured wrote:
| There's no perfect answer, since different approaches to this
| will introduce more complexity and inconvenience at the same
| time they block some of these threats. You need to consider
| which kinds of loss/disaster you are trying to mitigate. An
| overly complex solution introduces new kinds of failure you
| didn't have before.
|
| As others mention, backup needs more than replication. You
| recover from a ransomware attack or other data-destruction
| event by using point-in-time recovery to restore good data that
| was backed up prior to the event. You need a sufficient
| retention period for older backups depending on how long it
| might take you to recognize a data loss event and perform
| recovery. A mere replica is useless since it does not retain
| those older copies. With retention, your worry is how to
| prevent the compromised machines from damaging the older time
| points in the backup archive.
|
| The traditional method was offline tape backups, so the earlier
| time points are physically secure. They can only be destroyed
| if someone goes to the storage and tampers with the tapes.
| There is no way for the compromised system to automatically
| access earlier backups. You cannot automate this because that
| likely makes it an online archive again. A similar technique in
| a personal setting might be backing up to removable flash
| drives and physically rotating these to have offline drives.
| But, the inconvenience means you lose protection if you forget
| to perform the periodic physical rituals.
|
| With the sort of rsync over ssh mechanism you are describing,
| one way to reduce the risk a little bit is to make a highly
| trusted and secured server and _pull_ backups from specific
| machines instead of _pushing_. This is under the assumption
| that your desktops and whatnot are more likely to be hacked and
| subverted. Have a keypair on the server that is authorized to
| connect and pull data from the more vulnerable machines. The
| various machines do not get a key authorized to connect to the
| server and manipulate storage. However, this depends on a
| belief that the rsync+ssh protocol is secure against a
| compromised peer. I'm not sure if this is really true over the
| long term.
|
| A modern approach is to try to use an object store like S3 with
| careful setup of data retention policies and/or access
| policies. If you can trust the operating model, you can give an
| automated backup tool the permission to write new snapshots
| without being allowed to delete or modify older snapshots. The
| restic tool mentioned elsewhere has been designed with this in
| mind. It effectively builds a content-addressable store of file
| content (for deduplication) and snapshots as a description of
| how to compose the contents into a full backup. Building a new
| snapshot is adding new content objects and snapshot objects to
| the archive. This process does not need permission to delete or
| replace existing objects in the archive. Other management tools
| would need higher privilege to do cleanup maintenance of the
| archive, e.g. to delete older snapshots or garbage collect when
| some of the archived content is no longer used by any of the
| snapshots.
|
| The new risk with these approaches like restic on s3 or some
| ZFS snapshot archive with deduplicative storage is that the
| tooling itself could fail and prevent you from reconstructing
| your snapshot during recovery. It is significantly more complex
| than a traditional file system or tape archive. But, it
| provides a much more convenient abstraction if you can trust
| it. A very risk-averse and resource rich operator might use
| redundant backup methods with different architectures, so that
| there is a backup for when their backup system fails!
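|
| A sketch of that pull-based direction (hostnames, paths, key and
| dataset names are hypothetical):
|
|     # on the trusted backup server, e.g. from cron: pull with a
|     # key that exists only on the server, then take a snapshot
|     rsync -a --delete -e "ssh -i /root/.ssh/pull_key" \
|         user@desktop:/home/user/ /tank/backups/desktop/
|     zfs snapshot tank/backups/desktop@$(date +%F)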
| europeanguy wrote:
| This looks like work. Just get synching and stop complicating
| your life.
| greensoap wrote:
| I recommend BackupPC for these requirements. Pooling, no software
| install required on the clients, uses rsync, dedupes across
| clients, and uses client-side hashing to avoid sending files
| already in the pool.
|
| https://backuppc.github.io/backuppc/index.html
| ranting-moth wrote:
| I used to do similar things until I met Borg Backup. I highly
| recommend it.
| photochemsyn wrote:
| I've been using command-line git for data backup to a RPi over
| SSH, once it's set up it's pretty easy to stay on top of, and
| then every once in a while rsync both the local storage and
| the RPi to separate USB drives. Also, every 3-6 months or so,
| rsync everything to a new USB drive and set it aside so that
| something like a system-wide ransomware attack doesn't corrupt
| all the backups.
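|
| A minimal sketch of that kind of setup (host and paths are
| hypothetical):
|
|     # one-time, on the Pi: create a bare repository to push into
|     ssh pi@raspberrypi 'git init --bare /srv/backup/notes.git'
|
|     # on each machine: add it as a remote and push
|     git remote add pi pi@raspberrypi:/srv/backup/notes.git
|     git push pi main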
___________________________________________________________________
(page generated 2022-12-09 23:00 UTC)