[HN Gopher] The various scripts I use to back up my home compute...
___________________________________________________________________
The various scripts I use to back up my home computers using SSH
and rsync
Author : tosh
Score : 138 points
Date : 2022-12-09 15:27 UTC (7 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| smm11 wrote:
| I gave up on this at home long ago, and just use Onedrive for
| everything. I don't even have "local" files. My stuff is there,
| and in the event my computer won't start up I lose what's open in
| the browser. I can handle that.
|
| At work I use Windows backup to write to empty SMB-mounted drives
| nightly, then write those daily to another drive on an offline
| Fedora box.
|
| My super critical files are on an encrypted SD card I sometimes
| put in my phone when cellular connection is off, and this is
| periodically backed up to Glacier. The phone (Galaxy) runs Dex
| and can be my computer when needed to work with these files.
| pmontra wrote:
| His backup rotation algorithm is very close to what rsnapshot
| does.
|
| https://rsnapshot.org/
| NelsonMinar wrote:
| I use rsnapshot still! It feels very old fashioned but it works
| reliably and is easy to understand.
| mekster wrote:
| It's good to keep multiple backups with different
| implementations local and remote.
|
| Rsnapshot is hard to break because it relies on very basic
| principles: plain files on the file system and hard links. If
| your file system isn't zfs, I think it's a viable backup
| strategy for the local copy, while you use other tools for
| remote backups.
| PopAlongKid wrote:
| >I don't use Windows at the moment and don't really mount network
| drives, either. That might be a good alternative to consider.
|
| Regarding Windows:
|
| I have successfully mirrored a notebook and a desktop[0] (single
| user) with Windows using _robocopy_ , which is a utility that
| comes with Windows (used to be part of the Resource Kit but I
| think it is now in the base product). When I say "mirror" I mean
| I can use either machine as my current workstation without any
| loss of data, as long as I run the "sync" script at each switch.
|
| I use "net use" to temporarily mount a few critical drives on the
| local network, then _robocopy_ does its work. It has maybe 85% of
| the same functionality as rsync (which I also used extensively
| when administering corporate servers and workstations). Back in
| the DOS days, I wrote my own very simple version of the same
| thing using C, but when _robocopy_ came along I was glad to stop
| maintaining my own effort.
|
| [0]or two desktops, using removable high-capacity media like
| Iomega zip drives.
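|
| A minimal sketch of that sync step, with hypothetical drive
| letters and share names (robocopy's /MIR flag mirrors deletions
| too, so test against scratch data first):
|
|     net use B: \\desktop\users /persistent:no
|     robocopy C:\Users\me B:\me /MIR /COPY:DAT /R:2 /W:5
|     net use B: /delete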
| gary_0 wrote:
| I use MSYS2 on Windows in order to run regular rsync and other
| such utilities. It's served me very well for years. I also have
| some bash scripts that I can conveniently run on either Linux
| or Windows via MSYS2.
| EvanAnderson wrote:
| Robocopy is very nice but has no delta compression
| functionality. For things like file server migrations (where I
| want to preserve ACLs, times, etc) robocopy is my go-to tool.
|
| I've used the cwRsync[0] binary distribution of rsync on
| Windows for backups. I found it worked very well for simple
| file backups. I never did get around to trying to combine it
| with Volume Shadow Copy to make consistent backups of the
| registry and applications like Microsoft SQL Server. (I
| wouldn't expect to get a bootable restore from such a backup,
| though.)
|
| [0] https://www.itefix.net/cwrsync
| rzzzt wrote:
| I used QtdSync, another frontend backed by a Windows rsync
| binary. A nice feature was that it supported the "duplicate
| entire target folder with hard links, then overwrite changes
| only"-style on NTFS volumes, so I could have lots of
| browseable point-in-time backup folders without consuming
| extra disk space:
| https://www.qtdtools.de/page.php?tool=0&sub=1&lang=en
| wereallterrrist wrote:
| I find it very, very hard to go wrong with Syncthing (for stuff I
| truly need replicated, code/photos/text-records) and ZFS +
| znapzend + rsync.net (automatic snapshots of `/home` and
| `/var/lib` on servers).
|
| The only thing missing is -> I'd like to stop syncing code with
| Syncthing and instead build some smarter daemon. The daemon would
| take a manifest of repositories, each with a mapping of
| worktrees->branches to be actualized and fsmonitored. The daemon
| would auto-commit changes on those worktrees into a shadow branch
| and push/pull it. Ideally this could leverage (the very amazing,
| you must try it) `jj` for continuous committing of the working
| copy and (in the future, with the native jj format) even handle the
| likely-never-to-happen conflict scenario. (I'd happily
| collaborate on a Rust impl and/or donate funds to one.)
|
| Given the number of worktrees I have of some huge repos (nixpkgs,
| linux, etc) it would likely mark a significant reduction in
| CPU/disk usage given what Syncthing is having to do now to
| monitor/rescan as much as I'm asking it to (given it has to dumb-
| sync .git, syncs gitignored content, etc, etc).
| hk1337 wrote:
| I use Syncthing between Mac, Windows (have included Linux in
| the mix at one point), and with my Synology NAS. Syncthing is
| more for my short term backup though. I will either commit it
| to a repo, save it to a Synology share, or delete it.
|
| *edit* my gitea server saves its backups to synology
| than3 wrote:
| I hate to be the one to point out the obvious, but replication
| isn't a backup. It's for resiliency, just like RAID; the two
| aren't the same.
| whalesalad wrote:
| What is the actual difference between a backup and
| replication? If the 1's and 0's are replicated to a different
| host, is that any different than "backing up" (replicating
| them) to a piece of external media?
| jjav wrote:
| > What is the actual difference between a backup and
| replication?
|
| Simplest way to think about it is that a backup must be an
| immutable snapshot in time. Any changes and deletions which
| happen after that point in time will never reflect back
| onto the backup.
|
| That way, any files you accidentally delete or corrupt (or
| other unwanted changes, like ransomware encrypting them for
| you) can be recovered by going back to the backup.
|
| Replication is very different, you intentionally want all
| ongoing changes to replicate to the multiple copies for
| availability. But it means that unwanted changes or data
| corruption happily replicates to all the copies so now all
| of them are corrupt. That's when you reach for the most
| recent backup.
|
| That's why you always need to backup and you'll usually
| want to replicate as well.
| chrishas35 wrote:
| When those 1s and 0s are deleted and that delete is
| replicated (or other catastrophic change, such as
| ransomware) you presumably don't have the ability to
| restore if all you're doing is replication. A strategy that
| layers replication + backup/versioning is the goal.
| natebc wrote:
| I'll add that _usually_ a backup strategy includes
| generational backups of some kind. That is daily, weekly,
| monthly, etc to hedge against individually impacted files
| as mentioned.
|
| Ideally there is also an offsite and inaccessible from
| the source component to this strategy. Usually this level
| of robustness isn't present in a "replication" setup.
| than3 wrote:
| Put more simply, backups account for and mitigate the
| common risks to data during storage while minimizing
| costs; ransomware is one of those common risks. It's
| organization-dependent, based on costs and available budget, so
| it varies.
|
| Long term storage usually has some form of Forward Error
| Correction (FEC) protection schemes (for bitrot), and
| often backups are segmented which may be a mix of full
| and iterative, or delta backups (to mitigate cost) with
| corresponding offline components (for ransomware
| resiliency), but that too is very dependent on the
| environment as well as the strategy being used for data
| minimization.
|
| > Usually this level of robustness isn't present in a
| "replication" setup.
|
| Exactly, and thinking about replication as a backup often
| also gives those using it a false sense of security in
| any BC/DR situations.
| NelsonMinar wrote:
| Syncthing has file versioning but I don't know for sure if
| it's suitable for backup.
| https://docs.syncthing.net/users/versioning.html
| reacharavindh wrote:
| Replication to another machine that has a COW file system
| with snapshots is backup though :-)
|
| We back up our data storage for an entire HPC cluster, about 2
| PiB of it, to a single machine with 4 disk shelves running ZFS
| with snapshots. It works very well. A simple rsync every night,
| and snapshotted.
|
| We use the backup as a sort of Time Machine should we need
| data from the past that we deleted in the primary. Plus, we
| don't need to wait for the tapes to load or anything.. it is
| pretty fast and intuitive
| jerf wrote:
| The person you're replying to said "Syncthing ... and ZFS +
| znapzend + rsync.net" though. You're ignoring the rsync.net
| part.
|
| I have something similar; it's Nextcloud + restic to AWS S3,
| but it's the same principle. You can give people the
| convenience and human-comprehensibility of sync-based
| sharing, but also back that up too, for the best of both
| worlds. Though in my case the odds of me needing "previous
| versions" of things approach zero, and a full sync is fairly
| close to a backup; even so, I do have a full solution here.
| jrm4 wrote:
| But, it makes things easy. I have e.g. a home computer, a
| server in the closet thing, a laptop and a work computer all
| with a shared Syncthing folder.
|
| So to bolster that other thing, I just have a simple bash
| script that reminds me every 7 days to make a copy of that
| folder somewhere else on that machine. It's not precise
| because I often don't know what machine I will be using, but
| that creates a natural staggering that I figure should be
| sufficient if something goes weird and I lose something; I'm
| likely to have an old copy somewhere, right?
| killingtime74 wrote:
| For code I just use a self hosted git server
| acranox wrote:
| Sparkleshare does something kind of similar. It uses git as the
| backend to automatically sync directories on a few computers.
| https://www.sparkleshare.org/
| JeremyNT wrote:
| > Given the number of worktrees I have of some huge repos
| (nixpkgs, linux, etc) it would likely mark a significant
| reduction in CPU/disk usage given what Syncthing is having to
| do now to monitor/rescan as much as I'm asking it to (given it
| has to dumb-sync .git, syncs gitignored content, etc, etc).
|
| Are you really hitting that much of a resource utilization
| issue with syncthing though? I use it on lots of small files
| and git repos and since it uses inotify there's not really much
| of a problem. I guess the worst case is switching to very
| different branches frequently, or committing very large
| (binary?) files where it may need to transfer them twice, but
| this hasn't been a problem in my own experience.
|
| I'm not sure you could really do a whole lot better than
| syncthing by being clever, and it strikes me as a lot of effort
| to optimize for a specific workflow.
|
| Edit: actually, I wonder if you could just exclude the working
| copies with a clever exclude list in syncthing, such that you'd
| ONLY grab .git so you wouldn't even need the double
| transfer/storage. You risk losing uncommitted work I suppose.
| fncivivue7 wrote:
| Sounds like you want Borg
|
| https://borgbackup.readthedocs.io/en/stable/
|
| My two 80% full 1tb laptops and 1tb desktop back up to around
| 300-400G after dedupe and compression. Currently have around
| 12tb of backups stored in that 300G.
|
| Incremental backups run in about 5 mins even against the
| spinning disks they're stored on.
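|
| For reference, a minimal sketch of that kind of setup (the
| repository path and prune policy here are hypothetical):
|
|     borg init --encryption=repokey /mnt/backup/borg-repo
|     borg create --stats --compression zstd \
|         /mnt/backup/borg-repo::'{hostname}-{now}' ~/
|     borg prune --keep-daily 7 --keep-weekly 4 --keep-monthly 12 \
|         /mnt/backup/borg-repo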
| _dain_ wrote:
| They work together. I use syncthing to keep things
| synchronized across devices, including to an always-on
| "master" device that has more storage. Then borg runs on the
| master device to create backups.
| 0cf8612b2e1e wrote:
| Python programmer here, but I actually prefer Restic [0].
| While more or less the same experience, the huge selling
| point to me is that the backup program is a single executable
| that can be easily stored alongside the backups. I do not
| want any dependency/environment issues to assert themselves
| when restoration is required (which is most likely on a
| virgin, unconfigured system).
|
| [0] https://restic.net/
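|
| A minimal sketch of that workflow, with a hypothetical
| repository path:
|
|     restic -r /mnt/backup/restic-repo init
|     restic -r /mnt/backup/restic-repo backup ~/Documents
|     restic -r /mnt/backup/restic-repo snapshots
|     restic -r /mnt/backup/restic-repo restore latest \
|         --target /tmp/restore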
| SomeoneOnTheWeb wrote:
| You can also take a look at Kopia (https://kopia.io/).
|
| I've been using Borg, Restic and Kopia for a long time and
| Kopia is my personal favorite - very fast, very efficient,
| runs in the background automatically without having to
| schedule a CRON or anything like that.
|
| Only downside is that the backups are made of a HUGE number
| of files, so when synchronizing it can sometimes take a bit
| of time to check the ~5k files.
| klodolph wrote:
| I've been using Kopia, I recommend it.
| wanderingmind wrote:
| Highly recommend Kopia that has a nice UI and can work
| with rclone (so any cloud back end)
| codethief wrote:
| I don't think GP was talking about backups (which is what
| Borg is good for) but about _synchronization_ between
| machines which is another issue entirely.
| wereallterrrist wrote:
| No, I distinctly don't want borg. It doesn't help or solve
| anything that Syncthing doesn't do. The obsession with borg
| and bup are pretty baffling to me. We deserve better in this
| space. (see: Asuran and another whose name I forget...)
|
| Critically, I'm specifically referring to code sync that
| needs to operate at a git-level to get the huge efficiencies
| I'm thinking of.
|
| Syncthing, or borg, scanning 8 copies of the Linux kernel is
| pretty horrific compared to something doing a "git commit &&
| git push" and "git pull --rebase" in the background (over-
| simplifying the shadow-branch process here for brevity.)
|
| re: 'we deserve better' -- case in point, see Asuran -
| there's no real reason that sync and backup have to be
| distinctly different tools. Given chunking and dedupe and
| append-logs, we really, really deserve better in this tooling
| space.
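|
| A very rough shell sketch of that shadow-branch idea (ref name,
| interval and remote are hypothetical; a real daemon would use
| fsmonitor/inotify instead of sleeping, and would skip pushing
| when the tree hash is unchanged):
|
|     cd ~/src/nixpkgs
|     # use a separate index so the user's real index is untouched
|     export GIT_INDEX_FILE=.git/shadow-index
|     while sleep 300; do
|         git add -A
|         tree=$(git write-tree)
|         parent=$(git rev-parse -q --verify refs/shadow/auto || true)
|         commit=$(git commit-tree ${parent:+-p $parent} \
|                  -m "auto-sync $(date -Is)" "$tree")
|         git update-ref refs/shadow/auto "$commit"
|         git push origin refs/shadow/auto || true
|     done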
| formerly_proven wrote:
| borg et al and "git commit" work in essentially the same
| way. Both scan the entire tree for changes using
| modification timestamps.
| dragonwriter wrote:
| > borg et al and "git commit" work in essentially the
| same way. Both scan the entire tree for changes using
| modification timestamps.
|
| But git commit _doesn't_ do that. If you want to do that
| in git, you typically do it before commit with "git add
| -A".
| [deleted]
| ww520 wrote:
| Yes. I just let Syncthing sync among devices, using it for
| creating copies of the backup. The daily backup scripts do
| their things and create one backup snapshot, then Syncthing
| picks up the new backup files and propagates them to multiple
| devices.
| blindriver wrote:
| I use Synology to back everything up, and then from there I use
| Hyperbackup to backup to 2 external hard drives every week. When
| the hard drives get full, I buy a new one that is larger and I
| put the old one into my closet and date it.
|
| Now that you reminded me, it might be best to buy a new larger
| hard drive if there are any pre-Christmas sales.
| kevstev wrote:
| Have you looked into backing up into the cloud? I used to do
| this way back in the day, but by using AWS I get legit offsite
| storage. It's really cheap if you use Glacier, and I was
| actually looking this week, and there is now an even cheaper
| option called Deep Archive. It costs me about $2 a month to
| store my stuff there. I just back up the irreplaceable things-
| my photos, documents, etc. All the other stuff is backed up on
| TPB or github for me.
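|
| A minimal sketch of that kind of upload (bucket name
| hypothetical; note Deep Archive retrievals need an explicit
| restore request and can take hours):
|
|     tar czf photos-2022.tar.gz ~/Photos
|     aws s3 cp photos-2022.tar.gz \
|         s3://my-archive-bucket/photos-2022.tar.gz \
|         --storage-class DEEP_ARCHIVE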
| blindriver wrote:
| I don't trust backing up to the cloud, I just do everything
| on site and hope there's nothing catastrophic!
| kkfx wrote:
| Oh, curious, it's the first backup script in Clojure I've seen :-)
|
| My personal recipe is less sophisticated:
|
| - znapzend on all home machines send to a homeserver regularly
| (with enough storage), partially replicated between
| desktops/laptop
|
| - homeserver backup itself via simple incremental zfs send +
| mbuffer with one snapshot per day (last 2 days), one per week
| (last 2 w) and one per month (last 1 month) offsite
|
| - manually triggered offline local backup of the homeserver on
| external USB drives and a physically mirrored home server,
| normally on weekly basis
|
| Nothing more, nothing less. On any major NixOS release update I
| rebuild one homeserver and a month or so later the second one.
| Desktops and homeserver custom iso are built automatically every
| Sunday and just left there (I know, it simply took too much time
| to keep checking, so...).
|
| Essentially, in case of a fault of a machine I still have data,
| config and a ready iso for a quick reinstall. In case of logical
| faults (like a direct attack that compromises my data AND zfs
| itself) there is not much protection besides different sync
| times (I do NOT use all desktops/laptops at once; when they are
| powered off they stay behind, and I normally have plenty of time
| to spot most casual potential attacks).
|
| Long story short, for anyone: when you talk about backups, talk
| about how you restore, or your backups will probably be just
| useless bits one day...
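|
| For reference, the incremental "zfs send + mbuffer" step above
| looks roughly like this (dataset names and buffer sizes are
| hypothetical):
|
|     zfs snapshot tank/home@$(date +%F)
|     zfs send -i tank/home@2022-12-08 tank/home@$(date +%F) \
|         | mbuffer -s 128k -m 1G \
|         | ssh offsite "zfs receive -F backup/home"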
| e1g wrote:
| Recent versions of rsync support zstd compression, which can
| improve speed and reduce the load on both sides. You can check if
| your rsync supports that with "rsync -h | grep zstd" and tell it
| to use it with "-z --zc=zstd".
|
| However, compression is useful in proportion to how crappy the
| network is and how compressible the content is (e.g., text
| files). This repo is about backing up user files to an external
| SSD with high bandwidth and low latency, and applying compression
| likely makes the process slower.
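|
| For example, against a network target (paths hypothetical; the
| --zc/--zl options need rsync >= 3.2 on both ends):
|
|     rsync -az --zc=zstd --zl=3 ~/Documents/ user@nas:/backup/docs/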
| greggyb wrote:
| Compression is useful even with directly attached storage
| devices. Disk IO is still slower than compression throughput
| unless you are running very fast storage.
|
| If your workload is IO-bound, then it is quite likely that
| compression will help. Most people, on their personal machines,
| would likely see IO performance "improve" with filesystem level
| compression.
| arichard123 wrote:
| I'm doing something similar but running a zfs pool off a usb dock
| and using zfs snapshot instead of hardlinks. USB is slow, but it's
| still faster than my network, so not the bottleneck.
| alchemist1e9 wrote:
| Let's not forget zbackup. Excellent useful low level tool.
| proactivesvcs wrote:
| If one uses software meant for backups, like restic, there are so
| many advantages. Independent snapshots, deduplication,
| compression, encryption, proper methods to verify the backup
| integrity and forgetting snapshots according to a more structured
| policy. Mount and read any backup by host or snapshot, multi-
| platform, single binary and one can even run its rest-server on
| the destination to allow for append-only backups. The importance
| of using the right tool for the job, for something as crucial as
| backup, cannot be overstated.
| aborsy wrote:
| ZFS send/receive is perfect, except there is almost no ZFS cloud
| storage on the receiving side. You have to set up a ZFS server
| offsite somewhere, like in a friend's house.
|
| Restic is darn good too! It has integration with many cloud
| storage providers.
| neilv wrote:
| You can combine this with _restricted_ SSH and server-side
| software, so that the client being backed up to the server can
| only add new incremental backups, not delete old ones.
|
| (So, less data loss, in event of a malicious intruder on the
| client, or some very broken code on the client that gets ahold of
| the SSH private key.)
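|
| One concrete way to get that property is an authorized_keys
| forced command on the backup server; as an example of such
| server-side software, borg's append-only mode (key and path are
| placeholders, and the entry is a single line):
|
|     command="borg serve --append-only --restrict-to-path
|       /srv/backups/laptop",restrict ssh-ed25519 AAAA... laptop-key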
| pjdesno wrote:
| I've got a solution that I've used to back up machines for my
| group, but never did the last 10% to make it something plug-and-
| play for other folks: https://github.com/pjd-nu/s3-backup
|
| Full and incremental backups of a directory tree to S3 objects,
| one per backup, and access to existing backups via FUSE mount.
| With a bit more scripting (mostly automount) and maybe shifting
| some cached data from RAM to the local file system it should be
| fairly comparable to Apple Time Machine - not designed to restore
| your disk as much as to be able to access its contents at
| different points in time.
|
| If you're interested in it, feel free to drop me a note - my
| email is in my Github profile I think.
| LelouBil wrote:
| Speaking about backups, I recently set up a back up process for
| my home server including a recovery plan, and that makes me sleep
| better at night !
|
| I have Duplicati [0], which does a backup of the data of my many
| self-hosted applications every day, encrypted and stored in a
| folder on the server itself.
|
| Only the password manager backup is not encrypted by Duplicati,
| because it's encrypted using my master password, and it stores
| all the encryption keys of the other backups.
|
| Then, I have a systemd service that runs rclone [1] every day,
| after the backups finish, to sync the backup folder to:
|
| - Backblaze B2
|
| - AWS S3 Glacier Deep Archive
|
| For now I only use the free tier of B2 as I have less than a GB
| to back up, but that's because I haven't installed Nextcloud yet!
|
| However, I still like using S3 because I am paying for it (even
| though Deep Archive is very cheap), and I'm pretty sure that if
| something happens with my account, the fact that I'm a paying
| customer will prevent AWS from unilaterally removing my data (I
| have seen posts about Google accounts being closed without any
| recourse; I hope I'm protected from that with AWS).
|
| Right now I only have CalDav/CardDav, my password manager and my
| configs being backed up, but I plan to use Syncthing to also
| back up other devices to the home server, to fit inside what
| I already configured.
|
| If anyone has advice on what I did/did not do/could have done
| better please tell me !
|
| [0] https://www.duplicati.com/
|
| [1] https://rclone.org/
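|
| For anyone wanting to replicate the systemd + rclone part, it
| can look roughly like this (unit names, paths and remote names
| are hypothetical):
|
|     # /etc/systemd/system/backup-sync.service
|     [Unit]
|     Description=Sync local backups to offsite storage
|
|     [Service]
|     Type=oneshot
|     ExecStart=/usr/bin/rclone sync /srv/backups b2:my-backups
|     ExecStart=/usr/bin/rclone sync /srv/backups s3-deep:my-backups
|
|     # /etc/systemd/system/backup-sync.timer
|     [Timer]
|     OnCalendar=*-*-* 03:30:00
|     Persistent=true
|
|     [Install]
|     WantedBy=timers.target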
| UI_at_80x24 wrote:
| ZFS snapshots + send/receive are an absolute game changer in this
| regard.
|
| I have my /home in a separate dataset that gets snapshotted every
| 30 minutes. The snapshots are sent to my primary file-server, and
| can be picked up by any system on my network. I do a variation of
| this with my dotfiles similar to STOW but with quicker snapshots.
| customizable wrote:
| ZFS is a game changer for quickly and reliably backing up large
| multi-terabyte PostgreSQL databases as well. In case anyone is
| interested, here is our experience with PostgreSQL on ZFS,
| complete with a short backup script:
| https://lackofimagination.org/2022/04/our-experience-with-po...
| GekkePrutser wrote:
| Zfs send/receive is nice but it does lack the toolchain to
| easily extract individual files from a backup. It's more of a
| disaster recovery thing in terms of backup.
| customizable wrote:
| You can actually extract individual files from a snapshot by
| using the hidden .zfs directory, like:
| /mnt-point/.zfs/snapshot/snapshot-name
|
| Another alternative is to create a clone from a snapshot,
| which also makes the data writable.
| pmarreck wrote:
| Came here to say this. Can you list your example commands for
| snapshotting, zfs send, restoring single files or entire
| snapshots, etc.? (Have you tested it out?) I am actually in the
| position of doing this (I use zfs on root as of recently and I
| have a TrueNAS) but am stuck at the bootstrapping problem (I
| haven't taken a single snapshot yet; presumably the first one
| is the only big one? and then how do I send incremental
| snapshots? and then how do I restore these to, say, a new
| machine? do I remotely mount a snapshot somehow, or zfs recv,
| or? Do you set up systemd/cron jobs for this?) Also, having
| auto-snapshotted on Ubuntu in the past, eventually things
| slowed to a crawl every time I did an apt update... Is this
| avoidable?
| customizable wrote:
| Yes, the first snapshot is the big one, the rest are
| incremental. Restoring a snapshot is just one line really.
| Something like ;)
|
| sudo zfs send -cRi db/data@2022-12-08T00-00
| db/data@2022-12-09T00-00 | ssh me@backup-server "sudo zfs
| receive -vF db/data"
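|
| And the one-time initial full send that seeds the backup side is
| roughly (same hypothetical dataset names; afterwards, individual
| files can be pulled out of the hidden .zfs/snapshot directory):
|
|     sudo zfs snapshot -r db/data@2022-12-08T00-00
|     sudo zfs send -cR db/data@2022-12-08T00-00 \
|         | ssh me@backup-server "sudo zfs receive -uF db/data"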
| Whatarethese wrote:
| I use rsync to backup my iCloud Photos from my local server to a
| NAS at my parents house. Works great.
| armoredkitten wrote:
| For anyone using btrfs on their system, I heartily recommend
| btrbk, which has served me very well for making incremental
| backups with a customizable retention period:
| https://github.com/digint/btrbk
| nickersonm wrote:
| I highly recommend this as well, although I just use it for
| managing snapshots on my NAS.
|
| For backup I use hourly & daily kopia backups that are then
| rcloned to an external drive and Backblaze.
| dawnerd wrote:
| I've been using borg + rsync to a google drive and s3. Works
| great. Used it a few weeks ago for recovery and it went smoothly.
| yehia2amer wrote:
| Did anyone try https://kopia.io/docs/features/ ?
|
| It is awesome!
|
| It's very fast; usually I struggle with backup tools on Windows
| clients, but it ticks all my needs: deduplication, end-to-end
| encryption, incremental snapshots with error correction,
| mounting snapshots as a drive to use normally or to restore
| specific files/folders, and caching. The only thing that could
| be better is the GUI, but it works.
| mekster wrote:
| Backup tools are nothing until they can prove their reliability,
| which can only be proven with many years of usage.
|
| In that regard, I don't trust anything but Borg and zfs.
| yehia2amer wrote:
| zfs is not an option with Windows clients, or even most Linux
| clients. Also, this set of features is really hard to find
| elsewhere, not sure why! I am using zfs on my server, though!
| falcolas wrote:
| So, a quick trick with rsync that means you don't have to copy
| everything and then hardlink:
|
|     --link-dest=DIR   hardlink to files in DIR when unchanged
|
| Basically, you list your previous backup dir as the link-dest
| directory, and if the file hasn't changed, it will be hardlinked
| from the previous directory into the current directory. Pretty
| nice for creating time-machine style backups with one command and
| no SSH.
|
| Also works a treat with incremental logical backups of databases.
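|
| The usual pattern looks something like this (paths hypothetical):
|
|     TODAY=$(date +%F)
|     rsync -a --delete --link-dest=/backups/latest \
|         /home/user/ /backups/"$TODAY"/
|     ln -sfn /backups/"$TODAY" /backups/latest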
| amelius wrote:
| This is good to know, I used an extra "cp -rl" step in my
| previous scripts.
| falcolas wrote:
| One thing of note - the file is not transferred, so backups
| happen faster and consume less bandwidth (important if your
| target is not network-local to you).
| rsync wrote:
| Yes - they accomplish the same thing.
|
| --link-dest is just an elegant, built-in way to create
| "hardlink snapshots" the same way that 'cp -al' always did.
|
| But note:
|
| A changed file - even the smallest of changes - breaks the
| link and causes you to consume (size of file) more space
| cascading through your snapshots. Depending on your file
| sizes and change frequency this can get rather expensive.
|
| We now recommend abandoning hardlink snapshots altogether and
| doing a "dumb mirror" rsync to your rsync.net account - with
| no retention or versioning - and letting the ZFS snapshots
| create your retention.
|
| As opposed to hardlink snapshots, ZFS snapshots diff on a
| block level, not a file level - so you can change some blocks
| of a file and not use (that entire file) more space. It can
| be much more efficient, depending on file sizes.
|
| The other big benefit is that ZFS snapshots are
| immutable/read-only so if your backup source is compromised,
| Mallory can't wipe out all of the offsite backups too.
| falcolas wrote:
| It also reduces the amount of data transferred, making the
| backup faster.
|
| > We now recommend
|
| Who's we?
| jwiz wrote:
| The poster to whom you replied is affiliated with
| rsync.net, a popular backup service.
| ndsipa_pomu wrote:
| I'm using BackupPC https://backuppc.github.io/backuppc/ to do
| these kinds of backups. It does all the deduplication so the
| total storage is smaller than you'd expect for multiple machines
| with lots of identical files.
| ysopex wrote:
| https://github.com/bcpierce00/unison
|
| I use this to keep a few machines synced up. Including a machine
| that does proper daily backups.
| litoE wrote:
| All my backups go, via rsync, to a dedicated backup server
| running Linux with a large hard disk. But I still lose sleep:
| what if someone hacks into my home network and encrypts the file
| systems, including the backup server? Other than taking the
| backup server offline, I don't see how I can protect myself from
| a full-blown intrusion. Any ideas?
| jerezzprime wrote:
| What about more copies? Have a copy or two in cloud storage,
| across providers. This protects against other failure modes
| too, like a house fire or theft.
| greggyb wrote:
| ZFS snapshots are immutable, rendering them quite resilient to
| encryption attacks. This may alleviate some of your concern.
| saltcured wrote:
| There's no perfect answer, since different approaches to this
| will introduce more complexity and inconvenience at the same
| time they block some of these threats. You need to consider
| which kinds of loss/disaster you are trying to mitigate. An
| overly complex solution introduces new kinds of failure you
| didn't have before.
|
| As others mention, backup needs more than replication. You
| recover from a ransomware attack or other data-destruction
| event by using point-in-time recovery to restore good data that
| was backed up prior to the event. You need a sufficient
| retention period for older backups depending on how long it
| might take you to recognize a data loss event and perform
| recovery. A mere replica is useless since it does not retain
| those older copies. With retention, your worry is how to
| prevent the compromised machines from damaging the older time
| points in the backup archive.
|
| The traditional method was offline tape backups, so the earlier
| time points are physically secure. They can only be destroyed
| if someone goes to the storage and tampers with the tapes.
| There is no way for the compromised system to automatically
| access earlier backups. You cannot automate this because that
| likely makes it an online archive again. A similar technique in
| a personal setting might be backing up to removable flash
| drives and physically rotating these to have offline drives.
| But, the inconvenience means you lose protection if you forget
| to perform the periodic physical rituals.
|
| With the sort of rsync over ssh mechanism you are describing,
| one way to reduce the risk a little bit is to make a highly
| trusted and secured server and _pull_ backups from specific
| machines instead of _pushing_. This is under the assumption
| that your desktops and whatnot are more likely to be hacked and
| subverted. Have a keypair on the server that is authorized to
| connect and pull data from the more vulnerable machines. The
| various machines do not get a key authorized to connect to the
| server and manipulate storage. However, this depends on a
| belief that the rsync+ssh protocol is secure against a
| compromised peer. I'm not sure if this is really true over the
| long term.
|
| A modern approach is to try to use an object store like S3 with
| careful setup of data retention policies and/or access
| policies. If you can trust the operating model, you can give an
| automated backup tool the permission to write new snapshots
| without being allowed to delete or modify older snapshots. The
| restic tool mentioned elsewhere has been designed with this in
| mind. It effectively builds a content-addressable store of file
| content (for deduplication) and snapshots as a description of
| how to compose the contents into a full backup. Building a new
| snapshot is adding new content objects and snapshot objects to
| the archive. This process does not need permission to delete or
| replace existing objects in the archive. Other management tools
| would need higher privilege to do cleanup maintenance of the
| archive, e.g. to delete older snapshots or garbage collect when
| some of the archived content is no longer used by any of the
| snapshots.
|
| The new risk with these approaches like restic on s3 or some
| ZFS snapshot archive with deduplicative storage is that the
| tooling itself could fail and prevent you from reconstructing
| your snapshot during recovery. It is significantly more complex
| than a traditional file system or tape archive. But, it
| provides a much more convenient abstraction if you can trust
| it. A very risk-averse and resource rich operator might use
| redundant backup methods with different architectures, so that
| there is a backup for when their backup system fails!
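|
| A sketch of that pull-based direction (hostnames, paths, key and
| dataset names are hypothetical):
|
|     # on the trusted backup server, e.g. from cron: pull with a
|     # key that exists only on the server, then take a snapshot
|     rsync -a --delete -e "ssh -i /root/.ssh/pull_key" \
|         user@desktop:/home/user/ /tank/backups/desktop/
|     zfs snapshot tank/backups/desktop@$(date +%F)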
| europeanguy wrote:
| This looks like work. Just get synching and stop complicating
| your life.
| greensoap wrote:
| I recommend BackupPC for these requirements. Pooling, no software
| install required on the clients, uses rsync, dedupes across
| clients, and uses client-side hashing to avoid sending files
| already in the pool.
|
| https://backuppc.github.io/backuppc/index.html
| ranting-moth wrote:
| I used to do similar things until I met Borg Backup. I highly
| recommend it.
| photochemsyn wrote:
| I've been using command-line git for data backup to a RPi over
| SSH, once it's set up it's pretty easy to stay on top of, and
| then every once in a while rsync both the local storage and
| the RPi to separate USB drives. Also, every 3-6 months or so,
| rsync everything to a new USB drive and set it aside so that
| something like a system-wide ransomware attack doesn't corrupt
| all the backups.
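|
| A minimal sketch of that kind of setup (host and paths are
| hypothetical):
|
|     # one-time, on the Pi: create a bare repository to push into
|     ssh pi@raspberrypi 'git init --bare /srv/backup/notes.git'
|
|     # on each machine: add it as a remote and push
|     git remote add pi pi@raspberrypi:/srv/backup/notes.git
|     git push pi main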
___________________________________________________________________
(page generated 2022-12-09 23:00 UTC)