[HN Gopher] An upside-down backup strategy
___________________________________________________________________
An upside-down backup strategy
Author : IvyMike
Score : 89 points
Date : 2022-09-27 17:47 UTC (1 day ago)
(HTM) web link (ivymike.dev)
(TXT) w3m dump (ivymike.dev)
| nicoburns wrote:
| > I still want backups. But instead of "backing up files in the
| cloud", I back them up locally, by redownloading them to a local
| archival drive. This is upside-down of most people's backup
| strategy, but it really is quite nice once you get it set up.
|
| Is it? I thought that was what everyone did.
| outworlder wrote:
| 'Everyone' normally syncs locally, then uploads to the cloud
| as 'offsite' storage.
| ramses0 wrote:
| I'm in a slightly similar place to the author, but would love
| some other backup nerds to throw in some advice.
|
| I'm aware of the traditional "requirements" for backup (e.g.
| attribute/metadata preservation, cross-platform support, restore
| testing, etc.), but in modern usage I've discovered a split (and
| lean towards "cloud first").
|
| It boils down to "active" files vs. "passive" data. I found this
| when testing how rclone preserves executable bits while trying
| to back up "active" (`chmod +x`) git working directories. It was
| a mess: all shell scripts lost their permissions. Especially
| frustrating because backing up Git working directories is kind
| of useless, but also very useful for those times when something
| goes wrong.
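|
| One workaround sketch (paths here are made up, not what I
| actually run) is to tar the "active" trees first so the mode
| bits survive, and only hand rclone the archive:
|
|     # permissions are stored inside the tarball, so the
|     # remote's metadata handling no longer matters
|     tar czf worktrees.tar.gz -C ~/src .
|     rclone copy worktrees.tar.gz remote:backups/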
|
| However, when backing up my MP3 rips or family videos (or large
| video editing projects), I really just need passive blobs, not
| bit+attribute-exact restores, especially when the local computer
| is effectively a moderately thin client in front of an exact
| duplicate of all possible data at a cloud provider (e.g.
| rsync.net in my case).
|
| Also, there are tiers of cost and accessibility that most modern
| backup software doesn't manage well.
|
| I've settled on rsync.net for bulk "warm" access, plus Syncthing
| across a few local Raspberry Pis and all local computers/phones
| as kind of a "hot" replacement for the "Documents" directory.
|
| I had to do a moderately complete backup with "restic" from one
| local HD to another on my desktop and that worked well for being
| able to mount snapshots and pull files out, but it all feels like
| a lot of overhead compared to what I'm really shooting for.
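|
| (For reference, the snapshot-mounting flow was roughly the
| following; the repo path is made up:
|
|     restic -r /mnt/backup2/restic-repo mount /mnt/restic
|     # snapshots then appear under /mnt/restic/snapshots/
|
| and you just cp files back out.)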
|
| A further problem is the general unreliability of FUSE/sshfs-
| mounted filesystems: "slow" file access is quite acceptable for
| some things, but unreliable or hanging access to individual
| files gets real old real quick.
|
| So it boils down to this: the far end of the backup "cone" is
| S3-Glacier-ish (cheap-ish), mostly blob data with no extended
| attributes needed. Mid-tier is rsync/warm: mostly blob-ish data
| shared between multiple systems, along with some kind of per-
| device system-restore (or home-dir-restore) capability. Near/hot
| is Syncthing, plus a local NAS of everything if possible. All
| modulo local device/hardware management, and trying to minimize
| administrative overhead, especially for casual/non-primary users
| of the same system.
|
| It doesn't feel like nirvana; am I missing any gaps in my
| thinking here? Does anyone else have good experiences or
| suggestions for something comprehensive between "do nothing, let
| the cloud sort it out" and "be prepared, have your own local and
| remote backup/transfer/restore processes"?
| bombcar wrote:
| One thing I'd add is to make sure you're snapshotting via ZFS or
| something similar, because the most likely cause of data loss is
| accidental deletion, and if your synced copies are perfect
| replicas, they'll replicate the deletion, too.
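|
| A minimal sketch (pool/dataset names assumed):
|
|     # take a dated snapshot of the backup dataset each night
|     zfs snapshot tank/backups@$(date +%F)
|     # a synced-over deletion stays recoverable afterwards:
|     ls /tank/backups/.zfs/snapshot/2022-09-27/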
| rsync wrote:
| If I could speak directly to the OP I would recommend flipping
| back to a "right-side-up" model: do dumb, mirrored, 1:1 backups
| to rsync.net, then configure a day/week/month schedule of ZFS
| snapshots on that end.
|
| Those ZFS snapshots are immutable/read-only, so they not only
| provide retention, they also protect against Mallory (and
| ransomware, malware, etc.)
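|
| The "dumb mirror" half is a one-liner (host and paths here are
| placeholders):
|
|     rsync -az --delete /home/you/ user@usw-s001.rsync.net:mirror/
|
| and the snapshot schedule on our end handles the retention.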
| secabeen wrote:
| The most careful of my cloud-first users maintain local
| snapshotted NAS copies of their cloud data.
| Eun wrote:
| For Google I just use Google's Takeout[1] on a regular basis.
|
| [1]: https://takeout.google.com/settings/takeout
| ComodoHacker wrote:
| But do you verify your backups?
| ahupp wrote:
| Note that the Google Photos API which rclone uses for gphotos
| sources does not allow downloading original-quality images (only
| re-compressed versions), and it strips EXIF. So I personally
| don't use it for backup purposes; instead I:
|
| 1) Sync my Google Drive to a local Synology NAS
|
| 2) Periodically request a Google Takeout, which dumps the entire
| contents of my Google account (including original photos) into
| Drive.
|
| 3) A nightly script unzips the takeout archives so they can be
| picked up by an incremental backup to Backblaze B2. The NAS also
| does local snapshots.
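|
| Step 3 is roughly the following (paths and bucket name here are
| illustrative, not the actual script):
|
|     #!/bin/sh
|     set -eu
|     TAKEOUT=/volume1/drive/Takeout     # where Drive sync lands
|     ARCHIVE=/volume1/backup/takeout    # unpacked, stable tree
|     for z in "$TAKEOUT"/takeout-*.zip; do
|         [ -e "$z" ] || continue
|         unzip -n "$z" -d "$ARCHIVE"    # -n: never overwrite
|     done
|     # stand-in for the incremental B2 backup:
|     rclone sync "$ARCHIVE" b2:my-takeout-bucket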
|
| This has some nice properties:
|
| - There are 3 distinct copies
|
| - If a file is inadvertently deleted I can restore from either
| local snapshot or B2.
|
| - If my Google account is nuked I still have an archive of mail,
| photos, etc., though that's limited to the frequency of my
| Takeout requests.
|
| - Since the NAS is the single collection point for backups, it's
| a natural place to put non-cloud files (or Dropbox, etc.) and
| have them automatically picked up for backup without setting up
| anything new.
| anderspitman wrote:
| You can get full-quality photos if you're a paying Google
| customer, right?
| ahupp wrote:
| It's an API limitation afaik, not tied to whether you pay.
| marwis wrote:
| Sadly, Photos Takeout breaks if you have too many files. Then
| you have to set up multiple partial Takeouts (each with some
| subset of albums), which is a huge pain.
|
| I used to use OneDrive as a second backup, but Microsoft started
| doing the same crap (stripping EXIF).
| alphabettsy wrote:
| I prefer local-first because I want versioned backups and that
| seems harder with cloud-first.
|
| I've had complete data loss but accidentally deleting something
| is more common.
|
| Currently using both Backblaze and Arq Backup for Mac/Windows,
| Restic on Linux.
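|
| On the Linux side that's essentially (repo and paths assumed):
|
|     restic -r b2:my-bucket:restic backup ~/Documents
|     # the versioning comes from the retention policy:
|     restic -r b2:my-bucket:restic forget \
|         --keep-daily 7 --keep-weekly 4 --keep-monthly 12 --prune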
| proactivesvcs wrote:
| Synchronisation is not backup. Almost every restore that I've
| performed for myself and others resulted from accidental
| deletion, and sync propagates accidental deletion. It also
| doesn't make your data resilient to malware, bit rot, or removal
| of data for a TOS violation.
| m463 wrote:
| > Synchronization is not backup.
|
| also RAID is not backup. :)
|
| also related: https://www.jwz.org/doc/backups.html
| zeagle wrote:
| Agree. It can be, but you need to insert version control in
| there to protect against this, whether via a software package,
| iterative backups, or the filesystem.
| GordonS wrote:
| > Sync propagates accidental deletion
|
| This is true, but it's worth noting that some sync products
| (e.g. Seafile) can keep a history of changes, allowing you to
| restore deleted files.
|
| Sync is not backup though, as you allude to.
| jermaustin1 wrote:
| This is what I do. I sync to my NAS, and my NAS syncs to
| Backblaze B2 with file versioning. My current monthly bill is
| only $4... I cannot believe how cheap Backblaze has been.
|
| I can even get the B2 URL and share a file anytime I want
| from inside my NAS.
|
| Is it as easy as Dropbox? No, but I haven't missed using
| Dropbox. I just mount my NAS in my RDP sessions when I need
| files "synced" between my various computers.
| Groxx wrote:
| I've been eyeballing this kind of setup lately, because
| Dropbox has become so hostile to simple use. Hours of delay
| between "changed small text file" and "synced", endless
| "bugs" that push things to cloud-first storage rather than
| the synced folders I have set up, and using more and more
| resources with every upgrade. I'm so sick of their
| business-oriented shift.
|
| </rant>
|
| Got a favorite NAS that works well? Without the risk of the
| vendor's cloud deciding to delete your data? I haven't yet
| picked one, beyond "what if I just used a Pi and some USB
| drives..."
| anderspitman wrote:
| Check out Syncthing and rclone if you're not already aware
| of them.
| jermaustin1 wrote:
| I use Synology. It's not the best, but it Just Works. I got the
| "shell" for only $120, plopped in two 8TB spinners, and haven't
| really touched it other than moving house 3 times.
|
| If I was going to do it over again, I would get one of their
| commercial-grade NASes. They are a bit more expensive, but they
| have a decent CPU and RAM in them and can run
| VMs/containers/etc. and be your own little cloud.
|
| They also have some cool features, like the ability to link two
| remote Synology NASes together and sync back and forth, as well
| as backing up to any backup service you want.
| outworlder wrote:
| Not the person you are asking, but I got some leftover
| computer parts that I had been amassing for years, bought
| a few drives and added TrueNAS to it.
|
| It can do anything the big consumer NASes can, and a lot more.
| And if it fails, it's all off-the-shelf components.
|
| The only thing that could have been done better is ECC memory
| (for some people that's a dealbreaker). In that case, one can
| get a used server from eBay for a pittance that will fit the
| bill.
| MikusR wrote:
| Syncthing (and probably other sync utilities) allows versioning
| and keeps deleted files in a separate directory.
| proactivesvcs wrote:
| Yup, and those are great for noticing, before you go to lunch,
| that you deleted a file accidentally. But for anything more than
| very simple, recent restoration, versioning is not a backup. It
| can mangle file names, metadata, and folder structure, and it
| can be a nightmare to perform larger restores with. Versions are
| a convenience.
| [deleted]
| Jeff_Brown wrote:
| I use Borg and rclone together, to cover both aspects. One
| unexpectedly nice thing about that is that a Borg repo consists
| of far fewer files than the thing it backs up, so Google is
| unlikely to rate-limit you as described in the article.
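|
| Roughly (repo path and remote name are placeholders):
|
|     borg create --stats /backups/borg::'{hostname}-{now}' ~/stuff
|     borg prune --keep-daily 7 --keep-weekly 4 /backups/borg
|     # the repo is a handful of large segment files, so:
|     rclone sync /backups/borg gdrive:borg-repo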
| schainks wrote:
| Is there any documentation on how you did this? I'm very
| interested ^_^
| proactivesvcs wrote:
| Agreed - using a sync service can help bring in data to be
| backed up, or make copies of a backup. The advantage you
| mentioned is one of many that we get from using a backup
| system for backups.
| teddyh wrote:
| "Just as we respect and care for our ancestors, so we must
| respect and care for our old backups, for one day they may
| achieve great glory."
|
| -- http://www.taobackup.com/history.html
| proactivesvcs wrote:
| LOL! Sage and hilarity in one small story. Reading through
| the rest now :-)
| layer8 wrote:
| It is if it supports history/versioning.
| rsync wrote:
| If you'd like to start using rclone, and you should, this is a
| high quality _and general_ howto:
|
| https://rsync.net/resources/howto/rclone.html
|
| The example is S3 <--> rsync.net (naturally) but the step by step
| instructions are applicable to setting up any combination of
| "remotes".
|
| rclone is like youtube-dl - a powerful tool that seems almost
| magical.
| gregsadetsky wrote:
| I would love to be able to cron rclone (e.g. have it run
| automatically) on rsync.net :-)
|
| - a happy rsync.net customer
| rsync wrote:
| We can do this. Just email.
| caseyohara wrote:
| rclone is indeed magical. I recently migrated a customer's
| business from Box to Dropbox, using rsync.net as an
| intermediary to save my own bandwidth. rclone and rsync.net
| work very well together.
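|
| The shape of it was two syncs run on the rsync.net side, so the
| bytes never touched my connection (remote names here are
| placeholders):
|
|     rclone sync box: staging/ --progress
|     rclone sync staging/ dropbox: --progress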
| rsync wrote:
| Can you email info@ ?
|
| I would love to write down your recipes/commands and put them
| up so others can see them - box -> rsync.net -> dropbox is an
| interesting workflow ...
| caseyohara wrote:
| Absolutely.
| akerl_ wrote:
| I have something akin to this. For the vast majority of my data,
| a cloud platform is the authoritative storage: Gmail, Google
| Photos, GitHub, etc. I have a local NAS that does streaming
| backups of those systems to give me a persistent local copy, and
| then the NAS also backs up the totality of its local backups to
| an S3 bucket whose policy enforces deletion protection.
| wahnfrieden wrote:
| Including overwrite protection? No admin access to those
| policies from the server?
| akerl_ wrote:
| The NAS's user has s3:ListBucket, s3:PutObject, and
| s3:DeleteObject. The bucket has versioning enabled, and
| DeleteObject can't remove prior versions. So the NAS can delete
| what's immediately visible in the bucket, but it can't
| permanently delete things.
|
| The other way to set this up is to configure Object Lock on
| your S3 bucket:
| https://docs.aws.amazon.com/AmazonS3/latest/userguide/object...
|
| The upside of versioning over Object Lock, for my use case,
| is that the backup scripts can be very simple, because they
| don't have to deal with what happens if they want to clean up
| a file but don't have permissions to. They just do their
| thing, and I'm confident that old versions are retained. The
| downside of this approach is that my S3 usage will increase
| over time, because I'm retaining all old content. So
| eventually it'll cost enough for me to decide to either
| switch to Object Lock or figure out a safe way to prune old
| content.
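|
| For reference, the versioning half is a single call (bucket name
| assumed):
|
|     aws s3api put-bucket-versioning \
|         --bucket my-nas-backups \
|         --versioning-configuration Status=Enabled
|
| After that, a plain DeleteObject just adds a delete marker;
| actually purging old versions would also need
| s3:DeleteObjectVersion, which the NAS's user doesn't have.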
| TOGoS wrote:
| Another solution: stuff everything that fits into Git repos,
| fetch --all from each of your different computers regularly.
|
| For things that don't fit nicely into Git (e.g. large collections
| of large binary files), I use a homegrown git-like system for
| storing filesystem snapshots:
| http://github.com/TOGoS/ContentCouch
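|
| The fetch side is a cron-able one-liner (mirror layout assumed):
|
|     for r in ~/mirrors/*.git; do
|         git -C "$r" fetch --all --prune
|     done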
| zeagle wrote:
| It's a good way of doing it: near-unlimited bandwidth, and if a
| machine or drive dies you just replace it with no data loss. I
| save everything important, encrypted, to Seafile on my VPS, and
| my NAS pulls nightly backups back home. Software/family just
| see it as a network drive.
___________________________________________________________________
(page generated 2022-09-28 23:01 UTC)