[HN Gopher] An upside-down backup strategy
       ___________________________________________________________________
        
       An upside-down backup strategy
        
       Author : IvyMike
       Score  : 89 points
       Date   : 2022-09-27 17:47 UTC (1 day ago)
        
 (HTM) web link (ivymike.dev)
 (TXT) w3m dump (ivymike.dev)
        
       | nicoburns wrote:
       | > I still want backups. But instead of "backing up files in the
       | cloud", I back them up locally, by redownloading them to a local
       | archival drive. This is upside-down of most people's backup
       | strategy, but it really is quite nice once you get it set up.
       | 
       | Is it? I thought that was what everyone did.
        
         | outworlder wrote:
         | 'Everyone' normally syncs locally, then uploads to the cloud,
         | as 'offsite' storage.
        
       | ramses0 wrote:
       | I'm in a slightly similar place to the author, but would love
       | some other backup nerds to throw in some advice.
       | 
       | I'm aware of the traditional "requirements" for backup (eg:
       | attribute/metadata, cross platform, restore testing, etc), but in
       | modern usage, I've discovered a split (and lean towards "cloud
       | first").
       | 
        | It boils down to "active" files vs. "passive" data. I discovered
        | this when testing rclone's handling of executable bits while
        | trying to back up "active" (`chmod +x`) git working directories.
        | It was a mess: every shell script lost its permissions. That was
        | especially frustrating because backing up git working directories
        | is mostly useless, yet very useful for those times when something
        | goes wrong. (One workaround is sketched below.)
       | 
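        | A minimal sketch of that workaround (hypothetical paths;
        | "remote:" stands for whatever rclone remote you use): tar the
        | working tree first, so the executable bits ride inside the blob
        | and rclone never has to preserve them.
        | 
        |     # permissions and metadata survive inside the tarball
        |     tar czf project.tar.gz -C ~/src project
        |     # ship the opaque blob; "remote:" is a placeholder
        |     rclone copy project.tar.gz remote:archive/
        | 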
        | However, when backing up my MP3 rips or family videos (or large
        | video editing projects), I really just need passive blobs, not
        | bit-and-attribute-exact restores, especially since the local
        | computer is effectively a moderately thin client next to the
        | exact duplicate of everything I've backed up to a cloud provider
        | (rsync.net, in my case).
       | 
        | Also, there are tiers of cost and accessibility that aren't
        | necessarily managed well by most modern backup software.
       | 
        | I've settled on rsync.net for bulk "warm" access, plus Syncthing
        | across a few local Raspberry Pis and all local computers/phones
        | as a kind of "hot" replacement for the "Documents" directory.
       | 
        | I had to do a moderately complete backup with restic from one
        | local HD to another on my desktop, and it worked well: I could
        | mount snapshots and pull files out. But it all feels like a lot
        | of overhead compared to what I'm really shooting for. (The flow
        | is sketched below.)
       | 
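        | A minimal sketch of that restic flow (the repo and mount paths
        | are hypothetical):
        | 
        |     # one-time repo setup on the destination drive
        |     restic -r /mnt/backup/repo init
        |     # back up one drive's contents into the repo
        |     restic -r /mnt/backup/repo backup /mnt/data
        |     # browse snapshots as a filesystem and pull files out
        |     restic -r /mnt/backup/repo mount /mnt/restic
        | 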
        | A further problem is the general unreliability of FUSE/sshfs-
        | mounted filesystems: "slow" file access is quite acceptable for
        | some things, but unreliable or hanging access to individual
        | files gets old real quick.
       | 
        | So it boils down to this: the far end of the backup "cone" is
        | S3-Glacier-ish (cheap-ish), mostly blob data with no extended
        | attributes needed. Mid-tier is rsync/warm: mostly blob-ish data
        | shared between multiple systems, along with some kind of per-
        | device system-restore (or home-dir restore) capability. Near/hot
        | is Syncthing, plus a local NAS of everything if possible. Modulo
        | local device/hardware management, and trying to minimize
        | administrative overhead, especially for casual/non-primary users
        | of the same system.
       | 
        | It doesn't feel like nirvana; am I missing any gaps in my
        | thinking here? Does anyone have good experiences or suggestions
        | for something comprehensive, somewhere between "do nothing, let
        | the cloud sort it out" and "be prepared, run your own local and
        | remote backup/transfer/restore processes"?
        
       | bombcar wrote:
        | One thing I'd add: make sure you're snapshotting via ZFS or
        | something similar, because the most likely cause of data loss is
        | accidental deletion, and if your synced copies are perfect
        | replicas, they'll replicate the deletion, too.
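        | 
        | A minimal sketch (hypothetical pool, dataset, and file names),
        | with a snapshot to fall back on when a deletion replicates:
        | 
        |     # point-in-time, read-only copy of the synced dataset
        |     zfs snapshot tank/sync@2022-09-27
        |     # recover one deleted file without a full rollback
        |     cp /tank/sync/.zfs/snapshot/2022-09-27/file.txt /tank/sync/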
        
         | rsync wrote:
         | If I could speak directly to the op I would recommend flipping
         | back to a "rightside up" model and doing dumb, mirror, 1:1
         | backups to rsync.net and then configuring a day/week/month
         | schedule of ZFS snapshots on that end.
         | 
         | Those ZFS snapshots are immutable / read-only so they not only
         | serve as retention, they protect against Mallory (and
         | ransomware and malware, etc.)
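          | 
          | A minimal sketch of that "rightside up" flow (the user and
          | host below are placeholders): a dumb 1:1 mirror, with
          | snapshot retention handled entirely on the rsync.net side.
          | 
          |     # plain mirror; deletions propagate, snapshots protect
          |     rsync -avH --delete ~/data/ user@rsync.net:data/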
        
       | secabeen wrote:
       | The most careful of my cloud-first users maintain local
       | snapshotted NAS copies of their cloud data.
        
       | Eun wrote:
        | For Google, I just use Google's Takeout[1] on a regular basis.
       | 
       | [1]: https://takeout.google.com/settings/takeout
        
         | ComodoHacker wrote:
         | But do you verify your backups?
        
       | ahupp wrote:
        | Note that the Google Photos API which rclone uses for gphotos
        | sources does not allow downloading original-quality images (only
        | re-compressed versions), and it strips EXIF. So I personally
        | don't use it for backup purposes; instead I:
       | 
       | 1) Sync my Google Drive to a local Synology NAS
       | 
       | 2) Periodically request a Google Takeout, which dumps the entire
       | contents of my Google account (including original photos) into
       | Drive.
       | 
        | 3) A nightly script unzips the takeout archives so they can be
        | picked up by an incremental backup to Backblaze B2 (sketched
        | below). The NAS also does local snapshots.
       | 
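        | A minimal sketch of that nightly unzip step (the Synology paths
        | are hypothetical; the incremental upload to B2 is a separate
        | job):
        | 
        |     #!/bin/sh
        |     # unpack new takeout archives so the incremental backup
        |     # sees plain files; -n never overwrites existing ones
        |     for f in /volume1/drive/Takeout/takeout-*.zip; do
        |         [ -e "$f" ] || continue
        |         unzip -n "$f" -d /volume1/backup/takeout/
        |     done
        | 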
       | This has some nice properties:
       | 
       | - There are 3 distinct copies
       | 
        | - If a file is inadvertently deleted I can restore from either a
        | local snapshot or B2.
        | 
        | - If my Google account is nuked I still have an archive of mail,
        | photos, etc., though that's limited to the frequency of my
        | Takeout requests.
        | 
        | - Since the NAS is the single point for backups, it's a natural
        | place to put non-cloud files (or Dropbox, etc.) and have them
        | automatically picked up for backup without setting up anything
        | new.
        
         | anderspitman wrote:
          | You can get full quality photos if you're a paying Google
          | customer, right?
        
           | ahupp wrote:
           | It's an API limitation afaik, not tied to whether you pay.
        
         | marwis wrote:
          | Sadly, photos takeout breaks if you have too many files. Then
          | you have to set up multiple partial takeouts (each with some
          | subset of albums), which is a huge pain.
          | 
          | I used to use OneDrive as a 2nd backup, but Microsoft started
          | doing the same crap (removing EXIF).
        
       | alphabettsy wrote:
       | I prefer local-first because I want versioned backups and that
       | seems harder with cloud-first.
       | 
       | I've had complete data loss but accidentally deleting something
       | is more common.
       | 
       | Currently using both Backblaze and Arq Backup for Mac/Windows,
       | Restic on Linux.
        
       | proactivesvcs wrote:
        | Synchronisation is not backup. Almost every restore I've
        | performed, for myself and others, resulted from accidental
        | deletion, and sync propagates accidental deletion. It also
        | doesn't make your data resilient to malware, bit rot, or removal
        | of data for a TOS violation.
        
         | m463 wrote:
         | > Synchronization is not backup.
         | 
         | also RAID is not backup. :)
         | 
         | also related: https://www.jwz.org/doc/backups.html
        
         | zeagle wrote:
          | Agree. It can be, but you need version control in there to
          | protect against this: either a software package, iterative
          | backups, or the file system itself.
        
         | GordonS wrote:
          | > Sync propagates accidental deletion
          | 
          | This is true, but it's worth noting that some sync products
          | (e.g. Seafile) can keep a history of changes, allowing you to
          | restore deleted files.
          | 
          | Sync is not backup though, as you allude to.
        
           | jermaustin1 wrote:
           | This is what I do. I sync to my NAS, my NAS syncs to my
           | Backblaze B2 with file versioning. My current monthly bill is
           | only $4... I cannot believe how cheap Backblaze has been.
           | 
           | I can even get the B2 URL and share a file anytime I want
           | from inside my NAS.
           | 
            | Is it as easy as Dropbox? No, but I haven't missed using
            | Dropbox. I just mount my NAS in my RDP sessions when I need
            | files "synced" between my various computers.
        
             | Groxx wrote:
             | I've been eyeballing this kind of setup lately, because
             | Dropbox has become so hostile to simple use. Hours of delay
             | between "changed small text file" and "synced", endless
             | "bugs" that push things to cloud-first storage rather than
             | the synced folders I have set up, and using more and more
             | resources with every upgrade. I'm so sick of their
             | business-oriented shift.
             | 
             | </rant>
             | 
             | Got a favorite NAS that works well? Without the risk of the
             | vendor's cloud deciding to delete your data? I haven't yet
             | picked one, beyond "what if I just used a Pi and some USB
             | drives..."
        
               | anderspitman wrote:
                | Check out Syncthing and rclone if you're not already
                | aware of them.
        
               | jermaustin1 wrote:
                | I use Synology. It's not the best, but it Just Works. I
                | got the "Shell" for only $120, plopped in two 8TB
                | spinners, and haven't really touched it other than
                | moving house three times.
                | 
                | If I were doing it over again, I would get one of their
                | commercial-grade NASes. They are a bit more expensive,
                | but they have a decent CPU and RAM, can run
                | VMs/containers/etc., and can be your own little cloud.
                | 
                | They also have some cool features, like the ability to
                | link two remote Synology NASes together and sync back
                | and forth, as well as backing up to any backup service
                | you want.
        
               | outworlder wrote:
                | Not the person you are asking, but I took some leftover
                | computer parts I had been amassing for years, bought a
                | few drives, and put TrueNAS on it.
                | 
                | It can do anything the big consumer NASes can, and a lot
                | more. And if it fails, it's all off-the-shelf components.
                | 
                | The only thing that could have been done better is ECC
                | memory (for some people that's a dealbreaker), in which
                | case one can get a server from eBay for a pittance that
                | will fit the bill.
        
         | MikusR wrote:
          | Syncthing (and probably other sync utilities) allows
          | versioning and keeping deleted files in a separate directory.
        
           | proactivesvcs wrote:
            | Yup, which is great for noticing you accidentally deleted a
            | file before you went to lunch. But for anything more than
            | very simple, recent restoration, versioning is not a backup:
            | it can mangle file names, metadata, and folder structure,
            | and can be a nightmare for performing larger restores.
            | Versions are a convenience.
        
             | [deleted]
        
         | Jeff_Brown wrote:
         | I use Borg and Rclone together, to cover both aspects. One
         | unexpectedly nice thing about that is a Borg repo consists of
         | far fewer files than the thing it backs up, so Google is
         | unlikely to rate limit you as described in the article.
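          | 
          | A minimal sketch of that combination (the paths and the
          | "gdrive:" remote name are hypothetical):
          | 
          |     # encrypted, deduplicated archives into a local repo
          |     borg init --encryption=repokey /backups/borg
          |     borg create /backups/borg::'{hostname}-{now}' ~/Documents
          |     # the repo is relatively few, large files; mirror those
          |     rclone sync /backups/borg gdrive:borg-repo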
        
           | schainks wrote:
           | Is there any documentation on how you did this? I'm very
           | interested ^_^
        
           | proactivesvcs wrote:
           | Agreed - using a sync service can help bring in data to be
           | backed up, or make copies of a backup. The advantage you
           | mentioned is one of many that we get from using a backup
           | system for backups.
        
         | teddyh wrote:
         | "Just as we respect and care for our ancestors, so we must
         | respect and care for our old backups, for one day they may
         | achieve great glory."
         | 
         | -- http://www.taobackup.com/history.html
        
           | proactivesvcs wrote:
           | LOL! Sage and hilarity in one small story. Reading through
           | the rest now :-)
        
         | layer8 wrote:
         | It is if it supports history/versioning.
        
       | rsync wrote:
       | If you'd like to start using rclone, and you should, this is a
       | high quality _and general_ howto:
       | 
       | https://rsync.net/resources/howto/rclone.html
       | 
       | The example is S3 <--> rsync.net (naturally) but the step by step
       | instructions are applicable to setting up any combination of
       | "remotes".
       | 
       | rclone is like youtube-dl - a powerful tool that seems almost
       | magical.
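        | 
        | A minimal sketch of the pattern the howto walks through (the
        | remote names here are placeholders; always dry-run first):
        | 
        |     rclone config                  # interactive remote setup
        |     rclone sync --dry-run s3:bucket/data rsyncnet:data
        |     rclone sync s3:bucket/data rsyncnet:data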
        
         | gregsadetsky wrote:
         | I would love to be able to cron rclone (e.g. have it run
         | automatically) on rsync.net :-)
         | 
         | - a happy rsync.net customer
        
           | rsync wrote:
           | We can do this. Just email.
        
         | caseyohara wrote:
         | rclone is indeed magical. I recently migrated a customer's
         | business from Box to Dropbox, using rsync.net as an
         | intermediary to save my own bandwidth. rclone and rsync.net
         | work very well together.
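          | 
          | Roughly this shape, as a minimal sketch (all three remote
          | names are placeholders):
          | 
          |     # stage everything on the intermediary, then fan out
          |     rclone sync box:clientdata rsyncnet:migration
          |     rclone sync rsyncnet:migration dropbox:clientdata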
        
           | rsync wrote:
           | Can you email info@ ?
           | 
           | I would love to write down your recipes/commands and put them
           | up so others can see them - box -> rsync.net -> dropbox is an
           | interesting workflow ...
        
             | caseyohara wrote:
             | Absolutely.
        
       | akerl_ wrote:
       | I have something akin to this. For the vast majority of my data,
       | a cloud platform is the authoritative storage: Gmail, Google
       | Photos, GitHub, etc. I have a local NAS that does streaming
       | backups of those systems to give me a persistent local copy, and
       | then the NAS also backs up the totality of its local backups to
       | an S3 bucket whose policy enforces deletion protection.
        
         | wahnfrieden wrote:
         | Including overwrite protection? No admin access to those
         | policies from the server?
        
           | akerl_ wrote:
           | The NAS's user has ListObject, PutObject, and DeleteObject.
           | The bucket has versioning enabled, and DeleteObject doesn't
           | allow deleting prior versions. So the NAS can delete what's
           | immediately visible in the bucket, but it can't permanently
           | delete things.
           | 
            | The other way to set this up is to configure Object Lock on
            | your S3 bucket:
            | https://docs.aws.amazon.com/AmazonS3/latest/userguide/object...
           | 
           | The upside of versioning over Object Lock, for my use case,
           | is that the backup scripts can be very simple, because they
           | don't have to deal with what happens if they want to clean up
           | a file but don't have permissions to. They just do their
           | thing, and I'm confident that old versions are retained. The
           | downside of this approach is that my S3 usage will increase
           | over time, because I'm retaining all old content. So
           | eventually it'll cost enough for me to decide to either
           | switch to Object Lock or figure out a safe way to prune old
           | content.
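            | 
            | A minimal sketch of the versioning half (the bucket name is
            | a placeholder; the IAM policy itself is elided):
            | 
            |     # keep prior versions on every overwrite or delete
            |     aws s3api put-bucket-versioning \
            |         --bucket my-backup-bucket \
            |         --versioning-configuration Status=Enabled
            |     # without s3:DeleteObjectVersion, the NAS user can
            |     # only add delete markers, never purge old versions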
        
       | TOGoS wrote:
       | Another solution: stuff everything that fits into Git repos,
       | fetch --all from each of your different computers regularly.
       | 
       | For things that don't fit nicely into Git (e.g. large collections
       | of large binary files), I use a homegrown git-like system for
       | storing filesystem snapshots:
       | http://github.com/TOGoS/ContentCouch
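        | 
        | A minimal sketch of the fetch side (assuming a hypothetical
        | layout with every repo cloned under ~/mirrors):
        | 
        |     # run from cron on each machine; fetches all remotes
        |     for d in ~/mirrors/*/; do
        |         git -C "$d" fetch --all --prune
        |     done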
        
       | zeagle wrote:
        | It's a good way of doing it: near-unlimited bandwidth, and if a
        | machine or drive dies you just replace it with no data loss. I
        | save everything important, encrypted, to Seafile on my VPS, and
        | my NAS pulls nightly backups back home. Software and family just
        | see it as a network drive.
        
       ___________________________________________________________________
       (page generated 2022-09-28 23:01 UTC)