[HN Gopher] Make Your Own Backup System - Part 1: Strategy Befor...
       ___________________________________________________________________
        
       Make Your Own Backup System - Part 1: Strategy Before Scripts
        
       Author : Bogdanp
       Score  : 86 points
       Date   : 2025-07-19 19:43 UTC (3 hours ago)
        
 (HTM) web link (it-notes.dragas.net)
 (TXT) w3m dump (it-notes.dragas.net)
        
       | rr808 wrote:
        | I don't need a backup system. I just need a standardized way to
       | keep 25 years of photos for a family of 4 with their own phones,
       | cameras, downloads, scans etc. I still haven't found anything
       | good.
        
         | bambax wrote:
         | You do need a backup. But before that, you need a family NAS.
         | There are plenty of options. (But a NAS is not a backup.)
        
         | xandrius wrote:
         | Downloads and scans are generally trash unless deemed
         | important.
         | 
          | For the phones and cameras, set up Nextcloud and have it
         | automatically sync to your own home network. Then have a
         | nightly backup to another disk with a health check after it
         | finishes.
         | 
          | After that you can pick either a cloud host which you trust,
          | or put another drive of yours in someone else's server to
          | have another location for your 2nd backup, and you're golden.
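The "nightly backup to another disk with a health check" step above can be sketched as a small cron job. Everything here (paths, demo data) is a hypothetical stand-in; `rsync -a` would do as well as `cp -a`:

```shell
#!/bin/sh
# Sketch of a nightly "copy to a second disk, then verify" job.
# In practice SRC would be the Nextcloud data directory and DEST a
# separately mounted disk, with this script run from cron.
set -eu

SRC="${SRC:-$(mktemp -d)}"
DEST="${DEST:-$(mktemp -d)/nightly}"

# Seed demo data if SRC is empty so the sketch runs anywhere.
[ "$(ls -A "$SRC")" ] || echo "demo photo" > "$SRC/photo1.jpg"

mkdir -p "$DEST"
cp -a "$SRC/." "$DEST/"       # the nightly copy (rsync -a also works)

# Health check: the copy must match the source exactly.
diff -r "$SRC" "$DEST" >/dev/null && echo "backup OK"
```

The health check matters as much as the copy: a backup that silently diverges from the source is discovered only on restore day.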
        
         | sandreas wrote:
          | I use syncthing... it's great for that purpose. Android is
          | not officially supported, but there is a fork that works fine.
         | Maybe you want to combine it with either ente.io or immich
         | (also available for self-hosted) for photo backup.
         | 
         | I would also distinguish between documents (like PDF and TIFF)
         | and photos - there is also paperless ngx.
        
           | setopt wrote:
           | I like Syncthing but it's not a great option on iOS.
        
             | sandreas wrote:
             | What about Mobius Sync?
             | 
             | https://mobiussync.com/
        
               | baby_souffle wrote:
               | It's an option... But still beholden to the arbitrary
               | restriction apple has on data access.
        
           | bravesoul2 wrote:
            | Isn't that like a Dropbox approach? If you have 2 TB of
            | photos, does this mean you need 2 TB of storage on
            | everything?
        
         | palata wrote:
         | I recently found that Nextcloud is good enough to "collect" the
         | photos from my family onto my NAS. And my NAS makes encrypted
         | backups to a cloud using restic.
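That restic arrangement can be sketched as below. The demo builds a throwaway local repository so it runs anywhere; the real NAS job would point RESTIC_REPOSITORY at a cloud bucket (an `s3:` or `b2:` URL) and read the password from a file:

```shell
#!/bin/sh
# Sketch: restic making encrypted, deduplicated snapshots.
set -eu
command -v restic >/dev/null 2>&1 || { echo "restic not installed; skipping"; exit 0; }

export RESTIC_REPOSITORY="$(mktemp -d)/repo"
export RESTIC_PASSWORD="demo-only"       # never inline in production

DATA=$(mktemp -d)
echo "family photo" > "$DATA/photo1.jpg"

restic init                              # create the encrypted repository
restic backup "$DATA"                    # deduplicated snapshot
restic forget --keep-daily 7 --keep-monthly 12 --prune   # retention
restic check                             # verify repository integrity
```

`restic check` at the end is the cheap version of "test your backups"; it verifies repository integrity, though not that a full restore actually works.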
        
         | rsolva wrote:
         | Check out ente.io - it is really good!
        
           | nor-and-or-not wrote:
           | I second that, and you can even self-host it.
        
         | bravesoul2 wrote:
         | Struggling too.
         | 
          | For me: one Win/Mac machine with Backblaze, and I dump
          | everything onto that machine. A second external drive backup
          | just in case.
        
         | haiku2077 wrote:
         | A NAS running Immich, maybe?
        
       | bambax wrote:
       | It's endlessly surprising how people don't care / don't think
       | about backups. And not just individuals! Large companies too.
       | 
       | I'm consulting for a company that makes around EUR1 billion
       | annual turnover. They don't make their own backups. They rely on
       | disk copies made by the datacenter operator, which happen
       | randomly, and which they don't test themselves.
       | 
       | Recently a user error caused the production database to be
       | destroyed. The most recent "backup" was four days old. Then we
       | had to replay all transactions that happened during those four
       | days. It's insane.
       | 
       | But the most insane part was, nobody was shocked or terrified
       | about the incident. "Business as usual" it seems.
        
         | polishdude20 wrote:
         | If it doesn't affect your bottom line enough to do it right,
         | then I guess it's ok?
        
         | treetalker wrote:
         | Possibly for legal purposes? Litigation holds are a PITA and
         | generators of additional liability exposure, and backups can
         | come back to bite you.
        
           | haiku2077 wrote:
           | Companies that big have legal requirements to keep much of
           | their data around for 5-7 years anyway.
        
         | daneel_w wrote:
         | It's also endlessly surprising how people over-think the
         | process and requirements.
        
       | bobsmooth wrote:
       | I just pay $60 a year to backblaze.
        
       | gmuslera wrote:
       | How data changes, and what changes it, matters when trying to
       | optimize backups.
       | 
       | A full OS installation may not change a lot, or change with
       | security updates that anyway are stored elsewhere.
       | 
       | Configurations have their own lifecycle, actors, and good
       | practices on how to keep and backup them. Same with code.
       | 
        | Data is what matters once you have covered everything else.
        | And file tree backups may need different treatment than, e.g.,
        | database backups.
       | 
        | Logs are something that changes frequently, but you can have a
        | proper log server, for which logs are data.
       | 
        | Things can be this granular, or you can go for whole-storage
        | backup. But granularity, while it may add complexity, may
        | lower costs and increase how much of what matters you can
        | store for longer periods of time.
        
         | o11c wrote:
         | Other things that matter (some overlap):
         | 
         | * Is the file userland-compressed, filesystem-or-device-
         | compressed, or uncompressed?
         | 
         | * What are you going to do about secret keys?
         | 
         | * Is the file immutable, replace-only (most files), append-only
         | (not limited to logs; beware the need to defrag these), or
         | fully mutable (rare - mostly databases or dangerous archive
         | software)?
         | 
         | * Can you rely on page size for (some) chunking, or do you need
         | to rely entirely on content-based chunking?
         | 
         | * How exactly are you going to garbage-collect the data from
         | no-longer-active backups?
         | 
         | * Does your filesystem expose an _accurate_ "this file changed"
         | signal, or better an actual hash? Does it support chunk
         | sharing? Do you know how those APIs work?
         | 
         | * Are you crossing a kernel version that is one-way
         | incompatible?
         | 
         | * Do you have control of the raw filesystem at the other side?
         | (e.g. the most efficient backup for btrfs is only possible with
         | this)
        
       | sandreas wrote:
        | Nice writeup... although I'm missing a few points.
        | 
        | In my opinion, a backup (system) is only good if it has been
        | tested to be restorable as fast as possible and the procedure
        | is clear (as in documented).
       | 
        | How often have I heard or seen backups that "work great" and
        | "oh, no problem, we have them", only to see them fail or take
        | ages to restore once disaster has struck (2 days can be an
        | expensive amount of time in a production environment). All too
        | often, only parts could be restored.
       | 
       | Another missing aspect is within the snapshots section... I like
       | restic, which provides repository based backup with deduplicated
       | snapshots for FILES (not filesystems). It's pretty much what you
       | want if you don't have ZFS (or other reliable snapshot based
       | filesystems) to keep different versions of your files that have
       | been deleted on the filesystem.
       | 
        | The last aspect is only partly mentioned: prefer PULL over
        | PUSH. Ransomware is really clever these days, and if you PUSH
        | your backups, it can also encrypt or delete all your backups,
        | because it has access to everything... So either use read-only
        | media (like Blu-rays) or PULL is mandatory. It is also helpful
        | to have auto-snapshotting on ZFS via zfs-auto-snapshot, zrepl
        | or sanoid, to go back in time to before the ransomware started
        | its journey.
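A minimal sketch of the pull direction, assuming a ZFS-backed backup server (host, dataset, and user names are hypothetical):

```
# On the backup server -- the client holds no credentials for this host.
rsync -a --delete backupuser@client:/srv/data/ /tank/backups/client/

# Local snapshot the client can never touch: a ransomware rollback point.
zfs snapshot tank/backups@$(date +%F)
```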
        
         | sgc wrote:
         | Since you mentioned restic, is there something wrong with using
         | restic append-only with occasional on-server pruning instead of
         | pulling? I thought this was the recommended way of avoiding
         | ransomware problems using restic.
        
           | sandreas wrote:
           | There are several methods... there is also restic rest-server
           | (https://github.com/restic/rest-server). I personally use ZFS
           | with pull via ssh...
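For the append-only route, rest-server can enforce it server-side; a sketch using flags from the rest-server README (the path is hypothetical):

```
# Clients can push new snapshots but cannot delete or overwrite history;
# pruning happens separately, on the server itself.
rest-server --path /srv/restic --append-only --private-repos
```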
        
         | TacticalCoder wrote:
         | > So you could either use readonly media (like Blurays) or PULL
         | is mandatory.
         | 
         | Or like someone already commented you can use a server that
         | allows push but doesn't allow to mess with older files. You can
         | for example restrict ssh to only the _scp_ command and the ssh
         | server can moreover offer a chroot 'ed environment to which scp
         | shall copy the backups. And the server can for example daily
         | rotate that chroot.
         | 
         | The push can then push one thing: daily backups. It cannot log
         | in. It cannot overwrite older backups.
         | 
         | Short of a serious SSH exploit where the ransomware could both
         | re-configure the server to accept all ssh (and not just scp)
         | and escape the chroot box, the ransomware is simply not
         | destroying data from before the ransomware found its way on the
         | system.
         | 
         | My backup procedure does that for the one backup server that I
         | have on a dedicated server: a chroot'ed ssh server that only
         | accepts scp and nothing else. It's of course just one part of
         | the backup procedure, not the only thing I rely on for backups.
         | 
         | P.S: it's not incompatible with also using read-only media
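A sketch of that server-side restriction with stock OpenSSH directives (user and paths are hypothetical; on current OpenSSH, scp runs over SFTP, so `internal-sftp` covers it):

```
# /etc/ssh/sshd_config on the backup host
Match User backup-push
    ChrootDirectory /srv/backup-chroots/%u
    ForceCommand internal-sftp
    AllowTcpForwarding no
    X11Forwarding no
```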
        
           | anonymars wrote:
            | I don't understand why this is dead... is it wrong advice?
            | Is
           | there some hidden flaw? Is it simply because the content is
           | repeated elsewhere?
           | 
           | On the face of it "append-only access (no changes)" seems
           | sound to me
        
       | daneel_w wrote:
       | My valuable data is less than 100 MiB. I just
       | tar+compress+encrypt a few select directories/files twice a week
       | and keep a couple of months of rotation. No incremental hassle
       | necessary. I store copies at home and I store copies outside of
       | home. It's a no-frills setup that costs nothing, is just a few
       | lines of *sh script, takes care of itself, and never really
       | needed any maintenance.
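A minimal sketch of such a scheme, assuming openssl for encryption (gpg works equally well); the paths, passphrase, and retention count are stand-ins:

```shell
#!/bin/sh
# tar + compress + encrypt a few selected directories, keeping a
# rotation of dated archives.
set -eu

WORK=$(mktemp -d)                     # demo stand-ins for real paths
DIRS="$WORK/docs"; OUT="$WORK/archives"
mkdir -p "$DIRS" "$OUT"
echo "important note" > "$DIRS/todo.txt"

STAMP=$(date +%Y-%m-%d)
ARCHIVE="$OUT/backup-$STAMP.tar.gz.enc"

tar -C "$WORK" -czf - docs |
  openssl enc -aes-256-cbc -pbkdf2 -pass pass:demo-only -out "$ARCHIVE"

# Rotation: twice weekly for ~2 months means keeping about 16 archives.
ls -1t "$OUT"/backup-*.tar.gz.enc | tail -n +17 | xargs rm -f --

echo "wrote $ARCHIVE"
```

Decryption is the mirror image (`openssl enc -d ... | tar -xzf -`), which also makes restore testing a one-liner.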
        
         | mavilia wrote:
         | This comment made me rethink what I have that is actually
         | valuable data. My photos alone even if culled down to just my
         | favorites would probably be at least a few gigs. Contacts from
         | my phone would be small. Other than that I guess I wouldn't be
         | devastated if I lost anything else. Probably should put my
         | recovery keys somewhere safer but honestly the accounts most
         | important to me don't have recovery keys.
         | 
         | Curious what you consider valuable data?
         | 
          | Edit: I should say for pictures I have around 2 TB right now
          | (downside of being a hobby photographer)
        
           | daneel_w wrote:
           | With valuable I should've elaborated that it's my set of
           | constantly changing daily-use data. Keychain, documents and
           | notes, e-mail, bookmarks, active software projects, those
           | kinds of things.
           | 
           | I have a large amount of memories and "mathom" as well, in
           | double copies, but I connect and add to this data so rarely
           | that it absolutely does not have to be part of any ongoing
           | backup plan.
        
       | progbits wrote:
       | > One way is to ensure that machines that must be backed up via
       | "push" [..] can only access their own space. More importantly,
       | the backup server, for security reasons, should maintain its own
       | filesystem snapshots for a certain period. In this way, even in
       | the worst-case scenario (workload compromised -> connection to
       | backup server -> deletion of backups to demand a ransom), the
       | backup server has its own snapshots
       | 
        | My preferred solution is to let the client only write new
        | backups, never delete. Deletion is handled separately
        | (manually or via cron on the target).
       | 
       | You can do this with rsync/ssh via the allowed command feature in
       | .ssh/authorized_keys.
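A sketch of that authorized_keys restriction (key and paths are hypothetical; rrsync is the restricted-rsync wrapper shipped with rsync, and its `-wo` flag, where available, makes the directory write-only):

```
# ~backup/.ssh/authorized_keys on the backup server: this key can only
# run rsync against its own directory, and only to write.
command="/usr/bin/rrsync -wo /srv/backups/clientA",restrict ssh-ed25519 AAAA... clientA
```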
        
         | haiku2077 wrote:
         | This is also why I use rclone copy instead of rclone sync for
         | my backups, using API keys without permission to delete
         | objects.
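The copy-vs-sync distinction is easy to demonstrate locally; in this sketch, local directories stand in for a cloud remote reached with a delete-less API key:

```shell
#!/bin/sh
# `rclone copy` never deletes on the destination, so a deletion (or
# ransomware wipe) at the source does not propagate to the backup.
set -eu
command -v rclone >/dev/null 2>&1 || { echo "rclone not installed; skipping"; exit 0; }

SRC=$(mktemp -d); DST=$(mktemp -d)
echo "keep me" > "$SRC/a.txt"

rclone copy "$SRC" "$DST"     # first backup
rm "$SRC/a.txt"               # simulate loss at the source
rclone copy "$SRC" "$DST"     # `sync` would delete a.txt here; copy won't

[ -f "$DST/a.txt" ] && echo "a.txt survived on the backup"
```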
        
       | bob1029 wrote:
       | I think the cleanest, most compelling backup strategies are those
       | employed by RDBMS products. [A]sync log replication is really
       | powerful at taking any arbitrary domain and making sure it exists
       | in the other sites exactly.
       | 
       | You might think this is unsuitable for your photo/music/etc.
       | collection, but there's no technical reason you couldn't use the
       | database as the primary storage mechanism. SQLite will take you
       | to ~281 terabytes with a 64k page size. MSSQL supports something
       | crazy like 500 petabytes. The blob data types will choke on your
       | 8k avengers rip, but you could store it in 1 gig chunks - There
        | are probably other benefits to this anyway.
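A sketch of the chunked-blob idea using the sqlite3 CLI's readfile()/writefile() helpers, with 1 MiB chunks for the demo instead of the ~1 GiB suggested above:

```shell
#!/bin/sh
# Store a file in fixed-size chunks in SQLite and verify the round trip.
set -eu
command -v sqlite3 >/dev/null 2>&1 || { echo "sqlite3 not installed; skipping"; exit 0; }

WORK=$(mktemp -d)
DB="$WORK/store.db"
head -c 3145728 /dev/urandom > "$WORK/original"   # 3 MiB of test data

sqlite3 "$DB" "CREATE TABLE chunks(name TEXT, seq INTEGER, data BLOB, PRIMARY KEY(name, seq));"

split -b 1048576 "$WORK/original" "$WORK/piece."  # 1 MiB pieces
i=0
for c in "$WORK"/piece.*; do                      # glob expands in sorted order
  sqlite3 "$DB" "INSERT INTO chunks VALUES('original', $i, readfile('$c'));"
  i=$((i+1))
done

# Reassemble in sequence order and verify.
: > "$WORK/restored"
j=0
while [ "$j" -lt "$i" ]; do
  sqlite3 "$DB" "SELECT writefile('$WORK/part', data) FROM chunks WHERE name='original' AND seq=$j;" >/dev/null
  cat "$WORK/part" >> "$WORK/restored"
  j=$((j+1))
done
cmp -s "$WORK/original" "$WORK/restored" && echo "round trip OK"
```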
        
       | kernc wrote:
       | Make your own backup system--is exactly what I did. I felt git
       | porcelain had a stable-enough API to accommodate this popular use
       | case.
       | 
       | https://kernc.github.io/myba/
        
       | binwiederhier wrote:
       | Thank you for sharing. A curious read. I am looking forward to
       | the next post.
       | 
       | I've been working on backup and disaster recovery software for 10
       | years. There's a common phrase in our realm that I feel obligated
       | to share, given the nature of this article.
       | 
       | > "Friends don't let friends build their own Backup and Disaster
       | Recovery (BCDR) solution"
       | 
       | Building BCDR is notoriously difficult and has many gotchas. The
       | author hinted at some of them, but maybe let me try to drive some
       | of them home.
       | 
       | - Backup is not disaster recovery: In case of a disaster, you
       | want to be up and running near-instantly. If you cannot get back
       | up and running in a few minutes/hours, your customers will lose
       | your trust and your business will hurt. Being able to restore a
       | system (file server, database, domain controller) with minimal
       | data loss (<1 hr) is vital for the survival of many businesses.
       | See Recovery Time Objective (RTO) and Recovery Point Objective
       | (RPO).
       | 
       | - Point-in-time backups (crash consistent vs application
       | consistent): A proper backup system should support point-in-time
       | backups. An "rsync copy" of a file system is not a point-in-time
       | backup (unless the system is offline), because the system changes
       | constantly. A point-in-time backup is a backup in which each
       | block/file/.. maps to the same exact timestamp. We typically
       | differentiate between "crash consistent backups" which are
       | similar to pulling the plug on a running computer, and
       | "application consistent backups", which involves asking all
       | important applications to persist their state to disk and freeze
       | operations while the backup is happening. Application consistent
        | backups (which are provided by Microsoft's VSS, as mentioned by
       | the author) significantly reduce the chances of corruption. You
       | should never trust an "rsync copy" or even crash consistent
       | backups.
       | 
        | - Murphy's law really holds for storage media: My parents put
        | their backups on external hard drives, and all of r/DataHoarder
        | seems to buy only 12T HDDs and put them in a RAID0. In my
       | experience, hard drives of all kinds fail all the time (though
       | NVMe SSD > other SSD > HDD), so having backups in multiple places
       | (3-2-1 backup!) is important.
       | 
       | (I have more stuff I wanted to write down, but it's late and the
       | kids are up early.)
        
       ___________________________________________________________________
       (page generated 2025-07-19 23:00 UTC)