[HN Gopher] Make Your Own Backup System - Part 1: Strategy Befor...
___________________________________________________________________
Make Your Own Backup System - Part 1: Strategy Before Scripts
Author : Bogdanp
Score : 338 points
Date   : 2025-07-19 19:43 UTC (1 day ago)
(HTM) web link (it-notes.dragas.net)
(TXT) w3m dump (it-notes.dragas.net)
| rr808 wrote:
| I don't need a backup system. I just need a standardized way to
| keep 25 years of photos for a family of 4 with their own phones,
| cameras, downloads, scans etc. I still haven't found anything
| good.
| bambax wrote:
| You do need a backup. But before that, you need a family NAS.
| There are plenty of options. (But a NAS is not a backup.)
| xandrius wrote:
| Downloads and scans are generally trash unless deemed
| important.
|
| For the phones and cameras, set up Nextcloud and have it
| automatically sync to your own home network. Then have a
| nightly backup to another disk with a health check after it
| finishes.
|
| After that you can pick either a cloud host which you trust, or
| put another drive of yours into someone else's server, to have
| another location for your 2nd backup and you're golden.
| sandreas wrote:
| I use syncthing... it's great for that purpose. Android is not
| officially supported, but there is a fork that works fine.
| Maybe you want to combine it with either ente.io or immich
| (also available for self-hosted) for photo backup.
|
| I would also distinguish between documents (like PDF and TIFF)
| and photos - there is also paperless ngx.
| setopt wrote:
| I like Syncthing but it's not a great option on iOS.
| sandreas wrote:
| What about Mobius Sync?
|
| https://mobiussync.com/
| baby_souffle wrote:
| It's an option... But still beholden to the arbitrary
| restrictions Apple has on data access.
| msh wrote:
| Synctrain is better, and free
| bravesoul2 wrote:
| Isn't that like a Dropbox approach? If you have 2 TB of photos,
| does this mean you need 2 TB of storage on everything?
| palata wrote:
| I recently found that Nextcloud is good enough to "collect" the
| photos from my family onto my NAS. And my NAS makes encrypted
| backups to a cloud using restic.
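The NAS-to-cloud leg of a setup like this is typically just two restic invocations on a timer (bucket name, paths, and retention numbers below are made-up examples; restic encrypts everything client-side by default):

```shell
#!/bin/sh
# Push the collected family photos to an S3-compatible bucket.
export RESTIC_REPOSITORY="s3:s3.example.com/family-backup"
export RESTIC_PASSWORD_FILE=/root/.restic-pass

restic backup /srv/photos

# Keep a bounded history so the bucket doesn't grow forever.
restic forget --keep-daily 7 --keep-weekly 4 --keep-monthly 12 --prune
```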
| rsolva wrote:
| Check out ente.io - it is really good!
| nor-and-or-not wrote:
| I second that, and you can even self-host it.
| bravesoul2 wrote:
| Struggling too.
|
| For me: one Win/Mac machine with Backblaze. Dump everything to
| that machine. Plus a second external drive backup just in case.
| haiku2077 wrote:
| A NAS running Immich, maybe?
| senectus1 wrote:
| yup. works a treat for me.
|
| still need to back it up though, as a NAS/RAID isn't a backup.
| BirdieNZ wrote:
| I'm trialing a NAS with Immich, and then backing up the media
| and Immich DB dump daily to AWS S3 Deep Archive. It has Android
| and iOS apps, and enough of the feature set of Google Photos to
| keep me happy.
|
| You can also store photos/scans on desktops in the same NAS and
| make sure Immich is picking them up (and then the backup script
| will catch them if they get imported to Immich). For an HN user
| it's pretty straightforward to set up.
| Jedd wrote:
| Is '25 years of photos' a North American measure of data I was
| previously unfamiliar with?
|
| As bambax noted, you do in fact need a backup system -- you
| just don't realise that yet.
|
| And you _want_ a way of _sharing_ data between devices. Without
| knowing what you've explored, and constraints imposed by your
| vendors of choice, it's hard to be prescriptive.
|
| FWIW I use syncthing on gnu/linux, microsoft windows, android,
| in a mesh arrangement, for several collections of stuff,
| anchored back to two dedicated archive targets (small memory /
| large storage debian VMs) running at two different sites, and
| then perform regular snapshots on _those_ using borgbackup.
| This gives me backups and archives. My RPO is 24h but could
| easily be reduced to whatever figure I want.
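The borgbackup step on such archive targets can be as small as this (repository path, source tree, and retention values are assumptions for illustration):

```shell
#!/bin/sh
# Nightly borg snapshot of the syncthing-anchored archive tree.
REPO=/srv/borg/archive

borg create --stats --compression lz4 \
    "$REPO::{hostname}-{now:%Y-%m-%d}" /srv/syncthing

# Retention consistent with a 24h RPO plus a few months of history.
borg prune --keep-daily 14 --keep-weekly 8 --keep-monthly 6 "$REPO"
```

Since the data is mostly incremental, borg's deduplication keeps each extra snapshot cheap.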
|
| I believe this method won't work if Apple phones / tablets are
| involved, as you are not allowed to run background tasks (for
| syncthing) on your devices.
|
| (I have ~500GB of photos, and several 10-200GB collections of
| docs and miscellaneous files, as unique repositories - none of
| these experience massive changes, it's mostly incremental
| differences, so it is pretty frugal with diff-based backup
| systems.)
| ethan_smith wrote:
| PhotoPrism or Immich are solid self-hosted options that handle
| deduplication and provide good search/tagging for family
| photos. For cloud, Backblaze B2 + Cryptomator can give you
| encrypted storage at ~$1/TB/month with DIY scripts for uploads.
| mhuffman wrote:
| >I still haven't found anything good.
|
| I have used pCloud for years with no issue.
|
| Also external "slow" storage drives are fairly inexpensive now
| as a third backup if your whole life's images and important
| documents are at stake.
|
| Always best to keep multiple copies of photos or documents that
| you care about in multiple places. Houses can flood or burn,
| computers and storage can fail. No need to be over-paranoid
| about it, but two copies of important things isn't asking too
| much of someone.
| rafamaddd wrote:
| Others have already mentioned Syncthing[^1]. Here's what I'm
| doing on a budget since I don't have a homeserver/NAS or
| anything like that.
|
| First you need to choose a central device where you're going to
| send all of the important stuff from other devices like
| smartphones, laptops, etc. Then you need to set up Syncthing,
| which works on linux, macos, windows and others. For android
| there's Syncthing-fork[^2] but for iOS idk.
|
| Set up the folders you want to back up on each device. For
| Android, the folders I recommend backing up are DCIM, Documents,
| and Downloads. For the most part, everything you care about will be
| there. But I setup a few others like
| Android/media/WhatsApp/Media to save all photos shared on
| chats.
|
| Then this central device that's receiving everything from the
| others is where you do the "real" backups. In my case, I'm
| doing backups to an external HDD, and also to a cloud provider
| with restic[^3].
|
| I highly recommend restic, genuinely great software for
| backups. It is incremental (like BTRFS snapshots), has backends
| for a bunch of providers, including any S3 compatible storage
| and if combined with rclone, you have access to virtually any
| provider. It is encrypted, and because of how it was built, you
| can still search/navigate your remote snapshots without having
| to download the entire snapshot (borg[^4] also does this); the
| most important aspect is that you can restore individual
| folders/files. And this is crucial because most cloud storage
| providers will charge you more depending on how much bandwidth
| you have used. I have already needed to restore files and
| folders from my remote backups on multiple occasions and it
| works beautifully.
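Restoring a single folder from a remote restic repository, without downloading the whole snapshot, looks roughly like this (repository URL and paths are examples):

```shell
# List snapshots, then restore one directory from the latest;
# restic fetches only the chunks backing the selected files.
export RESTIC_REPOSITORY="s3:s3.example.com/my-backups"
export RESTIC_PASSWORD_FILE="$HOME/.restic-pass"

restic snapshots
restic restore latest --include /home/me/documents/taxes --target /tmp/restore

# Or mount the repository and browse snapshots like a filesystem:
restic mount /mnt/restic
```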
|
| [^1]: https://github.com/syncthing/syncthing
| [^2]: https://github.com/Catfriend1/syncthing-android
| [^3]: https://github.com/restic/restic
| [^4]: https://github.com/borgbackup/borg
| bambax wrote:
| It's endlessly surprising how people don't care / don't think
| about backups. And not just individuals! Large companies too.
|
| I'm consulting for a company that makes around EUR1 billion
| annual turnover. They don't make their own backups. They rely on
| disk copies made by the datacenter operator, which happen
| randomly, and which they don't test themselves.
|
| Recently a user error caused the production database to be
| destroyed. The most recent "backup" was four days old. Then we
| had to replay all transactions that happened during those four
| days. It's insane.
|
| But the most insane part was, nobody was shocked or terrified
| about the incident. "Business as usual" it seems.
| polishdude20 wrote:
| If it doesn't affect your bottom line enough to do it right,
| then I guess it's ok?
| rapfaria wrote:
| I'd go even a step further: For the big corp, having a point
| of failure that lives outside its structure can be a feature,
| and not a bug.
|
| "Oh there goes Super Entrepise DB Partner again" turns into a
| product next fiscal year, that shutdowns the following year
| because the scope was too big, but at least they tried to
| make things better.
| justsomehnguy wrote:
| RTO/RPO is a thing. Though many companies declare that they
| need an SLA of five nines and an RPO of minutes... situations
| like this make it quite evident that many of them are fine with
| an SLA of 95% and an RTO of _weeks_.
| treetalker wrote:
| Possibly for legal purposes? Litigation holds are a PITA and
| generators of additional liability exposure, and backups can
| come back to bite you.
| haiku2077 wrote:
| Companies that big have legal requirements to keep much of
| their data around for 5-7 years anyway.
| daneel_w wrote:
| It's also endlessly surprising how people over-think the
| process and requirements.
| tguvot wrote:
| this is a side effect of soc2 auditor-approved disaster
| recovery policies.
|
| the company where i worked had something similar. i spent a
| couple of months going through all the teams, figuring out how
| disaster recovery policies were implemented (all of them
| approved by soc auditors).
|
| the outcome of my analysis was that in case of a major disaster
| it would be easier to shut down the company and go home than to
| try to recover to a working state within a reasonable amount of
| time.
| truetraveller wrote:
| Wait, the prod db, like the whole thing? Losing 4 days of data?
| How does that work. Aren't customers upset? Not doubting your
| account, but maybe you missed something, because for a $1
| billion company, that's likely going to have huge consequences.
| bambax wrote:
| Well it was "a" production database, the one that tracks
| supplier orders and invoices so that suppliers can eventually
| get paid. The database is populated by a data stream, so
| after restoration of the old version, they replayed the data
| stream (that is indeed stored somewhere, but in only one
| version (not a backup)).
|
| And this was far from painless: the system was unavailable
| for a whole day, and all manual interventions on the system
| (like comments, corrections, etc.) that had been done between
| the restoration date and the incident, were irretrievably
| lost. -- There were not too many of those apparently, but
| still.
| bobsmooth wrote:
| I just pay $60 a year to backblaze.
| somehnguy wrote:
| Do you mean $99?
| bobsmooth wrote:
| I might be grandfathered on the old price, not sure.
| somehnguy wrote:
| I would be surprised. From the announcements I can find
| they don't mention any permanent grandfathering. When your
| plan renews the price increases - and their last increase
| was 2 years ago.
| philjohn wrote:
| Backblaze is great, but restores can be a bit time consuming,
| even on a fast FTTP connection.
|
| I do have BackBlaze on my desktop, but I also have UrBackup
| running on all the computers in the house which backs up to a
| RaidZ2 array, and then a daily offsite backup of the "current"
| backup (which is just the files stored in a directory in
| UrBackup) via restic and rclone to JottaCloud.
|
| VMs and containers back up to Proxmox Backup Server, and the
| main datastore of that is also shipped offsite every day, as
| well as a second Proxmox Backup Server locally (but separate
| from the rack).
|
| I test restores monthly and so far so good.
| gmuslera wrote:
| How data changes, and what changes it, matters when trying to
| optimize backups.
|
| A full OS installation may not change a lot, or change with
| security updates that anyway are stored elsewhere.
|
| Configurations have their own lifecycle, actors, and good
| practices on how to keep and backup them. Same with code.
|
| Data is what matters, once you have everything else saved
| somewhere. And it can warrant different treatment: file tree
| backups differ from, e.g., database backups.
|
| Logs are something that changes frequently, but you can have a
| proper log server, for which the logs are data.
|
| Things can be this granular, or you can just back up storage
| wholesale. But granularity, while it may add complexity, may
| lower costs and increase how much of what matters you can store
| for longer periods of time.
| o11c wrote:
| Other things that matter (some overlap):
|
| * Is the file userland-compressed, filesystem-or-device-
| compressed, or uncompressed?
|
| * What are you going to do about secret keys?
|
| * Is the file immutable, replace-only (most files), append-only
| (not limited to logs; beware the need to defrag these), or
| fully mutable (rare - mostly databases or dangerous archive
| software)?
|
| * Can you rely on page size for (some) chunking, or do you need
| to rely entirely on content-based chunking?
|
| * How exactly are you going to garbage-collect the data from
| no-longer-active backups?
|
| * Does your filesystem expose an _accurate_ "this file changed"
| signal, or better an actual hash? Does it support chunk
| sharing? Do you know how those APIs work?
|
| * Are you crossing a kernel version that is one-way
| incompatible?
|
| * Do you have control of the raw filesystem at the other side?
| (e.g. the most efficient backup for btrfs is only possible with
| this)
| sandreas wrote:
| Nice writeup... Although I'm missing a few points...
|
| In my opinion a backup (system) is only good if it has been
| tested to be restorable as fast as possible, and the procedure
| is clear (as in: documented).
|
| How often have I heard or seen backups that "work great" and
| "oh, no problem, we have them", only to see them fail or take
| ages to restore once disaster has struck (2 days can be an
| expensive amount of time in a production environment). All too
| often only parts could be restored.
|
| Another missing aspect is within the snapshots section... I like
| restic, which provides repository based backup with deduplicated
| snapshots for FILES (not filesystems). It's pretty much what you
| want if you don't have ZFS (or other reliable snapshot based
| filesystems) to keep different versions of your files that have
| been deleted on the filesystem.
|
| The last aspect is only partly mentioned: prefer PULL over
| PUSH. Ransomware is really clever these days, and if you PUSH
| your backups, it can also encrypt or delete all of them,
| because it has access to everything... So either use readonly
| media (like Blu-rays), or PULL is mandatory. It is also helpful
| to have auto-snapshotting on ZFS via zfs-auto-snapshot, zrepl
| or sanoid, so you can go back in time to before the ransomware
| started its journey.
| sgc wrote:
| Since you mentioned restic, is there something wrong with using
| restic append-only with occasional on-server pruning instead of
| pulling? I thought this was the recommended way of avoiding
| ransomware problems using restic.
| sandreas wrote:
| There are several methods... there is also restic rest-server
| (https://github.com/restic/rest-server). I personally use ZFS
| with pull via ssh...
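For the append-only setup the parent comment asks about, rest-server has a dedicated flag (paths and host names are examples):

```shell
# Serve restic repositories in append-only mode: clients can add
# new snapshots but cannot delete or overwrite existing data.
rest-server --path /srv/restic-repos --append-only

# Clients then back up against it as usual:
restic -r rest:http://backup-host:8000/myrepo backup /home
```

Pruning is then done separately, on the server itself, with direct repository access.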
| TacticalCoder wrote:
| > So you could either use readonly media (like Blurays) or PULL
| is mandatory.
|
| Or like someone already commented you can use a server that
| allows push but doesn't allow to mess with older files. You can
| for example restrict ssh to only the _scp_ command, and the ssh
| server can moreover offer a chroot'ed environment to which scp
| shall copy the backups. And the server can, for example, rotate
| that chroot daily.
|
| The client can then do one thing: push daily backups. It cannot
| log in. It cannot overwrite older backups.
|
| Short of a serious SSH exploit where the ransomware could both
| re-configure the server to accept all ssh (and not just scp)
| and escape the chroot box, the ransomware is simply not
| destroying data from before the ransomware found its way on the
| system.
|
| My backup procedure does that for the one backup server that I
| have on a dedicated server: a chroot'ed ssh server that only
| accepts scp and nothing else. It's of course just one part of
| the backup procedure, not the only thing I rely on for backups.
|
| P.S: it's not incompatible with also using read-only media
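The server-side half of this can be sketched as an sshd_config stanza (user and paths are assumptions; note that modern OpenSSH implements this kind of restriction via the SFTP subsystem, which recent scp clients also use):

```
# /etc/ssh/sshd_config -- lock the backup account into a chroot
# and allow only file transfer, never an interactive login.
Match User pushbackup
    ChrootDirectory /srv/backup-jail
    ForceCommand internal-sftp
    AllowTcpForwarding no
    X11Forwarding no
```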
| anonymars wrote:
| I don't understand why this is dead... is it wrong advice? Is
| there some hidden flaw? Is it simply because the content is
| repeated elsewhere?
|
| On the face of it "append-only access (no changes)" seems
| sound to me
| quesera wrote:
| TacticalCoder's comments appear to be auto-deaded for the
| last week or so.
|
| I did not see a likely reason in a quick review of their
| comment history.
|
| You can view a comment directly by following the "... ago"
| link, and from there you can use the "vouch" link to revive
| the comment. I vouched for a few of TacticalCoder's recent
| comments.
| marcusb wrote:
| > Ransomware is really clever these days and if you PUSH your
| backups, it can also encrypt or delete all your backups,
| because it has access to everything
|
| That depends on how you have access to your backup servers
| configured. I'm comfortable with append-only backup enforcement
| for push backups[0] with Borg and Restic via SSH, although I do
| use offline backup drive rotation as a last line of defense for
| my local backup set. YMMV.
|
| 0 - https://marcusb.org/posts/2024/07/ransomware-resistant-
| backu...
| guillem_lefait wrote:
| Could you elaborate on your strategy to rotate your disks ?
| marcusb wrote:
| It's pretty simple: the backup host has the backup disk
| attached via a usb cradle. There's a file in the root
| directory of the backup disk file system that gets touched
| when the drive is rotated. A cron jobs emails me if this
| file is more than 3 months old. When I rotate the disk, I
| format the new disk and recreate the restic repos for the
| remote hosts. I then move the old disk into a fireproof
| safe. I keep four drives in rotation, so at any given point
| in time I have the online drive plus three with
| progressively older backup sets in the safe.
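That check amounts to a single find in a daily cron job (marker path, threshold, and address are examples):

```shell
#!/bin/sh
# Daily cron job: nag me when the online backup disk is overdue
# for rotation. The marker file is touched each time a freshly
# formatted disk goes into the cradle.
MARKER=/mnt/backupdisk/.last-rotated
MAX_DAYS=90

# find prints the marker only if its mtime is older than MAX_DAYS.
if [ -n "$(find "$MARKER" -mtime +"$MAX_DAYS" 2>/dev/null)" ]; then
    echo "Backup disk last rotated more than $MAX_DAYS days ago." \
        | mail -s "Rotate the backup disk" admin@example.org
fi
```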
| KPGv2 wrote:
| > tested to be restorable as fast as possible
|
| That depends on your goal, right? If it took me six months to
| recover my family photo backups, that'd be fine by me.
| daneel_w wrote:
| My valuable data is less than 100 MiB. I just
| tar+compress+encrypt a few select directories/files twice a week
| and keep a couple of months of rotation. No incremental hassle
| necessary. I store copies at home and I store copies outside of
| home. It's a no-frills setup that costs nothing, is just a few
| lines of *sh script, takes care of itself, and never really
| needed any maintenance.
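A sketch of what such a script can look like (directory list, passphrase handling, and the two-month retention window are assumptions, not the commenter's actual setup):

```shell
#!/bin/sh
# Twice-weekly: tar + compress + encrypt a few selected
# directories, with dated names so old archives rotate out.
DIRS="$HOME/keychain $HOME/documents $HOME/projects"
OUT="/mnt/backups/valuables-$(date +%Y%m%d).tar.gz.gpg"

tar -czf - $DIRS | gpg --batch --symmetric \
    --passphrase-file "$HOME/.backup-pass" -o "$OUT"

# Keep roughly two months of archives.
find /mnt/backups -name 'valuables-*.tar.gz.gpg' -mtime +62 -delete
```

Copies of the resulting file then go to the on-site and off-site locations.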
| mavilia wrote:
| This comment made me rethink what I have that is actually
| valuable data. My photos alone even if culled down to just my
| favorites would probably be at least a few gigs. Contacts from
| my phone would be small. Other than that I guess I wouldn't be
| devastated if I lost anything else. Probably should put my
| recovery keys somewhere safer but honestly the accounts most
| important to me don't have recovery keys.
|
| Curious what you consider valuable data?
|
| Edit: I should say for pictures I have around 2 TB right now
| (downside of being a hobby photographer)
| daneel_w wrote:
| With valuable I should've elaborated that it's my set of
| constantly changing daily-use data. Keychain, documents and
| notes, e-mail, bookmarks, active software projects, those
| kinds of things.
|
| I have a large amount of memories and "mathom" as well, in
| double copies, but I connect and add to this data so rarely
| that it absolutely does not have to be part of any ongoing
| backup plan.
| mystifyingpoi wrote:
| With photos, it is kinda different story. If I lost 50% of my
| last vacation photos, I would probably not even notice when
| scrolling through them. It makes me very nostalgic for analog
| cameras, where my parents would have to think strategically,
| how to use 30 or so slots on the analog film for 7 day trip.
| rossant wrote:
| If you die suddenly tomorrow, what would you want your family
| to recover? What would you want your grandchildren to have
| access to in a few decades? That's your valuable data. They may
| not need or want to inherit from hundreds of thousands of
| files. Chances are that a few key photos, videos, and text
| would be enough.
| daneel_w wrote:
| I do have a contingency plan for all my digital memories and
| my online accounts etc.
| progbits wrote:
| > One way is to ensure that machines that must be backed up via
| "push" [..] can only access their own space. More importantly,
| the backup server, for security reasons, should maintain its own
| filesystem snapshots for a certain period. In this way, even in
| the worst-case scenario (workload compromised -> connection to
| backup server -> deletion of backups to demand a ransom), the
| backup server has its own snapshots
|
| My preferred solution is to let client only write new backups,
| never delete. The deletion is handled separately (manually or
| cron on the target).
|
| You can do this with rsync/ssh via the allowed command feature in
| .ssh/authorized_keys.
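With rsync over ssh, the restriction is a `command=` entry in the backup account's `~/.ssh/authorized_keys` on the target (key abbreviated; `rrsync` is the restricted-rsync helper script shipped with rsync, and the path to it varies by distribution):

```
# The key may only run rsync in server mode, write-only,
# confined to this host's own directory:
command="/usr/bin/rrsync -wo /srv/backups/host1",restrict ssh-ed25519 AAAA... backup@host1
```

Server-side snapshots then provide the retention and deletion policy.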
| haiku2077 wrote:
| This is also why I use rclone copy instead of rclone sync for
| my backups, using API keys without permission to delete
| objects.
| 3eb7988a1663 wrote:
| I fall into the "pull" camp so this is less of a worry. The
| server to be backed-up should have no permissions to the backup
| server. If an attacker can root your live server (with more
| code/services to exploit), they do not automatically also gain
| access to the backup system.
| amelius wrote:
| I also implemented my backup scheme using "pull" as it is
| easier to do than an append-only system, and therefore
| probably more secure as there is less room for mistakes. The
| backup server can only be accessed through a console
| directly, which is a bit annoying sometimes, but at least it
| writes summaries back to the network.
| bobek wrote:
| It is not particularly hard either. Checkout restic server.
|
| https://github.com/restic/rest-server/
| godelski wrote:
| Another thing you can do is just run a container or a specific
| backup user. Something like systemd-nspawn can give you
| a pretty lightweight chroot "jail", and you can ensure that
| anyone inside that jail can't run any rm commands.
        pacman -S arch-install-scripts    # Need this package (for debian you need debootstrap)
        pacstrap -c /mnt/backups/TestSpawn base    # Makes chroot
        systemd-nspawn -D /mnt/backups/TestSpawn   # Logs in
        passwd    # Set the root password. Do whatever else you need, then exit
        sudo ln -s /mnt/backups/TestSpawn /var/lib/machines/TestSpawn
        sudo machinectl start TestSpawn   # Congrats, you can now control with machinectl
|
| Configs work like normal systemd stuff. So you can limit access
| controls, restrict file paths, make the service boot only at
| certain times or activate based on listening on a port, make it
| only accessible via 192.168.1.0/24 (or 100.64.0.0/10), limit
| memory/CPU usage, or whatever you want. (I also like to use
| BTRFS subvolumes.) You could also go systemd-vmspawn for a full
| VM if you really wanted to.
|
| Extra nice, you can use importctl to then replicate.
| zeec123 wrote:
| > My preferred solution is to let client only write new
| backups, never delete.
|
| I wish for syncoid to add this feature. I want it to only copy
| snapshots to the backup server. The server then deletes old
| snapshots. At the moment it requires delete permissions.
| KAMSPioneer wrote:
| You can do this by using a dedicated syncoid user and ZFS
| delegated permissions: https://openzfs.github.io/openzfs-
| docs/man/master/8/zfs-allo...
|
| You'll need to add the --no-elevate-permissions flag to your
| syncoid job.
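Concretely, the delegation is one `zfs allow` on each side (user and dataset names are examples):

```shell
# Source host: the dedicated syncoid user may snapshot and send,
# but has no destroy permission.
zfs allow syncuser send,snapshot,hold tank/data

# Backup host: the receiving user may create and receive datasets,
# again without destroy, so pruning stays a root-only job there.
zfs allow syncuser create,mount,receive backuppool/data
```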
| dspillett wrote:
| I do both. It requires two backup locations, but I want that
| anyway. My backup sources push to an intermediate location and
| the primary backups pull from there. The intermediate location
| is smaller so can hold less, but does still keep snapshots.
|
| This means that neither my backup sources nor the main backup
| sinks need to authenticate with each other, in fact I make sure
| that they can't, they can only authenticate with the
| intermediate and it can't authenticate with them [0]. If any one
| or two of the three parts is compromised there is a chance that
| the third will be safe. Backing up the credentials for all this
| is handled separately to make sure I'm not storing the keys to
| the entire kingdom on any internet connectable hosts. The few
| bits of data that I have that are truly massively important are
| backed up with extra measures (including an actual offline
| backup) on top.
|
| With this separation, verifying backups requires extra steps
| too. The main backups occasionally verify checksums of the data
| they hold, and send a copy of the hashes for the latest backup
| back to the intermediate host(s), where they can be read back to
| compare against hashes generated [1] at the sources [2] in order
| to detect certain families of corruption issues.
|
| --------
|
| [0] I think of the arrangement as a soft-offline backup,
| because like an offline backup nothing on the sources can
| (directly) corrupt the backup snapshots at the other end.
|
| [1] These are generated at backup time, to reduce false alerts
| from files modified soon after they are read for sending to the
| backups.
|
| [2] The hashes are sent to the intermediate, so the comparison
| could be done there and in fact I should probably do that as
| it'll make sending alerts when something seems wrong more
| reliable, but that isn't how I initially set things up and I've
| not done any major renovations in ages.
| bob1029 wrote:
| I think the cleanest, most compelling backup strategies are those
| employed by RDBMS products. [A]sync log replication is really
| powerful at taking any arbitrary domain and making sure it exists
| in the other sites exactly.
|
| You might think this is unsuitable for your photo/music/etc.
| collection, but there's no technical reason you couldn't use the
| database as the primary storage mechanism. SQLite will take you
| to ~281 terabytes with a 64k page size. MSSQL supports something
| crazy like 500 petabytes. The blob data types will choke on your
| 8K Avengers rip, but you could store it in 1 gig chunks - there
| are probably other benefits to this anyway.
| rs186 wrote:
| It works in theory, but usability is almost non-existent with
| this approach unless someone creates an app that interacts with
| this database and provides file system-like access to users.
| Any normal human would be better off with Dropbox or Google
| Drive.
|
| Almost like GMail Drive back in the day but worse.
|
| https://en.m.wikipedia.org/wiki/GMail_Drive
| kernc wrote:
| Make your own backup system -- that's exactly what I did. I
| felt git porcelain had a stable-enough API to accommodate this
| popular use case.
|
| https://kernc.github.io/myba/
| binwiederhier wrote:
| Thank you for sharing. A curious read. I am looking forward to
| the next post.
|
| I've been working on backup and disaster recovery software for 10
| years. There's a common phrase in our realm that I feel obligated
| to share, given the nature of this article.
|
| > "Friends don't let friends build their own Backup and Disaster
| Recovery (BCDR) solution"
|
| Building BCDR is notoriously difficult and has many gotchas. The
| author hinted at some of them, but maybe let me try to drive some
| of them home.
|
| - Backup is not disaster recovery: In case of a disaster, you
| want to be up and running near-instantly. If you cannot get back
| up and running in a few minutes/hours, your customers will lose
| your trust and your business will hurt. Being able to restore a
| system (file server, database, domain controller) with minimal
| data loss (<1 hr) is vital for the survival of many businesses.
| See Recovery Time Objective (RTO) and Recovery Point Objective
| (RPO).
|
| - Point-in-time backups (crash consistent vs application
| consistent): A proper backup system should support point-in-time
| backups. An "rsync copy" of a file system is not a point-in-time
| backup (unless the system is offline), because the system changes
| constantly. A point-in-time backup is a backup in which each
| block/file/.. maps to the same exact timestamp. We typically
| differentiate between "crash consistent backups" which are
| similar to pulling the plug on a running computer, and
| "application consistent backups", which involves asking all
| important applications to persist their state to disk and freeze
| operations while the backup is happening. Application consistent
| backups (which is provided by Microsoft's VSS, as mentioned by
| the author) significantly reduce the chances of corruption. You
| should never trust an "rsync copy" or even crash consistent
| backups.
|
| - Murphy's law is really true for storage media: My parents put
| their backups on external hard drives, and all of r/DataHoarder
| seems to buy only 12T HDDs and put them in a RAID0. In my
| experience, hard drives of all kinds fail all the time (though
| NVMe SSD > other SSD > HDD), so having backups in multiple places
| (3-2-1 backup!) is important.
|
| (I have more stuff I wanted to write down, but it's late and the
| kids will be up early.)
| sebmellen wrote:
| Also if you have a NAS, don't use the same hard drive type for
| both.
| poonenemity wrote:
| Ha. That quote made me chuckle; it reminded me of a performance
| by the band Alice in Chains, where a similar quote appeared.
|
| Re: BCDR solutions, they also sell trust among B2B companies.
| Collectively, these solutions protect billions, if not
| trillions of dollars worth of data, and no CTO in their right
| mind would ever allow an open-source approach to backup and
| recovery. This is primarily also due to the fact that backups
| need to be highly available. Scrolling through a snapshot list
| is one of the most tedious tasks I've had to do as a sysadmin.
| Although most of these solutions are bloated and violate
| userspace like nobody's business, it is ultimately the
| company's reputation that allows them to sell products.
| Although I respect Proxmox's attempt at capturing the Broadcom
| fallout, I could go on at length about why it may not be able
| to permeate the B2B market, but it boils down to a simple
| formula (not educational, but rather from years of field
| experience):
|
| > A company's IT spend grows linearly with valuation up to a
| threshold, then increases exponentially between a certain
| range, grows polynomially as the company invests in vendor-
| neutral and anti-lock-in strategies, though this growth may
| taper as thoughtful, cost-optimized spending measures are
| introduced.
|
| - Ransomware Protection: Immutability and WORM (Write Once Read
| Many) backups are critical components of snapshot-based backup
| strategies. In my experience, legal issues have arisen from
| non-compliance in government IT systems. While "ransomware" is
| often used as a buzzword by BCDR vendors to drive sales, true
| immutability depends on the resiliency and availability of the
| data across multiple locations. This is where the 3-2-1 backup
| strategy truly proves its value.
|
| Would like to hear your thoughts on more backup principles!
| koolba wrote:
| > An "rsync copy" of a file system is not a point-in-time
| backup (unless the system is offline), because the system
| changes constantly. A point-in-time backup is a backup in
| which each block/file/.. maps to the same exact timestamp.
|
| You can do this with some extra steps in between.
| Specifically you need a snapshotting file system like zfs.
| You run the rsync on the snapshot to get an atomic view of
| the file system.
|
| Of course if you're using zfs, you might just want to export
| the actual snapshot at that point.
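The extra steps amount to snapshotting, rsyncing from the snapshot's hidden mountpoint, then dropping the snapshot (dataset, paths, and target host are examples):

```shell
#!/bin/sh
# Point-in-time rsync: freeze the filesystem view first.
DATASET=tank/home
SNAP="backup-$(date +%Y%m%d%H%M)"

zfs snapshot "$DATASET@$SNAP"
# Every ZFS dataset exposes read-only snapshots under .zfs/snapshot/.
rsync -a "/tank/home/.zfs/snapshot/$SNAP/" backuphost:/srv/backups/home/
zfs destroy "$DATASET@$SNAP"
```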
| sudobash1 wrote:
| Unless you are doing more steps, that is still just a crash
| consistent backup. Better than plain rsync, but still not
| ideal.
| KPGv2 wrote:
| > having backups in multiple places (3-2-1 backup!) is
| important
|
| Yeah and for the vast majority of individual cybernauts, that
| "1" is almost unachievable without paying for a backup service.
| And at that point, why are you doing _any_ of it yourself
| instead of just running their rolling backup + snapshot app?
|
| There isn't a person in the world who lives in a different
| city from me (a nearby "1" isn't protection when there's a
| tornado or flood or wildfire) whom I'd ask to run a computer
| 24/7 and do maintenance on it when it breaks down.
| danlitt wrote:
| My solution for this has been to leave a machine running in
| the office (in order to back up my home machine). It doesn't
| really need to be on 24/7, it's enough to turn it on every
| few days just to pull the last few backups.
| justsomehnguy wrote:
| If you aren't at CERN level of data - you can always rent a
| VPS/dedicated server for this.
|
| It's a matter of the value of your data. Or how much it would
| cost you to lose it.
| Spivak wrote:
| > You should never trust an "rsync copy" or even crash
| consistent backups.
|
| This leads you to the secret forbidden knowledge that you only
| need to back up your database(s) and file/object storage.
| Everything else can be, or has to be depending on how strong
| that 'never' is, recreated from your provisioning tools. All
| those Veeam VM backups some IT folks hoard like dragons are
| worthless.
| kijin wrote:
| Exactly. There is no longer any point in backing up an entire
| "server" or a "disk". Servers and disks are created and
| destroyed automatically these days. It's the database that
| matters, and each type of database has its own tooling for
| creating "application consistent backups".
| mekster wrote:
| For regular DB like MySQL/PostgreSQL, just snapshot on zfs
| without thinking.
| binwiederhier wrote:
| Databases these days are pretty resilient to restoring
| from crash consistent backups like that, so yes, you'll
| likely be fine. It's a good enough approach for many
| cases. But you can't be sure that it really recovers.
|
| However, ZFS snapshots alone are not a good enough backup
| if you don't off-site them somewhere else. A
| server/backplane/storage controller could die or corrupt
| your entire zpool, or the place could burn down. Lots of
| ways to fail. You gotta at least zfs send the snapshots
| somewhere.
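| For completeness, off-siting snapshots can be a one-liner per
| dataset; a rough sketch with made-up pool and host names:

```shell
# Replicate the newest snapshot of a dataset to a remote pool with
# zfs send/receive. All names here are illustrative.
offsite_send() {
  local dataset="$1" remote="$2" target="$3"
  local snap="daily-$(date +%Y-%m-%d)"
  zfs snapshot "${dataset}@${snap}"
  # The first run is a full send; later runs should use -i <prev-snap>
  # so only the delta crosses the wire.
  zfs send "${dataset}@${snap}" | ssh "$remote" zfs receive -u "$target"
}
# e.g. offsite_send tank/data backuphost backup/data
```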
| binwiederhier wrote:
| This strongly depends on your environment and on your
| RTO/RPO.
|
| Sure, there are environments that have automatically
| deployed, largely stateless servers. Why back them up if
| you can recreate them in an hour or two ;-)
|
| Even then, though, if we're talking about important
| production systems with an RTO of only a few minutes, then
| having a BCDR solution with instant virtualization is worth
| its weight in gold. I may be biased though, given that I
| professionally write BCDR software, hehe.
|
| However, many environments are not like that: There are
| lots of stateful servers out there with bespoke
| configurations, lots of "the customer needed this to be
| that way and it doesn't fit our automation". Having all
| servers backed up the same way gives you peace of mind if
| you manage servers for a living. Being able to just spin up
| a virtual machine of a server and run things from a backup
| while you restore or repair the original system is truly
| magical.
| mekster wrote:
| The 3-2-1 rule is old. We have far more flexibility in where
| we can put data than before cloud servers existed.
|
| I'd at least keep local file system snapshots for easy
| recovery from manual mistakes, copy the data to a remote
| location using implementation A and snapshot it there too, and
| copy the same data to another location using implementation B
| and snapshot it there as well. That way you not only get
| durability; bugs in any one backup implementation are also
| mitigated.
|
| zfs is a godsend for this, and I use Borg as the secondary
| implementation, which seems enough for almost any disaster.
| Shank wrote:
| > Security: I avoid using mainstream cloud storage services like
| Dropbox or Google Drive for primary backups. Own your data!
|
| What does this have to do with security? You shouldn't be backing
| up data in a way that's visible to the server. Use something like
| restic. Do not rely on the provider having good security.
| inopinatus wrote:
| Perhaps Part 1 ought to be headlined, "design the restore
| system", this being the part of backup that actually matters.
| kayson wrote:
| The thing that always gets me about backup consistency is that
| it's impossibly difficult to ensure that application data is in a
| consistent state without bringing everything down. You can create
| a disk snapshot, but there's no guarantee that some service isn't
| mid-write or mid-procedure at the point of the snapshot. So if
| you were to restore the backup from the snapshot you would
| encounter some kind of corruption.
|
| Database dumps help with this, to a large extent, especially if
| the application itself is making the dumps at an appropriate
| time. But often you have to make the dump outside the
| application, meaning you could hit it in the middle of a sequence
| of queries.
|
| Curious if anyone has useful tips for dealing with this.
| booi wrote:
| I think generally speaking, databases are resilient to this so
| taking a snapshot of the disk at any point is sufficient as a
| backup. The only danger is if you're using some sort of on-
| controller disk cache with no battery backup, then basically
| you're lying to the database about what has flushed and there
| can be inconsistencies on "power failure" (i.e. live snapshot).
|
| But for the most part, and especially in the cloud, this
| shouldn't be an issue.
| Jedd wrote:
| It's not clear if there are other places that application state
| is being stored, outside your database, that you need to
| capture. Do you mean things like caches? (I'd hope not.)
|
| pg_dump / mysqldump both solve the problem of snapshotting your
| live database _safely_ , but can introduce some bloat /
| overhead you may have to deal with somehow. All pretty well
| documented and understood though.
|
| For larger postgresql databases I've sometimes adopted the
| other common pattern of a read-only replica dedicated for
| backups: you pause replication, run the dump against that
| backup instance (where you're less concerned about how long
| that takes, and what cruft it leaves behind that'll need
| subsequent vacuuming) and then bring replication back.
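| As a sketch, the dump itself is a one-liner; pg_dump opens a
| single consistent snapshot internally, so it is safe against a
| live database (names and paths below are hypothetical):

```shell
# Dump a PostgreSQL database in custom format (compressed, selectively
# restorable with pg_restore). Database name and output dir are examples.
dump_db() {
  local db="$1" outdir="$2"
  pg_dump -Fc --no-owner "$db" > "${outdir}/${db}-$(date +%F).dump"
}
# e.g. dump_db mydb /backups
# restore later with: pg_restore -d mydb_restored /backups/mydb-<date>.dump
```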
| Jedd wrote:
| Feels weird to talk about strategy for your backups without
| mentioning RPO, RTO, or even RCO - even though some of those
| concepts are nudged up against in TFA.
|
| Those terms are handy for anyone not familiar with the space to
| go do some further googling.
|
| Also odd to not note the distinction between backups and archives
| - at least in terms, of what users' expectations are around the
| two terms / features - or even mention archiving.
|
| (How fast can I get back to the most recent fully-functional
| state, vs how can I recover a file I was working on last Tuesday
| but deleted last Wednesday.)
| godelski wrote:
| > without mentioning RPO, RTO, or even RCO
|
| > Those terms are handy for anyone not familiar with the space
| to go do some further googling.
|
| You should probably get people started:
| RPO: Recovery Point Objective
| RTO: Recovery Time Objective
| RCO: Recovery Consistency Objective
|
| I'm pretty sure they aren't mentioned because these aren't
| really necessary for doing self-hosted backups. Do we really
| care much about how fast we recover files? Probably not. At
| least not more than that they exist and we can restore them.
| For a business, yeah, recovery time is critical as that's
| dollars lost.
|
| FWIW, I didn't know these terms until you mentioned them, so
| I'm not an expert. Please correct me if I'm misunderstanding or
| being foolishly naive (very likely considering the previous
| statement). But as I'm only in charge of personal backups,
| should I really care about this stuff? My priorities are that I
| have backups and that I can restore. A long running rsync is
| really not a big issue. At least not for me.
|
| https://francois-encrenaz.net/what-is-cloud-backup-rto-rpo-r...
| Jedd wrote:
| Fair that I should have spelled them out, though my point was
| that TFA touched on some of the considerations that are
| covered by those fundamental and well known concepts / terms.
|
| Knowing the jargon for a space makes it easier to find more
| topical information. Searching on those abbreviations would
| be sufficient, anyway.
|
| TFA talks about the right questions to consider when planning
| backups (but not archives) - eg 'What downtime can I tolerate
| in case of data loss?' (that's your RTO, effectively).
|
| I'd argue the concepts encapsulated in those TLAs - even if
| they sound a bit enterprisey - are important for planning
| your backups, with 'self-hosted' not being an exception _per
| se_, just having different numbers.
|
| Sure, as you say, 'Do we really care about how fast we
| recover files?' - perhaps you don't need things back in an
| hour, but you _do_ have an opinion about how long that should
| take, don't you?
|
| You also ask 'should I really care about this stuff?'
|
| I can't answer that for you, other than turn it back to 'What
| losses are you happy to tolerate, and what costs / effort are
| you willing to incur to mitigate?'. (That'll give you a rough
| intersection of two lines on your graph.)
|
| This pithy aphorism exists for a good reason :)
|
| > There are two types of people: those who have lost data,
| and those who do backups.
| gchamonlive wrote:
| As good a time as ever for a shameless plug.
|
| For my archlinux setup, configuration and backup strategy:
| https://github.com/gchamon/archlinux-system-config
|
| For the backup system, I've cooked an automation layer on top of
| borg: https://github.com/gchamon/borg-automated-backups
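| The core of such a layer is small; a hedged sketch of the borg
| calls it wraps (repository path and retention counts below are
| placeholders, not taken from the repos above):

```shell
# Create a deduplicated borg archive and enforce a retention policy.
# Repo path and retention numbers are illustrative examples.
borg_backup() {
  local repo="$1" src="$2"
  # {hostname} and {now} are borg's built-in archive-name placeholders.
  borg create --stats "${repo}::{hostname}-{now}" "$src"
  borg prune --keep-daily 7 --keep-weekly 4 --keep-monthly 6 "$repo"
}
# e.g. borg_backup /mnt/backup/borg-repo /home
```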
| topspin wrote:
| I built a disaster recovery system using python and borg. It
| snapshots 51 block devices on a SAN and then uses borg to
| backup 71 file systems from these snapshots. The entire data
| set is then synced to S3. And yes, I've tested the result at an
| offsite location: recovering file systems to entirely different
| block storage and booting VMs, so I'm confident that it would
| work if necessary, although not terribly quickly, because the
| recovery automation is complex and incomplete.
|
| I can't share it. But if you contemplate such a thing, it is
| possible, and the result is extremely low cost. Borg is pretty
| awesome.
| firesteelrain wrote:
| I run a system that has multi site replication to multiple
| Artifactory instances all replicating from one single Master to
| all Spokes. Each one can hold up to 2PB. While Artifactory
| supports writing to a backup location, given the size of our
| artifacts, we chose to not have an actual backup. Just live
| replication to five different sites. Never have tried to restore
| or replicate back to main. I am not even sure how that would work
| if the spokes are all "*-cache".
|
| Backend storage for each Artifactory instance is Dell Isilon.
| tomheskinen wrote:
| Artifactory is great, and I do something very similar.
| KPGv2 wrote:
| It's a very interesting thought experiment. But, all of this and
| at the end of the day you still need to have a computer running
| in a different city 24/7 for a safe backup (floods and tornados
| will mess up your buddy's house five miles away, too). This is
| why, in the end, I settled for paying for a rolling backup
| service.
| udev4096 wrote:
| PBS for proxmox and restic for anything outside is the best
| combo. Super easy to configure and manage
| senectus1 wrote:
| What's the cheapest place to store offsite backups these days?
|
| I intend to fully encrypt before sending so it _should_ be safe
| from prying eyes from all but the most cashed up nation states
| :-P
| vaylian wrote:
| > "Schrodinger's backups" (i.e., never tested, thus both valid
| and invalid at the same time)
|
| What are some good testing strategies?
| k1t wrote:
| Occasionally you need to try restoring from your backups.
|
| Obviously a full restore gives you full confidence, and it goes
| down from there.
|
| Ideally try to restore about 10% of your content every
| month, but really it depends on how high stakes your backups
| are.
| vaylian wrote:
| Where do you restore to? Do you restore to a spare computer
| or do you restore into some isolated part (folder) of the
| production system from which the backup was originally taken?
|
| And to which extent can this be automated, so that the backup
| gets automatic health checks?
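| One way to automate such a health check is a scratch restore
| plus a checksum comparison against the source; a plain shell
| sketch (paths are placeholders):

```shell
# Compare per-file SHA-256 sums of a source tree and a scratch restore.
# Exits non-zero (and prints the diff) if any file differs or is missing.
verify_restore() {
  local src="$1" restored="$2"
  diff \
    <(cd "$src" && find . -type f -exec sha256sum {} + | sort) \
    <(cd "$restored" && find . -type f -exec sha256sum {} + | sort)
}
# e.g. verify_restore /data /tmp/scratch-restore
```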
| orhmeh09 wrote:
| Just use restic. It handles these things.
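| For restic specifically, a periodic integrity pass plus a test
| restore can be scripted; a sketch (repository path and scratch
| directory are placeholders):

```shell
# Verify a restic repository and do a test restore of the latest snapshot.
# --read-data-subset re-reads a sample of pack files from storage, which
# catches silent corruption without the cost of reading everything.
restic_health_check() {
  local repo="$1" scratch="$2"
  restic -r "$repo" check --read-data-subset=10%
  restic -r "$repo" restore latest --target "$scratch"
}
# e.g. restic_health_check /mnt/backup/restic-repo /tmp/restore-test
```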
| kbr2000 wrote:
| Dirvish [0] is worth looking at, light-weight and providing a
| good set of functionality (rotation, incremental backups,
| retention, pre/post scripts). It is a scripted wrapper around
| rsync [1] so you profit from all that functionality too (remote
| backups, compression for limited links, metadata/xattr support,
| various sync criteria, etc.)
|
| This has been a lifesaver for 20+ years, thanks to JW Schultz!
|
| The questions/topics in the article go really well along with it.
|
| [0] https://dirvish.org/ [1] https://rsync.samba.org/
| zelphirkalt wrote:
| What does dirvish do better or simpler than rsync?
| kbr2000 wrote:
| It permits you to configure more complicated backups more
| easily. You can inherit and override rules, which is handy if
| you need to run, for example, hundreds of similar-style
| backups with small exceptions. The same goes for
| include/exclude patterns, which quickly get complicated with
| just rsync.
|
| It generates indices for its backups that allow you to search
| for files over all snapshots taken (which gives you an
| overview of which snapshots contain some file for you to
| retrieve/inspect). See dirvish-locate.
|
| Does expiration of snapshots, given your retention strategy
| (encoded in rules, see dirvish.conf and dirvish-expire).
|
| It consistently creates long rsync commandlines you would
| otherwise need to do by hand.
|
| In the end you get one directory per snapshot, giving a
| complete view over what got backed up. Unchanged files are
| hard-linked thus limiting backup storage consumption. Changed
| files are stored. But each snapshot has the whole backed up
| structure in it so you could rsync it back at restore time
| (or pick selectively individual files if needed). Hence the
| "virtual".
|
| Furthermore: backup reporting (summary files) that can be
| piped into an e-mail or turned into a webpage, good and
| simple documentation, and pre/post scripts (these turn out to
| be really useful for DB dumps before taking a backup, etc.)
|
| You'll still need to take care of all other aspects of
| designing your backup storage (SAS
| controllers/backplanes/cabling, disks, RAID, LVM2, XFS, ...)
| and networking (10 GbE, switching, routing if needed, ...) if
| you need that (works too for only local though). Used this
| successfully in animation film development as an example,
| where it backed up hundreds of machines and centralized
| storage for a renderfarm, about 2 PBytes worth (with Coraid
| and SuperMicro hardware). Rsync traversing the filesystem to
| find out changes could be challenging at times with enormous
| FS (even based on only the metadata), but for that we created
| other backup jobs that where fed with specific file-lists
| generated by the renderfarm processes, thus skipping the
| search for changes...
| crinkly wrote:
| Lazy solution here that has worked fine forever through a
| complete hardware failure and burglary. Scratch disk inside
| desktop. External disk kept in house. External disk kept off
| site. All external disks are Samsung T7 Shield.
|
| Robocopy /MIR daily to scratch or after I've done something
| significant. Weekly to external disk. Swap external disk offsite
| every 1 month.
| chrisandchris wrote:
| > All external disks are Samsung T7 Shield
|
| And make sure to use at least a different batch or, better, a
| different model. Disks from the same batch or model tend to
| fail at the same time (usually when you need to restore data
| and the disk is under heavy load).
| crinkly wrote:
| Not a terrible idea that. Thank you. I will check dates,
| firmware versions and serial numbers to see.
| HankB99 wrote:
| Coincidentally the 2.5 admins podcast just published an episode
| on ZFS basics: Why ZFS https://2.5admins.com/2-5-admins-256/
|
| ZFS relates to backups. One of the many things I like about
| ZFS is that it preserves hard links, which I used to reduce
| the space requirements for my primary `rsync` backup but which
| `rsync` blew up copying to my remote backup. (Yes, rsync's -H
| switch preserves hard links, but it is not sufficiently
| performant for this application.)
|
| (Episode #256 which is a number that resonates with many of us.
| ;) )
___________________________________________________________________
(page generated 2025-07-20 23:02 UTC)