[HN Gopher] Kopia - Fast and Secure Open-Source Backup
___________________________________________________________________
Kopia - Fast and Secure Open-Source Backup
Author : chetangoti
Score : 132 points
Date : 2021-06-11 11:40 UTC (11 hours ago)
(HTM) web link (kopia.io)
(TXT) w3m dump (kopia.io)
| encryptluks2 wrote:
| Can I get some recent performance benchmarks vs restic? I only
| recently switched to restic and for the most part it is great
| (definitely an improvement from CrashPlan IMO), but it seems like
| this might actually have a few improvements to make it worth
| considering switching from restic.
| ntolia wrote:
| I ran some performance benchmarks not too long ago.
| https://blog.kasten.io/benchmarking-kopia-architecture-scale...
| brimstedt wrote:
| Thanks.
|
| It would have been interesting to see restore times as well.
|
| It's quite important whether it will take you an hour or a day to
| restore.
| rkalla wrote:
| Any discussion about backup should always include a shout out to
| one of the most beautiful versions of this: bvckup2
|
| About 10 years ago the founder just decided to start writing the
| most streamlined, beautiful pieces of software he could -
| obsession around NTFS nuances to improve performance and reduce
| overhead to an unbelievable degree.
|
| 10 years later, it's still him (and maybe a few other folks) and
| the software is unbelievably polished.
| dsego wrote:
| On Mac the most polished is Bombich's Carbon Copy Cloner, it's
| a thing of beauty. I might go back to Mac just for that piece
| of software alone.
| Jiejeing wrote:
| I believe you about polish and performance, but that solution
| is neither open-source nor cross-platform (unless I failed to
| find it on the website).
| lucb1e wrote:
| Nor free. The download button says a 2-week trial is
| included.
|
| Which is okay, but licensing is enough of a hassle when
| restic etc. exist that I'm not going to bother with that for
| my systems. A design goal of restic (sorry I'm just familiar
| with that one, not affiliated with it) is also
| recoverability: if your repository gets horribly corrupted or
| you can't run the software easily anymore then the author
| wanted to be able to recover things still. One of the early
| talks (at a local CCC) explains how to decrypt things
| manually in a few minutes -- obviously he knows what he's
| doing and will be faster than me, but still. Having closed
| source software as an alternative to that... I dunno.
| Jiejeing wrote:
| Yeah, I'm not complaining and it's entirely fair to sell
| your work, but I'm always wary of relying on a closed-
| source (+ licensed) tool for things as critical as backups.
| I may be partial though, having interacted with the Borg
| author in-person a few times at Congress, which convinced
| me I could trust it to not shred my data.
| poronski wrote:
| Clickable - https://www.bvckup2.com
| MikusR wrote:
| It has a gui and works on Linux/Windows/MacOS. Also has both
| deduplication and compression.
| _def wrote:
| Finally! I've been waiting patiently for an open source
| x-platform solution that ticks those boxes. Edit: ah,
| never mind. I just tried it and I'm sure it's great for online
| backups, but it's not so well suited for backups on a plain USB HDD.
| nsriv wrote:
| Pretty sure it allows and works with local USB storage as
| well, covered in documentation here:
| https://kopia.io/docs/repositories/#local-storage.
| forgotpwd16 wrote:
| As a bit of trivia, kopia means copy in Polish, which is the founder's
| mother tongue.
| kemonocode wrote:
| Interesting, I'd be willing to give it a try after I had to
| abandon Duplicati as it just seemed like a lost cause. Right now
| my backup setup consists of a UrBackup server my machines connect
| to, and a borg repository that is synced against the latest
| backup and then sent remotely to S3. It works, but could
| definitely use some streamlining...
| tut-urut-utut wrote:
| Looks interesting. I always liked the borg concept, but never
| actually used it for backups because it lacks windows support and
| a nice gui.
|
| Will give it a try.
| PopeRigby wrote:
| Borg does have a GUI: https://vorta.borgbase.com/
|
| Although whether you find it nice or not is subjective.
| AaronFriel wrote:
| How does this compare to Duplicacy in terms of throughput?
| przemub wrote:
| I love the new tendency to use Polish (and other languages')
| names for programs!
| pivic wrote:
| Indeed! I'm from Sweden, where 'kopia' is Swedish for the
| English word 'copy'. Same meaning in Polish?
| DominikD wrote:
| It means both "copy" and "lance" (as in: weapon used for
| jousting), hence the logo.
| Tade0 wrote:
| The other day I heard someone use the word "zuk" referring to a
| bug, and I think it fits amazingly well.
| coolspot wrote:
| Is it pronounced like Russian "Zhuk"?
| Tade0 wrote:
| Yes, exactly.
| scns wrote:
| https://github.com/qarmin/czkawka
|
| Means hiccup
| lucb1e wrote:
| Another one? Ten years ago I had trouble finding anything, then
| over time I learned of duplicity (2002?! [1]), bup (2010), restic
| (2015), borg (2010)... all basically solving the same problem of
| encrypted incremental backups.
|
| The landing page doesn't mention why they made yet another
| solution. In the comments someone also mentions bvckup2. Is there
| an overview somewhere of all the different solutions? Any selling
| points here?
|
| [1] That's way older than I expected, but it's also the first one I
| found and I stopped using it because it took many gigabytes on my
| then-250GB SSD for a local cache just to be able to do
| incremental backups. Maybe that's why I had the feeling I
| couldn't find anything good at the time.
| MikusR wrote:
| It has a gui and works on Linux/Windows/MacOS. Also has both
| deduplication and compression.
| lucb1e wrote:
| Thanks!
|
| Still a bit strange to me that they started a whole new
| project rather than contribute patches or fork something
| else.
|
| The only one I'm fairly familiar with is restic. It also
| compiles for the major OSes and has deduplication. For
| compression, I think there are patches available, but
| alternatively they could just have contributed one. That
| leaves tying the command-line interface to a few buttons in a
| GUI.
|
| Edit: Kopia turns out to be from 2016. I guess the author
| didn't know of the others yet, or the others weren't as
| mature. This makes a lot more sense; somehow the .io domain
| and only hearing of it now made me expect this was
| written recently.
| MikusR wrote:
| Restic issues for adding GUI and compression are 7 years
| old. If there were interest, or if they were easy to implement,
| that would have been done.
| lucb1e wrote:
| What I meant is that the author of Kopia could have done
| that and made their life a lot easier compared to
| starting all the way from scratch. But I posted that
| before realizing that Kopia is barely a year younger than
| Restic.
|
| But of course it could also just be what u/poronski said
| in a sibling comment. His Noodly holiness knows I make a
| lot of software that already exists just because I enjoy
| the making and having it customized. In fact I think I
| also started... let me `stat` that directory... yup, in
| 2016 I started working on (and abandoned) my own
| implementation of encrypted backups. Also because online
| storage prices were through the roof (~40x the hardware
| cost price with servers and bandwidth included) and I
| thought I could do that cheaper.
| poronski wrote:
| To each their own.
|
| For a lot of people it's way more fun to make a new thing
| than to fork and patch someone else's.
| rglover wrote:
| Easy on the gas, bud.
| klodolph wrote:
| I have been using Kopia for some time now after switching from
| Duplicity. Very happy with Kopia. You can just point Kopia at a
| GCS or S3 bucket and shove files there. Easy to restore files.
| You can list snapshots and files and do partial restores fairly
| easily. Old data gets expired on a timeline that you dictate.
|
| Duplicity was a pain by comparison. I think Duplicity has a
| number of design flaws that become evident once you use it for a
| while.
| foolinaround wrote:
| how would this type of software compare against Dropbox /
| NextCloud?
| jbnorth wrote:
| They're fundamentally different in the problems they solve.
| Dropbox is a cloud file storage and syncing service. NextCloud
| is an open source alternative to something like Dropbox in that
| it offers file sharing and syncing but also much more on top of
| that. It's really closer to something like the Google suite of
| personal cloud services with Google Drive, Photos, Contacts,
| and Calendar. Kopia is a backup solution for the files on your
| computer. You can use cloud file storage providers as the
| destination for these backups but it doesn't handle the storage
| of the backups itself. You have to provide that storage to it.
| z77dj3kl wrote:
| The only thing I need (and which is sorely missing from Restic) is
| for the metadata to be kept separate from the actual data. That way
| I can store the data in AWS S3 Deep Glacier at a cost of next to
| nothing per year, and still do incremental backups. Currently the
| architecture of Restic, for instance, requires all data to be
| quickly and cheaply accessible, which makes this impossible.
|
| I have terabytes of data that I'd be happy to dump encrypted and
| compressed in Deep Glacier and happy to pay $500 to retrieve if I
| were to mess up my hard drives, but otherwise don't want to pay
| for the costs of normal S3.
|
| Does Kopia separate metadata from the actual encrypted/compressed
| blobs?
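One way to picture the separation asked about above is the following minimal Python sketch. It assumes a small local index of chunk IDs (the metadata) kept apart from an archival blob store. `ColdStore`, `backup`, and the 4096-byte `CHUNK_SIZE` are hypothetical names chosen for illustration; this is not a description of how Kopia or restic is implemented.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size, chosen for illustration


class ColdStore:
    """Stand-in for slow archival storage; counts reads and writes."""

    def __init__(self):
        self.blobs = {}
        self.writes = 0
        self.reads = 0

    def put(self, blob_id, data):
        self.blobs[blob_id] = data
        self.writes += 1

    def get(self, blob_id):
        self.reads += 1
        return self.blobs[blob_id]


def backup(data, index, cold):
    """Upload only chunks whose IDs are missing from the hot index.

    Returns the snapshot as an ordered list of chunk IDs.
    """
    snapshot = []
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        chunk_id = hashlib.sha256(chunk).hexdigest()
        if chunk_id not in index:  # decided from metadata alone
            cold.put(chunk_id, chunk)
            index.add(chunk_id)
        snapshot.append(chunk_id)
    return snapshot


if __name__ == "__main__":
    index, cold = set(), ColdStore()
    first = b"".join(bytes([i]) * CHUNK_SIZE for i in range(4))
    backup(first, index, cold)
    backup(first + bytes([9]) * CHUNK_SIZE, index, cold)
    print("blobs written:", cold.writes, "cold reads:", cold.reads)
```

In this sketch the upload decision is made from the index alone, so an incremental run writes only the new chunk and never reads from the cold store.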
| gingerlime wrote:
| I wonder how it compares to restic or borg. Besides the gui
| anyway...
| benrockwood wrote:
| Looks like a polished restic to me.
| isbvhodnvemrwvn wrote:
| Polished*
| StavrosK wrote:
| A hard joke to get, but I liked it.
| ntolia wrote:
| You can see some of the performance differences here -
| https://blog.kasten.io/benchmarking-kopia-architecture-scale...
| jeremyw wrote:
| Note this compares an older restic version that doesn't
| include the order of magnitude improvements in cloud
| communication.
| Scaevolus wrote:
| Restic doesn't support compressing backups, and kopia does.
| Otherwise, the architectures appear to be very similar.
| lucb1e wrote:
| How much space do you typically save by compressing these
| days? Given that even smallish things like documents are
| already compressed archives, pictures/audio/movies of course
| already have heavy purpose-specific compression.
|
| The main things I can still think of that are sparse on
| purpose are database files and disk images (not very
| mainstream, but also not uncommon). So like, a few gigabytes
| per terabyte (a few promille) unless you're really heavy on
| either databases or virtual machines?
|
| I can see why one would like to enable it, but deduplication
| (which breaks if you naively implement compression, iirc
| that's why restic hasn't yet implemented it) is much more
| worth it because it enables incremental backups and you
| don't, for example, have to worry about making a copy of
| another system that has many of the same files (think game
| files or system files).
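The interaction between deduplication and naive compression mentioned above can be illustrated with a small Python sketch. It assumes fixed-size chunks identified by SHA-256 (real tools typically use content-defined chunking) and compares how many chunks two nearly identical files share before and after each file is compressed as a whole stream. `chunk_ids`, `shared_fraction`, and `sample_files` are hypothetical helpers, not part of any backup tool.

```python
import hashlib
import random
import zlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size for the demonstration


def chunk_ids(data):
    """Return the set of SHA-256 IDs of the fixed-size chunks of data."""
    return {
        hashlib.sha256(data[i:i + CHUNK_SIZE]).digest()
        for i in range(0, len(data), CHUNK_SIZE)
    }


def shared_fraction(a, b):
    """Fraction of a's distinct chunks that also occur in b."""
    ids_a, ids_b = chunk_ids(a), chunk_ids(b)
    return len(ids_a & ids_b) / len(ids_a)


def sample_files():
    """Build two compressible files that differ only in their first 64 bytes."""
    rng = random.Random(42)
    words = [f"word{i}" for i in range(500)]
    original = " ".join(rng.choice(words) for _ in range(40000)).encode()
    edited = b"\x00" * 64 + original[64:]
    return original, edited


if __name__ == "__main__":
    original, edited = sample_files()
    raw = shared_fraction(original, edited)
    packed = shared_fraction(zlib.compress(original), zlib.compress(edited))
    print(f"shared chunks, raw data:           {raw:.0%}")
    print(f"shared chunks, compressed streams: {packed:.0%}")
```

A small edit near the start of a file changes the compressed stream from that point on, so chunking the compressed output finds far less to share than chunking the raw data. Compressing each chunk separately after deduplication avoids this.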
| wmf wrote:
| I assume compression works well for source code and other
| developer artifacts. Obviously you have to do dedupe, then
| compression, then encryption.
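The ordering described above (dedupe, then compress, then encrypt) could be sketched in Python roughly as follows. This is a toy under stated assumptions: chunks are fixed-size, the store is a plain dict, and `toy_encrypt` is a deliberately simple XOR keystream that is NOT secure and only stands in for a real authenticated cipher. All function names are hypothetical.

```python
import hashlib
import os
import zlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size


def _xor_keystream(key, nonce, data):
    """XOR data with a SHA-256 based keystream (illustration only)."""
    out = bytearray()
    for block in range(0, len(data), 32):
        pad = hashlib.sha256(key + nonce + block.to_bytes(8, "big")).digest()
        out.extend(x ^ y for x, y in zip(data[block:block + 32], pad))
    return bytes(out)


def toy_encrypt(key, plaintext):
    """Toy stream cipher. NOT secure; a stand-in for real encryption."""
    nonce = os.urandom(16)
    return nonce + _xor_keystream(key, nonce, plaintext)


def toy_decrypt(key, blob):
    """Inverse of toy_encrypt: split off the nonce and XOR again."""
    return _xor_keystream(key, blob[:16], blob[16:])


def store_file(data, key, store):
    """Dedupe on plaintext chunk hashes, then compress, then encrypt."""
    recipe = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        chunk_id = hashlib.sha256(chunk).hexdigest()    # 1. dedupe
        if chunk_id not in store:
            packed = zlib.compress(chunk)               # 2. compress
            store[chunk_id] = toy_encrypt(key, packed)  # 3. encrypt
        recipe.append(chunk_id)
    return recipe


def restore_file(recipe, key, store):
    """Reverse the pipeline: decrypt, decompress, concatenate."""
    return b"".join(
        zlib.decompress(toy_decrypt(key, store[chunk_id]))
        for chunk_id in recipe
    )


if __name__ == "__main__":
    key, store = b"k" * 32, {}
    a = b"hello backup " * 1000
    recipe_a = store_file(a, key, store)
    recipe_b = store_file(a + b"extra", key, store)
    print("stored blobs:", len(store))
    print("round trip ok:", restore_file(recipe_b, key, store) == a + b"extra")
```

Deduplication has to see the plaintext chunks because each encryption here uses a fresh random nonce, which makes two encryptions of the same chunk look different. Compression has to come before encryption because encrypted output no longer has redundancy to remove.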
| lucb1e wrote:
| Source code compresses very well indeed, but my hunch is
| that it's peanuts. Let's see, I've got a projects
| directory with various projects from the past decade (all
| custom, there's a separate dir for downloaded
| repositories). I've mostly written things in Python and
| PHP (the JS/CSS/HTML stuff is on a server mixed with
| things like owncloud or SMF or so; harder to isolate).
|
| PHP: 591 KiB, 196 files, 13'813 LOC, 1'236 comment lines.
|
| Python: 345 KiB, 136 files, 7'760 LOC, 930 comment lines.
|
| If someone spends 5 minutes of developer time trying to
| compress that to save disk space, that's already not
| worth it. Also in huge projects, the actual code is not
| going to be taking gigabytes of space. And if you mean in
| git history: that is, again, already compressed.
|
| Other developer artifacts: I've got 26 GiB of project
| directories, this time including downloaded software and
| it will also include binaries (hashcat and jtr are in
| there, I wouldn't be surprised if there's also a medium-
| sized dictionary or two). Doing tar c . does not seem to
| add much overhead (26.5 GiB). Compressing that stream
| with pigz -1 (multithreaded gzip) brings it down to 17
| GiB.
|
| 35% off is better than I thought! I wonder which files
| compress so well, hmm let me `find -type f | shuf | head
| -9001 | while read line; do echo "$(($(wc -c
| <"$line")-$(<"$line" pigz -1 | wc -c))) $line"; done |
| sort -n`... The largest difference is a huge 121 MiB
| binary that compresses down to 36 MiB. I didn't know
| these files were so sparse (not a C(++) dev),
| interesting!
|
| While I'm looking into this, let's also look at my
| "documents" directory. It's 43 GiB and compresses down to
| 33 GiB. Not as good, but still worth it, and more than I
| thought! (pigz -1 is not strong compression, but a stronger
| setting would probably gain no more than about 10% before it
| gets impractically slow.) It might not quite get a total
| backup size down by a disk size (e.g. not 1T down to
| 500G), but it definitely lets you keep more history
| before having to worry about what you want to keep and
| what you want to toss.
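For readers who would rather not parse the shell one-liner above, here is a rough Python equivalent. It is a sketch with some differences from the original command: it walks every file instead of sampling 9001 random ones, and it uses zlib at level 1 rather than `pigz -1`, so the byte counts will differ slightly. `compression_savings` is a hypothetical helper name.

```python
import os
import tempfile
import zlib


def compression_savings(root, level=1):
    """Return (bytes_saved, path) pairs for files under root, smallest first.

    Each file is read fully into memory, which is fine for a sketch but
    not for very large files.
    """
    results = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as handle:
                    data = handle.read()
            except OSError:
                continue  # unreadable file: skip it
            saved = len(data) - len(zlib.compress(data, level))
            results.append((saved, path))
    results.sort()
    return results


if __name__ == "__main__":
    # Demo on a throwaway directory with one compressible and one random file.
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "text.txt"), "wb") as f:
            f.write(b"abc" * 10000)
        with open(os.path.join(root, "noise.bin"), "wb") as f:
            f.write(os.urandom(30000))
        for saved, path in compression_savings(root):
            print(saved, os.path.basename(path))
```

As with `sort -n` in the one-liner, the files that compress best end up at the bottom of the list; files that are already compressed show a saving near zero or slightly negative.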
| aidenn0 wrote:
| For local backups, compression is less of an issue; for
| things that are compressible, transparent file system
| compression seems to come out at about 110% of the size I would
| get from any non-CPU-bound level of compression using a tgz.
| Since (as others in this thread have noted) compressible
| files tend to also be smaller files (the only exception I can
| think of would be if your log rotation doesn't compress old
| logs), the fact that only a fraction of what I back up is 10%
| larger is kind of "okay." When you're sending across the
| network, though, it can be a big deal.
| MikusR wrote:
| On disk the backups are encrypted, that means no
| transparent compression.
| isbvhodnvemrwvn wrote:
| For people who haven't dealt with this - a good
| encryption scheme produces output which you can't tell
| apart from a purely random stream of bits - it has very
| high entropy, and is therefore not compressible.
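The point about entropy is easy to check with a short Python sketch. It uses `os.urandom` as a stand-in for well-encrypted output (an assumption made for illustration, since no real cipher is involved) and compares how zlib fares on it versus repetitive text. `compressed_ratio` is a hypothetical helper.

```python
import os
import zlib


def compressed_ratio(data):
    """Return compressed size divided by original size (zlib, default level)."""
    return len(zlib.compress(data)) / len(data)


if __name__ == "__main__":
    text = b"the quick brown fox jumps over the lazy dog\n" * 2000
    noise = os.urandom(len(text))  # stand-in for encrypted data
    print(f"plain text:   {compressed_ratio(text):.3f}")
    print(f"random bytes: {compressed_ratio(noise):.3f}")
```

The repetitive text shrinks to a small fraction of its size, while the random bytes do not shrink at all, which is why file system compression underneath an encrypted repository has nothing to work with.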
| poronski wrote:
| Gotta say kopia does have a very strong restic vibe to it.
|
| Not a bad thing, just means that restic managed to get lots of
| things right.
___________________________________________________________________
(page generated 2021-06-11 23:01 UTC)