[HN Gopher] Show HN: bef - a tool that encodes/decodes interleav...
___________________________________________________________________
Show HN: bef - a tool that encodes/decodes interleaved erasure
coded streams
Author : gbletr42
Score : 36 points
Date : 2024-03-09 16:57 UTC (6 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| gbletr42 wrote:
| Hello Hacker News! I've been developing this piece of software for
| about a week now, to serve as a fast and easy to use replacement
| for par2cmdline and zfec. Now that it is in a good and presentable
| state, I'm releasing it to the world to get users, feedback,
| testing on architectures that aren't x86[-64], etc. If you have
| any feedback, questions, or find any bugs/problems, do let me
| know.
| speps wrote:
| I'd recommend explaining even a tiny bit what erasure coding
| is. I had to look it up as I didn't know the term. It's really
| cool, explain it yourself, why you're excited about it!
| gbletr42 wrote:
| Sure, erasure coding is a form of error correcting code that
| can be applied to data such that you can lose up to some
| number n of symbols and still reconstruct the input data. For
| example, take k input symbols and put them into an erasure
| code algorithm to get k+n symbols, where any n of the output
| symbols can be lost and the data can still be reconstructed.
| Symbols in this case can be some number of bits/bytes.
|
| This is a really important property in situations where there
| can be big giant bursts of errors, because you can still
| reconstruct the data regardless. IIRC, CDs/DVDs/BDs all use
| two concatenated Reed-Solomon (a type of erasure coding)
| codes whose symbols are then interleaved with each other,
| which gives the disc protection against things like
| accidental scratches.
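
The combination described above, an erasure code plus interleaving, can be illustrated with a minimal sketch. It is not bef's implementation and it does not use Reed-Solomon: as a stand-in it uses the simplest possible erasure code, a single XOR parity symbol (k data symbols in, k+1 out, any one may be lost). All function names are made up for illustration.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data_symbols):
    """k data symbols -> k+1 symbols by appending one XOR parity symbol."""
    parity = reduce(xor_bytes, data_symbols)
    return list(data_symbols) + [parity]


def decode(symbols):
    """Return the k data symbols; at most one symbol may be None (erased)."""
    missing = [i for i, s in enumerate(symbols) if s is None]
    if len(missing) > 1:
        raise ValueError("more erasures than parity symbols")
    symbols = list(symbols)
    if missing:
        present = [s for s in symbols if s is not None]
        # XOR of all surviving symbols equals the erased one.
        symbols[missing[0]] = reduce(xor_bytes, present)
    return symbols[:-1]


def interleave(codewords):
    """Emit symbol 0 of every codeword, then symbol 1 of every codeword, ..."""
    return [cw[i] for i in range(len(codewords[0])) for cw in codewords]


def deinterleave(stream, n_codewords):
    """Undo interleave(): codeword j owns every n_codewords-th symbol."""
    return [stream[j::n_codewords] for j in range(n_codewords)]


blocks = [[b"ab", b"cd", b"ef"], [b"gh", b"ij", b"kl"], [b"mn", b"op", b"qr"]]
codewords = [encode(b) for b in blocks]
stream = interleave(codewords)  # 12 symbols on the wire

# A burst wipes out three consecutive symbols of the stream.
damaged = list(stream)
for pos in range(4, 7):
    damaged[pos] = None

# Each codeword lost only one symbol, so every block is recoverable.
recovered = [decode(cw) for cw in deinterleave(damaged, len(codewords))]
assert recovered == blocks
```

Without the interleaving step the same three-symbol burst would land inside a single codeword and exceed what one parity symbol can repair. A real tool would use a code with more parity symbols, but the burst-spreading idea is the same.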
| speps wrote:
| Nice! Add that to the README ;)
| gbletr42 wrote:
| Done! Thank you, for it never occurred to me someone
| might stumble upon my software without already knowing (a
| dumb lapse of mind, I know).
| mkeedlinger wrote:
| Hey this is very cool! And something I've looked for multiple
| times before
| alchemist1e9 wrote:
| Super neat!! I currently take encrypted zbackup files, par2 split
| them, then seqbox them and burn them to M-discs.
|
| What is a better sequence of steps in your opinion?
|
| https://github.com/MarcoPon/SeqBox
| gbletr42 wrote:
| I'm not familiar with zbackup, but from a Google search it appears
| to be a tool to deduplicate and encrypt data. The process
| envisioned while making this was to use a series of Unix-style
| pipes to make the backup, e.g.
|
| tar c dir | zstd | gpg -e | bef -c -o backup.tar.zst.gpg.bef
|
| and then to get back that file with the terribly long filename
|
| bef -d -i backup.tar.zst.gpg.bef | unzstd | gpg -d | tar x
| alchemist1e9 wrote:
| Makes sense and is very neat. I can likely replace the par2
| splits with a single file using your tool.
|
| Since your head is in the thick of this problem, I'd
| recommend you look at seqbox and consider implementing sbx
| headers and blocks as an optional container that would give
| you resilience to filesystem corruption. That way your tool
| would be an all in one bitrot safeguard and streaming/pipe
| based!
|
| Regarding zbackup, it's a perhaps somewhat obscure but extremely
| useful tool for managing data. The way I use it I'm able to
| get both dedup and lazy incremental backups, at a computational
| cost, though not a significant one. The encryption is
| a nice side effect of its implementation that is also handy.
| frutiger wrote:
| Small error in your restore command (given your creation
| command)
|
| > bef -d -i backup.tar.zst.gpg.bef | unzstd | gpg -d | tar x
|
| Should probably be
|
| > bef -d -i backup.tar.zst.gpg.bef | gpg -d | unzstd | tar x
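
The correction above follows from a general rule: stages applied in order when packing have to be undone in the reverse order when unpacking. A minimal sketch of that rule, using `zlib` and `base64` from the Python standard library purely as stand-ins for the zstd and gpg stages (neither real tool is invoked here):

```python
import base64
import zlib

# (forward, inverse) pairs, listed in the order they are applied when packing.
# zlib stands in for the compression stage, base85 for the encryption stage.
STAGES = [
    (zlib.compress, zlib.decompress),
    (base64.b85encode, base64.b85decode),
]


def pack(data: bytes) -> bytes:
    """Apply every forward stage, first to last."""
    for forward, _ in STAGES:
        data = forward(data)
    return data


def unpack(data: bytes) -> bytes:
    """Apply every inverse stage, last to first."""
    for _, inverse in reversed(STAGES):
        data = inverse(data)
    return data


original = b"example payload " * 8
assert unpack(pack(original)) == original

# Undoing the stages in packing order fails: the decompressor is handed
# data that is still wrapped by the later stage.
wrong_order_failed = False
try:
    zlib.decompress(pack(original))
except zlib.error:
    wrong_order_failed = True
```

In the shell pipeline the same thing happens: `unzstd` placed before `gpg -d` would be fed ciphertext rather than a zstd stream.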
| nehal3m wrote:
| I am genuinely not trying to be a vulgar dingus but in my native
| language (Dutch) this is the verb for orally pleasuring women.
|
| I don't know how that figures into your decision to name it this
| but at least now you're aware.
| gbletr42 wrote:
| I was not, as I am an English speaker with no knowledge of
| Dutch. I find it a funny coincidence, but if I change the name
| bef now it'll mess with my muscle memory.
| nehal3m wrote:
| Fair enough, guess it's not that big a deal. Cheers for
| making it though!
| stragies wrote:
| Somebody could compile the (probably extremely short, if not
| nil) list of pronounceable sequences of phonemes, that are not
| vulgar, sexist, demeaning, insulting, "hurtful", or otherwise
| objectionable in any language.
| myself248 wrote:
| In _any_ language? I suspect that approaches the null set.
|
| ...And then copyright them all. Muhahaha.
| frutiger wrote:
| Do you happen to know its etymology?
| nehal3m wrote:
| Your guess is as good as mine. A quick search proved
| inconclusive; there are a few explanations but they seem
| anecdotal at best. Most point to old uses of the word bef the
| same way we use "muts" (knitted cap) as a metaphor for female
| genitalia today.
| zcw100 wrote:
| This sounds similar to fountain codes
| https://en.m.wikipedia.org/wiki/Fountain_code
| loeg wrote:
| Fountain codes are a class of erasure code, but not a specific
| tool.
| ComputerGuru wrote:
| This is really nice work, great job.
|
| You mention how the parameters are all customizable but I want to
| ask almost the opposite: is there a recommended set of defaults for
| a given situation that the user can apply, so they _don't_ have to
| be experts to figure out usage?
|
| e.g. a recommended option for "sharing over the internet" vs
| "burning to a dvd" vs "writing to tape"
|
| (I'm aware that these have their own redundancies/error control,
| but obviously I do not consider them sufficient.)
| gbletr42 wrote:
| Currently there is just one set of defaults, but I could very
| well add multiple different defaults dedicated to certain use
| cases such as the ones you have. I imagine it'd be something
| like this
|
| bef -c --default share -i input -o output
| bef -c --default dvd -i input -o output
| bef -c --default tape -i input -o output
| etc.
|
| It seems like a good idea and wouldn't exactly be hard to
| implement.
| PhilipRoman wrote:
| That's a really cool program. Is it possible to do it the opposite
| way - recover data with occasional noise inserted between
| symbols?
| gbletr42 wrote:
| Sadly my tool currently doesn't account for that type of
| corruption, as it doesn't know what is good data or bad data
| when reading. So if bad data is inserted between
| symbols/fragments, rather than corrupting the symbols/fragments
| themselves, the tool would read them naively and exit with an
| error when the hashes don't add up. I'm sure there's a clever
| way of defending against that, but as of this moment I'm not
| entirely certain how best to do so.
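
One common way to survive inserted junk, offered here only as a sketch and not as anything bef does, is to frame each fragment with a sync marker, a length, and a checksum, then scan the stream for markers and keep only frames whose checksum verifies. The marker value, header layout, and function names below are all hypothetical.

```python
import struct
import zlib

MAGIC = b"\xf0FRG"  # hypothetical 4-byte sync marker
HEADER = struct.Struct(">4sHI")  # marker, payload length, CRC32 of payload


def frame(payload: bytes) -> bytes:
    """Prefix a fragment with marker, length, and checksum."""
    return HEADER.pack(MAGIC, len(payload), zlib.crc32(payload)) + payload


def scan(stream: bytes):
    """Yield the payload of every frame that verifies, skipping anything else."""
    pos = 0
    while True:
        pos = stream.find(MAGIC, pos)
        if pos < 0 or pos + HEADER.size > len(stream):
            return
        _, length, crc = HEADER.unpack_from(stream, pos)
        start = pos + HEADER.size
        payload = stream[start : start + length]
        if len(payload) == length and zlib.crc32(payload) == crc:
            yield payload
            pos = start + length  # jump past the verified frame
        else:
            pos += 1  # false marker or damaged frame: keep scanning


fragments = [b"alpha", b"bravo", b"charlie"]
noisy = (
    frame(fragments[0])
    + b"\x00inserted noise\xff"
    + frame(fragments[1])
    + b"junk"
    + frame(fragments[2])
)
assert list(scan(noisy)) == fragments
```

A frame that fails its checksum is simply dropped, which turns it into an erasure that the erasure code can then repair. The cost is a few bytes of header per fragment plus a slower, scanning read path.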
___________________________________________________________________
(page generated 2024-03-09 23:00 UTC)