[HN Gopher] Show HN: bef - a tool that encodes/decodes interleav...
       ___________________________________________________________________
        
       Show HN: bef - a tool that encodes/decodes interleaved erasure
       coded streams
        
       Author : gbletr42
       Score  : 36 points
       Date   : 2024-03-09 16:57 UTC (6 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | gbletr42 wrote:
       | Hello Hacker News! I've been developing this piece of software for
       | about a week now, to serve as a fast and easy to use replacement
       | for par2cmdline and zfec. Now that it is in a good and presentable
       | state, I'm releasing it to the world to get users, feedback,
       | testing on architectures that aren't x86[-64], etc. If you have
       | any feedback, questions, or find any bugs/problems, do let me
       | know.
        
         | speps wrote:
         | I'd recommend explaining even a tiny bit what erasure coding
         | is. I had to look it up as I didn't know the term. It's really
         | cool, explain it yourself, why you're excited about it!
        
           | gbletr42 wrote:
            | Sure, erasure coding is a form of error-correcting code that
            | can be applied to data such that you can lose up to some
            | number n of symbols and still successfully reconstruct the
            | input data. For example, take k input symbols and put them
            | into an erasure code algorithm to get k+n symbols, where any n
            | of the output symbols can be lost and the data can still be
            | reconstructed. Symbols in this case can be some number of
            | bits/bytes.
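
As a minimal sketch of the k+n idea described above, the Python below uses the simplest erasure code there is: a single XOR parity symbol (n = 1). This is not Reed-Solomon and is not claimed to be what bef implements; the names `encode` and `decode` are made up for illustration.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length symbols byte by byte."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data_symbols: List[bytes]) -> List[bytes]:
    """Return the k data symbols followed by one XOR parity symbol (n = 1)."""
    parity = reduce(xor_bytes, data_symbols)
    return data_symbols + [parity]


def decode(received: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild the k data symbols when at most one symbol is None (lost)."""
    missing = [i for i, s in enumerate(received) if s is None]
    if len(missing) > 1:
        raise ValueError("one parity symbol can repair only one erasure")
    symbols = list(received)
    if missing:
        # The XOR of all surviving symbols equals the missing one.
        survivors = [s for s in symbols if s is not None]
        symbols[missing[0]] = reduce(xor_bytes, survivors)
    return symbols[:-1]  # drop the parity symbol


data = [b"abcd", b"efgh", b"ijkl"]   # k = 3 input symbols
coded = encode(data)                 # k + n = 4 output symbols
damaged = list(coded)
damaged[1] = None                    # lose any one symbol
recovered = decode(damaged)
```

Here `recovered` equals `data` no matter which one of the four symbols is dropped. Codes such as Reed-Solomon generalize this to n > 1 lost symbols.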
           | 
            | This is a really important property in situations where there
            | can be large bursts of errors, because you can still
            | reconstruct the data as long as enough symbols survive. IIRC,
            | CDs/DVDs/BDs all use two concatenated Reed-Solomon codes
            | (Reed-Solomon being a type of erasure code) whose coded
            | symbols are interleaved with each other, which protects the
            | disc against things like accidental scratches.
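
To illustrate why interleaving helps against bursts, here is a small Python sketch under simple assumptions: four codewords of four symbols each, and a burst that wipes out four consecutive positions on the medium. The layout and the helper names `interleave` and `deinterleave` are illustrative only and do not describe bef's on-disk format.

```python
from typing import List, Optional


def interleave(codewords: List[List[str]]) -> List[str]:
    """Emit symbol 0 of every codeword, then symbol 1 of every codeword, ..."""
    width = len(codewords[0])
    return [cw[i] for i in range(width) for cw in codewords]


def deinterleave(stream: List[Optional[str]], num_codewords: int) -> List[List[Optional[str]]]:
    """Undo interleave(): codeword j owns every num_codewords-th symbol."""
    return [stream[j::num_codewords] for j in range(num_codewords)]


codewords = [[f"{name}{i}" for i in range(4)] for name in "ABCD"]
burst = range(4, 8)  # four consecutive damaged positions

# Interleaved layout: the burst is spread across all codewords.
stream = interleave(codewords)
damaged = [None if i in burst else s for i, s in enumerate(stream)]
losses = [cw.count(None) for cw in deinterleave(damaged, len(codewords))]

# Sequential layout: the same burst lands inside a single codeword.
flat = [s for cw in codewords for s in cw]
damaged_flat = [None if i in burst else s for i, s in enumerate(flat)]
losses_flat = [damaged_flat[j * 4:(j + 1) * 4].count(None) for j in range(4)]
```

With interleaving, `losses` is `[1, 1, 1, 1]`, so a code that tolerates one erasure per codeword repairs everything. Without it, `losses_flat` is `[0, 4, 0, 0]` and codeword B is gone entirely.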
        
             | speps wrote:
             | Nice! Add that to the README ;)
        
               | gbletr42 wrote:
               | Done! Thank you, for it never occurred to me someone
               | might stumble upon my software without already knowing (a
               | dumb lapse of mind, I know).
        
       | mkeedlinger wrote:
       | Hey this is very cool! And something I've looked for multiple
       | times before
        
       | alchemist1e9 wrote:
       | Super neat!! I currently take encrypted zbackup files, par2 split
       | them, then seqbox them and burn them to M-discs.
       | 
       | What is a better sequence of steps in your opinion?
       | 
       | https://github.com/MarcoPon/SeqBox
        
         | gbletr42 wrote:
          | I'm not familiar with zbackup, but from a Google search it
          | appears to be a tool to deduplicate and encrypt data. The
          | process envisioned while making this was to use a series of
          | Unix-style pipes to make the backup, e.g.
         | 
         | tar c dir | zstd | gpg -e | bef -c -o backup.tar.zst.gpg.bef
         | 
         | and then to get back that file with the terribly long filename
         | 
         | bef -d -i backup.tar.zst.gpg.bef | unzstd | gpg -d | tar x
        
           | alchemist1e9 wrote:
           | Makes sense and is very neat. I can likely replace the par2
           | splits with a single file using your tool.
           | 
           | Since your head is in the thick of this problem, I'd
           | recommend you look at seqbox and consider implementing sbx
           | headers and blocks as an optional container that would give
           | you resilience to filesystem corruption. That way your tool
           | would be an all in one bitrot safeguard and streaming/pipe
           | based!
           | 
            | Regarding zbackup, it's a perhaps somewhat obscure but
            | extremely useful tool for managing data. The way I use it, I
            | get both dedup and lazy incremental backups, at a
            | computational cost that is not very significant. The
            | encryption is a nice side effect of its implementation that is
            | also handy.
        
           | frutiger wrote:
           | Small error in your restore command (given your creation
           | command)
           | 
           | > bef -d -i backup.tar.zst.gpg.bef | unzstd | gpg -d | tar x
           | 
           | Should probably be
           | 
           | > bef -d -i backup.tar.zst.gpg.bef | gpg -d | unzstd | tar x
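
The point of this correction is that the restore stages must run in the reverse order of the creation stages. A small Python sketch shows the same thing, with `zlib` standing in for zstd and a toy XOR function standing in for gpg (an assumption for illustration only; XOR with a fixed key is not real encryption):

```python
import zlib

KEY = 0x5A  # toy key; XOR is only a stand-in for gpg, not real encryption


def toy_encrypt(data: bytes) -> bytes:
    """XOR every byte with KEY."""
    return bytes(b ^ KEY for b in data)


toy_decrypt = toy_encrypt  # XOR with the same key is its own inverse

original = b"backup payload " * 8

# Create: compress, then encrypt (mirrors `zstd | gpg -e`).
stored = toy_encrypt(zlib.compress(original))

# Restore in reverse order: decrypt, then decompress.
restored = zlib.decompress(toy_decrypt(stored))

# Restoring in creation order fails: the decompressor is handed ciphertext.
try:
    toy_decrypt(zlib.decompress(stored))
    wrong_order_ok = True
except zlib.error:
    wrong_order_ok = False
```

`restored` matches `original`, while the wrong order raises `zlib.error`, just as `unzstd` would reject the still-encrypted stream.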
        
       | nehal3m wrote:
       | I am genuinely not trying to be a vulgar dingus but in my native
       | language (Dutch) this is the verb for orally pleasuring women.
       | 
       | I don't know how that figures into your decision to name it this
       | but at least now you're aware.
        
         | gbletr42 wrote:
          | I was not, as I am an English speaker with no knowledge of
         | Dutch. I find it a funny coincidence, but if I change the name
         | bef now it'll mess with my muscle memory.
        
           | nehal3m wrote:
           | Fair enough, guess it's not that big a deal. Cheers for
           | making it though!
        
         | stragies wrote:
         | Somebody could compile the (probably extremely short, if not
         | nil) list of pronounceable sequences of phonemes, that are not
         | vulgar, sexist, demeaning, insulting, "hurtful", or otherwise
         | objectionable in any language.
        
           | myself248 wrote:
           | In _any_ language? I suspect that approaches the null set.
           | 
           | ...And then copyright them all. Muhahaha.
        
         | frutiger wrote:
         | Do you happen to know its etymology?
        
           | nehal3m wrote:
            | Your guess is as good as mine. A quick search proved
            | inconclusive; there are a few explanations, but they seem
           | anecdotal at best. Most point to old uses of the word bef the
           | same way we use "muts" (knitted cap) as a metaphor for female
           | genitalia today.
        
       | zcw100 wrote:
       | This sounds similar to fountain codes
       | https://en.m.wikipedia.org/wiki/Fountain_code
        
         | loeg wrote:
         | Fountain codes are a class of erasure code, but not a specific
         | tool.
        
       | ComputerGuru wrote:
       | This is really nice work, great job.
       | 
       | You mention how the parameters are all customizable but I want to
       | ask almost the opposite: is there a recommended set of defaults
       | for a given situation that the user can apply, so they _don't_
       | have to
       | be experts to figure out usage?
       | 
       | e.g. a recommended option for "sharing over the internet" vs
       | "burning to a dvd" vs "writing to tape"
       | 
       | (I'm aware that these have their own redundancies/error control,
       | but obviously I do not consider them sufficient.)
        
         | gbletr42 wrote:
         | Currently there is just one set of defaults, but I could very
         | well add multiple different defaults dedicated to certain use
         | cases such as the ones you have. I imagine it'd be something
         | like this
         | 
          | bef -c --default share -i input -o output
          | bef -c --default dvd -i input -o output
          | bef -c --default tape -i input -o output
          | etc.
         | 
         | It seems like a good idea and wouldn't exactly be hard to
         | implement.
        
       | PhilipRoman wrote:
       | That's a really cool program. Is it possible to do it the opposite
       | way - recover data with occasional noise inserted between
       | symbols?
        
         | gbletr42 wrote:
         | Sadly my tool currently doesn't account for that type of
         | corruption, as it doesn't know what is good data or bad data
         | when reading. So if bad data is inserted between
         | symbols/fragments, rather than corrupting the symbols/fragments
         | themselves, the tool would read them naively and exit with an
         | error when the hashes don't add up. I'm sure there's a clever
         | way of defending against that, but as of this moment I'm not
         | entirely certain how best to do so.
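
One possible direction, in the spirit of the seqbox-style headers suggested elsewhere in this thread, is to give every fragment a marker, a length, and a hash, so that a reader can skip junk and resynchronize on the next valid fragment. The Python below is only a sketch of that idea; the `FRAG` marker, the header layout, and the function names are hypothetical and are not part of bef's format.

```python
import hashlib
import struct
from typing import List

MAGIC = b"FRAG"                    # hypothetical marker, not bef's format
HEADER = struct.Struct(">4sH32s")  # magic, payload length, SHA-256 digest


def frame(payload: bytes) -> bytes:
    """Prefix a payload with its marker, length, and hash."""
    digest = hashlib.sha256(payload).digest()
    return HEADER.pack(MAGIC, len(payload), digest) + payload


def scan(stream: bytes) -> List[bytes]:
    """Return every payload whose header and hash check out, skipping junk."""
    found = []
    pos = stream.find(MAGIC)
    while pos != -1:
        next_search = pos + 1  # default: treat this hit as junk and move on
        if pos + HEADER.size <= len(stream):
            _, length, digest = HEADER.unpack_from(stream, pos)
            start = pos + HEADER.size
            payload = stream[start:start + length]
            if len(payload) == length and hashlib.sha256(payload).digest() == digest:
                found.append(payload)
                next_search = start + length  # jump past the valid frame
        pos = stream.find(MAGIC, next_search)
    return found


fragments = [b"first fragment", b"second fragment", b"third fragment"]
noisy = (frame(fragments[0]) + b"\x00noise FRAG junk"
         + frame(fragments[1]) + b"!!" + frame(fragments[2]))
recovered = scan(noisy)
```

In this example `recovered` equals `fragments` even though noise, including a fake marker, was inserted between frames. The trade-off is extra per-fragment overhead and a byte-by-byte search whenever the reader loses sync.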
        
       ___________________________________________________________________
       (page generated 2024-03-09 23:00 UTC)