[HN Gopher] Practical Reed-Solomon for Programmers
___________________________________________________________________
Practical Reed-Solomon for Programmers
Author : heydenberk
Score : 178 points
Date : 2021-06-13 09:51 UTC (1 days ago)
(HTM) web link (berthub.eu)
(TXT) w3m dump (berthub.eu)
| crazygringo wrote:
| Are there any uses cases for Reed-Solomon at the application
| level?
|
| My impression was always that error checking was implemented at
| the hardware level, or else at the OS/driver level.
|
| But just curious if there are some applications I'm missing.
| curtisf wrote:
| You may want to be able to run highly resilient databases even
| on non-error-detecting hardware. (Most consumer disks, and even
| a lot of enterprise disks, don't reliably detect/report disk
| corruption)
|
| You might also be building a distributed system, where blocks
| are spread across multiple disks. In that case, an "error"
| you're trying to correct might be the loss of a disk or server
| in the distributed system; there is no one single disk or OS
| responsible for all the relevant blocks in that case.
|
| But, both of these are assuming you're building some kind of
| data-store, which is not a typical user application. Writing
| robust data-stores is hard for many reasons, error correcting
| is just one of them.
| gopalv wrote:
| > uses cases for Reed-Solomon at the application level
|
| Hadoop has an RS implementation inside the filesystem (called
| "erasure coding"), instead of storing 3 copies of the same
| data, it can actually instead store ~1.5 copies as (6+3) or
| (10+4).
|
| Previously, I've run into this tech in satellite internet
| gateways, but distributed filesystems is where I've gone
| through the math & probabilities of failure properly.
|
| I work on perf & the extra network hops (with 3 replicas, you
| read 100% of data local, when you stripe it that doesn't work)
| and math for the error correction are hot spots when you are
| trying to keep all cores busy.
| the_benno wrote:
| I worked on a research project in undergrad that used Reed-
| Solomon -- we showed a series of images to users, who would
| later pick those images out of a set of un-trained images in
| order to authenticate or access some privileged data, sans
| encryption. the error-correcting allowed for some degree of
| misremembering or forgetfulness, as is normal in human image
| recognition tasks.
|
| The professor later spun up a startup that has since been
| acquired by Dropbox, but I'm unaware what kind of product
| they're currently working on.
| mook wrote:
| Parchive 2 was, I believe, an implementation of Reed-Solomon at
| the application level; mostly used (when it was introduced) to
| counter loss from missing chunks on binary newsgroups. I think
| RAR might have picked up a similar feature around that time
| too?
|
| I believe parchive version 1 was simple XOR parity.
| akalin wrote:
| No, parchive v1 also used Reed-Solomon. The main difference
| between v1 and v2 was that v1 worked on the file level, but
| v2 divided the set of files into blocks.
|
| Also, the parity matrix for v1 would be sometimes singular
| (non-invertible), which v2 tried to fix but it didn't quite
| work.
| [deleted]
| sparc24 wrote:
| s3, dropbox - https://dropbox.tech/infrastructure/inside-the-
| magic-pocket
| alphabet9000 wrote:
| i used a reed solomon library recently in a project that
| encodes and decodes binary data stored in a video channel. i
| added it to compensate for compression/noise introduced in the
| re-encoding process on third party sites like youtube. here's
| an example of what it looks like: [0]
|
| [0] https://cicada.jollo.org/xg/sp.mp4
| kevin_thibedeau wrote:
| If you need to transmit data over an unreliable link where
| retransmission isn't practical you will want some sort of FEC.
| If hardware isn't available to do the job then it gets done in
| software.
| cycomanic wrote:
| QR codes use Reed Solomon codes
| coder543 wrote:
| If anyone wants to read more about Reed Solomon, this is the
| article that helped me understand Reed Solomon earlier this year:
| https://innovation.vivint.com/introduction-to-reed-solomon-b...
| ahubert wrote:
| (author here) Nice! Added that to the page, very worthwhile.
| Thanks.
| vmilner wrote:
| This BBC engineering paper is also useful.
|
| https://downloads.bbc.co.uk/rd/pubs/whp/whp-pdf-
| files/WHP031...
| sparc24 wrote:
| These 2 papers helped me better understand some more practical
| aspects of it.
|
| https://mirrors.edge.kernel.org/pub/linux/kernel/people/hpa/...
| https://www.usenix.org/legacy/event/fast08/tech/full_papers/...
| conductor wrote:
| Thanks for the article and code.
|
| Also check out https://github.com/Parchive/par2cmdline
|
| I wish something like this was integrated into a modern and free
| archive format. RAR has it. RAR also got volume recovery: when
| creating split multi-volume archives you can also create recovery
| volumes - each recovery volume can replace _any_ other _one_
| missing volume. It was a life-saver in floppy-disk times.
| h2odragon wrote:
| Amusing: the linked library page, https://www.ka9q.net/code/fec/
| says (from 2007) "The new Intel Macs are not yet supported. Stay
| tuned."
|
| Ignore an issue for long enough, and it stops being an issue :)
| ithkuil wrote:
| There are other erasure codes that go beyond RS. For example LRC
| (Local Reconstruct Codes) is very interesting; it improves on
| reed Solomon by requiring less bandwidth/IO for repair reads.
|
| https://www.usenix.org/conference/atc12/technical-sessions/p...
| mrfusion wrote:
| Is this what tcp/ip uses?
| TonyTrapp wrote:
| No, but Reed-Solomon codes are used on on CDs.
| jandrese wrote:
| TCP doesn't do forward error correction at all. It uses a
| retransmission scheme instead. TCP's error correction is not
| only to insure a consistent datastream, but also to more
| equitably share limited bandwidth.
| rl1987 wrote:
| No. TCP, IP and UDP are all using Internet Checksum (RFC 1071).
| Ethernet uses CRC.
| brandmeyer wrote:
| All of these are techniques for verifying a message. Galileo
| also uses a 24-bit CRC to verify its messages after
| performing forward error correction.
| znpy wrote:
| I went and skimmed the rfc you're citing and it's fro 1988,
| it's old enough to have samples in C code where the
| "register" keyword is still used (afaik nowadays most
| compilers will ignore it) along with assembly code for
| motorola 68020, Cray CPUs and the IBM-370.
|
| EDIT: for reference:
| https://datatracker.ietf.org/doc/html/rfc1071.html
| stagger87 wrote:
| Forward error correction is usually at the PHY layer, not the
| transport layer. So the more appropriate question is, does
| 10GbE/100GbE/WLAN/etc use RS coding? (and yes some of them do.)
| nayuki wrote:
| Mathematical derivation and runnable code for a Reed-Solomon
| error-correcting code decoder: https://www.nayuki.io/page/reed-
| solomon-error-correcting-cod...
| IAmLiterallyAB wrote:
| What is the difference between RS and Hamming codes?
| rurban wrote:
| Hamming Codes can only recover from 2 wrong bits per byte, with
| RS it's configurable.
| moreati wrote:
| A semi-related question/puzzle. Is it possible to reverse
| engineer parameters of a Reed Solomoon function?
|
| Given one or more known inputs/outputs of a Reed Solomon function
| with known symbol size, message length, number of parity symbols.
| Is it possible to calculate the other parameters (polynomial
| coefficients, primitive element etc)?
|
| I failed to find an answer a few years ago, when trying to
| interoperate with chirp.io (now gone).
| https://math.stackexchange.com/questions/663643/discover-par...
| xorvoid wrote:
| Doesn't seem too hard actually, given enough example input. The
| parity is just a linear combination of the input so you can set
| up a linear system and solve for those things encoding matrix
| rows pretty easily. From there you may need to make some
| assumptions about what type of matrix was used (e.g.
| vandermonde is popular) and perhaps the field size being used
| (e.g. GF(2^8), and with those, it should be easy to determine
| the reducing polynomial. Some of these choices are small enough
| to be able to brute force (finite fields used for practical
| problems tend to be small, naturally). If you have a dataset, I
| might actually toy with it myself.. sounds fun.
| moreati wrote:
| Thanks. I have 30 examples I got from the API, many moons ago
| https://github.com/moreati/chirppy/blob/master/examples.json.
| My finite field knowledge is not up to solving it
| analytically, my attempts to brute force it are in
| https://github.com/moreati/chirppy/blob/master/trials.py.
| That includes functions to go between the characters, and
| integer representations.
| xrd wrote:
| I loved this article and going down the rabbit hole with the
| associated links was excited to see a discussion from Russ Cox
| (golang) about how QR codes rely on RS (and another great
| article)
|
| https://research.swtch.com/qart
| boredprograming wrote:
| Just to add to this topic, usually Reed Solomon codes are
| interleaved with Turbo codes or LDPC.
|
| The reason is this: Reed Solomon corrects entire symbols, but can
| only correct so many symbols per block. LDPC and Turbo correct
| single bit errors. If you have a lot of single bit errors, mixed
| with completely corrupt blocks, it's best to use both.
|
| Interleaving usually mixes the bits around too, to spread long
| chains of errors out between symbols.
|
| This setup is used in digital radio, 3G, CD and DVD, DSL, etc
| Tcepsa wrote:
| At the risk of oversimplifying, is there any value in the
| analogy, "It's like RAID but for data streams"?
| frabert wrote:
| Problem with that is, as the article states, RAID6 _can_ use RS
| as its error correction method, which turns this description
| into something a bit too circular for my taste
___________________________________________________________________
(page generated 2021-06-14 23:00 UTC)