[HN Gopher] History of lossless data compression algorithms (2014)
___________________________________________________________________
History of lossless data compression algorithms (2014)
Author : gscott
Score : 102 points
Date : 2022-06-29 16:29 UTC (6 hours ago)
(HTM) web link (ethw.org)
(TXT) w3m dump (ethw.org)
| bumper_crop wrote:
| Very timely; I found out yesterday that UPX (the executable
| packer from many years ago) was made by the same people who
| made LZO. It made me realize there's a real lineage among the
| people who write compression software. In a similar vein, the
| Rob Pike post from yesterday mentioned that Ken Thompson wrote
| a custom audio compressor for the Plan 9 release. He also
| designed the UTF-8 encoding. I love seeing how these people
| keep refining the state of the art.
| Jerrrry wrote:
| The ideas around information theory, entropy, compression,
| emergent behavior, and the limits where they all intersect
| tend to stick with you once you start looking into them.
|
| Complexity is fuckin cool.
| a-dub wrote:
| you may enjoy this text:
| http://www.inference.org.uk/itila/book.html
| Jerrrry wrote:
| thank you, saved
| mtn_nerd wrote:
| How could they not include the middle-out algorithm?
| optimalsolver wrote:
| Data compression is apparently equivalent to general
| intelligence:
|
| http://mattmahoney.net/dc/rationale.html
|
| ...which gives way to the theory of mind known as compressionism:
|
| http://ceur-ws.org/Vol-1419/paper0045.pdf
| Dylan16807 wrote:
| That's a factor to some extent, but if you're sticking with
| lossless compression, then the better your compression gets,
| the more your remaining data is dominated by noise.
| gfody wrote:
| that's why you need two brains: one to losslessly compress
| through symbolic expressiveness, and another to lossily
| compress by generating noise that resembles the incompressible
| noise
| anonGone73 wrote:
| The Head and the Heart
| joe_the_user wrote:
| Perfect compression is equivalent to finding the Kolmogorov
| complexity of a string. That's incomputable, and so perfect
| compression is beyond any algorithm, even a "generally
| intelligent" one. Generality in compression might increase an
| algorithm's effectiveness, but there's no proof that this
| generality is equivalent to generality in dealing with the
| world humans deal with.
| motohagiography wrote:
| Is the theoretical limit to compression basically what
| Kolmogorov complexity is, or is there another
| Shannon/information-theory theorem that defines the maximum
| possible lossless compression in terms of symbols over a
| channel, where the channel represents the algorithm?
| ("Compression theorem" seems to refer to something else, more
| abstract, I think.)
|
| Intuitively it seems like there would have to be one for
| lossless compression, along with a proof that such a theorem
| for the limits of lossy compression is either impossible or an
| open, hard problem.
| joe_the_user wrote:
| Yes, the Kolmogorov complexity of a string X is the minimum
| length of a program/string that, when fed into a fixed
| universal computer, will yield back the string X. The
| Kolmogorov complexity of a string is also uncomputable - it's
| essentially equivalent to the halting problem.
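|
| In the usual notation (this is just a paraphrase), with U a
| fixed universal machine, that's roughly:
|
    K_U(x) = \min \{ |p| : U(p) = x \}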
|
| "Best Lossy Compression" is ambiguous - how much loss is
| acceptable is isn't defined in the phrase.
| natly wrote:
| Not mentioned here is Fabrice Bellard's record-breaking (at an
| enormous compute cost) compressor https://bellard.org/nncp/.
| lucb1e wrote:
| What does "ratio (bpb)" mean? I'd guess bytes-per-byte or
| something, like how many bytes of original you get for each
| byte of compression, but it doesn't work out: the original size
| is 1e9 bytes, compressed (rounded) 3.2e8, so that's a ratio of
| 3.1 (1e9/3.2e8=3.09989). The program size amounts to a rounding
| error on that figure. The bpb value given is 2.58, nowhere near
| 3.1.
|
| Edit: the paper defines it as "bits per input byte". What kind
| of measure is that? It's like saying "how well did it compress
| compared to a factor of 8" - why 8?!
| isaacimagine wrote:
| It's the number of compressed bits needed, on average, to
| represent 8 bits (one byte) of input. 2.58 bits per input byte
| is the same thing as a compression ratio of
|
| 8 / 2.58 = 3.1
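|
| A rough sanity check in Python (the exact compressed size
| below is only approximated from the figures in the parent
| comment, so treat the numbers as illustrative):
|
    # sizes roughly as quoted in the parent comment
    original_bytes = 1_000_000_000   # 1e9 input bytes
    compressed_bytes = 322_600_000   # ~3.2e8 output bytes (approx.)
    bpb = 8 * compressed_bytes / original_bytes  # bits out / byte in
    ratio = original_bytes / compressed_bytes    # plain size ratio
    print(round(bpb, 2), round(ratio, 2))        # ~2.58, ~3.1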
| dang wrote:
| Discussed at the time:
|
| _History of Lossless Data Compression Algorithms_ -
| https://news.ycombinator.com/item?id=8088842 - July 2014 (44
| comments)
| harveywi wrote:
| Grammar-based compression algorithms are completely missing from
| this "history".
| duskwuff wrote:
| I'm not sure grammar-based compressors are a meaningful part of
| the history of lossless data compression. They're a topic in
| coding theory research, but, as far as I'm aware, they haven't
| had a significant impact in practical algorithms used in the
| field.
| lucb1e wrote:
| Haven't heard of that before. Typing it into Wikipedia, I got
| https://en.wikipedia.org/wiki/Grammar-based_code which explains
| it only as far as "To compress a data sequence x=x[0..n], a
| grammar-based code transforms x into a context-free grammar
| G." (Very helpful; love it when they use pseudocode with
| single-letter recursive variable definitions that need
| accompanying text to make sense... and the text isn't there.)
| There is an example given on the right in image form, which
| looks like recursive dictionary-based string substitution:
| "We hold these truths" compressed: we_holdQ5ths
| Q=Rse we_holdRse_5ths R=Se we_holdSese_5ths
| S=_th we_hold_these_5ths 5=tru we_hold_these_truths
|
| This method isn't described anywhere, nor was the original
| included, but it seems to be the algorithm used.
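|
| If you want to play with the idea, below is a toy Python
| sketch of that kind of pairwise substitution (roughly Re-Pair
| style, not necessarily what the Wikipedia figure actually
| uses): it repeatedly replaces the most frequent adjacent pair
| of symbols with a fresh nonterminal and records the rule:
    from collections import Counter

    def compress(text):
        # Repeatedly replace the most frequent adjacent pair of
        # symbols with a fresh nonterminal symbol.
        seq, rules, next_id = list(text), [], 0
        while True:
            pairs = Counter(zip(seq, seq[1:]))
            if not pairs:
                break
            pair, count = pairs.most_common(1)[0]
            if count < 2:
                break  # no pair repeats; nothing left to factor out
            sym = "<%d>" % next_id
            next_id += 1
            rules.append((sym, pair))
            out, i = [], 0
            while i < len(seq):
                if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                    out.append(sym)
                    i += 2
                else:
                    out.append(seq[i])
                    i += 1
            seq = out
        return seq, rules

    def decompress(seq, rules):
        grammar = dict(rules)  # nonterminal -> (a, b)
        def expand(sym):
            if sym not in grammar:
                return sym  # terminal character
            a, b = grammar[sym]
            return expand(a) + expand(b)
        return "".join(expand(s) for s in seq)

    s = "we hold these truths"
    seq, rules = compress(s)
    assert decompress(seq, rules) == s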
| kloch wrote:
| Does anyone else remember a joke/malware DOS program that
| circulated on BBSes in the very early 1990s and promised 98%
| compression of any file?
|
| It just shortened the file size recorded in the FAT directory
| entry (or some similar gimmick that would result in data loss
| as you wrote new data to the disk)...
| lizardactivist wrote:
| This article is relevant and intersects with some of the
| algorithms in the topic link:
|
| https://oku.edu.mie-u.ac.jp/~okumura/compression/history.htm...
| ctur wrote:
| This seems pretty incomplete if it doesn't include anything
| about brotli, lz4, or zstd, or about Finite State Entropy in
| general. It's basically missing the past decade or more.
|
| There's a lot of good history here, but be aware that it is
| missing the state of the art and thus shouldn't be used to
| decide which algorithms or approaches to use today.
| _delirium wrote:
| > It's basically missing the past decade or more.
|
| The page history [1] shows that the article was mostly written
| in December 2011, with only minor edits since then, so that
| sounds about right.
|
| [1]
| https://ethw.org/w/index.php?title=History_of_Lossless_Data_...
| lucb1e wrote:
| Does that mean this is also outdated? I'm not sure if zstd,
| brotli, etc. are PAQ-like
|
| > on the bleeding edge of archival software are the PAQ*
| formats
|
| For "a history of" I don't mind if they don't include the
| latest decade which is still subject to interpretation for how
| to structure the text and emphasize key moments/inventions and
| such, but if they make statements pertaining to the present
| that's a different story.
| felixhandte wrote:
| No widely deployed compressors are PAQ-like. The speed-up
| anticipated by the OP never materialized.
___________________________________________________________________
(page generated 2022-06-29 23:00 UTC)