[HN Gopher] History of lossless data compression algorithms (2014)
___________________________________________________________________
History of lossless data compression algorithms (2014)
Author : gscott
Score : 102 points
Date : 2022-06-29 16:29 UTC (6 hours ago)
(HTM) web link (ethw.org)
(TXT) w3m dump (ethw.org)
| bumper_crop wrote:
| Very timely; I found out yesterday that UPX (the executable
| packer from many years ago) was made by the same people who
| made LZO. It made me realize there's a real lineage among the
| people who write compression software. In a similar vein, the
| Rob Pike post from yesterday mentioned that Ken Thompson wrote
| a custom audio compressor for the Plan 9 release. He also
| designed the UTF-8 encoding. I love seeing how these people
| keep refining the state of the art.
| Jerrrry wrote:
| The ideas around information theory, entropy, compression,
| emergent behavior, and the limits where they all intersect
| tend to stick with you once you start looking into them.
|
| Complexity is fuckin cool.
| a-dub wrote:
| you may enjoy this text:
| http://www.inference.org.uk/itila/book.html
| Jerrrry wrote:
| thank you, saved
| mtn_nerd wrote:
| How could they not include the middle-out algorithm?
| optimalsolver wrote:
| Data compression is apparently equivalent to general
| intelligence:
|
| http://mattmahoney.net/dc/rationale.html
|
| ...which gives way to the theory of mind known as compressionism:
|
| http://ceur-ws.org/Vol-1419/paper0045.pdf
| Dylan16807 wrote:
| That's a factor to some extent, but if you're sticking with
| lossless compression, then the better your compression gets,
| the more your remaining data is dominated by noise.
| gfody wrote:
| that's why you need two brains: one to losslessly compress
| through symbolic expressiveness, and another to lossily
| compress by generating noise that resembles the incompressible
| noise
| anonGone73 wrote:
| The Head and the Heart
| joe_the_user wrote:
| Perfect compression is equivalent to finding the Kolmogorov
| complexity of a string. That's incomputable, and so perfect
| compression is beyond any algorithm, even a "generally
| intelligent" one. Generality in compression might increase an
| algorithm's effectiveness, but there's no proof that this
| generality is equivalent to generality in dealing with the
| world humans deal with.
| motohagiography wrote:
| Is the theoretical limit to compression basically what
| Kolmogorov complexity is, or is there another
| Shannon/information-theory theorem that defines the maximum
| possible lossless compression in terms of symbols over a
| channel, where the channel represents the algorithm?
| ("Compression theorem" seems to refer to something else, more
| abstract, I think.)
|
| Intuitively it seems like there would have to be one for
| lossless compression, along with a proof that such a theorem
| for the limits of lossy compression is either impossible or an
| open, hard problem.
| joe_the_user wrote:
| Yes, the Kolmogorov complexity of a string X is the minimum
| length of a program/string that, when fed into a fixed
| universal computer, will yield back the string X. The
| Kolmogorov complexity of a string is also uncomputable - it's
| essentially equivalent to the halting problem.
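|
| In the usual notation (this is just a paraphrase), with U a
| fixed universal machine, that's roughly:
|
    K_U(x) = \min \{ |p| : U(p) = x \}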
|
| "Best Lossy Compression" is ambiguous - how much loss is
| acceptable is isn't defined in the phrase.
| natly wrote:
| Not mentioned here is Fabrice Bellard's record-breaking (at an
| enormous compute cost) compressor https://bellard.org/nncp/.
| lucb1e wrote:
| What does "ratio (bpb)" mean? I'd guess bytes-per-byte or
| something, like how many bytes of original you get for each
| byte of compression, but it doesn't work out: the original size
| is 1e9 bytes, compressed (rounded) 3.2e8, so that's a ratio of
| 3.1 (1e9/3.2e8=3.09989). The program size amounts to a rounding
| error on that figure. The bpb value given is 2.58, nowhere near
| 3.1.
|
| Edit: the paper defines it as "bits per input byte". What kind
| of measure is that? It's like saying "how well did it compress
| compared to a factor of 8" - why 8?!
| isaacimagine wrote:
| It's the number of compressed bits needed, on average, to
| represent 8 bits (one byte) of input. 2.58 bits per input byte
| is the same thing as a compression ratio of
|
| 8 / 2.58 = 3.1
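|
| A rough sanity check in Python (the exact compressed size
| below is only approximated from the figures in the parent
| comment, so treat the numbers as illustrative):
|
    # sizes roughly as quoted in the parent comment
    original_bytes = 1_000_000_000   # 1e9 input bytes
    compressed_bytes = 322_600_000   # ~3.2e8 output bytes (approx.)
    bpb = 8 * compressed_bytes / original_bytes  # bits out / byte in
    ratio = original_bytes / compressed_bytes    # plain size ratio
    print(round(bpb, 2), round(ratio, 2))        # ~2.58, ~3.1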
| dang wrote:
| Discussed at the time:
|
| _History of Lossless Data Compression Algorithms_ -
| https://news.ycombinator.com/item?id=8088842 - July 2014 (44
| comments)
| harveywi wrote:
| Grammar-based compression algorithms are completely missing from
| this "history".
| duskwuff wrote:
| I'm not sure grammar-based compressors are a meaningful part of
| the history of lossless data compression. They're a topic in
| coding theory research, but, as far as I'm aware, they haven't
| had a significant impact in practical algorithms used in the
| field.
| lucb1e wrote:
| Haven't heard of that before. Typing it into Wikipedia, I got
| https://en.wikipedia.org/wiki/Grammar-based_code which explains
| it only as far as "To compress a data sequence x=x[0..n], a
| grammar-based code transforms x into a context-free grammar
| G." (Very helpful; love it when they use pseudocode with
| single-letter recursive variable definitions that need
| accompanying text to make sense... and the text isn't there.)
| There is an example given on the right in image form, which
| looks like recursive dictionary-based string substitution:
| "We hold these truths" compressed: we_holdQ5ths
| Q=Rse we_holdRse_5ths R=Se we_holdSese_5ths
| S=_th we_hold_these_5ths 5=tru we_hold_these_truths
|
| This method isn't described anywhere, nor was the original
| included, but it seems to be the algorithm used.
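|
| If you want to play with the idea, below is a toy Python
| sketch of that kind of pairwise substitution (roughly Re-Pair
| style, not necessarily what the Wikipedia figure actually
| uses): it repeatedly replaces the most frequent adjacent pair
| of symbols with a fresh nonterminal and records the rule:
    from collections import Counter

    def compress(text):
        # Repeatedly replace the most frequent adjacent pair of
        # symbols with a fresh nonterminal symbol.
        seq, rules, next_id = list(text), [], 0
        while True:
            pairs = Counter(zip(seq, seq[1:]))
            if not pairs:
                break
            pair, count = pairs.most_common(1)[0]
            if count < 2:
                break  # no pair repeats; nothing left to factor out
            sym = "<%d>" % next_id
            next_id += 1
            rules.append((sym, pair))
            out, i = [], 0
            while i < len(seq):
                if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                    out.append(sym)
                    i += 2
                else:
                    out.append(seq[i])
                    i += 1
            seq = out
        return seq, rules

    def decompress(seq, rules):
        grammar = dict(rules)  # nonterminal -> (a, b)
        def expand(sym):
            if sym not in grammar:
                return sym  # terminal character
            a, b = grammar[sym]
            return expand(a) + expand(b)
        return "".join(expand(s) for s in seq)

    s = "we hold these truths"
    seq, rules = compress(s)
    assert decompress(seq, rules) == s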
| kloch wrote:
| Does anyone else remember a joke/malware DOS program that
| circulated on BBSes in the very early 1990s and promised 98%
| compression of any file?
|
| It just shortened the file size recorded in the FAT directory
| entry (or some similar gimmick that would result in data loss
| as you wrote new data to the disk)...
| lizardactivist wrote:
| This article is relevant and intersects with some of the
| algorithms in the topic link:
|
| https://oku.edu.mie-u.ac.jp/~okumura/compression/history.htm...
| ctur wrote:
| This seems pretty incomplete if it doesn't include anything
| about brotli, lz4, or zstd, or about Finite State Entropy in
| general. It's basically missing the past decade or more.
|
| There's a lot of good history here, but be aware that it is
| missing the state of the art and thus shouldn't be used to
| decide which algorithms or approaches to use today.
| _delirium wrote:
| > It's basically missing the past decade or more.
|
| The page history [1] shows that the article was mostly written
| in December 2011, with only minor edits since then, so that
| sounds about right.
|
| [1]
| https://ethw.org/w/index.php?title=History_of_Lossless_Data_...
| lucb1e wrote:
| Does that mean this is also outdated? I'm not sure if zstd,
| brotli, etc. are PAQ-like
|
| > on the bleeding edge of archival software are the PAQ*
| formats
|
| For "a history of" I don't mind if they don't include the
| latest decade which is still subject to interpretation for how
| to structure the text and emphasize key moments/inventions and
| such, but if they make statements pertaining to the present
| that's a different story.
| felixhandte wrote:
| No widely deployed compressors are PAQ-like. The speed-up
| anticipated by the OP never materialized.
___________________________________________________________________
(page generated 2022-06-29 23:00 UTC)