[HN Gopher] From WinZips to Cat GIFs, Jacob Ziv's Algorithms Hav...
___________________________________________________________________
From WinZips to Cat GIFs, Jacob Ziv's Algorithms Have Powered
Decades of Compr
Author : chha
Score : 52 points
Date : 2021-04-22 07:34 UTC (15 hours ago)
(HTM) web link (spectrum.ieee.org)
(TXT) w3m dump (spectrum.ieee.org)
| tombert wrote:
| I know it was meant as a bit of a goof, but I always thought that
| the "look up the file in pi!" idea was a clever way of doing
| compression[1], exploiting the fact that pi is normal and thus
| any given sequence of numbers will happen _eventually_, meaning
| that any given _file_ that could possibly exist will show up
| eventually. Using this, you could theoretically just store two
| numbers to compress _anything_: a beginning point and an end
| point for its position in pi.
|
| I'm aware this would be unbearably slow and the numbers to store
| really big files might end up taking more memory than LZMA, so I
| don't need a lecture about it, I just thought it was an
| interesting idea.
|
| [1] https://github.com/philipl/pifs
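A minimal sketch of that idea in Python, assuming the mpmath library
for pi digits; the 100k-digit precision, the 3-digits-per-byte
encoding, and the function names are illustrative choices, not
anything from pifs itself:

    # Toy "compress by indexing into pi" sketch (assumes mpmath).
    from mpmath import mp

    mp.dps = 100_000                      # ~100k decimal digits of pi
    PI_DIGITS = mp.nstr(mp.pi, mp.dps, strip_zeros=False)[2:]  # drop "3."

    def pi_compress(data: bytes):
        """Return (start, length) of data's digit form inside pi."""
        target = "".join(f"{b:03d}" for b in data)   # one byte -> 3 digits
        start = PI_DIGITS.find(target)
        if start < 0:
            raise ValueError("not found; would need more digits of pi")
        return start, len(target)

    def pi_decompress(start: int, length: int) -> bytes:
        digits = PI_DIGITS[start:start + length]
        return bytes(int(digits[i:i + 3]) for i in range(0, length, 3))

    # Even for a single byte the index is usually several digits long,
    # so the "compressed" form is already bigger than the input.
    start, length = pi_compress(b"h")
    assert pi_decompress(start, length) == b"h"
    print(start, length)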
| anuragbiyani wrote:
| It's a commonly held misconception that Pi is proven to be
| normal - it is NOT (it is a conjecture as of now) [1] [2].
| Proving the normality of a number is a very hard problem, and
| hardly any numbers outside of purposefully constructed ones
| (such as Champernowne's constant [3]) are proven normal.
|
| In fact it's not even proven that every digit occurs
| _infinitely_ many times in the decimal expansion of Pi. [4]
|
| So https://github.com/philipl/pifs is wrong in claiming that
| every byte stream will exist somewhere in Pi (it's not proven).
| It's also worth calling out that even if Pi were normal, storing
| the two location indices would likely take more space than the
| original data itself (for at least the majority of integers), so
| it's not much of a "compression" strictly speaking. It's easy to
| see how this works out for a known normal number, Champernowne's
| constant [3]: unlike Pi, Champernowne's constant is guaranteed
| to contain every possible natural-number sequence, but storing
| just the starting index of a number in this constant takes more
| space than the number itself (e.g., "11" starts at index 12
| (1-indexing), "12" starts at index 14, and so on - the size of
| the index grows faster than the integer being looked up).
|
| [1] https://mathworld.wolfram.com/NormalNumber.html
|
| [2] https://math.stackexchange.com/a/216578
|
| [3] Champernowne's constant (in base 10) is formed by
| concatenating all positive integers and treating the result as
| a decimal expansion (following "0."): 0.12345678910111213... It
| can be trivially seen that it contains every natural-number
| string. It is also proven to be normal in base 10 (which is a
| stronger property). See
| https://en.wikipedia.org/wiki/Champernowne_constant for
| details.
|
| [4]
| https://en.wikipedia.org/wiki/Normal_number#Properties_and_e...
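The starting index of a number in Champernowne's constant has a
simple closed form (just count the digits contributed by all shorter
numbers); a small Python sketch of the index-growth point, with
champernowne_index as an illustrative helper name:

    # 1-indexed starting position of the decimal string of n inside
    # Champernowne's constant 0.123456789101112...
    def champernowne_index(n: int) -> int:
        d = len(str(n))
        # digits used by all numbers with fewer than d digits...
        shorter = sum(9 * 10 ** (k - 1) * k for k in range(1, d))
        # ...plus digits used by the d-digit numbers smaller than n
        return shorter + (n - 10 ** (d - 1)) * d + 1

    assert champernowne_index(11) == 12
    assert champernowne_index(12) == 14

    # The index is roughly d*n, so writing it down never saves space
    # over writing n itself:
    for n in (11, 999, 123456, 10 ** 12):
        idx = champernowne_index(n)
        print(f"{n} starts at {idx} "
              f"({len(str(idx))} digits vs {len(str(n))} for n)")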
| jonny_eh wrote:
| How would that be any better than "point to a spot on the
| number line"?
| crazygringo wrote:
| It's not just slow but it wouldn't be compression either.
|
| On average, the length of the number required to point to the
| starting digit will be the same length as the data you're
| trying to compress.
|
| It would be 0% compression in the long run.
| treesprite82 wrote:
| > On average, the length of the number required to point to
| the digit will be the same length as the data you're trying
| to compress. It would be 0% compression in the long run
|
| Worse than 0%. The average position of numbers 0000-9999 in
| pi is 9940, for example.
|
| I believe ideally you'd want a De Bruijn sequence
| (https://en.wikipedia.org/wiki/De_Bruijn_sequence -
| "optimally short with respect to the property of containing
| every string of length n at least once") but even that won't
| get 1:1 compression.
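For reference, the standard recursive construction of a De Bruijn
sequence (the de_bruijn helper below is an illustrative sketch):
B(10, n) packs every n-digit string into just 10^n digits read
cyclically, yet an index into it still needs up to n digits, so even
the densest possible sequence only breaks even at best.

    # De Bruijn sequence B(k, n): every length-n string over a
    # k-symbol alphabet appears exactly once when read cyclically.
    def de_bruijn(k: int, n: int) -> str:
        a = [0] * (k * n)
        seq = []

        def db(t, p):
            if t > n:
                if n % p == 0:
                    seq.extend(a[1:p + 1])
            else:
                a[t] = a[t - p]
                db(t + 1, p)
                for j in range(a[t - p] + 1, k):
                    a[t] = j
                    db(t + 1, t)

        db(1, 1)
        return "".join(str(x) for x in seq)

    s = de_bruijn(10, 3)
    print(len(s))                  # 1000 digits cover all 3-digit strings
    assert "042" in s + s[:2]      # append a prefix for the cyclic cases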
| 0xcde4c3db wrote:
| > The following year, the two researchers issued a refinement,
| LZ78. That algorithm became the basis for the Unix compress
| program used in the early '80s; WinZip and Gzip, born in the
| early '90s; and the GIF and TIFF image formats.
|
| I don't understand the inclusion of WinZip and gzip here. Those
| are based on DEFLATE, which I thought was deliberately derived
| from LZ77 and LZSS rather than anything particular to the
| LZ78/LZW lineage (for patent reasons). Am I confused about
| something?
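For context on the distinction being drawn: LZ78 (and its LZW
refinement used by compress and GIF) builds an explicit phrase
dictionary and emits (phrase index, next symbol) pairs, while
LZ77/LZSS, which DEFLATE builds on, emits (distance, length)
back-references into a sliding window. A toy LZ78 round trip, purely
illustrative and not any real on-disk format:

    # Toy LZ78: emit (index of longest known phrase, next character).
    def lz78_encode(data: str):
        dictionary = {"": 0}      # phrase -> index; 0 is the empty phrase
        phrase, out = "", []
        for ch in data:
            if phrase + ch in dictionary:
                phrase += ch      # keep extending a known phrase
            else:
                out.append((dictionary[phrase], ch))
                dictionary[phrase + ch] = len(dictionary)
                phrase = ""
        if phrase:
            out.append((dictionary[phrase], ""))   # flush a trailing phrase
        return out

    def lz78_decode(tokens):
        phrases, out = [""], []
        for idx, ch in tokens:
            phrase = phrases[idx] + ch
            out.append(phrase)
            phrases.append(phrase)
        return "".join(out)

    tokens = lz78_encode("abababababa")
    assert lz78_decode(tokens) == "abababababa"
    print(tokens)   # dictionary phrases get longer as repetition grows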
| beagle3 wrote:
| I'm finding conflicting web docs about what algorithms pkzip
| prior to 2.0 (deflate == gzip) used. There were "shrink" and
| "implode", they had some run-length encoding, and some entropy
| coding (shannon fano trees, later replaced by huffman coding in
| deflate). My vague memory says there was an LZW among them,
| with PKZIP 0.8 or something, but I'm not sure.
|
| What I am sure about is that the predecessor to PKZIP,
| "PKARC", most certainly did use LZW; it was compatible with
| SEA's ARC, but much, much faster (like 5x or so). SEA sued Phil
| Katz, and his response was to drop ARC compatibility and
| release PKZIP, which was faster and better -- though the
| now-standard "deflate" method did not appear until version 2 a
| few years later.
| forgotmypw17 wrote:
| accessible link for nojs clients:
|
| https://archive.is/Bb1Aw
| bobbyi_settv wrote:
| Did the word "compression" get cut off due to title length? Or is
| it intentionally being compressed in the original title as a
| meta-joke? Turns out it's the former.
| mygoodaccount wrote:
| It's lossless too (with the context)!
| dogma1138 wrote:
| Well, quite the opposite: it's a pretty good example of lossy
| compression.
|
| Lossless compression has to be lossless outside of any
| context or interpretation; it's mathematically reversible.
|
| This is rather a good analogy for lossy compression akin to,
| say, MP3.
| [deleted]
___________________________________________________________________
(page generated 2021-04-22 23:01 UTC)