[HN Gopher] Shannon's Source Coding Theorem (2020)
___________________________________________________________________
Shannon's Source Coding Theorem (2020)
Author : ElFitz
Score : 90 points
Date : 2023-06-18 13:38 UTC (9 hours ago)
(HTM) web link (mbernste.github.io)
(TXT) w3m dump (mbernste.github.io)
| ElFitz wrote:
| I was curious as to whether or not there is a way to determine the
| actual amount of information in a given file (say, an image), and
| thus a theoretical limit to lossless compression, which is
| apparently called "Kolmogorov Complexity" [1], and this came up.
|
| [1]: https://en.m.wikipedia.org/wiki/Kolmogorov_complexity
| contravariant wrote:
| Entropy doesn't make much sense for a single instance. There
| trivially exists a codec that will translate any particular
| file into any other. To get a real idea of how much information
| is in a file you'll need some statistical model of how it may
| have been generated (thankfully the model doesn't have to be
| perfect).
|
| You will also want to be a bit careful what assumptions you
| allow this model to make. In a very real sense a hash is mostly
| enough information to recover a file if you know it exists
| somewhere, as torrents and magnet links prove.
| supriyo-biswas wrote:
| The formula for entropy and the source coding theorem discussed
| in the article describe exactly that.
| waveBidder wrote:
| beyond, in fact, since they're lossy formats
| canjobear wrote:
| If you have a probability distribution over patterns you are
| likely to find in a file, then the Shannon entropy gives you
| the best losslessly compressed filesize you can achieve.
|
| Kolmogorov complexity is the length of the shortest computer
| program in some programming language that can generate your
| file. It gives the same number across different programming
| languages, up to an additive constant that depends on the pair
| of languages and is likely very large. It's a more universal
| notion of compressibility than
| Shannon entropy, because it doesn't depend on a probability
| distribution, but it is uncomputable.
|
| The concepts are related because for any programming language,
| you can see it as corresponding to a probability distribution
| such that p(x) \propto 2^-(length of the shortest program that
| generates x).
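|
| A quick way to see the Shannon side of this in code (a minimal
| sketch; the symbol distribution below is made up purely for
| illustration):
|
|   import math
|
|   def shannon_entropy(probs):
|       # Entropy in bits of a discrete probability distribution.
|       return -sum(p * math.log2(p) for p in probs if p > 0)
|
|   # Hypothetical distribution over four symbols.
|   probs = [0.5, 0.25, 0.125, 0.125]
|   H = shannon_entropy(probs)  # 1.75 bits per symbol
|   print(f"entropy: {H:.3f} bits/symbol")
|   print(f"best average for 1000 symbols: ~{1000 * H:.0f} bits")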
| exhaze wrote:
| A practical way I used to grok this is to try to zip up an mp4
| or a jpeg. You'll notice the size doesn't really change. That's
| because these formats already applied entropy coding
| compression techniques and have compressed the information very
| close to the Shannon entropy limit.
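|
| A rough sketch of that experiment (the path is just a
| placeholder for any already-compressed file you have around):
|
|   import zlib
|
|   # Placeholder path; try a JPEG, MP4 or PNG.
|   with open("photo.jpg", "rb") as f:
|       data = f.read()
|
|   packed = zlib.compress(data, 9)
|   print(f"original: {len(data)} bytes")
|   print(f"zlib -9:  {len(packed)} bytes")
|   print(f"ratio:    {len(packed) / len(data):.3f}")  # usually close to 1.0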
| CorrectHorseBat wrote:
| JPEG XL supports reversible JPEG transcoding in ~20% less
| space, which wouldn't be possible if JPEG were very close to
| the Shannon entropy limit.
| [deleted]
| KRAKRISMOTT wrote:
| Run your file through every compression algorithm (that doesn't
| utilize too much prior knowledge, i.e. not LLMs) that you can
| get your hands on. After a certain point the file won't compress
| any further. It's a good approximation of the limit, I think.
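|
| Something along those lines, sticking to the standard library
| (a minimal sketch; the path is a placeholder, and the result is
| only an upper bound on the true limit):
|
|   import bz2, lzma, zlib
|
|   def best_compressed_size(data: bytes) -> int:
|       # Smallest size achieved by a few general-purpose compressors.
|       candidates = [
|           zlib.compress(data, 9),
|           bz2.compress(data, 9),
|           lzma.compress(data, preset=9),
|       ]
|       return min(len(c) for c in candidates)
|
|   data = open("somefile.bin", "rb").read()  # placeholder path
|   print(best_compressed_size(data), "bytes (upper bound)")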
| jsenn wrote:
| This might give a decent estimate of the entropy of the file,
| but it is a poor approximation of the Kolmogorov complexity
| in general.
|
| All major compression libraries that I know of essentially
| work by "factoring out" statistical regularities in the input
| string--i.e. repeating patterns. But many strings have
| regularity that can't be captured statistically. For example,
| the sequence 1, 2, 3, 4, ... has an entropy rate of 1 bit per bit,
| because all digits and all subsequences of digits occur with
| equal frequency, so most modern compression libraries are
| going to be unable to compress it to any significant degree
| [1]. However, it obviously has a very compact representation
| as a program in your favourite language.
|
| If you want to go down a bit of a rabbit hole, this video
| from a researcher in this field gives an overview, as well as
| some proposed methods for approximating Kolmogorov Complexity
| in a reasonable way (even though in a technical sense
| approximating it with known, fixed bounds is I believe
| impossible): https://www.youtube.com/watch?v=HXM3BUXsY4g.
| This paper also discusses a theory that decomposes KC into a
| statistical part (Shannon entropy) and a "computational
| part": https://csc.ucdavis.edu/~cmg/compmech/pubs/CalcEmergTi
| tlePag....
|
| [1]: Note that if you encode it as ASCII or even N-bit
| integers there _will_ be significant redundancy in the
| encoding. To properly test it you'd have to encode the
| numbers in a packed binary encoding with no padding.
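|
| A rough sketch of that experiment (the packing here is
| simplistic and the exact numbers will vary, but the gap is the
| point):
|
|   import zlib
|
|   # Concatenate the binary representations of 1..N with no
|   # padding, then pack the resulting bit string into bytes.
|   N = 100_000
|   bits = "".join(format(n, "b") for n in range(1, N + 1))
|   data = int(bits, 2).to_bytes((len(bits) + 7) // 8, "big")
|
|   packed = zlib.compress(data, 9)
|   print(f"raw:     {len(data)} bytes")
|   print(f"deflate: {len(packed)} bytes")  # typically only a small saving
|   # ...versus the couple of lines above that regenerate the
|   # data exactly.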
| ElFitz wrote:
| I wouldn't have thought of that.
|
| I was more interested in learning about the possible theory
| behind such a limit, to see if we had a way to actually know
| how far such optimisations _can_ go. Basically, for any given
| file, what would be the absolute best we could hope for
| (without resorting to losing information).
| oh_sigh wrote:
| Not to be pedantic, but for any given file the answer is 0%
| compression, due to the pigeonhole principle.
| ElFitz wrote:
| Not at all! Having basically no knowledge in that area,
| aside from a vague understanding of the purpose and uses
| of information theory, it is more than welcome!
|
| For those interested but unfamiliar with it:
|
| - https://eprint.iacr.org/2021/1333.pdf
|
| - https://matt.might.net/articles/why-infinite-or-
| guaranteed-f...
| MauranKilom wrote:
| That question makes very little sense without fixing some
| constraints like "what other files exist" and "what language
| are you allowed to use to reproduce them".
|
| In a world where every single file is exactly "abcd", it
| takes zero bits to represent the file - you don't need any
| information to know that the file contains "abcd" because
| all files do.
|
| In theory, the language you use to output the file could
| have a one-bit coding '0' for "produce the file 'there are
| comments on HN'" (and all other commands starting with
| '1...'). So in the world we live in, the smallest a file
| can be compressed to is 1 bit - if you just use the right
| language.
|
| None of these are practical in general, but the point is
| that Kolmogorov Complexity is not meaningful without
| specifying more assumptions/priors.
| ElFitz wrote:
| Absolutely, that is part of what I have come to understand
| while diving into the topic: that I first needed to answer
| quite a few other questions for that one to even make sense.
|
| Thank you for the examples; they are much clearer and more
| straightforward than most of those I've come across.
| guimplen wrote:
| This is a misconception. The main result of Kolmogorov
| complexity theory is that there is precisely such a "universal
| prior", one that comes simply from the notion of computability.
| teleforce wrote:
| Shannon is to engineering what Einstein is to science. His
| perennial works, hobbies, papers and theses laid the foundations
| of modern engineering, including digital, computer and
| communication systems, as well as early forms of AI such as
| chess engines and language models [1]. Kleinrock once noted that
| he had to pursue the then-exotic field of queueing theory, which
| led to packet switching and then the Internet, because most of
| the popular problems in his research field had already been
| solved by Shannon (who was also one of the members of
| Kleinrock's doctoral committee).
|
| [1] How Claude Shannon Helped Kick-start Machine Learning:
|
| https://www.spectrum.ieee.org/claude-shannon-information-the...
| 082349872349872 wrote:
| I'm fond of his "mind-reading machine":
| https://literateprograms.org/mind_reading_machine__awk_.html
|
| (compare Agassi/Becker for the value of single-bit predictions)
| metadat wrote:
| Wow, what a fascinating concept and read. It's humbling to
| consider and marvel at what early technology pioneers like
| Claude Shannon envisioned and accomplished with a minuscule
| sliver of computational power compared to what we are
| surrounded by today.
|
| I couldn't resist submitting it, especially considering it
| has not yet been discussed on HN*:
|
| https://news.ycombinator.com/item?id=36385198
|
| (*A different link covering the same subject was submitted 6
| years ago but never caught on:
| https://news.ycombinator.com/item?id=14450232)
| nico wrote:
| In the example in the article, if you want to communicate a
| series of 5 dice rolls, you can send a binary code for each
| roll's number (1-6), 5 times. Using 3-bit coding for each
| number, it would take 15 bits (5 numbers in the sequence, 3
| bits each) to communicate a message.
|
| But there's an alternative way: instead of coding each number
| individually, you can code the sequences of 5 numbers, of which
| there are 6*5*4*3*2 in this case, 6!/1! = 720, which can be
| encoded in binary with 8 bits.
|
| So instead of 15 bits, you can use 8
| rododendro wrote:
| All sequences of 5 numbers in the range [1, 6]: there are
| 6 * 6 * ... * 6 = 6^5 of them, which would give 12.92 bits, so
| 13. The only way to actually do better than sending every
| symbol would be run-length encoding or other variable-length
| encodings, whose expected length, fortunately, can get
| arbitrarily close to the entropy of the distribution.
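|
| The arithmetic for the single-block case, for anyone following
| along:
|
|   import math
|
|   sequences = 6 ** 5                      # 7776 equally likely outcomes
|   print(math.log2(sequences))             # 12.92... bits of entropy
|   print(math.ceil(math.log2(sequences)))  # 13 bits for a fixed-length code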
| contravariant wrote:
| You don't need variable length, you just need to send the data
| in bigger blocks. Count all 6^(5n) ways to throw 5 dice n
| times, order them lexicographically and send the index in
| 5n*log2(6) bits, rounding up.
|
| Your inefficiency is just the rounding error, which quickly
| becomes trivial. In theory some block sizes should even
| coincidentally be close to a whole number of bits.
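|
| A minimal sketch of that block code, treating the whole run of
| rolls as one big base-6 number (die faces assumed to be 1-6):
|
|   import math
|
|   def encode(rolls):
|       # Lexicographic index of a sequence of die rolls.
|       index = 0
|       for r in rolls:
|           index = index * 6 + (r - 1)
|       return index
|
|   def decode(index, length):
|       rolls = []
|       for _ in range(length):
|           index, face = divmod(index, 6)
|           rolls.append(face + 1)
|       return rolls[::-1]
|
|   rolls = [3, 1, 4, 1, 5, 6, 2, 6, 5, 3]       # 2 blocks of 5 rolls
|   idx = encode(rolls)
|   bits = math.ceil(len(rolls) * math.log2(6))  # ceil(5n * log2(6))
|   assert decode(idx, len(rolls)) == rolls
|   print(f"{len(rolls)} rolls -> {bits} bits, vs {3 * len(rolls)}"
|         f" at 3 bits per roll")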
| nico wrote:
| You are right, it's 6^5 to include sequences with repeating
| symbols
|
| In any case, you don't need to index the whole number space to
| transmit all the information.
|
| Like you say, you can use 13 bits instead of 15
|
| And using variable length encoding you can reduce it even
| more
|
| But what is the distribution in this case?
| rododendro wrote:
| Well, in this case the distribution would simply be a uniform
| distribution over [1, 6], which has an entropy of about 2.58
| bits per symbol, and which is also the maximum-entropy
| distribution on a set of size 6. Like others have said, in a
| real-world scenario you can't actually know the underlying
| distribution of whatever file you are trying to compress, but
| the fact that you want to compress it at all suggests the file
| has interesting information (otherwise it would be just noise).
| So grammar-based encoding would probably not be best; LZ77 or
| LZ78 would perform better (and if you know more about the file
| you could do even better by first transforming into some
| frequency domain and then compressing).
| nico wrote:
| Fascinating. Thank you.
|
| Why do you need to know more about the file to transform
| to frequency domain?
| rododendro wrote:
| Because no information is lost if you simply change domain,
| but you may spot more patterns, faster or more easily. For
| example (even if a bit dated):
| https://ieeexplore.ieee.org/abstract/document/1614066
|
| There is a lot of research in this field because of the IoT
| fad/trend. If you want to know more, I would suggest simply
| picking a paper from IEEE, deep diving into the terms and
| terminology, and skipping the proofs of most
| algorithms/theorems.
| AnimalMuppet wrote:
| > 720, which can be encoded in binary with 8 bits
|
| Even if 720 were the right number, that takes 10 bits, not 8.
| nico wrote:
| Yes, embarrassing mistake
|
| Still 10 is less than 15
|
| Additionally, as someone else pointed out, it would be 6^5
| sequences, which would need 13 bits
|
| Still 13 is less than 15
|
| But the interesting thing is, you can then index the most
| commonly used sequences of sequences and compress by a few
| more bits
|
| You can keep the process going until you have a 2 bit index
| for the most commonly used sequences (of sequences (of
| sequences... ))
| AnimalMuppet wrote:
| Sorry to keep nitpicking, but: In this case, you can't
| compress further. These are dice rolls; there _aren't_
| "most commonly used sequences" (if the dice are any good).
| nico wrote:
| Yes, however, the moment you choose a message size, there
| is always only a finite set of ways to arrange the
| symbols in those positions
| activatedgeek wrote:
| Another excellent book that I recently read was "Information
| Theory: A Tutorial Introduction" by James V. Stone [1]. This book
| is a fairly light read and builds up everything from scratch,
| which was very nice.
|
| Shannon's original paper [2] is often touted to be very
| parseable, but it is always enlightening to read the same basics
| many times from multiple sources, especially something as
| foundational as source coding (and broader information theory) in
| a more modern language.
|
| I also finally discovered the meaning of "differential" in
| differential entropy [3], the continuous counterpart of entropy
| for discrete probability masses, something that I had always
| swept under the rug.
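|
| For the curious, the continuous version replaces the sum over
| probability masses with an integral over a density; for a
| Gaussian it has a closed form (a small sketch, in bits):
|
|   import math
|
|   def gaussian_differential_entropy(sigma):
|       # h(X) = 0.5 * log2(2*pi*e*sigma^2) for X ~ N(mu, sigma^2)
|       return 0.5 * math.log2(2 * math.pi * math.e * sigma ** 2)
|
|   print(gaussian_differential_entropy(1.0))  # ~2.05 bits
|   print(gaussian_differential_entropy(0.1))  # negative, unlike discrete entropy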
|
| [1]: https://www.librarything.com/work/16165301/book/242295611
| [2]: https://ieeexplore.ieee.org/document/6773024
| [3]: https://sanyamkapoor.com/kb/differential-entropy
___________________________________________________________________
(page generated 2023-06-18 23:01 UTC)