[HN Gopher] Kolmogorov Complexity and Compression Distance
___________________________________________________________________
Kolmogorov Complexity and Compression Distance
Author : rgbimbochamp
Score : 76 points
Date : 2024-03-30 18:20 UTC (4 hours ago)
(HTM) web link (smunshi.net)
(TXT) w3m dump (smunshi.net)
| causal wrote:
| Confused how the interesting number paradox proves KC cannot be
| computed.
| srcreigh wrote:
| The author is referring to something called Chaitin
| incompleteness.
|
| https://en.wikipedia.org/wiki/Kolmogorov_complexity#Chaitin'...
|
| Of course, some KC values can trivially be proven, e.g. a
| language where a 0- or 1-character program is interpreted as a
| specific string. Or, to prove KC(x) where the compressed form has
| length N, you can list out the results of all programs of length
| less than N; if none of them equal x, that proves KC(x)=N.
|
| The interesting number paradox (Berry's paradox) is more
| related to Chaitin incompleteness.
|
| Basically, given a language there's some code which enumerates
| proofs that KC of a string is more than some constant L, and
| returns the first one it finds.
|
| If the constant L is large enough, it becomes larger than the
| entire proof generating code. So the proof generating code will
| never find a proof of any KC larger than L.
|
| It's interesting to think that as the language gets more
| complex, proofs for larger strings become possible, and to ask
| what it would mean for the languages to keep getting more complex
| indefinitely.
|
| It's a similar train of thought to busy beaver numbers and how
| values like BB(745) become independent of systems of logic (PA,
| ZFC), and what it could mean to have more and more advanced
| types of logic which don't hit independence until some higher
| target n.
| causal wrote:
| This seems to assume that KC can be infinite. That must have
| been proven at some point? Otherwise it may be that there is
| some upper-bound for L which happens to also be the KC for a
| KC-computer.
| srcreigh wrote:
| yes, it's a pigeonhole principle argument. A bit tricky to
| actually enumerate everything. Imagine KC had a limit k.
| Then there's a fixed number of strings that can be
| compressed to k characters. so considering how there's an
| infinite number of strings of length greater than k, they
| can't all be compressed to be at most k. Therefore KC has
| no limit
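|
| To put rough numbers on that pigeonhole step (a throwaway Python
| sketch; modelling "programs" as plain bitstrings is just an
| assumption for illustration):
|
|     k = 10
|     # All descriptions of length 0..k: 2^0 + ... + 2^k = 2^(k+1) - 1,
|     # a fixed finite number.
|     descriptions_up_to_k = sum(2 ** n for n in range(k + 1))
|     # Already more strings of a single length than all short
|     # descriptions combined, and string lengths keep growing forever.
|     strings_of_length_k_plus_1 = 2 ** (k + 1)
|     print(descriptions_up_to_k, strings_of_length_k_plus_1)   # 2047 2048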
| Opocio wrote:
| Me neither.
|
| But how I see it is that for solving KC in full generality
| you'll have to:
|
| - Start with the program that explicitly returns the original
| string. Let's say it has length N.
| - Run all possible programs that are shorter than N (just try
| all combinations of characters).
| - Look at the results and pick the shortest program that
| compiles and outputs the original string.
|
| The problem there is that you have to wait for all the programs
| to end, and you don't know whether they will end or not. So you
| have a problem that's equivalent to the halting problem (which is
| not solvable, and which is loosely related to the interesting
| number problem).
|
| (This is not a proof and I don't have a background in the field
| btw)
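|
| A rough, runnable Python sketch of that search (everything here
| is made up for illustration: the "programs" are just short
| arithmetic expressions so that each candidate is guaranteed to
| terminate; with a real Turing-complete language, the second step
| is exactly where the halting problem bites):
|
|     import itertools
|
|     TARGET = "387420489"          # hypothetical target string (it happens to be 9**9)
|     ALPHABET = "0123456789*+"     # toy "language": tiny Python arithmetic expressions
|
|     def shortest_description(target, max_len):
|         # Step 1: the literal string is always a valid description,
|         # so its length is an upper bound.
|         best = target
|         # Step 2: try every shorter candidate program, shortest first.
|         for n in range(1, min(max_len, len(best) - 1) + 1):
|             for chars in itertools.product(ALPHABET, repeat=n):
|                 prog = "".join(chars)
|                 try:
|                     # With a Turing-complete language this call could run
|                     # forever, and there is no general way to know that in
|                     # advance: that's the halting problem, and the reason
|                     # true KC is uncomputable. Arithmetic expressions always
|                     # halt, so this toy version gets away with it.
|                     if str(eval(prog)) == target:
|                         return prog
|                 except Exception:
|                     pass
|         # Step 3: nothing shorter worked; fall back to the literal.
|         return best
|
|     print(shortest_description(TARGET, max_len=4))   # finds "9**9"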
| causal wrote:
| That intuitively makes sense to me.
| nyrikki wrote:
| Impredicativity is the property you may want to dig into for
| formal proofs on why self references can be problematic.
|
| There is an important difference between semantically complete
| and syntactically complete that may cause some barriers.
|
| Gödel's completeness theorem is about semantic completeness,
| while his incompleteness theorems are about syntactic
| completeness.
|
| From Wikipedia: > A formal system is syntactically complete if
| and only if no unprovable sentence can be added to it without
| introducing an inconsistency.
|
| 'This statement is false', which Gödel mapped to natural
| numbers, is an example of such an inconsistency.
|
| If KC were computable, there would be an infinity of paradoxes
| like the interesting number paradox.
|
| The Berry paradox, which the INP link on the page points to, has
| a subheading that relates it to KC computability.
|
| https://en.m.wikipedia.org/wiki/Berry_paradox
| explaininjs wrote:
| Similar to how the interesting number paradox relies on a
| "shortcut statement" to force up the smallest non-interesting
| number, if Kolmogorov complexity were computable you could create
| a "shortcut program" to force down the length of the shortest
| program:
|
| Given: TM length of a JS runtime is 1,000,000 cells.
|
| Assume: KC is computable, and TM length of a `function
| KolmogorovComplexity(string s)` is 4,000,000 cells.
|
| Known: KCs of values grow infinitely large - only 2^n - 1
| possible values can ever be encoded by fewer than n bits.
|
| Take:
|
|     function Shortcut() {
|       for (const s of generateEveryStringFromShortestUp()) {
|         if (KolmogorovComplexity(s) > 10_000_000) return s
|       }
|     }
|
| You see that the Shortcut function is encoded in 5,000,135
| cells (plus that string generator, but that's small/constant),
| yet it outputs a string whose complexity exceeds 10,000,000
| (and each extra cell added to the constant raises that
| complexity bound 10x). A contradiction.
| causal wrote:
| Still confused. What is contradictory about a simple program
| computing a more complex program? Randomly generating a more
| complex program does not make the complex program reducible
| to a random string generator.
| yamrzou wrote:
| Well, I reached the end of the article (interesting btw), and
| still not convinced why bob can't claim that there was no foul-
| play involved and that his got his result due to excellent luck.
| ComplexSystems wrote:
| You don't need Kolmogorov complexity for this; simple
| hypothesis testing will do. The null hypothesis is that the
| coin is fair and the alternative is that it's biased. If Bob
| were correct, then there would simply _never_ be any way to
| refute the null hypothesis of a fair coin, no matter what,
| since a fair coin can output any sequence with equal
| probability. In reality, that isn't how
| hypothesis testing works, and pretty much any standard
| technique (computing p-values, likelihood ratios, etc.) will
| agree that 20 tails in a row is extremely unlikely under the
| null hypothesis in a way that 10 tails and 10 heads is not.
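|
| To put numbers on that (a quick Python check; nothing here is from
| the article, just the standard binomial arithmetic): any one
| specific 20-toss sequence has probability 2^-20 under the
| fair-coin null, but the _event_ "20 tails" contains a single
| sequence while "exactly 10 heads and 10 tails" contains 184,756.
|
|     from math import comb
|
|     n = 20
|     p_specific_sequence = 0.5 ** n           # same for Alice's and Bob's exact strings
|     p_all_tails = comb(n, 0) * 0.5 ** n      # only 1 sequence in this event
|     p_ten_and_ten = comb(n, 10) * 0.5 ** n   # 184,756 sequences in this event
|
|     print(p_specific_sequence)   # ~9.5e-07
|     print(p_all_tails)           # ~9.5e-07 -> any standard test flags this
|     print(p_ten_and_ten)         # ~0.176   -> entirely unremarkable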
| wood_spirit wrote:
| An excellent rabbit hole to dive into is the equivalence of
| compression and general AI. Every programmer should make a
| compressor (and, separately, a ray tracer)!
|
| See http://prize.hutter1.net/
| avmich wrote:
| Some examples for a particular algorithm:
| https://rosettacode.org/wiki/LZW_compression
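|
| For anyone who wants to play with it, a minimal LZW compressor in
| Python along the lines of that Rosetta Code entry (compression
| only; the dictionary starts with single characters and grows as
| longer phrases are seen):
|
|     def lzw_compress(data: str) -> list[int]:
|         dictionary = {chr(i): i for i in range(256)}   # start with single characters
|         next_code = 256
|         phrase = ""
|         out = []
|         for ch in data:
|             candidate = phrase + ch
|             if candidate in dictionary:
|                 phrase = candidate                 # keep extending the current phrase
|             else:
|                 out.append(dictionary[phrase])     # emit the longest known phrase
|                 dictionary[candidate] = next_code  # learn the new, longer phrase
|                 next_code += 1
|                 phrase = ch
|         if phrase:
|             out.append(dictionary[phrase])
|         return out
|
|     print(lzw_compress("TOBEORNOTTOBEORTOBEORNOT"))   # 16 codes for 24 characters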
| JDEW wrote:
| > It has been demonstrated that KC(x), can be reasonably
| estimated by the number of bits required to encode x using a
| compressor C (such as gzip)
|
| Talk about a cliffhanger :)
|
| Using [0] you get 32B for Alice and 40B for Bob.
|
| [0] It has been demonstrated that KC(x), can be reasonably
| estimated by the number of bits required to encode x using a
| compressor C (such as gzip)
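|
| Roughly what that estimate looks like in Python. The two 20-toss
| strings below are made-up stand-ins for the article's, and the
| absolute byte counts include gzip's fixed header overhead, so only
| the difference between them is meaningful:
|
|     import gzip
|
|     alice = b"HTTHTHHTTHTHHHTTHTHT"   # looks like a typical fair-coin run
|     bob = b"T" * 20                   # twenty tails in a row
|
|     print(len(gzip.compress(alice)))  # longer: little structure to exploit
|     print(len(gzip.compress(bob)))    # shorter: one repeated symbol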
| pizza wrote:
| I think maybe another way to put this is that Alice's number is
| in a typical set [0] of the distribution of bitstrings whereas
| Bob's might not be. Depending on the tolerance, the typical set
| can have near-total coverage of the distribution. Another way of
| making this about compression: a random code that encodes
| typical-set strings well will probably suffer some overhead when
| encoding Bob's string, but will encode most strings close to
| optimally.
|
| [0] https://en.wikipedia.org/wiki/Typical_set
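|
| A concrete (if simplified) version of that in Python, using
| head-count typicality for length-20 fair-coin strings: sequences
| whose head count is close to 10 already soak up almost all of the
| probability mass, while the all-tails string sits far outside that
| set. The tolerance below is an arbitrary choice for illustration.
|
|     from math import comb
|
|     n = 20
|     tolerance = 4    # "typical" = head count within 4 of n/2
|     counts_in_set = sum(comb(n, k) for k in range(10 - tolerance, 10 + tolerance + 1))
|     typical_mass = counts_in_set / 2 ** n
|
|     print(typical_mass)   # ~0.96: nearly all the probability mass
|     print(1 / 2 ** n)     # ~9.5e-07: the lone all-tails sequence, outside the set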
| arketyp wrote:
| Richard von Mises (brother of the economist) formulated a
| definition of randomness: a sequence of data is random if, were
| you a gambler, you could not make money by any strategy of
| betting on the outcomes. This came before the theory of
| computation and was later developed by Kolmogorov and others
| into algorithmic complexity. The
| modern variation would be (Wiki) "considering a finite sequence
| random (with respect to a class of computing systems) if any
| program that can generate the sequence is at least as long as the
| sequence itself".
| n4r9 wrote:
| > you cannot by any strategy make money on betting on the
| outcomes
|
| What does "strategy" mean here? I might just happen to have a
| strategy which involves betting on the exact sequence of heads
| and tails in a given sequence. The analogy in terms of
| languages is that my language might just happen to have a short
| keyword that represents a given sequence of heads and tails.
|
| I don't know much about Kolmogorow complexity so I'm certainly
| missing something here. Potentially there is a subtle clause in
| the technical definition that doesn't make it through to these
| articles.
| PartiallyTyped wrote:
| > What does "strategy" mean here? I might just happen to have
| a strategy which involves betting on the exact sequence of
| heads and tails in a given sequence.
|
| That's a very narrow program.
|
| > The analogy in terms of languages is that my language might
| just happen to have a short keyword that represents a given
| sequence of heads and tails.
|
| The sequence still needs to be generated "somehow", either by
| executing a program that produces the sequence, or by
| explicitly stating it. Even if you have it "cached" and
| "represented" in your language, you still need to generate
| the sequence. The resources spent here are what the Kolmogorov
| complexity measures.
|
| The easiest way to generalize your little program is to say that
| you have a seed such that successive generations produce a
| sequence that matches yours, up to the period of the generator.
| Now it is more generic, but has a period. You can then expand
| this to accept multiple seeds and, once a period has been
| reached, simply take the next seed.
|
| Should this sequence be finite, you are in luck: your program
| can have length O(generator + N/P), where N is the length of the
| sequence and P is the period of your RNG.
|
| All of this is just compression, which is exactly what
| Kolmogorov complexity is about.
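|
| A tiny Python illustration of that seed idea (hypothetical; it
| only works for sequences that really are PRNG output): storing
| just (seed, length) regenerates the whole string exactly, which
| is an enormous compression relative to the string itself.
|
|     import random
|
|     def generate(seed, n):
|         rng = random.Random(seed)
|         return "".join(rng.choice("HT") for _ in range(n))
|
|     seed, n = 42, 10_000
|     sequence = generate(seed, n)
|
|     # (seed, n) is a tiny description of a 10,000-character string,
|     # provided the decoder knows the generator.
|     assert generate(seed, n) == sequence
|     print(len(sequence), "characters regenerated from (seed, n) =", (seed, n))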
| canjobear wrote:
| > What does "strategy" mean here?
|
| Any function that outputs bets.
| a_wild_dandan wrote:
| Doesn't the modern variation break for programs with lossless
| encoding/decoding? At least, for sufficiently long sequences? A
| Huffman/byte-pair encoding would shred any trillion-bit+
| sequence, for instance. But I intuitively expect many random
| trillion-bit sequences exist.
| chaboud wrote:
| There is no encoding that would shred "any" (read: every)
| trillion bit sequence. If that were true, some fundamentals
| of information theory and compressibility would break down.
|
| Lossless encoding works by taking advantage of the fact that
| more commonly observed sequences of data carry less
| information. For things like audio encoding, where discontinuous
| sequences aren't naturally observed (or pleasing to listen
| to), lossless encoding has a lot to work with.
| mxkopy wrote:
| For any fixed compression scheme, there is an input string
| that is actually lengthened by it rather than shortened.
|
| However Huffman isn't a fixed compression scheme since it
| makes a different frequency tree for different corpora.
| canjobear wrote:
| Do you have a citation? I didn't know the idea went back that
| far.
| marius_k wrote:
| Sometimes I wonder what would be the smallest program to generate
| human DNA. How many operations would it take, and how would that
| compare to the real-world iterations of evolution as a whole?
| wood_spirit wrote:
| Interestingly, dna programs are quite compressible
|
| https://en.m.wikipedia.org/wiki/Compression_of_genomic_seque...
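|
| Even the crudest trick already helps: with only four letters, each
| base needs just 2 bits, so naive packing beats one-byte-per-base
| text by 4x before any real modelling. A throwaway Python sketch
| (ignores N's and other ambiguity codes):
|
|     BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
|
|     def pack(dna: str) -> bytes:
|         out = bytearray()
|         for i in range(0, len(dna), 4):        # 4 bases per byte
|             byte = 0
|             for base in dna[i:i + 4]:
|                 byte = (byte << 2) | BASE_TO_BITS[base]
|             out.append(byte)
|         return bytes(out)
|
|     print(len(pack("ACGTACGTACGTACGT")))       # 16 bases -> 4 bytes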
| arketyp wrote:
| Not sure what kinds of selection pressures there have been for
| shorter DNA strings, but presumably you could compress it a
| great deal by putting it in a .zip file. Now imagine the havoc
| caused by random mutations on that format though.
| amelius wrote:
| If we ever find a perfect theory of physics, then that might be
| the smallest program to generate human DNA.
| copx wrote:
| >Bob claims that since the probability of getting both his and
| Alice's sequence is the same (2^-20), it proves that there was no
| foul-play involved.
|
| ..and Bob is 100% right.
|
| >Bob credits his excellent luck. Alice is smart and cannot be
| easily convinced. She get's back at Bob by claiming that
| probability cannot be used in this context as it reveals no
| information regarding the randomness of the obtained sequences.
| One can take a quick glance at the obtained sequences and easily
| point out that Alice's sequence is more random than Bob's
| sequence.
|
| No, it is not. Given a perfectly random coin toss Bob's sequence
| is indeed just as likely as Alice's sequence and in no way "less
| random" because both sequences result from the same randomness
| with equal probability.
|
| A nice example of human intuition being at odds with probability
| math, though. Bob's result seems less likely but it really is
| not. Which reminds me that I actually had to write my own
| computer simulation of the Monty Hall Problem before I was
| willing to believe the correct answer. I think (most?) human
| brains have a bug in the "understanding probability" subroutine.
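|
| For anyone who wants to run the same experiment, a minimal Monty
| Hall simulation in Python (not from the thread, just the standard
| setup): switching wins about 2/3 of the time.
|
|     import random
|
|     def play(switch: bool) -> bool:
|         doors = [0, 1, 2]
|         car = random.choice(doors)
|         pick = random.choice(doors)
|         # Host opens a door that is neither the pick nor the car.
|         opened = random.choice([d for d in doors if d != pick and d != car])
|         if switch:
|             pick = next(d for d in doors if d != pick and d != opened)
|         return pick == car
|
|     trials = 100_000
|     print(sum(play(True) for _ in range(trials)) / trials)    # ~0.667
|     print(sum(play(False) for _ in range(trials)) / trials)   # ~0.333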
| ptero wrote:
| Not quite. Specifically, _assuming independent random tosses_ A
| and B sequences are equally likely. No objection here.
|
| But the question posed is different: given a specific sequence,
| how likely is it to have come from independent coin tosses? That
| is, how likely is it that Bob is cheating and his sequence was
| in fact _not_ a sequence of fair coin tosses.
|
| And for this KC is a reasonable measure. My 2c.
| Gimpei wrote:
| Couldn't you say that the distribution of tosses is less likely
| in the case of Bob?
| Hunpeter wrote:
| The "bug" in this case imo, is that we interpret A's sequence
| as "random garbage" without regard to the actual contents,
| whereas we interpret B's as "all Ts". The question our brain
| asks then is "is it more likely to get random garbage or all
| Ts?"
| nerdponx wrote:
| Right. It might be more interesting to consider the count of
| Ts and Hs instead of considering exact sequences.
| mojomark wrote:
| I'm going to keep reading (because I love the KC topic), but I'd
| appreciate anyone confirming if the following are errors in this
| article:
|
| 1.) Conflating the terms "random" and "complexity". After
| all, a set of "randomly" drawn sample permutations from an
| alphabet are all equally likely. However, their "complexity" may
| differ, which is basically the point of the article, but "more
| or less random" keeps being used to refer to permutations
| with more or less "complexity", which I think is probably going
| to perpetuate confusion on this topic.
|
| 2.) From the article: "Moreover, a string cannot be compressed if
| its KC(x)>=|x|". Shouldn't the expression accompanying this
| statement be KC(x)=|x| ?
| tromp wrote:
| Regarding point 1), one can easily show that with probability
| >= 1 - 2^{-k}, a randomly chosen bitstring x of length n must
| satisfy KC(x) >= n-k. After all, there are only 1 + 2 + ... +
| 2^{n-k-1} = 2^{n-k}-1 shorter descriptions. So highly
| compressible strings are highly unlikely.
|
| Regarding 2), No, most strings x do not satisfy KC(x) = |x|,
| since you need to use some bits to specify that you're giving x
| literally. See the first theorem of [1].
|
| [1]
| https://gist.github.com/tromp/86b3184f852f65bfb814e3ab0987d8...
| rhelz wrote:
| re #1: the conflation is justified, but you couldn't guess that
| just from what was presented in the OP. There are some cool
| theorems which justify it tho---if you like Kolmogorov
| complexity you are in for a fun ride.
|
| re #2: No. Basically the > part of it handles the case when the
| smallest program which prints out the string is actually LARGER
| than the length of the string. In that case, the string is
| still incompressible. Compression means mapping from larger
| strings to smaller strings.
| tromp wrote:
| > let's assume that there exists a universal language U
|
| Why not specify it?
|
| > That gives us the true language-agnostic definition of
| Kolmogorov Complexity as follows:
|
| Choosing the language of Turing Machines does not make the
| definition language agnostic.
|
| Aiming for the simplest definition of description complexity, I
| instead based my definitions on the older computational model of
| lambda calculus in [1].
|
| Unlike the assumed UTM above, the universal lambda machine is
| easy to describe in detail:
|
|     (l 1 1) (l l l 1 (l l l l 3 (l 5 (3 (l 2 (3 (l l 3 (l 1 2 3)))
|     (4 (l 4 (l 3 1 (2 1)))))) (1 (2 (l 1 2)) (l 4 (l 4 (l 2 (1 4)))
|     5)))) (3 3) 2) (l 1 ((l 1 1) (l 1 1)))
|
| Furthermore, it allows almost identical definitions of various
| variations of descriptional complexity, namely
|
| 1) plain complexity
|
| 2) prefix complexity
|
| 3) monotone complexity
|
| all of which have their application in Algorithmic Information
| Theory [2].
|
| [1]
| https://gist.github.com/tromp/86b3184f852f65bfb814e3ab0987d8...
|
| [2] https://homepages.cwi.nl/~paulv/kolmogorov.html
| rhelz wrote:
| There is another, more insidious, problem with trying to give a
| language-agnostic definition: different languages will have
| different symbols that can be output.
|
| If you have a Turing machine which can only print out binary
| digits, then it can't print out a Chinese character, no matter
| how long the input program is.
|
| Yeah, you can do something like Unicode, and associate a binary
| string with each Chinese character--but printing out that
| binary string is not printing out the Chinese character. It's
| printing out a binary string.
|
| In particular, your lambda-calculus-based Turing machine cannot
| print out Chinese characters. It therefore cannot be used to
| define a _universal_ complexity for any string.
| mxkopy wrote:
| I feel like a conversion from binary strings to
| Unicode/Chinese characters would be in PTIME, so adding a
| conversion machine would be a nonfactor for languages in most
| complexity classes.
| smallnamespace wrote:
| The stronger result here is that any sort of conversion you
| can explicitly specify can be turned into a program.
|
| Since Kolmogorov complexity is specified in terms of
| lengths of programs, that means the KC of a string under two
| different encodings can differ by at most a
| constant amount (the size of the program that converts back
| and forth).
|
| The above is a bit handwavy; there are details you can
| tighten up (Is it the program size or something smaller?
| The program length in _which_ encoding?), but heuristically
| that's why theorists can talk about "the" Kolmogorov
| complexity without getting bogged down in pesky
| encoding details.
|
| It's also why we usually don't worry too much about the
| fine details of Turing Machines (alphabet, etc.), since you
| can generally emulate one sort of Turing Machine pretty
| easily with a short program written on another.
| canjobear wrote:
| Why is this a problem? No information is lost when characters
| (or graphics) are encoded in binary.
| SimplyUnknown wrote:
| But Chinese (or Mandarin) is not a context-free grammar,
| whereas I believe that encoding a language on a Turing
| machine implies a context-free grammar, so this example
| doesn't hold.
| canjobear wrote:
| The whole point of Kolmogorov complexity is that description
| lengths under different Turing-complete description languages
| (such as UTM and lambda calculus) are only different up to a
| constant that depends on the languages and not on the thing
| being described.
| avmich wrote:
| > Another thing to note is that Kolmogorov complexity of a string
| cannot be computed. There cannot exist a computer that will
| always guarantee the Kolmogorov complexity for all the strings.
|
| Sounds a bit puzzling. Surely for a particular programming
| language we can enumerate all programs, ordered by length etc.,
| and check which is the shortest one that produces the given
| string. So what's uncomputable here? For long strings that could
| take a long time, but - ?
| MutualMisinfo wrote:
| The programs we check might not halt.
| floobertoober wrote:
| With a Turing complete language, you can't know whether a given
| program eventually yields the string, or continues indefinitely
| tromp wrote:
| > So what's uncomputable here?
|
| Deciding whether the universal machine will _ever_ halt on a
| particular input. I.e. the good old halting problem.
| rhelz wrote:
| I think it's bootless to try to define the "minimum possible"
| Kolmogorov complexity. Here's why:
|
| 1. Note, Kolmogorov complexity is defined by the length of the
| shortest _program_ which prints out the string. What counts is
| the _number_ of instructions, and not the _complexity of those
| instructions_.
|
| 2. So say S is a very complex string. We can always construct a
| Turing machine which could print out S using a zero-length
| program: it could just start in a state which prints out S when
| you turn it on, and then halts.
|
| 3. So there is no such thing as a Turing machine which prints out
| _every_ string with a shorter program than _any_ other Turing
| machine does, QED.
|
| That's the bad news. The good news is we don't even need to do
| that. For any string S, say that M and N are any two universal
| Turing machines, and without loss of generality suppose that
| KM(S) <= KN(S). Then there is always some constant C for which
| KN(S) <= KM(S) + C, the constant C being the length of the
| program required to _emulate_ machine M on machine N.
|
| We are used to abstracting out constant sums and constant factors
| like this. The strings we are dealing with (as a species) are
| growing in length exponentially--that's why we went from 8-bit
| to 16-bit, etc. computers. So as the length of S goes to infinity,
| the difference between its complexity for any two machines
| becomes negligible.
| alfanick wrote:
| A side question: is this taught in the CS curricula you know? It
| was at my uni (a fairly good one, in a minor European country),
| and this experience biases me because I assume every CS grad
| knows Kolmogorov complexity.
| quibono wrote:
| Yes, at least in the UK. From working through some US
| university curricula, it's present there as well.
| davesque wrote:
| Something I've always noticed with the notion of Kolmogorov
| complexity is that the question of determining the lowest level
| of computation is problematic.
|
| For example, in the article, the author first defines the basic
| idea of KC. But then they correctly point out that the basic idea
| depends very much on the exact language that is chosen. So they
| describe how theorists have defined the notion of universal
| computation. But even this adjustment doesn't seem to escape the
| fact that we still depend on a system of mathematical symbols to
| describe the theory. And the notion of a Turing machine itself
| depends on other abstract concepts such as time and space, each
| with their own inherent, conceptual complexity. What sorts of
| minds (i.e. brains) are required to make sense out of the theory
| and what physical system is required for them to operate
| correctly? If the definition of KC includes a notion of how
| complex the Turing machine is that is required to compute a
| string, then the further down you go, the less the difference in
| complexity should be between any one string and another. After
| all, they all exist in the same universe!
|
| I guess it just goes to show how much the idea of KC lives in the
| realm of theory. As soon as you pose the question of complexity
| so abstractly, you invite in all kinds of theoretical
| considerations that make the meaning more slippery. That's why KC
| really doesn't deserve to be compared to Shannon entropy as it
| often is.
|
| But let me draw a comparison anyway, like I said you shouldn't!
| Alice from the article could also have made a strong
| argument against Bob by just pointing out that the Shannon
| entropy of his string is lower, which is very relevant in terms
| of the number of heads or tails and the likelihood of seeing a
| particular count of them.
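|
| Sketch of that last point in Python, with made-up stand-ins for
| the two sequences: the empirical per-toss entropy of an all-tails
| string is 0 bits, while a balanced string sits at 1 bit.
|
|     from collections import Counter
|     from math import log2
|
|     def empirical_entropy(s: str) -> float:
|         counts = Counter(s)
|         return sum((c / len(s)) * log2(len(s) / c) for c in counts.values())
|
|     alice = "HTTHTHHTTHTHHHTTHTHT"   # hypothetical balanced run (10 H, 10 T)
|     bob = "T" * 20
|
|     print(empirical_entropy(alice))  # 1.0 bit per toss
|     print(empirical_entropy(bob))    # 0.0 bits per toss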
| veerd wrote:
| 1. Choice of language only matters up to an additive constant
| (e.g. you could just write a simulator so language A can run
| language B).
|
| 2. If you want something with less physical grounding, you
| could use lambda calculus instead of Turing machines.
|
| 3. Kolmogorov Complexity and Shannon Entropy are compared with
| one another because they both are talking about the same thing:
| optimal compression. Kolmogorov Complexity talks about the
| compressibility of individual objects and Shannon Entropy talks
| about compressibility of streams of i.i.d. random variables.
| anonzzzies wrote:
| Kolmogorov complexity is a lovely subject and one of the more
| influential ones in my life.
|
| THE book https://link.springer.com/book/10.1007/978-0-387-49820-1
| is absolutely a thing to read. It was for me 30 years ago, and it
| has aged well.
| derbOac wrote:
| That is a great book on the subject -- the authors have
| published some important work in this area in papers as well.
___________________________________________________________________
(page generated 2024-03-30 23:00 UTC)