[HN Gopher] Kolmogorov Complexity and Compression Distance
       ___________________________________________________________________
        
       Kolmogorov Complexity and Compression Distance
        
       Author : rgbimbochamp
       Score  : 76 points
       Date   : 2024-03-30 18:20 UTC (4 hours ago)
        
 (HTM) web link (smunshi.net)
 (TXT) w3m dump (smunshi.net)
        
       | causal wrote:
       | Confused how the interesting number paradox proves KC cannot be
       | computed.
        
         | srcreigh wrote:
         | The author is referring to something called Chaitin
         | incompleteness.
         | 
         | https://en.wikipedia.org/wiki/Kolmogorov_complexity#Chaitin'...
         | 
          | Of course, some KC values can trivially be proven, e.g. a
          | language whose 1- or 0-character programs are interpreted as
          | specific strings. Or, to prove KC(x) = N: if some length-N
          | program outputs x, and running every program of length less
          | than N shows that none of them outputs x, that proves
          | KC(x) = N.
          | 
          | The interesting number paradox (Berry's paradox) is more
          | closely related to Chaitin incompleteness than to
          | uncomputability itself.
         | 
          | Basically, given a language, there's some code which
          | enumerates proofs that the KC of a string is more than some
          | constant L, and outputs the first such string it finds.
          | 
          | If the constant L is large enough, it exceeds the length of
          | that proof-searching code itself. But then the code would be
          | a program shorter than L that outputs a string of KC greater
          | than L, a contradiction. So the proof-searching code can
          | never find a proof that any particular string has KC larger
          | than L.
         | 
          | It's interesting to think about how, as the language (or
          | proof system) gets more complex, proofs of larger complexity
          | lower bounds become possible, and what it would mean for the
          | languages to keep getting more complex indefinitely.
          | 
          | It's a similar train of thought to Busy Beaver numbers and
          | how values like BB(745) become independent of systems of
          | logic (PA, ZFC), and what it could mean to have more and
          | more advanced systems of logic which don't hit that
          | independence barrier until some much higher n.
        
           | causal wrote:
            | This seems to assume that KC can be arbitrarily large.
            | That must have been proven at some point? Otherwise it may
            | be that there is some upper bound for L which happens to
            | also be the KC of a KC-computer.
        
             | srcreigh wrote:
              | Yes, it's a pigeonhole-principle argument (a bit tricky
              | to actually enumerate everything). Imagine KC had a
              | limit k. There are only finitely many programs of length
              | at most k, so only a fixed number of strings can be
              | compressed to at most k characters. But there are
              | infinitely many strings of length greater than k, so
              | they can't all be compressed to at most k characters.
              | Therefore KC has no limit.
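              | 
              | A quick sanity check of that counting in Python, as a
              | rough sketch (binary alphabet assumed for simplicity):
              | 
              |     k, n = 10, 20
              |     # programs of length <= k over a binary alphabet
              |     programs = sum(2**i for i in range(k + 1))  # 2047
              |     strings = 2**n                         # 1,048,576
              |     # far more strings of length n than programs of
              |     # length <= k, so they can't all compress that far
              |     print(programs, strings)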
        
         | Opocio wrote:
         | Me neither.
         | 
         | But how I see it is that for solving KC in full generality
         | you'll have to:
         | 
          | - Start with the program that explicitly returns the
          | original string. Let's say it has length N.
          | 
          | - Run all possible programs that are shorter than N (just
          | try all combinations of characters).
          | 
          | - Look at the results and pick the shortest program that
          | compiles and outputs the original string (see the sketch
          | below).
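          | 
          | A rough sketch of that search in Python, using Brainfuck as
          | a stand-in language because its programs are easy to
          | enumerate. The step budget is where the trouble hides, and
          | the search is exponential, so this only yields an upper
          | bound relative to this toy language:
          | 
          |     from itertools import product
          | 
          |     OPS = "+-<>[]."  # Brainfuck minus input
          | 
          |     def run(prog, max_steps=10_000):
          |         # return the program's output, or None if it is
          |         # malformed or exceeds the step budget
          |         stack, match = [], {}
          |         for i, c in enumerate(prog):
          |             if c == "[":
          |                 stack.append(i)
          |             elif c == "]":
          |                 if not stack:
          |                     return None
          |                 j = stack.pop()
          |                 match[i], match[j] = j, i
          |         if stack:
          |             return None
          |         tape, ptr, pc, out, steps = [0] * 30_000, 0, 0, [], 0
          |         while pc < len(prog):
          |             steps += 1
          |             if steps > max_steps:
          |                 # giving up here is the halting problem in
          |                 # disguise: the program might halt later
          |                 return None
          |             c = prog[pc]
          |             if c == "+": tape[ptr] = (tape[ptr] + 1) % 256
          |             elif c == "-": tape[ptr] = (tape[ptr] - 1) % 256
          |             elif c == ">": ptr += 1
          |             elif c == "<": ptr -= 1
          |             elif c == ".": out.append(chr(tape[ptr]))
          |             elif c == "[" and tape[ptr] == 0: pc = match[pc]
          |             elif c == "]" and tape[ptr] != 0: pc = match[pc]
          |             pc += 1
          |         return "".join(out)
          | 
          |     def naive_kc(target, max_len=8):
          |         # shortest program found: an upper bound on KC
          |         for n in range(1, max_len + 1):
          |             for prog in map("".join, product(OPS, repeat=n)):
          |                 if run(prog) == target:
          |                     return prog
          |         return None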
         | 
          | The problem there is that you have to wait for all programs
          | to end, and you don't know whether they will end or not. So
          | you have a problem that's equivalent to the halting problem,
          | which is not solvable (and the halting problem is loosely
          | related to the interesting number paradox).
         | 
         | (This is not a proof and I don't have a background in the field
         | btw)
        
           | causal wrote:
           | That intuitively makes sense to me.
        
         | nyrikki wrote:
         | Impredicativity is the property you may want to dig into for
         | formal proofs on why self references can be problematic.
         | 
         | There is an important difference between semantically complete
         | and syntactically complete that may cause some barriers.
         | 
          | Gödel's completeness theorem is about semantic completeness
          | while his incompleteness theorems are about syntactic
          | completeness.
          | 
          | From Wikipedia: > A formal system is syntactically complete
          | if and only if no unprovable sentence can be added to it
          | without introducing an inconsistency.
          | 
          | "This statement is not provable", which Gödel encoded into
          | the natural numbers, is an example: the system can neither
          | prove nor refute it, so the system is not syntactically
          | complete.
          | 
          | If KC were computable, there would be an infinity of
          | paradoxes like the interesting number paradox.
          | 
          | The Berry paradox page linked from the INP link in the
          | article has a section relating it to the uncomputability of
          | KC.
         | 
         | https://en.m.wikipedia.org/wiki/Berry_paradox
        
         | explaininjs wrote:
          | Similar to how the interesting number paradox relies on a
          | "shortcut description" ("the smallest non-interesting
          | number") to give a short description to a number that
          | supposedly has none, if Kolmogorov complexity were
          | computable you could create a "shortcut program" to force
          | down the length of the shortest program:
          | 
          | Given: TM length of a JS runtime is 1,000,000 cells.
          | 
          | Assume: KC is computable, and the TM length of a `function
          | KolmogorovComplexity(s)` is 4,000,000 cells.
          | 
          | Known: KC values grow arbitrarily large - at most 2^n - 1
          | distinct values can ever be described by programs of fewer
          | than n bits.
          | 
          | Take:
          | 
          |     function Shortcut() {
          |       for (const s of generateEveryStringFromShortestUp()) {
          |         if (KolmogorovComplexity(s) > 10000000) return s
          |       }
          |     }
          | 
          | You see that the Shortcut function is encoded in roughly
          | 5,000,135 cells (plus that string generator, but that's
          | small/constant), yet it outputs a string of complexity
          | greater than 10,000,000 (indeed, each extra digit in the
          | threshold multiplies it by 10 while adding only one cell).
          | A contradiction.
        
           | causal wrote:
           | Still confused. What is contradictory about a simple program
           | computing a more complex program? Randomly generating a more
           | complex program does not make the complex program reducible
           | to a random string generator.
        
       | yamrzou wrote:
        | Well, I reached the end of the article (interesting btw), and
        | I'm still not convinced why Bob can't claim that there was no
        | foul play involved and that he got his result due to excellent
        | luck.
        
         | ComplexSystems wrote:
         | You don't need Kolmogorov complexity for this; simple
         | hypothesis testing will do. The null hypothesis is that the
         | coin is fair and the alternative is that it's biased. If Bob
         | was correct, then there would simply _never_ be any way to
         | refute the null hypothesis of a fair coin, no matter what,
         | since it can simply output anything at all with equal
          | probability as anything else. In reality, that isn't how
         | hypothesis testing works, and pretty much any standard
         | technique (computing p-values, likelihood ratios, etc) will
         | agree that 20 tails in a row is extremely unlikely given the
         | null hypothesis in a way that 10 tails and 10 heads is not.
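          | 
          | For instance, a minimal sketch of that test in Python (an
          | exact two-sided binomial p-value under the fair-coin null,
          | no stats library needed):
          | 
          |     from math import comb
          | 
          |     def p_value(tails, n=20):
          |         # probability, under a fair coin, of a count at
          |         # least as extreme as the observed number of tails
          |         k = max(tails, n - tails)
          |         tail = sum(comb(n, i) for i in range(k, n + 1))
          |         return min(1.0, 2 * tail / 2**n)
          | 
          |     print(p_value(20))  # ~1.9e-06: 20 tails, wildly unlikely
          |     print(p_value(10))  # 1.0: 10 tails/10 heads, unremarkable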
        
       | wood_spirit wrote:
       | An excellent rabbit hole to dive into is the equivalence of
       | compression and general AI. Every programmer should make a
       | compressor (and, separately, a ray tracer)!
       | 
       | See http://prize.hutter1.net/
        
         | avmich wrote:
         | Some examples for a particular algorithm:
         | https://rosettacode.org/wiki/LZW_compression
        
       | JDEW wrote:
       | > It has been demonstrated that KC(x), can be reasonably
       | estimated by the number of bits required to encode x using a
       | compressor C (such as gzip)
       | 
       | Talk about a cliffhanger :)
       | 
       | Using [0] you get 32B for Alice and 40B for Bob.
       | 
       | [0] It has been demonstrated that KC(x), can be reasonably
       | estimated by the number of bits required to encode x using a
       | compressor C (such as gzip)
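        | 
        | A quick sketch of that estimate in Python, plus the normalized
        | compression distance from the article's title. The two example
        | sequences here are stand-ins, not the article's exact ones, and
        | exact byte counts will differ with the compressor's header
        | overhead:
        | 
        |     import gzip
        | 
        |     def c(x: bytes) -> int:
        |         return len(gzip.compress(x))
        | 
        |     alice = b"HTTHTHHTTHTHTHHTTHTH"  # typical-looking flips
        |     bob = b"TTTTTTTTTTTTTTTTTTTT"    # 20 tails
        | 
        |     # Bob's should come out smaller despite gzip's fixed
        |     # header/trailer overhead
        |     print(c(alice), c(bob))
        | 
        |     def ncd(x: bytes, y: bytes) -> float:
        |         # 0 = very similar, around 1 = unrelated
        |         cx, cy, cxy = c(x), c(y), c(x + y)
        |         return (cxy - min(cx, cy)) / max(cx, cy)
        | 
        |     print(round(ncd(alice, bob), 3))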
        
       | pizza wrote:
       | I think maybe another way to put this is that Alice's number is
       | in a typical set [0] of the distribution of bitstrings whereas
       | Bob's might not be. Depending on the tolerance, the typical set
        | can have near-total coverage of the distribution. Another way
        | of making this about compression: a random code that encodes
        | typical-set strings well will probably suffer some overhead
        | when encoding Bob's string, but will encode most strings close
        | to optimally.
       | 
       | [0] https://en.wikipedia.org/wiki/Typical_set
        
       | arketyp wrote:
        | Richard von Mises (brother of the economist) formulated a
        | definition of randomness: a sequence of data on which, were
        | you a gambler, you could not make money by any betting
        | strategy. This was before the theory of computation, and the
        | idea was later developed by Kolmogorov and others into
        | algorithmic complexity. The modern variation would be (per
        | Wikipedia) "considering a finite sequence random (with respect
        | to a class of computing systems) if any program that can
        | generate the sequence is at least as long as the sequence
        | itself".
        
         | n4r9 wrote:
         | > you cannot by any strategy make money on betting on the
         | outcomes
         | 
         | What does "strategy" mean here? I might just happen to have a
         | strategy which involves betting on the exact sequence of heads
         | and tails in a given sequence. The analogy in terms of
         | languages is that my language might just happen to have a short
         | keyword that represents a given sequence of heads and tails.
         | 
          | I don't know much about Kolmogorov complexity so I'm certainly
         | missing something here. Potentially there is a subtle clause in
         | the technical definition that doesn't make it through to these
         | articles.
        
           | PartiallyTyped wrote:
           | > What does "strategy" mean here? I might just happen to have
           | a strategy which involves betting on the exact sequence of
           | heads and tails in a given sequence.
           | 
           | That's a very narrow program.
           | 
           | > The analogy in terms of languages is that my language might
           | just happen to have a short keyword that represents a given
           | sequence of heads and tails.
           | 
            | The sequence still needs to be generated "somehow", either
            | by executing a program that produces it, or by explicitly
            | stating it. Even if you have it "cached" and "represented"
            | in your language, you still need a description that
            | generates the sequence, and the length of that description
            | is what Kolmogorov complexity measures.
            | 
            | The easiest way to expand your little program is to use a
            | seed such that successive outputs of the generator match
            | successive elements of the sequence, up to the period of
            | the generator. Now it is more generic, but has a period.
            | You can then expand this to accept multiple seeds: once
            | one seed's period is exhausted, simply move on to the next
            | seed.
            | 
            | Should the sequence be finite, you are in luck: your
            | program can have length O(generator + N/P), where N is the
            | length of the sequence and P is the period of your RNG.
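            | 
            | A tiny illustration of the seed idea in Python, as a rough
            | sketch (the "program" is just a generic generator plus a
            | seed, which can be far shorter than the sequence it
            | reproduces):
            | 
            |     import random
            | 
            |     def expand(seed, n):
            |         # short description: a seed plus a generic PRNG
            |         rng = random.Random(seed)
            |         return [rng.randint(0, 1) for _ in range(n)]
            | 
            |     seq = expand(42, 1_000_000)          # a million flips
            |     assert expand(42, 1_000_000) == seq  # reproducible
            |     # the description is ~"expand(42, 1_000_000)",
            |     # not a million bits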
           | 
            | All this is just compression, which plays into the whole
            | idea of Kolmogorov complexity.
        
           | canjobear wrote:
           | > What does "strategy" mean here?
           | 
           | Any function that outputs bets.
        
         | a_wild_dandan wrote:
         | Doesn't the modern variation break for programs with lossless
         | encoding/decoding? At least, for sufficiently long sequences? A
         | Huffman/byte-pair encoding would shred any trillion-bit+
         | sequence, for instance. But I intuitively expect many random
         | trillion-bit sequences exist.
        
           | chaboud wrote:
           | There is no encoding that would shred "any" (read: every)
           | trillion bit sequence. If that were true, some fundamentals
           | of information theory and compressibility would break down.
           | 
           | Lossless encoding works by taking advantage of the more
           | commonly observed sequences of data having lower information
           | entropy. For things like audio encoding, where discontinuous
           | sequences aren't naturally observed (or pleasing to listen
           | to), lossless encoding has a lot to work with.
        
           | mxkopy wrote:
           | For any fixed compression scheme, there is an input string
           | that is actually lengthened by it rather than shortened.
           | 
           | However Huffman isn't a fixed compression scheme since it
           | makes a different frequency tree for different corpora.
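            | 
            | Easy to check empirically; a quick Python sketch
            | (incompressible input typically comes out slightly longer
            | once the format's fixed overhead is added):
            | 
            |     import gzip, os
            | 
            |     data = os.urandom(1_000_000)  # ~incompressible
            |     text = b"T" * 1_000_000       # very compressible
            | 
            |     print(len(gzip.compress(data)))  # > 1,000,000
            |     print(len(gzip.compress(text)))  # a few KB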
        
         | canjobear wrote:
         | Do you have a citation? I didn't know the idea went back that
         | far.
        
       | marius_k wrote:
        | Sometimes I wonder what the smallest program to generate human
        | DNA would be. How many operations would it take, and how would
        | that compare to the real-world number of iterations that
        | evolution took?
        
         | wood_spirit wrote:
          | Interestingly, DNA programs are quite compressible
         | 
         | https://en.m.wikipedia.org/wiki/Compression_of_genomic_seque...
        
         | arketyp wrote:
          | Not sure what kinds of selection pressures there have been
          | for shorter DNA strings, but presumably you could compress
          | it a great deal by putting it in a .zip file. Now imagine
          | the havoc caused by random mutations on that format, though.
        
         | amelius wrote:
         | If we ever find a perfect theory of physics, then that might be
         | the smallest program to generate human DNA.
        
       | copx wrote:
        | >Bob claims that since the probability of getting both his and
        | Alice's sequence is the same (2^-20), it proves that there was
        | no foul-play involved.
       | 
       | ..and Bob is 100% right.
       | 
       | >Bob credits his excellent luck. Alice is smart and cannot be
       | easily convinced. She get's back at Bob by claiming that
       | probability cannot be used in this context as it reveals no
       | information regarding the randomness of the obtained sequences.
       | One can take a quick glance at the obtained sequences and easily
       | point out that Alice's sequence is more random than Bob's
       | sequence.
       | 
       | No, it is not. Given a perfectly random coin toss Bob's sequence
       | is indeed just as likely as Alice's sequence and in no way "less
       | random" because both sequences result from the same randomness
       | with equal probability.
       | 
       | A nice example of human intuition being at odds with probability
       | math, though. Bob's result seems less likely but it really is
       | not. Which reminds me that I actually had to write my own
       | computer simulation of the Monty Hall Problem before I was
       | willing to believe the correct answer. I think (most?) human
       | brains have a bug in the "understanding probability" subroutine.
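        | 
        | For anyone who wants to run the same experiment, a minimal
        | Monty Hall simulation in Python (a sketch, not the original
        | one; switching should win about 2/3 of the time):
        | 
        |     import random
        | 
        |     def play(switch, doors=3):
        |         prize = random.randrange(doors)
        |         pick = random.randrange(doors)
        |         # host opens a door that hides no prize, wasn't picked
        |         opened = random.choice(
        |             [d for d in range(doors) if d not in (pick, prize)])
        |         if switch:
        |             # move to the one remaining unopened door
        |             pick = next(d for d in range(doors)
        |                         if d not in (pick, opened))
        |         return pick == prize
        | 
        |     n = 100_000
        |     print(sum(play(True) for _ in range(n)) / n)   # ~0.667
        |     print(sum(play(False) for _ in range(n)) / n)  # ~0.333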
        
         | ptero wrote:
         | Not quite. Specifically, _assuming independent random tosses_ A
         | and B sequences are equally likely. No objection here.
         | 
          | But the question posed is different: given a specific
          | sequence, how likely is it to have come from independent
          | coin tosses? That is, how likely is it that Bob is cheating
          | and his sequence was in fact _not_ a sequence of fair coin
          | tosses.
         | 
         | And for this KC is a reasonable measure. My 2c.
        
         | Gimpei wrote:
         | Couldn't you say that the distribution of tosses is less likely
         | in the case of Bob?
        
         | Hunpeter wrote:
         | The "bug" in this case imo, is that we interpret A's sequence
         | as "random garbage" without regard to the actual contents,
         | whereas we interpret B's as "all Ts". The question our brain
         | asks then is "is it more likely to get random garbage or all
         | Ts?"
        
           | nerdponx wrote:
           | Right. It might be more interesting to consider the count of
           | Ts and Hs instead of considering exact sequences.
        
       | mojomark wrote:
       | I'm going to keep reading (because I love the KC topic), but I'd
       | appreciate anyone confirming if the following are errors in this
       | article:
       | 
        | 1.) Conflating the terms "random" and "complexity". After all,
        | a set of "randomly" drawn sample permutations from an alphabet
        | are all equally likely, yet their "complexity" may differ -
        | which is basically the point of the article. But "more random"
        | and "less random" keep being used to refer to permutations
        | with more or less "complexity", which I think is going to
        | perpetuate confusion on this topic.
       | 
       | 2.) From the article: "Moreover, a string cannot be compressed if
       | its KC(x)>=|x|". Shouldn't the expression accompanying this
       | statement be KC(x)=|x| ?
        
         | tromp wrote:
         | Regarding point 1), one can easily show that with probability
         | >= 1 - 2^{-k}, a randomly chosen bitstring x of length n must
          | satisfy KC(x) >= n-k. After all, there are only 1 + 2 + ...
          | + 2^{n-k-1} = 2^{n-k} - 1 shorter descriptions. So highly
         | compressible strings are highly unlikely.
         | 
         | Regarding 2), No, most strings x do not satisfy KC(x) = |x|,
         | since you need to use some bits to specify that you're giving x
         | literally. See the first theorem of [1].
         | 
         | [1]
         | https://gist.github.com/tromp/86b3184f852f65bfb814e3ab0987d8...
        
         | rhelz wrote:
         | re #1: the conflation is justified, but you couldn't guess that
         | just from what was presented in the OP. There are some cool
         | theorems which justify it tho---if you like Kolmogorov
         | complexity you are in for a fun ride.
         | 
         | re #2: No. Basically the > part of it handles the case when the
         | smallest program which prints out the string is actually LARGER
         | than the length of the string. In that case, the string is
         | still incompressible. Compression means mapping from larger
         | strings to smaller strings.
        
       | tromp wrote:
       | > let's assume that there exists a universal language U
       | 
       | Why not specify it?
       | 
       | > That gives us the true language-agnostic definition of
       | Kolmogorov Complexity as follows:
       | 
       | Choosing the language of Turing Machines does not make the
       | definition language agnostic.
       | 
       | Aiming for the simplest definition of description complexity, I
       | instead based my definitions on the older computational model of
       | lambda calculus in [1].
       | 
        | Unlike the assumed UTM above, the universal lambda machine is
        | easy to describe in detail:
        | 
        |     (l 1 1) (l l l 1 (l l l l 3 (l 5 (3 (l 2 (3 (l l 3
        |     (l 1 2 3))) (4 (l 4 (l 3 1 (2 1)))))) (1 (2 (l 1 2))
        |     (l 4 (l 4 (l 2 (1 4))) 5)))) (3 3) 2) (l 1 ((l 1 1)
        |     (l 1 1)))
       | 
       | Furthermore, it allows almost identical definitions of various
       | variations of descriptional complexity, namely
       | 
       | 1) plain complexity
       | 
       | 2) prefix complexity
       | 
       | 3) monotone complexity
       | 
       | all of which have their application in Algorithmic Information
       | Theory [2].
       | 
       | [1]
       | https://gist.github.com/tromp/86b3184f852f65bfb814e3ab0987d8...
       | 
       | [2] https://homepages.cwi.nl/~paulv/kolmogorov.html
        
         | rhelz wrote:
         | There is another, more insidious, problem with trying to give a
         | language agnostic definition: Different languages will have
         | different symbols which are outputable.
         | 
         | If you have a Turing machine which can only print out binary
         | digits, then it can't print out a chinese character, no matter
         | how long the input program is.
         | 
         | Yeah, you can do something like unicode, and associate a binary
         | string with each chinese character--but printing out that
         | binary string is not printing out the chinese character. It's
         | printing out a binary string.
         | 
         | In particular, your lambda-calculus based turing machine cannot
         | print out chinese characters. It therefore cannot be used to
         | define a _universal_ complexity for any string.
        
           | mxkopy wrote:
           | I feel like a conversion from binary strings to
           | Unicode/Chinese characters would be in PTIME, so adding a
           | conversion machine would be a nonfactor for languages in most
           | complexity classes.
        
             | smallnamespace wrote:
             | The stronger result here is that any sort of conversion you
             | can explicitly specify can be turned into a program.
             | 
             | Since Kolmogorov Complexity is specified in terms of
             | lengths of programs, that means the KC between two
             | different pairs of encodings can differ at most by a
             | constant amount (the size of the program that converts back
             | and forth).
             | 
             | The above is a bit handwavey, there are details you can
             | tighten up (Is it the program size or something smaller?
              | The program length in _which_ encoding?), but
              | heuristically that's why theorists can talk about "the"
              | Kolmogorov complexity without getting bogged down with
              | pesky encoding details.
             | 
             | It's also why we usually don't worry too much about the
             | fine details of Turing Machines (alphabet, etc.), since you
             | can generally emulate one sort of Turing Machine pretty
             | easily with a short program written on another.
        
           | canjobear wrote:
           | Why is this a problem? No information is lost when characters
           | (or graphics) are encoded in binary.
        
           | SimplyUnknown wrote:
           | But Chinese (or mandarin) is not a context-free grammar
           | whereas I believe that encoding a language on a turing
           | machine implies a context-free grammar so this example
           | doesn't hold.
        
         | canjobear wrote:
         | The whole point of Kolmogorov complexity is that description
         | lengths under different Turing-complete description languages
         | (such as UTM and lambda calculus) are only different up to a
         | constant that depends on the languages and not on the thing
         | being described.
        
       | avmich wrote:
       | > Another thing to note is that Kolmogorov complexity of a string
       | cannot be computed. There cannot exist a computer that will
       | always guarantee the Kolmogorov complexity for all the strings.
       | 
       | Sounds a bit puzzling. Surely for a particular programming
       | language we can enumerate all programs, ordered by length etc.
       | and check which is the shortest one giving the given string. So
        | what's uncomputable here? For long strings that could take a
        | long time, but - ?
        
         | MutualMisinfo wrote:
         | The programs we check might not halt.
        
         | floobertoober wrote:
         | With a Turing complete language, you can't know whether a given
         | program eventually yields the string, or continues indefinitely
        
         | tromp wrote:
         | > So what's uncomputable here?
         | 
         | Deciding whether the universal machine will _ever_ halt on a
         | particular input. I.e. the good old halting problem.
        
       | rhelz wrote:
        | I think it's bootless to try to define the "minimum possible"
        | Kolmogorov complexity. Here's why:
       | 
        | 1. Note, Kolmogorov complexity is defined by the length of the
        | shortest _program_ which prints out the string. What counts is
        | the _number_ of instructions, and not the _complexity of those
        | instructions_.
        | 
        | 2. So say S is a very complex string. We can always construct
        | a Turing machine which could print out S using a zero-length
        | program: it could just start in a state which prints out S
        | when you turn it on, and then halts.
       | 
        | 3. So there is no single Turing machine whose programs for
        | _every_ string are shorter than those of _any_ other Turing
        | machine, QED.
        | 
        | That's the bad news. The good news is we don't even need that.
        | Say that M and N are any two universal Turing machines. Then
        | there is a constant C, depending only on M and N and not on
        | the string S, such that K_M(S) <= K_N(S) + C for every S. The
        | constant C is the length of the program required to _emulate_
        | machine N on machine M.
       | 
        | We are used to abstracting out constant sums and constant
        | factors like this. The strings we are dealing with (as a
        | species) are growing in length exponentially - that's why we
        | went from 8-bit, to 16-bit, etc. computers. So as the length
        | of S goes to infinity, the difference between its complexity
        | on any two machines becomes negligible.
        
       | alfanick wrote:
        | A side question: is this taught in the CS curricula you know
        | of? It was at my uni (a fairly good one, in a minor European
        | country), and this experience biases me: I assume every CS
        | grad knows Kolmogorov complexity.
        
         | quibono wrote:
          | Yes, at least in the UK. From working through some US
          | university curricula - it's present there as well.
        
       | davesque wrote:
       | Something I've always noticed with the notion of Kolmogorov
       | complexity is that the question of determining the lowest level
       | of computation is problematic.
       | 
       | For example, in the article, the author first defines the basic
       | idea of KC. But then they correctly point out that the basic idea
       | depends very much on the exact language that is chosen. So they
       | describe how theorists have defined the notion of universal
       | computation. But even this adjustment doesn't seem to escape the
       | fact the we still depend on a system of mathematical symbols to
       | describe the theory. And the notion of a Turing machine itself
       | depends on other abstract concepts such as time and space, each
       | with their own inherent, conceptual complexity. What sorts of
       | minds (i.e. brains) are required to make sense out of the theory
       | and what physical system is required for them to operate
       | correctly? If the definition of KC includes a notion of how
       | complex the Turing machine is that is required to compute a
       | string, then the further down you go, the less the difference in
       | complexity should be between any one string and another. After
       | all, they all exist in the same universe!
       | 
       | I guess it just goes to show how much the idea of KC lives in the
       | realm of theory. As soon as you pose the question of complexity
       | so abstractly, you invite in all kinds of theoretical
       | considerations that make the meaning more slippery. That's why KC
       | really doesn't deserve to be compared to Shannon entropy as it
       | often is.
       | 
       | But let me draw a comparison anyway like I said you shouldn't!
       | Because Alice from the article could also have made a strong
       | argument against Bob by just pointing out that the Shannon
       | entropy of his string was lower, which is very relevant in terms
       | of the number of heads or tails and the likelihood of seeing a
       | particular count of them.
        
         | veerd wrote:
         | 1. Choice of language only matters up to an additive constant
         | (e.g. you could just write a simulator so language A can run
         | language B).
         | 
         | 2. If you want something with less physical grounding, you
         | could use lambda calculus instead of Turing machines.
         | 
         | 3. Kolmogorov Complexity and Shannon Entropy are compared with
         | one another because they both are talking about the same thing:
         | optimal compression. Kolmogorov Complexity talks about the
         | compressibility of individual objects and Shannon Entropy talks
         | about compressibility of streams of i.i.d. random variables.
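          | 
          | A small Python sketch of that connection: for i.i.d. data,
          | the compressed size per symbol can't drop below the source
          | entropy (up to lower-order terms), and gzip's output tracks
          | it from above:
          | 
          |     import gzip, random
          |     from collections import Counter
          |     from math import log2
          | 
          |     def entropy(s: bytes) -> float:
          |         # empirical Shannon entropy, bits per symbol
          |         n = len(s)
          |         return -sum(c/n * log2(c/n)
          |                     for c in Counter(s).values())
          | 
          |     fair = bytes(random.choices(b"HT", k=100_000))
          |     biased = bytes(
          |         random.choices(b"HT", weights=[9, 1], k=100_000))
          | 
          |     for s in (fair, biased):
          |         bits = 8 * len(gzip.compress(s)) / len(s)
          |         # gzip's bits/symbol sit a bit above the entropy
          |         print(round(entropy(s), 3), round(bits, 3))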
        
       | anonzzzies wrote:
       | Kolmogorov complexity is a lovely subject and one of the more
       | influential ones in my life.
       | 
       | THE book https://link.springer.com/book/10.1007/978-0-387-49820-1
       | is absolutely a thing to read. It was for me 30 years ago and it
       | aged well.
        
         | derbOac wrote:
         | That is a great book on the subject -- the authors have
         | published some important work in this area in papers as well.
        
       ___________________________________________________________________
       (page generated 2024-03-30 23:00 UTC)