[HN Gopher] Machine Learning, Kolmogorov Complexity, and Squishy...
___________________________________________________________________
Machine Learning, Kolmogorov Complexity, and Squishy Bunnies (2019)
Author : coolvision
Score : 58 points
Date : 2021-02-04 15:34 UTC (7 hours ago)
(HTM) web link (www.theorangeduck.com)
(TXT) w3m dump (www.theorangeduck.com)
| chriswarbo wrote:
| Very cool. My only niggle is that PCA doesn't 'feel like'
| Kolmogorov Complexity to me: it's much closer to Shannon
| information than Algorithmic information.
|
| Shannon information depends on population statistics of messages,
| usually has a 'shallow' and domain-specific interpreter (to turn
| encodings back into messages, e.g. like a Huffman tree), and
| tries to minimise the average length of a _set_ of encoded
| messages. In this case the NN is the interpreter, the length of
| the encoding is the number of deformation axes, and the
| population is the training data (which we hope is representative
| of the real data).
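|
| To make that concrete, here's a minimal sketch of that framing
| (scikit-learn's PCA on random stand-in data, not the post's
| actual deformation data):
|
|   import numpy as np
|   from sklearn.decomposition import PCA
|
|   # Stand-in for a population of mesh deformations: 1000
|   # samples, each a 3000-dim vector of vertex offsets.
|   rng = np.random.default_rng(0)
|   X = rng.normal(size=(1000, 3000))
|
|   # The 'interpreter' is fit to population statistics...
|   pca = PCA(n_components=32).fit(X)
|
|   # ...and every message gets a fixed-length encoding:
|   # 32 numbers per sample instead of 3000.
|   codes = pca.transform(X)
|
|   # Decoding is shallow and domain-specific: one matrix
|   # multiply, not an arbitrary program.
|   decoded = pca.inverse_transform(codes)
|   print(codes.shape, decoded.shape)  # (1000, 32) (1000, 3000)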
|
| Algorithmic information is different. It depends on patterns
| within an _individual_ message, rather than across a population.
| The interpreter is a Universal Turing Machine (or equivalent),
| which is general-purpose. Encodings are programs which output a
| message, and can be arbitrarily 'deep' by running for an
| arbitrary amount of time.
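|
| A toy illustration, with Python standing in for the universal
| machine (this only gives an upper bound on the complexity, not
| the true value):
|
|   # The message: a 2000-character string with an obvious
|   # internal pattern.
|   message = "ab" * 1000
|
|   # A short program that outputs it; the program's length is
|   # a crude upper bound on the message's algorithmic
|   # information. A random 2000-char string would need a
|   # program roughly as long as itself.
|   program = 'print("ab" * 1000)'
|   print(len(message), len(program))  # 2000 18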
|
| Another way to look at it is: using PCA to solve the problem of
| 'automatically decide which deformation axes to include' is
| slightly interesting; much _more_ interesting is the problem of
| 'how can we automatically decide that this data should be
| modelled as a set of deformations along various axes?'. That's a
| _much_ harder problem, and tends to require a lot of search
| through program-space (whether NNs or GOFAI).
| yters wrote:
| Shannon info and KC are asymptotically equivalent.
| chriswarbo wrote:
| That's why I used 'scare quotes'. It's very similar (perhaps
| provably equivalent, I'm not sure) to how Frequentist and
| Bayesian probabilities are asymptotically equivalent, yet
| each offers its own perspective on few/one-shot cases.
|
| In my experience, general practice w.r.t. Shannon information
| is to manually pick a model with some numerical parameters
| (Huffman tree, deformation-axes-NN, etc.), then set those
| parameters to work well on an example population.
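|
| A Huffman code is the cleanest example of that first workflow:
| the 'parameters' are the tree, fit to an example population.
| A rough sketch:
|
|   import heapq
|   from collections import Counter
|
|   def huffman_code(sample):
|       # Fit the code to symbol frequencies observed in an
|       # example population (here, a sample string).
|       freq = Counter(sample)
|       heap = [(w, i, {s: ""})
|               for i, (s, w) in enumerate(freq.items())]
|       heapq.heapify(heap)
|       while len(heap) > 1:
|           w1, _, c1 = heapq.heappop(heap)
|           w2, i, c2 = heapq.heappop(heap)
|           merged = {s: "0" + c for s, c in c1.items()}
|           merged.update({s: "1" + c for s, c in c2.items()})
|           heapq.heappush(heap, (w1 + w2, i, merged))
|       return heap[0][2]
|
|   # Symbols common in *this* population get short codes.
|   print(huffman_code("abracadabra"))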
|
| General practice w.r.t. algorithmic information seems to
| involve searching program space for a good model. The
| adjustable part is the language, which can certainly be
| manually tailored to the application, but doesn't tend to be
| the bulk of the work (e.g. _automatically_ searching for an
| efficient language tends to be redundant once we're doing
| program search, since those programs will contain their own
| abstractions in any case).
| wfhpw wrote:
| Hugged until KO:
| https://web.archive.org/web/20210204154139/http://www.theora...
| mywittyname wrote:
| PCA can solve a surprisingly large number of problems. And when
| it can't directly solve a problem, PCA can be used as a
| preliminary analytics step to rank features by weight/impact,
| which can then be combined with iterative methods to solve other
| types of problems.
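|
| Concretely, something like this (scikit-learn, with made-up
| toy data and feature names):
|
|   import numpy as np
|   from sklearn.decomposition import PCA
|   from sklearn.preprocessing import StandardScaler
|
|   rng = np.random.default_rng(1)
|   X = rng.normal(size=(500, 6))   # toy feature matrix
|   names = ["f0", "f1", "f2", "f3", "f4", "f5"]
|
|   # Standardize so no feature dominates by scale alone.
|   pca = PCA().fit(StandardScaler().fit_transform(X))
|
|   # Variance explained per component...
|   print(pca.explained_variance_ratio_.round(3))
|
|   # ...and which features load on the first component:
|   # a quick way to rank them before a heavier method.
|   loads = dict(zip(names, pca.components_[0].round(3)))
|   print(sorted(loads.items(), key=lambda kv: -abs(kv[1])))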
|
| PCA was my introduction to data science and it remains one of my
| favorite tools to pull out. I'm really surprised by the number of
| data scientists that I meet who never use it, or have never even
| heard of it. Then again, even Data Science From Scratch dedicates
| like two pages to the subject, so maybe it is not something
| modern data science programs spend any time on.
| uoaei wrote:
| I mostly use it to compress my input features, i.e., to
| minimize the number of inputs needed on a given model,
| accepting a tiny loss in accuracy. This is fine because with
| linear models (such as the first/last layer of an NN) it merely
| amounts to a rotation, and the mapping back to the original
| feature space is well-defined.
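|
| Roughly this, in scikit-learn terms (toy data with a low
| intrinsic dimension):
|
|   import numpy as np
|   from sklearn.decomposition import PCA
|
|   # Toy inputs: 100 features, ~10 real degrees of freedom.
|   rng = np.random.default_rng(2)
|   latent = rng.normal(size=(1000, 10))
|   mix = rng.normal(size=(10, 100))
|   X = latent @ mix + 0.01 * rng.normal(size=(1000, 100))
|
|   # Keep enough components for ~95% of the variance.
|   pca = PCA(n_components=0.95).fit(X)
|   Z = pca.transform(X)
|   print(Z.shape[1], "components instead of", X.shape[1])
|
|   # The mapping back uses the same orthonormal axes, so
|   # returning to the original feature space is cheap and
|   # well-defined.
|   X_back = pca.inverse_transform(Z)
|   print(np.abs(X - X_back).mean())  # small error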
| fwsgonzo wrote:
| Very illuminating. Must have taken a long time to write that blog
| post.
|
| Does the PS5 have a chip that can execute small neural networks?
| My PC doesn't have one right now, but the latest GPUs already
| include a dedicated chip for this. This is all very new to me.
| [deleted]
| derbOac wrote:
| I remember this essay and liked it quite a bit. I do research in
| this area and, although there's an armchair aspect to the piece,
| there's also something insightful in it about the "memory versus
| computation" paradigm. The way the algorithmic complexity
| literature discusses modeling I think is missing something -- an
| intuitive treatment of the tradeoffs involved in different types
| of data representations -- and this schema I think is on to
| something.
| [deleted]
| PartiallyTyped wrote:
| Another achievement of ML is the Support Vector Machine (SVM),
| and I'd say it represents the tradeoff between space and
| computation even better.
|
| In fact, SVMs literally make that tradeoff: the kernel trick
| saves the computational cost of explicitly projecting into
| higher-dimensional, possibly infinite-dimensional spaces. As we
| know, projecting into higher dimensions can give us 100%
| accuracy on the training data, but of course that's
| overfitting, so we use constraints/support vectors to
| regularize the learner.
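|
| A small sketch of that tradeoff (scikit-learn, toy data; C is
| the regularization knob mentioned above):
|
|   from sklearn.datasets import make_circles
|   from sklearn.svm import SVC
|
|   # A dataset that isn't linearly separable in 2-D.
|   X, y = make_circles(n_samples=400, noise=0.1,
|                       factor=0.4, random_state=0)
|
|   # A linear SVM stays near chance level here.
|   print(SVC(kernel="linear", C=1.0).fit(X, y).score(X, y))
|
|   # The RBF kernel works in an implicitly infinite-
|   # dimensional space without ever materializing it; the
|   # price is keeping support vectors around at prediction
|   # time.
|   rbf = SVC(kernel="rbf", C=1.0).fit(X, y)
|   print(rbf.score(X, y), rbf.n_support_.sum(), "SVs kept")
|
|   # Larger C chases 100% training accuracy (overfitting);
|   # smaller C allows more slack and acts as regularization.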
___________________________________________________________________
(page generated 2021-02-04 23:01 UTC)