[HN Gopher] How to fit any dataset with a single parameter
___________________________________________________________________
How to fit any dataset with a single parameter
Author : tambourine_man
Score : 114 points
Date : 2021-09-29 18:50 UTC (4 hours ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| axegon_ wrote:
| Hypothetically you could achieve something similar using this[1].
| Note the word "hypothetically".
|
| [1] https://github.com/cakenggt/Library-Of-Pybel
| caterama wrote:
| This is beautiful. The real numbers are uncountable, so seems
| intuitive that could approximate any other space with them...
| basically a hash function. But this one has an inverse! It's so
| cool that someone actually implemented this.
| Sharlin wrote:
| You don't need uncountability. Any nontrivial interval of
| _rational_ numbers contains every finite string that exists, in
| fact infinite copies of each!
| blauditore wrote:
| It's not about uncountability, but simply that their space is
| infinite. The same would work with (unbounded) integer numbers.
| bflesch wrote:
| This sounds like it could explain many ML models ;)
| darawk wrote:
| I've thought about this for a while. I didn't go to the trouble
| of actually inventing the function, but just the fact that it
| ought to be possible in principle is sufficient to change your
| perspective on some things. Most notably, parameter count based
| information criteria.
|
| AIC, BIC, et al. need to be reformulated for each model in which
| they're used, and not all parameters can be treated as fungible.
| shmageggy wrote:
| Prior work: "One parameter is always enough", Piantadosi (2018)
|
| https://www.google.com/search?q=one%20parameter%20piantadosi
|
| Haven't read the posted article but sounds like the same idea and
| motivation
| ww520 wrote:
| Let me guess before reading the article. Infinite precision float
| point number?
| aappleby wrote:
| very silly. :)
| cs702 wrote:
| Fun read. But if we allow ourselves as much precision as we need,
| we don't even need a parameter. Any _constant_ that is a _normal
| number_ should suffice. Such constants already contain every
| possible sequence of digits you could muster -- i.e., they
| already contain every possible dataset.
|
| EDIT: I replaced "transcendental" with "normal" after reading
| Scarblac's comment below:
| https://news.ycombinator.com/item?id=28699622 -- many important
| transcendental numbers, including p (Pi), are thought to be (but
| have not been proven to be!) normal.
| KerrickStaley wrote:
| It's not true that any transcendental number would work. A
| transcendental number is a number which isn't the root of any
| polynomial with rational coefficients. This property doesn't
| imply that it contains all number sequences.
|
| It hasn't even been proven that pi contains all number
| sequences.
|
| See https://math.stackexchange.com/questions/216343/does-pi-
| cont... for more details.
| cs702 wrote:
| You're _right_. I edited my comment. Thank you for pointing
| it out!
| creddit wrote:
| This actually isn't known to be true afaict:
| https://www.askamathematician.com/2009/11/since-pi-is-infini...
| cs702 wrote:
| You're _right_. I edited my comment. Thank you for pointing
| it out!
| Scarblac wrote:
| I don't believe just being transcendental is sufficient for
| that property.
|
| E.g. I can't imagine that pi with all the '1' digits replaced
| by '2' isn't transcendental, but it clearly doesn't contain
| every sequence of digits.
|
| I think you mean normal numbers.
| cs702 wrote:
| You're _right_ : normal, not transcendental.
|
| IIRC, most real numbers are normal, and many important
| transcendental numbers are thought to be (but have not been
| proven to be) normal.
|
| I edited my comment. Thank you for the correction!
| gnramires wrote:
| Right, but any constant doesn't "encode" this information. To
| use a normal constant to encode information, you need to encode
| the location of the substring of interest. Generally, the
| location of the substring of interest needs to require as many
| bits as the substring itself (unless there's an intimate
| relationship between the number and the substrings of
| interest). So arguably it's the location that encodes the data,
| the number is irrelevant (and why not encode the data directly
| into this location?).
| civilized wrote:
| Too easy. Now fit any dataset using a substring of digits from
| the binary expansion of pi
| zwieback wrote:
| more digits than I care for
| idiotsecant wrote:
| First, this is a fun implementation and I love it.
|
| Second, you could as easily embed an infinite size dataset into
| an infinitely long binary string and say that you've reproduced
| your dataset with a 'single' parameter! That's sort of what this
| is doing, with some extra steps.
| hnlmorg wrote:
| This reminds me of the fun thought experiment of using
| /dev/random as storage. Given an infinite amount of time you'll
| find every file you need.
| laumars wrote:
| Your second point is what I first assumed this paper was doing
| before I read it.
| cobookman wrote:
| Up until ARG_MAX is reached, or you run out of RAM that is...
| https://serverfault.com/a/844666
| [deleted]
| asimjalis wrote:
| The word "single parameter" is doing a lot of work in this
| statement.
| llimllib wrote:
| intentionally:
|
| > In addition to casting doubts on the validity of parameter-
| counting methods and highlighting the importance of complexity
| bounds based on Occam's razor such as minimum description
| length (that trade off goodness-of-fit with expressive power),
| we hope that fa may also serve as entertainment for curious
| data scientists
| air7 wrote:
| This reminds me of a joke idea I read somewhere: You can encode
| the entire Encyclopaedia Britannica using a single mark on a
| simple stick!
|
| Just encode the text as a ascii codes after the decimal dot of a
| zero. (0.656168.. etc). Then just mark that ratio of the sticks
| length and you're done...
| alittlesalami wrote:
| This is Arithmetic Coding:
| https://en.wikipedia.org/wiki/Arithmetic_coding
| ohazi wrote:
| You jest, but this is almost exactly how arithmetic coding
| works.
|
| If you arrange your symbols and contexts carefully, you can
| even use this as a technique for progressive or lossy
| compression -- i.e. the more accurately you specify the ratio,
| the higher fidelity your result.
| thehappypm wrote:
| You can do it easier than that -- just make a stick of length
| .65168 meters. (Same idea, just the "full 1 meter stick" is
| defined elsewhere, not by the length of the stick).
| tshaddox wrote:
| That's way harder, because you have to cut a stick to write
| and have a meter ruler to read.
| an1sotropy wrote:
| I remember reading about this as a kid in one of Martin
| Gardner's books, either "aha!" or "aha! Gotcha"
| gh02t wrote:
| And of course in a footnote of the documentation for this
| encoding "Sufficiently precise measurement of this mark left as
| an exercise to the reader."
| mjburgess wrote:
| I don't see this as a joke, but a radical and important point.
|
| Reality, in being geometrical, is infinitely informationally
| dense (with a discrete conception of information).
|
| This distinction between geometrical space and time, and
| discrete algorithmic computability is unbridgeable.
|
| And hence there is an extemely firm footing on which to reject:
| AI, brain scanning readers, teleporters, etc and most sci-fi
| computationalim.
|
| Almost nothing can be simulated, as in, realised by a merely
| computational system.
| joshgrib wrote:
| I just wanted to add that reality being a non-discrete
| geometric thing is still an assumption - we don't know for
| sure that it isn't discrete and a lot of quantum stuff points
| more toward a discrete reality than a continuous one.
|
| So assuming continuous space/time and discrete information
| then I'd agree, but as far as we know space/time aren't
| continuous, but just appear that way to us. It doesn't seem
| like we know for sure that it is discrete, but at the least
| I'd say it's solid evidence that stuff like brain scanning is
| definitely in the realm of possibility.
|
| This answer from the Physics StackExchange nicely covers how
| time/space could appear continuous to us even if they are in
| fact discrete at lower levels. Also some interesting
| discussion in the other answers
| https://physics.stackexchange.com/a/35676
| mjburgess wrote:
| I think for my purposes defining continuous = unmeasurably
| discrete produces the same results.
|
| Ie., there is an irreducible geometrical continuity in the
| sense that no discontinuity can ever appear. The state
| density is maximal.
|
| via this route we reporduce the same point:
| computationalism/simulation'ism' is then just the thesis
| that computers qua measurably discrete systems can realise
| dense unmeasurable discrete systems.
|
| This can be shown to be impossible with much the same
| argument: spatial and temporal geometrical properties
| obtain in virtue of dense discreteness; and fail to obtain
| at measureable levels.
|
| The key property of continuity is its irreducibility to
| measurably discrete systems. That irreducibility isn't,
| however , limited to continuity .
|
| Wolfram makes this point about the failures of reductionism
| in a perfectly discrete context, ie., that no CA can
| compute a CA whose complexity is greater than it can
| summarise.
|
| I prefer to press a continuous angle: our best theories of
| all of reality are continuous and geometrical . That energy
| levels are discrete in bounded quantum systems has almost
| nothing to do with the indispensability of contintuous
| mathemtatjcs in every known physical system -- including
| that very bounded wavefn
| whatshisface wrote:
| It's computational again when you allow the computation to be
| precise enough for any practical purpose, which is not
| infinitely precise.
| wallacoloo wrote:
| These are weird conclusions. Any attempt to measure "reality"
| gives some amount of uncertainty. The only way for this to
| lead to the relatively stable experience you perceive is if
| those small variations in measurement lead to relatively
| small differences in perception. In which case, you can
| truncate the resolution of your simulation and still get
| plausible results.
|
| I assure you there are plenty of groups out there simulating
| systems that operate with similar densities (but lower
| volume) to the brain.
| mgraczyk wrote:
| I wouldn't say that the distinction is "unbridgeable". Keep
| in mind that we send packets over plenty of "geometrical" and
| seemingly infinitely dense analog channels all the time, like
| electricity over a copper wire or EM waves over the air.
|
| The "mark on a stick" channel has a capacity like any other
| channel. If you're sending just one symbol, you could easily
| calculate the information capacity given a desired
| probability of bit-error.
|
| Assuming you can put the mark in exactly the right spot, you
| can model the "noise" as a distribution over the values that
| the reader will measure. If you model this as `mark + zero-
| mean normal distribution` with a known variance, then your
| stick is just an AWGN channel.
| tbabb wrote:
| I think there are at least two things wrong with this take:
|
| One is that I don't think it follows from the premise that
| the continuity of the physical world precludes AI, brain
| scanning, etc. Even if the physical world were continuous
| (likely not, see below), an arbitrary degree of approximation
| could be attained, in principle. At the _very least_ I would
| not call the footing "firm".
|
| The second is that the universe is very likely not continuous
| anyway. The Beckenstein bound[1] puts an upper limit on the
| number of bits of information a region of space may contain.
| If the ruler tickmark were either measured or localized to
| the precision required to encode the information, the
| information density would cause it to collapse into a black
| hole. This would happen once your measurement needs to be
| about as precise as a Planck length, which would allow you to
| encode about 115 bits of information with your tickmark.
|
| (This in of itself is independent of the fact that you would
| need to construct the ruler out of objects that the universe
| permits; your ruler tickmark would need to be made of and
| measured with discrete fundamental particles, which by their
| very nature are quantized).
|
| [1] https://en.wikipedia.org/wiki/Bekenstein_bound
| oever wrote:
| Or use a larger stick. Every doubling of the stick length
| gives another bit of information. A stick of one light-year
| length would add about 53 bits.
| tbabb wrote:
| The main point I'm making is that the information density
| of the physical world is not unlimited as GP suggests.
|
| But I think that example just shows how few bits you can
| really get out the exercise!
| [deleted]
| dlivingston wrote:
| Even if space and time were continuous (which things like the
| Planck length would discredit), there are still discrete
| objects in that continuum.
|
| Elementary particles, for example, are discrete. You could
| argue that they have continuous effects vis a vis the EM
| field and spatial positioning, but ensemble effects usually
| render that irrelevant at large enough scales.
| tsimionescu wrote:
| I will note that even in QM, both space and time are
| considered continuous - the Planck length is just a
| smallest measurable distance, but nothing in QM currently
| assumes that particles must be separated by an integer
| multiple of the Planck length (unlike spins, for example).
|
| I believe there are some theories of quantum gravity that
| do rely on the idea that space-time is quantized in integer
| multiples of Planck's length, but these are far from
| definitive theories.
|
| A much more relevant limit in terms of possible information
| density is Heisenberg's uncertainty principle, which
| essentially puts a limit on the maximum possible precision
| for any measurement.
| yongjik wrote:
| In theory, pi has infinite digits. You could publish a book
| of a trillion digits of pi, and you have barely scratched the
| surface: in fact you published a precisely 0.00000% of all
| digits of pi.
|
| In practice, you "only" need ~42 digits of pi to draw a
| circle spanning the entire known universe (diameter of 8.8 *
| 10^26 m) and it will deviate from the ideal circle by less
| than the size of a proton (0.8 * 10^-15 m).
|
| Having a theoretically infinite precision does not mean that
| it makes a measurable difference.
| Kranar wrote:
| Every number has infinite digits or can be made to have
| infinite digits for a given representation, but that's not
| the same as a number having an infinite amount of
| information. PI represents a finite amount of information,
| as opposed to say a number like Chaitin's constant which
| represents an infinite and irreducible amount of
| information:
|
| https://en.wikipedia.org/wiki/Chaitin%27s_constant
| tsimionescu wrote:
| This is not just a false idea, but an obviously false one,
| contradicted by all the laws of physics. If you were right,
| then any finite volume would contain an infinite amount of
| information, which would mean it has infinite entropy,
| temperature, and energy.
|
| Also, by the same logic you apply to space, you could say
| that time is infinitely divisible, so you could create a
| computer which finishes an infinite amount of steps in a
| finite amount of time.
| mjburgess wrote:
| information here, ie log of a probability, is a continuous
| notion -- it is real-valued, as not least, log is a
| transcendental fn --
|
| i specifically said with a /discrete/ conception , geometry
| is infinitely dense (of discrete states)
| tsimionescu wrote:
| In all physical theories, any finite system has a finite
| number of _distinguishable_ states. So it is not
| infinitely informationally dense, especially when working
| with discrete bits of information.
|
| Not to mention, the finer the distinctions between two
| states of a system, the more energy you need to
| distinguish them. So, the less impact these differences
| can have, unless the system is extraordinarily energetic
| (and even then, you end up in fundamental limits of
| energy per volume, like the Schwarzschild radius).
|
| So again, there is no sense in which a finite part of the
| universe is universally dense.
|
| Even worse for your argument, all currently known laws of
| physics use computable functions (the randomness in QM
| notwithstanding). So, by definition, all known laws of
| physics can be simulated by an ideal Turing machine
| (again, give or take some randomness in QM, depending on
| the interpretation you chose to believe in and on how you
| chose to simulate the QM system).
| rnhmjoj wrote:
| There's an interesting paper[1] that argues that real
| numbers aren't physical, for the precise reasons you
| stated. That is, only the subset of real numbers that
| contain a finite amount of information (like constructive
| real numbers) are physically meaningful.
|
| A consequence of this is that classical physics is not
| really deterministic: this is because, in general (ie.
| including chaotic systems), the evolution of a system
| depends on a set of initial conditions that are specified
| by full real numbers, impossible to measure with finite
| precision. So, the use of real numbers is hiding the
| indeterminism in the initial conditions, much like the
| function in this article encodes a dataset in a single
| parameter.
|
| [1]: https://arxiv.org/abs/1803.06824
| tsimionescu wrote:
| I think the precise definition of what exactly is
| physically meaningful is more interesting a bit, because
| pi is just as 'real' as 1 in some sense. You can't any
| more precisely measure 1m than pi meters. A perfect
| square doesn't exist any more or less precisely than a
| perfect circle.
|
| Basically, the universe can just as well be approximated
| in two different ways. One, it's all straight lines, and
| anything that looks like a circle/curve is actually a
| piece of a veeery many sided polygon if you look closely
| enough at it. Two, it's all curves of some curvature, and
| anything that looks like a straight line is actually a
| segment of a veeery large circle. The first perspective
| corresponds to taking the rational numbers as physical.
| The second one corresponds to taking pi as physical (and
| then it's unclear whether e is also physical, or which
| other transcendental numbers are, but 1 is definitely un-
| physical then).
|
| Probably the constructive framework is the best way to
| express this concept, just starting from whatever
| constant we chose to define as the unit.
| Ginden wrote:
| Existence of analog reality is quite problematic, because it
| opens can of worms - eg. hypercomputation is possible.
| gnramires wrote:
| I used to think there must be something 'special' to brains
| to distinguish us from computers. There isn't. Brains encode
| finite amounts of information (quantum mechanics seems to
| imply bounded local information). We are a huge information
| network ourselves -- that's what consciousness is (with some
| added bits like self-identity and various particulars
| structures that dictate the _character_ of our experience).
|
| But that doesn't mean brains aren't special -- it means
| brains are special _and_ computers are special. Even more: it
| seems to imply computers, AI, etc. can be as special as
| ourselves, sentient, and perhaps even more special in ways we
| haven 't realized yet.
|
| It's difficult to even imagine a physical theory with
| unbounded local information. It seems to open the possibility
| to crazy things like hypercomputation, which do not seem very
| well defined. (For example: every 1-1/n seconds, (n>1) from
| now, flip a switch ON/OFF. At what state will the switch be
| at t>1s? An at exactly t=1s?)
|
| Note: while information and information flow itself bounded
| (hence no hypercomputation), I don't know of any obvious
| objections to continuous time. (I'm not sure the continuity
| of time has any profound implications)
| mjburgess wrote:
| the issue isn't brains
|
| no algorithm running on a cpu can move a muscle --
|
| it is precisely that movement is a spatiotemporal property
| which means no turing machine can realise it
|
| movement isn't a symbolic operation
| tsimionescu wrote:
| This is such a strange claim to make, I can't even tell
| what you could have meant. We are surrounded by
| simplistic mechanical computations moving muscles , and
| have been for more than a hundred years, since the
| industrial revolution at least.
| gnramires wrote:
| What do you suppose those computers are doing, then? ;)
|
| https://www.youtube.com/watch?v=tF4DML7FIWk
|
| https://www.youtube.com/watch?v=tfb6aEUMC04&t=115s
| idiotsecant wrote:
| Reality isn't quite infinitely dense though, it's hard to
| imagine how you could encode data geometrically over a
| distance smaller than the planck length, for example. Reality
| just has a very, very fine resolution.
| Rapzid wrote:
| A quick Google shows 6.25e+34 plank lengths in a meter. So,
| seems impossible to measure and make the mark.
|
| Edit: As a fraction of appended ascii codes.
| petercooper wrote:
| And if the mark were instead the removal of a single atom
| in a 40cm length, the resolution would be even poorer..
| able to encode a single 30 bit number by my very rough
| calculation.
| dnautics wrote:
| > And hence there is an extemely firm footing on which to
| reject: AI
|
| This is like saying it's impossible to build a water pump
| without solving the quantum mechanical interactions that
| govern water flow.
| [deleted]
| WithinReason wrote:
| How do you know reality is not computational?
| meowphius wrote:
| That is from the novel "Hard-Boiled Wonderland and the End of
| the World" by Haruki Murakami[0].
|
| [0]: https://everything2.com/title/Encyclopedia+on+a+toothpick
| joiguru wrote:
| One theoretical objection to this idea is that distances
| measured have an accuracy limit of Planck length (1.616e-35
| meters). So if your number needs more precision then "just
| mark" step can't be done.
| pveierland wrote:
| My shot at the calculation for fun :-)
|
| Stick encoding with graphite resolution (0.335 * 10^-9 meter)
| [1]: "Uti" (31 bits -> 3 UTF8 characters)
|
| Stick encoding with Planck resolution (1.616255 * 10^-35 meter)
| [2]: "Utility ought " (115 bits -> 14 UTF8 characters)
|
| Complete first sentence: "Utility ought to be the principal
| intention of every publication." [3]
|
| It appears that this storage scheme may not be suited towards
| the safekeeping of literature.
|
| [1]
| https://www.wolframalpha.com/input/?i=floor%28floor%28log_2%...
|
| [2]
| https://www.wolframalpha.com/input/?i=floor%28floor%28log_2%...
|
| [3] https://digital.nls.uk/encyclopaedia-
| britannica/archive/1441...
| Nition wrote:
| "Simply" scale up the stick to your desired minimum level of
| precision!
| michaelcampbell wrote:
| This reminds me of the "worst way to ask a user for a telephone
| number" UI, after one UI forced the user to select each digit in
| a 0-9 dropdown.
|
| The specific one I'm thinking of is just spit out a scrollable
| numeric string of Pi, and make the user scroll until their phone
| number was the digits of Pi that matched it.
|
| My (7 digit) phone number occurs after digit position 9_000_000,
| and occurs 21 times in the first 200_000_000 digits.
| albertzeyer wrote:
| I think it is still not proven that Pi contains all possible
| strings.
|
| https://math.stackexchange.com/questions/216343/does-pi-cont...
| throwawaytemp27 wrote:
| I wonder if you gave away your phone number there? Or how many
| 7 digit numbers match those criteria?
| a-dub wrote:
| this made me smile. :)
|
| oh, uncountable infinities.
| kzrdude wrote:
| How to have fun with mpmath :)
| jldugger wrote:
| >The purpose of this paper is to show that all the samples of any
| arbitrary dataset X can be reproduced by a simple differentiable
| equation: fa(x) = sin^2 (2^xt arcsin [?]a)
|
| So... use a PRNG?
| andi999 wrote:
| Doesn't this exclude negative numbers?
___________________________________________________________________
(page generated 2021-09-29 23:00 UTC)