[HN Gopher] Fitting an elephant with four non-zero parameters
___________________________________________________________________
Fitting an elephant with four non-zero parameters
Author : belter
Score : 177 points
Date : 2024-07-14 14:27 UTC (8 hours ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| EdwardCoffin wrote:
| Freeman Dyson recounts the episode [1] that inspired this paper
| in his Web of Stories interviews (cued up to the
| fitting-an-elephant bit) [2]
|
| [1] https://youtu.be/hV41QEKiMlM
|
| [2] https://youtu.be/hV41QEKiMlM?t=118
| parker-3461 wrote:
| Thanks for linking these. I was not very familiar with these
| works/discussions taking place in the past, but they really
| helped establish the context. Very grateful that these videos
| are readily available.
| EdwardCoffin wrote:
| I listened to the whole series with Dyson some time in the
| past year. It was well worth it. I also listened to the
| series with Murray Gell-Mann [1] and Hans Bethe [2]. All time
| well worth spending, and I've been thinking of downloading
| all the bits, concatenating them into audio files, and
| putting them on my phone for listening to when out on walks
| (I'm pretty sure the videos do not add anything essential:
| it's just a video of the interviewee talking - no visual
| aids).
|
| [1] https://www.youtube.com/playlist?list=PLVV0r6CmEsFxKFx-0l
| sQD...
|
| [2] https://www.youtube.com/watch?v=LvgLyzTEmJk&list=PLVV0r6C
| mEs...
| lazamar wrote:
| Lol. Loved it.
|
| This was a lovely passage from Dyson's Web of Stories interview,
| and it struck a chord with me, like it clearly did with the
| authors too.
|
| It happened when Dyson took the preliminary results of his work
| on the pseudoscalar theory of pions to Fermi, and Fermi very
| quickly dismissed the whole thing. It was a shock to Dyson, but
| it freed him from wasting more time on it.
|
| Fermi: When one does a theoretical calculation, either you have a
| clear physical model in mind or a rigorous mathematical basis.
| You have neither. How many free parameters did you use for your
| fitting?
|
| Dyson: 4
|
| Fermi: You know, Johnny von Neumann always used to say, 'With four
| parameters I can fit an elephant, and with five I can make him
| wiggle his trunk'.
| dheera wrote:
| I wish there was more humor on arXiv.
|
| If I could make a discovery in my own time without using company
| resources I would absolutely publish it in the most humorous way
| possible.
| btown wrote:
| There's plenty of humor on arXiv, and that's part of why it's
| so incredible!
|
| Some lists:
|
| https://academia.stackexchange.com/questions/86346/is-it-ok-...
|
| https://www.ellipsix.net/arxiv-joke-papers.html
| azeemba wrote:
| Consider posting this as a new post! It seems like a fun list
| to read through
| mananaysiempre wrote:
| Joke titles and/or author lists are also quite popular, e.g.
| the Greenberg, Greenberger, Greenbergest paper[1], a paper
| with a cat coauthor whose title I can't seem to recall (but
| I'm sure there's more than one I've encountered), or even the
| venerable, unfortunate in its joke but foundational in its
| substance Alpher, Bethe, Gamow paper[2]. Somewhat closer to
| home, I think computer scientist Conor McBride[3] is the
| champion of paper titles (entries include "Elimination with a
| motive", "The gentle art of levitation", "I am not a number:
| I am a free variable", "Clowns to the left of me, jokers to
| the right", and "Doo bee doo bee doo") and sometimes code in
| papers:
|
|     letmeB this (F you) | you == me  = B this
|                         | otherwise  = F you
|     letmeB this (B that)      = B that
|     letmeB this (App fun arg) = letmeB this fun `App` letmeB this arg
|
| (Yes, this is working code; yes, it's crystal clear in the
| context of the paper.)
|
| [1] https://arxiv.org/abs/hep-ph/9306225
|
| [2] https://en.wikipedia.org/wiki/Alpher%E2%80%93Bethe%E2%80%
| 93G...
|
| [3] http://strictlypositive.org/
| msp26 wrote:
| Pretraining on the Test Set Is All You Need
|
| https://arxiv.org/abs/2309.08632
| boywitharupee wrote:
| what's the purpose of this? is it one of those 'fun' problems to
| solve?
| jfoutz wrote:
| This quote might help -
| https://en.wikipedia.org/wiki/Von_Neumann%27s_elephant#Histo...
|
| yes, a fun problem, but also a criticism of using too many
| parameters.
| Steuard wrote:
| Sadly, the constant term (the average r_0) is never specified in
| the paper (it seems to be something in the neighborhood of 180?):
| getting that right is necessary to produce the image, and I can't
| see any way _not_ to consider it a fifth necessary parameter. So
| I don't think they've genuinely accomplished their goal.
|
| (Seriously, though, this was a lot of fun!)
| rsfern wrote:
| They say in the text that it's the average value of the data
| points they fit to. I think whether to count it as a parameter
| depends on whether you consider standardization to be part of
| the model or not.
| Steuard wrote:
| I see your point, that it's really just an overall
| normalization for the size rather than anything to do with
| the _shape_. I can accept that, and I'll grant them the
| "four non-zero parameters" claim.
|
| Though in that case, I would have liked for them to make it
| explicit. Maybe normalize it to "1", and scale the other
| parameters appropriately. (Because as it stands, I don't
| think you can reproduce their figure from their paper.)
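|
| A minimal sketch of that normalization (synthetic stand-in data,
| not the paper's actual outline or values):
|
|     import numpy as np
|
|     theta = np.linspace(0, 2 * np.pi, 360, endpoint=False)
|     # stand-in outline with an average radius near 180
|     radii = 180 + 20 * np.cos(theta) + 10 * np.cos(2 * theta)
|
|     r0 = radii.mean()    # average radius: the constant term discussed above
|     shape = radii / r0   # dimensionless shape; its constant is exactly 1,
|                          # so only the shape parameters remain to be fitted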
| lupire wrote:
| IIUC:
|
| A real-parameter (r(theta) = sum(r_k cos(k theta))) Fourier
| series can only draw a "wiggly circle" figure with one point on
| each radial ray from the origin.
|
| A complex-parameter series (z(theta) = sum(z_k e^(i k theta)))
| can draw more
| squiggly figures (epicycles) -- the pen can backtrack as the
| drawing arm rotates, as each parameter can move a point somewhere
| on a small circle around the point computed from the previous
| parameter (and recursively).
|
| Obligatory 3B1B https://m.youtube.com/watch?v=r6sGWTCMz2k
|
| Since a complex parameter is 2 real parameters, we should compare
| the best 4-cosine curve to the best 2-complex-exponential curve.
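|
| A minimal sketch of the two parameterizations, with made-up
| illustrative coefficients (not fitted to anything):
|
|     import numpy as np
|
|     t = np.linspace(0, 2 * np.pi, 1000)
|
|     # Real radial series: one radius per angle, so the curve can
|     # never cross its own radial ray (a "wiggly circle").
|     r_k = [1.0, 0.3, 0.15, 0.05]
|     r = sum(rk * np.cos(k * t) for k, rk in enumerate(r_k))
|     x1, y1 = r * np.cos(t), r * np.sin(t)
|
|     # Complex series (epicycles): each term is a rotating arm, so
|     # the pen can backtrack and trace far squigglier figures.
|     z_k = {1: 1.0 + 0.0j, -1: 0.3j, 2: 0.2 - 0.1j, -3: 0.05 + 0.05j}
|     z = sum(zk * np.exp(1j * k * t) for k, zk in z_k.items())
|     x2, y2 = z.real, z.imag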
| elijahbenizzy wrote:
| This is humorous (and well-written), but I think it's more than
| that.
|
| I'm always making the joke (observation) that ML (AI) is just
| curve-fitting. Whether "just curve-fitting" is enough to produce
| something "intelligent" is, IMO, currently unanswered, largely
| due to differing viewpoints on the meaning of "intelligent".
|
| In this case they're demonstrating some very clean, easy-to-
| understand curve-fitting, but it's really the same process --
| come up with a target, optimize over a loss function, and hope
| that it generalizes (this one, obviously, does not, but the
| elephant is cute).
|
| This raises the question von Neumann was asking -- why have so many
| parameters? Ironically (or maybe just interestingly), we've done
| a _lot_ with a ton of parameters recently, answering it with
| "well, with a lot of parameters you can do cool things".
| visarga wrote:
| > Whether "just curve fitting" is enough to produce something
| "intelligent" is, IMO, currently unanswered
|
| Continual "curve fitting" to the real world can create
| intelligence. What is missing is not something inside the
| model. What's missing is a mechanism to explore, search, and
| expand its experience.
|
| Our current crop of LLMs rides on human experience; they have
| largely not participated in creating their own experiences.
| That's why people call it imitation learning or parroting. But
| once models become more agentic they can start creating useful
| experiences on their own. AlphaZero did it.
| soist wrote:
| AlphaZero did not create any experiences. AlphaZero was
| software written by people to play board games and that's all
| it ever did.
| visarga wrote:
| AZ trained in self-play mode for millions of games, over
| multiple generations of a player pool.
| soist wrote:
| I am familiar with the literature on reinforcement
| learning.
| pharrington wrote:
| They're saying the board games AlphaZero played with
| itself _are_ experiences.
| elijahbenizzy wrote:
| There are a whole bunch of assumptions here. But sure, if you
| view the world as a closed system, then you have a decision
| as a function of inputs:
|
| 1. The world around you
|
| 2. The experiences within you (really, the past view of the
| world around you)
|
| 3. The innateness of you (sure, this could be 2 but I think
| it's also something else)
|
| 4. The experience you find + the way you change yourself to
| impact (1), (2), and (3)
|
| If you think of intelligence as all of these, then you're
| making the assumption that all that's required for (2), (3),
| and (4) is "agentic systems", which I think skips a few steps
| (as the author of an agent framework myself...). All this is
| to say that "what makes intelligence" is largely unsolved,
| and nobody really knows, because we actually don't understand
| this ourselves.
| luplex wrote:
| I mean the devil is in the details. In Reinforcement Learning,
| the target moves! In deep learning, you often do things like
| early stopping to prevent too much optimization.
| soist wrote:
| There is no such thing as too much optimization. Early
| stopping is to prevent overfitting to the training set. It's
| a trick just like most advances in deep learning because the
| underlying mathematics is fundamentally not suited for
| creating intelligent agents.
| maitola wrote:
| In the case of AI, the more parameters, the better! In physics
| it's the opposite.
| elijahbenizzy wrote:
| One of the hardest parts of training models is avoiding
| overfitting, so "more parameters are better" should be more
| like "more parameters are better given you're using those
| parameters in the right way, which can get hard and
| complicated".
|
| Also LLMs just straight up _do_ overfit, which makes them
| function as a database, but a really bad one. So while more
| parameters might just be better, that feels like a cop-out to
| the real problem. TBD what scaling issues we hit in the
| future.
| will1am wrote:
| A dichotomy between these fields
| will1am wrote:
| Your humorous observation captures a fundamental truth to some
| extent
| maitola wrote:
| I love the ironic side of the article. Perhaps they should add
| the reason for it, from Fermi's and von Neumann's point of view.
| When you are building a model of reality in physics, if something
| doesn't fit the experiments, you can't just add a parameter (or
| more), vary it, and fit the data. The model should ideally have
| zero parameters, or as few as possible, or, at an even deeper
| level, the parameters should emerge naturally from some simple
| assumptions. With 4 parameters you don't know whether you are
| really capturing a true aspect of reality or just fitting the
| data of some experiment.
| jampekka wrote:
| This was mentioned in the first paragraph of the paper. The
| paper is mostly humorous.
|
| That said, the wisdom of the quip has been widely lost in many
| fields. In many fields data is "modeled" with huge regression
| models with dozens of parameters or even neural networks with
| billions of parameters.
|
| > In 1953, Enrico Fermi criticized Dyson's model by quoting
| Johnny von Neumann: "With four parameters I can fit an
| elephant, and with five I can make him wiggle his trunk."[1].
| This quote is intended to tell Dyson that while his model may
| appear complex and precise, merely increasing the number of
| parameters to fit the data does not necessarily imply that the
| model has real physical significance.
| karmakaze wrote:
| That's how I feel about dark matter. Oh this galaxy is slower
| than this other similar one. The first one must have less
| dark matter then.
|
| What can't be fit by declaring the amount of dark matter that
| _must be present_ fits the data? It's unfalsifiable: just
| because we haven't found it doesn't mean it doesn't exist.
| Even worse than string/M-theory, which at least has math.
| edflsafoiewq wrote:
| It's easy to say "Epicycles! Epicycles!", but people are
| going to continue using their epicycles until a Copernicus
| comes along.
| dilawar wrote:
| Hmm..
|
| Hodgkin and Huxley did ground-breaking work on the squid giant
| axon and modelled neural activity. They had multiple parameters,
| extracted from 'curve fitting' of recorded potentials and
| injected currents, which were much later mapped to sodium
| channels; similarly, another process was mapped to potassium
| channels.
|
| I wouldn't worry too much about having multiple parameters --
| even four, when three just can't explain the model.
| nyssos wrote:
| Neuron anatomy is the product of hundreds of millions of
| years of brute contingency. There are reasons why it can't be
| certain ways (organisms that were that way [would have] died
| or failed to reproduce) but no reason whatsoever why it had
| to be exactly _this_ way. It didn't; there are plenty of
| other ways that nerves _could_ have worked, and this is just the
| way they actually do.
|
| The physics equivalent is something like eternal inflation as
| an explanation for apparent fine-tuning - except that even if
| it's correct it's still absolutely nowhere near as complex or
| as contingent as biology.
| will1am wrote:
| The balance between empirical data fitting and genuine
| understanding of the underlying reality
| edflsafoiewq wrote:
| Isn't the form of an equation really just another sort of
| parameter?
| qarl wrote:
| Yes, it is.
|
| Which makes the only truly zero-parameter system the
| collection of all systems, in all forms.
| tobias2014 wrote:
| This is why I think that modeling elementary physics is
| nothing other than fitting data. We might end up with
| something that we perceive as "simple", or not. But in any
| case, all the fitting has been hidden in the process of ruling
| out models. It's just that a lot of the fitting process is
| (implicitly) being done by theorists; we come up with new
| models that are then falsified.
|
| For example, how many parameters does the Standard Model
| have? It's not clear what you count as a parameter. Do you
| count the group structure, or the other mathematical structure
| that has been "fitted" through decades of comparisons with
| experiments?
| bee_rider wrote:
| Ya know, in academic writing I tend to struggle with making it
| sound nice and formal. I try not to use the super-stilted
| academic style, but it is still always a struggle to walk the
| line between too loose and too jargony.
|
| Maybe this sort of thing would be a really good tradition.
| Everyone must write a very silly article with some mathematical
| arguments in it. Then, we can all go forward with the comfort of
| knowing that we aren't really at risk of breaking new ground in
| appearing unserious.
|
| It is well written and very understandable!
| xpe wrote:
| One takeaway: Don't count parameters. Count bits.
| Scene_Cast2 wrote:
| Better yet, count entropy.
| xpe wrote:
| Why "better"? Entropy in the information theoretic sense is
| usually quantified in bits.
| xpe wrote:
| Another takeaway (not directly stated in the article but
| implied): Counting the information content of a model is more
| than just the parameters; the structure of the model itself
| conveys information.
| will1am wrote:
| An often underappreciated insight, I think
| pharmacy7766 wrote:
| One parameter is enough:
| https://aip.scitation.org/doi/10.1063/1.5031956
| dweinus wrote:
| "This single parameter model provides a large improvement over
| the prior state of the art in fitting an elephant"
|
| Lol
| Nition wrote:
| Nice. This is like how you can achieve unlimited compression by
| storing your data in a filename instead of in the file.
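|
| A toy sketch of the same idea -- if a "parameter" may carry
| unlimited precision, it can hold an entire dataset in its digits.
| (The linked paper uses a different, chaotic-map construction;
| this just concatenates 8-bit samples for illustration.)
|
|     samples = [12, 200, 7, 133]      # toy "dataset", values 0..255
|     theta = 0
|     for s in samples:
|         theta = (theta << 8) | s     # pack 8 bits per sample
|
|     decoded = [(theta >> (8 * i)) & 0xFF
|                for i in reversed(range(len(samples)))]
|     assert decoded == samples        # one "parameter" recovers it all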
| aqme28 wrote:
| > It only satisfies a weaker condition, i.e., using four non-zero
| parameters instead of four parameters.
|
| Why would that be a harder problem? In the case that you get a
| zero parameter, you could inflate it by some epsilon and the
| solution would basically be the same.
| Sesse__ wrote:
| They also, effectively, fit information in the indexes of the
| parameters. I.e., _which_ of the parameters are nonzero carries
| real information.
|
| In a sense, they have done their fitting using nine parameters,
| of which five are zero.
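|
| A rough back-of-the-envelope for that index information, assuming
| nine candidate coefficients of which four are nonzero:
|
|     from math import comb, log2
|
|     # choosing which 4 of 9 coefficients are nonzero is a choice
|     # among C(9, 4) = 126 alternatives, i.e. about 7 extra bits
|     # on top of the four parameter values themselves
|     print(comb(9, 4), log2(comb(9, 4)))   # 126, ~6.98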
___________________________________________________________________
(page generated 2024-07-14 23:00 UTC)