[HN Gopher] Fitting an elephant with four non-zero parameters
       ___________________________________________________________________
        
       Fitting an elephant with four non-zero parameters
        
       Author : belter
       Score  : 177 points
       Date   : 2024-07-14 14:27 UTC (8 hours ago)
        
 (HTM) web link (arxiv.org)
 (TXT) w3m dump (arxiv.org)
        
       | EdwardCoffin wrote:
        | Freeman Dyson recounts the episode [1] that inspired this paper
        | in his Web of Stories interviews (the second link is positioned
        | at the fitting-an-elephant bit) [2]
       | 
       | [1] https://youtu.be/hV41QEKiMlM
       | 
       | [2] https://youtu.be/hV41QEKiMlM?t=118
        
         | parker-3461 wrote:
          | Thanks for linking these. I was not very familiar with these
          | works/discussions, but they really helped establish the
          | context. Very grateful that these videos are readily available.
        
           | EdwardCoffin wrote:
           | I listened to the whole series with Dyson some time in the
           | past year. It was well worth it. I also listened to the
           | series with Murray Gell-Mann [1] and Hans Bethe [2]. All time
           | well worth spending, and I've been thinking of downloading
           | all the bits, concatenating them into audio files, and
           | putting them on my phone for listening to when out on walks
           | (I'm pretty sure the videos do not add anything essential:
           | it's just a video of the interviewee talking - no visual
           | aids).
           | 
           | [1] https://www.youtube.com/playlist?list=PLVV0r6CmEsFxKFx-0l
           | sQD...
           | 
           | [2] https://www.youtube.com/watch?v=LvgLyzTEmJk&list=PLVV0r6C
           | mEs...
        
       | lazamar wrote:
       | Lol. Loved it.
       | 
       | This was a lovely passage from Dyson's Web of Stories interview,
       | and it struck a chord with me, like it clearly did with the
       | authors too.
       | 
        | It happened when Dyson took the preliminary results of his work
        | on the pseudoscalar theory of pions to Fermi, and Fermi very
        | quickly dismissed the whole thing. It was a shock to Dyson but
        | freed him from wasting more time on it.
        | 
        | Fermi: When one does a theoretical calculation, either you have a
        | clear physical model in mind or a rigorous mathematical basis.
        | You have neither. How many free parameters did you use for your
        | fitting?
       | 
       | Dyson: 4
       | 
        | Fermi: You know, Johnny von Neumann always used to say 'with four
        | parameters I can fit an elephant; and with five I can make him
        | wiggle his trunk'.
        
       | dheera wrote:
       | I wish there was more humor on arXiv.
       | 
       | If I could make a discovery in my own time without using company
       | resources I would absolutely publish it in the most humorous way
       | possible.
        
         | btown wrote:
         | There's plenty of humor on arXiv, and that's part of why it's
         | so incredible!
         | 
         | Some lists:
         | 
         | https://academia.stackexchange.com/questions/86346/is-it-ok-...
         | 
         | https://www.ellipsix.net/arxiv-joke-papers.html
        
           | azeemba wrote:
           | Consider posting this as a new post! It seems like a fun list
           | to read through
        
           | mananaysiempre wrote:
           | Joke titles and/or author lists are also quite popular, e.g.
           | the Greenberg, Greenberger, Greenbergest paper[1], a paper
           | with a cat coauthor whose title I can't seem to recall (but
           | I'm sure there's more than one I've encountered), or even the
           | venerable, unfortunate in its joke but foundational in its
           | substance Alpher, Bethe, Gamow paper[2]. Somewhat closer to
           | home, I think computer scientist Conor McBride[3] is the
           | champion of paper titles (entries include "Elimination with a
           | motive", "The gentle art of levitation", "I am not a number:
           | I am a free variable", "Clowns to the left of me, jokers to
           | the right", and "Doo bee doo bee doo") and sometimes code in
            | papers:
            | 
            |     letmeB this (F you) | you == me = B this
            |                         | otherwise = F you
            |     letmeB this (B that)      = B that
            |     letmeB this (App fun arg) =
            |       letmeB this fun `App` letmeB this arg
           | 
           | (Yes, this is working code; yes, it's crystal clear in the
           | context of the paper.)
           | 
           | [1] https://arxiv.org/abs/hep-ph/9306225
           | 
           | [2] https://en.wikipedia.org/wiki/Alpher%E2%80%93Bethe%E2%80%
           | 93G...
           | 
           | [3] http://strictlypositive.org/
        
         | msp26 wrote:
         | Pretraining on the Test Set Is All You Need
         | 
         | https://arxiv.org/abs/2309.08632
        
       | boywitharupee wrote:
       | what's the purpose of this? is it one of those 'fun' problems to
       | solve?
        
         | jfoutz wrote:
         | This quote might help -
         | https://en.wikipedia.org/wiki/Von_Neumann%27s_elephant#Histo...
         | 
          | yes, a fun problem, but also a criticism of using too many
          | parameters.
        
       | Steuard wrote:
       | Sadly, the constant term (the average r_0) is never specified in
       | the paper (it seems to be something in the neighborhood of 180?):
       | getting that right is necessary to produce the image, and I can't
        | see any way _not_ to consider it a fifth necessary parameter. So
        | I don't think they've genuinely accomplished their goal.
       | 
       | (Seriously, though, this was a lot of fun!)
        
         | rsfern wrote:
         | They say in the text that it's the average value of the data
         | points they fit to. I think whether to count it as a parameter
         | depends on whether you consider standardization to be part of
         | the model or not
        
           | Steuard wrote:
           | I see your point, that it's really just an overall
           | normalization for the size rather than anything to do with
            | the _shape_. I can accept that, and I'll grant them the
           | "four non-zero parameters" claim.
           | 
           | Though in that case, I would have liked for them to make it
           | explicit. Maybe normalize it to "1", and scale the other
           | parameters appropriately. (Because as it stands, I don't
           | think you can reproduce their figure from their paper.)
        
       | lupire wrote:
       | IIUC:
       | 
       | A real-parameter (r(theta) = sum(r_k cos(k theta))) Fourier
       | series can only draw a "wiggly circle" figure with one point on
       | each radial ray from the origin.
       | 
        | A complex-parameter series (z(theta) = sum(z_k e^(i k theta)))
        | can draw more squiggly figures (epicycles) -- the pen can
        | backtrack as the drawing arm rotates, as each term moves the
        | point around a small circle centered on the point traced by the
        | previous terms (and so on recursively).
       | 
       | Obligatory 3B1B https://m.youtube.com/watch?v=r6sGWTCMz2k
       | 
       | Since a complex parameter is 2 real parameters, we should compare
       | the best 4-cosine curve to the best 2-complex-exponential curve.
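        | 
        | A minimal sketch of the difference (plain NumPy/Matplotlib; the
        | coefficients below are made up, not the paper's fitted values,
        | and r0 stands in for the mean radius of the data):
        | 
        |     import numpy as np
        |     import matplotlib.pyplot as plt
        | 
        |     t = np.linspace(0, 2 * np.pi, 1000)
        | 
        |     # Real cosine series: r(t) = r0 + sum_k r_k cos(k t).
        |     # One radius per angle, so the curve crosses each ray
        |     # from the origin exactly once ("wiggly circle").
        |     r0 = 1.0                     # e.g. mean radius of the data
        |     rk = [0.3, -0.2, 0.1, 0.05]  # made-up coefficients
        |     r = r0 + sum(c * np.cos(k * t)
        |                  for k, c in enumerate(rk, 1))
        |     plt.plot(r * np.cos(t), r * np.sin(t), label="4 cosines")
        | 
        |     # Complex series: z(t) = sum_k z_k exp(i k t), i.e.
        |     # epicycles: each term circles around the sum of the
        |     # previous ones, so the pen can backtrack and cross a
        |     # ray more than once.
        |     zk = {1: 1.0 + 0.0j, 3: 0.35j}   # made-up coefficients
        |     z = sum(c * np.exp(1j * k * t) for k, c in zk.items())
        |     plt.plot(z.real, z.imag, label="2 complex exponentials")
        | 
        |     plt.axis("equal"); plt.legend(); plt.show()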
        
       | elijahbenizzy wrote:
        | This is humorous (and well-written), but I think it's more than
        | that.
       | 
       | I'm always making the joke (observation) that ML (AI) is just
       | curve-fitting. Whether "just curve-fitting" is enough to produce
       | something "intelligent" is, IMO, currently unanswered, largely
       | due to differing viewpoints on the meaning of "intelligent".
       | 
       | In this case they're demonstrating some very clean, easy-to-
       | understand curve-fitting, but it's really the same process --
        | come up with a target, optimize over a loss function, and hope
        | that it generalizes (this one, obviously, does not, but the
        | elephant is cute).
       | 
        | This raises the question von Neumann was asking -- why have so
        | many parameters? Ironically (or maybe just interestingly), we've
        | done a _lot_ with a ton of parameters recently, answering it with
        | "well, with a lot of parameters you can do cool things".
        
         | visarga wrote:
         | > Whether "just curve fitting" is enough to produce something
         | "intelligent" is, IMO, currently unanswered
         | 
          | Continual "curve fitting" to the real world can create
          | intelligence. What is missing is not something inside the
          | model; what's missing is a mechanism to explore, search and
          | expand its experience.
          | 
          | Our current crop of LLMs rides on human experience; they have
          | largely not participated in creating their own experiences.
         | That's why people call it imitation learning or parroting. But
         | once models become more agentic they can start creating useful
         | experiences on their own. AlphaZero did it.
        
           | soist wrote:
           | AlphaZero did not create any experiences. AlphaZero was
           | software written by people to play board games and that's all
           | it ever did.
        
             | visarga wrote:
             | AZ trained in self-play mode for millions of games, over
             | multiple generations of a player pool.
        
               | soist wrote:
               | I am familiar with the literature on reinforcement
               | learning.
        
               | pharrington wrote:
               | They're saying the board games AlphaZero played with
               | itself _are_ experiences.
        
           | elijahbenizzy wrote:
           | There are a whole bunch of assumptions here. But sure, if you
           | view the world as a closed system, then you have a decision
           | as a function of inputs:
           | 
            | 1. The world around you
            | 2. The experiences within you (really, the past view of the
            |    world around you)
            | 3. Innateness of you (sure, this could be 2 but I think it's
            |    also something else)
            | 4. The experience you find + the way you change yourself to
            |    impact (1), (2), and (3)
           | 
           | If you think of intelligence as all of these, then you're
           | making the assumption that all that's required for (2), (3),
           | and (4) is "agentic systems", which I think skips a few steps
           | (as the author of an agent framework myself...). All this is
           | to say that "what makes intelligence" is largely unsolved,
           | and nobody really knows, because we actually don't understand
           | this ourselves.
        
         | luplex wrote:
         | I mean the devil is in the details. In Reinforcement Learning,
         | the target moves! In deep learning, you often do things like
         | early stopping to prevent too much optimization.
        
           | soist wrote:
           | There is no such thing as too much optimization. Early
           | stopping is to prevent overfitting to the training set. It's
           | a trick just like most advances in deep learning because the
           | underlying mathematics is fundamentally not suited for
           | creating intelligent agents.
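            | 
            | A minimal sketch of that mechanism (toy data, plain NumPy,
            | made-up hyperparameters): keep the weights with the best
            | validation loss and stop once it stops improving.
            | 
            |     import numpy as np
            | 
            |     rng = np.random.default_rng(0)
            |     x = np.linspace(0, 1, 200)
            |     y = np.sin(2 * np.pi * x) + 0.3 * rng.normal(size=200)
            | 
            |     # high-degree polynomial fit by gradient descent
            |     A = np.vander(x, 16)
            |     tr, va = slice(0, None, 2), slice(1, None, 2)
            |     w = np.zeros(16)
            |     best, best_w, wait = np.inf, w.copy(), 0
            | 
            |     for step in range(100_000):
            |         w -= 0.1 * A[tr].T @ (A[tr] @ w - y[tr]) / 100
            |         val = np.mean((A[va] @ w - y[va]) ** 2)
            |         if val < best:       # validation improved
            |             best, best_w, wait = val, w.copy(), 0
            |         else:
            |             wait += 1
            |             if wait > 500:   # no improvement for a while
            |                 break        # early stop; keep best_w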
        
         | maitola wrote:
          | In the case of AI, the more parameters, the better! In physics
          | it's the opposite.
        
           | elijahbenizzy wrote:
           | One of the hardest parts of training models is avoiding
           | overfitting, so "more parameters are better" should be more
           | like "more parameters are better given you're using those
           | parameters in the right way, which can get hard and
           | complicated".
           | 
           | Also LLMs just straight up _do_ overfit, which makes them
           | function as a database, but a really bad one. So while more
           | parameters might just be better, that feels like a cop-out to
           | the real problem. TBD what scaling issues we hit in the
           | future.
        
           | will1am wrote:
           | A dichotomy between these fields
        
         | will1am wrote:
         | Your humorous observation captures a fundamental truth to some
         | extent
        
       | maitola wrote:
        | I love the ironic side of the article. Perhaps they should add
        | the reason for it, from Fermi's and von Neumann's point of view.
        | When you are building a model of reality in physics, if something
        | doesn't fit the experiments, you can't just add a parameter (or
        | more), vary it, and fit the data. The model should ideally have
        | zero parameters, or the fewest possible, or, at an even deeper
        | level, the parameters should emerge naturally from some simple
        | assumptions. With 4 parameters you don't know whether you are
        | really capturing a true aspect of reality or just fitting the
        | data of some experiment.
        
         | jampekka wrote:
          | This was mentioned in the first paragraph of the paper. The
          | paper is mostly humorous.
          | 
          | That said, the wisdom of the quip has been widely lost: in
          | many fields data is "modeled" with huge regression models with
          | dozens of parameters, or even with neural networks with
          | billions of parameters.
         | 
         | > In 1953, Enrico Fermi criticized Dyson's model by quoting
         | Johnny von Neumann: "With four parameters I can fit an
         | elephant, and with five I can make him wiggle his trunk."[1].
         | This quote is intended to tell Dyson that while his model may
         | appear complex and precise, merely increasing the number of
         | parameters to fit the data does not necessarily imply that the
         | model has real physical significance.
        
           | karmakaze wrote:
           | That's how I feel about dark matter. Oh this galaxy is slower
           | than this other similar one. The first one must have less
           | dark matter then.
           | 
            | What can't be fit by declaring whatever amount of dark matter
            | _must be present_ to fit the data? It's unfalsifiable: just
            | because we haven't found it doesn't mean it doesn't exist.
            | Even worse than string/M-theory, which at least has math.
        
             | edflsafoiewq wrote:
             | It's easy to say "Epicycles! Epicycles!", but people are
             | going to continue using their epicycles until a Copernicus
             | comes along.
        
         | dilawar wrote:
         | Hmm..
         | 
          | Hodgkin and Huxley did ground-breaking work on the squid giant
          | axon and modelled neural activity. They had multiple parameters
          | extracted from 'curve fitting' of recorded potentials and
          | injected currents, which were much later mapped to sodium
          | channels; similarly, another process was mapped to potassium
          | channels.
          | 
          | I wouldn't worry too much about having multiple parameters --
          | even four, when three just can't explain the data.
        
           | nyssos wrote:
           | Neuron anatomy is the product of hundreds of millions of
           | years of brute contingency. There are reasons why it can't be
           | certain ways (organisms that were that way [would have] died
           | or failed to reproduce) but no reason whatsoever why it had
            | to be exactly _this_ way. It didn't, there are plenty of
           | other ways that nerves _could_ have worked, this is just the
           | way they actually do.
           | 
           | The physics equivalent is something like eternal inflation as
           | an explanation for apparent fine-tuning - except that even if
           | it's correct it's still absolutely nowhere near as complex or
           | as contingent as biology.
        
         | will1am wrote:
         | The balance between empirical data fitting and genuine
         | understanding of the underlying reality
        
         | edflsafoiewq wrote:
         | Isn't the form of an equation really just another sort of
         | parameter?
        
           | qarl wrote:
           | Yes, it is.
           | 
           | Which makes the only truly zero parameter system the
           | collection of all systems, in all forms.
        
           | tobias2014 wrote:
            | This is why I think that modeling elementary physics is
            | nothing other than fitting data. We might end up with
            | something that we perceive as "simple", or not. But in any
            | case all the fitting has been hidden in the process of ruling
            | out models. It's just that a lot of the fitting process is
            | (implicitly) being done by theorists: we come up with new
            | models that are then falsified.
           | 
           | For example, how many parameters does the Standard Model
            | have? It's not clear what counts as a parameter. Do you count
            | the group structure, or the other mathematical structure that
            | has been "fitted" through decades of comparison with
            | experiment?
        
       | bee_rider wrote:
       | Ya know, in academic writing I tend to struggle with making it
       | sound nice and formal. I try not to use the super-stilted
       | academic style, but it is still always a struggle to walk the
       | line between too loose and too jargony.
       | 
       | Maybe this sort of thing would be a really good tradition.
       | Everyone must write a very silly article with some mathematical
       | arguments in it. Then, we can all go forward with the comfort of
        | knowing that we aren't really at risk of breaking new ground in
        | appearing unserious.
       | 
       | It is well written and very understandable!
        
       | xpe wrote:
        | One takeaway: don't count parameters, count bits.
        
         | Scene_Cast2 wrote:
         | Better yet, count entropy.
        
           | xpe wrote:
           | Why "better"? Entropy in the information theoretic sense is
           | usually quantified in bits.
        
       | xpe wrote:
        | Another takeaway (not directly stated in the article but
        | implied): the information content of a model is more than just
        | its parameters; the structure of the model itself conveys
        | information.
        
         | will1am wrote:
          | I think that's an often underappreciated insight
        
       | pharmacy7766 wrote:
       | One parameter is enough:
       | https://aip.scitation.org/doi/10.1063/1.5031956
        
         | dweinus wrote:
         | "This single parameter model provides a large improvement over
         | the prior state of the art in fitting an elephant"
         | 
         | Lol
        
         | Nition wrote:
         | Nice. This is like how you can achieve unlimited compression by
         | storing your data in a filename instead of in the file.
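          | 
          | In that spirit, a toy sketch of the general trick (not
          | necessarily the linked paper's construction): pack the data
          | into the digits of a single "parameter" and let the "model"
          | just read them back out.
          | 
          |     # The string stands in for the decimal expansion of
          |     # one real-valued parameter.
          |     def encode(points, d=4):
          |         # values assumed in [0, 1) with <= d decimals
          |         return "".join(f"{p:.{d}f}"[2:] for p in points)
          | 
          |     def decode(param, d=4):
          |         return [int(param[i:i + d]) / 10**d
          |                 for i in range(0, len(param), d)]
          | 
          |     data = [0.1234, 0.9876, 0.5555]
          |     theta = encode(data)   # '123498765555'
          |     print(decode(theta))   # [0.1234, 0.9876, 0.5555]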
        
       | aqme28 wrote:
       | > It only satisfies a weaker condition, i.e., using four non-zero
       | parameters instead of four parameters.
       | 
       | Why would that be a harder problem? In the case that you get a
       | zero parameter, you could inflate it by some epsilon and the
       | solution would basically be the same.
        
         | Sesse__ wrote:
         | They also, effectively, fit information in the indexes of the
         | parameters. I.e., _which_ of the parameters are nonzero carries
         | real information.
         | 
         | In a sense, they have done their fitting using nine parameters,
         | of which five are zero.
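          | 
          | Back-of-the-envelope, in the count-bits spirit from above:
          | just choosing which 4 of the 9 coefficients are non-zero
          | already carries information, before counting any bits in the
          | values themselves.
          | 
          |     from math import comb, log2
          |     print(log2(comb(9, 4)))   # ~6.98 bits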
        
       ___________________________________________________________________
       (page generated 2024-07-14 23:00 UTC)