[HN Gopher] Machine learning and information theory concepts tow...
       ___________________________________________________________________
        
       Machine learning and information theory concepts towards an AI
       Mathematician
        
       Author : marojejian
       Score  : 72 points
       Date   : 2024-10-12 18:18 UTC (4 hours ago)
        
 (HTM) web link (arxiv.org)
 (TXT) w3m dump (arxiv.org)
        
       | marojejian wrote:
       | Abstract. The current state-of-the-art in artificial intelligence
       | is impressive, especially in terms of mastery of language, but
       | not so much in terms of mathematical reasoning. What could be
       | missing? Can we learn something useful about that gap from how
       | the brains of mathematicians go about their craft? This essay
       | builds on the idea that current deep learning mostly succeeds at
       | system 1 abilities - which correspond to our intuition and
       | habitual behaviors - but still lacks something important
       | regarding system 2 abilities - which include reasoning and robust
       | uncertainty estimation. It takes an information-theoretical
       | posture to ask questions about what constitutes an interesting
       | mathematical statement, which could guide future work in crafting
       | an AI mathematician. The focus is not on proving a given theorem
       | but on discovering new and interesting conjectures. The central
       | hypothesis is that a desirable body of theorems better summarizes
       | the set of all provable statements, for example by having a small
       | description length while at the same time being close (in terms
       | of number of derivation steps) to many provable statements.
        
       | marojejian wrote:
       | I found this to be an interesting read, though it's just a high-
       | level plan, vs. breaking new ground. I like how the ideas are
       | synthesized.
       | 
       | A fun detail is to be found in a footnote: "A mathematician is a
       | person who can find analogies between theorems; a better
       | mathematician is one who can see analogies between proofs and the
       | best mathematician can notice analogies between theories. One can
       | imagine that the ultimate mathematician is one who can see
       | analogies between analogies." (attributed to Stefan Banach in
       | Randrianantoanina and Randrianantoanina [2007]).
       | 
       | And reading this, it seems to me this implies the "ultimate"
       | mathematician is a poet.
       | 
       | P.S. I initially assumed "Randrianantoanina and
       | Randrianantoanina" was the title of some nerdy speculative
       | fiction. It turns out this is a regular paper reference. The
       | authors are Malagasy, no doubt. They have such cool names there.
        
         | ben_w wrote:
         | > P.S. I initially assumed "Randrianantoanina and
         | Randrianantoanina" was the title of some nerdy speculative
         | fiction. It turns out this is a regular paper reference. The
         | authors are Malagasy, no doubt. They have such cool names
         | there.
         | 
         | Given what else Banach is famous for, I assumed a Banach-Tarski
         | joke.
        
         | card_zero wrote:
         | > the "ultimate" mathematician is a poet.
         | 
         | Approximately, but not actually, a member of the set of poets.
        
         | svantana wrote:
         | > it seems to me this implies the "ultimate" mathematician is a
         | poet.
         | 
         | And, conversely, that the best mathematical theorems read like
         | beautiful poetry.
        
       | ford wrote:
       | Given the pace of AI lately - might be worth mentioning this is
       | from March of this year
        
       | thomasahle wrote:
       | A key aspect in AlphaProof was how to find good training
       | problems. For AlphaProof they managed to scrape 1 million
       | "International Mathematical Olympiad (IMO) style" problems.
       | 
       | Here Bengio seems to just suggest using estimated entropy
       | `E_P(t|T(S))[- log P(t | T(S))]` of the theorem to find
       | interesting/surprising theorems to try and prove.
       | 
       | This reminds me of the early work in estimating LLM
       | uncertainty/hallucinations using perplexity. It didn't work very
       | well.
       | 
       | Actually generating good "exercises" for the system to "practice"
       | on, and theorems to try and prove, still seems like the biggest
       | road block for an AI mathematician. It connects to the
       | philosophical question of why we are doing math in the first
       | place?
        
       | sgt101 wrote:
       | I love that "system 1" and "system 2" are tossed out as
       | fundamental concepts of mind when they're basically labels for
       | some semi-random ideas based on bullshit studies in a pop sci
       | fluff book. Is there any actual science that ties these ideas to
       | mathematical reasoning?
       | 
       | Then we have:
       | 
       | >> By analogy, we can now state the main hypothesis proposed in
       | this paper: a crucial component of the usefulness of a new proven
       | theorem t (in the context of previous theorems T(S)) is how
       | efficiently T(S)[?]{t} compresses the set of all provable
       | mathematical statements M. That is, T (S) [?] {t} is a good
       | compression of M if many provable statements, beyond those in S,
       | can be derived from T (S) [?] {t}, say using at most k derivation
       | steps.
       | 
       | which says "Occam's Razor, that's good isn't it?" in more words.
        
         | drdeca wrote:
         | I don't think that's really what Occam's razor says.
         | 
         | If one's goal is to use ML stuff to try to produce new useful
         | theorems, it seems useful towards that goal to come up with a
         | numeric heuristic for how useful a new theorem is. And, stating
         | such a heuristic explicitly seems reasonable.
         | 
         | Just because the heuristic that they state, which they expect
         | to be a crucial component of what makes a theorem useful to
         | prove in a given context, isn't particularly surprising,
         | doesn't make it not worth saying.
         | 
         | to elaborate on "isn't particularly surprising" : if you asked
         | many other people to come up with heuristics for the usefulness
         | of a new potential theorem, I imagine many of the answers you
         | would get (counting multiplicity if multiple people give
         | identical answers) would be fairly similar.
         | 
         | Even if a hypothesis is an obvious-to-consider hypothesis, if
         | one wants to talk about it, it is worth stating what it is
         | first.
        
         | CooCooCaCha wrote:
         | > I love that "system 1" and "system 2" are tossed out as
         | fundamental concepts of mind when they're basically labels for
         | some semi-random ideas based on bullshit studies in a pop sci
         | fluff book. Is there any actual science that ties these ideas
         | to mathematical reasoning?
         | 
         | I think this fails to hit the mark on what people actually care
         | about with regards to system 1 vs system 2. It's really just,
         | can we build models that are able to vary how long they think
         | about a problem? Current AI models are very limited in this
         | aspect and I think most would agree that being able to think
         | about a problem for a period of time is a useful feature that
         | we humans use all the time.
        
         | kendalf89 wrote:
         | System 1 and 2 are just another way of describing dual process
         | theory, which existed long before Daniel Kahneman wrote,
         | Thinking Fast and Slow, and is still the prevailing theory of
         | mind in its category today. There were maybe 1 or 2 studies
         | mentioned in that book that were not very replicable but other
         | than that, the overall consensus by leading experts in that
         | field is positive, from everything I've seen, which is
         | impressive considering how old the book is now.
         | 
         | I have no idea if dual process theory is actually useful for
         | teaching computers how to math, but it seems unfair to just
         | dismiss it as pop science bunk.
         | 
         | https://en.m.wikipedia.org/wiki/Dual_process_theory
        
       | youoy wrote:
       | I feel that here the authors think about mathematics as just the
       | language and not what it talks about. We know at least since
       | Godel that mathematics is more than just the formal system. I
       | even remember going to a talk by Gromov and he getting "angry"
       | about a question from the public in the lines of "what if we
       | suppose this extra thing and change this other thing on that
       | theorem", and him saying that it's stupid to think of new
       | theorems for the sake of doing it, that the language of
       | mathematics are like metaphors speaking about something else.
       | 
       | In my experience learning math, their claim that "The central
       | hypothesis is that a desirable body of theorems better summarizes
       | the set of all provable statements, for example by having a small
       | description length" is not true. Short != Better, better is what
       | gets me faster to form the correct intuitive idea about the
       | mathematical statement. For example, several times I have
       | experienced the fact of understanding the formal definitions and
       | proofs of a theory, but it's not until I form the correct
       | intuitions (maybe months later), that I truly understand the
       | theory. And it's not until I have the correct intuitions, that I
       | can successfully apply the theory to create meaningful new
       | theorems.
       | 
       | Anyways, I understand that one has to start from somewhere and
       | the point of view of the article is more tractable and explicit.
        
       ___________________________________________________________________
       (page generated 2024-10-12 23:00 UTC)