[HN Gopher] Machine learning and information theory concepts tow...
___________________________________________________________________
Machine learning and information theory concepts towards an AI
Mathematician
Author : marojejian
Score : 72 points
Date : 2024-10-12 18:18 UTC (4 hours ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| marojejian wrote:
| Abstract. The current state-of-the-art in artificial intelligence
| is impressive, especially in terms of mastery of language, but
| not so much in terms of mathematical reasoning. What could be
| missing? Can we learn something useful about that gap from how
| the brains of mathematicians go about their craft? This essay
| builds on the idea that current deep learning mostly succeeds at
| system 1 abilities - which correspond to our intuition and
| habitual behaviors - but still lacks something important
| regarding system 2 abilities - which include reasoning and robust
| uncertainty estimation. It takes an information-theoretical
| posture to ask questions about what constitutes an interesting
| mathematical statement, which could guide future work in crafting
| an AI mathematician. The focus is not on proving a given theorem
| but on discovering new and interesting conjectures. The central
| hypothesis is that a desirable body of theorems better summarizes
| the set of all provable statements, for example by having a small
| description length while at the same time being close (in terms
| of number of derivation steps) to many provable statements.
| marojejian wrote:
| I found this to be an interesting read, though it's just a high-
| level plan, vs. breaking new ground. I like how the ideas are
| synthesized.
|
| A fun detail is to be found in a footnote: "A mathematician is a
| person who can find analogies between theorems; a better
| mathematician is one who can see analogies between proofs and the
| best mathematician can notice analogies between theories. One can
| imagine that the ultimate mathematician is one who can see
| analogies between analogies." (attributed to Stefan Banach in
| Randrianantoanina and Randrianantoanina [2007]).
|
| And reading this, it seems to me this implies the "ultimate"
| mathematician is a poet.
|
| P.S. I initially assumed "Randrianantoanina and
| Randrianantoanina" was the title of some nerdy speculative
| fiction. It turns out this is a regular paper reference. The
| authors are Malagasy, no doubt. They have such cool names there.
| ben_w wrote:
| > P.S. I initially assumed "Randrianantoanina and
| Randrianantoanina" was the title of some nerdy speculative
| fiction. It turns out this is a regular paper reference. The
| authors are Malagasy, no doubt. They have such cool names
| there.
|
| Given what else Banach is famous for, I assumed a Banach-Tarski
| joke.
| card_zero wrote:
| > the "ultimate" mathematician is a poet.
|
| Approximately, but not actually, a member of the set of poets.
| svantana wrote:
| > it seems to me this implies the "ultimate" mathematician is a
| poet.
|
| And, conversely, that the best mathematical theorems read like
| beautiful poetry.
| ford wrote:
| Given the pace of AI lately - might be worth mentioning this is
| from March of this year
| thomasahle wrote:
| A key aspect in AlphaProof was how to find good training
| problems. For AlphaProof they managed to scrape 1 million
| "International Mathematical Olympiad (IMO) style" problems.
|
| Here Bengio seems to just suggest using estimated entropy
| `E_P(t|T(S))[- log P(t | T(S))]` of the theorem to find
| interesting/surprising theorems to try and prove.
|
| This reminds me of the early work in estimating LLM
| uncertainty/hallucinations using perplexity. It didn't work very
| well.
|
| Actually generating good "exercises" for the system to "practice"
| on, and theorems to try and prove, still seems like the biggest
| road block for an AI mathematician. It connects to the
| philosophical question of why we are doing math in the first
| place?
| sgt101 wrote:
| I love that "system 1" and "system 2" are tossed out as
| fundamental concepts of mind when they're basically labels for
| some semi-random ideas based on bullshit studies in a pop sci
| fluff book. Is there any actual science that ties these ideas to
| mathematical reasoning?
|
| Then we have:
|
| >> By analogy, we can now state the main hypothesis proposed in
| this paper: a crucial component of the usefulness of a new proven
| theorem t (in the context of previous theorems T(S)) is how
| efficiently T(S)[?]{t} compresses the set of all provable
| mathematical statements M. That is, T (S) [?] {t} is a good
| compression of M if many provable statements, beyond those in S,
| can be derived from T (S) [?] {t}, say using at most k derivation
| steps.
|
| which says "Occam's Razor, that's good isn't it?" in more words.
| drdeca wrote:
| I don't think that's really what Occam's razor says.
|
| If one's goal is to use ML stuff to try to produce new useful
| theorems, it seems useful towards that goal to come up with a
| numeric heuristic for how useful a new theorem is. And, stating
| such a heuristic explicitly seems reasonable.
|
| Just because the heuristic that they state, which they expect
| to be a crucial component of what makes a theorem useful to
| prove in a given context, isn't particularly surprising,
| doesn't make it not worth saying.
|
| to elaborate on "isn't particularly surprising" : if you asked
| many other people to come up with heuristics for the usefulness
| of a new potential theorem, I imagine many of the answers you
| would get (counting multiplicity if multiple people give
| identical answers) would be fairly similar.
|
| Even if a hypothesis is an obvious-to-consider hypothesis, if
| one wants to talk about it, it is worth stating what it is
| first.
| CooCooCaCha wrote:
| > I love that "system 1" and "system 2" are tossed out as
| fundamental concepts of mind when they're basically labels for
| some semi-random ideas based on bullshit studies in a pop sci
| fluff book. Is there any actual science that ties these ideas
| to mathematical reasoning?
|
| I think this fails to hit the mark on what people actually care
| about with regards to system 1 vs system 2. It's really just,
| can we build models that are able to vary how long they think
| about a problem? Current AI models are very limited in this
| aspect and I think most would agree that being able to think
| about a problem for a period of time is a useful feature that
| we humans use all the time.
| kendalf89 wrote:
| System 1 and 2 are just another way of describing dual process
| theory, which existed long before Daniel Kahneman wrote,
| Thinking Fast and Slow, and is still the prevailing theory of
| mind in its category today. There were maybe 1 or 2 studies
| mentioned in that book that were not very replicable but other
| than that, the overall consensus by leading experts in that
| field is positive, from everything I've seen, which is
| impressive considering how old the book is now.
|
| I have no idea if dual process theory is actually useful for
| teaching computers how to math, but it seems unfair to just
| dismiss it as pop science bunk.
|
| https://en.m.wikipedia.org/wiki/Dual_process_theory
| youoy wrote:
| I feel that here the authors think about mathematics as just the
| language and not what it talks about. We know at least since
| Godel that mathematics is more than just the formal system. I
| even remember going to a talk by Gromov and he getting "angry"
| about a question from the public in the lines of "what if we
| suppose this extra thing and change this other thing on that
| theorem", and him saying that it's stupid to think of new
| theorems for the sake of doing it, that the language of
| mathematics are like metaphors speaking about something else.
|
| In my experience learning math, their claim that "The central
| hypothesis is that a desirable body of theorems better summarizes
| the set of all provable statements, for example by having a small
| description length" is not true. Short != Better, better is what
| gets me faster to form the correct intuitive idea about the
| mathematical statement. For example, several times I have
| experienced the fact of understanding the formal definitions and
| proofs of a theory, but it's not until I form the correct
| intuitions (maybe months later), that I truly understand the
| theory. And it's not until I have the correct intuitions, that I
| can successfully apply the theory to create meaningful new
| theorems.
|
| Anyways, I understand that one has to start from somewhere and
| the point of view of the article is more tractable and explicit.
___________________________________________________________________
(page generated 2024-10-12 23:00 UTC)