[HN Gopher] AI solves International Math Olympiad problems at si...
       ___________________________________________________________________
        
       AI solves International Math Olympiad problems at silver medal
       level
        
       Author : ocfnash
       Score  : 695 points
       Date   : 2024-07-25 15:29 UTC (7 hours ago)
        
 (HTM) web link (deepmind.google)
 (TXT) w3m dump (deepmind.google)
        
       | adverbly wrote:
       | > First, the problems were manually translated into formal
       | mathematical language for our systems to understand. In the
       | official competition, students submit answers in two sessions of
       | 4.5 hours each. Our systems solved one problem within minutes and
       | took up to three days to solve the others.
       | 
       | Three days is interesting... Not technically silver medal
       | performance I guess, but let's be real I'd be okay waiting a
       | month for the cure to cancer.
        
         | ZenMikey wrote:
         | I haven't read TFA as I'm at work, but I would be very
         | interested to know what the system was doing in those three
         | days. Were there failed branches it explored? Was it just
         | fumbling its way around until it guessed correctly? What did
         | the feedback loop look like?
        
           | qsort wrote:
           | I can't find a link to an actual paper, that just seems to be
           | a blog post. But from what I gather the problems were
           | manually translated to Lean 4, and then the program is doing
           | some kind of tree search. I'm assuming they are leveraging
           | the proof checker to provide feedback to the model.
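            | 
            | Pure guesswork about the shape of it (all names here are
            | invented, not from any paper):
            | 
            |   # Model proposes tactics; the Lean checker gives hard
            |   # pass/fail feedback; rejected branches are abandoned.
            |   def prove(goal, model, lean, budget=10_000):
            |       stack = [(goal, [])]  # open proof states + scripts
            |       while stack and budget > 0:
            |           state, script = stack.pop()
            |           for tactic in model.propose(state):
            |               budget -= 1
            |               ok, nxt = lean.run(state, tactic)
            |               if not ok:
            |                   continue  # checker says no: dead branch
            |               if nxt.goals_closed:
            |                   return script + [tactic]  # verified proof
            |               stack.append((nxt, script + [tactic]))
            |       return None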
        
           | tsoj wrote:
           | This is NOT the paper, but probably a very similar solution:
           | https://arxiv.org/abs/2009.03393
        
           | visarga wrote:
           | > just fumbling its way around until it guessed correctly
           | 
           | As opposed to 0.999999% of the human population who can't do
           | it even if their life depends on it?
        
             | dsign wrote:
             | I was going to come here to say that. I remember being a
             | teenager and giving up in frustration at IMO problems. And
             | I was competing at IPhO.
        
             | logicchains wrote:
             | I guess you mean 99.9999%?
        
           | lacker wrote:
           | _The training loop was also applied during the contest,
           | reinforcing proofs of self-generated variations of the
           | contest problems until a full solution could be found._
           | 
           | So they had three days to keep training the model, on
           | synthetic variations of each IMO problem.
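            | 
            | Reading that sentence as pseudocode (speculative, all
            | names invented):
            | 
            |   # Generate variants of the target problem, reinforce on
            |   # whichever ones get verified proofs, then retry the
            |   # target. Repeat until solved or out of time.
            |   def test_time_loop(problem, model, lean, deadline):
            |       while not deadline.passed():
            |           variants = model.generate_variations(problem)
            |           solved = [(v, p) for v in variants
            |                     if (p := model.prove(v, lean))]
            |           model.reinforce(solved)  # RL on verified proofs
            |           proof = model.prove(problem, lean)
            |           if proof:
            |               return proof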
        
           | thomasahle wrote:
           | They just write "it's like alpha zero". So presumably they
           | used a version of MCTS where each terminal node is scored by
            | Lean as either correct or incorrect.
           | 
           | Then they can train a network to evaluate intermediate
           | positions (score network) and one to suggest things to try
           | next (policy network).
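            | 
            | One MCTS rollout in that setup might look roughly like
            | this (my sketch, every name invented):
            | 
            |   import math
            | 
            |   def puct_choice(node, c=1.5):
            |       # child maximizing value estimate + policy-weighted
            |       # exploration bonus (AlphaZero's PUCT rule)
            |       n = sum(ch.visits for ch in node.children)
            |       def score(ch):
            |           q = ch.value_sum / max(ch.visits, 1)
            |           u = c * ch.prior * math.sqrt(n) / (1 + ch.visits)
            |           return q + u
            |       return max(node.children, key=score)
            | 
            |   def rollout(root, lean, policy, value):
            |       node = root
            |       while node.children:          # descend via PUCT
            |           node = puct_choice(node)
            |       if lean.is_terminal(node.state):
            |           r = 1.0 if lean.accepts(node.state) else 0.0
            |       else:                         # expand + evaluate
            |           for tactic, prior in policy.propose(node.state):
            |               node.add_child(tactic, prior)
            |           r = value.estimate(node.state)
            |       while node:                   # backpropagate reward
            |           node.visits += 1
            |           node.value_sum += r
            |           node = node.parent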
        
         | ComplexSystems wrote:
         | Or the simultaneous discovery of thousands of cryptographic
         | exploits...
        
           | poincaredisk wrote:
           | Still waiting for the first one. I'm not holding my breath -
           | just like fuzzing found a lot of vulnerabilities in low-level
           | software, I expect novel automated analysis approaches will
           | yield some vulnerabilities - but that won't be a catastrophic
           | event just like fuzzing wasn't.
        
             | sqeaky wrote:
              | I hope it doesn't find a new class of bug. Finding
              | another thing like Spectre could be problematic.
             | 
             | EDIT - I hope if that new class of bug exists that it is
             | found. I hope that new class of bug doesn't exist.
        
             | throwaway240403 wrote:
             | Hope that's true. Really mucks up the world a bit if not.
        
             | criddell wrote:
             | It's rumored that the NSA has 600 mathematicians working
             | for them. If they are the ones finding the exploits you
             | will probably never hear about them until they are
             | independently discovered by someone who can publish.
        
             | ComplexSystems wrote:
             | Why don't you think that AI models will, perhaps rather
             | soon, surpass human capabilities in finding security
             | vulnerabilities? Because an AI that's even equally
             | competent would be a fairly catastrophic event.
        
         | nnarek wrote:
         | "three days" does not say anything about how much computational
         | power is used to solve problems, maybe they have used 10% of
         | all GCP :)
        
           | vlovich123 wrote:
            | The thing is though, once we have a benchmark that we
            | pass, it's pretty typical to be able to bring down the
            | time required in short order through performance
            | improvements and iterating on ideas. So if you knew you
            | had GAI but it took 100% of all GCP for 3 years to give a
            | result, within the next 5 years that would come down
            | significantly (not least because you'd build HW dedicated
            | to accelerating the slow parts).
        
             | tsimionescu wrote:
             | That's patently false for many classes of problems. We know
             | exactly how to solve the traveling salesman problem, and
             | have for decades, but we're nowhere close to solving a
             | random 1000 city case (note: there are approximate methods
             | that can find good, but not optimal, results on millions of
             | cities). Edit: I should say 1,000,000 city problem, as
             | there are some solutions for 30-60k cities from the 2000s.
             | 
             | And there are good reasons to believe that theorem finding
             | and proof generation are at least NP-hard problems.
        
               | rowanG077 wrote:
                | The person said typical, not always the case. Just
                | because there are obviously cases where it didn't
                | happen doesn't mean it's not still typically the case.
        
               | vlovich123 wrote:
                | We're not talking about mathematical optimality here,
                | either for the solution found or for the time taken.
                | The point is whether this finds results more cheaply
                | than a human can, and right now it's better on some
                | problems while on others it's worse. Clearly if a
                | human can do it, there is a way to solve it more
                | cheaply, and it would be flawed reasoning to assume
                | the time required is already asymptotically optimal.
               | 
               | While I agree that not all problems show this kind of
               | acceleration in performance, that's typically only true
               | if you've already spent so much time trying to solve it
               | that you've asymptoted to the optimal solution. Right now
               | we're nowhere near the asymptote for AI improvements.
                | Additionally, there are so many research dollars
                | flowing into AI precisely because the potential upside
                | here is nowhere near realized and there are lots of
                | research lines still left to be explored. Geoffrey
                | Hinton ended the AI winter.
        
               | visarga wrote:
               | > The point is whether this finds results more cheaply
               | than a human can
               | 
                | If you need to solve 1000 problems in 3 days you
                | wouldn't find enough humans who could do it. So it
                | would not be cheaper if it's not possible at all.
        
               | vlovich123 wrote:
               | Well if it takes 10% of all of Google's servers 3 days to
               | solve, you may find it difficult to scale out to solving
               | 1000 problems in 3 days as well.
               | 
               | As for humans, 100 countries send 6 students to solve
               | these problems. It also doesn't mean that these problems
               | aren't solvable by anyone else. These are just the "best
               | 6" where best = can solve and solve most quickly. Given a
               | three day budget, 1000 problems could reasonably be
               | solvable and you know exactly who to tap to try to solve
                | them. Also, while the IMO is difficult and winners
                | tend to win other awards like Fields Medals, there are
                | many professional mathematicians who never even bother
                | because that type of competition isn't interesting to
                | them. It's not unreasonable to expect that
                | professional mathematicians would be able to solve
                | these problems as well if they wanted to spend 3 days
                | on them.
               | 
               | But in terms of energy per solve, humans are definitely
               | cheaper. As you note the harder part is scaling it out
               | but so far the AI isn't solving problems that are
               | impossible for humans, just that given enough time it
               | managed to perform the same task. That's a very promising
                | result, but supremacy is still a ways off (this AI
                | can't win the competition yet).
        
           | falcor84 wrote:
           | And say they did use 10% of all GCP? Would it be less
           | impressive? This is a result that was considered by experts
           | to be far beyond the state of the art; it's absolutely ok if
           | it's not very efficient yet.
           | 
           | Also, for what it's worth, I'm pretty sure that I wouldn't
           | have been able to solve it myself in three days, even if I
           | had access to all of GCP, Azure and AWS (except if I could
           | mine crypto to then pay actual IMO-level mathematicians to
           | solve it for me).
        
             | nnarek wrote:
              | Yes, it is very impressive, especially the
              | autoformalization of problems written in natural
              | language, and also the proof search for theorems.
        
             | data_maan wrote:
             | Which experts said that?
             | 
             | I don't think that's the case at all. The writing was
             | already on the wall.
        
         | wongarsu wrote:
         | The problem solved "within minutes" is also interesting. I'd
         | interpret that as somewhere between 2 and 59 minutes. Given the
         | vagueness probably on the higher end, otherwise they'd
         | celebrate it more. The students had 6 tasks in 9 hours, so on
         | average 1.5h per task. If you add the time a student would take
         | to (correctly!) translate the problems to their input format,
         | their best-case runtime is probably about as fast as a silver-
         | medalist would take to solve the problem on their own.
         | 
         | But even if they aren't as fast as humans yet this is very
         | valuable. Both as a stepping stone, and because at a certain
         | scale compute is much easier to scale than skilled
         | mathematicians.
        
           | gjm11 wrote:
           | They say "our systems" (presumably meaning AlphaProof _and_
            | AlphaGeometry 2) solved one problem "within minutes", and
           | later on the page they say that the geometry question (#4)
           | was solved by AlphaGeometry in 19 seconds.
           | 
           | So either (1) "within minutes" was underselling the abilities
           | of the system, or (2) what they actually meant was that the
           | geometry problem was solved in 19 seconds, one of the others
           | "within minutes" (I'd guess #1 which is definitely easier
           | than the other two they solved), and the others in
           | unspecified times of which the longer was ~3 days.
           | 
           | I'd guess it's the first of those.
           | 
           | (Euclidean geometry has been a kinda-solved domain for some
           | time; it's not super-surprising that they were able to solve
           | that problem quickly.)
           | 
           | As for the long solve times, I would guess they're related to
           | this fascinating remark:
           | 
           | > The training loop was also applied during the contest,
           | reinforcing proofs of self-generated variations of the
           | contest problems until a full solution could be found.
        
             | visarga wrote:
              | Euclidean geometry still requires constructions to
              | solve, and those are based on intuition.
        
               | gjm11 wrote:
               | There are known algorithms that can solve _all_ problems
               | in euclidean (ruler-and-compasses) geometry, no intuition
               | required. The most effective algorithms of this type are
               | quite inefficient, though, and (at least according to
               | DeepMind) don't do as well as AlphaGeometry does at e.g.
               | IMO geometry problems.
        
         | lolinder wrote:
         | It feels pretty disingenuous to claim silver-medal status when
         | your machine played by significantly different rules. The
         | article is light on details, but it says they wired it up to a
         | theorem prover, presumably with feedback sent back to the AI
         | model for re-evaluation.
         | 
         | How many cycles of guess-and-check did it take over the course
         | of three days to get the right answer?
         | 
         | If the IMO contestants were allowed to use theorem provers and
         | were given 3 days (even factoring in sleep) would AlphaProof
         | still have gotten silver?
         | 
         | > let's be real I'd be okay waiting a month for the cure to
         | cancer.
         | 
         | I don't think these results suggest that we're on the brink of
         | knowledge coming at a substantially faster rate than before.
         | Humans have been using theorem provers to advance our
         | understanding for decades. Now an LLM has been wired up to one
         | too, but it still took 8x as long to solve the problems as our
         | best humans did _without any computer assistance_.
        
           | ToucanLoucan wrote:
           | I am so exhausted of the AI hype nonsense. LLMs are not
           | fucking curing cancer. Not now, not in five years, not in a
           | hundred years. That's not what they do.
           | 
           | LLM/ML is fascinating tech that has a lot of legitimate
           | applications, but _it is not fucking intelligent, artificial
           | or otherwise,_ and I am sick to death of people treating it
           | like it is.
        
             | apsec112 wrote:
             | What observation, if you saw it, do you think would falsify
             | that hypothesis?
        
               | necovek wrote:
               | It seems unlikely people will employ _only_ ML models,
                | especially LLMs, to achieve great results: they will
               | combine it with human insights (through direction and
               | concrete algorithms).
               | 
               | It's obvious that's happening with LLMs even today to
               | ensure they don't spew out too much bullshit or harmful
               | content. So let's get to a point where we can trust AI
               | as-is first, and let's talk about what's needed to
               | achieve the next milestone after and _if_ we get there.
               | 
               | And I love asking every new iteration of ChatGPT/Gemini
               | something along the lines of "What day was yesterday if
               | yesterday was a Thursday?" It just makes me giggle.
        
             | 29athrowaway wrote:
              | It is not artificial? So it is natural then?
        
             | jercos wrote:
             | A significant part of the problem is that a majority of
             | people are unaware of just how simple what they consider
             | "intelligence" really is. You don't need actual
             | intelligence to replace the public-facing social role of a
             | politician, or a talking head, or a reactive-only middle
             | manager. You just need words strung together that fit a
             | problem.
        
           | golol wrote:
           | I believe you are misreading this.
           | 
           | First of all, this is not a sport and the point is not to
           | compare AI to humans. The point is to compare AI to IMO-
           | difficulty problems.
           | 
            | Secondly, this is not some hacky trick where brute force
            | and some theorem prover magic are massaged to solve a
            | select few problems and then you'll never hear about it
            | again. They are building a general pipeline which turns
            | informal natural language mathematics (of which we have
            | ungodly amounts
           | available) into formalized mathematics, and in addition
           | trains a model to prove such kinds of mathematics. This can
           | also work for theory building. This can become a real
           | mathematical assistant that can help a mathematician test an
           | argument, play with variations of a definition, try 100
           | combinations of some estimates, apply a classic but lengthy
           | technique etc. etc.
        
             | lolinder wrote:
             | > First of all, this is not a sport and the point is not to
             | compare AI to humans. The point is to compare AI to IMO-
             | difficulty problems.
             | 
             | If this were the case then the headline would be "AI solves
             | 4/6 IMO 2024 problems", it wouldn't be claiming "silver-
             | medal standard". Medals are generally awarded by comparison
             | to other contestants, not to the challenges overcome.
             | 
             | > This can become a real mathematical assistant that can
             | help a mathematician test an argument, play with variations
             | of a definition, try 100 combinations of some estimates,
             | apply a classic but lengthy technique etc. etc.
             | 
             | This is great, and I'm not complaining about what the team
             | is working on, I'm complaining about how it's being sold.
             | Headlines like these from lab press releases will feed the
             | AI hype in counterproductive ways. The NYT literally has a
             | headline right now: "Move Over Mathematicians, Here Comes
             | AlphaProof".
        
               | golol wrote:
               | At the IMO "silver medal" afaik is define as some tange
               | of points, which more or less equals some range of
               | problems solved. For me it is fair to say that "silver-
               | medal performance" is IMO langauge for about 4/6 problems
               | solved. And what's the problem if some clickbait websites
               | totally spin the result? They would've done it anyways
               | even with a different title, and I also don't see the
               | harm. Let people be wrong.
        
               | mathnmusic wrote:
               | No, "silver medal" is defined as a range of points to be
               | earned in the allotted time (4.5 hours for both papers of
               | 3 problems each).
        
               | lolinder wrote:
               | And the cutoffs are chosen _after_ the results are in,
               | not in advance.
        
             | riku_iki wrote:
             | > They are building a general pipeline which turns informal
             | natural lamguage mathematics
             | 
             | but this part currently sucks, because they didn't trust it
              | and formalized the problems manually.
        
               | golol wrote:
               | Yea that's fair, but I don't think it will keep sucking
               | forever as formalization is in principle just a
               | translation process.
        
               | riku_iki wrote:
                | And we don't have 100% accuracy when translating
                | ambiguous texts, because the system often needs some
                | domain knowledge, context, etc. And math has zero
                | tolerance for mistakes.
               | 
               | I also expect that math formalized by machine will be
               | readable by machine and hardly understandable by humans.
        
           | regularfry wrote:
           | I'm not sure it matters that it had access to a theorem
           | prover. The fact that it's possible to build a black box that
           | solves hard proofs on its own _at all_ is the fascinating
           | bit.
           | 
           | > it still took 8x as long to solve the problems as our best
           | humans did without any computer assistance.
           | 
           | Give it a year and that ratio will be reversed. At least. But
           | also it matters less how long it takes if doubling the number
           | of things reasoning at a best-human level is pronounced
           | "ctrl-c, ctrl-v".
        
         | 10100110 wrote:
         | Don't confuse interpolation with extrapolation. Curing cancer
         | will require new ideas. IMO requires skill proficiency in tasks
         | where the methods of solving are known.
        
           | trueismywork wrote:
           | They are the same things
        
           | golol wrote:
           | Mathematicians spend most of their time interpolating between
           | known ideas and it would be extremely helpful to have
           | computer assistance with that.
        
           | visarga wrote:
           | Search is extrapolation. Learning is interpolation.
            | Search+Learn is the formula used by AZ. Don't forget AZ
            | taught us humans a thing or two about a game we had a
            | 2000-year head start in, starting from scratch rather than
            | from human supervision.
        
             | xdavidliu wrote:
             | no, search is not extrapolation. Extrapolation means taking
             | some data and projecting out beyond the limits of that
             | data. For example, if my bank account had $10 today and $20
             | tomorrow, then I can extrapolate and say it might have $30
             | the day after tomorrow. Interpolation means taking some
             | data and inferring the gaps of that data. For example, if I
             | had $10 today and $30 the day after tomorrow, I can
             | interpolate and say I probably had $20 tomorrow.
             | 
             | Search is different from either of those things, it's when
             | you have a target and a collection of other things, and are
             | trying to find the target in that collection.
        
               | visarga wrote:
               | Search can go from a random init model to beating humans
               | at Go. That is not interpolation.
               | 
               | - Search allows exploration of the game tree, potentially
               | finding novel strategies.
               | 
               | - Learning compresses the insights gained from search
               | into a more efficient policy.
               | 
               | - This compressed policy then guides future searches more
               | effectively.
               | 
               | Evolution is also a form of search, and it is open-ended.
               | AlphaProof solved IMO problems, those are chosen to be
               | out of distribution, simple imitation can't solve them.
               | Scientists do (re)search, they find novel insights nobody
               | else discovered before. What I want to say is that search
               | is on a whole different level than what neural nets do,
               | they can only interpolate their training data, search
               | pushes outside of the known data distribution.
               | 
                | It's actually the combo of search+learning that is
                | necessary; learning is just the little brother of
                | search, compressing novel insights into the model. You
                | can also think of training a neural net as a search
                | for the best parameters that fit the training set.
        
           | trotro wrote:
            | The methods are known, but the solutions to the IMO problems
           | weren't. So the AI did extrapolate a solution.
           | 
           | Also, there's no reason to affirm that an eventual cure for
           | cancer requires fundamentally new methods. Maybe the current
           | methods are sufficient, it's just that nobody has been
           | "smart" enough to put the pieces together. (disclaimer: not
           | an expert at all)
        
             | dsign wrote:
             | I think you are correct though. We don't need new physics
             | to cure cancer. But we may need information-handling,
             | reasoning and simulation systems which are orders of
             | magnitude bigger and more complex than anything we have
             | this year. We also need to stop pussy-footing and diddling
             | with ideologies and start working on the root cause of
             | cancer and almost every other disease, which is aging.
        
             | markusde wrote:
             | Unlike curing cancer, the IMO problems were specifically
             | designed to be solvable
        
           | xdavidliu wrote:
           | new doesn't necessarily mean "an extremal point that's not
           | the average of two existing points". The set of existing
           | knowledge is not necessarily continuous; the midpoint between
           | two known points may be unknown, and thus would be a "new"
           | point that could be obtained by interpolation.
        
           | im3w1l wrote:
           | I think this is kinda false actually on the cancer side. We
           | have reached a point where we have known approaches that
           | work. It's "just" a matter of putting them into practice
           | which will of course require solving many little details,
           | which is very important and time-consuming work, but it
            | doesn't require superhuman genius-level lateral thinking,
            | just a few million man-years of grinding away at it.
        
       | iamronaldo wrote:
       | No benchmarks of any kind?
        
       | arnabgho wrote:
       | https://x.com/GoogleDeepMind/status/1816498082860667086
        
       | piombisallow wrote:
       | IMO problems aren't fundamentally different from chess or other
       | games, in that the answer is already known.
        
         | Smaug123 wrote:
         | I really don't understand what you mean by this. 1) it's not
         | known whether chess is a win for White or not. 2) IMO problems,
         | such as 2024 problem 1 which the system solved, are often
         | phrased as "Determine all X such that...".
        
           | diffeomorphism wrote:
           | You are attacking a straw man and the point made is pretty
           | good.
           | 
           | Competition problems are designed to be actually solvable by
           | contestants. In particular, the problems should be solvable
           | using a reasonable collection of techniques and many "prep
           | courses" will teach you many techniques, tools and algorithms
           | and a good starting point is to throw that stuff at any given
            | problem. So, just as with chess _openings_, putting in
            | lots of legwork will give you some good results for that
            | part. You might very well lose in the mid and late game,
            | just like this AI might struggle with "actual problems".
           | 
           | It is of course still very impressive, but that is an
           | important point.
        
             | Smaug123 wrote:
             | I'm attacking nobody! I literally couldn't understand the
             | point, so I said so: as stated, its premises are simply
             | clearly false!
             | 
             |  _Your_ point, however, is certainly a good one: IMO
             | problems are an extremely narrow subset of the space of
             | mathematical problems, which is itself not necessarily even
             | 50% of the space of the _work of a mathematician_.
        
         | wufufufu wrote:
         | Kinda? Chess isn't solved. Complex problems can have better
         | solutions discovered in the future.
        
           | jeremyjh wrote:
           | It isn't solved but the evaluation (which side is better, by
           | how much, and which moves are best) of a strong engine is -
           | for all practical purposes - an answer to every chess
           | position you can pose to it. This makes it easy to gauge
           | improvement and benchmark against other systems compared to
           | some other problems.
        
         | Davidzheng wrote:
         | lol
        
         | osti wrote:
         | But the answer is probably not known by you, in particular.
        
           | piombisallow wrote:
           | Yes, sure, but this doesn't mean that this generalizes to
           | open math research problems, which would be the useful real-
           | world application of this. Otherwise this is just playing
           | known games with known rules, granted better/faster than
           | humans.
        
         | energy123 wrote:
          | IMO and chess are the same in the most important respect:
          | you can use Lean or a simulated chess game to create
          | unlimited quality training labels. Any problem of this
          | category should be
         | solved with enough compute and clever
         | architecture/metacognition design. The more intractable
         | problems are where data is hard to find or synthesize.
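          | 
          | Concretely, something like this (toy sketch, names
          | invented): the checker's verdict is the label, so data is
          | limited only by compute.
          | 
          |   def label(attempts, checker):
          |       # every self-generated attempt becomes a perfectly
          |       # reliable training example, graded by the checker
          |       return [(a.statement, a.proof,
          |                checker.verify(a.statement, a.proof))
          |               for a in attempts]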
        
         | almostgotcaught wrote:
         | > in that the answer is already known.
         | 
          | You realize this holds true for all of math, right? Outside
          | of Gödel incompleteness potholes, every proof/theorem is a
          | permutation of ZFC. And you can fix the potholes by just
          | filling them in with more Cs.
        
           | diffeomorphism wrote:
           | That seems shallow and not really useful. Like sorting is
           | easy, just take all permutations and look at each one to see
            | whether it is sorted... it just might take you longer than
            | the heat death of the universe to actually do that.
        
           | lanstin wrote:
           | There are elementary Diophantine equations that are
           | independent of ZFC. What is your approach for those?
        
       | petters wrote:
       | The problems were first converted into a formal language. So they
        | were only partly solved by the AI.
        
         | jeremyjh wrote:
         | Yes and it is difficult for me to believe that there is not
         | useful human analysis and understanding involved in this
         | translation that the AI is helpless without. But that I suppose
         | is a problem that could be tackled with a different model...
        
           | adrianN wrote:
           | Even so, having a human formalize the problems and an AI to
           | find machine checkable proofs could be very useful for
           | mathematicians.
        
           | sebzim4500 wrote:
           | It is vastly easier to do the formalization than to actually
            | solve the problem. Any undergraduate with some Lean
           | familiarity could do it in minutes.
        
             | Davidzheng wrote:
             | Disagree! Some problems are much harder than others. If you
              | don't believe me, please go formalize P5 from this year's IMO.
        
         | rpois wrote:
         | Does this formalization process include giving it the answer it
         | should try to prove?
        
           | Smaug123 wrote:
           | Nope, per Oliver Nash who worked on the thing:
           | https://news.ycombinator.com/item?id=41070372
        
         | clbrmbr wrote:
         | IIUC, a Gemini-based system could translate the natural
         | language questions into Lean, but in the blog post they don't
         | really commit to whether this was done just to generate
         | training data or was used in the competition.
        
         | golol wrote:
         | Formalization is in principle just a translation process and
         | should be a much simpler problem than the actual IMO problem.
         | Besides, they also trained a Gemini model which formalizes
         | natural language problems, and this is how they generated
         | training data for AlphaProof. I would therefore expect that
         | they could have also formalized the IMO problems with that
         | model and just did it manually because the point is not to
         | demonstrate formalizing but instead proof capabilities.
        
           | riku_iki wrote:
           | > Formalization is in principle just a translation process
           | and should be a much simpler problem than the actual IMO
           | problem
           | 
           | maybe not, because you need to connect many complicated
           | topics/terms/definitions together, and you don't have a way
            | to reliably verify if the formalized statement is correct.
           | 
            | They built an automatic formalization network in this
            | case, but didn't trust it and formalized the problems
            | manually.
        
           | pishpash wrote:
           | Yet the facts at hand are the opposite of what you say.
            | A reliable formalizer was a more difficult problem than
            | solving formalized IMO problems, because they have not
            | produced one.
        
             | fngjdflmdflg wrote:
             | That does not necessarily follow from the facts at hand.
             | For example they may have prioritized work on the proof
             | solver itself as they may feel that that is the more
             | important result. Alternatively if their goal is to build a
             | proof solver then building the formalizer would be useless
             | if they could not build the actual proof solver.
        
               | pishpash wrote:
               | A proof solver existed. They were improving the proof
               | solver explicitly by making the formalizer a part of the
               | training. Formalizer reliability is the key novelty. It
               | turns out it was only reliable enough for training. So
               | unless they made the problem statement at the outset that
               | "we'll only make the formalizer strong enough to train
               | but not use", I disagree with that assessment.
        
               | fngjdflmdflg wrote:
                | That's a good point. The formalizer was used to create
               | the training data for the proof solver, so they likely
               | worked on it more than if they just used it as a
               | preprocessing step during inference. It is still possible
               | that they worked on the formalizer until they got good
               | results from it enough to create good training data, and
               | then began training as soon as possible and did not spend
               | too much time trying to improve the formalizer. Depending
               | on how long the training was expected to take, perhaps
               | that is a reasonable assumption. Although I think I agree
               | more with your view now.
        
         | trotro wrote:
         | But formalization is the easy part for humans. I'm sure every
          | mathematician would be happy if the only thing required to
         | prove a result was to formalize it in Lean and feed it to the
         | AI to find the proof.
        
           | Davidzheng wrote:
           | Not sure every mathematician would be happy to do this... it
           | sounds much less pleasant than thinking. It's like saying
            | mathematicians would rather be programmers lol. It's a
            | significant, difficult problem which I believe should be
            | left completely to AI. Human formalization should die out.
        
       | c0l0 wrote:
       | That's great, but does that particular model also know
       | if/when/that it _does not_ know?
        
         | foota wrote:
         | Never?
         | 
         | Edit: To defend my response, the model definitely knows when it
         | hasn't yet found a correct response, but this is categorically
         | different from knowing that it does not know (and of course
         | monkeys and typewriters etc., can always find a proof
         | eventually if one exists).
        
         | ibash wrote:
         | Yes
         | 
         | > AlphaProof is a system that trains itself to prove
         | mathematical statements in the formal language Lean. ... Formal
         | languages offer the critical advantage that proofs involving
         | mathematical reasoning can be formally verified for
         | correctness.
        
         | diffeomorphism wrote:
         | While that was probably meant to be rhetorical, the answer
         | surprisingly seems to be an extremely strong "Yes, it does".
         | Exciting times.
        
       | PaulHoule wrote:
       | See https://en.wikipedia.org/wiki/Automated_Mathematician for an
       | early system that seems similar in some way.
        
         | golol wrote:
          | This Wikipedia page makes AM come across as kind of a nonsense
         | project whose outputs no one (besides the author) bothered to
         | decipher.
        
       | Davidzheng wrote:
       | HOLY SHIT. It's amazing
        
       | osti wrote:
        | So they weren't able to solve the combinatorics problems. I'm not
        | super well versed in competition math, but combinatorics problems
        | always seem to be the most interesting ones to me.
        
       | mupuff1234 wrote:
       | Can it / did it solve problems that weren't solved yet?
        
         | raincole wrote:
          | Technically yes. And it's easy. You can probably do it with
          | your PC's computational power.
          | 
          | The thing is that most unsolved math "problems" are unsolved
          | not because they're hard, but because they're not
          | interesting enough to even be discovered by humans.
        
           | mupuff1234 wrote:
           | Yeah, I mean "interesting" problems (perhaps not fields medal
           | interesting, but interesting enough)
        
       | Ericson2314 wrote:
       | The lede is a bit buried: they're using Lean!
       | 
       | This is important for more than Math problems. Making ML models
       | wrestle with proof systems is a good way to avoid bullshit in
       | general.
       | 
        | Hopefully more humans will write types in Lean and similar
        | systems as a much better way of writing prompts.
        
         | Ericson2314 wrote:
         | They're def gonna go after the Riemann hypothesis with this,
         | hehe.
        
           | nwoli wrote:
           | Guessing the context here is that the RH was recently
           | translated into Lean. Would be very cool if they threw their
           | compute on that
        
             | Smaug123 wrote:
              | I think you might be thinking of the recent project to
              | start formalizing Fermat's Last Theorem? The Riemann
              | hypothesis has been easy to state (given what's in
              | Mathlib) for years.
        
               | Davidzheng wrote:
                | Yeah lol, I don't think either is hard to formalize in
                | Lean.
        
               | raincole wrote:
                | They're not just formalizing Fermat's Last Theorem's
               | statement itself. They're formalizing the proof.
        
         | Smaug123 wrote:
         | And while AlphaProof is clearly _extremely_ impressive, it does
          | give the computer an advantage that a human doesn't have in
          | the IMO: nobody's going to be constructing Gröbner bases in
          | their head, but `polyrith` is just eight characters away. I
          | saw AlphaProof use `nlinarith`.
        
           | Ericson2314 wrote:
            | Hehe, well, we'll need to have a tool-assisted international
           | math Olympiad then.
        
             | ComplexSystems wrote:
             | This is the greatest idea ever. Why doesn't this exist?
        
               | panagathon wrote:
               | This does sound like a lot of fun. Since this AI only
               | reaches silver level, presumably such a contest could be
                | quite competitive, with it not yet being clear cut
                | whether the AI or a tool-assisted human would come out
                | on top, for any particular problem.
        
           | Davidzheng wrote:
           | Good. I want my AI to use all the advantages it has to
           | reinvent the landscape of mathematics
        
           | xrisk wrote:
           | Can you give some context on how using Lean benefits?
           | 
            | In my understanding, proofs are usually harder to
            | transcribe into Lean, which is why nobody _writes_ proofs
            | using Lean.
            | 
            | What is `nlinarith`?
        
             | llwu wrote:
             | nlinarith is a proof automation that attempts to finish a
             | proof using the simplex method to find a linear combination
             | of hypotheses and things that have already been proven, as
             | well as some quadratic terms of them.
             | 
             | Docs: https://leanprover-
             | community.github.io/mathlib4_docs/Mathlib...
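              | 
              | For a flavour, a standard toy example (not from the
              | paper): a*b <= (a^2 + b^2)/2 follows from
              | (a - b)^2 >= 0, and the hint hands nlinarith that
              | square term.
              | 
              |   import Mathlib.Tactic
              | 
              |   example (a b : ℝ) : a * b ≤ (a ^ 2 + b ^ 2) / 2 := by
              |     nlinarith [sq_nonneg (a - b)]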
        
           | empath75 wrote:
           | Don't think of lean as a tool that the ai is using (ie,
           | cheating), think of lean plus AlphaProof as a single system.
           | There's no reason to draw artificial boundaries around where
           | the AI is and where the tools that the AI is using are. Lean
           | itself is a traditional symbolic artificial intelligence
           | system.
           | 
            | People always want to knock generative AIs for not being able to
           | reason, and we've had automated systems that reason perfectly
           | well for decades, but for some reason that doesn't count as
           | AI to people.
        
           | sebzim4500 wrote:
            | The uses of `nlinarith` are very straightforward
            | manipulations of inequalities; they would be one or two
            | steps for a human too.
        
         | queuebert wrote:
         | That's amazing. I was just about to comment that hooking this
         | up to Lean [1] would be killer. This must be the way forward
         | for higher math, as proofs are getting so complicated that
         | almost no one understands all pieces of major proofs.
         | 
         | 1. https://lean-lang.org/
        
       | riku_iki wrote:
       | Example of proof from AlphaProof system:
       | https://storage.googleapis.com/deepmind-media/DeepMind.com/B...
        
       | golol wrote:
       | This is the real deal. AlphaGeometry solved a very limited set of
       | problems with a lot of brute force search. This is a much broader
       | method that I believe will have a great impact on the way we do
        | mathematics. They are really implementing a self-feeding
        | pipeline from natural language mathematics to formalized
        | mathematics where they can train both formalization and
        | proving. In principle this pipeline can also learn basic
        | theory building, like creating auxiliary definitions and
        | lemmas. I really think this is the holy grail of
        | proof-assistance and will allow us to formalize most
        | mathematics that we create very naturally. Humans will work
        | post-rigorously and let the machine assist with filling in the
        | details.
        
         | visarga wrote:
         | > a lot of brute force search
         | 
          | Don't dismiss search; it might be brute force, but it goes
          | beyond human level in Go and reaches silver at the IMO.
          | Search is also what powers evolution, which created us, also
          | by a lot of brute forcing, and is at the core of the
          | scientific method: (re)search.
        
           | Eridrus wrote:
           | Search is great, search works, but there was not a tonne to
           | learn from the AlphaGeometry paper unless you were
           | specifically interested in solving geometry problems.
        
           | kypro wrote:
           | My old AI professor used to say that every problem is a
           | search problem.
           | 
           | The issue is that to find solutions for useful problems
           | you're often searching through highly complex and often
           | infinite solution spaces.
        
             | visarga wrote:
             | For some problems validation is expensive. Like the
             | particle collider or space telescope, or testing the COVID
             | vaccine. It's actually validation that is the bottleneck in
             | search not ideation.
        
             | pishpash wrote:
             | There's no problem with search. The goal is to search most
             | efficiently.
        
               | deely3 wrote:
               | You mean that by improving search we can solve any
                | problem? What if the solution space is infinite? Even
                | if we make the search algo 10x100 more performant, the
                | solution space will still be infinite, no?
        
               | pishpash wrote:
               | Gradient descent is a search. Where does it say the
               | search space has to be small?
        
             | shkkmo wrote:
             | I would argue that no actually searchable solution space is
              | really infinite (if only because infinite Turing machines
             | can't exist). Finite solution spaces can get more than
             | large enough to be intractable.
        
               | smokel wrote:
               | What about N? Seems pretty infinite to me, unless with
               | "actually" you mean finite in time and space, which would
               | make your argument a tautology. Or am I missing
               | something?
        
           | Davidzheng wrote:
           | Yes and there's a lot of search here too. That's a key to the
           | approach
        
           | thomasahle wrote:
           | Also AlphaProof had to search for 60 hours for one of the IMO
           | problems it solved.
        
             | renonce wrote:
              | It's going to be significantly faster very soon; we have
              | seen how AlphaGo evolved into KataGo, which is many
              | orders of magnitude more compute-efficient.
        
             | randcraw wrote:
             | And I understand the upper time limit for each question was
             | 4.5 hours. So it solved one question almost immediately,
             | two well over the allotted time (60 hrs), and two not at
             | all. No medal for you, Grasshopper.
        
           | gowld wrote:
           | What makes solving IMO problems hard is usually the limits of
           | human memory, pattern-matching, and search, not creativity.
           | After all, these are problems that are _already_ solved, and
           | it is expected that many people can solve the problems in
            | about 1 hour's time.
            | 
            | That makes it, in principle, similar to or even easier
            | than a championship-level chess move, which often takes
            | more than 1 hour for a professional human (with more
            | training than an IMO high school student) to solve.
           | 
           | Another interesting concern is that when posing a problem to
           | humans, it's fine to pose an "easy" brute-forceable problem,
           | but humans, being slow brute-searchers, need to find more
           | clever solutions. But if you give such a problem to a
           | computer, it can trivialize it. So to test a computer, you
            | need to pose non-easily-brute-forceable problems, which
            | are harder for the computer than the others, but just as
            | difficult for the humans as the other problems are.
        
             | sebzim4500 wrote:
             | Ok but if you read the actual solutions they aren't a
             | bizarre mess of brute force.
             | 
             | They look like what a human would write if they were trying
             | to come up with a formal proof (albeit it does some steps
             | in a weird order).
        
         | fmap wrote:
         | Agreed, this is a big step forward. Geometry problems are in a
         | different class, since you can translate them into systems of
         | polynomial equations and use well known computer algebra
         | algorithms to solve them.
         | 
         | By contrast, this kind of open ended formalization is something
         | where progress used to be extremely slow and incremental. I
         | worked in an adjacent field 5 years ago and I cannot stress
         | enough that these results are simply out of reach for
         | traditional automated reasoning techniques.
         | 
         | Real automatic theorem proving is also useful for a lot more
         | than pure mathematics. For example, it's simple to write out an
         | axiomatic semantics for a small programming language in lean
         | and pose a question of the form "show that there exists a
         | program which satisfies this specification".
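          | 
          | A toy Lean 4 version of that shape, with plain functions
          | standing in for programs (a real setup would define the
          | language's syntax and semantics inductively):
          | 
          |   -- exhibit a witness, then prove it meets the spec
          |   example : ∃ f : Nat → Nat, ∀ n, f n = n + 1 :=
          |     ⟨fun n => n + 1, fun _ => rfl⟩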
         | 
         | If this approach scales it'll be far more important than any
         | other ML application that has come out in the last few years.
        
           | cubefox wrote:
           | > Agreed, this is a big step forward. Geometry problems are
           | in a different class, since you can translate them into
           | systems of polynomial equations and use well known computer
           | algebra algorithms to solve them.
           | 
           | The blog post indicates the opposite. The geometry problem in
           | the IMO problem set was solved by AlphaGeometry 2, which is
           | an LLM based on Google's Gemini. LLMs are considered
           | relatively general systems. But the other three solved
           | problems were proved by AlphaProof, which is a narrow RL
           | system that is literally based on AlphaZero, the Go and Chess
           | AI. Only its initial (bootstrapping) human training data
           | (proofs) were formalized and augmented by an LLM (Gemini).
        
         | EugeneOZ wrote:
         | No. It's like you are allowed to use search engines to find a
         | solution, nothing more than that.
        
         | geysersam wrote:
         | I imagine a system like this to be vastly more useful outside
         | the realm of mathematics research.
         | 
         | You don't need to be able to prove very hard problems to do
         | useful work. Proving just simple things is often enough. If I
         | ask a language model to complete a task, organize some entries
          | in a certain way, schedule this or that, or write code that
          | accomplishes X, the result is typically not trustworthy
         | directly. But if the system is able to translate parts of the
         | problem to logic and find a solution, that might make the
         | system much more reliable.
        
           | creata wrote:
            | But for it to be 100% trustworthy, _you'd_ have to express
           | correctness criteria for those simple tasks as formal
           | statements.
        
       | Smaug123 wrote:
       | So I am extremely hyped about this, but it's not clear to me how
       | much heavy lifting this sentence is doing:
       | 
       | > First, the problems were manually translated into formal
       | mathematical language for our systems to understand.
       | 
       | The non-geometry problems which were solved were all of the form
       | "Determine all X such that...", and the resulting theorem
       | statements are all of the form "We show that the set of all X is
       | {foo}". The downloadable solutions from
       | https://storage.googleapis.com/deepmind-media/DeepMind.com/B...
       | don't make it clear whether the set {foo} was decided by a human
       | during this translation step, or whether the computer found it. I
        | _want_ to believe that the computer found it, but I can't find
       | anything to confirm. Anyone know?
        
         | ocfnash wrote:
         | The computer did find the answers itself. I.e., it found "even
         | integers" for P1, "{1,1}" for P2, and "2" for P6. It then also
         | provided provided a Lean proof in each case.
        
           | nnarek wrote:
            | The formal definition of the first theorem already
            | contains the answer to the problem: "{a : R | ∃ k : Z,
            | Even k ∧ a = k}" (which means the set of reals that are
            | even integers). If they say that they have translated the
            | first problem into a formal definition, then it is very
            | interesting how they initially formalized the problem
            | without including the answer in it.
        
             | golol wrote:
              | I would expect that in the data they train AlphaProof on
              | they have some concept of a "vague problem", which could
              | just look like
              | 
              | {Formal description of the set in question} = ?
              | 
              | And then AlphaProof has to find candidate descriptions
              | of this set and prove a theorem that they are equal to
              | the above.
             | 
             | I doubt they would claim to solve the problem if they
             | provided half of the answer.
        
               | puttycat wrote:
               | > I doubt they would claim to solve the problem if they
               | provided half of the answer.
               | 
               | Stranger things have happened
        
             | riku_iki wrote:
              | It's not clear if the theorem is the actual input formal
              | definition, or if the formal definition was in a
              | different form.
        
             | cygaril wrote:
             | Come up with many possible answers, formalize them all, and
             | then try to prove or disprove each of them.
        
               | Davidzheng wrote:
               | This is probably partially what they did idk why it's
               | downvoted lol
        
             | Smaug123 wrote:
             | (You're talking to one of the people who was part of the
             | project, which is why I took @ocfnash's answer as
             | authoritative: they did not cheat.)
        
               | Xelynega wrote:
               | If they're talking to the people who are part of the
               | project I'd hope the answer would contain detail and not
               | expect to be taken as authoritative.
        
               | refulgentis wrote:
               | If they're talking to the public I'd hope that they don't
               | face an infinite loop of ever-more stringent requirements
               | for proof they didn't give it the answer in the question.
        
               | pishpash wrote:
               | That's a weird perception of talking to the public.
        
               | refulgentis wrote:
               | ?
        
             | pishpash wrote:
             | Exactly, a problem and its answer are just different ways
             | of describing the same object. Every step of a proof is a
             | transformation/translation of the same object. It would be
              | disingenuous to say that _some_ heavy lifting isn't done
             | in formalizing a problem but it seems that step is also
             | performed by a machine:
             | 
             | "We established a bridge between these two complementary
             | spheres by fine-tuning a Gemini model to automatically
             | translate natural language problem statements into formal
             | statements, creating a large library of formal problems of
             | varying difficulty."
             | 
              | I'm confused: is the formalization done by Gemini, or
              | "manually"? Which is it?
        
           | Davidzheng wrote:
           | Can you elaborate on how it makes guesses like this? Does it
           | do experiments before? Is it raw LLM? Is it feedback loop
           | based on partial progress?
        
             | Sharlin wrote:
             | "AlphaProof is a system that trains itself to prove
             | mathematical statements in the formal language Lean. It
             | couples a pre-trained language model with the AlphaZero
             | reinforcement learning algorithm, which previously taught
             | itself how to master the games of chess, shogi and Go."
        
               | JKCalhoun wrote:
                | Yeah, I'm not clear on the degree to which this system
                | and LLMs are related. Are they related? Or is
                | AlphaProof a complete tangent to ChatGPT and its ilk?
        
               | gowld wrote:
               | It's not an English LLM (Large Language Model).
               | 
               | It's a math Language Model. Not even sure it's a Large
               | Language Model. (Maybe shares a foundational model with
               | an English LLM; I don't know)
               | 
               | It learns mathematical statements, and generates new
               | mathematical statements, then uses search techniques to
               | continue. Similar to Alpha Go's neural network, what
               | makes it new and interesting is how the NN/LLM part makes
               | smart guesses that drastically prune the search tree,
               | before the brute-force search part.
               | 
                | (This is also what humans do to solve math problems.
                | But humans are really, really slow at brute-force
                | search, so we rely almost entirely on the NN pattern-
                | matching / analogy-making part.)
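                | 
                | Schematically, that NN-pruned search might look like
                | this toy Python sketch (made-up interfaces, not
                | DeepMind's actual code):
                | 
                |   import heapq
                | 
                |   def guided_search(root, policy, apply_tactic, is_proved):
                |       # policy(state) -> [(tactic, prior)]: the NN's
                |       # few, smart guesses for a next proof step.
                |       # apply_tactic(state, tactic) -> new state, or
                |       # None if the proof checker rejects the step.
                |       frontier, tick = [(-1.0, 0, root)], 0
                |       while frontier:
                |           _, _, state = heapq.heappop(frontier)
                |           if is_proved(state):
                |               return state
                |           for tactic, prior in policy(state):
                |               child = apply_tactic(state, tactic)
                |               if child is not None:
                |                   tick += 1
                |                   heapq.heappush(frontier,
                |                                  (-prior, tick, child))
                |       return None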
        
               | sebzim4500 wrote:
               | My reading of it is that it uses the same architecture as
               | one of the Gemini models but does not share any weights
                | with it (i.e. it's not just a fine-tune).
        
               | nextos wrote:
                | These kinds of LLMs are also very interesting for software
               | engineering. It's just a matter of replacing Lean with
               | something that is more oriented towards proving software
               | properties.
               | 
               | For example, write a formal specification of a function
                | in Dafny or Liquid Haskell and get the LLM to produce
               | code that is formally guaranteed to be correct. Logic-
               | based + probability-based ML.
               | 
               | All GOFAI ideas are still very useful.
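                | 
                | A toy of the idea in Lean 4 (my own example, standing
                | in for Dafny/Liquid Haskell): the spec is a theorem,
                | and the code only ships together with a machine-
                | checked proof.
                | 
                |   def double (n : Nat) : Nat := 2 * n
                | 
                |   -- Spec: the result is always even; the kernel
                |   -- accepts this proof, so the property is guaranteed.
                |   theorem double_even (n : Nat) :
                |       ∃ k, double n = 2 * k := ⟨n, rfl⟩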
        
         | dooglius wrote:
         | The linked page says
         | 
         | > While the problem statements were formalized into Lean by
         | hand, the answers within the problem statements were generated
         | and formalized by the agent.
         | 
         | However, it's unclear what initial format was given to the
         | agents that allowed this step
        
           | Smaug123 wrote:
           | FWIW, GPT-4o transcribed a screenshot of problem 1 perfectly
           | into LaTeX, so I don't think "munge the problem into machine-
           | readable form" is per se a difficult part of it these days
           | even if they did somehow take shortcuts (which it sounds like
           | they didn't).
        
             | pclmulqdq wrote:
             | Comparing "turn photo into LaTeX" to "translate theorems
             | into Lean" is like comparing a child's watercolor drawing
             | to the Mona Lisa.
        
               | Smaug123 wrote:
                | ... no? After the LaTeX output, I told stock GPT-4o that
               | the answer was "all even integers", and asked for the
               | statement in Lean. I had to make _two changes_ to its
               | output (both of which were compile-time errors, not
               | misformalisations), and it gave me the formalisation of
               | the difficult direction of the problem.
               | 
               | Both changes were trivial: it had one incorrect (but
               | unnecessary) import, and it used the syntax from Lean 3
               | instead of Lean 4 in one lambda definition. A system that
               | was trained harder on Lean would not make those mistakes.
               | 
               | The one _actual_ error it made was in not proposing that
               | the other direction of the  "if and only if" is required.
               | Again, I am quite confident that this formalisation
               | failure mode is not hard to solve in a system that is,
               | like, actually trained to do this.
               | 
               | Obviously formalising _problems that a working
                | mathematician solves_ is dramatically harder than
               | formalising IMO problems, and is presumably _way_ ahead
               | of the state of the art.
        
               | lmm wrote:
               | > I am quite confident that this formalisation failure
               | mode is not hard to solve in a system that is, like,
               | actually trained to do this.
               | 
               | Why?
        
               | sebzim4500 wrote:
                | I think that's exaggerating a bit. If you are familiar
               | with both Lean and LaTeX then I think transcribing these
               | problems to Lean only takes about twice as long as
               | transcribing them to LaTeX.
        
           | pclmulqdq wrote:
           | So if Lean was used to find the answers, where exactly is the
           | AI? A thin wrapper around Lean?
        
             | dooglius wrote:
             | Lean checks that the proof is valid, it didn't find the
             | proof.
        
             | xrisk wrote:
             | Lean is just the language, Presumably to drive the AI
             | towards the program ("the proof")
        
             | pishpash wrote:
             | The AI is the "solver network", which is the (directed)
             | search over solutions generated by Lean. The AI is in doing
             | an efficient search, I suppose.
             | 
             | I'm also waiting for my answer on the role of the Gemini
             | formalizer, but reading between the lines, it looks like it
             | was only used during training the "solver network", but not
             | used in solving the IMO problems. If so then the hyping is
             | greatly premature, as the hybrid formalizer/solver is the
             | whole novelty of this, but it's not good enough to use end-
             | to-end?
             | 
             | You cannot say AlphaProof learned enough to solve problems
             | if formalization made them easier to solve in the first
             | place! You can say that the "solver network" part learned
             | enough to solve formalized problems better than prior
             | training methods.
        
             | sdenton4 wrote:
             | Think of problems in NP - you can check the answer
             | efficiently, but finding the answer to check is the hard
             | part... This is basically what we're looking at here: The
             | proof checker can quickly evaluate correctness, but we need
             | something to produce the proof, and that's the hard part.
        
         | golol wrote:
         | > When presented with a problem, AlphaProof generates solution
         | candidates and then proves or disproves them by searching over
         | possible proof steps in Lean.
         | 
         | To me, this sounds like Alphaproof receives a "problem",
         | whatever that is (how do you formalize "determine all X such
         | that..."? One is asked to show that an abstract set is actually
         | some easily understandable set...). Then it generates candidate
          | theorems, presumably in Lean. I.e. the set is {n : P(n)} for
         | some formula or something. Then it searches for proofs.
         | 
          | I think if AlphaProof did not find {foo} but was given it,
          | then it would be very outrageous to claim that it solved the
          | problem.
         | 
         | I am also very hyped.
        
         | zerocrates wrote:
         | Interesting that they have a formalizer (used to create the
         | training data) but didn't use it here. Not reliable enough?
        
         | summerlight wrote:
          | To speak generally, that translation part is _much_ easier
          | than the proof part. The problem with automated translation
          | is that the result might be incorrect. This happens a lot
          | even when people apply formal methods by hand, so I guess
          | the researchers concluded that they'd have to audit every
          | single translation regardless of using an LLM or other
          | tools.
        
           | thomasahle wrote:
           | You'd think that, but Timothy Gowers (the famous
           | mathematician they worked with) wrote
           | (https://x.com/wtgowers/status/1816509817382735986)
           | 
           | > However, LLMs are not able to autoformalize reliably, so
           | they got them to autoformalize each problem many times. Some
           | of the formalizations were correct, but even the incorrect
           | ones were useful as training data, as often they were easier
           | problems.
           | 
            | So they didn't actually solve autoformalization, which is
            | why they still needed humans to translate the input IMO
            | 2024 problems.
           | 
           | The reason why formalization is harder than you think is that
           | there is no way to know if you got it right. You can use
           | Reinforcement Learning with proofs and have a clear signal
           | from the proof checker. We don't have a way to verify
           | formalizations the same way.
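            | 
            | In toy Python terms (hypothetical interfaces): the proof
            | checker is a perfect reward oracle, but no analogous
            | oracle exists for faithfulness of a formalization.
            | 
            |   def proof_reward(candidate_proof, lean_checker):
            |       # Ground-truth RL signal: Lean accepts or rejects.
            |       return 1.0 if lean_checker.accepts(candidate_proof) else 0.0
            | 
            |   def formalization_reward(formal_stmt, informal_text):
            |       # No checker can say whether formal_stmt faithfully
            |       # captures informal_text -- the missing signal.
            |       raise NotImplementedError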
        
             | llwu wrote:
             | > We don't have a way to verify formalizations the same
             | way.
             | 
             | While there is no perfect method, it is possible to use the
             | agent to determine if the statement is false, has
             | contradictory hypotheses, or a suspiciously short proof.
        
             | thrdbndndn wrote:
             | > However, LLMs are not able to autoformalize reliably, so
             | they got them to autoformalize each problem many times.
             | Some of the formalizations were correct, but even the
             | incorrect ones were useful as training data, as often they
             | were easier problems.
             | 
             | A small detail wasn't clear to me: for these incorrectly
             | formalized problems, how do they get the correct _answer_
             | as ground truth for training? Have a human to manually
             | solve them?
             | 
              | (In contrast, for problems actually from "a huge
              | database of IMO-type problems", they do already have
              | answers.)
        
               | adrianN wrote:
               | You write proofs in a formal language that can be machine
               | checked. If the checker is happy, the proof is correct
               | (unless there is a bug in the checker, but that is
               | unlikely).
        
               | raincole wrote:
               | They said the incorrectly formalized ones are usually
               | easier, so I assume they just hire humans to solve them
               | in the old way until the AI is smart enough to solve
               | these easier problems.
        
               | summerlight wrote:
               | > A small detail wasn't clear to me: for these
               | incorrectly formalized problems, how do they get the
               | correct answer as ground truth for training? Have a human
               | to manually solve them?
               | 
                | Formal proofs can be mechanically checked for
                | correctness; we just don't know the answer in advance.
                | Think of it as an extremely rigorous type system that
                | typically requires really long type annotations, where
                | the annotation itself is a complex program. So if
                | AlphaProof happens to generate a proof that passes
                | this checker, we know that it's correct.
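                | 
                | Concretely (a toy example of my own): in Lean the
                | statement is a type and a proof is a term of that
                | type, so "checking the proof" is just typechecking.
                | 
                |   theorem add_comm_nat (a b : Nat) : a + b = b + a :=
                |     Nat.add_comm a b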
        
           | ajross wrote:
           | > To speak generally, that translation part is much easier
           | than the proof part.
           | 
           | To you or me, sure. But I think the proof that it isn't for
           | this AI system is that they didn't do it. Asking a modern LLM
           | to "translate" something is a pretty solved problem, after
           | all. That argues strongly that what was happening here is not
           | a "translation" but something else, like a semantic
           | distillation.
           | 
            | If you ask an AI (or person) to decide the halting problem,
            | they can't. If you "translate" the question into a specific
            | example that does halt, they can run it and find out.
           | 
           | I'm suspicious, basically.
        
         | kurthr wrote:
        | As is often the case, creating a well-formed problem statement
        | takes as much knowledge (if not work) as finding the solution.
         | 
         | But seriously, if you can't ask the LLM to solve the right
         | question, you can't really expect it to give you the right
         | answer unless you're really lucky. "I'm sorry, but I think you
         | meant to ask a different question. You might want to check the
         | homework set again to be sure, but here's what I think you
         | really want."
        
         | gowld wrote:
         | The article says
         | 
         | > AlphaProof solved two algebra problems and one number theory
         | problem by _determining the answer_ and proving it was correct.
        
         | sebzim4500 wrote:
          | As someone with a maths degree who hasn't done this kind of
          | thing for half a decade, I was able to immediately guess the
          | solution to (1), but actually proving it is much harder.
        
       | balls187 wrote:
        | What was the total energy consumption required to achieve this
        | result (both training and running)?
        | 
        | And how much CO2 was released into earth's atmosphere?
        
         | ozten wrote:
         | Compared to all of the humans who compete at this level and
         | their inputs and outputs for the trailing 5 years.
        
           | balls187 wrote:
           | And?
           | 
           | The result is (likely) net energy consumption, resulting in
           | (likely) net CO2 emissions.
           | 
            | So, what did it cost us for this achievement in AI?
           | 
           | EDIT TO ADD: It's fair to think that such a presser should
           | not include answers to my questions. But, it's also fair to
           | want that level of transparency given we are dealing with
           | climate change.
        
             | regularfry wrote:
             | You're not wrong and _in general_ the conclusion is that AI
             | emits less CO2 than a human performing the same task.
             | Whether that 's true for this specific task is worth
             | asking, as is the question of how efficient such a process
             | can be made.
        
         | amelius wrote:
         | There's no energy limit in the IMO rules.
        
           | balls187 wrote:
           | The point isn't IMO rules.
           | 
           | It's that we are living in a period of time where there are
           | very real consequences of nearly a century of unchecked CO2
           | due to human industry.
           | 
           | And AI (like crypto before it) requires considerable energy
           | consumption. Because of which, I believe we (people who
           | believe in AI) need to hold companies accountable by very
           | transparently disclosing those energy costs.
        
             | amelius wrote:
             | What if at some point AI figures out a solution to climate
             | change?
        
               | Sivart13 wrote:
               | Why would we be any more likely to implement it, relative
               | to the solutions that humans have already figured out for
               | climate change?
        
               | sensanaty wrote:
               | Well we can be confident in the knowledge that techbros
               | might finally take the issue seriously if an AI tells
               | them to!
        
               | ks2048 wrote:
                | I know this is not an uncommon opinion in tech
                | circles, but I believe it's an insane thing to hang
                | humanity's hopes on. There's no reason to think AI
                | will be omnipotent.
        
               | balls187 wrote:
                | There is not, but there is plenty of historical
                | evidence that scientific and technological progress
                | has routinely addressed humanity's crisis du jour.
        
               | staunton wrote:
               | You mean new and incredibly effective ways to shape
               | public opinion? I guess it might. But then someone would
               | still have to use it for that purpose...
        
             | password54321 wrote:
             | > need to hold companies accountable by very transparently
             | disclosing those energy costs.
             | 
             | And if they do, then what? If it is "too high" do we delay
             | research because we need to keep the world how it is for
             | you? What about all the other problems others face that
             | could be solved by doubling down on compute for AI
             | research?
        
               | balls187 wrote:
               | > And if they do, then what? If it is "too high" do we
               | delay research because we need to keep the world how it
               | is for you?
               | 
                | First, it's keeping the world how it is for _all of us_,
               | not just me.
               | 
                | Second, to answer your question, I think that is a
                | decision for _all of us_ to weigh in on, but before we
                | can do that, we must be informed as best we can.
               | 
               | Do sacrifices have to be made for the greater good?
               | Absolutely. Do for-profit mega corporations get to make
               | those decisions without consent from the public? No.
        
             | xyzzy123 wrote:
             | They are 100% accountable for paying the bill from their
             | electricity provider.
        
       | majikaja wrote:
       | It would be nice if on the page they included detailed
       | descriptions of the proofs it came up with, more information
       | about the capabilities of the system and insights into the
       | training process...
       | 
       | If the data is synthetic and covers a limited class of problems I
       | would imagine what it's doing mostly reduces to some basic search
       | pattern heuristics which would be of more value to understand
       | than just being told it can solve a few problems in three days.
        
         | cygaril wrote:
         | Proofs are here: https://storage.googleapis.com/deepmind-
         | media/DeepMind.com/B...
        
           | majikaja wrote:
           | I found those, I just would have appreciated if the content
           | of the mathematics wasn't sidelined to a separate download as
           | if it's not important. I felt the explanation on the page was
           | shallow, as if they just want people to accept it's a black
           | box.
           | 
           | All I've learnt from this is that they used an unstated
           | amount of computational resources just to basically brute
           | force what a human already is capable of doing in far less
           | time.
        
             | Davidzheng wrote:
                | Very few humans can, even after years of training.
                | Please don't trivialize.
        
               | necovek wrote:
                | Very few humans go after this type of training. In my
               | "math talent" school (many of the Serbian/Yugoslavian
               | medal winners came from it), at most a dozen students
               | "trained" for this over 4 high school generations (500
               | students).
               | 
               | Problems are certainly not trivial, but humans are not
               | really putting all their effort into it either, and the
               | few that do train for it, on average medal 50% of the
               | time and get a silver or better 25% of the time (by
               | design) with much less time available to do the problems.
        
       | fancyfredbot wrote:
       | I'm seriously jealous of the people getting paid to work on this.
       | Sounds great fun and must be incredibly satisfying to move the
       | state of the art forward like that.
        
         | Mithriil wrote:
         | Best we can do then is keep ourselves up to date and give our
         | support!
        
         | bearjaws wrote:
          | C'mon, you're meant to be re-configuring 3,292,329 lines of
          | YML for K8s.
         | 
         | (/s)
        
           | psbp wrote:
           | It's funny that if I could describe my entire career, it
           | would probably be something similar to software
           | janitor/maintenance worker.
           | 
           | I guess I should have pursued a PhD when I was younger.
        
         | onemoresoop wrote:
         | You probably mean envious not jealous.
        
           | yalok wrote:
            | I'm learning something new today. In some other languages
            | these two are usually the same single word.
        
             | Vinnl wrote:
             | Huh. So I tried to look it up just now and I'm not sure if
             | I understand the difference. (To the extent that there is
             | one - apparently one _can_ mean the other, but I imagine
                | they're usually used as follows.)
             | 
             | It looks like "jealous" is more being afraid of losing
             | something you have (most commonly e.g. a spouse's
             | affection) to someone else, whereas "envious" is wanting
             | what someone else has?
        
               | onemoresoop wrote:
               | Correct.
        
             | onemoresoop wrote:
             | No worries. I learned this too a while ago (was also using
             | jealous instead of envious and vice versa myself). From my
              | understanding, "jealous" is used when you have something
              | that is threatened by some external factor, e.g. a
              | partner, or a friend having more fun with somebody else.
              | "Envious" is when you covet something that you do not
              | currently have but wish to, which in this case is
              | playing with exciting tech.
        
         | GuB-42 wrote:
         | I don't know about that. A lot of the work that should have
         | been very satisfying turned out to be boring as hell, if not
         | toxic, while at the same time, some apparently mundane stuff
         | turned out to be really exciting.
         | 
         | I found the work environment to be more important than the
         | subject when it comes to work satisfaction. If you are working
         | on a world changing subject with a team of assholes, you are
         | going to have a bad time, some people really have a skill for
         | sucking the fun out of everything, and office politics are
         | everywhere, especially on world changing subjects.
         | 
         | On the other hand, you can have a most boring subject, say
         | pushing customer data to a database, and have the time of your
         | life: friendly team, well designed architecture, time for
         | experimentation and sharing of knowledge, etc... I have come to
         | appreciate the beauty of a simple thing that just works. It is
         | so rare, maybe even more rare than scientific breakthroughs.
         | 
         | Now, you can also have an awesome work environment _and_ an
         | awesome subject, it is like hitting the jackpot... and a good
         | reason to be envious.
        
       | lolinder wrote:
       | This is a fun result for AI, but a _very_ disingenuous way to
       | market it.
       | 
       | IMO contestants aren't allowed to bring in paper tables, much
       | less a whole theorem prover. They're given two 4.5 hour sessions
       | (9 hours total) to solve all the problems with nothing but
       | pencils, rulers, and compasses [0].
       | 
        | This model, meanwhile, was wired up to a theorem prover and took
       | three solid days to solve the problems. The article is extremely
       | light on details, but I'm assuming that most of that time was
       | guess-and-check: feed the theorem prover a possible answer, get
       | feedback, adjust accordingly.
       | 
       | If the IMO contestants were given a theorem prover and three days
       | (even counting breaks for sleeping and eating!), how would
       | AlphaProof have ranked?
       | 
       | Don't get me wrong, this is a fun project and an exciting result,
       | but their comparison to silver medalists at the IMO is just
       | feeding into the excessive hype around AI, not accurately
       | representing its current state relative to humanity.
       | 
       | [0] 5.1 and 5.4 in the regulations: https://www.imo-
       | official.org/documents/RegulationsIMO.pdf
        
         | gjm11 wrote:
         | Working mathematicians mostly don't use theorem provers in
         | their work, and find that when they do they go significantly
         | more slowly (with of course the compensating advantage of
         | guaranteeing no mistakes in the final result).
         | 
         | A theorem prover is probably more useful for the typical IMO
         | problem than for the typical real research problem, but even so
         | I'd guess that even with a reasonable amount of training most
         | IMO contestants would not do much better for having access to a
         | theorem prover.
         | 
         | Having three days would be a bigger deal, for sure. (But from
         | "computers can't do this" to "computers can do this, but it
         | takes days" is generally a much bigger step than from
         | "computers can do this, but it takes days" to "computers can do
         | this in seconds".)
        
         | golol wrote:
         | The point is not to compare AI and humans, it is to compare AI
         | and IMO-level math problems. It's not for sport.
        
           | lolinder wrote:
           | They're literally comparing AI to human IMO contestants.
           | "DeepProof solves 4/6 IMO problems correctly" would be the
           | non-comparison version of this press release and would give a
           | better sense for how it's actually doing.
        
             | golol wrote:
             | "Solving IMO problems at Silver-Medal level" is pretty much
             | equivalent to solving something like 4/6 problems. It is
             | only a disingenious comparison if you want to read it as a
             | comparison. I mean yea, many people will, but I don't care
             | anout them. People who are technically interested in this
             | know that the point is not to have a competition of AI with
             | humans.
        
               | lolinder wrote:
                | > but I don't care about them
               | 
               | It's great that you feel safe being so aloof, but I
               | believe we have a responsibility in tech to turn down the
               | AI hype valve.
               | 
               | The NYT is currently running a piece with the headline
               | "Move Over, Mathematicians, Here Comes AlphaProof".
               | People see that, and people react, and we in tech are not
               | helping matters by carelessly making false comparisons.
        
               | golol wrote:
               | Why? Why is hype bad? What actual harm does it cause?
               | 
                | Also, the headline is fair, as I do believe that
                | AlphaProof demonstrates an approach to mathematics
                | that will indeed invade mathematicians' workspaces.
                | And I say that as a mathematician.
        
               | Davidzheng wrote:
                | For sure. I feel like mathematicians are not paying
                | attention (maybe deliberately): we have a real shot at
                | solving some incredibly hard open problems in the
                | next few years.
        
               | visarga wrote:
               | I think search-based AI is on a different level than
               | imitation models like GPT. This is not a hallucinating
               | model, and it could potentially discover things that are
               | not written in any books.
               | 
               | Search is amazing. Protein folding searches for the
               | lowest energy configuration. Evolution searches for
               | ecological fit. Culture searches for progress, and
               | science for understanding. Even placing a foot on the
                | ground searches for dynamic equilibrium. Training an
                | ML model searches for the best parameters to fit a
                | dataset.
               | Search is universal. Supervised learning is just
               | imitative, search is extrapolative.
        
           | kenjackson wrote:
            | Exactly. The point is whether we can eventually get AI to
            | solve problems which we as humans can't, not whether we
            | can win the IMO with a computer.
        
         | blackbear_ wrote:
         | And why aren't you complaining that human participants could
         | train and study for thousands of hours before attempting the
          | problems? And that the training materials they used were
          | themselves created and perfected by hundreds of other
          | people, after having themselves spent countless hours
          | studying?
        
           | lolinder wrote:
           | Because so did the AI?
        
             | blackbear_ wrote:
             | That's exactly my point? Both had to learn?
        
         | Davidzheng wrote:
          | I can tell you that, as someone who could have gotten bronze
          | (I was too weak for the team) and is now a math PhD, I would
          | most likely not have scored as well as AlphaProof in three
          | days. In most problems, either you find an idea soon or it
          | can take much, much longer. It's just not a matter of steady
          | work and constant progress.
        
           | knotthebest wrote:
           | Skill issue tbh
        
             | Davidzheng wrote:
             | Agreed :'(
        
       | StefanBatory wrote:
       | Wow, that's absolutely impressive to hear!
       | 
        | Also it's making me think that in 5-10 years almost all tasks
        | involving computer scientists or mathematicians will be done
        | by AI. Perhaps people going into trades had a point.
        
         | visarga wrote:
         | Everything that allows for cheap validation is going that way.
         | Math, code, or things we can simulate precisely. LLM ideation +
         | Validation is a powerful combination.
        
       | quirino wrote:
       | I honestly expected the IOI (International Olympiad of
       | Informatics) to be "beaten" much earlier than the IMO. There's
        | AlphaCode, of course, but as of the latest update I don't
        | think it was quite at "silver medal" level. And available
        | LLMs are probably not even at "honourable mention" level.
       | 
       | I wonder if some class of problems will emerge that human
       | competitors are able to solve but are particularly tricky for
       | machines. And which characteristics these problems will have
       | (e.g. they'll require some sort of intuition or visualization
       | that is not easily formalized).
       | 
        | Given how much of a dent LLMs are already making on beginner
       | competitions (AtCoder recently banned using them on ABC rounds
       | [1]), I can't help but think that soon these competitions will be
       | very different.
       | 
       | [1] https://info.atcoder.jp/entry/llm-abc-rules-en
        
         | oXman038 wrote:
          | IOI problems are closer to IMO combinatorics problems than
          | to the other IMO problem types. That might be the reason for
          | the delay. I personally liked only the combinatorics
          | problems in the IMO; that's why I dropped the math track and
          | went to the IOI instead.
          | 
          | I feel the reason combinatorics is harder for AI models is
          | the same reason LLMs are not great at reasoning about
          | anything out of distribution. LLMs are good pattern
          | recognizers, and fascinating at this point. But simple tasks
          | like counting intersections in Venn diagrams require more
          | strategy and less pattern recognition. Pure NN-based models
          | seem unlikely to be enough to solve these problems. AI
          | agents and RL are promising.
          | 
          | I don't know anything about Lean, but I am curious whether
          | proofs of combinatorial problems can be represented as well
          | as number theory or algebra. If combinatorial solutions are
          | always closer to natural language, the failure of LLMs is
          | expected, or at least we can assume it might take more time
          | to improve. I am assuming here that solutions of
          | combinatorial problems in the IMO are more natural-language
          | oriented and rely more on common sense/informal logic
          | compared to geometry or number theory problems.
        
           | Davidzheng wrote:
            | Are you convinced there's a "reason" AI today is worse at
            | combinatorics? I don't see enough evidence that it's not
            | an accident.
        
       | robinhouston wrote:
       | Some more context is provided by Tim Gowers on Twitter [1].
       | 
       | Since I think you need an account to read threads now, here's a
       | transcript:
       | 
       | Google DeepMind have produced a program that in a certain sense
        | has achieved a silver-medal performance at this year's
       | International Mathematical Olympiad.
       | 
       | It did this by solving four of the six problems completely, which
       | got it 28 points out of a possible total of 42. I'm not quite
       | sure, but I think that put it ahead of all but around 60
       | competitors.
       | 
       | However, that statement needs a bit of qualifying.
       | 
       | The main qualification is that the program needed a lot longer
       | than the human competitors -- for some of the problems over 60
       | hours -- and of course much faster processing speed than the poor
       | old human brain.
       | 
       | If the human competitors had been allowed that sort of time per
       | problem they would undoubtedly have scored higher.
       | 
       | Nevertheless, (i) this is well beyond what automatic theorem
       | provers could do before, and (ii) these times are likely to come
       | down as efficiency gains are made.
       | 
       | Another qualification is that the problems were manually
       | translated into the proof assistant Lean, and only then did the
       | program get to work. But the essential mathematics was done by
       | the program: just the autoformalization part was done by humans.
       | 
       | As with AlphaGo, the program learnt to do what it did by teaching
       | itself. But for that it needed a big collection of problems to
       | work on. They achieved that in an interesting way: they took a
       | huge database of IMO-type problems and got a large language model
       | to formalize them.
       | 
       | However, LLMs are not able to autoformalize reliably, so they got
       | them to autoformalize each problem many times. Some of the
       | formalizations were correct, but even the incorrect ones were
       | useful as training data, as often they were easier problems.
       | 
       | It's not clear what the implications of this are for mathematical
       | research. Since the method used was very general, there would
       | seem to be no obvious obstacle to adapting it to other
       | mathematical domains, apart perhaps from insufficient data.
       | 
       | So we might be close to having a program that would enable
       | mathematicians to get answers to a wide range of questions,
       | provided those questions weren't _too_ difficult -- the kind of
       | thing one can do in a couple of hours.
       | 
       | That would be massively useful as a research tool, even if it
       | wasn't itself capable of solving open problems.
       | 
       | Are we close to the point where mathematicians are redundant?
       | It's hard to say. I would guess that we're still a breakthrough
       | or two short of that.
       | 
       | It will be interesting to see how the time the program takes
       | scales as the difficulty of the problems it solves increases. If
       | it scales with a similar ratio to that of a human mathematician,
       | then we might have to get worried.
       | 
       | But if the function human time taken --> computer time taken
       | grows a lot faster than linearly, then more AI work will be
       | needed.
       | 
       | The fact that the program takes as long as it does suggests that
       | it hasn't "solved mathematics".
       | 
       | However, what it does is way beyond what a pure brute-force
       | search would be capable of, so there is clearly something
       | interesting going on when it operates. We'll all have to watch
       | this space.
       | 
       | 1. https://x.com/wtgowers/status/1816509803407040909?s=46
        
         | visarga wrote:
         | > If the human competitors had been allowed that sort of time
         | per problem they would undoubtedly have scored higher.
         | 
          | Or, if AlphaProof used more compute, they could have slashed
          | that time to 1/10 or less. It's arbitrary as long as we
          | don't define what compute the AI should be entitled to use
          | here.
        
       | skywhopper wrote:
       | Except it didn't. The problem statements were hand-encoded into a
       | formal language by human experts, and even then only one problem
       | was actually solved within the time limit. So, claiming the work
       | was "silver medal" quality is outright fraudulent.
        
         | noud wrote:
         | I had exactly the same feeling when reading this blog. Sure,
         | the techniques used to find the solutions are really
          | interesting. But they claim more than they achieved. The
          | problem statements are not available in Lean, and the time
          | limit is 2 x 4.5 hours, not 3 days.
         | 
         | The article claims they have another model that can work
         | without formal languages, and that it looks very promising. But
         | they don't mention how well that model performed. Would that
         | model also perform at silver medal level?
         | 
          | Also note that if the problems are provided in a formal
          | language, you can always find the solution in a finite
          | amount of time (provided a solution exists). You can brute-
          | force over all possible proofs until you find one that
          | proves the statement. This may take a very long time, but it
          | will find the solutions eventually (see the toy sketch
          | below). You would always solve all the problems and win the
          | IMO at gold medal level. AlphaProof seems to do something
          | similar, but takes smarter decisions about which candidates
          | to try and which ones to skip. What would be the reason they
          | don't achieve gold?
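          | 
          | That brute force, as a toy Python sketch (made-up `checker`
          | interface); note it only halts if a proof exists:
          | 
          |   def brute_force(statement, checker, enumerate_proofs):
          |       # Yield every candidate proof, e.g. in order of
          |       # increasing length; return the first valid one.
          |       for candidate in enumerate_proofs():
          |           if checker.valid(candidate, statement):
          |               return candidate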
        
       | refulgentis wrote:
        | Goalposts at the moon, FUD at "but what if it's obviously
        | fake?".
        | 
        | Real, exact quotes from the top comments at 1 PM EST.
       | 
       | "I want to believe that the computer found it, but I can't find
       | anything to confirm."
       | 
       | "Curing cancer will require new ideas"
       | 
       | "Maybe they used 10% of all of GCP [Google compute]"
        
       | brap wrote:
       | Are all of these specialized models available for use? Like, does
       | it have an API?
       | 
       | I wonder because on one hand they seem very impressive and
       | groundbreaking, on the other it's hard to imagine why more than a
       | handful of researchers would use them
        
         | creata wrote:
         | > it's hard to imagine why more than a handful of researchers
         | would use them
         | 
         | If you could automatically prove that your concurrency protocol
         | is safe, or that your C program has no memory management
         | mistakes, or that your algorithm always produces the same
         | results as a simpler, more obviously correct but less optimized
         | algorithm, I think that would be a huge benefit for many
         | programmers.
        
       | gallerdude wrote:
       | Sometimes I wonder if in 100 years, it's going to be surprising
       | to people that computers had a use before AI...
        
         | onemoresoop wrote:
          | If AI stays in computer form, though...
        
         | necovek wrote:
         | AI is simply another form of what we've been doing since the
         | dawn of computers: expressing real world problems in the form
         | of computations.
         | 
         | While there are certainly some huge jumps in compute power,
         | theory of data transformation and availability of data to
         | transform, it would surprise me if computers in a 100 years do
         | not still rely on a combination of well-defined and well-
         | understood algorithms and AI-inspired tools that do the same
         | thing but on a much bigger scale.
         | 
         | If not for any other reason, then because there are so many
         | things where you can easily produce a great, always correct
         | result simply by doing very precise, obvious and simple
         | computation.
         | 
         | We've had computers and digital devices for a long while now,
         | yet we still rely heavily on mechanical contraptions. Sure, we
          | improve them with computers (e.g. think brushless motors),
          | but I don't think anyone today would be surprised at how
          | these same devices (hair dryers, lawn mowers, internal
          | combustion engines...) were designed before computers.
        
       | Jun8 wrote:
        | Tangentially: I found it fascinating to follow along with the
        | solution to Problem 6: https://youtu.be/7h3gJfWnDoc
        | ("aquaesulian" is a nod to Aquae Sulis, the ancient name of
        | Bath). There's no advanced math and each step is quite simple;
        | I'd guess it's at about an 8th-grade level.
       | 
       | Note that the 6th question is generally the hardest ("final
       | boss") and many top performers couldn't solve it.
       | 
        | I don't know what Lean is or how to read the AI's proofs, but
        | an AI system that can explain such a question on par with the
        | YouTuber above would be fantastic!
        
       | pmcf wrote:
       | I read this as "In My Opinion" and really thought this about AI
       | dealing with opinionated people. Nope. HN is still safe. For
       | now...
        
       | lo_fye wrote:
       | Remember when people thought computers would never be able to
       | beat a human Grand Master at chess? Ohhh, pre-2000 life, how I
       | miss thee.
        
       | AyyEye wrote:
        | Parlor tricks. Wake me up when AI can reliably identify which
        | number is circled at the level of my two year old.
        
       | michael_nielsen wrote:
       | A good brief overview here from Tim Gowers (a Fields Medallist,
       | who participated in the effort), explaining and contextualizing
       | some of the main caveats:
       | https://x.com/wtgowers/status/1816509803407040909
        
       | HL33tibCe7 wrote:
       | This is kind of an ideal use-case for AI, because we can say with
       | absolute certainty whether their solution is correct, completely
       | eliminating the problem of hallucination.
        
       | cs702 wrote:
       | In 1997, machines defeated a World Chess Champion for the first
       | time, using brute-force "dumb search." Critics noted that while
       | "dumb search" worked for chess, it might not necessarily be a
       | general strategy applicable to other cognitive tasks.[a]
       | 
       | In 2016, machines defeated a World Go Champion for the first
       | time, using a clever form of "dumb search" that leverages
       | compute, DNNs, reinforcement learning (RL), and self-play.
       | Critics noted that while this fancy form of "dumb search" worked
       | for Go, it might not necessarily be a general strategy applicable
       | to other cognitive tasks.[a]
       | 
       | In 2024, machines solved insanely hard math problems at the
       | Silver Medal level in an International Math Olympiad for the
       | first time, using a clever form of "dumb search" that leverages
       | compute, DNNs, RL, and a formal language. Perhaps "dumb search"
       | over cleverly pruned spaces isn't as dumb as the critics would
       | like it to be?
       | 
       | ---
       | 
       | [a] http://www.incompleteideas.net/IncIdeas/BitterLesson.html
        
         | klyrs wrote:
         | A brute force search has perfect knowledge. Calling it "dumb"
         | encourages bad analogies -- it's "dumb" because it doesn't
         | require advanced reasoning. It's also "genius" because it
         | always gets the right answer eventually. It's hugely expensive
         | to run.
         | 
         | And you keep shifting the goalpost on what's called "dumb"
         | here.
        
           | adrianN wrote:
           | You might have missed the point.
        
             | bdjsiqoocwk wrote:
             | Tell us what the point was, I think a lot of us are missing
             | it.
        
               | BiteCode_dev wrote:
               | Moving the goal post of dumb search.
        
           | jmalicki wrote:
           | None of the above are brute force search.
        
         | TheDudeMan wrote:
         | So all forms of search are dumb in your view?
        
           | outlore wrote:
           | I think OP is arguing the opposite. In other words, critics
           | of AI innovations call them "dumb search". The goal post
           | keeps moving.
        
             | cs702 wrote:
             | _Exactly_. They are not dumb search!
        
         | sobellian wrote:
         | From whence are you quoting the phrase "dumb search?"
        
         | fspeech wrote:
         | By "dumb" I assume you meant brute-force. Search, as opposed to
          | extrapolation, is what actually produces surprise or creative
         | results, whether it happens in a person's head or on a
         | computer. The issue is to produce the heuristics that can let
         | one push back the combinatorial explosion.
        
         | tux3 wrote:
         | The success in Go was very much not dumb search. Where dumb
         | search had failed to achieve the level of even a very weak
         | player, the neural net's "intuition" about good moves without
         | any search was already very strong. Only the combination was
         | superhuman, and that was anything but the dumb search that had
         | been tried before.
         | 
         | Today's announcement is also not about proving Lean theorems by
         | "dumb search". The success is about search + neural networks.
         | 
         | You're attacking critics for criticizing the solution that has
         | failed, while confusing it for the solution that works to this
         | day.
        
           | cs702 wrote:
           | That's in fact precisely my point:
           | 
           | Clever forms of "dumb search" that leverage compute, DNNs,
           | RL, self-play, and/or formal languages are _not_ dumb at all.
           | 
           | I put the words "dumb search" in quotes precisely because I
           | think critics who dismiss AI progress as such are missing the
           | point.
           | 
           | We're not in disagreement :-)
        
             | tux3 wrote:
             | >Clever forms of "dumb search" that leverage compute, DNNs,
             | RL, self-play, and formal languages are not dumb at all
             | 
             | Right. My point is that you're attacking a position that no
             | real critic holds. Of course we're in agreement then!
             | 
             | "Clever forms of dumb search are not dumb" feels a little
             | like kicking down open doors. We were always going to
             | agree.
        
               | cs702 wrote:
               | _> My point is that you 're attacking a position that no
               | real critic holds_
               | 
               | I'm not so sure. I wrote my comment after seeing quite a
               | few comments on other threads here that read like fancy
               | versions of "this is just brute-force search over
               | cleverly pruned trees." Search the other threads here,
               | and you'll see what I mean.
        
         | Davidzheng wrote:
          | By their nature, hard math problems do not have fast
          | algorithmic solutions--you can only solve them by search,
          | or clever search. Mathematicians have heuristics that help
          | us decide intuitively what's going to work and what can
          | make progress. But in the end--there is only search.
        
         | netcan wrote:
         | Well written.
         | 
         | It kinda had to be this way, I think. There's a something from
         | nothing problem. Douglas Adams brilliantly starts at this
         | point.
         | 
         | We don't understand something from nothing. We don't even have
         | the language to describe it. Concepts like "complexity" are
         | frustratingly resistant to formalization.
         | 
         | "There is no free will." Has recently resurged as a
         | philosophical position... like it did in response to Newton's
         | mechanics.
         | 
         | Matter from void. Life from minerals. Consciousness from
         | electrons. Free will from deterministic components. Smart
         | reasoning & intelligent rationalisation from dumb search,
         | computation, DNNs and such.
         | 
         | I don't think this debate was supposed to be ended by anything
         | short of empirical demonstration.
         | 
          | Endnote: Deep Blue's victory over Garry was a bunch of
          | preprogrammed bulls--t. Rematch!
        
         | creer wrote:
         | If one of the lessons of LLMs was that much of "common sense"
         | is covered in the written language corpus - that is, perhaps,
         | basic human intelligence is covered by language.
         | 
         | Then, should we expect less with mathematics where written
         | language is the normal way knowledge is propagated, and where
         | formal proofs are wanted? An important distinction here is the
         | coupling of search (not LLM for this one), a formal math
         | language, and theorem proving. Math intelligence may not be
         | merely the math written corpus, but adding the formal language
         | and theorem proving sounds pretty powerful.
         | 
         | All this still lacks self-directed goals. An intention. For now
         | that's taken care of by the human asking questions.
        
           | imtringued wrote:
           | >that is, perhaps, basic human intelligence is covered by
           | language.
           | 
           | It isn't. That's why LLMs are still terrible at basic
           | reasoning tasks. The dataset doesn't contain the human
           | intelligence, just the end results. Nobody is training their
           | LLMs on brain scans.
        
         | mikeknoop wrote:
         | High efficiency "search" is necessary to reach AGI. For
          | example, humans don't search millions of potential answers to
         | beat ARC Prize puzzles. Instead, humans use our core experience
         | to shrink the search space "intuitively" and deterministically
         | check only a handful of ideas. I think deep-learning guided
         | search is an incredibly promising research direction.
        
       | dmitrygr wrote:
       | > First, the problems were manually translated into formal
       | mathematical language
       | 
       | That is more than half the work of solving them. Headline should
       | read "AI solves the simple part of each IMA problem at silver
       | medal level"
        
       | jerb wrote:
       | Is the score of 28 comparable to the score of 29 here?
       | https://www.kaggle.com/competitions/ai-mathematical-olympiad...
        
         | Davidzheng wrote:
         | No. I would say it is more impressive than 50/50 there.
          | (Source: I used to do math comps back in the day; sorry,
          | it's not a great source.)
        
         | gus_massa wrote:
          | IIUC the American Math Olympiad has 3 rounds. Winning the
          | last one almost guarantees a gold medal.
          | 
          | The link you posted has problems with a difficulty between
          | the first and second rounds, which are much easier.
          | 
          | I took a quick look at the recent lists of problems in the
          | first and second rounds. I expect this new AI to get a solid
          | 50/50 points on this test.
        
       | machiaweliczny wrote:
       | Good, now use DiffusER on algebra somehow please
        
       | gowld wrote:
       | Why is it so hard to make an AI that can translate an informally
       | specified math problem (and Geometry isn't even so informal) into
       | a formal representation?
        
       | nybsjytm wrote:
       | To what extent is the training and structure of AlphaProof
       | tailored specifically to IMO-type problems, which typically have
       | short solutions using combinations of a small handful of specific
       | techniques?
       | 
       | (It's not my main point, but it's always worth remembering - even
       | aside from any AI context - that many top mathematicians can't do
       | IMO-type problems, and many top IMO medalists turn out to be
       | unable to solve actual problems in research mathematics. IMO
       | problems are generally regarded as somewhat niche.)
        
         | Davidzheng wrote:
          | The last statement is largely correct (though idk about the
          | IMO medalists being unable to solve actual problems--most
          | mathematicians can't solve most open problems either). But I
          | kind of disagree with the assessment of IMO problems--the
          | search space is huge; if it were as you say, it would be easy
          | to search.
        
           | nybsjytm wrote:
           | No, I don't mean that the search space is small. I just mean
           | that there are special techniques which are highly relevant
           | for IMO-type problems. It'd be interesting to know how
           | important that knowledge was for the design and training of
           | AlphaProof.
           | 
           | In other words, how does AlphaProof fare on mathematical
           | problems which aren't in the IMO style? (As such exceptions
           | comprise most mathematical problems)
        
             | Davidzheng wrote:
              | Probably less well? They rely heavily on the dataset of
              | existing problems.
        
       | gowld wrote:
       | This shows a major gap in AI.
       | 
       | The proofs of these problems aren't interesting. They were
       | already known before the AI started work.
       | 
       | What's interesting is _how_ the AI found the proof. The only
       | answer we have is  "slurped data into a neural network, matched
       | patterns, and did some brute search".
       | 
       | What were the ideas it brainstormed? What were the dead-end
       | paths? What were the "activations" where the problem seemed
       | similar to a certain piece of input, which led to a guess of a
       | step in the solution?
        
       | atum47 wrote:
       | Oh, the title was changed to international math Olympiad. I was
       | reading IMO as in my opinion, haha
        
       | thrance wrote:
        | Theorem proving is a single-player game with an insanely big
        | search space; I always thought it would be solved long before AGI.
       | 
       | IMHO, the largest contributors to AlphaProof were the people
       | behind Lean and Mathlib, who took the daunting task of
       | formalizing the _entirety_ of mathematics to themselves.
       | 
        | This lack of formalization in math papers is what killed any
        | attempt at automation: AI researchers had to wrestle with the
        | human element of figuring out each author's own notations,
        | implicit knowledge, skipped proof steps...
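        | 
        | As a taste of what that formalization looks like - a minimal
        | Lean 4 sketch (the theorem name is mine;
        | Nat.exists_infinite_primes is an actual Mathlib lemma):
        | 
        |     import Mathlib
        | 
        |     -- "There are infinitely many primes," stated formally:
        |     -- for every n there is a prime p with n ≤ p.
        |     theorem primes_unbounded (n : ℕ) : ∃ p, n ≤ p ∧ p.Prime :=
        |       Nat.exists_infinite_primes n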
        
         | camjw wrote:
          | > Theorem proving is a single-player game with an insanely big
          | search space; I always thought it would be solved long before
          | AGI.
         | 
          | This seems so weird to me - AGI is undefined as a term imo, but
          | why would you expect "producing something generally
          | intelligent" (i.e. median human-level intelligence) to be
          | significantly harder than "this thing is better than Terence
          | Tao at maths"?
        
           | thrance wrote:
           | My intuition tells me we humans are generally very bad at
           | math. Proving a theorem, in an ideal way, mostly involves
           | going from point A to point B in the space of all proofs,
            | using previous results as stepping stones. This isn't
            | particularly a "hard" problem for computers, which are able
            | to navigate the search spaces of various games (chess,
            | go...) much more efficiently than us.
           | 
            | On the other hand, navigating the real world mostly consists
            | of employing a ton of heuristics we are still kind of
            | clueless about.
           | 
           | At the end of the day, we won't know before we get there, but
           | I think my reasons are compelling enough to think what I
           | think.
        
             | tim-kt wrote:
             | I don't think that computers have an advantage because they
             | can navigate search spaces efficiently. The search space
             | for difficult theorems is gigantic. Proving them often
             | relies on a combination of experience with rigorous
             | mathematics and very good intuition [1] as well as many,
             | many steps. One example is the classification of all finite
             | simple groups [2], which took about 200 years and a lot of
             | small steps. I guess maybe brute forcing for 200 years with
             | the technology available today might work. But I'm
             | sceptical and sort of hope that I won't be out of a job in
             | 10 years. I'm certainly curious about the current
             | development.
             | 
             | [1] https://terrytao.wordpress.com/career-advice/theres-
             | more-to-...
             | 
             | [2] https://en.m.wikipedia.org/wiki/Classification_of_finit
             | e_sim...
        
               | thrance wrote:
               | Oh for sure, when I say "soon" it's only relative to AGI.
               | 
                | What I meant to convey is that theorem proving at least
                | is a well-defined problem, and computers have had some
                | successes in similar-ish search problems before.
               | 
               | Also I don't think pure brute-force was ever used to
               | solve any kind of interesting problem.
               | 
               | Chess engines make use of alpha-beta pruning plus some
               | empirical heuristics people came up with over the years.
                | Go engines use Monte Carlo Tree Search with deep
                | learning models for node evaluation. Theorem proving,
                | when it is solved, will certainly use some kind of
                | neural network.
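                | 
                | A bare-bones alpha-beta sketch (children and evaluate
                | are hypothetical: a move generator and a leaf
                | heuristic):
                | 
                |     def alphabeta(node, depth, a, b, maxing):
                |         kids = children(node)
                |         if depth == 0 or not kids:
                |             return evaluate(node)
                |         if maxing:
                |             v = float("-inf")
                |             for kid in kids:
                |                 v = max(v, alphabeta(kid, depth - 1,
                |                                      a, b, False))
                |                 a = max(a, v)
                |                 if a >= b:
                |                     break  # prune: opponent avoids this
                |             return v
                |         v = float("inf")
                |         for kid in kids:
                |             v = min(v, alphabeta(kid, depth - 1,
                |                                  a, b, True))
                |             b = min(b, v)
                |             if a >= b:
                |                 break      # prune
                |         return v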
        
             | Davidzheng wrote:
              | I also think humans are bad at math. And we are probably
              | better at real-world tasks--but maybe that's just because
              | IRL has more data anyway.
        
           | raincole wrote:
           | Because "Generally Intelligent" is a very broad and vague
           | term.
           | 
           | "Better than Terrence Tao at solving certain formalized
           | problems" (not necessarily equal to "Better that Terrence Tao
           | at maths) isn't.
        
             | camjw wrote:
              | Seems fair - I strongly think that solving "certain
              | formalized problems" is very, very far away from doing
              | actual maths! And that actual maths is sometimes much less
              | about solving known conjectures and much more about
              | developing new theories.
        
         | Davidzheng wrote:
          | They didn't formalize the entirety of math. Good thing the IMO
          | doesn't need the entirety. But they didn't even formalize
          | enough for the IMO--this is probably why the combinatorics
          | problem wasn't solved.
        
       | cynicalpeace wrote:
       | Machines have been better than humans at chess for decades.
       | 
       | Yet no one cares. Everyone's busy watching Magnus Carlsen.
       | 
        | We are human. This means we care about what other humans do. We
        | only care about machines insofar as they serve us.
       | 
       | This principle is broadly extensible to work and art. Humans will
       | always have a place in these realms as long as humans are around.
        
         | camjw wrote:
         | Sure but if an AI can prove e.g the Goldbach conjecture then
         | that is a bfd.
        
           | hyperbovine wrote:
           | What if the proof were incomprehensible to humans?
        
             | camjw wrote:
              | I think that is unlikely to be the case - the classic
              | example of a proof that humans "can't understand" is the
              | Four Colour Theorem, but that's because the proof is a
              | reduction to nearly 2,000 special cases which are checked
              | by computer.
             | 
             | To what extent is the proof of Fermat's Last Theorem
             | "incomprehensible to humans" because only like a dozen
             | people on the planet could truly understand it - I don't
             | know.
             | 
             | The point of new proofs is really to learn new things about
             | mathematics, and I'm sure we would learn something from a
             | proof of Goldbach's conjecture.
             | 
              | Finally, if it's not peer reviewed then it's not a real
              | proof, eh.
        
             | steego wrote:
              | It shouldn't count. We need to require it be able to ELI5
              | the proof of Goldbach's conjecture to an actual class of
              | _graduating_ kindergartners.
        
             | fanatic2pope wrote:
             | If it cannot explain how it was proven, was it actually
             | proven?
        
               | cynicalpeace wrote:
               | No. Funny how these discussions too often devolve into
               | semantics lol.
        
           | cynicalpeace wrote:
           | How is that a counterpoint?
        
         | awahab92 wrote:
          | Magnus Carlsen basically quit because computers ruined chess.
          | As did Kasparov.
         | 
         | Fischer was probably the last great player who was unassisted
         | by tools.
        
           | hyperbovine wrote:
           | ?? Carlsen is very much active -- look up the YouTube channel
           | EpicChess for an extremely entertaining recap of what he's up
           | to recently.
        
           | karmakurtisaani wrote:
            | Carlsen still plays at the highest level. He just didn't
            | want to do the world championship anymore; it wasn't worth
            | the effort after winning it so many times.
        
         | christianqchung wrote:
         | > This principle is broadly extensible to work and art
         | 
          | Nah, as a consumer it makes no difference to me if a meat-
          | packing factory or Amazon warehouse employs 5000 or 5 people.
          | For art, this principle is totally real, but for work it only
          | applies to some/most of it.
        
           | cynicalpeace wrote:
            | Read the next sentence: "We only care about machines insofar
            | as they serve us."
           | 
            | Imagine a machine doing "work" that only serves itself and
            | other machines, providing no service to humanity. It would
            | have no economic value. In fact, the whole concept of "work"
            | only makes sense if it is assigned economic value by humans.
        
             | christianqchung wrote:
             | Then we agree, but your chess example made it sound like if
             | a machine could automatically pack meat with little human
             | intervention, people wouldn't want it. That's also not the
             | next sentence. How is it broadly applicable to work?
        
         | smokel wrote:
         | _> Everyone 's busy watching Magnus Carlsen._
         | 
         | Actually, I was looking up Elo ratings of the top _computer_
         | chess players, and learned that it is not that trivial to
         | compare these, due to differences in hardware requirements and
         | whatnot.
        
           | cynicalpeace wrote:
           | Are you arguing computer chess players are as popular as
           | human chess players?
        
         | zone411 wrote:
         | I don't think this principle extends to math proofs. It's much,
         | much easier to verify a proof than to create it, and a second
         | proof will just be a footnote. Not many mathematicians will
         | want to work on that. That said, there is a lot of distance
         | between IMO and the frontiers of research math.
        
           | cynicalpeace wrote:
           | Why does it not extend to math proofs?
        
         | ertgbnm wrote:
          | I'm sure humans will always enjoy chess and art regardless of
          | how much better AI is at them. In the same way, there will
          | probably always be mathematics hobbyists who study math for
          | fun. But I seriously doubt that in the near future there will
          | be mathematicians publishing new advancements that aren't
          | mostly or entirely discovered by AI. A human might get credit
          | for a proof by asking the initial question, but there is
          | pretty much no world where a computer can easily solve a
          | meaningful mathematical problem but we insist on a human
          | solving it more slowly and expensively instead.
        
           | cynicalpeace wrote:
            | My point was: where does the concept of "meaningful" come
            | from?
           | 
           | The proof will only have value if it's meaningful to us.
        
       | dan_mctree wrote:
        | I'm curious if we'll see a world where computers can solve math
        | problems so easily that we'll be overwhelmed by all the results
        | and stop caring. The role of humans might change to asking the
       | computer interesting questions that we care about.
        
         | Davidzheng wrote:
         | I think mathematicians will still care
        
         | klysm wrote:
         | I'm not sure what stop caring really means - like stop caring
         | about the result, or the implications?
        
       | xyst wrote:
        | Billions of dollars spent building this, gigawatt-hours of
        | energy used to train it. And the best it could do is "silver"?
       | 
       | Got to be kidding me. We are fucked
        
       | NoblePublius wrote:
       | A lot of words to say second place
        
       | stonethrowaway wrote:
        | It's like bringing a rocket launcher to a fist fight, but I'd
        | like to use these math language models to find gaps in logic
        | when people are making online arguments. It would be an
        | excellent way to verify who has done their homework.
        
       | zhiQ wrote:
       | Coincidentally, I just posted about how well LLMs handle adding
       | long strings of numbers:
       | https://userfriendly.substack.com/p/discover-how-mistral-lar...
        
       | hulitu wrote:
       | > AI solves International Math Olympiad problems at silver medal
       | level
       | 
       | > In the official competition, students submit answers in two
       | sessions of 4.5 hours each. Our systems solved one problem within
       | minutes and took up to three days to solve the others.
       | 
       | Why not compare with students who are given 3 days to submit an
       | answer ? /s
        
       | lumb63 wrote:
        | Can someone explain why proving and math problem solving are
        | not far easier problems for computers? Why do they require any
        | "artificial intelligence" at all?
       | 
       | For example, suppose a computer is asked to prove the sum of two
       | even numbers is an even number. It could pull up its list of
       | "things it knows about even numbers", namely that an even number
       | modulo 2 is 0. Assuming the first number is "a" and the second is
       | "b", then it knows a=2x and b=2y for some x and y. It then knows
       | via the distributive property that the sum is 2(x+y), which
       | satisfies the definition of an even number.
       | 
       | What am I missing that makes this problem so much harder than
       | applying a finite and known set of axioms and manipulations?
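        | 
        | For what it's worth, that toy case really is mechanical - a
        | Lean 4 sketch using Mathlib, where Even a unfolds to
        | "∃ r, a = r + r" (Mathlib already ships this fact as
        | Even.add):
        | 
        |     import Mathlib
        | 
        |     -- the sum of two even integers is even
        |     theorem even_sum {a b : ℤ} (ha : Even a) (hb : Even b) :
        |         Even (a + b) := by
        |       obtain ⟨x, hx⟩ := ha   -- a = x + x
        |       obtain ⟨y, hy⟩ := hb   -- b = y + y
        |       exact ⟨x + y, by rw [hx, hy]; ring⟩
        | 
        | As the replies explain, the hard part is that the number of
        | ways to chain such steps explodes with proof length.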
        
         | zone411 wrote:
         | The problems in question require much, much more complex
         | proofs. Try example IMO problems yourself and see if they don't
         | require much intelligence: https://artofproblemsolving.com/wiki
         | /index.php/IMO_Problems_.... And then keep in mind that
         | research math is orders of magnitude more complex still.
        
         | runeblaze wrote:
         | Another answer is that 3SAT and co can be seen as distilled
         | variants of proving statements. Well, 3SAT is famously hard.
        
         | psb217 wrote:
         | In a sense, the model _is_ simply applying a finite and known
         | set of axioms and manipulations. What makes this hard in
         | practice is that the number of possible ways in which to
         | perform multiple steps of this sort of axiomatic reasoning
         | grows exponentially with the length of the shortest possible
         | solution for a given problem. This is similar to the way in
         | which the tree of possible futures in games like go/chess grows
         | exponentially as one tries to plan further into the future.
         | 
          | This makes it natural to address these problems using similar
          | techniques, which is what this research team did. The "magic"
         | in their solution is the use of neural nets to make good
         | guesses about which branches of these massive search trees to
         | explore, and make good guesses about how good any particular
         | branch is even before they reach the end of the branch. These
         | tricks let them (massively) reduce the effective branching
         | factor and depth of the search trees required to produce
         | solutions to math problems or win board games.
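          | 
          | A rough sketch of that guidance (policy, value, apply_step
          | and is_proved are hypothetical stand-ins for the trained
          | networks and the proof checker):
          | 
          |     import heapq, itertools
          | 
          |     def guided_search(start, budget=10_000):
          |         tick = itertools.count()   # heap tie-breaker
          |         frontier = [(-value(start), next(tick), start)]
          |         for _ in range(budget):
          |             if not frontier:
          |                 break              # nothing left to explore
          |             _, _, state = heapq.heappop(frontier)
          |             if is_proved(state):
          |                 return state       # complete proof found
          |             # expand only the branches the policy suggests,
          |             # ordered by the value net's guess of progress
          |             for step in policy(state):
          |                 nxt = apply_step(state, step)
          |                 heapq.heappush(frontier,
          |                                (-value(nxt), next(tick), nxt))
          |         return None                # search budget exhausted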
        
         | ComplexSystems wrote:
          | What you're missing is that this kind of thing has arbitrarily
          | been declared "artificial intelligence" territory. Once the
          | ability of computers to do it has been established, it will no
          | longer be artificial intelligence territory; at that point
          | it'll just be another algorithm.
        
         | booleandilemma wrote:
         | Proofs require a certain ingenuity that computers just don't
         | have, imo. A computer would never be able to come up with
         | something like Cantor's diagonalization proof on its own.
        
           | Davidzheng wrote:
            | Are you sure AlphaProof can't?
        
       | zone411 wrote:
       | The best discussion is here:
       | https://leanprover.zulipchat.com/#narrow/stream/219941-Machi...
        
       | data_maan wrote:
        | It's bullshit. AlphaGeometry can't even prove the Pythagorean
        | theorem.
        | 
        | They're not open-sourcing anything.
        | 
        | This is a dead end on which no further research can be built.
        | 
        | It violates pretty much every principle of incremental
        | improvement on which science is based. It's here just for the
        | hype, and the 300+ comments prove it.
        
       | necovek wrote:
       | This is certainly impressive, but whenever IMO is brought up, a
        | caveat should be put out: medals are awarded to 50% of the
        | participants (high school students), with a 1:2:3 ratio between
        | gold, silver, and bronze. That puts all gold and silver
        | medalists among the top 25% of the participants.
       | 
       | That means that "AI solves IMO problems better than 75% of the
       | students", which is probably even more impressive.
       | 
       | But, "minutes for one problem and up to 3 days for each remaining
       | problem" means that this is unfortunately not a true
       | representation either. If these students were given up to 15 days
       | (5 problems at "up to 3 days each") instead of 9h, there would
       | probably be more of them that match or beat this score too.
       | 
       | It really sounds like AI solved only a single problem in the 9h
       | students get, so it certainly would not be even close to the
       | medals. What's the need to taint the impressive result with
       | apples-to-oranges comparison?
       | 
       | Why not be more objective and report that it took longer but was
       | able to solve X% of problems (or scored X out of N points)?
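        | 
        | (The medal arithmetic, worked out: medals go to half the
        | field, split 1:2:3, so gold and silver together cover 3/6 of
        | that half - the top 25%:)
        | 
        |     medal_share = 0.5              # half of all participants medal
        |     gold, silver, bronze = 1, 2, 3 # medal ratio
        |     top = medal_share * (gold + silver) / (gold + silver + bronze)
        |     print(top)                     # 0.25 -> top 25% of the field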
        
         | muglug wrote:
         | > What's the need to taint the impressive result with apples-
         | to-oranges comparison?
         | 
         | Most of DeepMind's research is a cost-centre for the company.
         | These press releases help justify the continued investment both
         | to investors and to the wider public.
        
         | Davidzheng wrote:
          | In my opinion (not Google's), the only reason they didn't get
          | gold this year (apart from being unlucky with problem
          | selection) is that they didn't want to try for any partial
          | credit on P3 and P5. They were so close to the cutoff, and
          | usually contestants with a little bit of progress can get 1
          | point. But I guess they didn't want to get a gold on a
          | technicality--it would be bad press. So they settled on an
          | indisputable silver.
        
         | tardygrade wrote:
         | One key difference between giving humans more time and giving
         | computer programs more time is that historically we have had
         | more success making the latter run faster than the former.
        
         | acchow wrote:
          | But computers get faster each year, so even with zero progress
          | in actual AI, this will reach human-student speeds in a few
          | years (they'd need a ~40x speed-up).
        
       | dinobones wrote:
       | I see DeepMind is still playing around with RL + search
       | algorithms, except now it looks like they're using an LLM to
       | generate state candidates.
       | 
        | I don't really find this that impressive. With enough compute
        | you could just do n-of-10,000 LLM generations to "brute force"
        | a difficult problem and you'll get there eventually.
        
         | richard___ wrote:
         | Sigh. Just wrong
        
       ___________________________________________________________________
       (page generated 2024-07-25 23:00 UTC)