[HN Gopher] AI solves International Math Olympiad problems at si...
___________________________________________________________________
AI solves International Math Olympiad problems at silver medal
level
Author : ocfnash
Score : 695 points
Date : 2024-07-25 15:29 UTC (7 hours ago)
(HTM) web link (deepmind.google)
(TXT) w3m dump (deepmind.google)
| adverbly wrote:
| > First, the problems were manually translated into formal
| mathematical language for our systems to understand. In the
| official competition, students submit answers in two sessions of
| 4.5 hours each. Our systems solved one problem within minutes and
| took up to three days to solve the others.
|
| Three days is interesting... Not technically silver medal
| performance I guess, but let's be real I'd be okay waiting a
| month for the cure to cancer.
| ZenMikey wrote:
| I haven't read TFA as I'm at work, but I would be very
| interested to know what the system was doing in those three
| days. Were there failed branches it explored? Was it just
| fumbling its way around until it guessed correctly? What did
| the feedback loop look like?
| qsort wrote:
| I can't find a link to an actual paper; that just seems to be
| a blog post. But from what I gather the problems were
| manually translated to Lean 4, and then the program is doing
| some kind of tree search. I'm assuming they are leveraging
| the proof checker to provide feedback to the model.
| tsoj wrote:
| This is NOT the paper, but probably a very similar solution:
| https://arxiv.org/abs/2009.03393
| visarga wrote:
| > just fumbling its way around until it guessed correctly
|
| As opposed to 0.999999% of the human population who can't do
| it even if their life depends on it?
| dsign wrote:
| I was going to come here to say that. I remember being a
| teenager and giving up in frustration at IMO problems. And
| I was competing at IPhO.
| logicchains wrote:
| I guess you mean 99.9999%?
| lacker wrote:
| _The training loop was also applied during the contest,
| reinforcing proofs of self-generated variations of the
| contest problems until a full solution could be found._
|
| So they had three days to keep training the model, on
| synthetic variations of each IMO problem.
| thomasahle wrote:
| They just write "it's like AlphaZero". So presumably they
| used a version of MCTS where each terminal node is scored by
| Lean as either correct or incorrect.
|
| Then they can train a network to evaluate intermediate
| positions (score network) and one to suggest things to try
| next (policy network).
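|
| A toy sketch of that loop (a guess at the general shape, not
| DeepMind's code; here `checker` stands in for Lean):
|
|   import math, random
|
|   # Toy proof search: states are numbers, "tactics" subtract 1,
|   # 3 or 7, and the checker accepts exactly the states that
|   # reach 0. In the real system, states would be Lean proof
|   # states and Lean itself would be the checker.
|   TACTICS = [1, 3, 7]
|
|   def checker(state):
|       if state == 0: return 1.0   # proved
|       if state < 0:  return 0.0   # dead end
|       return None                 # still open
|
|   def rollout(state):
|       # A policy network would bias these choices; uniform here.
|       while checker(state) is None:
|           state -= random.choice(TACTICS)
|       return checker(state)
|
|   def search(root, iters=5000):
|       visits, value = {}, {}
|       for _ in range(iters):
|           path, s = [root], root
|           while checker(s) is None and s in visits:
|               def ucb(c):  # UCB1 over child states
|                   n = visits.get(c, 0)
|                   q = value.get(c, 0.0) / n if n else 0.5
|                   return q + 1.4 * math.sqrt(
|                       math.log(visits[s] + 1) / (n + 1))
|               s = max((s - t for t in TACTICS), key=ucb)
|               path.append(s)
|           r = rollout(s)       # terminal score from the checker
|           for node in path:    # back the result up the tree
|               visits[node] = visits.get(node, 0) + 1
|               value[node] = value.get(node, 0.0) + r
|       return visits
|
|   visits = search(11)
|   # visit counts for the three candidate first moves:
|   print({11 - t: visits.get(11 - t, 0) for t in TACTICS})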
| ComplexSystems wrote:
| Or the simultaneous discovery of thousands of cryptographic
| exploits...
| poincaredisk wrote:
| Still waiting for the first one. I'm not holding my breath -
| just like fuzzing found a lot of vulnerabilities in low-level
| software, I expect novel automated analysis approaches will
| yield some vulnerabilities - but that won't be a catastrophic
| event just like fuzzing wasn't.
| sqeaky wrote:
| I hope it doesn't find a new class of bug. Finding another
| thing like Spectre could be problematic.
|
| EDIT - I hope if that new class of bug exists that it is
| found. I hope that new class of bug doesn't exist.
| throwaway240403 wrote:
| Hope that's true. Really mucks up the world a bit if not.
| criddell wrote:
| It's rumored that the NSA has 600 mathematicians working
| for them. If they are the ones finding the exploits you
| will probably never hear about them until they are
| independently discovered by someone who can publish.
| ComplexSystems wrote:
| Why don't you think that AI models will, perhaps rather
| soon, surpass human capabilities in finding security
| vulnerabilities? Because an AI that's even equally
| competent would be a fairly catastrophic event.
| nnarek wrote:
| "three days" does not say anything about how much computational
| power is used to solve problems, maybe they have used 10% of
| all GCP :)
| vlovich123 wrote:
| The thing is though, once we have a benchmark that we pass,
| it's pretty typical to be able to bring down time required in
| short order through performance improvements and iterating on
| ideas. So if you knew you had AGI but it took 100% of all GCP
| for 3 years to give a result, within the next 5 years that
| would come down significantly (not least of which you'd build
| HW dedicated to accelerating the slow parts).
| tsimionescu wrote:
| That's patently false for many classes of problems. We know
| exactly how to solve the traveling salesman problem, and
| have for decades, but we're nowhere close to solving a
| random 1000 city case (note: there are approximate methods
| that can find good, but not optimal, results on millions of
| cities). Edit: I should say 1,000,000 city problem, as
| there are some solutions for 30-60k cities from the 2000s.
|
| And there are good reasons to believe that theorem finding
| and proof generation are at least NP-hard problems.
| rowanG077 wrote:
| The person said typically, not always. Just because
| there are obviously cases where it didn't happen doesn't
| mean it's not still typically the case.
| vlovich123 wrote:
| We're not talking about mathematical optimality here,
| both from the solution found and for the time taken. The
| point is whether this finds results more cheaply than a
| human can and right now it's better on some problems
| while others it's worse. Clearly if a human can do it,
| there is a way to solve it in a cheaper amount of time
| and it would be flawed reasoning to think that improving
| the amount of time would be asymptotically optimal
| already.
|
| While I agree that not all problems show this kind of
| acceleration in performance, that's typically only true
| if you've already spent so much time trying to solve it
| that you've asymptoted to the optimal solution. Right now
| we're nowhere near the asymptote for AI improvements.
| Additionally, there's so many research dollars flowing
| into AI precisely because the potential upside here is
| nowhere near realized and there's lots of research lines
| still left to be explored. Geoffrey Hinton ended the AI
| winter.
| visarga wrote:
| > The point is whether this finds results more cheaply
| than a human can
|
| If you need to solve 1000 problems in 3 days you wouldn't
| find the humans that can do it. So it would not be
| cheaper if it's not possible.
| vlovich123 wrote:
| Well if it takes 10% of all of Google's servers 3 days to
| solve, you may find it difficult to scale out to solving
| 1000 problems in 3 days as well.
|
| As for humans, 100 countries send 6 students to solve
| these problems. It also doesn't mean that these problems
| aren't solvable by anyone else. These are just the "best
| 6" where best = can solve and solve most quickly. Given a
| three day budget, 1000 problems could reasonably be
| solvable and you know exactly who to tap to try to solve
| them. Also, while the IMO is difficult and winners tend
| to win other awards like Fields Medals, there are many
| professional mathematicians who never even bother because
| that type of competition isn't interesting to them. It's
| not unreasonable to expect that professional
| mathematicians are able to solve these problems as well
| if they wanted to spend 3 days on it.
|
| But in terms of energy per solve, humans are definitely
| cheaper. As you note the harder part is scaling it out
| but so far the AI isn't solving problems that are
| impossible for humans, just that given enough time it
| managed to perform the same task. That's a very promising
| result, but supremacy is still a ways off (this AI can't
| yet win the competition).
| falcor84 wrote:
| And say they did use 10% of all GCP? Would it be less
| impressive? This is a result that was considered by experts
| to be far beyond the state of the art; it's absolutely ok if
| it's not very efficient yet.
|
| Also, for what it's worth, I'm pretty sure that I wouldn't
| have been able to solve it myself in three days, even if I
| had access to all of GCP, Azure and AWS (except if I could
| mine crypto to then pay actual IMO-level mathematicians to
| solve it for me).
| nnarek wrote:
| yes it is very impressive, especially autoformalization of
| problems written in natural language and also proof search
| of theorems
| data_maan wrote:
| Which experts said that?
|
| I don't think that's the case at all. The writing was
| already on the wall.
| wongarsu wrote:
| The problem solved "within minutes" is also interesting. I'd
| interpret that as somewhere between 2 and 59 minutes. Given the
| vagueness probably on the higher end, otherwise they'd
| celebrate it more. The students had 6 tasks in 9 hours, so on
| average 1.5h per task. If you add the time a student would take
| to (correctly!) translate the problems to their input format,
| their best-case runtime is probably about as long as a silver
| medalist would take to solve the problem on their own.
|
| But even if they aren't as fast as humans yet this is very
| valuable. Both as a stepping stone, and because at a certain
| scale compute is much easier to scale than skilled
| mathematicians.
| gjm11 wrote:
| They say "our systems" (presumably meaning AlphaProof _and_
| AlphaGeometry 2) solved one problem "within minutes", and
| later on the page they say that the geometry question (#4)
| was solved by AlphaGeometry in 19 seconds.
|
| So either (1) "within minutes" was underselling the abilities
| of the system, or (2) what they actually meant was that the
| geometry problem was solved in 19 seconds, one of the others
| "within minutes" (I'd guess #1 which is definitely easier
| than the other two they solved), and the others in
| unspecified times of which the longer was ~3 days.
|
| I'd guess it's the first of those.
|
| (Euclidean geometry has been a kinda-solved domain for some
| time; it's not super-surprising that they were able to solve
| that problem quickly.)
|
| As for the long solve times, I would guess they're related to
| this fascinating remark:
|
| > The training loop was also applied during the contest,
| reinforcing proofs of self-generated variations of the
| contest problems until a full solution could be found.
| visarga wrote:
| Euclidean geometry still requires constructions to solve,
| and those are based on intuition.
| gjm11 wrote:
| There are known algorithms that can solve _all_ problems
| in euclidean (ruler-and-compasses) geometry, no intuition
| required. The most effective algorithms of this type are
| quite inefficient, though, and (at least according to
| DeepMind) don't do as well as AlphaGeometry does at e.g.
| IMO geometry problems.
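|
| (A toy of the algebraic flavour of such algorithms, via
| Gröbner bases in sympy; for illustration only, and not how
| AlphaGeometry works:)
|
|   from sympy import symbols, groebner
|
|   x, y = symbols("x y")
|
|   # Hypothesis: P = (x, y) is equidistant from A = (0, 0)
|   # and B = (2, 0).
|   h = (x**2 + y**2) - ((x - 2)**2 + y**2)
|
|   # Conclusion: P lies on the perpendicular bisector x = 1.
|   # It follows iff it reduces to 0 modulo the hypothesis ideal.
|   G = groebner([h], x, y)
|   print(G.reduce(x - 1))  # remainder 0 => conclusion follows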
| lolinder wrote:
| It feels pretty disingenuous to claim silver-medal status when
| your machine played by significantly different rules. The
| article is light on details, but it says they wired it up to a
| theorem prover, presumably with feedback sent back to the AI
| model for re-evaluation.
|
| How many cycles of guess-and-check did it take over the course
| of three days to get the right answer?
|
| If the IMO contestants were allowed to use theorem provers and
| were given 3 days (even factoring in sleep) would AlphaProof
| still have gotten silver?
|
| > let's be real I'd be okay waiting a month for the cure to
| cancer.
|
| I don't think these results suggest that we're on the brink of
| knowledge coming at a substantially faster rate than before.
| Humans have been using theorem provers to advance our
| understanding for decades. Now an LLM has been wired up to one
| too, but it still took 8x as long to solve the problems as our
| best humans did _without any computer assistance_.
| ToucanLoucan wrote:
| I am so exhausted of the AI hype nonsense. LLMs are not
| fucking curing cancer. Not now, not in five years, not in a
| hundred years. That's not what they do.
|
| LLM/ML is fascinating tech that has a lot of legitimate
| applications, but _it is not fucking intelligent, artificial
| or otherwise,_ and I am sick to death of people treating it
| like it is.
| apsec112 wrote:
| What observation, if you saw it, do you think would falsify
| that hypothesis?
| necovek wrote:
| It seems unlikely people will employ _only_ ML models,
| especially LLMs, to achieve great results: they will
| combine it with human insights (through direction and
| concrete algorithms).
|
| It's obvious that's happening with LLMs even today to
| ensure they don't spew out too much bullshit or harmful
| content. So let's get to a point where we can trust AI
| as-is first, and let's talk about what's needed to
| achieve the next milestone after and _if_ we get there.
|
| And I love asking every new iteration of ChatGPT/Gemini
| something along the lines of "What day was yesterday if
| yesterday was a Thursday?" It just makes me giggle.
| 29athrowaway wrote:
| It is not artificial? So it is natural then?
| jercos wrote:
| A significant part of the problem is that a majority of
| people are unaware of just how simple what they consider
| "intelligence" really is. You don't need actual
| intelligence to replace the public-facing social role of a
| politician, or a talking head, or a reactive-only middle
| manager. You just need words strung together that fit a
| problem.
| golol wrote:
| I believe you are misreading this.
|
| First of all, this is not a sport and the point is not to
| compare AI to humans. The point is to compare AI to IMO-
| difficulty problems.
|
| Secondly, this is not some hacky trick where brute force and
| some theorem-prover magic are massaged to solve a select few
| problems and then you'll never hear about it again. They are
| building a general pipeline which turns informal natural
| language mathematics (of which we have ungodly amounts
| available) into formalized mathematics, and in addition
| trains a model to prove such kinds of mathematics. This can
| also work for theory building. This can become a real
| mathematical assistant that can help a mathematician test an
| argument, play with variations of a definition, try 100
| combinations of some estimates, apply a classic but lengthy
| technique etc. etc.
| lolinder wrote:
| > First of all, this is not a sport and the point is not to
| compare AI to humans. The point is to compare AI to IMO-
| difficulty problems.
|
| If this were the case then the headline would be "AI solves
| 4/6 IMO 2024 problems", it wouldn't be claiming "silver-
| medal standard". Medals are generally awarded by comparison
| to other contestants, not to the challenges overcome.
|
| > This can become a real mathematical assistant that can
| help a mathematician test an argument, play with variations
| of a definition, try 100 combinations of some estimates,
| apply a classic but lengthy technique etc. etc.
|
| This is great, and I'm not complaining about what the team
| is working on, I'm complaining about how it's being sold.
| Headlines like these from lab press releases will feed the
| AI hype in counterproductive ways. The NYT literally has a
| headline right now: "Move Over Mathematicians, Here Comes
| AlphaProof".
| golol wrote:
| At the IMO "silver medal" afaik is define as some tange
| of points, which more or less equals some range of
| problems solved. For me it is fair to say that "silver-
| medal performance" is IMO langauge for about 4/6 problems
| solved. And what's the problem if some clickbait websites
| totally spin the result? They would've done it anyways
| even with a different title, and I also don't see the
| harm. Let people be wrong.
| mathnmusic wrote:
| No, "silver medal" is defined as a range of points to be
| earned in the allotted time (4.5 hours for both papers of
| 3 problems each).
| lolinder wrote:
| And the cutoffs are chosen _after_ the results are in,
| not in advance.
| riku_iki wrote:
| > They are building a general pipeline which turns informal
| natural lamguage mathematics
|
| but this part currently sucks, because they didn't trust it
| and formalized problems manually.
| golol wrote:
| Yea that's fair, but I don't think it will keep sucking
| forever as formalization is in principle just a
| translation process.
| riku_iki wrote:
| and we don't have 100% accuracy when translating
| ambiguous texts, because the system often needs some domain
| knowledge, context, etc. And math has zero tolerance for
| mistakes.
|
| I also expect that math formalized by machine will be
| readable by machine and hardly understandable by humans.
| regularfry wrote:
| I'm not sure it matters that it had access to a theorem
| prover. The fact that it's possible to build a black box that
| solves hard proofs on its own _at all_ is the fascinating
| bit.
|
| > it still took 8x as long to solve the problems as our best
| humans did without any computer assistance.
|
| Give it a year and that ratio will be reversed. At least. But
| also it matters less how long it takes if doubling the number
| of things reasoning at a best-human level is pronounced
| "ctrl-c, ctrl-v".
| 10100110 wrote:
| Don't confuse interpolation with extrapolation. Curing cancer
| will require new ideas. IMO requires skill proficiency in tasks
| where the methods of solving are known.
| trueismywork wrote:
| They are the same things
| golol wrote:
| Mathematicians spend most of their time interpolating between
| known ideas and it would be extremely helpful to have
| computer assistance with that.
| visarga wrote:
| Search is extrapolation. Learning is interpolation.
| Search+Learn is the formula used by AZ. Don't forget AZ
| taught us humans a thing or two about a game we had 2000
| years head start in, and starting from scratch not from human
| supervision.
| xdavidliu wrote:
| no, search is not extrapolation. Extrapolation means taking
| some data and projecting out beyond the limits of that
| data. For example, if my bank account had $10 today and $20
| tomorrow, then I can extrapolate and say it might have $30
| the day after tomorrow. Interpolation means taking some
| data and inferring the gaps of that data. For example, if I
| had $10 today and $30 the day after tomorrow, I can
| interpolate and say I probably had $20 tomorrow.
|
| Search is different from either of those things, it's when
| you have a target and a collection of other things, and are
| trying to find the target in that collection.
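|
| In code, with the numbers from the second example
| (illustrative only):
|
|   xs, ys = [0, 2], [10, 30]  # day 0: $10, day 2: $30
|   slope = (ys[1] - ys[0]) / (xs[1] - xs[0])  # $10 per day
|
|   interpolated = ys[0] + slope * (1 - xs[0])  # within data: day 1
|   extrapolated = ys[1] + slope * (3 - xs[1])  # beyond data: day 3
|   print(interpolated, extrapolated)  # 20.0 40.0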
| visarga wrote:
| Search can go from a random init model to beating humans
| at Go. That is not interpolation.
|
| - Search allows exploration of the game tree, potentially
| finding novel strategies.
|
| - Learning compresses the insights gained from search
| into a more efficient policy.
|
| - This compressed policy then guides future searches more
| effectively.
|
| Evolution is also a form of search, and it is open-ended.
| AlphaProof solved IMO problems; those are chosen to be
| out of distribution, so simple imitation can't solve them.
| Scientists do (re)search, they find novel insights nobody
| else discovered before. What I want to say is that search
| is on a whole different level than what neural nets do,
| they can only interpolate their training data, search
| pushes outside of the known data distribution.
|
| It's actually a combo of search+learning that is
| necessary, learning is just the little brother of search,
| it compresses novel insights into the model. You can
| think of training a neural net also as a search for the best
| parameters that fit the training set.
| trotro wrote:
| The methods are known, but the solutions to the IMO problems
| weren't. So the AI did extrapolate a solution.
|
| Also, there's no reason to affirm that an eventual cure for
| cancer requires fundamentally new methods. Maybe the current
| methods are sufficient, it's just that nobody has been
| "smart" enough to put the pieces together. (disclaimer: not
| an expert at all)
| dsign wrote:
| I think you are correct though. We don't need new physics
| to cure cancer. But we may need information-handling,
| reasoning and simulation systems which are orders of
| magnitude bigger and more complex than anything we have
| this year. We also need to stop pussy-footing and diddling
| with ideologies and start working on the root cause of
| cancer and almost every other disease, which is aging.
| markusde wrote:
| Unlike curing cancer, the IMO problems were specifically
| designed to be solvable
| xdavidliu wrote:
| new doesn't necessarily mean "an extremal point that's not
| the average of two existing points". The set of existing
| knowledge is not necessarily continuous; the midpoint between
| two known points may be unknown, and thus would be a "new"
| point that could be obtained by interpolation.
| im3w1l wrote:
| I think this is kinda false actually on the cancer side. We
| have reached a point where we have known approaches that
| work. It's "just" a matter of putting them into practice
| which will of course require solving many little details,
| which is very important and time-consuming work, but it
| doesn't require super-human genius level of lateral thinking,
| just a few million man-years of grinding away at it.
| iamronaldo wrote:
| No benchmarks of any kind?
| arnabgho wrote:
| https://x.com/GoogleDeepMind/status/1816498082860667086
| piombisallow wrote:
| IMO problems aren't fundamentally different from chess or other
| games, in that the answer is already known.
| Smaug123 wrote:
| I really don't understand what you mean by this. 1) it's not
| known whether chess is a win for White or not. 2) IMO problems,
| such as 2024 problem 1 which the system solved, are often
| phrased as "Determine all X such that...".
| diffeomorphism wrote:
| You are attacking a straw man and the point made is pretty
| good.
|
| Competition problems are designed to be actually solvable by
| contestants. In particular, the problems should be solvable
| using a reasonable collection of techniques and many "prep
| courses" will teach you many techniques, tools and algorithms
| and a good starting point is to throw that stuff at any given
| problem. So, just like with chess _openings_, putting in lots of
| legwork will give you some good results for that part. You might
| very well lose in mid and late game, just like this AI might
| struggle with "actual problems"
|
| It is of course still very impressive, but that is an
| important point.
| Smaug123 wrote:
| I'm attacking nobody! I literally couldn't understand the
| point, so I said so: as stated, its premises are simply
| clearly false!
|
| _Your_ point, however, is certainly a good one: IMO
| problems are an extremely narrow subset of the space of
| mathematical problems, which is itself not necessarily even
| 50% of the space of the _work of a mathematician_.
| wufufufu wrote:
| Kinda? Chess isn't solved. Complex problems can have better
| solutions discovered in the future.
| jeremyjh wrote:
| It isn't solved but the evaluation (which side is better, by
| how much, and which moves are best) of a strong engine is -
| for all practical purposes - an answer to every chess
| position you can pose to it. This makes it easy to gauge
| improvement and benchmark against other systems compared to
| some other problems.
| Davidzheng wrote:
| lol
| osti wrote:
| But the answer is probably not known by you, in particular.
| piombisallow wrote:
| Yes, sure, but this doesn't mean that this generalizes to
| open math research problems, which would be the useful real-
| world application of this. Otherwise this is just playing
| known games with known rules, granted better/faster than
| humans.
| energy123 wrote:
| IMO and Chess are the same in the most important respect: you
| can use Lean or a simulated chess game to create unlimited
| quality training labels. Any problem of this category should be
| solved with enough compute and clever
| architecture/metacognition design. The more intractable
| problems are where data is hard to find or synthesize.
| almostgotcaught wrote:
| > in that the answer is already known.
|
| you realize this holds true for all of math, right? Outside of
| Gödel incompleteness potholes, every proof/theorem is a
| permutation of ZFC. And you can fix the potholes by just
| filling them in with more Cs.
| diffeomorphism wrote:
| That seems shallow and not really useful. Like sorting is
| easy: just take all permutations and look at each one to see
| whether it is sorted... it just might take you longer than the
| heat death of the universe to actually do that.
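|
| Concretely (permutation sort: correct, but O(n!)):
|
|   from itertools import permutations
|
|   def permutation_sort(xs):
|       # Try orderings until one happens to be sorted: fine for
|       # 3 elements, hopeless long before the heat death for 30.
|       return next(list(p) for p in permutations(xs)
|                   if all(a <= b for a, b in zip(p, p[1:])))
|
|   print(permutation_sort([3, 1, 2]))  # [1, 2, 3]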
| lanstin wrote:
| There are elementary Diophantine equations that are
| independent of ZFC. What is your approach for those?
| petters wrote:
| The problems were first converted into a formal language. So they
| were partly solved by the AI
| jeremyjh wrote:
| Yes, and it is difficult for me to believe that there is no
| useful human analysis and understanding involved in this
| translation, without which the AI would be helpless. But that
| I suppose
| is a problem that could be tackled with a different model...
| adrianN wrote:
| Even so, having a human formalize the problems and an AI to
| find machine checkable proofs could be very useful for
| mathematicians.
| sebzim4500 wrote:
| It is vastly easier to do the formalization than to actually
| solve the problem. Any undergraduate with some Lean
| familiarity could do it in minutes.
| Davidzheng wrote:
| Disagree! Some problems are much harder than others. If you
| don't believe me, please go formalize P5 from this year's IMO.
| rpois wrote:
| Does this formalization process include giving it the answer it
| should try to prove?
| Smaug123 wrote:
| Nope, per Oliver Nash who worked on the thing:
| https://news.ycombinator.com/item?id=41070372
| clbrmbr wrote:
| IIUC, a Gemini-based system could translate the natural
| language questions into Lean, but in the blog post they don't
| really commit to whether this was done just to generate
| training data or was used in the competition.
| golol wrote:
| Formalization is in principle just a translation process and
| should be a much simpler problem than the actual IMO problem.
| Besides, they also trained a Gemini model which formalizes
| natural language problems, and this is how they generated
| training data for AlphaProof. I would therefore expect that
| they could have also formalized the IMO problems with that
| model and just did it manually because the point is not to
| demonstrate formalizing but instead proof capabilities.
| riku_iki wrote:
| > Formalization is in principle just a translation process
| and should be a much simpler problem than the actual IMO
| problem
|
| maybe not, because you need to connect many complicated
| topics/terms/definitions together, and you don't have a way
| to reliably verify if the formalized statement is correct.
|
| They built an automatic formalization network in this case, but
| didn't trust it and formalized it manually.
| pishpash wrote:
| Yet the facts at hand are the opposite of what you say.
| A reliable formalizer was a more difficult problem than
| solving formalized IMO problems, because they have not
| produced one.
| fngjdflmdflg wrote:
| That does not necessarily follow from the facts at hand.
| For example they may have prioritized work on the proof
| solver itself as they may feel that that is the more
| important result. Alternatively if their goal is to build a
| proof solver then building the formalizer would be useless
| if they could not build the actual proof solver.
| pishpash wrote:
| A proof solver existed. They were improving the proof
| solver explicitly by making the formalizer a part of the
| training. Formalizer reliability is the key novelty. It
| turns out it was only reliable enough for training. So
| unless they made the problem statement at the outset that
| "we'll only make the formalizer strong enough to train
| but not use", I disagree with that assessment.
| fngjdflmdflg wrote:
| That's a good point. The formalizer was used to create
| the training data for the proof solver, so they likely
| worked on it more than if they just used it as a
| preprocessing step during inference. It is still possible
| that they worked on the formalizer until they got good
| results from it enough to create good training data, and
| then began training as soon as possible and did not spend
| too much time trying to improve the formalizer. Depending
| on how long the training was expected to take, perhaps
| that is a reasonable assumption. Although I think I agree
| more with your view now.
| trotro wrote:
| But formalization is the easy part for humans. I'm sure every
| mathematician would be happy if the only thing required to
| prove a result was to formalize it in Lean and feed it to the
| AI to find the proof.
| Davidzheng wrote:
| Not sure every mathematician would be happy to do this... it
| sounds much less pleasant than thinking. It's like saying
| mathematicians would rather be programmers lol. It's a
| significant, difficult problem which I believe should be left
| completely to AI. Human formalization should become obsolete.
| c0l0 wrote:
| That's great, but does that particular model also know
| if/when/that it _does not_ know?
| foota wrote:
| Never?
|
| Edit: To defend my response, the model definitely knows when it
| hasn't yet found a correct response, but this is categorically
| different from knowing that it does not know (and of course
| monkeys and typewriters etc., can always find a proof
| eventually if one exists).
| ibash wrote:
| Yes
|
| > AlphaProof is a system that trains itself to prove
| mathematical statements in the formal language Lean. ... Formal
| languages offer the critical advantage that proofs involving
| mathematical reasoning can be formally verified for
| correctness.
| diffeomorphism wrote:
| While that was probably meant to be rhetorical, the answer
| surprisingly seems to be an extremely strong "Yes, it does".
| Exciting times.
| PaulHoule wrote:
| See https://en.wikipedia.org/wiki/Automated_Mathematician for an
| early system that seems similar in some way.
| golol wrote:
| This Wikipedia page makes AM come across as kind of a nonsense
| project whose outputs no one (besides the author) bothered to
| decipher.
| Davidzheng wrote:
| HOLY SHIT. It's amazing
| osti wrote:
| So they weren't able to solve the combinatorics problem. I'm not
| super well versed in competition math, but combinatorics always
| seem to be the most interesting problems to me.
| mupuff1234 wrote:
| Can it / did it solve problems that weren't solved yet?
| raincole wrote:
| Technically yes. And it's easy. You can probably do it with
| your PC's computational power.
|
| The thing is that most math "problems" are unsolved not
| because they're hard, but because they're not interesting
| enough to even be discovered by humans.
| mupuff1234 wrote:
| Yeah, I mean "interesting" problems (perhaps not fields medal
| interesting, but interesting enough)
| Ericson2314 wrote:
| The lede is a bit buried: they're using Lean!
|
| This is important for more than Math problems. Making ML models
| wrestle with proof systems is a good way to avoid bullshit in
| general.
|
| Hopefully more humans will write types in Lean and similar
| systems as a way of writing prompts.
| Ericson2314 wrote:
| They're def gonna go after the Riemann hypothesis with this,
| hehe.
| nwoli wrote:
| Guessing the context here is that the RH was recently
| translated into Lean. Would be very cool if they threw their
| compute on that
| Smaug123 wrote:
| I think you might be thinking of the recent project to
| start Fermat's Last Theorem? The Riemann hypothesis has
| been easy to state (given what's in Mathlib) for years.
| Davidzheng wrote:
| Yeah lol I don't think either is hard to formalize in
| Lean.
| raincole wrote:
| They're not just formalizing Fermat's Last Theorem's
| statement itself. They're formalizing the proof.
| Smaug123 wrote:
| And while AlphaProof is clearly _extremely_ impressive, it does
| give the computer an advantage that a human doesn't have in
| the IMO: nobody's going to be constructing Gröbner bases in
| their head, but `polyrith` is just eight characters away. I saw
| AlphaProof used `nlinarith`.
| Ericson2314 wrote:
| Hehe, well, we'll need to have a tool-assisted international
| math Olympiad then.
| ComplexSystems wrote:
| This is the greatest idea ever. Why doesn't this exist?
| panagathon wrote:
| This does sound like a lot of fun. Since this AI only
| reaches silver level, presumably such a contest could be
| quite competitive, with it not yet being clear cut
| whether a human or a tool-assisted human would come out
| on top, for any particular problem.
| Davidzheng wrote:
| Good. I want my AI to use all the advantages it has to
| reinvent the landscape of mathematics
| xrisk wrote:
| Can you give some context on how using Lean benefits?
|
| In my understanding, proofs are usually harder to transcribe
| into Lean, which is why nobody _writes_ proofs using Lean.
|
| What is a nlinarith?
| llwu wrote:
| nlinarith is a proof automation that attempts to finish a
| proof using the simplex method to find a linear combination
| of hypotheses and things that have already been proven, as
| well as some quadratic terms of them.
|
| Docs: https://leanprover-
| community.github.io/mathlib4_docs/Mathlib...
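|
| A minimal example of the kind of goal it can close (Lean 4
| with Mathlib; my example, not AlphaProof output):
|
|   import Mathlib
|
|   -- nlinarith finds the right combination of the squared
|   -- hint: 0 ≤ (a - b)^2 rearranges to a*b ≤ (a² + b²)/2.
|   example (a b : ℝ) : a * b ≤ (a ^ 2 + b ^ 2) / 2 := by
|     nlinarith [sq_nonneg (a - b)]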
| empath75 wrote:
| Don't think of lean as a tool that the ai is using (ie,
| cheating), think of lean plus AlphaProof as a single system.
| There's no reason to draw artificial boundaries around where
| the AI is and where the tools that the AI is using are. Lean
| itself is a traditional symbolic artificial intelligence
| system.
|
| People always want to knock generative AIs for not being able to
| reason, and we've had automated systems that reason perfectly
| well for decades, but for some reason that doesn't count as
| AI to people.
| sebzim4500 wrote:
| The uses of `nlinarith` are very straightforward
| manipulations of inequalities; they would be one or two steps
| for a human too.
| queuebert wrote:
| That's amazing. I was just about to comment that hooking this
| up to Lean [1] would be killer. This must be the way forward
| for higher math, as proofs are getting so complicated that
| almost no one understands all pieces of major proofs.
|
| 1. https://lean-lang.org/
| riku_iki wrote:
| Example of proof from AlphaProof system:
| https://storage.googleapis.com/deepmind-media/DeepMind.com/B...
| golol wrote:
| This is the real deal. AlphaGeometry solved a very limited set of
| problems with a lot of brute force search. This is a much broader
| method that I believe will have a great impact on the way we do
| mathematics. They are really implementing a self-feeding pipeline
| from natural language mathematics to formalized mathematics where
| they can train both formalization and proving. In principle this
| pipeline can also learn basic theory building, like creating
| auxiliary definitions and lemmas. I really think this is the
| holy grail of proof assistance and will allow us to formalize
| most mathematics that we create very naturally. Humans will work
| post-rigorously and let the machine assist with filling in the
| details.
| visarga wrote:
| > a lot of brute force search
|
| Don't dismiss search, it might be brute force but it goes
| beyond human level in Go and silver at IMO. Search is also what
| powers evolution which created us, also by a lot of brute
| forcing, and is at the core of scientific method (re)search.
| Eridrus wrote:
| Search is great, search works, but there was not a tonne to
| learn from the AlphaGeometry paper unless you were
| specifically interested in solving geometry problems.
| kypro wrote:
| My old AI professor used to say that every problem is a
| search problem.
|
| The issue is that to find solutions for useful problems
| you're often searching through highly complex and often
| infinite solution spaces.
| visarga wrote:
| For some problems validation is expensive. Like the
| particle collider or space telescope, or testing the COVID
| vaccine. It's actually validation that is the bottleneck in
| search not ideation.
| pishpash wrote:
| There's no problem with search. The goal is to search most
| efficiently.
| deely3 wrote:
| You mean that by improving search we can solve any
| problem? What if the solution space is infinite? Even if we
| make the search algo 10x or 100x more performant, the solution
| space will still be infinite, no?
| pishpash wrote:
| Gradient descent is a search. Where does it say the
| search space has to be small?
| shkkmo wrote:
| I would argue that no actually searchable solution space is
| really infinite (if only because infinite Turing machines
| can't exist). Finite solution spaces can get more than
| large enough to be intractable.
| smokel wrote:
| What about N? Seems pretty infinite to me, unless with
| "actually" you mean finite in time and space, which would
| make your argument a tautology. Or am I missing
| something?
| Davidzheng wrote:
| Yes and there's a lot of search here too. That's a key to the
| approach
| thomasahle wrote:
| Also AlphaProof had to search for 60 hours for one of the IMO
| problems it solved.
| renonce wrote:
| It's going to be significantly faster very soon; we have
| seen how AlphaGo evolved into KataGo, which is many orders
| of magnitude more compute-efficient.
| randcraw wrote:
| And I understand the upper time limit for each question was
| 4.5 hours. So it solved one question almost immediately,
| two well over the allotted time (60 hrs), and two not at
| all. No medal for you, Grasshopper.
| gowld wrote:
| What makes solving IMO problems hard is usually the limits of
| human memory, pattern-matching, and search, not creativity.
| After all, these are problems that are _already_ solved, and
| it is expected that many people can solve the problems in
| about an hour's time.
|
| That makes it, in principle, similar to or even easier than a
| championship-level chess move, which often takes more than an
| hour for a professional human (with more training than an IMO
| high school student) to solve.
|
| Another interesting concern is that when posing a problem to
| humans, it's fine to pose an "easy" brute-forceable problem,
| but humans, being slow brute-searchers, need to find more
| clever solutions. But if you give such a problem to a
| computer, it can trivialize it. So to test a computer, you
| need to pose non- easily-brute-forceable problems, which are
| harder for the computer than the others, but equally
| difficult for the humans as the other problems are.
| sebzim4500 wrote:
| Ok but if you read the actual solutions they aren't a
| bizarre mess of brute force.
|
| They look like what a human would write if they were trying
| to come up with a formal proof (albeit it does some steps
| in a weird order).
| fmap wrote:
| Agreed, this is a big step forward. Geometry problems are in a
| different class, since you can translate them into systems of
| polynomial equations and use well-known computer algebra
| algorithms to solve them.
|
| By contrast, this kind of open ended formalization is something
| where progress used to be extremely slow and incremental. I
| worked in an adjacent field 5 years ago and I cannot stress
| enough that these results are simply out of reach for
| traditional automated reasoning techniques.
|
| Real automatic theorem proving is also useful for a lot more
| than pure mathematics. For example, it's simple to write out an
| axiomatic semantics for a small programming language in Lean
| and pose a question of the form "show that there exists a
| program which satisfies this specification".
|
| If this approach scales it'll be far more important than any
| other ML application that has come out in the last few years.
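|
| A toy of that "existence" shape (mine, just to show the idea):
|
|   -- A tiny expression language, its semantics, and a
|   -- synthesis-style spec: "some program evaluates to 5".
|   inductive Expr
|     | const : Nat → Expr
|     | add   : Expr → Expr → Expr
|
|   def eval : Expr → Nat
|     | .const n => n
|     | .add a b => eval a + eval b
|
|   -- A proof is a witness program plus a check that it meets
|   -- the spec; finding the witness is the hard part in general.
|   example : ∃ e : Expr, eval e = 5 :=
|     ⟨.add (.const 2) (.const 3), rfl⟩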
| cubefox wrote:
| > Agreed, this is a big step forward. Geometry problems are
| in a different class, since you can translate them into
| systems of polynomial equations and use well known computer
| algebra algorithms to solve them.
|
| The blog post indicates the opposite. The geometry problem in
| the IMO problem set was solved by AlphaGeometry 2, which is
| an LLM based on Google's Gemini. LLMs are considered
| relatively general systems. But the other three solved
| problems were proved by AlphaProof, which is a narrow RL
| system that is literally based on AlphaZero, the Go and Chess
| AI. Only its initial (bootstrapping) human training data
| (proofs) were formalized and augmented by an LLM (Gemini).
| EugeneOZ wrote:
| No. It's like you are allowed to use search engines to find a
| solution, nothing more than that.
| geysersam wrote:
| I imagine a system like this will be vastly more useful outside
| the realm of mathematics research.
|
| You don't need to be able to prove very hard problems to do
| useful work. Proving just simple things is often enough. If I
| ask a language model to complete a task, organize some entries
| in a certain way, or schedule this or that, or write code that
| accomplishes X, the result is typically not trustworthy
| directly. But if the system is able to translate parts of the
| problem to logic and find a solution, that might make the
| system much more reliable.
| creata wrote:
| But for it to be 100% trustworthy, _you'd_ have to express
| correctness criteria for those simple tasks as formal
| statements.
| Smaug123 wrote:
| So I am extremely hyped about this, but it's not clear to me how
| much heavy lifting this sentence is doing:
|
| > First, the problems were manually translated into formal
| mathematical language for our systems to understand.
|
| The non-geometry problems which were solved were all of the form
| "Determine all X such that...", and the resulting theorem
| statements are all of the form "We show that the set of all X is
| {foo}". The downloadable solutions from
| https://storage.googleapis.com/deepmind-media/DeepMind.com/B...
| don't make it clear whether the set {foo} was decided by a human
| during this translation step, or whether the computer found it. I
| _want_ to believe that the computer found it, but I can't find
| anything to confirm. Anyone know?
| ocfnash wrote:
| The computer did find the answers itself. I.e., it found "even
| integers" for P1, "{1,1}" for P2, and "2" for P6. It then also
| provided a Lean proof in each case.
| nnarek wrote:
| The formal definition of the first theorem already contains the
| answer to the problem: "{a : ℝ | ∃ k : ℤ, Even k ∧ a = k}"
| (which denotes the set of even integers, as reals). If they say
| that they translated the first problem into a formal definition,
| then it is very interesting how they initially formalized the
| problem without including the answer in it.
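|
| (For reference, the published statement has roughly this
| shape; a paraphrase, not their exact Lean:)
|
|   import Mathlib
|
|   -- IMO 2024 P1: determine all real α such that n divides
|   -- ⌊α⌋ + ⌊2α⌋ + ... + ⌊nα⌋ for every positive integer n.
|   -- The answer, "the even integers", is part of the statement:
|   theorem imo2024_p1 :
|       {α : ℝ | ∀ n : ℕ, 0 < n →
|           (n : ℤ) ∣ ∑ i in Finset.Icc 1 n, ⌊(i : ℝ) * α⌋} =
|       {α : ℝ | ∃ k : ℤ, Even k ∧ α = k} := by
|     sorry  -- AlphaProof's actual proof goes here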
| golol wrote:
| I would expect that in the data on which they train
| AlphaProof they have some concept of a "vague problem",
| which could just look like
|
| {Formal description of the set in question} = ?
|
| And then AlphaProof has to find candidate descriptions of
| this set and prove a theorem that they are equal to the
| above.
|
| I doubt they would claim to solve the problem if they
| provided half of the answer.
| puttycat wrote:
| > I doubt they would claim to solve the problem if they
| provided half of the answer.
|
| Stranger things have happened
| riku_iki wrote:
| it's not clear if the theorem is the actual input formal
| definition, or if the formal definition was in a different form.
| cygaril wrote:
| Come up with many possible answers, formalize them all, and
| then try to prove or disprove each of them.
| Davidzheng wrote:
| This is probably partially what they did idk why it's
| downvoted lol
| Smaug123 wrote:
| (You're talking to one of the people who was part of the
| project, which is why I took @ocfnash's answer as
| authoritative: they did not cheat.)
| Xelynega wrote:
| If they're talking to the people who are part of the
| project I'd hope the answer would contain detail and not
| expect to be taken as authoritative.
| refulgentis wrote:
| If they're talking to the public I'd hope that they don't
| face an infinite loop of ever-more stringent requirements
| for proof they didn't give it the answer in the question.
| pishpash wrote:
| That's a weird perception of talking to the public.
| refulgentis wrote:
| ?
| pishpash wrote:
| Exactly, a problem and its answer are just different ways
| of describing the same object. Every step of a proof is a
| transformation/translation of the same object. It would be
| disingenuous to say that _some_ heavy lifting isn't done
| in formalizing a problem but it seems that step is also
| performed by a machine:
|
| "We established a bridge between these two complementary
| spheres by fine-tuning a Gemini model to automatically
| translate natural language problem statements into formal
| statements, creating a large library of formal problems of
| varying difficulty."
|
| I'm confused, is the formalization by Gemini or "manually"?
| Which is it?
| Davidzheng wrote:
| Can you elaborate on how it makes guesses like this? Does it
| do experiments before? Is it raw LLM? Is it feedback loop
| based on partial progress?
| Sharlin wrote:
| "AlphaProof is a system that trains itself to prove
| mathematical statements in the formal language Lean. It
| couples a pre-trained language model with the AlphaZero
| reinforcement learning algorithm, which previously taught
| itself how to master the games of chess, shogi and Go."
| JKCalhoun wrote:
| Yeah, I am not clear on the degree to which this system and
| LLMs are related. Are they related? Or is AlphaProof a
| complete tangent to ChatGPT and its ilk?
| gowld wrote:
| It's not an English LLM (Large Language Model).
|
| It's a math Language Model. Not even sure it's a Large
| Language Model. (Maybe shares a foundational model with
| an English LLM; I don't know)
|
| It learns mathematical statements, and generates new
| mathematical statements, then uses search techniques to
| continue. Similar to Alpha Go's neural network, what
| makes it new and interesting is how the NN/LLM part makes
| smart guesses that drastically prune the search tree,
| before the brute-force search part.
|
| (This is also what humans do to solve math probrems. But
| humans are really, really slow at brute-force search, so
| we rely almost entirely on the NN pattern-matching and
| analogy-making part.)
| sebzim4500 wrote:
| My reading of it is that it uses the same architecture as
| one of the Gemini models but does not share any weights
| with it. (i.e it's not just a finetune)
| nextos wrote:
| These kinds of LLMs are also very interesting for software
| engineering. It's just a matter of replacing Lean with
| something that is more oriented towards proving software
| properties.
|
| For example, write a formal specification of a function
| in Dafny or Liquid Haskell and get the LLM to produce
| code that is formally guaranteed to be correct. Logic-
| based + probability-based ML.
|
| All GOFAI ideas are still very useful.
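|
| In Lean terms the same idea looks like this (a toy spec of
| mine): the theorem is the specification, and the checker
| rejects any implementation that fails it.
|
|   def double (n : Nat) : Nat := n + n
|
|   -- The spec: double really is multiplication by two. If the
|   -- generated code were wrong, no proof could exist.
|   theorem double_spec (n : Nat) : double n = 2 * n := by
|     unfold double
|     omega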
| dooglius wrote:
| The linked page says
|
| > While the problem statements were formalized into Lean by
| hand, the answers within the problem statements were generated
| and formalized by the agent.
|
| However, it's unclear what initial format was given to the
| agents that allowed this step
| Smaug123 wrote:
| FWIW, GPT-4o transcribed a screenshot of problem 1 perfectly
| into LaTeX, so I don't think "munge the problem into machine-
| readable form" is per se a difficult part of it these days
| even if they did somehow take shortcuts (which it sounds like
| they didn't).
| pclmulqdq wrote:
| Comparing "turn photo into LaTeX" to "translate theorems
| into Lean" is like comparing a child's watercolor drawing
| to the Mona Lisa.
| Smaug123 wrote:
| ... no? After the LaTeX output, I told stock GPT4o that
| the answer was "all even integers", and asked for the
| statement in Lean. I had to make _two changes_ to its
| output (both of which were compile-time errors, not
| misformalisations), and it gave me the formalisation of
| the difficult direction of the problem.
|
| Both changes were trivial: it had one incorrect (but
| unnecessary) import, and it used the syntax from Lean 3
| instead of Lean 4 in one lambda definition. A system that
| was trained harder on Lean would not make those mistakes.
|
| The one _actual_ error it made was in not proposing that
| the other direction of the "if and only if" is required.
| Again, I am quite confident that this formalisation
| failure mode is not hard to solve in a system that is,
| like, actually trained to do this.
|
| Obviously formalising _problems that a working
| mathematician solves_ is dramatically harder than
| formalising IMO problems, and is presumably _way_ ahead
| of the state of the art.
| lmm wrote:
| > I am quite confident that this formalisation failure
| mode is not hard to solve in a system that is, like,
| actually trained to do this.
|
| Why?
| sebzim4500 wrote:
| I think that's exaggerating a bit. If you are familiar
| with both Lean and LaTeX then I think transcribing these
| problems to Lean only takes about twice as long as
| transcribing them to LaTeX.
| pclmulqdq wrote:
| So if Lean was used to find the answers, where exactly is the
| AI? A thin wrapper around Lean?
| dooglius wrote:
| Lean checks that the proof is valid, it didn't find the
| proof.
| xrisk wrote:
| Lean is just the language, presumably to drive the AI
| towards the program ("the proof").
| pishpash wrote:
| The AI is the "solver network", which is the (directed)
| search over solutions generated by Lean. The AI part is doing
| an efficient search, I suppose.
|
| I'm also waiting for my answer on the role of the Gemini
| formalizer, but reading between the lines, it looks like it
| was only used during training the "solver network", but not
| used in solving the IMO problems. If so then the hyping is
| greatly premature, as the hybrid formalizer/solver is the
| whole novelty of this, but it's not good enough to use end-
| to-end?
|
| You cannot say AlphaProof learned enough to solve problems
| if formalization made them easier to solve in the first
| place! You can say that the "solver network" part learned
| enough to solve formalized problems better than prior
| training methods.
| sdenton4 wrote:
| Think of problems in NP - you can check the answer
| efficiently, but finding the answer to check is the hard
| part... This is basically what we're looking at here: The
| proof checker can quickly evaluate correctness, but we need
| something to produce the proof, and that's the hard part.
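|
| The asymmetry in toy form, for subset-sum (verification is
| linear, search is exponential):
|
|   from itertools import combinations
|
|   nums, target = [3, 9, 8, 4, 5, 7], 15
|
|   def verify(subset):              # the "proof checker": O(n)
|       return sum(subset) == target
|
|   def search():                    # the hard part: O(2^n)
|       for r in range(len(nums) + 1):
|           for c in combinations(nums, r):
|               if verify(c):
|                   return c
|
|   print(search())  # (8, 7)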
| golol wrote:
| > When presented with a problem, AlphaProof generates solution
| candidates and then proves or disproves them by searching over
| possible proof steps in Lean.
|
| To me, this sounds like AlphaProof receives a "problem",
| whatever that is (how do you formalize "determine all X such
| that..."? One is asked to show that an abstract set is actually
| some easily understandable set...). Then it generates candidate
| Theorems, persumably in Lean. I.e. the set is {n: P(n)} for
| some formula or something. Then it searches for proofs.
|
| I think if AlphaProof did not find {foo} but it was given, then
| it would be very outrageous to claim that it solved the
| problem.
|
| I am also very hyped.
| zerocrates wrote:
| Interesting that they have a formalizer (used to create the
| training data) but didn't use it here. Not reliable enough?
| summerlight wrote:
| To speak generally, that translation part is _much_ easier than
| the proof part. The problem with automated translation is that
| the translation result might be incorrect. This happens a lot
| when even people try formal methods by their hands, so I guess
| the researchers concluded that they 'll have to audit every
| single translation regardless of using LLM or whatever tools.
| thomasahle wrote:
| You'd think that, but Timothy Gowers (the famous
| mathematician they worked with) wrote
| (https://x.com/wtgowers/status/1816509817382735986)
|
| > However, LLMs are not able to autoformalize reliably, so
| they got them to autoformalize each problem many times. Some
| of the formalizations were correct, but even the incorrect
| ones were useful as training data, as often they were easier
| problems.
|
| So they didn't actually solve autoformalization, which is why
| they still needed humans to translate the input IMO 2024 problems.
|
| The reason why formalization is harder than you think is that
| there is no way to know if you got it right. You can use
| Reinforcement Learning with proofs and have a clear signal
| from the proof checker. We don't have a way to verify
| formalizations the same way.
| llwu wrote:
| > We don't have a way to verify formalizations the same
| way.
|
| While there is no perfect method, it is possible to use the
| agent to determine if the statement is false, has
| contradictory hypotheses, or a suspiciously short proof.
| thrdbndndn wrote:
| > However, LLMs are not able to autoformalize reliably, so
| they got them to autoformalize each problem many times.
| Some of the formalizations were correct, but even the
| incorrect ones were useful as training data, as often they
| were easier problems.
|
| A small detail wasn't clear to me: for these incorrectly
| formalized problems, how do they get the correct _answer_
| as ground truth for training? Have a human manually
| solve them?
|
| (In contrast to problems actually from "a huge database of
| IMO-type problems", they do have answers for these
| already).
| adrianN wrote:
| You write proofs in a formal language that can be machine
| checked. If the checker is happy, the proof is correct
| (unless there is a bug in the checker, but that is
| unlikely).
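|
| For example, the checker accepts this proof term only because
| `Nat.add_comm` really has the stated type:
|
|   example (a b : Nat) : a + b = b + a := Nat.add_comm a b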
| raincole wrote:
| They said the incorrectly formalized ones are usually
| easier, so I assume they just hire humans to solve them
| in the old way until the AI is smart enough to solve
| these easier problems.
| summerlight wrote:
| > A small detail wasn't clear to me: for these
| incorrectly formalized problems, how do they get the
| correct answer as ground truth for training? Have a human
| to manually solve them?
|
| Formal proofs can be mechanically checked for correctness.
| We just don't know what the answer is. Think of it as
| an extremely rigorous type system that typically requires
| really long type annotations, where the annotation itself is a
| complex program. So if AlphaProof happens to generate a
| proof that passes this checker, we know that it's
| correct.
| ajross wrote:
| > To speak generally, that translation part is much easier
| than the proof part.
|
| To you or me, sure. But I think the proof that it isn't for
| this AI system is that they didn't do it. Asking a modern LLM
| to "translate" something is a pretty solved problem, after
| all. That argues strongly that what was happening here is not
| a "translation" but something else, like a semantic
| distillation.
|
| If you ask an AI (or person) to prove the halting problem,
| they can't. If you "translate" the question into a specific
| example that does halt, they can run it and find out.
|
| I'm suspicious, basically.
| kurthr wrote:
| As is often the case, creating a well-formed problem statement
| takes as much knowledge (if not work) as finding the
| solution.
|
| But seriously, if you can't ask the LLM to solve the right
| question, you can't really expect it to give you the right
| answer unless you're really lucky. "I'm sorry, but I think you
| meant to ask a different question. You might want to check the
| homework set again to be sure, but here's what I think you
| really want."
| gowld wrote:
| The article says
|
| > AlphaProof solved two algebra problems and one number theory
| problem by _determining the answer_ and proving it was correct.
| sebzim4500 wrote:
| I, as someone with a maths degree who hasn't done this kind
| of thing for half a decade, was able to immediately guess the
| solution to (1), but actually proving it is much harder.
| balls187 wrote:
| What was the total energy consumption required to achieve this
| result (both training and running)?
|
| And how much CO2 was released into Earth's atmosphere?
| ozten wrote:
| Compared to all of the humans who compete at this level and
| their inputs and outputs for the trailing 5 years.
| balls187 wrote:
| And?
|
| The result is (likely) net energy consumption, resulting in
| (likely) net CO2 emissions.
|
| So, what did it cost us for this achievement in AI?
|
| EDIT TO ADD: It's fair to think that such a presser should
| not include answers to my questions. But, it's also fair to
| want that level of transparency given we are dealing with
| climate change.
| regularfry wrote:
| You're not wrong and _in general_ the conclusion is that AI
| emits less CO2 than a human performing the same task.
| Whether that's true for this specific task is worth
| asking, as is the question of how efficient such a process
| can be made.
| amelius wrote:
| There's no energy limit in the IMO rules.
| balls187 wrote:
| The point isn't IMO rules.
|
| It's that we are living in a period of time where there are
| very real consequences of nearly a century of unchecked CO2
| due to human industry.
|
| And AI (like crypto before it) requires considerable energy
| consumption. Because of which, I believe we (people who
| believe in AI) need to hold companies accountable by very
| transparently disclosing those energy costs.
| amelius wrote:
| What if at some point AI figures out a solution to climate
| change?
| Sivart13 wrote:
| Why would we be any more likely to implement it, relative
| to the solutions that humans have already figured out for
| climate change?
| sensanaty wrote:
| Well we can be confident in the knowledge that techbros
| might finally take the issue seriously if an AI tells
| them to!
| ks2048 wrote:
| I know this is not an uncommon opinion in tech circles,
| but I believe it's an insane thing to hang humanity's hopes
| on. There's no reason to think AI will be omnipotent.
| balls187 wrote:
| There is not, but there is plenty of historical evidence
| that scientific and technological progress has routinely
| addressed humanity's crisis du jour.
| staunton wrote:
| You mean new and incredibly effective ways to shape
| public opinion? I guess it might. But then someone would
| still have to use it for that purpose...
| password54321 wrote:
| > need to hold companies accountable by very transparently
| disclosing those energy costs.
|
| And if they do, then what? If it is "too high" do we delay
| research because we need to keep the world how it is for
| you? What about all the other problems others face that
| could be solved by doubling down on compute for AI
| research?
| balls187 wrote:
| > And if they do, then what? If it is "too high" do we
| delay research because we need to keep the world how it
| is for you?
|
| First, it's keeping the world how it is for _all of us_,
| not just me.
|
| Second, to answer your question, I think that is a
| decision for _all of us_ to weigh in on, but before we
| can do that, we must be informed as best as we can.
|
| Do sacrifices have to be made for the greater good?
| Absolutely. Do for-profit mega corporations get to make
| those decisions without consent from the public? No.
| xyzzy123 wrote:
| They are 100% accountable for paying the bill from their
| electricity provider.
| majikaja wrote:
| It would be nice if on the page they included detailed
| descriptions of the proofs it came up with, more information
| about the capabilities of the system and insights into the
| training process...
|
| If the data is synthetic and covers a limited class of problems I
| would imagine what it's doing mostly reduces to some basic search
| pattern heuristics which would be of more value to understand
| than just being told it can solve a few problems in three days.
| cygaril wrote:
| Proofs are here: https://storage.googleapis.com/deepmind-
| media/DeepMind.com/B...
| majikaja wrote:
| I found those, I just would have appreciated if the content
| of the mathematics wasn't sidelined to a separate download as
| if it's not important. I felt the explanation on the page was
| shallow, as if they just want people to accept it's a black
| box.
|
| All I've learnt from this is that they used an unstated
| amount of computational resources just to basically brute-
| force what a human is already capable of doing in far less
| time.
| Davidzheng wrote:
| Very few humans can do this, even after years of training.
| Please don't trivialize it.
| necovek wrote:
| Very few humans go after this type of training. In my
| "math talent" school (many of the Serbian/Yugoslavian
| medal winners came from it), at most a dozen students
| "trained" for this over 4 high school generations (500
| students).
|
| Problems are certainly not trivial, but humans are not
| really putting all their effort into it either, and the
| few that do train for it, on average medal 50% of the
| time and get a silver or better 25% of the time (by
| design) with much less time available to do the problems.
| fancyfredbot wrote:
| I'm seriously jealous of the people getting paid to work on this.
| Sounds great fun and must be incredibly satisfying to move the
| state of the art forward like that.
| Mithriil wrote:
| Best we can do then is keep ourselves up to date and give our
| support!
| bearjaws wrote:
| C'mon, you're meant to be re-configuring 3,292,329 lines of
| YAML for K8s.
|
| (/s)
| psbp wrote:
| It's funny that if I could describe my entire career, it
| would probably be something similar to software
| janitor/maintenance worker.
|
| I guess I should have pursued a PhD when I was younger.
| onemoresoop wrote:
| You probably mean envious not jealous.
| yalok wrote:
| I'm learning something new today. In some other languages
| these two are usually the same word.
| Vinnl wrote:
| Huh. So I tried to look it up just now and I'm not sure if
| I understand the difference. (To the extent that there is
| one - apparently one _can_ mean the other, but I imagine
| they're usually used as follows.)
|
| It looks like "jealous" is more being afraid of losing
| something you have (most commonly e.g. a spouse's
| affection) to someone else, whereas "envious" is wanting
| what someone else has?
| onemoresoop wrote:
| Correct.
| onemoresoop wrote:
| No worries. I learned this a while ago too (I was also using
| jealous instead of envious and vice versa myself). From my
| understanding, jealous is used when you have something that
| is threatened by some external factor, e.g. a partner or a
| friend having more fun with somebody else. Envious is when
| you covet something that you do not currently have but wish
| to, which in this case is playing with exciting tech.
| GuB-42 wrote:
| I don't know about that. A lot of the work that should have
| been very satisfying turned out to be boring as hell, if not
| toxic, while at the same time, some apparently mundane stuff
| turned out to be really exciting.
|
| I found the work environment to be more important than the
| subject when it comes to work satisfaction. If you are working
| on a world changing subject with a team of assholes, you are
| going to have a bad time, some people really have a skill for
| sucking the fun out of everything, and office politics are
| everywhere, especially on world changing subjects.
|
| On the other hand, you can have a most boring subject, say
| pushing customer data to a database, and have the time of your
| life: friendly team, well designed architecture, time for
| experimentation and sharing of knowledge, etc... I have come to
| appreciate the beauty of a simple thing that just works. It is
| so rare, maybe even more rare than scientific breakthroughs.
|
| Now, you can also have an awesome work environment _and_ an
| awesome subject, it is like hitting the jackpot... and a good
| reason to be envious.
| lolinder wrote:
| This is a fun result for AI, but a _very_ disingenuous way to
| market it.
|
| IMO contestants aren't allowed to bring in paper tables, much
| less a whole theorem prover. They're given two 4.5 hour sessions
| (9 hours total) to solve all the problems with nothing but
| pencils, rulers, and compasses [0].
|
| This model, meanwhile, was wired up to a theorem prover and took
| three solid days to solve the problems. The article is extremely
| light on details, but I'm assuming that most of that time was
| guess-and-check: feed the theorem prover a possible answer, get
| feedback, adjust accordingly.
|
| If the IMO contestants were given a theorem prover and three days
| (even counting breaks for sleeping and eating!), how would
| AlphaProof have ranked?
|
| Don't get me wrong, this is a fun project and an exciting result,
| but their comparison to silver medalists at the IMO is just
| feeding into the excessive hype around AI, not accurately
| representing its current state relative to humanity.
|
| [0] 5.1 and 5.4 in the regulations: https://www.imo-
| official.org/documents/RegulationsIMO.pdf
| gjm11 wrote:
| Working mathematicians mostly don't use theorem provers in
| their work, and find that when they do they go significantly
| more slowly (with of course the compensating advantage of
| guaranteeing no mistakes in the final result).
|
| A theorem prover is probably more useful for the typical IMO
| problem than for the typical real research problem, but even so
| I'd guess that even with a reasonable amount of training most
| IMO contestants would not do much better for having access to a
| theorem prover.
|
| Having three days would be a bigger deal, for sure. (But from
| "computers can't do this" to "computers can do this, but it
| takes days" is generally a much bigger step than from
| "computers can do this, but it takes days" to "computers can do
| this in seconds".)
| golol wrote:
| The point is not to compare AI and humans, it is to compare AI
| and IMO-level math problems. It's not for sport.
| lolinder wrote:
| They're literally comparing AI to human IMO contestants.
| "DeepProof solves 4/6 IMO problems correctly" would be the
| non-comparison version of this press release and would give a
| better sense for how it's actually doing.
| golol wrote:
| "Solving IMO problems at Silver-Medal level" is pretty much
| equivalent to solving something like 4/6 problems. It is
| only a disingenious comparison if you want to read it as a
| comparison. I mean yea, many people will, but I don't care
| anout them. People who are technically interested in this
| know that the point is not to have a competition of AI with
| humans.
| lolinder wrote:
| > but I don't care about them
|
| It's great that you feel safe being so aloof, but I
| believe we have a responsibility in tech to turn down the
| AI hype valve.
|
| The NYT is currently running a piece with the headline
| "Move Over, Mathematicians, Here Comes AlphaProof".
| People see that, and people react, and we in tech are not
| helping matters by carelessly making false comparisons.
| golol wrote:
| Why? Why is hype bad? What actual harm does it cause?
|
| Also the headline is fair, as I do believe that
| AlphaProof demonstrates an approach to mathematics that
| will indeed invade mathematicians' workspaces. And I say
| that as a mathematician.
| Davidzheng wrote:
| For sure. I feel like mathematicians are not paying
| attention (maybe deliberately); we have a real shot at
| solving some incredibly hard open problems in the next few
| years.
| visarga wrote:
| I think search-based AI is on a different level than
| imitation models like GPT. This is not a hallucinating
| model, and it could potentially discover things that are
| not written in any books.
|
| Search is amazing. Protein folding searches for the
| lowest energy configuration. Evolution searches for
| ecological fit. Culture searches for progress, and
| science for understanding. Even placing a foot on the
| ground searches for dynamic equilibrium. Training an ML
| model searches for the best parameters to fit a dataset.
| Search is universal. Supervised learning is just
| imitative; search is extrapolative.
| kenjackson wrote:
| Exactly. The point is whether we can eventually get AI to
| solve problems which we as humans can't, not whether we can
| win the IMO with a computer.
| blackbear_ wrote:
| And why aren't you complaining that human participants could
| train and study for thousands of hours before attempting the
| problems? And that the training materials they used were
| themselves created and perfected by hundreds of other people,
| who had themselves spent countless hours studying?
| lolinder wrote:
| Because so did the AI?
| blackbear_ wrote:
| That's exactly my point? Both had to learn?
| Davidzheng wrote:
| I can tell you, as someone who could have gotten bronze (I
| was too weak for the team) and is now a math PhD, that I most
| likely would not have scored as well as AlphaProof in three
| days. In most problems, either you find an idea soon or it
| can take much, much longer. It's just not a matter of steady
| work and constant progress.
| knotthebest wrote:
| Skill issue tbh
| Davidzheng wrote:
| Agreed :'(
| StefanBatory wrote:
| Wow, that's absolutely impressive to hear!
|
| Also it's making me think that in 5-10 years almost all tasks
| involving computer scientists or mathematicians will be done by
| AI. Perhaps people going into trades had a point.
| visarga wrote:
| Everything that allows for cheap validation is going that way.
| Math, code, or things we can simulate precisely. LLM ideation +
| Validation is a powerful combination.
| quirino wrote:
| I honestly expected the IOI (International Olympiad in
| Informatics) to be "beaten" much earlier than the IMO. There's
| AlphaCode, of course, but as of the latest update I don't think
| it was quite at "silver medal" level. And available LLMs are
| probably not even at "honourable mention" level.
|
| I wonder if some class of problems will emerge that human
| competitors are able to solve but are particularly tricky for
| machines. And which characteristics these problems will have
| (e.g. they'll require some sort of intuition or visualization
| that is not easily formalized).
|
| Given how much of a dent LLMs are already making on beginner
| competitions (AtCoder recently banned using them on ABC rounds
| [1]), I can't help but think that soon these competitions will be
| very different.
|
| [1] https://info.atcoder.jp/entry/llm-abc-rules-en
| oXman038 wrote:
| IOI problems are closer to IMO combinatorics problems than to
| the other IMO problem types. That might be the reason for the
| delay. I personally liked only the combinatorics problems in
| the IMO; that's why I dropped the math track and went to the
| IOI instead.
|
| I feel the reason combinatorics is harder for AI models is the
| same reason LLMs are not great at reasoning about anything out
| of distribution. LLMs are good pattern recognizers, and
| fascinating at this point, but simple tasks like counting
| intersections in Venn diagrams require more strategy and less
| pattern recognition. Pure NN-based models seem like they won't
| be enough to solve these problems. AI agents and RL are
| promising.
|
| I don't know anything about Lean, but I am curious whether
| proofs of combinatorial problems can be represented as well as
| number theory or algebra. If combinatorial solutions are
| always closer to natural language, the failure of LLMs is
| expected, or at least we can assume it might take more time to
| make them better. I am assuming here that solutions of
| combinatorial problems in the IMO are more natural-language
| oriented and rely more on common sense/informal logic compared
| to geometry or number theory problems.
| Davidzheng wrote:
| Are you convinced there's a "reason" AI today is worse at
| combinatorics? Like, I don't see enough evidence that it's not
| an accident.
| robinhouston wrote:
| Some more context is provided by Tim Gowers on Twitter [1].
|
| Since I think you need an account to read threads now, here's a
| transcript:
|
| Google DeepMind have produced a program that in a certain sense
| has achieved a silver-medal performance at this year's
| International Mathematical Olympiad.
|
| It did this by solving four of the six problems completely, which
| got it 28 points out of a possible total of 42. I'm not quite
| sure, but I think that put it ahead of all but around 60
| competitors.
|
| However, that statement needs a bit of qualifying.
|
| The main qualification is that the program needed a lot longer
| than the human competitors -- for some of the problems over 60
| hours -- and of course much faster processing speed than the poor
| old human brain.
|
| If the human competitors had been allowed that sort of time per
| problem they would undoubtedly have scored higher.
|
| Nevertheless, (i) this is well beyond what automatic theorem
| provers could do before, and (ii) these times are likely to come
| down as efficiency gains are made.
|
| Another qualification is that the problems were manually
| translated into the proof assistant Lean, and only then did the
| program get to work. But the essential mathematics was done by
| the program: just the autoformalization part was done by humans.
|
| As with AlphaGo, the program learnt to do what it did by teaching
| itself. But for that it needed a big collection of problems to
| work on. They achieved that in an interesting way: they took a
| huge database of IMO-type problems and got a large language model
| to formalize them.
|
| However, LLMs are not able to autoformalize reliably, so they got
| them to autoformalize each problem many times. Some of the
| formalizations were correct, but even the incorrect ones were
| useful as training data, as often they were easier problems.
|
| It's not clear what the implications of this are for mathematical
| research. Since the method used was very general, there would
| seem to be no obvious obstacle to adapting it to other
| mathematical domains, apart perhaps from insufficient data.
|
| So we might be close to having a program that would enable
| mathematicians to get answers to a wide range of questions,
| provided those questions weren't _too_ difficult -- the kind of
| thing one can do in a couple of hours.
|
| That would be massively useful as a research tool, even if it
| wasn't itself capable of solving open problems.
|
| Are we close to the point where mathematicians are redundant?
| It's hard to say. I would guess that we're still a breakthrough
| or two short of that.
|
| It will be interesting to see how the time the program takes
| scales as the difficulty of the problems it solves increases. If
| it scales with a similar ratio to that of a human mathematician,
| then we might have to get worried.
|
| But if the function human time taken --> computer time taken
| grows a lot faster than linearly, then more AI work will be
| needed.
|
| The fact that the program takes as long as it does suggests that
| it hasn't "solved mathematics".
|
| However, what it does is way beyond what a pure brute-force
| search would be capable of, so there is clearly something
| interesting going on when it operates. We'll all have to watch
| this space.
|
| 1. https://x.com/wtgowers/status/1816509803407040909?s=46
| visarga wrote:
| > If the human competitors had been allowed that sort of time
| per problem they would undoubtedly have scored higher.
|
| Or if AlphaProof used more compute they could have slashed that
| time to 1/10 or less. It's arbitrary as long as we don't define
| what compute the AI should be entitled to use here.
| skywhopper wrote:
| Except it didn't. The problem statements were hand-encoded into a
| formal language by human experts, and even then only one problem
| was actually solved within the time limit. So, claiming the work
| was "silver medal" quality is outright fraudulent.
| noud wrote:
| I had exactly the same feeling when reading this blog. Sure,
| the techniques used to find the solutions are really
| interesting. But they claim more than they achieved. The
| problem statements are not available in Lean, and the time
| limit is 2 x 4.5 hours, not 3 days.
|
| The article claims they have another model that can work
| without formal languages, and that it looks very promising. But
| they don't mention how well that model performed. Would that
| model also perform at silver medal level?
|
| Also note that if the problems are provided in a formal
| language, you can always find the solution in a finite amount
| of time (provided a solution exists): you can brute-force over
| all possible proofs until you find one that proves the
| statement. This may take a very long time, but it will find
| the solutions eventually. You would always solve all the
| problems and win the IMO at gold medal level. AlphaProof seems
| to do something similar, but takes smarter decisions about
| which possible solutions to try and which ones to skip. What
| would be the reason they don't achieve gold?
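|
| The brute-force idea in miniature (a toy sketch, not how
| AlphaProof works: the domain and "checker" are invented so the
| example runs; in a real prover the checker would be the Lean
| kernel):
|
|     from itertools import product
|
|     # "Proofs" are sequences of moves; the checker verifies
|     # that a candidate reaches the goal. Enumerating by length
|     # is complete but exponential in proof length.
|     MOVES = {"double": lambda x: 2 * x, "inc": lambda x: x + 1}
|
|     def check(start, goal, proof):
|         value = start
|         for move in proof:
|             value = MOVES[move](value)
|         return value == goal
|
|     def brute_force(start, goal, max_len=10):
|         for n in range(1, max_len + 1):
|             for proof in product(MOVES, repeat=n):
|                 if check(start, goal, proof):
|                     return proof
|         return None
|
|     # -> ('double', 'double', 'inc', 'double')
|     print(brute_force(1, 10))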
| refulgentis wrote:
| Goalposts at the moon, FUD at "but what if it's obviously fake?".
|
| Real, exact quotes from the top comments at 1 PM EST.
|
| "I want to believe that the computer found it, but I can't find
| anything to confirm."
|
| "Curing cancer will require new ideas"
|
| "Maybe they used 10% of all of GCP [Google compute]"
| brap wrote:
| Are all of these specialized models available for use? Like, does
| it have an API?
|
| I wonder because, on one hand, they seem very impressive and
| groundbreaking; on the other, it's hard to imagine why more than
| a handful of researchers would use them.
| creata wrote:
| > it's hard to imagine why more than a handful of researchers
| would use them
|
| If you could automatically prove that your concurrency protocol
| is safe, or that your C program has no memory management
| mistakes, or that your algorithm always produces the same
| results as a simpler, more obviously correct but less optimized
| algorithm, I think that would be a huge benefit for many
| programmers.
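|
| For a flavor of that last case, a minimal Lean 4 sketch (the
| definitions are made up for the example): a tail-recursive sum
| proved to agree with the obvious one.
|
|     def sum : List Nat → Nat
|       | []      => 0
|       | x :: xs => x + sum xs
|
|     def sumTR (acc : Nat) : List Nat → Nat
|       | []      => acc
|       | x :: xs => sumTR (acc + x) xs
|
|     -- The optimized version provably computes the same result.
|     theorem sumTR_eq (l : List Nat) (acc : Nat) :
|         sumTR acc l = acc + sum l := by
|       induction l generalizing acc with
|       | nil => simp [sumTR, sum]
|       | cons x xs ih => simp [sumTR, sum, ih, Nat.add_assoc]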
| gallerdude wrote:
| Sometimes I wonder if in 100 years, it's going to be surprising
| to people that computers had a use before AI...
| onemoresoop wrote:
| If AI stays in computer form, though...
| necovek wrote:
| AI is simply another form of what we've been doing since the
| dawn of computers: expressing real world problems in the form
| of computations.
|
| While there are certainly some huge jumps in compute power,
| theory of data transformation and availability of data to
| transform, it would surprise me if computers in 100 years did
| not still rely on a combination of well-defined and well-
| understood algorithms and AI-inspired tools that do the same
| thing but on a much bigger scale.
|
| If not for any other reason, then because there are so many
| things where you can easily produce a great, always correct
| result simply by doing very precise, obvious and simple
| computation.
|
| We've had computers and digital devices for a long while now,
| yet we still rely heavily on mechanical contraptions. Sure, we
| improve them with computers (eg. think brushless motors), but I
| don't think anyone today would be surprised by how anyone
| managed to design these same devices (hair dryers, lawn mowers,
| internal combustion engines...) before computers.
| Jun8 wrote:
| Tangentially: I found it fascinating to follow along the solution
| to Problem 6: https://youtu.be/7h3gJfWnDoc (aquaesulian is a nod
| to the ancient name of Bath). There's no advanced math and each
| step is quite simple, I'd guess at about an 8th-grade level.
|
| Note that the 6th question is generally the hardest ("final
| boss") and many top performers couldn't solve it.
|
| I don't know what Lean is or how to see the AI's proofs, but an
| AI system that can explain such a question on par with the
| YouTuber above would be fantastic!
| pmcf wrote:
| I read this as "In My Opinion" and really thought this was about
| AI dealing with opinionated people. Nope. HN is still safe. For
| now...
| lo_fye wrote:
| Remember when people thought computers would never be able to
| beat a human Grand Master at chess? Ohhh, pre-2000 life, how I
| miss thee.
| AyyEye wrote:
| Parlor tricks. Wake me up when AI can reliably identify which
| number is circled, at the level of my two-year-old.
| michael_nielsen wrote:
| A good brief overview here from Tim Gowers (a Fields Medallist,
| who participated in the effort), explaining and contextualizing
| some of the main caveats:
| https://x.com/wtgowers/status/1816509803407040909
| HL33tibCe7 wrote:
| This is kind of an ideal use-case for AI, because we can say with
| absolute certainty whether its solution is correct, completely
| eliminating the problem of hallucination.
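|
| The loop shape, as a toy sketch (propose and verify are
| invented stand-ins: the former for an LLM sampling candidates,
| the latter for a formal checker), shows why: hallucinated
| candidates waste compute but can never be accepted.
|
|     import random
|
|     def propose(goal, rng):
|         return rng.randint(0, 100)    # toy "candidate proof"
|
|     def verify(goal, candidate):
|         return candidate * candidate == goal    # toy "kernel"
|
|     def solve(goal, budget=10_000, seed=0):
|         rng = random.Random(seed)
|         for _ in range(budget):
|             candidate = propose(goal, rng)
|             # Only candidates that pass the checker survive.
|             if verify(goal, candidate):
|                 return candidate
|         return None
|
|     print(solve(49))  # 7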
| cs702 wrote:
| In 1997, machines defeated a World Chess Champion for the first
| time, using brute-force "dumb search." Critics noted that while
| "dumb search" worked for chess, it might not necessarily be a
| general strategy applicable to other cognitive tasks.[a]
|
| In 2016, machines defeated a World Go Champion for the first
| time, using a clever form of "dumb search" that leverages
| compute, DNNs, reinforcement learning (RL), and self-play.
| Critics noted that while this fancy form of "dumb search" worked
| for Go, it might not necessarily be a general strategy applicable
| to other cognitive tasks.[a]
|
| In 2024, machines solved insanely hard math problems at the
| Silver Medal level in an International Math Olympiad for the
| first time, using a clever form of "dumb search" that leverages
| compute, DNNs, RL, and a formal language. Perhaps "dumb search"
| over cleverly pruned spaces isn't as dumb as the critics would
| like it to be?
|
| ---
|
| [a] http://www.incompleteideas.net/IncIdeas/BitterLesson.html
| klyrs wrote:
| A brute force search has perfect knowledge. Calling it "dumb"
| encourages bad analogies -- it's "dumb" because it doesn't
| require advanced reasoning. It's also "genius" because it
| always gets the right answer eventually. It's hugely expensive
| to run.
|
| And you keep shifting the goalpost on what's called "dumb"
| here.
| adrianN wrote:
| You might have missed the point.
| bdjsiqoocwk wrote:
| Tell us what the point was, I think a lot of us are missing
| it.
| BiteCode_dev wrote:
| Moving the goal post of dumb search.
| jmalicki wrote:
| None of the above are brute force search.
| TheDudeMan wrote:
| So all forms of search are dumb in your view?
| outlore wrote:
| I think OP is arguing the opposite. In other words, critics
| of AI innovations call them "dumb search". The goal post
| keeps moving.
| cs702 wrote:
| _Exactly_. They are not dumb search!
| sobellian wrote:
| From whence are you quoting the phrase "dumb search?"
| fspeech wrote:
| By "dumb" I assume you meant brute-force. Search, as opposed to
| extrapolation, is what actually produces suprise or creative
| results, whether it happens in a person's head or on a
| computer. The issue is to produce the heuristics that can let
| one push back the combinatorial explosion.
| tux3 wrote:
| The success in Go was very much not dumb search. Where dumb
| search had failed to achieve the level of even a very weak
| player, the neural net's "intuition" about good moves without
| any search was already very strong. Only the combination was
| superhuman, and that was anything but the dumb search that had
| been tried before.
|
| Today's announcement is also not about proving Lean theorems by
| "dumb search". The success is about search + neural networks.
|
| You're attacking critics for criticizing the solution that has
| failed, while confusing it for the solution that works to this
| day.
| cs702 wrote:
| That's in fact precisely my point:
|
| Clever forms of "dumb search" that leverage compute, DNNs,
| RL, self-play, and/or formal languages are _not_ dumb at all.
|
| I put the words "dumb search" in quotes precisely because I
| think critics who dismiss AI progress as such are missing the
| point.
|
| We're not in disagreement :-)
| tux3 wrote:
| >Clever forms of "dumb search" that leverage compute, DNNs,
| RL, self-play, and formal languages are not dumb at all
|
| Right. My point is that you're attacking a position that no
| real critic holds. Of course we're in agreement then!
|
| "Clever forms of dumb search are not dumb" feels a little
| like kicking down open doors. We were always going to
| agree.
| cs702 wrote:
| _> My point is that you 're attacking a position that no
| real critic holds_
|
| I'm not so sure. I wrote my comment after seeing quite a
| few comments on other threads here that read like fancy
| versions of "this is just brute-force search over
| cleverly pruned trees." Search the other threads here,
| and you'll see what I mean.
| Davidzheng wrote:
| By their nature, hard math problems do not have fast algorithmic
| solutions--you can only solve them by search or clever search.
| Mathematicians have heuristics that help us decide intuitively
| what's going to work and what can make progress. But in the
| end, there is only search.
| netcan wrote:
| Well written.
|
| It kinda had to be this way, I think. There's a something from
| nothing problem. Douglas Adams brilliantly starts at this
| point.
|
| We don't understand something from nothing. We don't even have
| the language to describe it. Concepts like "complexity" are
| frustratingly resistant to formalization.
|
| "There is no free will." Has recently resurged as a
| philosophical position... like it did in response to Newton's
| mechanics.
|
| Matter from void. Life from minerals. Consciousness from
| electrons. Free will from deterministic components. Smart
| reasoning & intelligent rationalisation from dumb search,
| computation, DNNs and such.
|
| I don't think this debate was supposed to be ended by anything
| short of empirical demonstration.
|
| Endnote: Deep Blue's victory over Garry was a bunch of
| preprogrammed bulls--t. Rematch!
| creer wrote:
| One of the lessons of LLMs was that much of "common sense"
| is covered in the written language corpus - that is, perhaps,
| basic human intelligence is covered by language.
|
| Should we then expect less with mathematics, where written
| language is the normal way knowledge is propagated, and where
| formal proofs are wanted? An important distinction here is the
| coupling of search (not LLM for this one), a formal math
| language, and theorem proving. Math intelligence may not be
| merely the math written corpus, but adding the formal language
| and theorem proving sounds pretty powerful.
|
| All this still lacks self-directed goals. An intention. For now
| that's taken care of by the human asking questions.
| imtringued wrote:
| >that is, perhaps, basic human intelligence is covered by
| language.
|
| It isn't. That's why LLMs are still terrible at basic
| reasoning tasks. The dataset doesn't contain the human
| intelligence, just the end results. Nobody is training their
| LLMs on brain scans.
| mikeknoop wrote:
| High efficiency "search" is necessary to reach AGI. For
| example, humans don't search millions of potential answers to
| beat ARC Prize puzzles. Instead, humans use our core experience
| to shrink the search space "intuitively" and deterministically
| check only a handful of ideas. I think deep-learning guided
| search is an incredibly promising research direction.
| dmitrygr wrote:
| > First, the problems were manually translated into formal
| mathematical language
|
| That is more than half the work of solving them. Headline should
| read "AI solves the simple part of each IMO problem at silver
| medal level".
| jerb wrote:
| Is the score of 28 comparable to the score of 29 here?
| https://www.kaggle.com/competitions/ai-mathematical-olympiad...
| Davidzheng wrote:
| No. I would say it is more impressive than 50/50 there.
| (Source: I used to do math comps back in the day; sorry, it's
| not a great source.)
| gus_massa wrote:
| IIUC the American Math Olympiad has 3 rounds. Winning the last
| one almost guarantees a gold medal.
|
| The link you posted has problems with a difficulty between the
| first and second round, which are much easier.
|
| I took a quick look at the recent list of problems in the first
| and second round. I expect this new AI to get a solid 50/50
| points in this test.
| machiaweliczny wrote:
| Good, now use DiffusER on algebra somehow please
| gowld wrote:
| Why is it so hard to make an AI that can translate an informally
| specified math problem (and Geometry isn't even so informal) into
| a formal representation?
| nybsjytm wrote:
| To what extent is the training and structure of AlphaProof
| tailored specifically to IMO-type problems, which typically have
| short solutions using combinations of a small handful of specific
| techniques?
|
| (It's not my main point, but it's always worth remembering - even
| aside from any AI context - that many top mathematicians can't do
| IMO-type problems, and many top IMO medalists turn out to be
| unable to solve actual problems in research mathematics. IMO
| problems are generally regarded as somewhat niche.)
| Davidzheng wrote:
| The last statement is largely correct (though idk about the
| IMO medalists who are unable to solve actual problems--most
| mathematicians can't solve most open problems). But I kind of
| disagree with the assessment of IMO problems--the search space
| is huge; if it were as you say, it would be easy to search.
| nybsjytm wrote:
| No, I don't mean that the search space is small. I just mean
| that there are special techniques which are highly relevant
| for IMO-type problems. It'd be interesting to know how
| important that knowledge was for the design and training of
| AlphaProof.
|
| In other words, how does AlphaProof fare on mathematical
| problems which aren't in the IMO style? (As such exceptions
| comprise most mathematical problems)
| Davidzheng wrote:
| Probably less well? They rely heavily on the dataset of
| existing problems.
| gowld wrote:
| This shows a major gap in AI.
|
| The proofs of these problems aren't interesting. They were
| already known before the AI started work.
|
| What's interesting is _how_ the AI found the proof. The only
| answer we have is "slurped data into a neural network, matched
| patterns, and did some brute search".
|
| What were the ideas it brainstormed? What were the dead-end
| paths? What were the "activations" where the problem seemed
| similar to a certain piece of input, which led to a guess of a
| step in the solution?
| atum47 wrote:
| Oh, the title was changed to International Math Olympiad. I was
| reading IMO as "in my opinion", haha.
| thrance wrote:
| Theorem proving is a single-player game with an insanely big
| search space; I always thought it would be solved long before
| AGI.
|
| IMHO, the largest contributors to AlphaProof were the people
| behind Lean and Mathlib, who took on the daunting task of
| formalizing the _entirety_ of mathematics themselves.
|
| This lack of formalizing in math papers is what killed earlier
| attempts at automation, because AI researchers had to wrestle
| with the human element of figuring out each author's own
| notations, implicit knowledge, skipped proof steps...
| camjw wrote:
| > Theorem proving is a single-player game with an insanely big
| search space, I always thouht it would be solved long before
| AGI.
|
| This seems so weird to me - AGI is undefined as a term imo, but
| why would you expect "producing something generally
| intelligent" (i.e. median human level intelligence) to be
| significantly harder than "this thing is better than Terence
| Tao at maths"?
| thrance wrote:
| My intuition tells me we humans are generally very bad at
| math. Proving a theorem, in an ideal way, mostly involves
| going from point A to point B in the space of all proofs,
| using previous results as stepping stones. This isn't
| particularly a "hard" problem for computers which are able to
| navigate search spaces for various games much more
| efficiently than us (chess, go...).
|
| On the other hand, navigating the real world mostly consists
| in employing a ton of heuristics we are still kind of
| clueless about.
|
| At the end of the day, we won't know before we get there, but
| I think my reasons are compelling enough to think what I
| think.
| tim-kt wrote:
| I don't think that computers have an advantage because they
| can navigate search spaces efficiently. The search space
| for difficult theorems is gigantic. Proving them often
| relies on a combination of experience with rigorous
| mathematics and very good intuition [1] as well as many,
| many steps. One example is the classification of all finite
| simple groups [2], which took about 200 years and a lot of
| small steps. I guess maybe brute forcing for 200 years with
| the technology available today might work. But I'm
| sceptical and sort of hope that I won't be out of a job in
| 10 years. I'm certainly curious about the current
| development.
|
| [1] https://terrytao.wordpress.com/career-advice/theres-
| more-to-...
|
| [2] https://en.m.wikipedia.org/wiki/Classification_of_finit
| e_sim...
| thrance wrote:
| Oh for sure, when I say "soon" it's only relative to AGI.
|
| What I meant to convey, is that theorem proving at least
| is a well-defined problem, and computers have had some
| successes in similar-ish search problems before.
|
| Also I don't think pure brute-force was ever used to
| solve any kind of interesting problem.
|
| Chess engines make use of alpha-beta pruning plus some
| empirical heuristics people came up with over the years.
| Go engines use Monte-Carlo Tree Search with deep
| learning models for node evaluation. Theorem proving, when it
| is solved, will certainly use some kind of neural
| network.
| Davidzheng wrote:
| I also think humans are bad at math. And that we are
| probably better IRL, but maybe IRL has more data anyway.
| raincole wrote:
| Because "Generally Intelligent" is a very broad and vague
| term.
|
| "Better than Terrence Tao at solving certain formalized
| problems" (not necessarily equal to "Better that Terrence Tao
| at maths) isn't.
| camjw wrote:
| Seems fair - I strongly think that "certain formalized
| problems" is very very far away from doing actual maths!
| And that actual maths is sometimes much less about solving
| known conjectures and much more about developing new
| theories.
| Davidzheng wrote:
| They didn't formalize the entirety of math. Good thing the IMO
| doesn't need the entirety. But they didn't even formalize
| enough for the IMO--this is probably why combinatorics wasn't
| solved.
| cynicalpeace wrote:
| Machines have been better than humans at chess for decades.
|
| Yet no one cares. Everyone's busy watching Magnus Carlsen.
|
| We are human. This means we care about what other humans do. We
| only care about machines insofar as it serves us.
|
| This principle is broadly extensible to work and art. Humans will
| always have a place in these realms as long as humans are around.
| camjw wrote:
| Sure but if an AI can prove e.g the Goldbach conjecture then
| that is a bfd.
| hyperbovine wrote:
| What if the proof were incomprehensible to humans?
| camjw wrote:
| I think that is unlikely to be the case - the classic
| example of a proof that humans "can't understand" is the
| Four Colour Theorem, but that's because the proof is a
| reduction to like 100000 special cases which are checked by
| computer.
|
| To what extent is the proof of Fermat's Last Theorem
| "incomprehensible to humans" because only like a dozen
| people on the planet could truly understand it - I don't
| know.
|
| The point of new proofs is really to learn new things about
| mathematics, and I'm sure we would learn something from a
| proof of Goldbach's conjecture.
|
| Finally, if it's not peer reviewed then it's not a real
| proof, eh?
| steego wrote:
| It shouldn't count. We need to require it to be able to ELI5
| the proof of Goldbach's conjecture to an actual class of
| _graduating_ kindergartners.
| fanatic2pope wrote:
| If it cannot explain how it was proven, was it actually
| proven?
| cynicalpeace wrote:
| No. Funny how these discussions too often devolve into
| semantics lol.
| cynicalpeace wrote:
| How is that a counterpoint?
| awahab92 wrote:
| Magnus Carlsen basically quit because computers ruined chess.
| As did Kasparov.
|
| Fischer was probably the last great player who was unassisted
| by tools.
| hyperbovine wrote:
| ?? Carlsen is very much active -- look up the YouTube channel
| EpicChess for an extremely entertaining recap of what he's up
| to recently.
| karmakurtisaani wrote:
| Carlsen still plays at the highest level. He just didn't want
| to do the world championship anymore; it wasn't worth the
| effort after winning it so many times.
| christianqchung wrote:
| > This principle is broadly extensible to work and art
|
| Nah, as a consumer it makes no difference to me if a meat
| packing factory or Amazon warehouse employs 5000 or 5 people.
| To art, this principle is totally real, but for work, it only
| applies to some/most of it.
| cynicalpeace wrote:
| Read the next sentence: "We only care about machines insofar
| as it serves us."
|
| Imagine a machine doing "work" that only serves itself and
| other machines, and does no service to humanity. It would
| have no economic value. In fact, the whole concept of "work"
| only makes sense if it is assigned economic value by humans.
| christianqchung wrote:
| Then we agree, but your chess example made it sound like if
| a machine could automatically pack meat with little human
| intervention, people wouldn't want it. That's also not the
| next sentence. How is it broadly applicable to work?
| smokel wrote:
| _> Everyone 's busy watching Magnus Carlsen._
|
| Actually, I was looking up Elo ratings of the top _computer_
| chess players, and learned that it is not that trivial to
| compare these, due to differences in hardware requirements and
| whatnot.
| cynicalpeace wrote:
| Are you arguing computer chess players are as popular as
| human chess players?
| zone411 wrote:
| I don't think this principle extends to math proofs. It's much,
| much easier to verify a proof than to create it, and a second
| proof will just be a footnote. Not many mathematicians will
| want to work on that. That said, there is a lot of distance
| between IMO and the frontiers of research math.
| cynicalpeace wrote:
| Why does it not extend to math proofs?
| ertgbnm wrote:
| I'm sure humans will always enjoy chess and art regardless of
| how much better AI is at them. In the same way, there will
| probably always be mathematics hobbyists who study math for
| fun. But I seriously doubt that in the near future there will
| be mathematicians publishing new advancements that aren't
| mostly or entirely discovered by AI. A human might get
| credit for a proof for asking the initial question, but there
| is pretty much no world where a computer can easily solve a
| meaningful mathematical problem but we insist on a human
| solving it more slowly and expensively instead.
| cynicalpeace wrote:
| My point was where does the concept of "meaningful" come
| from?
|
| The proof will only have value if it's meaningful to us.
| dan_mctree wrote:
| I'm curious if we'll see a world where computers can solve math
| problems so easily that we'll be overwhelmed by all the results
| and stop caring. The role of humans might change to asking the
| computer interesting questions that we care about.
| Davidzheng wrote:
| I think mathematicians will still care
| klysm wrote:
| I'm not sure what stop caring really means - like stop caring
| about the result, or the implications?
| xyst wrote:
| Billions of dollars spent building this, GWh of energy used to
| train it. And the best it could do is "silver"?
|
| Got to be kidding me. We are fucked.
| NoblePublius wrote:
| A lot of words to say second place
| stonethrowaway wrote:
| It's like bringing a rocket launcher to a fist fight but I'd like
| to use these math language models to find gaps in logic when
| people are making online arguments. It would be an excellent way
| to verify who has done their homework.
| zhiQ wrote:
| Coincidentally, I just posted about how well LLMs handle adding
| long strings of numbers:
| https://userfriendly.substack.com/p/discover-how-mistral-lar...
| hulitu wrote:
| > AI solves International Math Olympiad problems at silver medal
| level
|
| > In the official competition, students submit answers in two
| sessions of 4.5 hours each. Our systems solved one problem within
| minutes and took up to three days to solve the others.
|
| Why not compare with students who are given 3 days to submit an
| answer ? /s
| lumb63 wrote:
| Can someone explain why proving and math problem solving is not a
| far easier problem for computers? Why does it require any
| "artificial intelligence" at all?
|
| For example, suppose a computer is asked to prove the sum of two
| even numbers is an even number. It could pull up its list of
| "things it knows about even numbers", namely that an even number
| modulo 2 is 0. Assuming the first number is "a" and the second is
| "b", then it knows a=2x and b=2y for some x and y. It then knows
| via the distributive property that the sum is 2(x+y), which
| satisfies the definition of an even number.
|
| What am I missing that makes this problem so much harder than
| applying a finite and known set of axioms and manipulations?
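|
| For reference, that even-number argument goes through almost
| verbatim in Lean 4 (a sketch assuming Mathlib for the obtain
| and ring tactics; the names are made up):
|
|     import Mathlib.Tactic
|
|     def IsEven (n : Int) : Prop := ∃ k, n = 2 * k
|
|     theorem isEven_add {a b : Int} (ha : IsEven a)
|         (hb : IsEven b) : IsEven (a + b) := by
|       obtain ⟨x, hx⟩ := ha  -- a = 2x
|       obtain ⟨y, hy⟩ := hb  -- b = 2y
|       -- a + b = 2(x + y), by distributivity (the ring tactic)
|       exact ⟨x + y, by rw [hx, hy]; ring⟩
|
| The hard part is that for IMO problems the chain of such steps
| is long and the right steps are far from obvious, as the
| replies below explain.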
| zone411 wrote:
| The problems in question require much, much more complex
| proofs. Try example IMO problems yourself and see if they don't
| require much intelligence: https://artofproblemsolving.com/wiki
| /index.php/IMO_Problems_.... And then keep in mind that
| research math is orders of magnitude more complex still.
| runeblaze wrote:
| Another answer is that 3SAT and co can be seen as distilled
| variants of proving statements. Well, 3SAT is famously hard.
| psb217 wrote:
| In a sense, the model _is_ simply applying a finite and known
| set of axioms and manipulations. What makes this hard in
| practice is that the number of possible ways in which to
| perform multiple steps of this sort of axiomatic reasoning
| grows exponentially with the length of the shortest possible
| solution for a given problem. This is similar to the way in
| which the tree of possible futures in games like go/chess grows
| exponentially as one tries to plan further into the future.
|
| This makes it natural to address these problems using similar
| techniques, which is what this research team did. The "magic"
| in their solution is the use of neural nets to make good
| guesses about which branches of these massive search trees to
| explore, and about how good any particular
| branch is even before they reach the end of the branch. These
| tricks let them (massively) reduce the effective branching
| factor and depth of the search trees required to produce
| solutions to math problems or win board games.
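|
| A toy sketch of that pruning (the heuristic is hand-made and
| stands in for a learned value network; the domain is invented
| so the example runs):
|
|     import heapq
|
|     MOVES = {"double": lambda x: 2 * x, "inc": lambda x: x + 1}
|
|     def guided_search(start, goal, budget=1000):
|         # priority = guessed distance to goal (small = promising)
|         frontier = [(abs(goal - start), start, [])]
|         seen = {start}
|         for _ in range(budget):
|             if not frontier:
|                 break
|             _, value, proof = heapq.heappop(frontier)
|             if value == goal:
|                 return proof
|             for name, f in MOVES.items():
|                 nxt = f(value)
|                 # prune branches that overshoot or repeat
|                 if nxt <= goal and nxt not in seen:
|                     seen.add(nxt)
|                     heapq.heappush(frontier,
|                         (abs(goal - nxt), nxt, proof + [name]))
|         return None
|
|     # -> ['double', 'double', 'inc', 'double']
|     print(guided_search(1, 10))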
| ComplexSystems wrote:
| What you're missing is that this kind of thing has arbitrarily
| been declared "artificial intelligence" territory. Once the
| ability of computers to do it has been established, it will no
| longer be artificial intelligence territory; at that point
| it'll just be another algorithm.
| booleandilemma wrote:
| Proofs require a certain ingenuity that computers just don't
| have, imo. A computer would never be able to come up with
| something like Cantor's diagonalization proof on its own.
| Davidzheng wrote:
| Are you sure AlphaProof can't?
| zone411 wrote:
| The best discussion is here:
| https://leanprover.zulipchat.com/#narrow/stream/219941-Machi...
| data_maan wrote:
| It's bullshit. AlphaGeometry can't even solve the Pythagorean
| theorem.
|
| They're not open-sourcing anything.
|
| This is a dead end on which no further research can be built.
|
| It violates pretty much every principle of incremental
| improvement on which science is based. It's here just for hype,
| and the 300+ comments prove it.
| necovek wrote:
| This is certainly impressive, but whenever IMO is brought up, a
| caveat should be put out: medals are awarded to 50% of the
| participants (high school students), with 1:2:3 ratio between
| gold, silver and bronze. That puts all gold and silver medalists
| among the top 25% of the participants.
|
| That means that "AI solves IMO problems better than 75% of the
| students", which is probably even more impressive.
|
| But, "minutes for one problem and up to 3 days for each remaining
| problem" means that this is unfortunately not a true
| representation either. If these students were given up to 15 days
| (5 problems at "up to 3 days each") instead of 9h, there would
| probably be more of them that match or beat this score too.
|
| It really sounds like AI solved only a single problem in the 9h
| students get, so it certainly would not be even close to the
| medals. What's the need to taint the impressive result with
| apples-to-oranges comparison?
|
| Why not be more objective and report that it took longer but was
| able to solve X% of problems (or scored X out of N points)?
| muglug wrote:
| > What's the need to taint the impressive result with apples-
| to-oranges comparison?
|
| Most of DeepMind's research is a cost-centre for the company.
| These press releases help justify the continued investment both
| to investors and to the wider public.
| Davidzheng wrote:
| In my opinion (not Google s) the only reason they didn't get
| gold this year (apart from being unlucky on problem selection)
| is that they didn't want to try for any partial credit in P3
| and P5. They are so close to the cut off and usually
| contestants with a little bit of progress can get 1 point. But
| i guess they didn't want to get a gold on a technicality--it
| would be bad press. So they settled in a indisputable silver
| tardygrade wrote:
| One key difference between giving humans more time and giving
| computer programs more time is that historically we have had
| more success making the latter run faster than the former.
| acchow wrote:
| But computers get faster each year, so even with zero progress
| in actual AI, this will reach human-student speeds in a few
| years (a 40x speed-up is needed).
| dinobones wrote:
| I see DeepMind is still playing around with RL + search
| algorithms, except now it looks like they're using an LLM to
| generate state candidates.
|
| I don't really find this that impressive. With enough compute
| you could just do n-of-10,000 LLM generations to "brute force"
| a difficult problem, and you'll get there eventually.
| richard___ wrote:
| Sigh. Just wrong
___________________________________________________________________
(page generated 2024-07-25 23:00 UTC)