[HN Gopher] AlphaGeometry: An Olympiad-level AI system for geometry
___________________________________________________________________
AlphaGeometry: An Olympiad-level AI system for geometry
Author : FlawedReformer
Score : 303 points
Date : 2024-01-17 16:22 UTC (6 hours ago)
(HTM) web link (deepmind.google)
(TXT) w3m dump (deepmind.google)
| marojejian wrote:
| AlphaGeometry paper:
| https://www.nature.com/articles/s41586-023-06747-5
| marojejian wrote:
| Though this particular model doesn't sound generalizable, the
| neuro-symbolic approach seems very promising to me:
|
| - linking the (increasingly powerful) "system 1" tools that are
| most of current ML with more structured "system 2" tools, like
| logical proof generation, which can plan and check the
| veracity / value of output.
|
| - System 2 chugs along 'till it gets stuck, then system 1 jumps
| in to provide an intuitive guess on what part of state space to
| check next.
|
| - Here they leveraged the ability to generate proofs by computer
| to create a data set of 100m proofs, enabling scalable
| self-supervised learning. Seems to me the symbolic domains are
| well formed to allow such generation of data, which, while low
| value in each instance, might allow valuable pre-training in
| aggregate.
|
| Putting these elements together is an approach that could get us
| quite far.
|
| The key milestone will be moving away from the need to use
| specific formal / symbolic domains, and to generate a pretrained
| system that can generalize the skills learned from those domains.
| gizmo686 wrote:
| > The key milestone will be moving away from the need to use
| specific formal / symbolic domains, and to generate a
| pretrained system that can generalize the skills learned from
| those domains.
|
| You do not need to solve everything at once. This approach has
| the potential to revolutionize both math and programming by
| moving formal verification from being a niche tool into a
| regular part of every practitioner's toolbox.
|
| It also completely solves (within the domain it applies to) one
| of the most fundamental problems of AI that the current round is
| calling "hallucinations"; however, that solution only works
| because we have a non-AI system to prove correctness.
|
| At a high level, this approach is not really that new. Biochem
| has been using AI to help find candidate molecules, which are
| then verified by physical experimentation.
|
| Combinatorial game AI has been using AI as an input to
| old-fashioned Monte Carlo searches.
| jvanderbot wrote:
| Moreover, it was not so recently that people began training
| networks to help make guesses for branch-and-bound-type
| solvers.
| initplus wrote:
| >however that solution only works because we have a non AI
| system to prove correctness
|
| But this is actually a really common scenario. Checking a
| solution for correctness is often much easier than actually
| finding a correct solution in the first place.
| empath-nirvana wrote:
| > however that solution only works because we have a non AI
| system to prove correctness.
|
| I'm not sure why you are calling it non-AI. There's no reason
| why some AI system has to be some single neural network like
| GPT and not a hacked together conglomeration of a bunch of
| neural networks and symbolic logic systems. Like is it
| cheating to use a SAT solver? Is a SAT solver itself not
| artificial intelligence of a kind?
| sega_sai wrote:
| Just in case the DOI link from the press-release doesn't work for
| you -- https://www.nature.com/articles/s41586-023-06747-5 here's
| the paper.
| apsec112 wrote:
| Links to the paper appear broken, but here is a link:
|
| https://www.nature.com/articles/s41586-023-06747-5
| apsec112 wrote:
| Interesting that the transformer used is tiny. From the paper:
|
| "We use the Meliad library for transformer training with its base
| settings. The transformer has 12 layers, embedding dimension of
| 1,024, eight heads of attention and an inter-attention dense
| layer of dimension 4,096 with ReLU activation. Overall, the
| transformer has 151 million parameters, excluding embedding
| layers at its input and output heads. Our customized tokenizer is
| trained with 'word' mode using SentencePiece and has a vocabulary
| size of 757. We limit the maximum context length to 1,024 tokens
| and use T5-style relative position embedding. Sequence packing is
| also used because more than 90% of our sequences are under 200 in
| length."
| ipnon wrote:
| This suggests there may be some more low-hanging fruit in the
| hard sciences for transformers to bite at, so long as the
| problems can be properly formulated. This was not a problem of
| scaling, it seems.
| albertzeyer wrote:
| It's not tiny; this is a quite normal size outside the field of
| LLMs, e.g. for ordinary language models, translation models, or
| acoustic models. Some people would even call this large.
| apsec112 wrote:
| It's tiny by the standards of transformers; pretty sure most
| transformers trained (across all domains) are larger than
| this.
| albertzeyer wrote:
| No. Where did you get this from?
|
| Looking at NeurIPS 2023:
|
| https://openreview.net/group?id=NeurIPS.cc/2023/Conference#
| t...
|
| Some random spotlight papers:
|
| - https://openreview.net/pdf?id=YkBDJWerKg: Transformer
| (VPT) with 248M parameters
|
| - https://openreview.net/pdf?id=CAF4CnUblx: Vit-B/16 with
| 86M parameters
|
| - https://openreview.net/pdf?id=3PjCt4kmRx: Transformer
| with 282M parameters
|
| Also, in my field (speech recognition, machine translation,
| language modeling), all using Transformer variants, this is
| a pretty normal model size.
| kristjansson wrote:
| It's tiny by the standards of LLMs; _L_LM might give you
| some indication as to where they fit in the overall
| landscape.
| hcarlens wrote:
| Also relevant: https://aimoprize.com/
|
| ($10m prize for models that can perform well at IMO)
| ChatGTP wrote:
| "Our AI system"
|
| It's our maths. No one else owns it?
|
| Funny game the AI field is playing now. One-upmanship seems to be
| the aim of the game.
| drdeca wrote:
| The AI system that they produced.
|
| "Our" doesn't always imply ownership/title. I do not own my
| dad.
| empath-nirvana wrote:
| "What the I.M.O. is testing is very different from what creative
| mathematics looks like for the vast majority of mathematicians,"
| he said. ---
|
| Not to pick on this guy, but this is ridiculous goal post
| shifting. It's just astounding what people will hand-wave away as
| not requiring intelligence.
| mb7733 wrote:
| No, that sort of thing has been said about math competitions
| for a long time. It's not a new argument put forward as
| something against AI.
|
| An analogy with software is that math competitions are like
| (very hard) leetcode.
|
| There was an article posted on HN recently that is related:
| https://benexdict.io/p/math-team
| esafak wrote:
| What is the argument; that math competitions are easy for
| computers but hard for humans?
| nitwit005 wrote:
| It's a quote from the article. The argument is naturally
| there.
| esafak wrote:
| I was referring to your leetcode analogy; those too are
| hard for humans.
| silveraxe93 wrote:
| The usual argument is:
|
| We test developer skill by giving them leetcode problems,
| but leetcode, while requiring programming skill, is nothing
| like a real programmer's job.
| mb7733 wrote:
| Like I said, it's got nothing to do with computers or AI.
| This point of view predates any kind of AI that would be
| capable of doing either.
|
| The similarity with leetcode & job interviews is as follows:
| to excel at math competitions one must grind problems and
| learn tricks, in order to quickly solve problems in a high-
| pressure environment.
|
| And, just like how solving leetcode problems is pretty
| different than what a typical computer scientist or
| software engineer does, doing math competitions is pretty
| different than what a typical mathematician does.
| drawnwren wrote:
| Yeah, this would be akin to saying "What leetcode is testing is
| very different from what professional programming looks like".
|
| It isn't untrue, but both are considered metrics by the
| community. (Whether rightfully or not seems irrelevant to ML).
| adverbly wrote:
| I guess it kind of depends on what a goal post should
| represent. If it represents an incremental goal then obviously
| you are correct.
|
| I think the ultimate goal though is clearly that AI systems
| will be able to generate new high-value knowledge and proofs.
| I think realistically the final goal post has always been at
| that point.
| GPerson wrote:
| I'm not sure if it's goal post shifting or not, but it is a
| true statement.
| nybsjytm wrote:
| As said by others, that is an absolutely commonplace saying,
| albeit usually having nothing whatsoever to do with AI. See for
| instance https://terrytao.wordpress.com/career-advice/advice-
| on-mathe...
|
| Also if you read about the structure of AlphaGeometry, I think
| it's very hard to maintain that it "requires intelligence." As
| AI stuff goes, it's pretty comprehensible and plain.
| 1024core wrote:
| From the Nature article:
|
| > Note that the performance of GPT-4 performance on IMO problems
| can also be contaminated by public solutions in its training
| data.
|
| Doesn't anybody proofread in Nature?
| chefandy wrote:
| Not a subject matter expert, so forgive me if the question is
| unintentionally obtuse, but it seems like a reasonable
| statement. They seem to be inferring that problems with
| existing public solutions wouldn't be a good indicator of
| performance in solving novel problems-- likely a more important
| evaluative measure than how fast it can replicate an answer
| it's already seen. Since you couldn't know if that solution was
| in its training data, you couldn't know if you were doing that
| or doing a more organic series of problem solving steps,
| therefore contaminating it. What's the problem with saying
| that?
| mlyle wrote:
| They're referring to "performance of performance".
|
| Not that it's a big deal. I notice problems like this slip
| into my writing more and more without detection as I get
| older :/
|
| I don't see the sentence in the Nature paper, though.
| chefandy wrote:
| Ah-- I've got super bad ADHD so I usually don't even catch
| things like that. I'm a terrible editor. I wouldn't be
| surprised if the paper authors were paying attention to
| this thread, and gave a call to Nature after seeing the
| parent comment.
| 1024core wrote:
| The text is still there.
|
| Paper: https://www.nature.com/articles/s41586-023-06747-5
|
| Section title: Geometry theorem prover baselines
|
| Second paragraph under that. 3rd sentence from the end of
| the para.
| dr_kiszonka wrote:
| Weird. I can't find the sentence you are quoting in the paper.
| 1024core wrote:
| Paper: https://www.nature.com/articles/s41586-023-06747-5
|
| Section title: Geometry theorem prover baselines
|
| Second paragraph under that. 3rd sentence from the end of the
| para.
| gumballindie wrote:
| I suppose we can now replace high schoolers.
| dandanua wrote:
| and their teachers
| m3kw9 wrote:
| So create AI modules for every niche subject and have an
| OpenAI-like function-calling system to invoke them?
| kfarr wrote:
| Yes, but I would argue geometry is not niche; it is the
| foundation of reasoning about the physical world, not just
| math problems.
| Imnimo wrote:
| I'm very curious how often the LM produces a helpful
| construction. Surely it must be doing better than random chance,
| but is it throwing out thousands of constructions before it finds
| a good one, or is it able to generate useful proposals at a rate
| similar to human experts?
|
| They say in the paper, "Because the language model decoding
| process returns k different sequences describing k alternative
| auxiliary constructions, we perform a beam search over these k
| options, using the score of each beam as its value function. This
| set-up is highly parallelizable across beams, allowing
| substantial speed-up when there are parallel computational
| resources. In our experiments, we use a beam size of k = 512, the
| maximum number of iterations is 16 and the branching factor for
| each node, that is, the decoding batch size, is 32."
|
| But I don't totally understand how 512 and 16 translate into
| total number of constructions proposed. They also note that
| ablating beam size and max iterations seems to only somewhat
| degrade performance. Does this imply that the model is actually
| pretty good at putting helpful constructions near the top, and
| only for the hardest problems does it need to produce thousands?
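|
| For anyone unfamiliar, a generic beam search of that shape looks
| roughly like this minimal sketch; expand() and score() are
| hypothetical stand-ins for the decoder and its sequence score,
| not the paper's actual code:
|
|     import heapq
|
|     def beam_search(root, expand, score, beam_size=512,
|                     branching=32, max_iterations=16):
|         # Keep the `beam_size` best partial candidates; expand
|         # each into up to `branching` children per iteration.
|         beam = [root]
|         for _ in range(max_iterations):
|             children = []
|             for node in beam:
|                 children.extend(expand(node, branching))
|             if not children:
|                 break
|             # Retain only the highest-scoring candidates.
|             beam = heapq.nlargest(beam_size, children, key=score)
|         return beam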
| kingkongjaffa wrote:
| thanks TIL what Beam search is
| https://en.wikipedia.org/wiki/Beam_search
| refulgentis wrote:
| IMHO: this bumps, hard, against limitations of language /
| human-machine analogies.
|
| But let's try -- TL;DR 262,144, but don't take it literally:
|
| - The output of a decoding function is a token. ~3/4 of a word.
| Let's just say 1 word.
|
| - _Tokens considered per output token = 262,144_: beam_size *
| branching_factor * max_iterations = 512 * 32 * 16 = 262,144.
|
| - Let's take their sample solution and get a word count:
| https://storage.googleapis.com/deepmind-media/DeepMind.com/B...
|
| - _Total tokens for solution = 2,289_
|
| - _Total # of tokens considered = 600,047,616_ = 262,144 * 2,289
|
| - Hack: "number of solutions considered" = total tokens
| considered / total tokens in solution = _262,144_ (same # as the
| number of tokens considered at each iteration step, which makes
| sense)
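|
| The same arithmetic as a quick sanity check, just restating the
| numbers above:
|
|     beam_size, branching, max_iterations = 512, 32, 16
|     per_token = beam_size * branching * max_iterations  # 262_144
|     solution_tokens = 2_289
|     total = per_token * solution_tokens            # 600_047_616
|     print(per_token, total, total // solution_tokens)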
| riku_iki wrote:
| My understanding is that they encoded the domain in several
| dozen mechanical rules described in extended data table 1, and
| then did a transformer-guided brute-force search for solutions.
| dang wrote:
| See also https://www.nytimes.com/2024/01/17/science/ai-computers-
| math...
|
| (via https://news.ycombinator.com/item?id=39030186, but we'll
| merge that thread hither)
| empath-nirvana wrote:
| https://archive.is/https://www.nytimes.com/2024/01/17/scienc...
| neonate wrote:
| https://web.archive.org/web/20240117162133/https://www.nytim.
| ..
| TFortunato wrote:
| I'm a big fan of approaches like this, that combine deep learning
| / newer techniques w/ "GOFAI" approaches.
| summerlight wrote:
| It looks like there's some interesting work connecting ML with
| symbolic reasoning (or searching). I'm closer to a layman in
| this area, but IIUC the latter is known to be rife with
| yet-to-be-understood heuristics to prune the solution space,
| and ML models are pretty good in that area. I'm not in a
| position to suggest what needs to happen to further push the
| boundary, but my impression is that the current significant
| blocker is that we don't really have a way to construct a
| self-feedback loop that consistently iterates on and improves
| the model from its own output. If this can be done properly, we
| may see something incredible in a few years.
| rafaelero wrote:
| I didn't look further into their method, but it seems to me
| that symbolic reasoning is only important insofar as it makes
| the solution verifiable. That still is a glorious capability,
| but a very narrow one.
| zodiac wrote:
| The real TIL (to me) is that the previous state-of-the-art could
| solve 10 of these! I'd heard there was a decision algorithm for
| plane geometry problems but I didn't know it was a practical one.
| Some searching turned up
| http://www.mmrc.iss.ac.cn/~xgao/paper/book-area.pdf as a
| reference.
| WhitneyLand wrote:
| It was interesting. One of the reviewers noted that one of the
| generated proofs didn't seem that elegant in his opinion.
|
| Given enough compute, I wonder how much this would be improved by
| having it find as many solutions as possible within the allotted
| time and proposing the simplest one.
| spookie wrote:
| Elegance would require more than this. Reasoning, for one.
| jacobolus wrote:
| The issue is that the computer-generated proofs operate at a
| very low level, step by step, like writing a program in
| assembly language without the use of coherent structure.
|
| The human proof style instead chunks the parts of a solution
| into human-meaningful "lemmas" (helper theorems) and builds
| bodies of theory into well-defined and widely used abstractions
| like complex numbers or derivatives, with a corpus of accepted
| results.
|
| Some human proofs of theorems also have a bit of this flavor of
| inscrutable lists of highly technical steps, especially the
| first time something is proven. Over time, the most important
| theorems are often recast in terms of a more suitable grounding
| theory, in which they can be proven with a few obvious
| statements or sometimes a clever trick or two.
| ackfoobar wrote:
| Euclidean geometry is decidable. Does that make it easier for
| computers, compared to other IMO problems?
| treprinum wrote:
| What happens to science when the most talented kids won't be able
| to compete in ~2 years tops? Would our civilization reach plateau
| or start downward trajectory as there will be no incentive to
| torture oneself to become the best in the world? Will it all fade
| away like chess once computers started beating grandmasters?
| Agingcoder wrote:
| I never studied math to become the best in the world!
| treprinum wrote:
| Are you an Olympiad Gold Medalist? Did you move the field of
| math significantly (beyond a basic PhD)?
| ironlake wrote:
| Chess is immensely more popular since Deep Blue beat Kasparov.
| jstummbillig wrote:
| True, but the fun is currently derived from humans going up
| against humans as a sport. Machines are tools (for learning
| or cheating) but we are not interested in competition with
| them.
|
| How will it work for math, where humans do useful work right
| now? I cannot see battle math becoming a big thing, but maybe
| when some of the tedious stuff goes away, math will be
| philosophy, and poking the smarter system is where the
| entertainment can be had?
| EvgeniyZh wrote:
| I don't think chess faded away
| jeanloolz wrote:
| Chess never faded away. It actually has never been as big as it
| is today.
| treprinum wrote:
| That might be true in the number of active chess players, but
| it's no longer viewed as a peak intellectual game and the
| magic is long gone, just another intense hobby some people
| do, but basic machines can do better.
| westcoast49 wrote:
| It's still viewed as a peak intellectual game. And the
| magic is still there, just watch or listen to the fans of
| Magnus Carlsen when they're viewing, discussing or
| analyzing his games.
| lacker wrote:
| The incentive to compete in the IMO is that it's fun to do math
| contests, it's fun to win math contests, and (if you think that
| far ahead as a high schooler) it looks good on your resume.
| None of that incentive will go away if the computers get better
| at math contests.
| treprinum wrote:
| I'd be super demoralized if anything I could do a future
| pocket machine could do much better and faster. Like my very
| best is not enough to even tread water.
| jhbadger wrote:
| People still compete in foot races even though cars can go
| faster. People still play chess and Go even though computers
| can beat them.
| sdenton4 wrote:
| We just need a Netflix series about a geometry prodigy with
| excellent fashion sense.
| hiAndrewQuinn wrote:
| I've accepted that the human form, with its ~3 pounds of not
| especially optimized brainpower, is basically going to be
| relegated to the same status as demoscene hardware for anything
| that matters after this century.
|
| That's cool by me, though. This bit of demoscene hardware
| experiences qualia, and that combination is weird and cool
| enough to make me want to push myself in new and weird
| directions. That's what play is in a way.
| nybsjytm wrote:
| If I read their paper right, this is legit work (much more legit
| than DeepMind's AI math paper last month falsely advertised as
| solving an open math research problem) but it's still pretty
| striking how far away the structure of it is from the usual idea
| of automated reasoning/intelligence.
|
| A transformer is trained on millions of elementary geometry
| theorems and used as brute search for a proof, which because of
| the elementary geometry context has both a necessarily elementary
| structure and can be easily symbolically judged as true or false.
| When the brute search fails, an extra geometric construction is
| randomly added (like adding a midpoint of a certain line) to see
| if brute search using that extra raw material might work. [edit:
| as corrected by Imnimo, I got this backwards - the brute search
| is just pure brute search, the transformer is used to predict
| which extra geometric construction to add]
|
| Also (not mentioned in the blog post) the actual problem
| statements had to be modified/adapted, e.g. the actual problem
| statement "Let AH1, BH2 and CH3 be the altitudes of a triangle
| ABC. The incircle W of triangle ABC touches the sides BC, CA and
| AB at T1, T2 and T3, respectively. Consider the symmetric images
| of the lines H1H2, H2H3, and H3H1 with respect to the lines T1T2,
| T2T3, and T3T1. Prove that these images form a triangle whose
| vertices lie on W." had to be changed to "Let ABC be a triangle.
| Define point I such that AI is the bisector of angle BAC and CI
| is the bisector of angle ACB. Define point T1 as the foot of I on
| line BC. Define T2 as the foot of I on line AC. Define point T3
| as the foot of I on line AB. Define point H1 as the foot of A on
| line BC. Define point H2 as the foot of B on line AC. Define
| point H3 as the foot of C on line AB. Define point X1 as the
| intersection of circles (T1,H1) and (T2,H1). Define point X2 as
| the intersection of circles (T1,H2) and (T2,H2). Define point Y2
| as the intersection of circles (T2,H2) and (T3,H2). Define point
| Y3 as the intersection of circles (T2,H3) and (T3,H3). Define
| point Z as the intersection of lines X1X2 and Y2Y3. Prove that
| T1I=IZ."
| Imnimo wrote:
| >A transformer is trained on millions of elementary geometry
| theorems and used as brute search for a proof, which because of
| the elementary geometry context has both a necessarily
| elementary structure and can be easily symbolically judged as
| true or false. When the brute search fails, an extra geometric
| construction is randomly added (like adding a midpoint of a
| certain line) to see if brute search using that extra raw
| material might work.
|
| I don't think this is quite right. The brute force search is
| performed by a symbolic solver, not the transformer. When it
| runs out of new deductions, the transformer is asked to suggest
| possible extra constructions (not randomly added).
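|
| In pseudocode, that loop might look like the following minimal
| sketch (deduce() and propose_constructions() are hypothetical
| stand-ins here, not the released code):
|
|     def deduce(facts):
|         # Symbolic-engine stand-in: facts derivable in one
|         # forward-chaining step (stubbed out here).
|         return set()
|
|     def propose_constructions(facts, goal):
|         # Language-model stand-in: suggest auxiliary
|         # constructions, e.g. a midpoint (stubbed out here).
|         return []
|
|     def solve(premises, goal, max_iterations=16):
|         known = set(premises)
|         for _ in range(max_iterations):
|             # Deduction closure: no ML involved here.
|             while True:
|                 new = deduce(known) - known
|                 if not new:
|                     break
|                 known |= new
|                 if goal in known:
|                     return known  # proof trace extracted here
|             # Stuck: ask the model for a new construction. The
|             # real system beam-searches over k = 512 suggestions.
|             proposals = propose_constructions(known, goal)
|             if not proposals:
|                 return None
|             known.add(proposals[0])
|         return None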
| nybsjytm wrote:
| Thanks for the correction, it looks like you're right.
| killerstorm wrote:
| > it's still pretty striking how far away the structure of it
| is from the usual idea of automated reasoning/intelligence.
|
| How so? Reasoning is fundamentally a search problem.
|
| The process you described is exactly the process humans use:
| i.e., make a guess about what's useful, try to work out the
| details mechanically, and if you get stuck, make another guess,
| etc. So the process is like searching through a tree.
|
| People figured out this process back in 1955 (and made a
| working prototype which can prove theorems):
| https://en.wikipedia.org/wiki/Logic_Theorist but it all hinges
| on using good 'heuristics'. Neural networks are relevant here
| as they can extract heuristics from data.
|
| What do you think is "the usual idea of automated reasoning"?
| Some magic device which can solve any problem using a single
| linear pass?
| somewhereoutth wrote:
| > Some magic device which can solve any problem using a
| single linear pass?
|
| You mean like a calculator?
| gowld wrote:
| A calculator that can solve any problem human could, and
| more? Sure.
| jhrmnn wrote:
| I think the intuition here is that an "intelligent" system
| would lean much more heavily on the heuristics than on the
| search, like humans. It's hard to quantify in this case, but
| certainly in the case of chess, when engines were about as
| good as best human players, they were doing orders of
| magnitude larger search than humans. Which made them feel
| more like chess _engines_ than chess AI. AlphaZero certainly
| made a step in the right direction, but it's still far from
| how humans play.
| killerstorm wrote:
| The comment above mentions "brute force" incorrectly. It's
| fundamentally impossible to brute-force a googol of
| combinations...
| amelius wrote:
| Well, what's different is that humans invent new abstractions
| along the way such as complex numbers and Fourier transforms.
| gowld wrote:
| A neural network is nothing but a heap of new abstractions
| from data.
| MrPatan wrote:
| Not a lot of humans do
| amelius wrote:
| When a human has done the same thing many times they tend
| to try to generalize and take shortcuts. And make tools.
| Perhaps I missed something but I haven't seen a neural
| net do that.
| littlestymaar wrote:
| True, but we're talking about Olympiad level math skills
| here.
| bnprks wrote:
| I appreciate that the authors released code and weights with
| their paper! This is the first high-profile DeepMind paper I can
| recall that has runnable inference code + checkpoints released.
| (Though I'm happy to be corrected by earlier examples I've
| missed)
|
| I don't yet see a public copy of the training set / example
| training code, but still this is a good step towards providing
| something other researchers can build on -- which is after all
| the whole point of academic papers!
| pk-protect-ai wrote:
| Yep. I'm missing the datasets as well. They have generated
| 100M synthetic examples ... Were these examples generated with
| AlphaGeometry? Where is the filtering code and the initial
| input used to generate these synthetics?
|
| Am I wrong that they are using a T5 model? At least they are
| using the SentencePiece T5 vocabulary.
|
| How many GPU hours have they spent training this model? Which
| training parameters were used?
|
| Don't get me wrong, I find this system fascinating; it is what
| applied engineering should look like. But I'd like to know more
| about the training details and the initial data they used, as
| well as the methods of synthetic data generation.
| qwertox wrote:
| Google DeepMind keeps publishing these things despite having
| missed the AI-for-the-people train, so I'm getting more and
| more the feeling that they have such a huge arsenal of AI tech
| piling up in their secret rooms that it's as if they're already
| simulating entire AI-based societies.
|
| As if they're stuck in a loop of "let's improve this a bit more
| until it is even better", while AGI is already a solved problem
| for them.
|
| Or they're just an uncoordinated, chaotic bunch of AI teams
| without a leader who unifies them. By leader I don't mean Demis
| Hassabis, but rather someone like Sundar Pichai.
| nybsjytm wrote:
| This is at almost the polar opposite end of the spectrum from
| "AGI," it's centered on brute search.
| danielmarkbruce wrote:
| They are not polar opposites.
| achierius wrote:
| I think Occam's razor would have something to say about the
| relative likelihood of those two options.
| senseiV wrote:
| > simulating entire AI-based societies.
|
| Didn't they already have scaled-down simulations of this?
| sidcool wrote:
| And it's been open sourced. The code and model both!
| zbyforgotp wrote:
| This sounds like the famous system 1 and system 2 thinking,
| with the prover doing the systematic system 2 thinking and the
| LLM doing the intuitive system 1 thinking.
| artninja1988 wrote:
| >To train AlphaGeometry's language model, the researchers had to
| create their own training data to compensate for the scarcity of
| existing geometric data. They generated nearly half a billion
| random geometric diagrams and fed them to the symbolic engine.
| This engine analyzed each diagram and produced statements about
| their properties. These statements were organized into 100
| million synthetic proofs to train the language model.
|
| With all the bickering about copyright, could something similar
| be used for coding LLMs? It would kill the IP issues, at least
| for coding.
| nodogoto wrote:
| What statements about properties of randomly generated code
| snippets would be useful for coding LLMs? You would need to
| generate text explaining what each snippet does, but that would
| require an existing coding LLM, so any IP concerns would
| persist.
| artninja1988 wrote:
| Yeah, but couldn't some system be built that understands what
| the different code snippets do by compiling them or whatever?
| sangnoir wrote:
| For language, the symbolic engine itself would likely be
| trained on copyrighted input, unlike the geometry engine, since
| math & math facts are not covered by copyright.
|
| If you couple random text generation and such an engine for
| language, you'd be laundering your training data through the
| extra step (and the quality will likely be worse due to
| multiplicative errors).
| ipsum2 wrote:
| Do normal proofs for Olympiad level geometry questions require
| 150+ steps like the one that was generated here? It seems
| inelegant.
| lacker wrote:
| When a human solves these problems, you might state a "step" in
| natural language. Like, "apply a polar transformation to this
| diagram". But that same single step, if you translate into a
| low-level DSL, could be dozens of different steps.
|
| For an example, check out this formalization of a solution to a
| non-geometry IMO problem:
|
| https://gist.github.com/mlyean/b4cc46cf6b0705c1226511a2b404d...
|
| Roughly the same solution is written here in human language:
|
| https://artofproblemsolving.com/wiki/index.php/1972_IMO_Prob...
|
| It's not precisely clear what counts as a "step" but to me this
| seems like roughly 50 steps in the low-level formalized proof,
| and 5 steps in the human-language proof.
|
| So, my conclusion is just that I wouldn't necessarily say that
| a 150-step automated proof was inelegant. It can also be that
| the language used to express these proofs fills in more of the
| details.
|
| It's like the difference between counting lines of code in
| compiled code and a high-level language, counting the steps is
| really dependent on the level at which you express it.
| FranchuFranchu wrote:
| As soon as ChatGPT got released, I tried to make it solve IMO-
| style problems. It failed.
|
| If this is legit, this is huge. Finding geometric proofs means
| proving things, and proving propositions is what an intelligence
| does. In my opinion, this is the closest we have gotten so far to
| AGI. Really excited to see what the future holds.
| 1024core wrote:
| From the paper:
|
| > When producing full natural-language proofs on IMO-AG-30,
| however, GPT-4 has a success rate of 0% ...
| gowld wrote:
| Proving propositions has been computer work for several
| decades.
|
| This is not a pure LLM. Search algorithms are obviously
| incredibly powerful at solving search problems. The LLM
| contribution is an "intuitive" way to optimize the search.
| compthink wrote:
| Very interesting. I was suspicious they could get an LLM to do
| this alone, and reading the description, they don't. It's still
| AI I would say, maybe even more so, since it alternates: a
| generative machine learning model comes up with hypothesis
| points, and then an optimizer solves for those points and
| simplifies the problem. So it really is a mathematical
| reasoning system. Also good to note that it is tuned
| specifically to these types of geometry problems.
| wigster wrote:
| the hounds of tindalos will be sniffing around this
___________________________________________________________________
(page generated 2024-01-17 23:00 UTC)