[HN Gopher] AlphaGeometry: An Olympiad-level AI system for geometry
       ___________________________________________________________________
        
       AlphaGeometry: An Olympiad-level AI system for geometry
        
       Author : FlawedReformer
       Score  : 303 points
       Date   : 2024-01-17 16:22 UTC (6 hours ago)
        
 (HTM) web link (deepmind.google)
 (TXT) w3m dump (deepmind.google)
        
       | marojejian wrote:
       | AlphaGeometry paper:
       | https://www.nature.com/articles/s41586-023-06747-5
        
       | marojejian wrote:
       | Though this particular model doesn't sound generalizable, the
       | neuro-symbolic approach seems very promising to me:
       | 
        | - Linking the (increasingly powerful) "system 1" tools that are
        | most of current ML with more structured "system 2" tools, like
        | logical proof generation, which can plan and check the veracity
        | / value of output.
        | 
        | - System 2 chugs along until it gets stuck, then system 1 jumps
        | in to provide an intuitive guess on what part of state space to
        | check next.
        | 
        | - Here they leveraged the ability to generate proofs by computer
        | to create a data set of 100m proofs, enabling scalable self-
        | supervised learning. It seems to me the symbolic domains are
        | well suited to such generation of data, which, while low value
        | in each instance, might allow valuable pre-training in
        | aggregate.
       | 
       | Putting these elements together is an approach that could get us
       | quite far.
       | 
       | The key milestone will be moving away from the need to use
       | specific formal / symbolic domains, and to generate a pretrained
       | system that can generalize the skills learned from those domains.
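        | 
        | A minimal sketch of that alternation (hypothetical `deduce_step`
        | and `suggest_guess` helpers standing in for the symbolic engine
        | and the learned model; not the paper's code):
        | 
        |     def prove(goal, facts, deduce_step, suggest_guess,
        |               max_guesses=16):
        |         """Alternate exhaustive "system 2" deduction with
        |         learned "system 1" guesses when deduction stalls."""
        |         for _ in range(max_guesses):
        |             while True:  # system 2: deduce to a fixed point
        |                 new = deduce_step(facts)
        |                 if goal in new:
        |                     return facts | new
        |                 if new <= facts:  # stuck: nothing new deduced
        |                     break
        |                 facts |= new
        |             # system 1: an intuitive guess at what to add next
        |             facts = facts | {suggest_guess(goal, facts)}
        |         return None  # no proof within the guess budget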
        
         | gizmo686 wrote:
         | > The key milestone will be moving away from the need to use
         | specific formal / symbolic domains, and to generate a
         | pretrained system that can generalize the skills learned from
         | those domains.
         | 
          | You do not need to solve everything at once. This approach has
          | the potential to revolutionize both math and programming by
          | moving formal verification from being a niche tool into a
          | regular part of every practitioner's toolbox.
          | 
          | It also completely solves (within the domain it applies to)
          | one of the most fundamental problems of AI, which the current
          | round is calling "hallucinations"; however, that solution only
          | works because we have a non-AI system to prove correctness.
         | 
         | At a high level, this approach is not really that new. Biochem
         | has been using AI to help find candidate molecules, which are
         | then verified by physical experimentation.
         | 
          | Combinatorial game AI has been using AI as an input to old-
          | fashioned Monte Carlo searches.
        
           | jvanderbot wrote:
            | Moreover, it was quite a while ago that people began
            | training networks to help make guesses for branch-and-bound-
            | type solvers.
        
           | initplus wrote:
           | >however that solution only works because we have a non AI
           | system to prove correctness
           | 
           | But this is actually a really common scenario. Checking a
           | solution for correctness is often much easier than actually
           | finding a correct solution in the first place.
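            | 
            | A toy illustration of the asymmetry, using subset sum (a
            | sketch; `verify` ignores multiplicity): checking a proposed
            | certificate is one pass, while finding one naively means
            | searching exponentially many subsets.
            | 
            |     from itertools import combinations
            | 
            |     def verify(nums, target, certificate):
            |         # Checking a candidate: one pass over it.
            |         return (all(x in nums for x in certificate)
            |                 and sum(certificate) == target)
            | 
            |     def find(nums, target):
            |         # Finding a candidate: exponential brute force.
            |         for r in range(1, len(nums) + 1):
            |             for subset in combinations(nums, r):
            |                 if sum(subset) == target:
            |                     return list(subset)
            |         return None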
        
           | empath-nirvana wrote:
           | > however that solution only works because we have a non AI
           | system to prove correctness.
           | 
           | I'm not sure why you are calling it non-AI. There's no reason
           | why some AI system has to be some single neural network like
           | GPT and not a hacked together conglomeration of a bunch of
           | neural networks and symbolic logic systems. Like is it
           | cheating to use a SAT solver? Is a SAT solver itself not
           | artificial intelligence of a kind?
        
       | sega_sai wrote:
       | Just in case the DOI link from the press-release doesn't work for
       | you -- https://www.nature.com/articles/s41586-023-06747-5 here's
       | the paper.
        
       | apsec112 wrote:
       | Links to the paper appear broken, but here is a link:
       | 
       | https://www.nature.com/articles/s41586-023-06747-5
        
       | apsec112 wrote:
       | Interesting that the transformer used is tiny. From the paper:
       | 
       | "We use the Meliad library for transformer training with its base
       | settings. The transformer has 12 layers, embedding dimension of
       | 1,024, eight heads of attention and an inter-attention dense
       | layer of dimension 4,096 with ReLU activation. Overall, the
       | transformer has 151 million parameters, excluding embedding
       | layers at its input and output heads. Our customized tokenizer is
       | trained with 'word' mode using SentencePiece and has a vocabulary
       | size of 757. We limit the maximum context length to 1,024 tokens
       | and use T5-style relative position embedding. Sequence packing is
       | also used because more than 90% of our sequences are under 200 in
       | length."
        
         | ipnon wrote:
          | This suggests there may be some more low-hanging fruit in the
          | hard sciences for transformers to bite at, so long as the
          | problems can be properly formulated. This was not a problem of
          | scaling, it seems.
        
         | albertzeyer wrote:
          | It's not tiny; this is a quite normal size outside the field
          | of LLMs, e.g. for regular language models, translation models,
          | or acoustic models. Some people would even call this large.
        
           | apsec112 wrote:
            | It's tiny by the standards of transformers; I'm pretty sure
            | most transformers trained (across all domains) are larger
            | than this.
        
             | albertzeyer wrote:
              | No. Where did you get this from?
             | 
             | Looking at NeurIPS 2023:
             | 
             | https://openreview.net/group?id=NeurIPS.cc/2023/Conference#
             | t...
             | 
             | Some random spotlight papers:
             | 
             | - https://openreview.net/pdf?id=YkBDJWerKg: Transformer
             | (VPT) with 248M parameters
             | 
              | - https://openreview.net/pdf?id=CAF4CnUblx: ViT-B/16 with
              | 86M parameters
             | 
             | - https://openreview.net/pdf?id=3PjCt4kmRx: Transformer
             | with 282M parameters
             | 
             | Also, in my field (speech recognition, machine translation,
             | language modeling), all using Transformer variants, this is
             | a pretty normal model size.
        
             | kristjansson wrote:
              | It's tiny by the standards of LLMs; the "L" in LLM might
              | give you some indication as to where they fit in the
              | overall landscape.
        
       | hcarlens wrote:
       | Also relevant: https://aimoprize.com/
       | 
       | ($10m prize for models that can perform well at IMO)
        
       | ChatGTP wrote:
       | "Our AI system"
       | 
       | It's our maths. No one else owns it?
       | 
       | Funny game the AI field is playing now. One-upmanship seems to be
       | the aim of the game.
        
         | drdeca wrote:
         | The AI system that they produced.
         | 
         | "Our" doesn't always imply ownership/title. I do not own my
         | dad.
        
       | empath-nirvana wrote:
       | "What the I.M.O. is testing is very different from what creative
       | mathematics looks like for the vast majority of mathematicians,"
       | he said. ---
       | 
       | Not to pick on this guy, but this is ridiculous goal post
       | shifting. It's just astounding what people will hand-wave away as
       | not requiring intelligence.
        
         | mb7733 wrote:
         | No, that sort of thing has been said about math competitions
         | for a long time. It's not a new argument put forward as
         | something against AI.
         | 
         | An analogy with software is that math competitions are like
         | (very hard) leetcode.
         | 
         | There was an article posted on HN recently that is related:
         | https://benexdict.io/p/math-team
        
           | esafak wrote:
           | What is the argument; that math competitions are easy for
           | computers but hard for humans?
        
             | nitwit005 wrote:
             | It's a quote from the article. The argument is naturally
             | there.
        
               | esafak wrote:
               | I was referring to your leetcode analogy; those too are
               | hard for humans.
        
             | silveraxe93 wrote:
             | The usual argument is:
             | 
              | We test developer skill by giving them leetcode problems,
              | but leetcode, while requiring programming skill, is
              | nothing like a real programmer's job.
        
             | mb7733 wrote:
             | Like I said, it's got nothing to do with computers or AI.
             | This point of view predates any kind of AI that would be
             | capable of doing either.
             | 
              | The similarity with leetcode & job interviews is as
              | follows: to excel at math competitions one must grind
              | problems and learn tricks, in order to quickly solve
              | problems in a high-pressure environment.
             | 
             | And, just like how solving leetcode problems is pretty
             | different than what a typical computer scientist or
             | software engineer does, doing math competitions is pretty
             | different than what a typical mathematician does.
        
         | drawnwren wrote:
         | Yeah, this would be akin to saying "What leetcode is testing is
         | very different from what professional programming looks like".
         | 
         | It isn't untrue, but both are considered metrics by the
         | community. (Whether rightfully or not seems irrelevant to ML).
        
         | adverbly wrote:
         | I guess it kind of depends on what a goal post should
         | represent. If it represents an incremental goal then obviously
         | you are correct.
         | 
          | I think the ultimate goal though is clearly that AI systems
          | will be able to generate new high-value knowledge and proofs.
          | I think realistically the final goal post has always been at
          | that point.
        
         | GPerson wrote:
         | I'm not sure if it's goal post shifting or not, but it is a
         | true statement.
        
         | nybsjytm wrote:
         | As said by others, that is an absolutely commonplace saying,
         | albeit usually having nothing whatsoever to do with AI. See for
         | instance https://terrytao.wordpress.com/career-advice/advice-
         | on-mathe...
         | 
         | Also if you read about the structure of AlphaGeometry, I think
         | it's very hard to maintain that it "requires intelligence." As
         | AI stuff goes, it's pretty comprehensible and plain.
        
       | 1024core wrote:
       | From the Nature article:
       | 
       | > Note that the performance of GPT-4 performance on IMO problems
       | can also be contaminated by public solutions in its training
       | data.
       | 
       | Doesn't anybody proofread in Nature?
        
         | chefandy wrote:
          | Not a subject matter expert, so forgive me if the question is
          | unintentionally obtuse, but it seems like a reasonable
          | statement. They seem to be implying that problems with
          | existing public solutions wouldn't be a good indicator of
          | performance in solving novel problems -- likely a more
          | important evaluative measure than how fast it can replicate an
          | answer it's already seen. Since you couldn't know if that
          | solution was in its training data, you couldn't know if it was
          | doing that or doing a more organic series of problem-solving
          | steps, therefore contaminating the evaluation. What's the
          | problem with saying that?
        
           | mlyle wrote:
           | They're referring to "performance of performance".
           | 
           | Not that it's a big deal. I notice problems like this slip
           | into my writing more and more without detection as I get
           | older :/
           | 
           | I don't see the sentence in the Nature paper, though.
        
             | chefandy wrote:
             | Ah-- I've got super bad ADHD so I usually don't even catch
             | things like that. I'm a terrible editor. I wouldn't be
             | surprised if the paper authors were paying attention to
             | this thread, and gave a call to Nature after seeing the
             | parent comment.
        
               | 1024core wrote:
               | The text is still there.
               | 
               | Paper: https://www.nature.com/articles/s41586-023-06747-5
               | 
               | Section title: Geometry theorem prover baselines
               | 
               | Second paragraph under that. 3rd sentence from the end of
               | the para.
        
             | 1024core wrote:
             | Paper: https://www.nature.com/articles/s41586-023-06747-5
             | 
             | Section title: Geometry theorem prover baselines
             | 
             | Second paragraph under that. 3rd sentence from the end of
             | the para.
        
         | dr_kiszonka wrote:
         | Weird. I can't find the sentence you are quoting in the paper.
        
           | 1024core wrote:
           | Paper: https://www.nature.com/articles/s41586-023-06747-5
           | 
           | Section title: Geometry theorem prover baselines
           | 
           | Second paragraph under that. 3rd sentence from the end of the
           | para.
        
       | gumballindie wrote:
       | I suppose we can now replace high schoolers.
        
         | dandanua wrote:
         | and their teachers
        
       | m3kw9 wrote:
        | So create AI modules for every niche subject and have an
        | OpenAI-like function-calling system to run inference on them?
        
         | kfarr wrote:
          | Yes, but I would argue geometry is not niche; it is the
          | foundation of reasoning about the physical world, not just
          | math problems.
        
       | Imnimo wrote:
       | I'm very curious how often the LM produces a helpful
       | construction. Surely it must be doing better than random chance,
       | but is it throwing out thousands of constructions before it finds
       | a good one, or is it able to generate useful proposals at a rate
       | similar to human experts?
       | 
       | They say in the paper, "Because the language model decoding
       | process returns k different sequences describing k alternative
       | auxiliary constructions, we perform a beam search over these k
       | options, using the score of each beam as its value function. This
       | set-up is highly parallelizable across beams, allowing
       | substantial speed-up when there are parallel computational
       | resources. In our experiments, we use a beam size of k = 512, the
       | maximum number of iterations is 16 and the branching factor for
       | each node, that is, the decoding batch size, is 32."
       | 
       | But I don't totally understand how 512 and 16 translate into
       | total number of constructions proposed. They also note that
       | ablating beam size and max iterations seems to only somewhat
       | degrade performance. Does this imply that the model is actually
       | pretty good at putting helpful constructions near the top, and
       | only for the hardest problems does it need to produce thousands?
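        | 
        | For intuition, a minimal sketch of a beam search over model-
        | proposed constructions (hypothetical `propose` and `score`
        | functions; not the paper's code):
        | 
        |     import heapq
        | 
        |     def beam_search(propose, score, beam_size=512,
        |                     branching=32, iters=16):
        |         # Each state is the tuple of constructions tried so far.
        |         beam = [(0.0, ())]
        |         for _ in range(iters):
        |             candidates = []
        |             for _, state in beam:
        |                 # LM proposes `branching` options per beam.
        |                 for c in propose(state, k=branching):
        |                     new_state = state + (c,)
        |                     candidates.append((score(new_state),
        |                                        new_state))
        |             # Keep only the `beam_size` best-scoring candidates.
        |             beam = heapq.nlargest(beam_size, candidates,
        |                                   key=lambda t: t[0])
        |         return beam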
        
         | kingkongjaffa wrote:
          | Thanks, TIL what beam search is:
         | https://en.wikipedia.org/wiki/Beam_search
        
         | refulgentis wrote:
          | IMHO: this bumps, hard, against limitations of language /
          | human-machine analogies.
          | 
          | But let's try -- TL;DR 262,144, but don't take it literally:
          | 
          | - The output of a decoding function is a token, ~3/4 of a
          | word. Let's just say 1 word.
          | 
          | - Tokens considered per output token = beam_size *
          | branching_factor * max_iterations = 512 * 32 * 16 = 262,144.
          | 
          | - Let's take their sample solution and get a word count:
          | https://storage.googleapis.com/deepmind-media/DeepMind.com/B...
          | 
          | - Total tokens for the solution = 2,289.
          | 
          | - Total tokens considered = 262,144 * 2,289 = 600,047,616.
          | 
          | - Hack: "number of solutions considered" = total tokens
          | considered / total tokens in solution = 262,144 (the same
          | number as tokens viewed at each iteration step, which makes
          | sense).
        
       | riku_iki wrote:
        | My understanding is that they encoded the domain in several
        | dozen mechanical rules described in Extended Data Table 1, and
        | then did a transformer-guided brute-force search for solutions.
        
       | dang wrote:
       | See also https://www.nytimes.com/2024/01/17/science/ai-computers-
       | math...
       | 
       | (via https://news.ycombinator.com/item?id=39030186, but we'll
       | merge that thread hither)
        
         | empath-nirvana wrote:
         | https://archive.is/https://www.nytimes.com/2024/01/17/scienc...
        
           | neonate wrote:
           | https://web.archive.org/web/20240117162133/https://www.nytim.
           | ..
        
       | TFortunato wrote:
       | I'm a big fan of approaches like this, that combine deep learning
       | / newer techniques w/ "GOFAI" approaches.
        
       | summerlight wrote:
        | It looks like there's some interesting work connecting ML with
        | symbolic reasoning (or searching). I'm closer to a layman in
        | this area, but IIUC the latter is known to be rife with yet-to-
        | be-understood heuristics for pruning the solution space, and ML
        | models are pretty good at that. I'm not in a position to suggest
        | what needs to happen to further push the boundary, but my
        | impression is that the current significant blocker is that we
        | don't really have a way to construct a self-feedback loop that
        | consistently iterates on and improves the model from its own
        | output. If this can be done properly, we may see something
        | incredible in a few years.
        
         | rafaelero wrote:
         | I didn't look further into their method, but it seems to me
         | that symbolic reasoning is only important insofar as it makes
         | the solution verifiable. That still is a glorious capability,
         | but a very narrow one.
        
       | zodiac wrote:
       | The real TIL (to me) is that the previous state-of-the-art could
       | solve 10 of these! I'd heard there was a decision algorithm for
       | plane geometry problems but I didn't know it was a practical one.
       | Some searching turned up
       | http://www.mmrc.iss.ac.cn/~xgao/paper/book-area.pdf as a
       | reference.
        
       | WhitneyLand wrote:
       | It was interesting. One of the reviewers noted that one of the
       | generated proofs didn't seem that elegant in his opinion.
       | 
       | Given enough compute, I wonder how much this would be improved by
       | having it find as many solutions as possible within the allotted
       | time and proposing the simplest one.
        
         | spookie wrote:
         | Elegance would require more than this. Reasoning, for one.
        
         | jacobolus wrote:
         | The issue is that the computer-generated proofs operate at a
         | very low level, step by step, like writing a program in
         | assembly language without the use of coherent structure.
         | 
         | The human proof style instead chunks the parts of a solution
         | into human-meaningful "lemmas" (helper theorems) and builds
         | bodies of theory into well-defined and widely used abstractions
         | like complex numbers or derivatives, with a corpus of accepted
         | results.
         | 
         | Some human proofs of theorems also have a bit of this flavor of
         | inscrutable lists of highly technical steps, especially the
         | first time something is proven. Over time, the most important
         | theorems are often recast in terms of a more suitable grounding
         | theory, in which they can be proven with a few obvious
         | statements or sometimes a clever trick or two.
        
       | ackfoobar wrote:
       | Euclidean geometry is decidable. Does that make it easier for
       | computers, compared to other IMO problems?
        
       | treprinum wrote:
       | What happens to science when the most talented kids won't be able
       | to compete in ~2 years tops? Would our civilization reach plateau
       | or start downward trajectory as there will be no incentive to
       | torture oneself to become the best in the world? Will it all fade
       | away like chess once computers started beating grandmasters?
        
         | Agingcoder wrote:
          | I never studied math to become the best in the world!
        
           | treprinum wrote:
           | Are you an Olympiad Gold Medalist? Did you move the field of
           | math significantly (beyond a basic PhD)?
        
         | ironlake wrote:
         | Chess is immensely more popular since Deep Blue beat Kasparov.
        
           | jstummbillig wrote:
            | True, but the fun is currently derived from humans going up
            | against humans as a sport. Machines are tools (for learning
            | or cheating), but we are not interested in competition with
            | them.
            | 
            | How will it work for math, where humans do useful work right
            | now? I cannot see battle math becoming a big thing, but
            | maybe when some of the tedious stuff goes away, math will be
            | philosophy, and poking the smarter system is where
            | entertainment can be had?
        
         | EvgeniyZh wrote:
         | I don't think chess faded away
        
         | jeanloolz wrote:
         | Chess never faded away. It actually has never been as big as it
         | is today.
        
           | treprinum wrote:
            | That might be true in the number of active chess players,
            | but it's no longer viewed as a peak intellectual game and
            | the magic is long gone; it's just another intense hobby some
            | people do, and basic machines can do it better.
        
             | westcoast49 wrote:
              | It's still viewed as a peak intellectual game. And the
              | magic is still there; just watch or listen to the fans of
              | Magnus Carlsen when they're viewing, discussing or
              | analyzing his games.
        
         | lacker wrote:
         | The incentive to compete in the IMO is that it's fun to do math
         | contests, it's fun to win math contests, and (if you think that
         | far ahead as a high schooler) it looks good on your resume.
         | None of that incentive will go away if the computers get better
         | at math contests.
        
           | treprinum wrote:
            | I'd be super demoralized if anything I could do, a future
            | pocket machine could do much better and faster. Like my very
            | best is not enough to even tread water.
        
           | jhbadger wrote:
           | People still compete in foot races even though cars can go
           | faster. People still play chess and Go even though computers
           | can beat them.
        
         | sdenton4 wrote:
         | We just need a Netflix series about a geometry prodigy with
         | excellent fashion sense.
        
         | hiAndrewQuinn wrote:
         | I've accepted that the human form, with its ~3 pounds of not
         | especially optimized brainpower, is basically going to be
         | relegated to the same status as demoscene hardware for anything
         | that matters after this century.
         | 
         | That's cool by me, though. This bit of demoscene hardware
         | experiences qualia, and that combination is weird and cool
         | enough to make me want to push myself in new and weird
         | directions. That's what play is in a way.
        
       | nybsjytm wrote:
       | If I read their paper right, this is legit work (much more legit
       | than DeepMind's AI math paper last month falsely advertised as
       | solving an open math research problem) but it's still pretty
       | striking how far away the structure of it is from the usual idea
       | of automated reasoning/intelligence.
       | 
       | A transformer is trained on millions of elementary geometry
       | theorems and used as brute search for a proof, which because of
       | the elementary geometry context has both a necessarily elementary
       | structure and can be easily symbolically judged as true or false.
       | When the brute search fails, an extra geometric construction is
       | randomly added (like adding a midpoint of a certain line) to see
       | if brute search using that extra raw material might work. [edit:
       | as corrected by Imnimo, I got this backwards - the brute search
       | is just pure brute search, the transformer is used to predict
       | which extra geometric construction to add]
       | 
       | Also (not mentioned in the blog post) the actual problem
       | statements had to be modified/adapted, e.g. the actual problem
       | statement "Let AH1, BH2 and CH3 be the altitudes of a triangle
       | ABC. The incircle W of triangle ABC touches the sides BC, CA and
       | AB at T1, T2 and T3, respectively. Consider the symmetric images
       | of the lines H1H2, H2H3, and H3H1 with respect to the lines T1T2,
       | T2T3, and T3T1. Prove that these images form a triangle whose
       | vertices lie on W." had to be changed to "Let ABC be a triangle.
       | Define point I such that AI is the bisector of angle BAC and CI
       | is the bisector of angle ACB. Define point T1 as the foot of I on
       | line BC. Define T2 as the foot of I on line AC. Define point T3
       | as the foot of I on line AB. Define point H1 as the foot of A on
       | line BC. Define point H2 as the foot of B on line AC. Define
       | point H3 as the foot of C on line AB. Define point X1 as the
       | intersection of circles (T1,H1) and (T2,H1). Define point X2 as
       | the intersection of circles (T1,H2) and (T2,H2). Define point Y2
       | as the intersection of circles (T2,H2) and (T3,H2). Define point
       | Y3 as the intersection of circles (T2,H3) and (T3,H3). Define
       | point Z as the intersection of lines X1X2 and Y2Y3. Prove that
       | T1I=IZ."
        
         | Imnimo wrote:
         | >A transformer is trained on millions of elementary geometry
         | theorems and used as brute search for a proof, which because of
         | the elementary geometry context has both a necessarily
         | elementary structure and can be easily symbolically judged as
         | true or false. When the brute search fails, an extra geometric
         | construction is randomly added (like adding a midpoint of a
         | certain line) to see if brute search using that extra raw
         | material might work.
         | 
         | I don't think this is quite right. The brute force search is
         | performed by a symbolic solver, not the transformer. When it
         | runs out of new deductions, the transformer is asked to suggest
         | possible extra constructions (not randomly added).
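          | 
          | A rough sketch of the symbolic side (hypothetical `rules`
          | iterable; the fixed-point forward chaining is my reading of
          | "runs out of new deductions", not DeepMind's code):
          | 
          |     def deductive_closure(facts, rules):
          |         # Forward chaining: apply deduction rules until no
          |         # new facts appear. When this fixed point is reached
          |         # without proving the goal, the transformer is asked
          |         # for an auxiliary construction and search resumes.
          |         changed = True
          |         while changed:
          |             changed = False
          |             for rule in rules:
          |                 derived = set(rule(facts)) - facts
          |                 if derived:
          |                     facts |= derived
          |                     changed = True
          |         return facts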
        
           | nybsjytm wrote:
           | Thanks for the correction, it looks like you're right.
        
         | killerstorm wrote:
         | > it's still pretty striking how far away the structure of it
         | is from the usual idea of automated reasoning/intelligence.
         | 
         | How so? Reasoning is fundamentally a search problem.
         | 
          | The process you described is exactly the process humans use:
          | i.e., make a guess about what's useful, then try to work out
          | the details mechanically. If you get stuck, make another
          | guess, etc. So the process is like searching through a tree.
         | 
         | People figured out this process back in 1955 (and made a
         | working prototype which can prove theorems):
         | https://en.wikipedia.org/wiki/Logic_Theorist but it all hinges
         | on using good 'heuristics'. Neural networks are relevant here
         | as they can extract heuristics from data.
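          | 
          | A minimal sketch of that kind of heuristic-guided tree search
          | (hypothetical `expand`, `heuristic`, and `is_goal` helpers;
          | `heuristic` is a cost-like score, lower = more promising):
          | 
          |     import heapq
          | 
          |     def best_first_search(start, expand, heuristic, is_goal,
          |                           budget=10_000):
          |         # The heuristic decides which node to work out
          |         # mechanically next; a guess that gets stuck simply
          |         # sinks down the priority queue.
          |         frontier = [(heuristic(start), 0, start)]
          |         counter = 1  # tiebreak so nodes are never compared
          |         while frontier and budget > 0:
          |             _, _, node = heapq.heappop(frontier)
          |             if is_goal(node):
          |                 return node
          |             for child in expand(node):
          |                 heapq.heappush(frontier, (heuristic(child),
          |                                           counter, child))
          |                 counter += 1
          |             budget -= 1
          |         return None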
         | 
         | What do you think is "the usual idea of automated reasoning"?
         | Some magic device which can solve any problem using a single
         | linear pass?
        
           | somewhereoutth wrote:
           | > Some magic device which can solve any problem using a
           | single linear pass?
           | 
           | You mean like a calculator?
        
             | gowld wrote:
             | A calculator that can solve any problem human could, and
             | more? Sure.
        
           | jhrmnn wrote:
           | I think the intuition here is that an "intelligent" system
           | would lean much more heavily on the heuristics than on the
           | search, like humans. It's hard to quantify in this case, but
           | certainly in the case of chess, when engines were about as
           | good as best human players, they were doing orders of
           | magnitude larger search than humans. Which made them feel
           | more like chess _engines_ than chess AI. AlphaZero certainly
           | made a step in the right direction, but it's still far from
           | how humans play.
        
             | killerstorm wrote:
             | The comment above mentions "brute force" incorrectly. It's
             | fundamentally impossible to brute force googol
             | combinations...
        
           | amelius wrote:
           | Well, what's different is that humans invent new abstractions
           | along the way such as complex numbers and Fourier transforms.
        
             | gowld wrote:
             | A neural network is nothing but a heap of new abstractions
             | from data.
        
             | MrPatan wrote:
             | Not a lot of humans do
        
               | amelius wrote:
               | When a human has done the same thing many times they tend
               | to try to generalize and take shortcuts. And make tools.
               | Perhaps I missed something but I haven't seen a neural
               | net do that.
        
               | littlestymaar wrote:
               | True, but we're talking about Olympiad level math skills
               | here.
        
       | bnprks wrote:
       | I appreciate that the authors released code and weights with
       | their paper! This is the first high-profile DeepMind paper I can
       | recall that has runnable inference code + checkpoints released.
       | (Though I'm happy to be corrected by earlier examples I've
       | missed)
       | 
       | I don't yet see a public copy of the training set / example
       | training code, but still this is a good step towards providing
       | something other researchers can build on -- which is after all
       | the whole point of academic papers!
        
         | pk-protect-ai wrote:
          | Yeap. I'm missing the datasets as well. They have generated
          | 100M synthetic examples... Were these examples generated with
          | AlphaGeometry? Where is the filtering code and the initial
          | input used to generate these synthetics?
          | 
          | Am I wrong that they are using a T5 model? At least they are
          | using the SentencePiece T5 vocabulary.
          | 
          | How many GPU hours did they spend training this model? Which
          | training parameters were used?
          | 
          | Don't get me wrong, I find this system fascinating; it is what
          | applied engineering should look like. But I'd like to know
          | more about the training details and the initial data they
          | used, as well as the methods of synthetic data generation.
        
       | qwertox wrote:
        | Google DeepMind keeps publishing these things while having
        | missed the AI-for-the-people train, so I'm getting more and more
        | the feeling that they have such a huge arsenal of AI tech piling
        | up in their secret rooms, as if they're already simulating
        | entire AI-based societies.
        | 
        | As if they're stuck in a loop of "let's improve this a bit more
        | until it is even better", while AGI is already a solved problem
        | for them.
        | 
        | Or they're just an uncoordinated, chaotic bunch of AI teams
        | without a leader who unifies them. By leader I don't mean Demis
        | Hassabis, but rather someone like Sundar Pichai.
        
         | nybsjytm wrote:
         | This is at almost the polar opposite end of the spectrum from
         | "AGI," it's centered on brute search.
        
           | danielmarkbruce wrote:
           | They are not polar opposites.
        
         | achierius wrote:
         | I think Occam's razor would have something to say about the
         | relative likelihood of those two options.
        
         | senseiV wrote:
         | > simulating entire AI-based societies.
         | 
          | Didn't they already have scaled-down simulations of this?
        
       | sidcool wrote:
       | And it's been open sourced. The code and model both!
        
       | zbyforgotp wrote:
        | This sounds like the famous system 1 and system 2 thinking, with
        | the prover doing the systematic system 2 thinking and the LLM
        | doing the intuitive system 1 thinking.
        
       | artninja1988 wrote:
       | >To train AlphaGeometry's language model, the researchers had to
       | create their own training data to compensate for the scarcity of
       | existing geometric data. They generated nearly half a billion
       | random geometric diagrams and fed them to the symbolic engine.
       | This engine analyzed each diagram and produced statements about
       | their properties. These statements were organized into 100
       | million synthetic proofs to train the language model.
       | 
        | With all the bickering about copyright, could something similar
        | be used for coding LLMs? It would kill the IP issues, at least
        | for coding.
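        | 
        | A minimal sketch of that kind of synthetic-data pipeline
        | (hypothetical `random_diagram` and `symbolic_engine` helpers,
        | mirroring the quoted description rather than DeepMind's code):
        | 
        |     def generate_training_pairs(n, random_diagram,
        |                                 symbolic_engine):
        |         # Sample random premises, let a symbolic engine derive
        |         # true statements, and keep (premises, statement, proof)
        |         # triples as synthetic training data for the LM.
        |         pairs = []
        |         for _ in range(n):
        |             diagram = random_diagram()
        |             for statement, proof in symbolic_engine(diagram):
        |                 pairs.append((diagram, statement, proof))
        |         return pairs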
        
         | nodogoto wrote:
         | What statements about properties of randomly generated code
         | snippets would be useful for coding LLMs? You would need to
         | generate text explaining what each snippet does, but that would
         | require an existing coding LLM, so any IP concerns would
         | persist.
        
           | artninja1988 wrote:
           | Yeah, but couldn't some system be built that understands what
           | the differnt code snippets do by compiling them or whatever?
        
         | sangnoir wrote:
         | For language, the symbolic engine itself would likely be
         | trained on copyrighted input, unlike the geometry engine, since
         | math & math facts are not covered by copyright.
         | 
          | If you couple random text generation with such an engine for
          | language, you'd be laundering your training data via the extra
          | step (and the quality will likely be worse due to
          | multiplicative errors).
        
       | ipsum2 wrote:
        | Do normal proofs for Olympiad-level geometry questions require
        | 150+ steps like the one that was generated here? It seems
        | inelegant.
        
         | lacker wrote:
         | When a human solves these problems, you might state a "step" in
         | natural language. Like, "apply a polar transformation to this
         | diagram". But that same single step, if you translate into a
         | low-level DSL, could be dozens of different steps.
         | 
         | For an example, check out this formalization of a solution to a
         | non-geometry IMO problem:
         | 
         | https://gist.github.com/mlyean/b4cc46cf6b0705c1226511a2b404d...
         | 
         | Roughly the same solution is written here in human language:
         | 
         | https://artofproblemsolving.com/wiki/index.php/1972_IMO_Prob...
         | 
         | It's not precisely clear what counts as a "step" but to me this
         | seems like roughly 50 steps in the low-level formalized proof,
         | and 5 steps in the human-language proof.
         | 
         | So, my conclusion is just that I wouldn't necessarily say that
         | a 150-step automated proof was inelegant. It can also be that
         | the language used to express these proofs fills in more of the
         | details.
         | 
          | It's like the difference between counting lines of code in
          | compiled code and in a high-level language; counting the steps
          | really depends on the level at which you express the proof.
        
       | FranchuFranchu wrote:
       | As soon as ChatGPT got released, I tried to make it solve IMO-
       | style problems. It failed.
       | 
       | If this is legit, this is huge. Finding geometric proofs means
       | proving things, and proving propositions is what an intelligence
       | does. In my opinion, this is the closest we have gotten so far to
       | AGI. Really excited to see what the future holds.
        
         | 1024core wrote:
         | From the paper:
         | 
         | > When producing full natural-language proofs on IMO-AG-30,
         | however, GPT-4 has a success rate of 0% ...
        
         | gowld wrote:
         | Proving propositions has been computer work for several
         | decades.
         | 
         | This is not a pure LLM. Search algorithms are obviously
         | incredibly powerful at solving search problems. The LLM
         | contribution is an "intuitive" way to optimize the search.
        
       | compthink wrote:
        | Very interesting. I was suspicious that they could get an LLM to
        | do this alone, and reading the description, they don't. It's
        | still AI, I would say, maybe even more so, since it alternates
        | between a generative machine learning model that comes up with
        | hypothesis points and an optimizer that solves for those points
        | and simplifies the problem. So it really is a mathematical
        | reasoning system. Also good to note that it is tuned
        | specifically to these types of geometry problems.
        
       | wigster wrote:
       | the hounds of tindalos will be sniffing around this
        
       ___________________________________________________________________
       (page generated 2024-01-17 23:00 UTC)