[HN Gopher] AlphaProof's Greatest Hits
       ___________________________________________________________________
        
       AlphaProof's Greatest Hits
        
       Author : rishicomplex
       Score  : 241 points
       Date   : 2024-11-17 17:20 UTC (1 day ago)
        
 (HTM) web link (rishimehta.xyz)
 (TXT) w3m dump (rishimehta.xyz)
        
       | wslh wrote:
       | If you were to bet on solving problems like "P versus NP" using
       | these technologies combined with human augmentation (or vice
       | versa), what would be a plausible time horizon for achieving
       | such a solution? I think we should assume that the solution is
       | also expressible in the current language of math/logic.
        
         | hiddencost wrote:
         | No one is focused on those. They're much more focused on rote
         | problems.
         | 
         | You might find them used to accelerate research math by
         | helping mathematicians with lemmas, checking for errors, and
         | formalizing proofs. That seems realistic in the next couple
         | of years.
        
           | nybsjytm wrote:
           | There are some AI guys like Christian Szegedy who predict
           | that AI will be a "superhuman mathematician," solving
           | problems like the Riemann hypothesis, by the end of 2026. I
           | don't take it very seriously, but that kind of
           | prognostication is definitely out there.
        
             | titanomachy wrote:
             | I'm sure AI can "solve" the Riemann hypothesis already,
             | since a human proved it and the proof is probably in its
             | training data.
        
               | nybsjytm wrote:
               | No, nobody has proved it.
               | 
               | Side point: there is no existing AI which can prove -
               | for example - the Poincaré conjecture, even though that
               | has already been proved. The details of the proof are
               | far too dense for any present chatbot like ChatGPT to
               | handle, and nothing like AlphaProof can manage it
               | either, since the scope of the proof is well beyond the
               | reach of Lean or any other formal theorem-proving
               | environment.
        
               | Davidzheng wrote:
               | what does this even mean? Surely an existing AI could
               | regurgitate all of Perelman's arXiv papers if we
               | trained it to do that. Are you trying to make a case
               | that the AI doesn't understand the proof it's giving?
               | Because then I think there's no clear goal-line.
        
               | nybsjytm wrote:
               | You don't even need AI to regurgitate Perelman's papers,
               | you can do that in three lines of python.
               | 
               | What I meant is that there's no AI you can ask to explain
               | the details of Perelman's proof. For example, if there's
               | a lemma or a delicate point in a proof that you don't
               | understand, you can't ask an AI to clarify it.
        
             | Davidzheng wrote:
             | link to this prediction? The famous old prediction of
             | Szegedy was IMO gold by 2026 and that one is basically
             | confirmed right? I think 2027/2028 personally is a
             | breakeven bet for superhuman mathematician.
        
               | nybsjytm wrote:
               | I consider it unconfirmed until it happens! No idea where
               | I saw it but it was probably on twitter.
        
         | zarzavat wrote:
         | Probably a bad example, P vs NP is the most likely of the
         | millennium problems to be unsolvable, so the answer may be
         | "never".
         | 
         | I'll bet the most technical open problems will be the ones to
         | fall first. What AIs lack in creativity they make up for in
         | ability to absorb a large quantity of technical concepts.
        
           | wslh wrote:
           | Thank you for the response. I have a follow-up question:
           | Could these AIs contribute to advancements in resolving the P
           | vs NP problem? I recall that the solution to Fermat's Last
           | Theorem relied on significant progress in elliptic curves.
           | Could we now say that these AI systems might play a similar
           | role in advancing our understanding of P vs NP?
        
             | ccppurcell wrote:
             | Just my guess as a mathematician. But if LLMs are good for
             | anything it will be for finding surprising connections and
             | applying our existing tools in ways beyond human search.
             | There's a huge space of tools and problems, and human
             | intuition and brute force searching can only go so far. I
             | can imagine that LLMs might start to find combinatorial
             | proofs of topological theorems, maybe even novel theorems.
             | Or vice versa. But I find it difficult to imagine them
             | inventing new tools and objects that are really useful.
        
               | nemonemo wrote:
               | Is there a past example where an already-proven
               | theorem, or new tools or objects, would have been
               | possible only for humans and not AI? Any such example
               | would make your argument much more approachable for
               | non-mathematicians.
        
           | meindnoch wrote:
           | Ok, then the AI should formally prove that it's "unsolvable"
           | (however you meant it).
        
             | less_less wrote:
             | Unsolvable would mean that no proof exists, and no disproof
             | exists, in whatever axiom system (eg ZF). While in some
             | cases you can hack around this by proving eg "no proof and
             | no disproof exist in ZF, if ZF is consistent", IIUC that's
             | not always possible. It's like asking "when will AI be
             | strong enough that it can always decide correctly whether a
             | given computer program halts?"
             | 
             | Then again, it's fairly likely that P=?NP is decidable but
             | we just don't have any idea how to prove it. In that case
             | the question is more or less "what's the time horizon until
             | AI is vastly better than humans at formal math?", to which
             | the answer is certainly "we don't know, there may be
             | obstacles, just wait and see".
        
         | uptownfunk wrote:
         | The hard part is the creation of new math to solve these
         | problems, not the use of existing mathematics. So new objects
         | (groups, rings, fields, etc.) have to be theorized, their
         | properties understood, and then that new machinery used to
         | crack the existing problems. I think we will get to a place
         | (in around 5 years) where AI will be able to solve these
         | problems and create these new objects. I don't think the
         | obstacle is technological; I think it's more financial.
         | Meaning, there isn't much money to be made doing this (try to
         | justify it for yourself), hence the lack of focus here. I
         | think this is a red herring and there is a gold mine in there
         | somewhere, but it will likely take someone with a lot of cash
         | to fund it out of passion (Vlad Tenev / Harmonic, or Zuck and
         | Meta AI, or the Google / AlphaProof guys); in the big tech
         | world, though, they are just a minnow project in a sea of
         | competing initiatives. And so that leaves us at the mercy of
         | open research, which, if this is a compute-bound problem, may
         | take 10-20 years to crack. I hope I see a solution to RH in
         | my lifetime (and in language that I can understand).
        
           | wslh wrote:
           | I understand that a group of motivated individuals, even
           | without significant financial resources, could attempt to
           | tackle these challenges, much like the way free and open-
           | source software (FOSS) is developed. The key ingredients
           | would be motivation and intelligence, as well as a shared
           | passion for advancing mathematics and solving foundational
           | problems.
        
             | uptownfunk wrote:
             | Ok, but how do you get around needing a 10k or 100k H100
             | cluster?
        
               | wslh wrote:
               | It is well known that cloud providers like Google Cloud
               | subsidize some projects, and we don't even know what
               | improvements may arrive in a few years.
        
               | uptownfunk wrote:
               | Possible, but unlikely given how much demand there is
               | and the pressure to deliver returns to shareholders.
               | Right now search is very inefficient; the search space
               | is massive. That is the main problem. You can have many
               | sequences of text that sound plausible, but of them a
               | much smaller number will be logically valid. This is
               | the main challenge. Once we can search efficiently not
               | just in semantically valid space but in what you might
               | call syntactically valid space, then we will be able to
               | crack this.
        
           | Davidzheng wrote:
           | I think there are significant financial incentives for big
           | tech, given the scarcity of intelligence benchmarks that
           | are not yet saturated.
        
           | empath75 wrote:
           | > I think we will get to a place (around 5 years) where AI
           | will be able to solve these problems and create these new
           | objects.
           | 
           | For all we know, buried deep in AlphaProof's attempts to
           | solve these toy problems, it already tried and discarded
           | several new ideas.
        
       | sbierwagen wrote:
       | More information about the language used in the proofs:
       | https://en.wikipedia.org/wiki/Lean_(proof_assistant)
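        | 
        | For a flavor of the language - a minimal, machine-checked
        | example, not taken from AlphaProof's actual output:
        | 
        |     -- Lean 4: the kernel accepts this proof term because it
        |     -- type-checks against the stated proposition.
        |     theorem my_add_comm (a b : Nat) : a + b = b + a :=
        |       Nat.add_comm a b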
        
       | sincerely wrote:
        | in the first question, why do they even specify ⌊nα⌋ (and
        | ⌊2nα⌋ and so on) when n is an integer?
        
         | rishicomplex wrote:
         | Alpha need not be an integer, we have to prove that it is
        
           | sincerely wrote:
           | Should have read more carefully, thank you!
        
       | Robotenomics wrote:
       | "Only 5/509 participants solved P6"
        
         | nybsjytm wrote:
         | This has to come with an asterisk, which is that participants
         | had approximately 90 minutes to work on each problem while
         | AlphaProof computed for three days for each of the ones it
         | solved. Looking at this problem specifically, I think that many
         | participants could have solved P6 without the time limit.
         | 
         | (I think you should be very skeptical of anyone who hypes
         | AlphaProof without mentioning this - which is not to suggest
         | that there's nothing there to hype)
        
           | auggierose wrote:
            | Certainly interesting information that AlphaProof needed
            | three days. But does it matter for evaluating the
            | importance of this result? No.
        
             | nybsjytm wrote:
             | I agree that the result is important regardless. But the
             | tradeoff of computing time/cost with problem complexity is
             | hugely important to think about. Finding a proof in a
             | formal language is trivially solvable in theory since you
             | just have to search through possible proofs until you find
             | one ending with the desired statement. The whole practical
             | question is how much time it takes.
             | 
             | Three days per problem is, by many standards, a
             | 'reasonable' amount of time. However there are still
             | unanswered questions, notably that 'three days' is not
             | really meaningful in and of itself. How parallelized was
             | the computation; what was the hardware capacity? And how
             | optimized is AlphaProof for IMO-type problems (problems
             | which, among other things, all have short solutions using
             | elementary tools)? These are standard kinds of critical
             | questions to ask.
        
               | dash2 wrote:
               | Though, if you start solving problems that humans can't
               | or haven't solved, then questions of capacity won't
               | matter much. A speedup in the movement of the maths
               | frontier would be worth many power stations.
        
               | nybsjytm wrote:
               | For some time a 'superhuman math AI' could be useful for
               | company advertising and getting the attention of VCs. But
               | eventually it would be pretty clear that innovative math
               | research, with vanishingly few exceptions, isn't very
               | useful for making revenue. (I am a mathematician and this
               | is meant with nothing but respect for math research.)
        
               | airstrike wrote:
               | "Making revenue" is far from being the only metric by
               | which we deem something worthy.
        
               | nybsjytm wrote:
               | As a mathematician, of course I agree. But in a sentence
               | like:
               | 
               | > A speedup in the movement of the maths frontier would
               | be worth many power stations
               | 
               | who is it 'worth' it to? And to what end? I can say with
               | some confidence that many (likely most, albeit certainly
               | not all) mathematicians do not want data centers and
               | power stations to guzzle energy and do their math for
               | them. It's largely a vision imposed from without by
               | Silicon Valley and Google research teams. What do they
               | want it for and why is it (at least for now) "worth" it
               | to them?
               | 
               | Personally, I don't believe for a second that they want
               | it for the good of the mathematical community. Of course,
               | a few of their individual researchers might have their
               | own personal and altruistic motivations; however I don't
               | think this is so relevant.
        
               | auggierose wrote:
               | There is something called applied math, and there is a
               | big gulf between applied math and pure math. This new
               | technology has the potential of making use of much more
               | "pure math" for applied math, unifying much of pure math,
               | applied math, and programming. I don't really care about
               | the RH, I care about that.
        
               | empath75 wrote:
               | The big exception being predicting market movements, and
               | I can't imagine how much money the hedge funds are
               | spending on this right now.
        
               | auggierose wrote:
               | > Finding a proof in a formal language is trivially
               | solvable in theory since you just have to search through
               | possible proofs until you find one ending with the
               | desired statement. The whole practical question is how
               | much time it takes.
               | 
               | No, in my experience the whole practical question is, can
               | it be found automatically, or can it not be found
               | automatically? Because there is an exponential search
               | space that conventional automated methods will not be
               | able to chew through, it either works, or it doesn't.
               | AlphaProof shows that for some difficult IMO problems, it
               | works.
        
             | andrepd wrote:
              | More or less. Modern theorem provers, even fully
              | automatic ones, can prove incredibly difficult problems
              | if given enough time. With 3 days and terabytes of
              | memory, perhaps they could? It would be interesting to
              | compare AlphaProof with a standard theorem prover given
              | similarly astronomical computing resources.
        
               | auggierose wrote:
               | That is an interesting thought. But I doubt that standard
               | automated theorem proving techniques would have solved
               | this in a 100 years, even with immense resources, as all
               | they do is (basically) brute force search, and it remains
               | an exponential search space. Therefore 3 days does not
               | really matter, indeed, 3 months would still be a result
               | that 10 years ago I would have thought impossible.
        
           | letitgo12345 wrote:
            | I think more is made of this asterisk than necessary. It's
            | quite possible that adding 10x more GPUs would have
            | allowed it to solve the problem within the time limit.
        
             | nybsjytm wrote:
              | Very plausible, but that would also be noteworthy. As
              | I've mentioned in some other comments here, those of us
              | outside DeepMind don't know anything (as far as I know)
              | about the computing power required to run AlphaProof,
              | and the tradeoff between computing power required and
              | the complexity of problems it can address is really key
              | to understanding how useful it might be.
        
       | nybsjytm wrote:
        | Why have they still not released a paper, aside from a press
        | release? I have to admit I still don't know how auspicious it
        | is that running Google hardware for three days apiece was able
        | to find half-page-long solutions, given that the promise has
        | always been to solve the Riemann hypothesis with the click of
        | a button. But of course I do recognize that it's a big
        | achievement relative to previous work in automatic theorem
        | proving.
        
         | whatshisface wrote:
         | I don't know why so few people realize this, but by solving any
         | of the problems their performance is superhuman for most
         | reasonable definitions of human.
         | 
          | Talking about things like solving the Riemann hypothesis in
          | so many years assumes a little too much about the difficulty
          | of problems that we can't even begin to conceive of a
          | solution for. A better question is what can happen when
          | everybody has access to above-average reasoning. Our society
          | is structured around avoiding confronting people with
          | difficult questions, except when they are intended to get
          | the answer wrong.
        
           | GregarianChild wrote:
           | We know that any theorem that is provable at all (in the
           | chosen foundation of mathematics) can be found by patiently
           | enumerating all possible proofs. So, in order to evaluate
           | AlphaProof's achievements, we'd need to know how much of a
           | shortcut AlphaProof achieved. A good proxy for that would be
           | the total energy usage for training and running AlphaProof. A
           | moderate proxy for that would be the number of GPUs / TPUs
           | that were run for 3 days. If it's somebody's laptop, it would
           | be super impressive. If it's 1000s of TPUs, then less so.
        
             | Onavo wrote:
             | > _We know that any theorem that is provable at all (in the
             | chosen foundation of mathematics) can be found by patiently
             | enumerating all possible proofs._
             | 
             | Which computer science theorem is this from?
        
               | zeroonetwothree wrote:
               | It's just an obvious statement. If a proof exists, you
               | will eventually get to it.
        
               | j16sdiz wrote:
               | Only if we take AC, I guess?
        
               | Tainnor wrote:
               | No, this has nothing to do with choice.
        
               | kadoban wrote:
               | It's a direct consequence of the format of a proof.
               | They're finite-length sequences of a finite alphabet of
               | symbols, so of course they're enumerable (the same
               | algorithm you use to count works to enumerate them).
               | 
               | If you want a computer to be able to tell that it found a
               | correct proof once it enumerates it, you need a bit more
               | than that (really just the existence of automated proof
               | checkers is enough for that).
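                | 
                | A sketch of that enumeration in Python, with `check`
                | as a stand-in oracle for an automated proof checker
                | (assumed here, not a real API):
                | 
                |     from itertools import count, product
                | 
                |     ALPHABET = ["0", "S", "+", "=", "(", ")"]  # toy symbols
                | 
                |     def candidates():
                |         # Every finite string, shortest first.
                |         for n in count(1):
                |             for s in product(ALPHABET, repeat=n):
                |                 yield "".join(s)
                | 
                |     def find_proof(goal, check):
                |         # Halts iff `check` accepts some enumerated
                |         # string as a proof of `goal`.
                |         for p in candidates():
                |             if check(p, goal):
                |                 return p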
        
               | E_Bfx wrote:
                | I guess it is tautological from the definition of
                | "provable". A theorem is provable by definition if
                | there is a finite sequence of well-formed formulas
                | that has the theorem as a consequence
                | (https://en.wikipedia.org/wiki/Theorem, paragraph
                | "Theorem in logic").
        
               | Xcelerate wrote:
               | Not sure it's a tautology. It's not obvious that a
               | recursively enumerable procedure exists for arbitrary
               | formal systems that will eventually reach all theorems
               | derivable via the axioms and transformation rules. For
               | example, if you perform depth-first traversal, you will
               | not reach all theorems.
               | 
               | Hilbert's program was a (failed) attempt to determine,
               | loosely speaking, whether there was a process or
               | procedure that could discover all mathematical truths.
               | Any theorem depends on the formal system you start with,
               | but the deeper implicit question is: where do the axioms
               | come from and can we discover all of them (answer:
               | "unknown" and "no")?
        
               | Tainnor wrote:
               | It's "obvious" in the sense that it's a trivial corollary
               | of the completeness theorem (so it wouldn't be true for
               | second order logic, for example).
               | 
                | Hilbert's program failed in no contradiction to what
                | GP wrote, because the language of FOL theorems is only
                | _recursively enumerable_ and not _decidable_. It's
                | obvious that something is true if you've found a
                | proof, but if you haven't found a proof yet, is the
                | theorem wrong, or do you simply have to wait a little
                | longer?
        
             | Davidzheng wrote:
              | The shortcut vs. enumeration gap is definitely enormous,
              | right? Just take the average Shannon entropy raised to
              | the exponent of the proof length, for example: on all of
              | human compute that would probably outlast the heat death
              | (or whatever death) of the universe. (I'm assuming; I
              | didn't bother to check, but it seems intuitively true by
              | a wide margin.)
        
           | zeroonetwothree wrote:
            | Well, multiplying two large numbers instantly is a
            | superhuman feat that a calculator can do. I would hope we
            | are going for a higher bar, like "useful". Let's see if
            | this can provide proofs of novel results.
        
             | nybsjytm wrote:
             | It's worth emphasizing that it's been possible for years to
             | use an automatic theorem prover to prove novel results. The
             | whole problem is to get novel interesting results.
        
             | whatshisface wrote:
             | The ability to multiply two numbers superhumanly has
             | already transformed society.
        
             | rowanG077 wrote:
              | That superhuman capability of "multiplying two large
              | numbers instantly" has transformed human society like
              | not even plumbing has. I really can't see how you could
              | not consider this useful.
        
           | nybsjytm wrote:
           | > A better question is what can happen when everybody has
           | access to above average reasoning. Our society is structured
           | around avoiding confronting people with difficult questions,
           | except when they are intended to get the answer wrong.
           | 
           | What does this have to do with a hypothetical automatic
           | theorem prover?
        
             | whatshisface wrote:
             | Logic is pretty much absent from our culture and daily
             | life, but that could be due to its limited supply.
        
               | nybsjytm wrote:
               | Being logical in social life is pretty much completely
               | different from being logical in a mathematical argument,
               | especially in a formal theorem proving environment. (Just
               | try to write any kind of cultural proposition in a formal
               | language!)
        
               | whatshisface wrote:
               | That's the way things are now, but this regime came about
               | when proving things took intense concentration and
               | specialized skills that very few people had. Contrast
               | going to look something up in a library with googling
               | something during a conversation.
        
         | GregarianChild wrote:
          | Google stopped publishing interesting AI work once OpenAI
          | took the AI lead away from them, mostly with tech that
          | Google pioneered but did not monetise, like transformers.
          | 
          | I imagine they are under pressure not to make this mistake
          | again.
        
       | throwaway713 wrote:
       | Anyone else feel like mathematics is sort of the endgame? I.e.,
       | once ML can do it better than humans, that's basically it?
        
         | margorczynski wrote:
         | I doubt it. Math has the property that you have a way to 100%
         | verify that what you're doing is correct with little cost (as
         | it is done with Lean). Most problems don't have anything close
         | to that.
        
           | exe34 wrote:
           | to be fair, humans also have to run experiments to discover
           | whether their models fit nature - AI will do it too.
        
             | margorczynski wrote:
              | These kinds of experiments are often orders of magnitude
              | more costly (in time, energy, money, safety, etc.) than
              | verifying a mathematical proof with something like Lean.
              | That's why many think math will be one of the first
              | fields AI cracks: there is a relatively cheap and fast
              | feedback loop available.
        
           | AlotOfReading wrote:
           | Math doesn't have a property that you can verify everything
           | you're doing is correct with little cost. Humans simply tend
           | to prefer theorems and proofs that are simpler.
        
             | thrance wrote:
              | You _can_, in principle, formalize any correct
              | mathematical proof and verify its validity procedurally
              | with a "simple" algorithm that actually exists (see Coq,
              | Lean...). Coming up with the proof is much harder, and
              | deciding what to attempt to prove even harder, though.
        
               | AlotOfReading wrote:
                | You can verify it with a simple algorithm, but that
                | verification won't always be cheap. If it were, Curry-
                | Howard would be incredibly uninteresting. It only
                | _seems_ cheap because we usually have little interest
                | in exploring the expensive theorems. Sometimes we do,
                | though, and get things like the four color theorem,
                | whose proof verification amounts to combinatorial
                | search.
        
         | raincole wrote:
         | If "better than humans" means when you give it a real world
         | problem, it gives you a mathematical model to describe it (and
         | does it better than human experts), then yes, it's the end
         | game.
         | 
         | If it just solves a few formalized problems with formalized
         | theorems, not so much. You can write a program that solves ALL
         | the problems under formalized theorems already. It just runs
         | very slowly.
        
           | titanomachy wrote:
           | I don't think you can gloss over the importance of
           | computational tractability here. A human could also start
           | enumerating every possible statement in ZFC, but clearly that
           | doesn't make them a mathematician.
        
         | technotony wrote:
          | Yes, because if AI can do maths then it can use that to
          | improve the efficiency/quality of its algorithms and self-
          | improve...
        
           | nybsjytm wrote:
           | The quality of AI algorithms is not based on formal
           | mathematics at all. (For example, I'm unaware of even one
           | theorem relevant to going from GPT-1 to GPT-4.) Possibly in
           | the future it'll be otherwise though.
        
           | SkiFire13 wrote:
            | ... or it might prove that it's impossible to self-improve
            | given the current constraints
        
         | GuB-42 wrote:
         | Not the endgame by far. Maybe the endgame for LLMs, and I am
         | not even convinced.
         | 
          | Maths is detached from reality. An AI capable of doing math
          | better than humans may not be able to drive a car, as
          | driving a car requires a good understanding of the world. It
          | has to recognize objects and understand their behavior - for
          | example, understanding that a tree won't move but a person
          | might, though the person will move slower than another car.
          | It has to understand the physics of the car: inertia, grip,
          | control... It may even have to understand human psychology
          | and make ethical choices.
         | 
         | Fully autonomous robots would be the endgame.
        
           | ykonstant wrote:
           | >An AI capable of doing math better than humans may not be
           | able do drive a car,
           | 
           | Noo, but my excuse for being unable to drive a car is
           | precisely that I am a quirky mathematician focused on
           | research!
        
         | jdietrich wrote:
         | Humans are terrible at anything you learn at university and
         | incredibly good at most things you learn at trade school. In
         | absolute terms, mathematics is much easier than laying bricks
         | or cutting hair.
         | 
         | https://en.wikipedia.org/wiki/Moravec%27s_paradox
        
           | youoy wrote:
            | I would say that "narrow" mathematics (finding a proof of
            | a given statement that we suspect has a proof, using a
            | formal language) is much easier than "generally" laying
            | bricks or cutting hair.
            | 
            | But I cannot see how consistently doing general
            | mathematics (as in finding interesting and useful
            | statements to prove, and then finding the proofs) is
            | easier than consistently cutting hair/driving a car.
            | 
            | We might get LLM-level mathematics, but not human-level
            | mathematics, in the same way that we can get LLM-level
            | films (something like Avengers, or the final season of
            | GoT), but we are not going to get human-level films.
            | 
            | I suspect that there is no general-level mathematics
            | without the geometric experience of humans, so for
            | general-level mathematics one has to go through
            | perceptions of and interactions with reality first. In
            | that case, general mathematics is one level above "laying
            | bricks or cutting hair", so more complex. And the paradox
            | is only a paradox for superficial reasoning.
        
             | staunton wrote:
             | > But I cannot see how consistently doing general
             | mathematics (as in finding interesting and useful
             | statements to proof, and then finding the proofs) is easier
             | than consistently cutting hair/driving a car.
             | 
             | The main "absolute" difficulty there is in understanding
             | and shaping what the mathematical audience thinks is
             | "interesting". So it's really a marketing problem. Given
             | how these tools are being used for marketing, I would have
             | high hopes, at least for this particular aspect...
        
               | youoy wrote:
               | Is it really marketing in general? I can agree with some
               | of it, but for me the existence of the term "low hanging
               | fruit" to describe some statements says otherwise...
        
           | looofooo0 wrote:
            | Sure, but checking that everything is correctly wired,
            | plugged in, cut, etc., and that everything needed has been
            | thought of? There are plenty of things an AI could do to
            | help a tradesman.
        
         | abrookewood wrote:
         | I mean ... calculators can do better at mathematics than most
         | of us. I don't think they are going to threaten us anytime
         | soon.
        
         | AlexCoventry wrote:
         | There are important forms of discernment and judgement which
         | aren't captured by mathematics.
        
         | awanderingmind wrote:
         | The end of an era perhaps, but not 'the end' - another
         | commenter has already mentioned Moravec's paradox:
         | https://en.wikipedia.org/wiki/Moravec%27s_paradox
         | 
         | It will be interesting if/when these models start proving major
         | open problems, e.g. the Riemann Hypothesis. The sociological
         | impact on the mathematical community would certainly be acute,
         | and likely lead to a seismic shift in the understanding of what
         | research-level mathematics is 'for'. This discussion already
         | appears to be in progress. As an outsider I have no idea what
         | the timeline is for such things (2 years? 10? 100?).
         | 
          | On the plus side, AlphaProof has the benefit over ordinary
          | LLMs in their current form that it does not pollute our
          | common epistemological well, and its output is eminently
          | interrogable (if you know Lean, at least).
        
         | empath75 wrote:
         | Computers have been better than us at calculation since about a
         | week after computers were invented.
         | 
          | If a computer proves the Riemann Hypothesis, someone will
          | say "Oh of course, writing a proof doesn't require
          | intelligence, it's merely the dumb application of logical
          | rules, but only a human could have thought of the conjecture
          | to begin with."
        
       | sega_sai wrote:
        | I think the interface of LLMs with formalized languages is
        | really the future, because there you can formally verify every
        | statement and deal with hallucinations.
        
         | raincole wrote:
          | It's obviously not the future (outside of mathematics
          | research). The whole LLM boom we've seen in the past two
          | years comes from one single fact: people don't need to learn
          | a new language to use it.
        
           | seizethecheese wrote:
           | Both comments can be right. People don't need to know HTML to
           | use the internet.
        
           | nickpsecurity wrote:
            | Natural language -> formal language with LLM-assisted
            | tactics/functions -> traditional tools (e.g.
            | provers/planners) -> expert-readable outputs -> layperson-
            | readable results.
            | 
            | I can imagine many uses for flows where LLMs implement the
            | outer layers above, as sketched below.
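            | 
            | A sketch of that flow's shape in Python - every function
            | body here is a hypothetical placeholder, not a real API:
            | 
            |     def formalize(claim: str) -> str:
            |         # LLM pass: natural language -> candidate formal
            |         # statement (placeholder output).
            |         return "theorem t : ... := by sorry"
            | 
            |     def prove(stmt: str) -> str | None:
            |         # Traditional tool: a prover/planner searches for
            |         # a checkable proof (placeholder).
            |         return None
            | 
            |     def explain(stmt: str, proof: str) -> str:
            |         # LLM pass: render the checked proof for readers.
            |         return f"Proof of {stmt}: {proof}"
            | 
            |     def answer(claim: str) -> str:
            |         stmt = formalize(claim)
            |         proof = prove(stmt)
            |         if proof is None:
            |             return "no verified answer"
            |         return explain(stmt, proof)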
        
         | Groxx wrote:
         | The difficulty then will be figuring out if the proof is
         | relevant to what you want, or simply a proof of 1=1 in
         | disguise.
        
         | est wrote:
         | > formalized languages is really the future
         | 
         | Hmm, maybe it's time for symbolism to shine?
        
         | samweb3 wrote:
         | I am building Memelang (memelang.net) to help with this as
         | well. I'd love your thoughts if you have a moment!
        
           | thesz wrote:
            | You are building an SQL in disguise.
            | 
            | First, you need to encode "memes" and relations between
            | them at scale. This is not a language problem, it is a
            | data-handling problem.
            | 
            | Second, at some point you will need to query memes and the
            | relations between them, again at scale. While expression
            | of queries _is_ a language problem, an implementation will
            | heavily use what SQL engines use.
            | 
            | And you need to look at Cyc:
            | https://en.wikipedia.org/wiki/Cyc
            | 
            | It has been doing what you are trying to do for 40 (forty)
            | years now.
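            | 
            | To make the point concrete, a minimal sketch of "memes and
            | relations" as plain relational tables (my guess at a toy
            | data model, not Memelang's actual schema):
            | 
            |     import sqlite3
            | 
            |     db = sqlite3.connect(":memory:")
            |     db.executescript("""
            |       CREATE TABLE meme (id INTEGER PRIMARY KEY, name TEXT);
            |       CREATE TABLE rel  (a INTEGER, verb TEXT, b INTEGER);
            |     """)
            |     db.execute("INSERT INTO meme VALUES (1, 'Socrates'), (2, 'mortal')")
            |     db.execute("INSERT INTO rel VALUES (1, 'is', 2)")
            |     # Querying relations is ordinary SQL: joins over rel.
            |     print(db.execute("""
            |       SELECT m1.name, r.verb, m2.name
            |       FROM rel r
            |       JOIN meme m1 ON m1.id = r.a
            |       JOIN meme m2 ON m2.id = r.b
            |     """).fetchone())  # ('Socrates', 'is', 'mortal')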
        
             | jamilton wrote:
             | The PHP implementation works by converting it to SQL, so
             | I'm sure they're aware.
        
           | meindnoch wrote:
           | Looks like an incomplete Prolog.
        
           | chvid wrote:
           | Maybe include a section titled 'how is this different from
           | and how is it related to' ie. relational algebra?
        
         | nybsjytm wrote:
         | If this were the case, I don't see why we'd need to wait for an
         | AI company to make a breakthrough in math research. The key
         | issue instead is how to encode 'real-life' statements in a
         | formal language - which to me seems like a ludicrous problem,
         | just complete magical thinking.
         | 
         | For example, how might an arbitrary statement like "Scholars
         | believe that professional competence of a teacher is a
         | prerequisite for improving the quality of the educational
         | process in preschools" be put in a lean-like language? What
         | about "The theoretical basis of the October Revolution lay in a
         | development of Marxism, but this development occurred through
         | three successive rounds of theoretical debate"?
         | 
         | Or have I totally misunderstood what people mean when they say
         | that developments in automatic theorem proving will solve LLM's
         | hallucination problem?
        
           | nine_k wrote:
           | You can't talk about formally verifiable truthiness until you
           | solve epistemology. This can be achieved formally in
           | mathematics, with known principal limitations. Here strict
           | theorem-proving, Lean-style, is viable.
           | 
            | It can also be achieved informally and in a fragmentary
            | way in barely-mathematical disciplines, like biology,
            | linguistics, and even history. We have chains of logical
            | conclusions that do not follow strictly, but under various
            | probabilistic limitations, and under modal logics of
            | sorts. Several contradictory chains follow under different
            | (modal) assumptions / hypotheses, and often both should be
            | considered. This is where probabilistic models like LLMs
            | could work together with formal logic tools and huge
            | databases of facts and observations, being the proverbial
            | astute reader.
           | 
           | In some more relaxed semi-disciplines, like sociology,
           | psychology, or philosophy, we have a hodgepodge of
           | contradictory, poorly defined notions and hand-wavy reasoning
           | (I don't speak about Wittgenstein here, but more about Freud,
           | Foucault, Derrida, etc.) Here, I think, the current crop of
           | LLMs is applicable most directly, with few augmentations.
           | Still a much, much wider window of context might be required
           | to make it actually productive, by the standards of the
           | field.
        
             | nybsjytm wrote:
             | Hmmm I think even in something very nominally nearby like
             | theoretical physics, there's very little that's similar to
             | theorem proving. I don't see how AlphaProof could be a
             | stepping stone to anything like what you're describing.
             | 
             | Generally, I think many people who haven't studied
             | mathematics don't realize how huge the gulf is between
             | "being logical/reasonable" and applying mathematical logic
             | as in a complicated proof. Neither is really of any help
             | for the other. I think this is actually the orthodox
             | position among mathematicians; it's mostly people who might
             | have taken an undergraduate math class or two who might
             | think of one as a gateway to the other. (However there are
             | certainly some basic commonalities between the two. For
             | example, the converse error is important to understand in
             | both.)
        
               | auggierose wrote:
               | I don't think that mathematicians are actually in the
               | best position to judge how math/logic might help in
               | practical applications, because they are usually not
               | interested in practical applications at all (at least for
               | the last 70 years or so). Especially, they are also not
               | interested in logic at all.
               | 
               | But logic _is_ very relevant to  "being
               | logical/reasonable", and seeing how mathematicians apply
               | logic in their proofs is very relevant, and a starting
               | point for more complex applications. I see mathematics as
               | the simplest kind of application of logic you can have if
               | you use only your brain for thinking, and not also a
               | computer.
               | 
               | "Being logical/reasonable" also contains a big chunk of
               | intuition/experience, and that is where machine learning
               | will make a big difference.
        
           | benlivengood wrote:
           | Probabilistic reasoning is possible in a formal setting; It
           | produces a probability distribution over answers. To ground
           | probabilistic logic itself I'm not aware of much progress
           | beyond the initial idea of logical induction[0].
           | 
           | [0] https://arxiv.org/abs/1609.03543
        
             | nybsjytm wrote:
             | This takes for granted a formal setting, which is what I'm
             | questioning in any of these 'real world' contexts.
        
               | benlivengood wrote:
               | > For example, how might an arbitrary statement like
               | "Scholars believe that professional competence of a
               | teacher is a prerequisite for improving the quality of
               | the educational process in preschools" be put in a lean-
               | like language? What about "The theoretical basis of the
               | October Revolution lay in a development of Marxism, but
               | this development occurred through three successive rounds
               | of theoretical debate"?
               | 
               | > This takes for granted a formal setting, which is what
               | I'm questioning in any of these 'real world' contexts.
               | 
               | A formal model of semantics would likely be a low-level
               | physical representation of possible states augmented with
               | sound definitions of higher-level concepts and objects. I
               | don't think humans are capable of developing a formal
               | semantics that would work for your sentences (it's taken
               | us hundreds of years to approach formalization of
               | particle physics), but I think that an automated prover
               | with access to physical experiments and an LLM could
               | probably start building a more comprehensive semantics.
        
           | lukeschlather wrote:
            | People talk about how LLMs need "more data", but data is
            | really a blunt instrument for trying to impart a lot of
            | experience to the LLM. Unfortunately, data isn't actually
            | experience; it's a record of an experience.
            | 
            | Now, what we've seen with e.g. Chess and Go is that when
            | you can give a tensor model "real experience" at the speed
            | of however many GPUs you have, the model quickly becomes
            | capable of superhuman performance. Automatic theorem
            | proving means you can give the model "real experience"
            | without armies of humans writing down proofs: you can
            | generate and validate proofs as fast as a computer will
            | let you, and the LLM gets "real experience" with every new
            | proof it generates, along with an objective cost function:
            | was the proof correct?
           | 
           | Now, this might not let us give the LLMs real experience with
           | being a teacher or dealing with Marxist revolutionaries, but
           | it would be a sea change in the ability of LLMs to do math,
           | and it probably would let us give LLMs real experience in
           | writing software.
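            | 
            | As a sketch of that loop (Python; `model` and `lean_check`
            | are hypothetical stand-ins for the proof generator and the
            | verifier, not real APIs):
            | 
            |     def self_play(model, conjectures, lean_check):
            |         # lean_check plays the role of a kernel that
            |         # accepts or rejects a candidate proof.
            |         for c in conjectures:
            |             attempt = model.generate(c)
            |             reward = 1.0 if lean_check(c, attempt) else 0.0
            |             # Objective, automatic signal - no human labels.
            |             model.update(c, attempt, reward)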
        
           | ianhorn wrote:
           | This is a thing I'm working on, so I have some potentially
           | useful thoughts. tl;dr, it doesn't have to be about encoding
           | _arbitrary_ real life statements to be super duper useful
           | today.
           | 
           | > how might an arbitrary statement like "Scholars believe
           | that professional competence of a teacher is a prerequisite
           | for improving the quality of the educational process in
           | preschools" be put in a lean-like language?
           | 
            | Totally out of scope in any near future for me, at least.
            | But that doesn't prevent it from being super useful for a
            | narrower scope. For example:
           | 
            | - How might we take a statement like "(x + 1) (x - 5) = 0"
            | and encode it formally? (One possible reading is sketched
            | in Lean at the end of this comment.)
           | 
           | - Or "(X X^T)^-1 X Y = B"?
           | 
           | - Or "6 Fe_2 + 3 H_2 O -> ?"?
           | 
           | We can't really do this for a huge swath of pretty narrow
           | applied problems. In the first, what kind of thing is X? Is
           | it an element of an algebraically closed field? In the
           | second, are those matrices of real numbers? In the third, is
           | that 6 times F times e_2 or 6 2-element iron molecules?
           | 
            | We can't _formally_ interpret those as written, but you
            | and I can easily tell what's meant. Meanwhile, current
            | ways of formally writing those things up are a massive
            | pain in the ass. Anything with a decent amount of common
            | sense can tell you what is _probably_ meant formally. We
            | know that we can't have an algorithm that's right 100% of
            | the time for a lot of relevant things, but 99.99% is
            | pretty useful. If clippy says 'these look like matrices,
            | right?' and is almost always right, then it's almost
            | always saving you lots of time and letting lots more
            | people benefit from formal methods with a much lower
            | barrier to entry.
           | 
           | From there, it's easy to imagine coverage and accuracy of
           | formalizable statements going up and up and up until so much
           | is automated that it looks kinda like 'chatting about real-
           | life statements' again. I doubt that's the path, but from a
           | 'make existing useful formal methods super accessible' lens
           | it doesn't have to be.
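            | 
            | For instance, a plausible Lean 4 / Mathlib reading of the
            | first example above (a sketch, assuming x ranges over the
            | reals; `mul_eq_zero` is Mathlib's zero-product lemma):
            | 
            |     import Mathlib
            | 
            |     -- One formal reading of "(x + 1)(x - 5) = 0": over ℝ
            |     -- it pins down exactly two solutions.
            |     example (x : ℝ) :
            |         (x + 1) * (x - 5) = 0 ↔ x = -1 ∨ x = 5 := by
            |       rw [mul_eq_zero]  -- a * b = 0 ↔ a = 0 ∨ b = 0
            |       constructor
            |       · rintro (h | h)
            |         · left; linarith
            |         · right; linarith
            |       · rintro (h | h)
            |         · left; linarith
            |         · right; linarith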
        
         | thesz wrote:
          | So, you are looking for Cyc [1], practically.
         | 
         | [1] https://en.wikipedia.org/wiki/Cyc
        
           | auggierose wrote:
           | Does Cyc have proofs?
        
             | cess11 wrote:
             | In a sense, yes, since it has a foundation in Prolog style
             | facts and rules, and supposedly can output its reasoning.
        
               | auggierose wrote:
               | Ok, sounds like in principle you could have proofs, but
               | in practice, you don't?
               | 
               | Are there any checks for the consistency of all facts?
        
               | cess11 wrote:
                | How could there be? Gödel gets in the way.
        
               | auggierose wrote:
               | In most interactive theorem provers the check is simple:
               | you single out the axioms you believe, and all other
               | stuff you do is guaranteed to preserve consistency. That
               | works in an ITP system because there are only a few
               | axioms.
               | 
               | If Cyc has 100000 axioms/facts, then that's a problem.
        
               | thesz wrote:
               | If axioms, facts and rules have types, these types can
               | guide choice.
        
               | auggierose wrote:
               | Not a believer in types here, sorry. I'd rather have
               | simple theorems instead without wading through a swamp of
               | types first. And I am sure AI will be able to handle
               | theorems just fine.
        
               | cess11 wrote:
               | It's not a theorem prover, it's more like a support
               | system for decision making.
               | 
                | They can probably isolate 100k facts from some domain
                | for you if you pay for it, but their idea is to run
                | inference on millions of "common sense" and everyday
                | facts, and has been for decades.
        
               | auggierose wrote:
               | That sounds to me like something that will be subsumed by
               | the combination of ITP and AI.
        
           | notTooFarGone wrote:
           | You reminded me of my pains with OpenCyc in my student job.
           | Thanks :D
        
           | trenchgun wrote:
            | >The Cyc knowledge base involving ontological terms was
            | largely created by hand axiom-writing
            | 
            | >An inference engine is a computer program that tries to
            | derive answers from a knowledge base. The Cyc inference
            | engine performs general logical deduction.[8] It also
            | performs inductive reasoning, statistical machine learning
            | and symbolic machine learning, and abductive reasoning.
            | The Cyc inference engine separates the epistemological
            | problem from the heuristic problem. For the latter, Cyc
            | used a community-of-agents architecture in which
            | specialized modules, each with its own algorithm, became
            | prioritized if they could make progress on the
            | sub-problem.
            | 
            | Not really.
        
         | tucnak wrote:
          | Grammar sampling has been around for months and has remained
          | largely unexplored. Don't fall into the common trap of
          | thinking in a fixed language; rather, think about a superset
          | of possible languages (grammars) and how they could evolve
          | from one another. I bet that if there's a breakthrough, it's
          | probably in "differential grammars," or whatever it would be
          | called: plug that into the backtracking sampler, and you
          | have your System 2.
        
         | gosub100 wrote:
         | Ok so Claude says that the Riemann hypothesis is proven and
         | gives you 1200 pages of math symbols backing it up. Now what?
         | 
         | You probably say "now a human can verify it" but what if the
         | humans are wrong? What is the source of truth?
        
       | chompychop wrote:
        | Is it currently possible to reliably limit the knowledge
        | cut-off of an LLM (either during training or inference)? An
        | interesting experiment would be to feed an LLM mathematical
        | knowledge only up to the year a theorem was proved, and then
        | see if it can actually come up with the novel techniques used
        | in the proof. For example, having access only to papers prior
        | to 1993, can an LLM come up with Wiles' proof of FLT?
        
         | ogrisel wrote:
          | That should be doable, e.g. by semi-automated curation of
          | the pre-training dataset. However, since curating such large
          | datasets and running pre-training is so expensive, I doubt
          | that anybody will run such an experiment - especially since
          | one would have to trust that the curation process was
          | correct enough for the end result to be meaningful. Checking
          | that the curation process is not flawed is probably as
          | expensive as running it in the first place.
        
         | n4r9 wrote:
         | There's the Frontier Math benchmarks [0] demonstrating that AI
         | is currently quite far from human performance at research-level
         | mathematics.
         | 
         | [0] https://arxiv.org/abs/2411.04872
        
           | data_maan wrote:
            | They didn't demonstrate anything. They haven't even
            | released their dataset, nor mentioned how big it is.
            | 
            | It's just hot air, just like the AlphaProof announcement,
            | where very little is known about their system.
        
       | chvid wrote:
       | Mathematicians have been using computers, programming languages,
       | and proof engines for over half a century; however breakthroughs
       | in mathematics are still made by humans in any meaningful sense,
       | even though the tools they use and make are increasingly complex.
       | 
        | But as things look now, I would be willing to bet that the
        | next major breakthrough in maths will be touted as being
        | AI/LLMs and coming out of one of the big US tech companies
        | rather than some German university.
       | 
       | Why? Simply, the money is much bigger. Such an event would pop
       | the market value of the company involved by a hundred billion -
       | plenty of incentive right there to paint whatever as AI and hire
       | whoever.
        
         | kzrdude wrote:
         | But, these AI solutions are trying to solve math problems to
         | prove their AI capabilities, not because they care about
         | mathematics.
        
           | staunton wrote:
           | Sure. Why do you say "but"? Solving such a math problem
           | (while perhaps massively overstating the role AI actually
           | played in the solution) would be great PR for everyone
           | involved.
        
       ___________________________________________________________________
       (page generated 2024-11-18 23:01 UTC)