[HN Gopher] Terence Tao on proof checkers and AI programs
       ___________________________________________________________________
        
       Terence Tao on proof checkers and AI programs
        
       Author : antineutrino
       Score  : 175 points
       Date   : 2024-06-11 14:56 UTC (1 day ago)
        
 (HTM) web link (www.scientificamerican.com)
 (TXT) w3m dump (www.scientificamerican.com)
        
       | sylware wrote:
       | It took Terence Tao to speak out about the ego of many
       | mathematicians refusing to consider "AI" as a potential
       | assistant.
       | 
       | We may realise we need much bigger neural nets and training
       | facilities, or different models.
       | 
       | And in the end, it can be a failure.
       | 
       | You know, R&D.
        
       | pollimr wrote:
       | These tools, and by extension their unethical parent companies,
       | will be kept air-gapped from my research.
        
         | rcpt wrote:
         | https://lean-lang.org/about/ is an open source project
        
       | aeternum wrote:
       | How about: AI will become everyone's co-pilot
       | 
       | Now we don't need to write hundreds of additional articles.
        
         | ProllyInfamous wrote:
         | ~"You don't ever want to be a company programming a bot for a
         | company which is designed to program bots, to program bots,
         | programming bots programming bots."~ [1]
         | 
         | --Mark Zuckerberg, probably [actually not].
         | 
         | I am not a robot, but the above limerick's author.
         | 
         | RIP to my IRL humanbro, who recently programmed himself out of
         | his cushy Salesforce job.
         | 
         | Welcome... to The Future(tm).
         | 
         | --
         | 
         | [1] Paraphrasing Zuckerberg saying he doesn't think a structure
         | of "just managers managing managers" is ideal.
        
       | rthnbgrredf wrote:
       | It will be interesting when they become able to autocomplete
       | current research topics correctly.
        
       | drlemonpepper wrote:
       | https://archive.is/Idouw
        
       | Tarean wrote:
       | Computer-checked proofs are one area where AI could be really
       | useful fairly soon, though it may look closer to the neural
       | networks in chess engines than to full LLMs.
       | 
       | You already use tons of solvers, because proving everything by
       | hand is mind-numbingly tedious and time-consuming, but
       | tactics/solvers really struggle if you throw too many theorems
       | and lemmas at them. Neural nets as search machines that pattern-
       | match relevant lemmas fit perfectly. Similarly, induction and
       | higher-order unification are full-on code synthesis, where brute-
       | force iteration over all possible syntax trees is really
       | inefficient.
       | 
       | And solvers do a backtracking solve anyway, so if the AI returns
       | 95% irrelevant lemmas it doesn't matter. It would still be a
       | dramatic improvement over manually searching for them.
       | 
       | Though I'm not convinced that computer-checked proofs are
       | necessarily good for communication. There are reasons besides
       | page count that proofs for human consumption are high-level and
       | gloss over details.
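       | 
       | A minimal sketch of that lemma-retrieval idea (toy code; `embed`
       | is a stand-in for a trained encoder, and all the names here are
       | hypothetical):
       | 
       |     import numpy as np
       | 
       |     def embed(statement: str) -> np.ndarray:
       |         # Toy stand-in for a trained encoder: bag of chars.
       |         v = np.zeros(128)
       |         for ch in statement:
       |             v[ord(ch) % 128] += 1.0
       |         n = np.linalg.norm(v)
       |         return v / n if n else v
       | 
       |     def select_lemmas(goal, library, k=20):
       |         # Rank library lemmas by similarity to the goal and
       |         # keep the top k; the backtracking solver tolerates
       |         # the mostly-irrelevant rest.
       |         g = embed(goal)
       |         ranked = sorted(library,
       |                         key=lambda lem: -float(embed(lem) @ g))
       |         return ranked[:k]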
        
         | qsort wrote:
         | Not a mathematician, but knowing how proof checkers work, I
         | don't expect them to become practical without some significant
         | future advancement; they certainly aren't right now.
         | 
         | The main problem is that in order to work they have to be
         | connected to the formal definition of the mathematical objects
         | you are using in your proof. But nobody thinks that way when
         | writing (or reading!) a proof: you usually have a high-level,
         | informal idea of what you are "morally" doing, and then you
         | fill in formal details as needed.
         | 
         | Computer code (kinda) works because the formal semantics of the
         | language is much closer to your mental model, so much so that
         | in many cases it's possible to give an operational semantics;
         | but generally speaking, math is different in its objectives.
        
           | jvanderbot wrote:
           | You yourself identified problems that a modern AI guru would
           | salivate over: formal specifications from natural language,
           | "filling in" formal details, etc.
           | 
           | I'll add another one: NN-based speedups to constraint
           | solvers, which usually rely on something like branch-and-
           | bound. This is (I've heard) a very active area of research.
        
           | g15jv2dp wrote:
           | You're commenting on a thread about an interview where an
           | actual Fields medalist explains how proof assistants are
           | actually used to do meaningful mathematical work, today. The
           | issues you raise are all discussed in the interview.
        
             | qsort wrote:
             | practical = practical for routine mathematical work, which
             | they aren't right now, which Tao says in the article
             | itself.
             | 
             | Never have I ever said or implied, here or anywhere else,
             | that proof checkers are useless and bad and stupid.
        
               | g15jv2dp wrote:
               | > practical = practical for routine mathematical work,
               | which they aren't right now, which Tao says in the
               | article itself.
               | 
               | I see, I guess it would have been nice to be clearer from
               | the start.
               | 
               | > Never have I ever said or implied, here or anywhere
               | else, that proof checkers are useless and bad and stupid.
               | 
               | Did I say you did?
        
             | PartiallyTyped wrote:
             | They = LLMs.
             | 
             | As I've interpreted it, they mean that LLMs and similar
             | tech isn't particularly helpful to theorem provers at the
             | moment.
        
         | bramhaag wrote:
         | Automated theorem proving is a concept that has been around for
         | a long time already.
         | 
         | Isabelle's sledgehammer strategy combines automatic theorem
         | provers (E, Z3, SPASS, Vampire, ...) and attempts to
         | prove/refute a goal. In theory this seems fine, but in practice
         | you end up with a reconstructed proof that applies 12 seemingly
         | arbitrary lemmas. This sort of proof is unreadable and very
         | fragile. I don't see how AI will magically solve this issue.
        
           | gowld wrote:
           | What is a "fragile" proof?
        
             | staunton wrote:
             | It no longer works when some definition or other theorem
             | changes slightly. Most of these proof assistants provide
             | ways to write proofs that can slightly adjust and "plug
             | together" the other theorems in the "obvious" way.
        
         | trott wrote:
         | > Computer-checked proofs are one area where AI could be really
         | useful fairly soon, though it may look closer to the neural
         | networks in chess engines than to full LLMs.
         | 
         | Yes, if significant progress is made in sample-efficiency. The
         | current neural networks are extremely sample-inefficient. And
         | formal math datasets are much smaller than those of, say,
         | Python code.
        
       | boyka wrote:
       | Isn't proof checking at least as hard as k-SAT and therefore NP?
       | Given that neural networks process in constant time, how would
       | that be possible without approximation?
        
         | hollerith wrote:
         | No.
        
           | boyka wrote:
           | Care to explain?
           | 
           | E.g. https://mathoverflow.net/questions/226966/algorithmic-
           | comple...
        
             | markisus wrote:
             | A proof is basically a directed graph where nodes are
             | propositions, and edges are deductive rules. One only needs
             | to check that the starting nodes are axioms and that each
             | edge is one of a finite set of deduction rules in the
             | system. So the time to check a proof is linear in the
             | number of edges.
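             | 
             | A toy version of that check (a hypothetical mini-logic
             | with modus ponens as its only rule; just a sketch, not a
             | real checker):
             | 
             |     AXIOMS = {"p", "p -> q", "q -> r"}
             | 
             |     def mp(a, impl):
             |         # From A and "A -> B", conclude B.
             |         parts = impl.split(" -> ", 1)
             |         if len(parts) == 2 and parts[0] == a:
             |             return parts[1]
             |         return None
             | 
             |     def check(steps):
             |         # One pass over the edges: linear time.
             |         known = set(AXIOMS)
             |         for a, impl, concl in steps:
             |             if a not in known or impl not in known:
             |                 return False  # cites something unproven
             |             if mp(a, impl) != concl:
             |                 return False  # rule misapplied
             |             known.add(concl)
             |         return True
             | 
             |     print(check([("p", "p -> q", "q"),
             |                  ("q", "q -> r", "r")]))  # True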
        
         | empath75 wrote:
         | This article isn't about using neural networks for proof
         | checking, but about using them to assist with writing Lean
         | programs, in the same way that developers use Copilot to write
         | Java code or whatever. It might just spit out boilerplate code
         | for you, of which there is still a great deal in formal proofs.
         | 
         | The proof checking is still done with normal proof-checking
         | programs.
        
         | sebzim4500 wrote:
         | Checking a proof is much easier than finding it.
        
         | AnimalMuppet wrote:
         | Neural networks process in constant time _per input/output_
         | (token or whatever). But some prompts could produce more
         | tokens.
         | 
         | As you increase the size of the window, doesn't the size of the
         | NN have to increase? So if you want to just use an NN for this,
         | the bigger the problem it can handle, the bigger the window, so
         | the longer per token produced; but also the more tokens
         | produced, so it's longer on that front as well. I don't know if
         | that adds up to polynomial time or not.
         | 
         | Note well: I'm not disagreeing with your overall assertion. I
         | don't know if NNs can do this; I'm inclined to think they
         | can't. All I'm saying is that even NNs are not constant time in
         | problem size.
        
         | YetAnotherNick wrote:
         | Humans run in constant time. So yeah it depends on the constant
         | factor.
        
         | pclmulqdq wrote:
         | Most k-sat problems are easy, or can be reduced to a smaller
         | and easier version of k-sat. Neural networks might be able to
         | write Lean or Coq to feed into a theorem prover.
        
         | memkit wrote:
         | This is kind of amusing, because NP problems, by definition,
         | must have a polynomial-length _certificate_ for a given
         | solution that can be checked in polynomial time.
         | 
         | The word "certificate" can be read as "proof" without loss of
         | meaning.
         | 
         | What you're asking is slightly different: whether the proof-
         | checking problem in logic is NP. I don't know enough math to
         | know for sure, but assuming that any proof can be found in
         | polynomial time by a _non-deterministic_ Turing machine and
         | that such a proof is polynomial in length, then yes, the proof-
         | checking problem is in NP. Alas, I think all of that is moot
         | because of Turing and Church's negative answer to the
         | _Entscheidungsproblem_ [1].
         | 
         | It would help your understanding and everyone else's if you
         | separated the _search problem_ for a proof from the
         | _verification problem_ of a proof. Something is in NP when you
         | can provide verification that runs in polynomial time.
         | Equivalently, the search problem must be solvable in polynomial
         | time by a non-deterministic Turing machine. Problems that don't
         | meet these conditions are generally only of theoretical value.
         | 
         | 1. https://en.m.wikipedia.org/wiki/Entscheidungsproblem
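         | 
         | In symbols, the standard definition I'm leaning on here is:
         | 
         |     L \in \mathrm{NP} \iff \exists\, \text{polynomial } p,\;
         |     \exists\, \text{poly-time } V \;\text{such that}\;
         |     x \in L \iff \exists w\, (|w| \le p(|x|) \wedge V(x, w) = 1)
         | 
         | The witness w is the "certificate" (read: proof), and V is the
         | proof checker.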
        
       | qsort wrote:
       | "their time frame is maybe a little bit aggressive" has to be the
       | nicest way I've ever heard someone say "they're full of shit".
        
         | hyperfuturism wrote:
         | I think he's being quite literal in that he believes that
         | they're too aggressive.
         | 
         | In the article he does, after all, second the notion that AI
         | will eventually replace a lot of human labour related to
         | mathematics.
        
           | qsort wrote:
           | I'm not reading it that way, I don't know where you are
           | getting "AI will eventually replace..." from, if anything he
           | seems to have the opposite idea: "even if AI can do the type
           | of mathematics we do now, it means that we will just move to
           | a higher type of mathematics".
           | 
           | The line I quoted is in response to "in two to three years
           | mathematics will be "solved" in the same sense that chess is
           | solved".
           | 
           | Unless my priors are dramatically wrong, anyone who believes
           | _that_ is indeed full of shit.
        
       | raphlinus wrote:
       | The description of "project manager mathematicians" reminded me
       | of "A letter to my old friend Jonathan" [1] and the followup [2]
       | by Edsger Dijkstra from 1975. It was written as satire, basically
       | criticizing the way that software was produced by showing how
       | ridiculous it would be to produce mathematics the same way. But
       | in some ways it was prescient. Of course, the _main_ point of
       | these documents is to critique intellectual property
       | (particularly the absurdity of applying it to mathematical truth)
       | but fortunately that aspect is not a significant concern of this
       | push toward mechanization.
       | 
       | [1]:
       | https://www.cs.utexas.edu/users/EWD/transcriptions/EWD04xx/E...
       | 
       | [2]:
       | https://www.cs.utexas.edu/users/EWD/transcriptions/EWD05xx/E...
        
       | ur-whale wrote:
       | Relying on GPT-style systems to translate high-level math into
       | Lean strikes me as dangerous unless you're going to double-check
       | by hand that the LLM didn't suddenly decide to hallucinate parts
       | of the answer.
       | 
       | Granted, the hallucinated part might not pass the final Lean
       | verification, but ... what if it does because the LLM did not
       | properly translate the mathematician's intent to Lean language?
       | 
       | I'm thinking the human will still need to read the Lean program,
       | understand it and double-check that his original human intent was
       | properly codified.
       | 
       | Better than having to produce it by hand, but still sub-optimal.
        
         | kccqzy wrote:
         | I disagree. I think if the hallucinated part passes the final
         | Lean verification, it would actually be trivial for a
         | mathematician to verify that the intent is reflected in the
         | theorem.
         | 
         | And also, in this case, doesn't the mathematician first come up
         | with a theorem and then engage AI to find a proof? I believe
         | that's what Terence did recently, judging by his Mastodon
         | threads a while ago. Therefore that intent verification can be
         | skipped too. Lean can act as the final arbiter of whether the
         | AI produces something correct or not.
        
         | constantcrying wrote:
         | There is no danger. If the output of the neural net is wrong,
         | then Lean will catch it.
         | 
         | >what if it does because the LLM did not properly translate the
         | mathematician's intent to Lean language?
         | 
         | How so? The one thing which definitely needs to be checked by
         | hand is the theorem to be proven, which obviously shouldn't be
         | provided by a neural network. The NN will try to find the
         | proof, which is then verified by Lean, but fundamentally the NN
         | can't prove the wrong theorem and have it verified by Lean.
        
         | jerf wrote:
         | GPT != AI. GPT is a subset of AI. GPT is probably absolutely
         | useless for high-level math, if for no other reason than that
         | to be useful you'd need training data that basically by
         | definition doesn't exist. Though I am also strongly inclined to
         | believe that LLMs are constitutionally incapable of functioning
         | at that level of math regardless, because that's simply not
         | what they are designed to do.
         | 
         | But AI doesn't have to mean LLMs, recent narrowing of the
         | term's meaning notwithstanding. There's all kinds of ways AIs
         | in the more general sense could assist with proof writing in a
         | proof language, and do it in a way that each step is still
         | sound and reliable.
         | 
         | In fact, once you accept you're working in a proof language
         | like Lean, the LLM part of the problem goes away. The AIs can
         | work directly on such a language, and the problems LLMs solve
         | simply aren't present. There's no need to understand a high-
         | context, fuzzy-grammar, background-laden set of symbols
         | anymore. There's also no problem with the AI "explaining" its
         | thinking. We might still not be able to follow too large an
         | explanation, but it would at least exist, in a way that an
         | opaque neural net's reasoning doesn't necessarily.
        
           | gowld wrote:
           | > GPT is probably absolutely useless for high-level math
           | 
           | Terence Tao disagrees with you.
           | 
           | https://mathstodon.xyz/@tao/110601051375142142
        
             | renewiltord wrote:
             | This tool is like all other tools in that expert
             | practitioners extract value from it and novices consider it
             | useless. Fascinating. The advent of search engines was
             | similar with many remarking on how if you searched for
             | something sometimes you'd find websites that were wrong.
             | Things change and yet stay the same. At the time I thought
             | age was the discriminator. Today I know it's not that. It's
             | something else.
        
             | jerf wrote:
             | If it could be solved with a Math Overflow-post level of
             | effort, even from Terence Tao, it isn't what I was talking
             | about as "high level math".
             | 
             | I also am not surprised by "consider a generating function"
             | coming out of an LLM. I am talking about a system that
             | could _solve_ that problem, entirely, as doing high-level
             | math. A system that can emit "have you considered using
             | wood?" is not a system that can build a house autonomously.
             | 
             | It especially won't seem all that useful next to the
             | generation of AIs I anticipate is coming, which will use
             | LLMs as a component to understand the world but will not be
             | just big LLMs.
        
         | m4lvin wrote:
         | The trick is that the human only needs to read and understand
         | the Lean _statement_ of a theorem and agree that it (with all
         | involved definitions) indeed represents the original
         | mathematical statement, but not the _proof_. Whether the proof
         | indeed proves the statement is exactly what Lean checks. We do
         | not need to trust the LLM in any way.
         | 
         | So would I accept a proof made by GPT or whatever? Yes. But not
         | a (re)definition.
         | 
         | The analogy for programming is that if someone manages to write
         | a function with certain input and output types, and the
         | compiler accepts it, then we do know that someone indeed
         | managed to write a function of that type. Of course we have no
         | idea about the behaviour, but statements/theorems are types,
         | not values :-)
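         | 
         | A toy Lean 4 illustration of the point (my own example, not
         | from the article): reviewing the line below by eye only means
         | checking that the _statement_ matches the intent; Lean itself
         | guarantees that the term on the right really has that type.
         | 
         |     -- the statement is the type; the proof is just a term
         |     theorem my_add_comm (a b : Nat) : a + b = b + a :=
         |       Nat.add_comm a b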
        
           | cubefox wrote:
           | The thing is, when AI systems are able to translate intuitive
           | natural language proofs into formal Lean code, they will also
           | be used to translate intuitive concepts into formal Lean
           | definitions. And then we can't be sure whether those
           | definitions actually define the concepts they are named
           | after.
        
       | egl2021 wrote:
       | For now this might be a project for Fields medalists, like Tao
       | and Scholze, who have the leisure to spend 10x the time on a
       | proof. I recently talked to a post-doc in a top-ranked math
       | department, and he doesn't know anyone who actually uses lean4 in
       | their work. For early-career people, it's probably better to
       | trust your intuition and get the papers out.
        
         | g15jv2dp wrote:
         | I'm a math professor and I know many people who use or
         | contribute to Lean. Not all of them are Fields medalists. Maybe
         | it takes more than a couple of people's opinions on the subject
         | to have a good picture.
        
           | shakow wrote:
           | > people who use or contribute to lean
           | 
           | And do they get tenure- or funding-able output of it, or is
           | it more of a side-hobby?
        
             | g15jv2dp wrote:
             | Yes, they get grants to work on it. Other questions? I
             | can't say I enjoy this "gotcha" tone.
        
               | Jensson wrote:
               | So they get grants to work on these tools rather than to
               | work on math. That seems like a very special situation,
               | and a very special math department, and therefore not a
               | good picture of how helpful these tools are to math
               | departments in general.
        
               | nimih wrote:
               | "No _true_ working mathematician would spend their time
               | using Lean! "
        
               | g15jv2dp wrote:
               | I guess we got in the top 10 worldwide for math in the
               | Shanghai ranking by accident. If you don't know what
               | you're speaking about, sometimes it's wise to not say
               | anything.
        
               | seanhunter wrote:
               | The mathematicians I know of who work on Lean4 as part of
               | grant-funded research do so because their research
               | funding is for formalising mathematics and Lean4 is the
               | tool they choose to use for that.
               | 
               | I really have no idea why people are taking this approach
               | here.
               | 
               | Proof assistants are useful. People use them to do actual
               | maths. There is a lot of work left to formalise all of
               | maths (this isn't a proof-assistant problem; it's a huge
               | undertaking, given that maths is 3000+ years old and
               | we've only just decided what formality really
               | fundamentally consists of), so in some places you need to
               | do more foundational work than in others to make it
               | useful.
        
               | szvsw wrote:
               | Isn't working on these tools a form of working on math?
               | Math is a language unto itself.
        
               | nickpsecurity wrote:
               | I'm with them. They're using a tool they're paid to work
               | on. That's always a special case. The general case would
               | be what percentage of mathematicians use (a) proof
               | assistants at all, (b) mechanized proof assistants, and
               | (c) Lean specifically.
               | 
               | Do most working mathematicians use proof assistants or
               | checkers? Does anyone have a percentage, with feedback on
               | how they use them?
        
               | TeMPOraL wrote:
               | > _They're using a tool they're paid to work on. That's
               | always a special case._
               | 
               | FWIW, the industry term for this is "dogfooding", and
               | it's usually seen as a predictor of a high-quality tool.
        
               | GPerson wrote:
               | The person you are replying to asked a very useful
               | question, and you supplied a useful answer. I don't
               | detect any snarky tone in their comment.
        
               | shakow wrote:
               | That's not a "gotcha", but a question about whether what
               | I experienced in bioinformatics (i.e. that working on
               | tools, especially already-existing ones, is a career
               | trap) also applies to maths.
        
               | a_wild_dandan wrote:
               | Snark and contrarianism are specialties around here.
               | Thank you for sharing your experience; I genuinely
               | appreciate it.
        
         | occamrazor wrote:
         | For Tao, spending 10x the time on a proof still means spending
         | 5x less time than an average postdoc. He is incredibly fast and
         | productive.
        
         | flostk wrote:
         | I'm sure this also depends a lot on the field within
         | mathematics. Areas like mathematical logic or algebra have a
         | pretty formal style anyway, so it's comparatively less effort
         | to translate proofs into Lean. I would expect more people from
         | these areas to use proof checkers than, say, from low-
         | dimensional topology, which has a more "intuitive" style. Of
         | course, by "intuitive" I don't mean it's any less rigorous. But
         | a picture proof of some homotopy equivalence is just much
         | harder to translate into something Lean can understand than
         | some sequence of inequalities. To add a data point, I'm in
         | geometry/topology and I've never seen or heard of anyone who
         | uses this stuff (so far).
        
           | enugu wrote:
           | Pictures can be directly used in formal proofs. See [1] for
           | instance So, the problem is not in principle, but rather,
           | pictures and more direct syntax has not yet been integrated
           | into current proof systems. When that is done, proofs will be
           | much more natural and the formal proof will be closer to the
           | 'proof in the mind'.
           | 
           | [1] https://www.unco.edu/nhs/mathematical-
           | sciences/faculty/mille...
        
         | yboris wrote:
         | _lean4_ - programming language and theorem prover
         | 
         | https://lean-lang.org/
         | 
         | https://github.com/leanprover/lean4
        
       | iamwil wrote:
       | A tangential topic, but anyone have recommendations for proof-
       | checking languages? Lean, Isabelle, Coq, Idris are the few I know
       | about. I've only ever used Isabelle, but can't really tell the
       | difference between them. Anyone here used more than one, and can
       | tell why people pick one over the other?
        
         | mathgradthrow wrote:
         | Lean has the largest existing math library, but you're kind of
         | stuck in a Microsoft ecosystem.
        
           | mkehrt wrote:
           | I'm not sure what you mean. I've been playing around with
           | Lean and it works fine on a Mac. I am using VSCode, though.
        
           | ykonstant wrote:
           | If you mean dependence on VS Code, there are emacs and neovim
           | packages for Lean; some of the core devs use those
           | exclusively. Also note that de Moura is no longer at
           | Microsoft and Lean is not an MS product.
        
         | lacker wrote:
         | They are pretty similar as languages. In general I would
         | recommend Lean because the open source community is very active
         | and helpful and you can easily get help with whatever you're
         | doing on Zulip. (Zulip is a Slack/Discord alternative.) The
         | other ones are technically open source but the developers are
         | far less engaged with the public.
         | 
         | Proof-checking languages are still just quite hard to use,
         | compared to a regular programming language. So you're going to
         | need assistance in order to get things done, and the Lean
         | community is currently the best at that.
        
         | einszwei wrote:
         | I have personally tried Coq and Lean and stuck to Lean as it
         | has much better tooling, documentation and usability while
         | being functionally similar to Coq.
         | 
         | Caveat that I have only used Coq and Lean side by side for
         | about a month before deciding to stick with Lean.
        
         | tombert wrote:
         | Isabelle is my favorite, and it's the one I've used the most,
         | if only because sledgehammer is super cool, but Coq is neat as
         | well. It has coqhammer (a word I will not say at work), and
         | that works ok but it didn't seem to work as well as
         | sledgehammer did for me.
         | 
         | Coq is much more type-theory focused, which I do think lends it
         | a bit better to a regular programming language. Isabelle feels
         | a lot more predicate-logic and set-theory-ish to me, which is
         | useful in its own right but I don't feel maps quite as cleanly
         | to code (though the Isabelle code exporter has generally worked
         | fine).
        
         | awfulneutral wrote:
         | Another one is Oak: https://oakproof.org/
        
         | vzaliva wrote:
         | One consideration in choosing a proof assistant is the type of
         | proofs you are doing. For example, mathematicians often lean
         | towards Lean (pun intended), while those in programming
         | languages favour Coq or Isabelle, among others.
        
         | cubefox wrote:
         | As Tao says in the interview: Lean is the dominant one.
        
         | trott wrote:
         | There's also https://en.wikipedia.org/wiki/Metamath
         | 
         | Its smallest proof-checker is only 500 lines.
        
       | munchler wrote:
       | Terence Tao also gave a great talk on this topic a few months
       | ago, where he goes into more detail about uses of Lean:
       | https://www.youtube.com/watch?v=AayZuuDDKP0
        
       | thecleaner wrote:
       | Does Lean work with linear algebra proofs?
        
         | gowld wrote:
         | You need to ask a more specific question.
         | 
         | https://leanprover-community.github.io/theories/linear_algeb...
        
       | gowld wrote:
       | > German mathematician and Fields Medalist Peter Scholze
       | collaborated in a Lean project--even though he told me he doesn't
       | know much about computers.
       | 
       | > With these formalization projects, not everyone needs to be a
       | programmer. Some people can just focus on the mathematical
       | direction; you're just splitting up a big mathematical task into
       | lots of smaller pieces. And then there are people who specialize
       | in turning those smaller pieces into formal proofs. We don't need
       | everybody to be a programmer; we just need some people to be
       | programmers. It's a division of labor.
       | 
       | Who can get paid to be a "mathematical programmer" ?
       | 
       | PhD mathematicians? Grad students?
        
       | zero_k wrote:
       | Proofs and proof checking are also big in SAT solving (think:
       | MiniSat) and SMT (think: z3, cvc5). They're also coming to model
       | counting (counting the number of solutions). In fact, we did one
       | in Isabelle for an approximate model counter [1], which provides
       | a probabilistically approximate count (a so-called PAC
       | guarantee); that is a lot harder to do, due to the probabilistic
       | nature.
       | 
       | [1] https://github.com/meelgroup/approxmc
        
       | tmsh wrote:
       | Really insightful. What's missing though I think is that LLMs
       | abstract things in ways that are increasingly super-human. Tao
       | says: "It's very easy to transform a problem into one that's
       | harder than into one that's simpler. And AI has not demonstrated
       | any ability to be any better than humans in this regard." But I
       | think due to the way LLMs work there will be much higher-level
       | insights that are possible. And that is the really exciting part.
       | 
       | Right now it's a helper. A fact checker. And doer of tedious
       | things. Soon it will be a suggester of insights. Already LLMs
       | compress embeddings and knowledge and have insights that we don't
       | see.
       | 
       | Hinton gives the example of how a nuclear bomb and a compost heap
       | are related and that most humans would miss that:
       | https://www.youtube.com/watch?v=Gg-w_n9NJIE&t=4613s
        
         | Salgat wrote:
         | The biggest problem with LLMs is that they are simply
         | regressions over the writings of millions of humans on the
         | internet. It's very much a machine that is trained to mimic how
         | humans write, nothing more.
         | 
         | The bigger concern is that we don't have the training data
         | required for it to mimic something smarter than a human
         | (assuming that a human is given sufficient time to work on the
         | same problem).
         | 
         | I have no doubt ML will exceed human intelligence, but the
         | current bottleneck is figuring out how to get a model to train
         | itself against its own output without human intervention,
         | instead of doing a regression against the entire world's
         | collective writings. Remember, Ramanujan was, with no formal
         | training and only access to a few math books, able to achieve
         | brilliant discoveries in mathematics. In terms of training data
         | used by ML models, that is basically nothing.
        
           | zozbot234 wrote:
           | > the current bottleneck is figuring out how to get a model
           | to train itself against its own output without human
           | intervention
           | 
           | Isn't that what AlphaZero was known for? After all, the
           | 'zero' in the name referred to training from zero human game
           | data, via pure self-play, which is the concept you're talking
           | about. So we have a proof of existence already, and there
           | have been recent papers showing progress in some mathematical
           | domains (though simplified ones so far, such as Euclidean
           | geometry, which does however feature complex proofs).
        
           | mofeien wrote:
           | LLMs are increasingly trained on multimodal data besides
           | human written text, such as videos showing complex real-world
           | phenomena that humans may also observe, but not understand,
           | or time series data that humans may observe but be incapable
           | of predicting. I don't see a fundamental reason why an LLM
           | trained on such data could not become a superhuman predictor
           | of, e.g., time-series data, given the right model
           | architecture, amount of data, and compute.
           | 
           | In a way, next-token prediction is also a time-series
           | prediction problem, and one that is arguably more difficult
           | than producing the time series/text in the first place. That
           | is because, to predict what a human will say about a topic,
           | you don't just need to model the topic but also the mental
           | processes of the human talking about it, adding another layer
           | of abstraction.
        
           | tmsh wrote:
           | Right, I think the insight that LLMs "create" though is via
           | compression of knowledge in new ways.
           | 
           | It's like teaching one undergraduate (reading some blog on
           | the internet and training on next word prediction), then
           | another undergraduate (training on another blog), then
           | another one, etc. And finding a way to store what's in common
           | among all these undergraduates. Over time, it starts to see
           | commonalities and to become more powerful than a human. Or a
           | human who had taught a million undergraduates. But it's not
           | just at the undergraduate level. It's the most
           | abstract/efficient way to represent the essence of their
           | ideas.
           | 
           | And in math that's the whole ball game.
        
           | mistrial9 wrote:
           | > but the current bottleneck is figuring out how to get a
           | model to train itself against its own output without human
           | intervention
           | 
           | this is sort of humorous, because the entire current
           | generation of Transformer-based encoders was built
           | specifically to get "a model to train itself without human
           | intervention"
           | 
           | ref: arXiv:2304.12210v2 [cs.LG], 28 Jun 2023, "A Cookbook of
           | Self-Supervised Learning"
        
           | rdevsrex wrote:
           | Ramanujan attributed his success in no small part to divine
           | assistance. I wonder if sentient AI would do the same :)
        
         | calf wrote:
         | Hinton's example doesn't resonate with me -- I do gardening as
         | a hobby, so my immediate answer to his "What is common to nukes
         | and compost heaps?" was the concept of critical mass. It took
         | me 10 seconds to articulate the word, so I'm not as fast, but I
         | still got the "correct" analogy right away, because with the
         | right information the answer is obvious.
         | 
         | Hinton says this demonstrates analogical thinking. I read about
         | gardening online and I've probably read about compost heap
         | physics written down somewhere online.
         | 
         | So what more likely happened is that ChatGPT was previously
         | trained on some article about compost-heap chain reactions:
         | there's a lot of soil science and agriculture writing that
         | discusses this, even for laypeople like myself, so it isn't a
         | long reach if you know enough about both topics.
         | 
         | Thus Hinton's example strikes me as having failed to control
         | for the LLM having already basically seen the answer in its
         | training texts.
         | 
         | EDIT: A few minutes later in the video, Ilya comments
         | (https://youtu.be/Gg-w_n9NJIE?t=4966): "We have a proof of
         | existence, the human brain is a neural network" (and the
         | panelist agrees, assuming/barring any computational "magic"
         | happening in the human brain). I'm interested in this comment
         | because I generally hold that position as well, but I see many
         | naysayers and dissenters who will respond "Well, but real
         | neurons don't run on 8 bits; neurons have different types of
         | cells and contain DNA in a chemical soup of hormones, etc.
         | etc.; you are just confusing the model for the real thing in a
         | backwards way", and so on. So I think there's an interesting
         | divide in philosophy there as well, and I don't personally have
         | an articulate answer to explain why I hold the position that
         | natural brains are also neural networks, other than an appeal
         | to computationalism.
        
         | 29athrowaway wrote:
         | If one AI based system becomes an expert in writing math
         | proofs, the cost of getting another one is simply copying files
         | and provisioning them into different cloud instances. This can
         | be done within minutes to hours.
         | 
         | If one human becomes an expert in writing math proofs, the cost
         | of getting another one is waiting for a human in the population
         | who wants math as a career, then waiting for that human to
         | finish school (decades) and get advanced specialization, with
         | no guarantee that the human will finish school or become
         | proficient enough to advance the frontier of knowledge. By the
         | time you finished waiting for this, you could have trillions of
         | AI experts working in parallel.
         | 
         | The bandwidth of the brain when it comes to assimilating new
         | information is low (up to hundreds of words per minute).
         | Machines will be able to clone a lifetime of knowledge in
         | seconds, have thousands of conversations in parallel, even
         | serializing parts of their "brain" and sending them over to
         | other AIs, like the Borg. These are things that are not
         | possible within the human condition.
         | 
         | And once programmable matter is attained, AI will produce
         | exponential amounts of computronium and do the equivalent of
         | millennia of research in seconds.
        
       | Tossrock wrote:
       | A working mathematician recently got GPT-4o to prove an (afaik)
       | novel lemma while pursuing their research:
       | 
       | "My partner, who is mathematician, used ChatGPT last week for the
       | first time to prove some lemmas for his research. He already
       | suspected those lemmas were true and had some vague idea how to
       | approach them, but he wasn't expert in this type of statement.
       | This is the first time that he got correct and useful proofs out
       | of the models.
       | 
       | [...]
       | 
       | These are the two lemmas. The first one is something that a
       | collaborator of my partner noticed for small values of e in their
       | computations. ChatGPT did not find the proof before being told to
       | try Mobius functions.
       | 
       | https://chatgpt.com/share/9ee33e31-7cec-4847-92e4-eebb48d4ff...
       | 
       | The second looks a bit more standard to me, probably something
       | that Mathematica would have been able to do as well. But perhaps
       | that's just because I am more familiar with such formulas. Still,
       | Mathematica doesn't give a nice derivation, so this is useful.
       | 
       | https://chatgpt.com/share/7335f11d-f7c0-4093-a761-1090a21579...
       | 
       | "
        
         | a_wild_dandan wrote:
         | In a recent interview, Ben Goertzel mentioned having a PhD-
         | level discussion with GPT-4 about novel math ideas cooked up by
         | himself and a colleague. These things are becoming scary good
         | at reasoning and heavyweight topics. As the ML field continues
         | focusing on adding more System Two capabilities to complement
         | LLMs' largely System One thinking, things will get wild.
        
           | UniverseHacker wrote:
           | This is my experience as well -- even the popular hype
           | underestimates how creative and intelligent LLMs can be. Most
           | discussions reason about what they should or shouldn't be
           | capable of based on how we think they work, without really
           | looking directly at what they can actually do, and what they
           | can't. They are very alien -- much worse than humans at many
           | things, but also much better at others, and not in the areas
           | we would expect in either case. Ironically, they are
           | especially good at creative and abstract thinking and very
           | bad at keeping track of details and basic computation, nearly
           | the opposite of what we'd expect from computers.
        
             | mu53 wrote:
             | I am pessimistic about AI. The technology is definitely
             | useful, but I watched a video that made good points:
             | 
             |   * On benchmark-style tests, LLMs are not improving.
             |     GPT-4o is equivalent to GPT-4, and both barely improve
             |     on GPT-3.5T.
             |   * AI companies have been caught outright lying in demos
             |     and manipulating outcomes by pre-training for certain
             |     scenarios.
             |   * The main features added to GPT-4o have been features
             |     that manipulate humans with dark patterns.
             |   * The dark patterns include emotional affect in tone and
             |     cute/quirky responses.
             |   * These dark patterns encourage people to think of LLMs
             |     as humans that have a similar processing of the
             |     universe.
             | 
             | I seriously wonder about the transcript that these guys had
             | with the LLM. Were they suggesting things? Did ChatGPT just
             | reconfigure words that helped them think through the
             | problem? I think the truth is that ChatGPT is a very
             | effective rubber duck debugger.
             | 
             | > https://www.youtube.com/watch?v=VctsqOo8wsc
        
               | UniverseHacker wrote:
               | Try playing around with getting GPT-4 to discuss creative
               | solutions to real unsolved problems you are personally an
               | expert on, or created yourself. That video looks like
               | just standard internet ragebait to me.
               | 
               | I find it pretty annoying when people say you are just
               | being manipulated by hype if you are impressed by LLMs,
               | when I was a serious skeptic who thought GPT-3 was
               | useless, and only changed my opinion by directly
               | experimenting with GPT-4 on my own -- by getting it to
               | discuss and solve problems in my area of expertise as an
               | academic researcher.
        
         | krackers wrote:
         | Woah, that second proof is really slick. Is there a
         | combinatorial proof, instead of an algebraic one, as to why
         | that holds?
        
           | gjm11 wrote:
           | Yes.
           | 
           | (HN comments aren't the best medium for writing mathematics.
           | I hope this ends up reasonably readable.)
           | 
           | (d choose k) is the number of ways to pick k things from d.
           | 
           | (d choose k) k^2 is the number of ways to pick k things from
           | d, and then colour one of them red and one of them (possibly
           | the same one) blue.
           | 
           | (-1)^k (d choose k) k^2 is that (when k is even), or minus
           | that (when k is odd).
           | 
           | So the identity says: Suppose we consider a set of d things,
           | and we consider picking _some_ of them (meaning 0 or more,
           | though actually 0 is impossible; k is the number we pick) and
           | colouring one of those red and one blue; then there are the
           | same number of ways to do this picking an _odd_ number of
           | things as picking an _even_ number.
           | 
           | Well, let's describe the process a bit differently. We take
           | our set of d things. We colour one of them red. We colour one
           | of them blue. Then we pick some set of the things (k in
           | number), including the red thing and the blue thing (which
           | may or may not also be the red thing). And the claim is that
           | there are the same number of ways to do this with an even
           | value of k as with an odd number.
           | 
           | But now this is almost trivial!
           | 
           | Once we've picked our red and blue things, we just have to
           | decide which of the _other_ things to pick. And as long as
           | there's at least one of those, there are as many ways to
           | pick an odd number of things as to pick an even number. (Fix
           | one of them; we can either include it or not, and switching
           | that changes the parity.)
           | 
           | So if d>2 then we're done -- for each choice of red and blue,
           | there are the same number of "odd" and "even" ways to finish
           | the job.
           | 
           | What if d<=2? Well, then the "theorem" isn't actually true.
           | 
           | d=1: (1 choose 0) 0^2 - (1 choose 1) 1^2 = -1.
           | 
           | d=2: (2 choose 0) 0^2 - (2 choose 1) 1^2 + (2 choose 2) 2^2 =
           | 2.
        
             | krackers wrote:
             | >We take our set of d things. We colour one of them red. We
             | colour one of them blue. Then we pick some set of the
             | things (k in number), including the red thing and the blue
             | thing
             | 
             | In this alternate view of the process, where does the k^2
             | term come from? Isn't the number of ways to do it in this
             | perspective just
             | 
             | (d-1 choose k-1) # We already chose the red one (which is
             | also the blue) and need to choose k-1 more
             | 
             | +
             | 
             | (d-2 choose k-2) # We chose both red and blue ones and need
             | to choose the other k-2
             | 
             | Edit: I think it actually works out, seems to be equal to
             | 
             | d * 1 * Binomial[d - 1, k - 1] + d * (d - 1) * Binomial[d -
             | 2, k - 2] = Binomial[d, k]*k^2
             | 
             | And then you apply your argument about fixing one of the
             | elements in each of the two cases. Actually the proof seems
             | to be a variant (same core argument) as sum of even
             | binomial terms = sum of odd binomial terms [1]
             | 
             | [1] https://math.stackexchange.com/questions/313832/a-combi
             | nator...
        
               | gjm11 wrote:
               | In the alternate view, the k^2 isn't there any more. Not
               | because we are computing a different number, but because
               | we're organizing the computation differently.
               | 
               | (It's turned into a d^2 factor, except that we then need
               | to treat the red=blue case differently from the red!=blue
               | case, so it's really a d factor plus a d^2-d factor.)
               | 
               | And yes, the last bit of the proof is exactly the same
               | thing as one way to prove that the sum of "even" and
               | "odd" binomial coefficients are equal -- provided n>0.
               | (For the (0 choose whatever) case, the "even" sum is 1
               | and the "odd" sum is 0.)
        
         | Sniffnoy wrote:
         | Sorry, what is this quoted from?
         | 
         | It's interesting because that first one is something I was
         | discussing recently with my friend Beren (does he have an
         | account here? idk). We were thinking of it as summing over
         | ordered partitions, rather than over partitions with a factor
         | of |tau|!, but obviously those are the same thing.
         | 
         | (If you do it over _cyclically_ ordered partitions -- so, a
         | factor of (|tau|-1)! rather than |tau|! -- you get 0 instead of
         | (-1)^e.)
         | 
         | Here's Beren's combinatorial proof:
         | 
         | First let's do the second one I said, about cyclically ordered
         | ones, because it's easier. Choose one element of the original
         | set to be special. Now we can set up a bijection between even-
         | length and odd-length cyclically ordered partitions as follows:
         | If the special element is on its own, merge it with the part
         | after it. If it's not on its own, split it out into its own
         | part before the part it's in. So if we sum with a sign factor,
         | we get 0.
         | 
         | OK, so what about the linearly ordered case? We'll use a
         | similar bijection, except it won't quite be a bijection this
         | time. Pick an order on your set. Apply the above bijection with
         | the last element as the special element. Except some things
         | don't get matched, namely, partitions that have the last
         | element of your set on its own as the last part.
         | 
         | So in that case, take the second-to-last element, and apply
         | recursively, etc., etc. Ultimately everything gets matched
         | except for the partition where everything is separate and all
         | the parts are in order. This partition has length equal to the
         | size of the original set. So if the original set was even you
         | get one more even one, and if it was odd you get one more odd
         | one. Which, rephrased as a sum with a sign factor, yields the
         | original statement.
         | 
         | Would be interested to send this to the mathematician you're
         | referring to, but I have no idea who that might be since you
         | didn't say. :)
        
       | dorfsmay wrote:
       | I just spent hours looking at formal methods and had decided to
       | learn Isabelle/HOL -- and then this article!
        
         | garbthetill wrote:
         | May I ask why Isabelle over Agda or Lean?
        
           | dorfsmay wrote:
           | I know nothing; I am a complete beginner when it comes to
           | formal methods. From reading about it, it seems that
           | Isabelle/HOL is the best when it comes to automation, which
           | apparently is something you really want. It might be easier
           | to learn (controversial; some say Lean is easier). It's been
           | used to prove some software correct (including seL4 and a
           | version of Java), and Apple and AWS are using it (but then I
           | know AWS uses, or used, TLA+).
           | 
           | At the end of the day, I didn't want to spend more time
           | reading about the options than learning two of them (trying
           | one and potentially switching later). The more you read about
           | it, the more options open up (SPARK, TLA+, Coq, etc.).
           | 
           | I do find it ironic to read this article today, given that I
           | made that decision yesterday!
        
             | hwayne wrote:
             | [shameless plug]I maintain a collection of proofs of
             | leftpad in different prover languages, so people can
             | compare them. It's here: https://github.com/hwayne/lets-
             | prove-leftpad
             | 
             | [/invalid closing tag]
        
           | chaoskanzlerin wrote:
           | The main upside Isabelle has over other proof assistants is
           | the existence of Sledgehammer: invoke it, and your current
           | proof state gets handed off to automated provers such as
           | cvc4, veriT, z3, and vampire. If one of them finds a
           | solution, Isabelle then reconstructs the proof.
           | 
           | It's essentially a brute-force button that no other proof
           | assistant (aside from Coq) has.
        
       | grondilu wrote:
       | "I think in the future, instead of typing up our proofs, we would
       | explain them to some GPT. And the GPT will try to formalize it in
       | Lean as you go along. If everything checks out, the GPT will
       | [essentially] say, "Here's your paper in LaTeX; here's your Lean
       | proof. If you like, I can press this button and submit it to a
       | journal for you." It could be a wonderful assistant in the
       | future."
       | 
       | That'd be nice, but eventually what will happen is that the human
       | will submit a mathematical conjecture, and the computer will
       | internally translate it into something like Lean and try to find
       | either a proof or a disproof. It will then translate the result
       | into natural language and report it to the user.
       | 
       | Unless mathematics is fundamentally more complicated than chess
       | or go, I don't see why that could not happen.
        
         | qfiard wrote:
         | It is fundamentally more complicated. Possible moves can be
         | evaluated against one another in games. We have no idea how to
         | make progress on many math conjectures, e.g. Goldbach's or the
         | Riemann hypothesis. An AI would need to find connections with
         | different fields of mathematics that no human has found before,
         | and this is far beyond what chess or Go AIs are doing.
        
           | salamo wrote:
           | Not a mathematician, but I can imagine a few different things
           | which make proofs much more difficult if we simply tried to
           | map chess algorithms to theorem proving. In chess, each board
           | position is a node in a game tree and the legal moves
           | represent edges to other nodes in the game tree. We could
           | represent a proof as a path through a tree of legal
           | transformations to some initial state as well.
           | 
           | But the first problem is, the number of legal transformations
           | is actually infinite. (Maybe I am wrong about this.) So it
           | immediately becomes impossible to search the full tree of
           | possibilities.
           | 
           | Ok, so maybe a breadth-first approach won't work. Maybe we
           | can use something like Monte Carlo tree search with move
           | (i.e. math operation) ordering. But unlike chess/Go, we can't
           | just use rollouts because the "game" never ends. You can
           | always keep tacking on more operations.
           | 
           | Maybe with a constrained set of transformations and a really
           | good move ordering function it would be possible. Maybe Lean
           | is already doing this.
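           | 
           | A sketch of what that could look like (toy code; the
           | "goals", "tactics" and scoring function here are all
           | hypothetical stand-ins for the real thing):
           | 
           |     import heapq
           | 
           |     def search(goal, proved, tactics, apply, score,
           |                budget=10_000):
           |         # Best-first search; `score` stands in for a
           |         # learned move-ordering model.
           |         frontier = [(score(goal), 0, goal, [])]
           |         seen, tie = {goal}, 1
           |         while frontier and budget:
           |             budget -= 1
           |             _, _, g, path = heapq.heappop(frontier)
           |             if proved(g):
           |                 return path
           |             for t in tactics:
           |                 ng = apply(g, t)  # None: inapplicable
           |                 if ng is not None and ng not in seen:
           |                     seen.add(ng)
           |                     heapq.heappush(
           |                         frontier,
           |                         (score(ng), tie, ng, path + [t]))
           |                     tie += 1
           |         return None  # budget spent; not a refutation
           | 
           |     # Toy domain: goals are ints, proved at 0, tactics
           |     # halve or decrement.
           |     step = lambda g, t: (
           |         None if g == 0 else (g // 2 if t == "halve"
           |                              else g - 1))
           |     print(search(7, lambda g: g == 0,
           |                  ["halve", "dec"], step, lambda g: g))
           |     # ['halve', 'halve', 'halve']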
        
             | grondilu wrote:
             | > But the first problem is, the number of legal
             | transformations is actually infinite.
             | 
             | I am fairly certain the number of legal transformations in
             | mathematics is not infinite. There is a finite number of
             | axioms, and all proven statements are derived from axioms
             | through a finite number of steps.
        
               | hwayne wrote:
               | Technically speaking, one of the foundational axioms of
               | ZFC set theory is actually an axiom _schema_, or an
               | infinite collection of axioms all grouped together. I
               | have no idea how Lean or Isabelle treat them.
        
         | szvsw wrote:
         | Chess and Go are some of the simplest board games there are.
         | Board gamers would put them in the very low "weight" rankings
         | from a rules complexity perspective (compared to "modern" board
         | games)!
         | 
         | Mathematics is infinitely (ha) more complex. Work your way up
         | to understanding (at least partially) a proof of Gödel's
         | incompleteness theorem and then you will agree! Apologies if
         | you have done that already.
         | 
         | To some extent, mathematics is a bit like drunkards looking for
         | a coin at night under a streetlight because that's where the
         | light is... there's a whole lot more out there though (quote
         | from a prof in undergrad)
        
         | hwayne wrote:
         | > Unless mathematics is fundamentally more complicated than
         | chess or go, I don't see why that could not happen.
         | 
         | My usual comparison is Sokoban: there are still lots of levels
         | that humans can beat but all Sokoban AIs cannot, including the
         | AIs that came out of DeepMind and Google. The problem is that
         | the training set is so much smaller, and the success heuristics
         | so much more exacting, that we can't get the same benefits from
         | scale as we do with chess. Math is even harder than that.
         | 
         | (I wonder if there's something innate about "planning problems"
         | that makes them hard for AI to do.)
        
       | emporas wrote:
       | > Let's say an AI supplies an incomprehensible, ugly proof. Then
       | you can work with it, and you can analyze it. Suppose this proof
       | uses 10 hypotheses to get one conclusion--if I delete one
       | hypothesis, does the proof still work? That's a science that
       | doesn't really exist yet because we don't have so many AI-
       | generated proofs, but I think there will be a new type of
       | mathematician that will take AI-generated mathematics and make it
       | more comprehensible.
       | 
       | My thoughts exactly about designing the public APIs of code. A
       | role which was traditionally performed only by experienced
       | developers can now be greatly simplified and made accessible to
       | everyone.
        
       | mik1998 wrote:
       | All I can read from this article is "the end of mathematics is on
       | the horizon". Hopefully nothing he says actually materializes.
        
       ___________________________________________________________________
       (page generated 2024-06-12 23:00 UTC)