[HN Gopher] Terence Tao on proof checkers and AI programs
___________________________________________________________________
Terence Tao on proof checkers and AI programs
Author : antineutrino
Score : 175 points
Date : 2024-06-11 14:56 UTC (1 day ago)
(HTM) web link (www.scientificamerican.com)
(TXT) w3m dump (www.scientificamerican.com)
| sylware wrote:
| It took only Terence Tao to speak out loud about the ego of
| many mathematicians who refuse to consider "AI" as a
| potential assistant.
|
| We may realise we need much bigger neural nets and training
| facilities, or different models.
|
| And in the end, it may be a failure.
|
| You know, R&D.
| pollimr wrote:
| These tools, and by extension their unethical parent
| companies, will be kept air-gapped from my research.
| rcpt wrote:
| https://lean-lang.org/about/ is an open source project
| aeternum wrote:
| How about: AI will become everyone's co-pilot
|
| Now we don't need to write hundreds of additional articles.
| ProllyInfamous wrote:
| ~"You don't ever want to be a company programming a bot for a
| company which is designed to program bots, to program bots,
| programming bots programming bots."~ [1]
|
| --Mark Zuckerberg, probably [actually not].
|
| I am not a robot, but the above limerick's author.
|
| RIP to my IRL humanbro, who recently programmed himself out of
| his cushy Salesforce job.
|
| Welcome... to The Future(tm).
|
| --
|
| [1] Paraphrasing Zuckerberg saying he doesn't think a structure
| of "just managers managing managers" is ideal.
| rthnbgrredf wrote:
| It will be interesting when they become able to autocomplete
| current research topics correctly.
| drlemonpepper wrote:
| https://archive.is/Idouw
| Tarean wrote:
| Computer-checked proofs are one area where AI could be really
| useful fairly soon, though it may be closer to the neural
| networks in chess engines than to full LLMs.
|
| You already use tons of solvers because proving everything by
| hand is mind numbingly tedious and time consuming, but
| tactics/solvers really struggle if you throw too many theorems
| and lemmas at them. Neural nets as search machines to pattern
| match relevant lemmas fit perfectly. Similarly induction and
| higher order unification are full-on code synthesis where brute-
| force iterating over all possible syntax trees is really
| inefficient.
|
| And solvers do a backtracking search anyway, so if the AI
| returns 95% irrelevant lemmas it doesn't matter. It would
| still be a dramatic improvement over manually searching for
| them.
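A toy sketch of that lemma-selection idea: a stubbed relevance scorer stands in for the neural net, ranking library lemmas against a goal before a backtracking solver tries them. The scoring heuristic, lemma strings, and all names here are hypothetical, purely to illustrate the shape of the approach.

```python
# Toy sketch of neural lemma selection: `score` stands in for a
# trained relevance model, and a solver would try the top-k lemmas
# in ranked order, backtracking past the irrelevant ones. All names
# and the scoring heuristic are hypothetical.

def score(goal: str, lemma: str) -> float:
    # Stub "neural" scorer: crude token overlap with the goal.
    g, l = set(goal.split()), set(lemma.split())
    return len(g & l) / max(len(l), 1)

def select_lemmas(goal: str, library: list[str], k: int = 3) -> list[str]:
    # Even if most of the top-k turn out irrelevant, the solver's
    # backtracking recovers; ranking just shrinks the search space.
    return sorted(library, key=lambda lem: -score(goal, lem))[:k]

library = [
    "add_comm : a + b = b + a",
    "mul_comm : a * b = b * a",
    "le_trans : a <= b -> b <= c -> a <= c",
]
goal = "a + b = b + a"
```

The point is exactly the one above: the ranking only needs to be good enough that a relevant lemma appears somewhere in the top-k, since the solver's backtracking absorbs the misses.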
|
| Though I'm not convinced that computer-checked proofs are
| necessarily good for communication. There are reasons besides
| page count that proofs for human consumption are high-level
| and gloss over details.
| qsort wrote:
| Not a mathematician, but knowing how proof-checkers work I
| don't expect them to become practical without some significant
| future advancement, they certainly aren't right now.
|
| The main problem is that in order to work they have to be
| connected to the formal definition of the mathematical objects
| you are using in your proof. But nobody thinks that way when
| writing (or reading!) a proof, you usually have a high-level,
| informal idea of what you are "morally" doing, and then you
| fill-in formal details as needed.
|
| Computer code (kinda) works because the formal semantics of
| the language is much closer to your mental model, so much so
| that in many cases it's possible to give an operational
| semantics, but generally speaking math is different in its
| objectives.
| jvanderbot wrote:
| You yourself identified some problems that a modern AI guru
| would salivate over: formal specifications from natural
| language, "filling in" formal details, etc.
|
| I'll add another one: NN-based speedups to constraint
| solvers, which usually rely on something like branch-and-
| bound. This is (I've heard) a very active area of research.
| g15jv2dp wrote:
| You're commenting on a thread about an interview where an
| actual Fields medalist explains how proof assistants are
| actually used to do meaningful mathematical work, today. The
| issues you raise are all discussed in the interview.
| qsort wrote:
| practical = practical for routine mathematical work, which
| they aren't right now, which Tao says in the article
| itself.
|
| Never have I ever said or implied, here or anywhere else,
| that proof checkers are useless and bad and stupid.
| g15jv2dp wrote:
| > practical = practical for routine mathematical work,
| which they aren't right now, which Tao says in the
| article itself.
|
| I see, I guess it would have been nice to be clearer from
| the start.
|
| > Never have I ever said or implied, here or anywhere
| else, that proof checkers are useless and bad and stupid.
|
| Did I say you did?
| PartiallyTyped wrote:
| They = LLMs.
|
| As I've interpreted it, they mean that LLMs and similar
| tech isn't particularly helpful to theorem provers at the
| moment.
| bramhaag wrote:
| Automated theorem proving is a concept that has been around for
| a long time already.
|
| Isabelle's sledgehammer strategy combines automatic theorem
| provers (E, Z3, SPASS, Vampire, ...) and attempts to
| prove/refute a goal. In theory this seems fine, but in practice
| you end up with a reconstructed proof that applies 12 seemingly
| arbitrary lemmas. This sort of proof is unreadable and very
| fragile. I don't see how AI will magically solve this issue.
| gowld wrote:
| What is a "fragile" proof?
| staunton wrote:
| It no longer works when some definition or other theorem
| changes slightly. Most of these proof assistants provide ways
| to write proofs that adjust slightly and "plug together" the
| other theorems in the "obvious" way.
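As a minimal Lean 4 illustration of that difference (assuming the standard `Nat` lemmas are available): an automation-found proof depends on whatever lemma set the tactic happens to search, while an explicit proof names its dependency.

```lean
-- Fragile: succeeds only while `simp`'s default lemma set still
-- happens to close this goal; a library change can break it.
example (n : Nat) : n + 0 = n := by simp

-- More robust: cites the exact lemma it depends on.
example (n : Nat) : n + 0 = n := Nat.add_zero n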
| trott wrote:
| > Computer-Checked proofs are one area where ai could be really
| useful fairly soon. Though it may be closer to neural networks
| in chess engines than full llm's.
|
| Yes, if significant progress is made in sample-efficiency. The
| current neural networks are extremely sample-inefficient. And
| formal math datasets are much smaller than those of, say,
| Python code.
| boyka wrote:
| Isn't proof checking at least as hard as k-sat and therefore NP?
| Given that neural networks process in constant time, how would
| that be possible without approximation?
| hollerith wrote:
| No.
| boyka wrote:
| Care to explain?
|
| E.g. https://mathoverflow.net/questions/226966/algorithmic-
| comple...
| markisus wrote:
| A proof is basically a directed graph where nodes are
| propositions and edges are applications of deduction rules.
| One only needs to check that the starting nodes are axioms
| and that each edge is an instance of one of a finite set of
| deduction rules in the system. So the time to check a proof
| is linear in the number of edges.
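That linear-time check can be sketched in a few lines. The toy rule set (modus ponens over string propositions), the axioms, and all names here are hypothetical; each step is verified against earlier steps only, so total work is proportional to the number of edges.

```python
# Sketch of linear-time proof checking over a DAG: each step cites
# an axiom or applies a deduction rule to already-verified earlier
# steps, so total work is proportional to the number of edges.
# The toy rule set and propositions are hypothetical.

def modus_ponens(a: str, imp: str):
    # From A and "A->B", conclude B; otherwise the rule doesn't apply.
    left, arrow, right = imp.partition("->")
    return right if arrow and left == a else None

RULES = {"mp": modus_ponens}

def check_proof(steps, axioms, rules=RULES) -> bool:
    """steps: list of (proposition, rule_name, premise_indices).
    rule_name None means the step must cite an axiom."""
    proved = []
    for prop, rule, premises in steps:
        if rule is None:
            if prop not in axioms:
                return False
        else:
            if any(i >= len(proved) for i in premises):
                return False  # premises must come earlier in the DAG
            conclusion = rules[rule](*(proved[i] for i in premises))
            if conclusion != prop:
                return False
        proved.append(prop)
    return True

axioms = {"p", "p->q"}
proof = [("p", None, ()), ("p->q", None, ()), ("q", "mp", (0, 1))]
```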
| empath75 wrote:
| This article isn't about using neural networks for proof
| checking, but just using them to assist with writing Lean
| programs in the same way that developers use co-pilot to write
| java code or whatever. It might just spit out boilerplate code
| for you, of which there still is a great deal in formal proofs.
|
| The proof checking is still actually done with normal proof
| checking programs.
| sebzim4500 wrote:
| Checking a proof is much easier than finding it.
| AnimalMuppet wrote:
| Neural networks process in constant time _per input/output_
| (token or whatever). But some prompts could produce more
| tokens.
|
| As you increase the size of the window, doesn't the size of the
| NN have to increase? So if you want to just use an NN for this,
| the bigger the problem it can handle, the bigger the window, so
| the longer per token produced; but also the more tokens
| produced, so it's longer on that front as well. I don't know if
| that adds up to polynomial time or not.
|
| Note well: I'm not disagreeing with your overall assertion. I
| don't know if NNs can do this; I'm inclined to think they
| can't. All I'm saying is that even NNs are not constant time in
| problem size.
| YetAnotherNick wrote:
| Humans run in constant time. So yeah it depends on the constant
| factor.
| pclmulqdq wrote:
| Most k-sat problems are easy, or can be reduced to a smaller
| and easier version of k-sat. Neural networks might be able to
| write Lean or Coq to feed into a theorem prover.
| memkit wrote:
| This is kind of amusing because NP problems, by definition,
| must have a polynomial-length _certificate_ for a given
| solution that can be verified in polynomial time.
|
| The word "certificate" can be read as "proof" without loss of
| meaning.
|
| What you're asking is slightly different: whether the proof-
| checking problem in logic is in NP. I don't know enough math
| to know for sure, but assuming that any proof can be found in
| polynomial time by a _non-deterministic_ Turing machine and
| that such a proof is polynomial in length, then yes, the
| proof-checking problem is in NP. Alas, I think all of that is
| moot because of Turing and Church's negative answer to the
| _Entscheidungsproblem_ [1].
|
| It would help your understanding and everyone else's to
| separate the _search problem_ of finding a proof from the
| _verification problem_ of checking one. Something is in NP
| when you can provide a verification that runs in polynomial
| time. The less-often-discussed second condition is that the
| search problem must be solvable in polynomial time by a non-
| deterministic Turing machine. Problems which don't meet the
| second condition are generally only of theoretical value.
|
| 1. https://en.m.wikipedia.org/wiki/Entscheidungsproblem
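The search/verification split is easy to see on SAT itself: checking a certificate (a truth assignment) takes one linear pass over the clauses, while brute-force search enumerates all 2^n assignments. A small illustrative sketch; the encoding and names are my own, not from any particular solver.

```python
# Verification vs. search on SAT: checking a certificate (a truth
# assignment) is one linear pass over the clauses; finding one by
# brute force enumerates all 2^n assignments.
from itertools import product

def verify(clauses, assignment) -> bool:
    # A literal is an int: +v means variable v, -v its negation.
    # Polynomial (linear) time in the formula size.
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )

def search(clauses, num_vars):
    # Exponential-time search standing in for the hard direction.
    for bits in product([False, True], repeat=num_vars):
        assignment = dict(enumerate(bits, start=1))
        if verify(clauses, assignment):
            return assignment
    return None

# (x1 OR NOT x2) AND (x2 OR x3)
clauses = [[1, -2], [2, 3]]
```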
| qsort wrote:
| "their time frame is maybe a little bit aggressive" has to be the
| nicest way I've ever heard someone say "they're full of shit".
| hyperfuturism wrote:
| I think he's being quite literal in that he believes that
| they're too aggressive.
|
| In the article he does, after all, second the notion that AI
| will eventually replace a lot of human labour related to
| mathematics.
| qsort wrote:
| I'm not reading it that way, I don't know where you are
| getting "AI will eventually replace..." from, if anything he
| seems to have the opposite idea: "even if AI can do the type
| of mathematics we do now, it means that we will just move to
| a higher type of mathematics".
|
| The line I quoted is in response to "in two to three years
| mathematics will be "solved" in the same sense that chess is
| solved".
|
| Unless my priors are dramatically wrong, anyone who believes
| _that_ is indeed full of shit.
| raphlinus wrote:
| The description of "project manager mathematicians" reminded me
| of "A letter to my old friend Jonathan" [1] and the followup [2]
| by Edsger Dijkstra from 1975. It was written as satire, basically
| criticizing the way that software was produced by showing how
| ridiculous it would be to produce mathematics the same way. But
| in some ways it was prescient. Of course, the _main_ point of
| these documents is to critique intellectual property
| (particularly the absurdity of applying it to mathematical truth)
| but fortunately that aspect is not a significant concern of this
| push toward mechanization.
|
| [1]:
| https://www.cs.utexas.edu/users/EWD/transcriptions/EWD04xx/E...
|
| [2]:
| https://www.cs.utexas.edu/users/EWD/transcriptions/EWD05xx/E...
| ur-whale wrote:
| Relying on GPT style systems to translate high level math to lean
| stuff strikes me as dangerous unless you're going to double-check
| by hand that the LLM didn't suddenly decide to hallucinate parts
| of the answer.
|
| Granted, the hallucinated part might not pass the final Lean
| verification, but ... what if it does because the LLM did not
| properly translate the mathematician's intent to Lean language?
|
| I'm thinking the human will still need to read the Lean program,
| understand it and double-check that his original human intent was
| properly codified.
|
| Better than having to produce it by hand, but still sub-optimal.
| kccqzy wrote:
| I disagree. I think if the hallucinated part passes the final
| Lean verification, it would actually be trivial for a
| mathematician to verify the intent is reflected in the
| theorem.
|
| And also, in this case, doesn't the mathematician first come
| up with a theorem and then engage the AI to find a proof?
| Judging from his Mastodon threads a while ago, I believe
| that's what Terence did recently. Therefore that intent
| verification can be skipped too. Lean can act as the final
| arbiter of whether the AI produces something correct or not.
| constantcrying wrote:
| There is no danger. If the output of the neural net is wrong,
| then Lean will catch it.
|
| >what if it does because the LLM did not properly translate the
| mathematician's intent to Lean language?
|
| How so? The one thing which definitely needs to be checked by
| hand is the theorem to be proven, which obviously shouldn't
| be provided by a neural network. The NN will try to find the
| proof, which is then verified by Lean; fundamentally the NN
| can't prove the wrong theorem and have it verified by Lean.
| jerf wrote:
| GPT != AI. GPT is a subset of AI. GPT is probably absolutely
| useless for high-level math, if for no other reason than that
| to be useful you'd need training data that basically by
| definition doesn't exist. Though I am also strongly inclined
| to believe that LLMs are constitutionally incapable of
| functioning at that level of math regardless, because that's
| simply not what they are designed to do.
|
| But AI doesn't have to mean LLMs, recent narrowing of the
| term's meaning notwithstanding. There's all kinds of ways AIs
| in the more general sense could assist with proof writing in a
| proof language, and do it in a way that each step is still
| sound and reliable.
|
| In fact once you accept you're working in a proof language
| like Lean, the LLM part of the problem goes away. The AIs can
| work directly on such a language; the problems LLMs solve
| simply aren't present. There's no need to understand a high-
| context, fuzzy-grammar, background-laden set of symbols
| anymore. There's also no problem with the AI "explaining" its
| thinking. We might still not be able to follow too large an
| explanation, but it would at least exist, in a way that a
| neural net's reasoning doesn't necessarily.
| gowld wrote:
| > GPT is probably absolutely useless for high-level math
|
| Terence Tao disagrees with you.
|
| https://mathstodon.xyz/@tao/110601051375142142
| renewiltord wrote:
| This tool is like all other tools in that expert
| practitioners extract value from it and novices consider it
| useless. Fascinating. The advent of search engines was
| similar with many remarking on how if you searched for
| something sometimes you'd find websites that were wrong.
| Things change and yet stay the same. At the time I thought
| age was the discriminator. Today I know it's not that. It's
| something else.
| jerf wrote:
| If it could be solved with a Math Overflow-post level of
| effort, even from Terence Tao, it isn't what I was talking
| about as "high level math".
|
| I also am not surprised by "Consider a generation function"
| coming out of an LLM. I am talking about a system that
| could _solve_ that problem, entirely, as doing high level
| math. A system that can emit "have you considered using
| wood?" is not a system that can build a house autonomously.
|
| It especially won't seem all that useful next to the
| generation of AIs I anticipate is coming, which use LLMs as a
| component to understand the world but are not just big LLMs.
| m4lvin wrote:
| The trick is that the human only needs to read and understand
| the Lean _statement_ of a theorem and agree that it (with all
| involved definitions) indeed represents the original
| mathematical statement, but not the _proof_. That the proof
| indeed proves the statement is what Lean checks. We do not
| need to trust the LLM in any way.
|
| So would I accept a proof made by GPT or whatever? Yes. But not
| a (re)definition.
|
| The analogy for programming is that if someone manages to
| write a function with certain input and output types, and the
| compiler accepts it, then we do know that someone indeed
| managed to write a function of that type. Of course we have
| no idea about its behaviour, but statements/theorems are
| types, not values :-)
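In Lean 4 terms, a minimal sketch of this propositions-as-types point: the statement is the type, and any term that type-checks against it is accepted, regardless of who or what produced it. The theorem name is made up, and `omega` stands in here for machine-found automation, assuming that tactic is available:

```lean
-- The statement is the type; this hand-written term is one proof.
theorem my_add_comm (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- A machine-found proof can look completely different; Lean accepts
-- it as long as it checks against the same statement.
example (a b : Nat) : a + b = b + a := by omega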
| cubefox wrote:
| The thing is, when AI systems are able to translate intuitive
| natural language proofs into formal Lean code, they will also
| be used to translate intuitive concepts into formal Lean
| definitions. And then we can't be sure whether those
| definitions actually define the concepts they are named
| after.
| egl2021 wrote:
| For now this might be a project for Fields medalists, like Tao
| and Scholze, who have the leisure to spend 10x the time on a
| proof. I recently talked to a post-doc in a top-ranked math
| department, and he doesn't know anyone who actually uses lean4 in
| their work. For early-career people, it's probably better to
| trust your intuition and get the papers out.
| g15jv2dp wrote:
| I'm a math professor and I know many people who use or
| contribute to lean. Not all of them are Fields medalists. Maybe
| it takes more than a couple of people's opinion on the subject
| to have a good picture.
| shakow wrote:
| > people who use or contribute to lean
|
| And do they get tenure- or funding-able output of it, or is
| it more of a side-hobby?
| g15jv2dp wrote:
| Yes, they get grants to work on it. Other questions? I
| can't say I enjoy this "gotcha" tone.
| Jensson wrote:
| So they get grants to work on these tools rather than to work
| on math. That seems like a very special situation and a very
| special math department, and therefore not a good picture of
| how helpful it is to math departments in general.
| nimih wrote:
| "No _true_ working mathematician would spend their time using
| Lean!"
| g15jv2dp wrote:
| I guess we got in the top 10 worldwide for math in the
| Shanghai ranking by accident. If you don't know what
| you're speaking about, sometimes it's wise to not say
| anything.
| seanhunter wrote:
| The mathematicians I know of who work on Lean4 as part of
| grant-funded research do so because their research
| funding is for formalising mathematics and Lean4 is the
| tool they choose to use for that.
|
| I really have no idea why people are taking this approach
| here.
|
| Proof assistants are useful. People use them to do actual
| maths. There is a lot of work left to formalise all of maths
| (this isn't a proof-assistant thing; it's a huge undertaking,
| given that maths is 3000+ years old and we've only just
| decided what formality really fundamentally consists of), so
| in some places you need to do more foundational work than
| others to make them useful.
| szvsw wrote:
| Isn't working on these tools a form of working on math?
| Math is a language unto itself.
| nickpsecurity wrote:
| I'm with them. They're using a tool they're paid to work
| on. That's always a special case. The general case would
| be what percentage of mathematicians use (a) proof
| assistants at all, (b) mechanized proof assistants, and
| (c) Lean specifically.
|
| Do most working mathematicians use proof assistants or
| checkers? Does anyone have a percentage, with feedback on how
| they use them?
| TeMPOraL wrote:
| > _They're using a tool they're paid to work on. That's
| always a special case._
|
| FWIW, the industry term for this is "dogfooding", and it's
| usually seen as a predictor of a high-quality tool.
| GPerson wrote:
| The person you are replying to asked a very useful
| question, and you supplied a useful answer. I don't
| detect any snarky tone in their comment.
| shakow wrote:
| That's not a "gotcha", but a question about whether what I
| experienced in bioinformatics (i.e. working on tools,
| especially already-existing ones, is a career trap) also
| applies to maths.
| a_wild_dandan wrote:
| Snark and contrarianism are specialties around here.
| Thank you for sharing your experience; I genuinely
| appreciate it.
| occamrazor wrote:
| For Tao, spending 10x time on a proof still means spending 5x
| less time than an average postdoc. He is incredibly fast and
| productive.
| flostk wrote:
| I'm sure this also depends a lot on the field within
| mathematics. Areas like mathematical logic or algebra have a
| pretty formal style anyway, so it's comparatively less effort
| to translate proofs to Lean. I would expect more people from
| these areas to use proof checkers than, say, from low-
| dimensional topology, which has a more "intuitive" style. Of
| course by "intuitive" I don't mean it's any less rigorous.
| But a picture proof of some homotopy equivalence is just much
| harder to translate into something Lean can understand than
| some sequence of inequalities. To add a data point, I'm in
| geometry/topology and I've never seen or heard of anyone who
| uses this stuff (so far).
| enugu wrote:
| Pictures can be directly used in formal proofs; see [1] for
| instance. So the problem is not one of principle; rather,
| pictures and more direct syntax have not yet been integrated
| into current proof systems. When that is done, proofs will be
| much more natural and the formal proof will be closer to the
| 'proof in the mind'.
|
| [1] https://www.unco.edu/nhs/mathematical-
| sciences/faculty/mille...
| yboris wrote:
| _lean4_ - programming language and theorem prover
|
| https://lean-lang.org/
|
| https://github.com/leanprover/lean4
| iamwil wrote:
| A tangential topic, but anyone have recommendations for proof-
| checking languages? Lean, Isabelle, Coq, Idris are the few I know
| about. I've only ever used Isabelle, but can't really tell the
| difference between them. Anyone here used more than one, and can
| tell why people pick one over the other?
| mathgradthrow wrote:
| Lean has the largest existing math library, but you're kind
| of stuck in a Microsoft ecosystem.
| mkehrt wrote:
| I'm not sure what you mean. I've been playing around with
| Lean and it works fine on a Mac. I am using VSCode, though.
| ykonstant wrote:
| If you mean dependence on VS Code, there are emacs and neovim
| packages for Lean; some of the core devs use those
| exclusively. Also note that de Moura is no longer at
| Microsoft and Lean is not an MS product.
| lacker wrote:
| They are pretty similar as languages. In general I would
| recommend Lean because the open source community is very active
| and helpful and you can easily get help with whatever you're
| doing on Zulip. (Zulip is a Slack/Discord alternative.) The
| other ones are technically open source but the developers are
| far less engaged with the public.
|
| Proof-checking languages are still just quite hard to use,
| compared to a regular programming language. So you're going to
| need assistance in order to get things done, and the Lean
| community is currently the best at that.
| einszwei wrote:
| I have personally tried Coq and Lean and stuck to Lean as it
| has much better tooling, documentation and usability while
| being functionally similar to Coq.
|
| Caveat that I have only used Coq and Lean side by side for
| about a month before deciding to stick with Lean.
| tombert wrote:
| Isabelle is my favorite, and it's the one I've used the most,
| if only because sledgehammer is super cool, but Coq is neat as
| well. It has coqhammer (a word I will not say at work), and
| that works ok but it didn't seem to work as well as
| sledgehammer did for me.
|
| Coq is much more type-theory focused, which I do think lends it
| a bit better to a regular programming language. Isabelle feels
| a lot more predicate-logic and set-theory-ish to me, which is
| useful in its own right but I don't feel maps quite as cleanly
| to code (though the Isabelle code exporter has generally worked
| fine).
| awfulneutral wrote:
| Another one is Oak: https://oakproof.org/
| vzaliva wrote:
| One consideration in choosing a proof assistant is the type of
| proofs you are doing. For example, mathematicians often lean
| towards Lean (pun intended), while those in programming
| languages favour Coq or Isabelle, among others.
| cubefox wrote:
| As Tao says in the interview: Lean is the dominant one.
| trott wrote:
| There's also https://en.wikipedia.org/wiki/Metamath
|
| Its smallest proof-checker is only 500 lines.
| munchler wrote:
| Terence Tao also gave a great talk on this topic a few months
| ago, where he goes into more detail about uses of Lean:
| https://www.youtube.com/watch?v=AayZuuDDKP0
| thecleaner wrote:
| Does Lean work with linear algebra proofs?
| gowld wrote:
| You need to ask a more specific question.
|
| https://leanprover-community.github.io/theories/linear_algeb...
| gowld wrote:
| > German mathematician and Fields Medalist Peter Scholze
| collaborated in a Lean project--even though he told me he doesn't
| know much about computers.
|
| > With these formalization projects, not everyone needs to be a
| programmer. Some people can just focus on the mathematical
| direction; you're just splitting up a big mathematical task into
| lots of smaller pieces. And then there are people who specialize
| in turning those smaller pieces into formal proofs. We don't need
| everybody to be a programmer; we just need some people to be
| programmers. It's a division of labor.
|
| Who can get paid to be a "mathematical programmer" ?
|
| PhD mathematicians? Grad students?
| zero_k wrote:
| Proofs and proof checking are also big in SAT solving (think:
| MiniSat) and SMT (think: z3, cvc5). They're also coming to
| model counting (counting the number of solutions). In fact,
| we did one in Isabelle for an approximate model counter [1],
| which provides a probabilistically approximate count (a so-
| called PAC guarantee); that is a lot harder to do, due to the
| probabilistic nature.
|
| [1] https://github.com/meelgroup/approxmc
| tmsh wrote:
| Really insightful. What's missing though I think is that LLMs
| abstract things in ways that are increasingly super-human. Tao
| says: "It's very easy to transform a problem into one that's
| harder than into one that's simpler. And AI has not demonstrated
| any ability to be any better than humans in this regard." But I
| think due to the way LLMs work there will be much higher-level
| insights that are possible. And that is the really exciting part.
|
| Right now it's a helper. A fact checker. And doer of tedious
| things. Soon it will be a suggester of insights. Already LLMs
| compress embeddings and knowledge and have insights that we don't
| see.
|
| Hinton gives the example of how a nuclear bomb and a compost heap
| are related and that most humans would miss that:
| https://www.youtube.com/watch?v=Gg-w_n9NJIE&t=4613s
| Salgat wrote:
| The biggest problem with LLMs is that they are simply
| regressions over the writings of millions of humans on the
| internet. It's very much a machine that is trained to mimic
| how humans write, nothing more.
|
| The bigger concern is that we don't have the training data
| required for it to mimic something smarter than a human
| (assuming that a human is given sufficient time to work on the
| same problem).
|
| I have no doubt ML will exceed human intelligence, but the
| current bottleneck is figuring out how to get a model to train
| itself against its own output without human intervention,
| instead of doing a regression against the entire world's
| collective writings. Remember, Ramanujan was, with no formal
| training and only access to a few math books, able to achieve
| brilliant discoveries in mathematics. In terms of training data
| used by ML models, that is basically nothing.
| zozbot234 wrote:
| > the current bottleneck is figuring out how to get a model
| to train itself against its own output without human
| intervention
|
| Isn't that what AlphaZero was known for? After all, the
| 'zero' part of the name was a play on 'zero-shot learning'
| which is the concept you're talking about. So we have a proof
| of existence already, and there's been recent papers showing
| progress in some mathematical domains (though simplified ones
| so far, such as Euclidean geometry - which does however
| feature complex proofs).
| mofeien wrote:
| LLMs are increasingly trained on multimodal data besides
| human written text, such as videos showing complex real-world
| phenomena that humans may also observe, but not understand,
| or time series data that humans may observe but be incapable
| of predicting. I don't see a fundamental reason why an LLM
| trained on such data could not become a superhuman predictor
| of e.g. time-series data, given the right model architecture,
| amount of data, and compute.
|
| In a way, next-token prediction is also a time-series
| prediction problem, and one that is arguably more difficult
| than producing the time series/text in the first place. That
| is because, to predict what a human will say about a topic,
| you don't just need to model the topic, but also the mental
| processes of the human talking about it, adding another layer
| of abstraction.
| tmsh wrote:
| Right, I think the insight that LLMs "create" though is via
| compression of knowledge in new ways.
|
| It's like teaching one undergraduate (reading some blog on
| the internet and training on next word prediction), then
| another undergraduate (training on another blog), then
| another one, etc. And finding a way to store what's in common
| among all these undergraduates. Over time, it starts to see
| commonalities and to become more powerful than a human. Or a
| human who had taught a million undergraduates. But it's not
| just at the undergraduate level. It's the most
| abstract/efficient way to represent the essence of their
| ideas.
|
| And in math that's the whole ball game.
| mistrial9 wrote:
| > but the current bottleneck is figuring out how to get a
| model to train itself against its own output without human
| intervention
|
| this is sort of humorous, because the entire current
| generation of Transformer-based encoders was built
| specifically to get "a model to train itself without human
| intervention"
|
| ref. "A Cookbook of Self-Supervised Learning",
| arXiv:2304.12210v2 [cs.LG], 28 Jun 2023
| rdevsrex wrote:
| Ramanujan attributed his success in no small part to divine
| assistance. I wonder if sentient AI would do the same :)
| calf wrote:
| Hinton's example doesn't resonate with me -- I do gardening
| as a hobby, so my immediate answer to his "What is common to
| nukes and compost heaps?" was the concept of critical mass.
| It took me 10 seconds to articulate the word, so I'm not as
| fast, but I still got the "correct" analogy right away,
| because with the right information the answer is obvious.
|
| Hinton says this demonstrates analogical thinking. I read about
| gardening online and I've probably read about compost heap
| physics written down somewhere online.
|
| So what more likely happened is that ChatGPT was previously
| trained on some article about compost-heap chain reactions;
| there's a lot of soil science and agriculture that discusses
| this, even for laypeople like myself, so it isn't a long
| reach if you know enough about both topics.
|
| Thus Hinton's example strikes me as more having failed to
| control for LLMs having already basically seen the answer
| before in its training texts.
|
| EDIT: A few minutes later in the video, Ilya comments
| (https://youtu.be/Gg-w_n9NJIE?t=4966): "We have a proof of
| existence, the human brain is a neural network" (and the
| panelist agrees, barring any computational "magic" happening
| in the human brain). I'm interested in this comment, because
| I generally hold that position as well, but I see many
| naysayers and dissenters who will respond "Well, real neurons
| don't run on 8 bits, neurons have different types and cells
| and contain DNA in a chemical soup of hormones, etc.; you are
| just confusing the model for the real thing in a backwards
| way", and so on. So I think there's an interesting divide in
| philosophy there as well, and I don't personally have an
| articulate answer for why I hold the position that natural
| brains are also neural networks, other than an appeal to
| computationalism.
| 29athrowaway wrote:
| If one AI based system becomes an expert in writing math
| proofs, the cost of getting another one is simply copying files
| and provisioning them into different cloud instances. This can
| be done within minutes to hours.
|
| If one human becomes an expert in writing math proofs, the cost
| of getting another one is waiting for someone in the population
| to choose math as a career, then waiting decades for them to
| finish school and acquire advanced specialization, with no
| guarantee that they will finish or become proficient enough to
| advance the frontier of knowledge. By the time you finished
| waiting, you could have trillions of AI experts working in
| parallel.
|
| The bandwidth of the brain when it comes to assimilating new
| information is low (up to hundreds of words per minute).
| Machines will be able to clone a lifetime of knowledge in
| seconds, have thousands of conversations in parallel, even
| serializing parts of their "brain" and sending them over to
| other AIs, like the Borg. These are things that are not
| possible within the human condition.
|
| And once programmable matter is attained, AI will produce
| exponential amounts of computronium and do the equivalent of
| millennia of research in seconds.
| Tossrock wrote:
| A working mathematician recently got GPT-4o to prove an (afaik)
| novel lemma while pursuing their research:
|
| "My partner, who is a mathematician, used ChatGPT last week for
| the first time to prove some lemmas for his research. He already
| suspected those lemmas were true and had some vague idea how to
| approach them, but he wasn't an expert in this type of
| statement.
| This is the first time that he got correct and useful proofs out
| of the models.
|
| [...]
|
| These are the two lemmas. The first one is something that a
| collaborator of my partner noticed for small values of e in their
| computations. ChatGPT did not find the proof before being told to
| try Möbius functions.
|
| https://chatgpt.com/share/9ee33e31-7cec-4847-92e4-eebb48d4ff...
|
| The second looks a bit more standard to me, probably something
| that Mathematica would have been able to do as well. But perhaps
| that's just because I am more familiar with such formulas. Still,
| Mathematica doesn't give a nice derivation, so this is useful.
|
| https://chatgpt.com/share/7335f11d-f7c0-4093-a761-1090a21579...
|
| "
| a_wild_dandan wrote:
| In a recent interview, Ben Goertzel mentioned having a PhD-
| level discussion with GPT-4 about novel math ideas cooked up by
| himself and a colleague. These things are becoming scary good
| at reasoning and heavyweight topics. As the ML field continues
| focusing on adding more System Two capabilities to complement
| LLMs' largely System One thinking, things will get wild.
| UniverseHacker wrote:
| This is my experience as well: even the popular hype
| underestimates how creative and intelligent LLMs can be. Most
| discussions reason about what they should or shouldn't be
| capable of based on how we think they work, without really
| looking directly at what they can and can't actually do. They
| are very alien: much worse than humans at many things, but
| also much better at others, and not in the areas we would
| expect in either case. Ironically, they are especially good at
| creative and abstract thinking and very bad at keeping track
| of details and basic computation, nearly the opposite of what
| we'd expect from computers.
| mu53 wrote:
| I am pessimistic about AI. The technology is definitely
| useful, but I watched a video that made good points:
|
| * On benchmark-style tests, LLMs are not improving. GPT-4o
| is equivalent to GPT-4, and both barely improve on GPT-3.5T.
|
| * AI companies have been caught outright lying in demos and
| manipulating outcomes by pre-training for certain scenarios.
|
| * The main features added to GPT-4o have been features that
| manipulate humans with dark patterns.
|
| * The dark patterns include emotional affects in tone and
| cute/quirky responses.
|
| * These dark patterns encourage people to think of LLMs as
| humans that have a similar processing of the universe.
|
| I seriously wonder about the transcript that these guys had
| with the LLM. Were they suggesting things? Did ChatGPT just
| reconfigure words that helped them think through the
| problem? I think the truth is that ChatGPT is a very
| effective rubber duck debugger.
|
| > https://www.youtube.com/watch?v=VctsqOo8wsc
| UniverseHacker wrote:
| Try playing around with getting GPT-4 to discuss creative
| solutions to real unsolved problems you are personally an
| expert on, or created yourself. That video looks like
| just standard internet ragebait to me.
|
| I find it pretty annoying when people say you are just
| being manipulated by hype if you are impressed by LLMs,
| when I was a serious skeptic, thought GPT-3 was useless,
| and only changed my opinion by directly experimenting
| with GPT-4 on my own- by getting it to discuss and solve
| problems in my area of expertise as an academic
| researcher.
| krackers wrote:
| Woah that second proof is really slick. Is there a
| combinatorial proof, instead of an algebraic one, as to why that
| holds?
| gjm11 wrote:
| Yes.
|
| (HN comments aren't the best medium for writing mathematics.
| I hope this ends up reasonably readable.)
|
| (d choose k) is the number of ways to pick k things from d.
|
| (d choose k) k^2 is the number of ways to pick k things from
| d, and then colour one of them red and one of them (possibly
| the same one) blue.
|
| (-1)^k (d choose k) k^2 is that (when k is even), or minus
| that (when k is odd).
|
| So the identity says: Suppose we consider a set of d things,
| and we consider picking _some_ of them (meaning 0 or more,
| though actually 0 is impossible; k is the number we pick) and
| colouring one of those red and one blue; then there are the
| same number of ways to do this picking an _odd_ number of
| things as picking an _even_ number.
|
| Well, let's describe the process a bit differently. We take
| our set of d things. We colour one of them red. We colour one
| of them blue. Then we pick some set of the things (k in
| number), including the red thing and the blue thing (which
| may or may not also be the red thing). And the claim is that
| there are the same number of ways to do this with an even
| value of k as with an odd number.
|
| But now this is almost trivial!
|
| Once we've picked our red and blue things, we just have to
| decide which of the _other_ things to pick. And as long as
| there's at least one of those, there are as many ways to
| pick an odd number of things as to pick an even number. (Fix
| one of them; we can either include it or not, and switching
| that changes the parity.)
|
| So if d>2 then we're done -- for each choice of red and blue,
| there are the same number of "odd" and "even" ways to finish
| the job.
|
| What if d<=2? Well, then the "theorem" isn't actually true.
|
| d=1: (1 choose 0) 0^2 - (1 choose 1) 1^2 = -1.
|
| d=2: (2 choose 0) 0^2 - (2 choose 1) 1^2 + (2 choose 2) 2^2 =
| 2.
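The identity gjm11 is proving, including its failure for d <= 2, can be sanity-checked by brute force. A minimal Python sketch (not from the thread; the function name is mine):

```python
from math import comb

def alt_sum(d):
    """Sum of (-1)^k * C(d, k) * k^2 over k = 0..d."""
    return sum((-1) ** k * comb(d, k) * k * k for k in range(d + 1))

# The exceptional small cases computed in the comment above.
assert alt_sum(1) == -1
assert alt_sum(2) == 2

# The identity holds for every d >= 3, matching the bijective argument.
assert all(alt_sum(d) == 0 for d in range(3, 50))
```

This matches the combinatorial argument: once there is at least one element besides the red and blue ones (i.e. d >= 3), even and odd picks cancel.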
| krackers wrote:
| >We take our set of d things. We colour one of them red. We
| colour one of them blue. Then we pick some set of the
| things (k in number), including the red thing and the blue
| thing
|
| In this alternate view of the process, where does the k^2
| term come from? Isn't the number of ways to do it in this
| perspective just
|
| (d-1 choose k-1) # We already chose the red one (which is
| also the blue) and need to choose k-1 more
|
| +
|
| (d-2 choose k-2) # We chose both red and blue ones and need
| to choose the other k-2
|
| Edit: I think it actually works out, seems to be equal to
|
| d * 1 * Binomial[d - 1, k - 1] + d * (d - 1) * Binomial[d -
| 2, k - 2] = Binomial[d, k]*k^2
|
| And then you apply your argument about fixing one of the
| elements in each of the two cases. Actually the proof seems to
| be a variant (with the same core argument) of the identity that
| the sum of even binomial terms equals the sum of odd ones [1]
|
| [1] https://math.stackexchange.com/questions/313832/a-combi
| nator...
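The algebraic identity in the edit above can likewise be checked numerically. A Python sketch (names mine; the helper C treats out-of-range binomials as zero so the k = 0, 1 terms vanish as expected):

```python
from math import comb

def C(n, k):
    """Binomial coefficient, defined as zero outside the usual range."""
    return comb(n, k) if 0 <= k <= n else 0

def lhs(d, k):
    # red == blue (d choices), then k-1 more from the remaining d-1;
    # red != blue (d*(d-1) choices), then k-2 more from the remaining d-2.
    return d * C(d - 1, k - 1) + d * (d - 1) * C(d - 2, k - 2)

def rhs(d, k):
    # Choose the k-subset first, then colour one element red and one blue.
    return C(d, k) * k * k

assert all(lhs(d, k) == rhs(d, k)
           for d in range(2, 30) for k in range(d + 1))
```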
| gjm11 wrote:
| In the alternate view, the k^2 isn't there any more. Not
| because we are computing a different number, but because
| we're organizing the computation differently.
|
| (It's turned into a d^2 factor, except that we then need
| to treat the red=blue case differently from the red!=blue
| case, so it's really a d factor plus a d^2-d factor.)
|
| And yes, the last bit of the proof is exactly the same
| thing as one way to prove that the sum of "even" and
| "odd" binomial coefficients are equal -- provided n>0.
| (For the (0 choose whatever) case, the "even" sum is 1
| and the "odd" sum is 0.)
| Sniffnoy wrote:
| Sorry, what is this quoted from?
|
| It's interesting because that first one is something I was
| discussing recently with my friend Beren (does he have an
| account here? idk). We were thinking of it as summing over
| ordered partitions, rather than over partitions with a factor
| of |tau|!, but obviously those are the same thing.
|
| (If you do it over _cyclically_ ordered partitions -- so, a
| factor of (|tau|-1)! rather than |tau|! -- you get 0 instead of
| (-1)^e.)
|
| Here's Beren's combinatorial proof:
|
| First let's do the second one I said, about cyclically ordered
| ones, because it's easier. Choose one element of the original
| set to be special. Now we can set up a bijection between even-
| length and odd-length cyclically ordered partitions as follows:
| If the special element is on its own, merge it with the part
| after it. If it's not on its own, split it out into its own
| part before the part it's in. So if we sum with a sign factor,
| we get 0.
|
| OK, so what about the linearly ordered case? We'll use a
| similar bijection, except it won't quite be a bijection this
| time. Pick an order on your set. Apply the above bijection with
| the last element as the special element. Except some things
| don't get matched, namely, partitions that have the last
| element of your set on its own as the last part.
|
| So in that case, take the second-to-last element, and apply
| recursively, etc, etc. Ultimately everything gets matched
| except for the partition where everything is separate and all
| the parts are in order. This partition has length equal to the
| size of the original set. So if the original set was even you
| get one more even one, and if the original set was odd you get
| one more odd one. Which rephrased as a sum with a sign factor
| yields the original statement.
|
| Would be interested to send this to the mathematician you're
| referring to, but I have no idea who that might be since you
| didn't say. :)
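Assuming I'm reading the two statements correctly (a signed sum over ordered set partitions of an e-set gives (-1)^e, and over cyclically ordered ones gives 0), both can be brute-force checked with Stirling numbers of the second kind, since k! * S(e, k) counts ordered partitions into k parts and (k-1)! * S(e, k) counts cyclically ordered ones. A Python sketch (names mine):

```python
from functools import lru_cache
from math import factorial

@lru_cache(maxsize=None)
def S(n, k):
    """Stirling number of the second kind: partitions of an n-set into k blocks."""
    if n == 0 and k == 0:
        return 1
    if n == 0 or k == 0:
        return 0
    return k * S(n - 1, k) + S(n - 1, k - 1)

def ordered(e):
    # Signed sum over ordered partitions: factor |tau|! per unordered partition.
    return sum((-1) ** k * factorial(k) * S(e, k) for k in range(1, e + 1))

def cyclic(e):
    # Signed sum over cyclically ordered partitions: factor (|tau|-1)! instead.
    return sum((-1) ** k * factorial(k - 1) * S(e, k) for k in range(1, e + 1))

assert all(ordered(e) == (-1) ** e for e in range(1, 15))
assert all(cyclic(e) == 0 for e in range(2, 15))
```

The cyclic sum is checked from e = 2 up, since a one-element set has only the single trivial partition and no cancellation partner.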
| dorfsmay wrote:
| I just spent hours looking at formal methods and had decided to
| learn Isabelle/HOL, and then this article appears!
| garbthetill wrote:
| may i ask why isabelle over agda or lean?
| dorfsmay wrote:
| I know nothing, I am a complete beginner when it comes to
| formal methods. From reading about it, it seems that
| Isabelle/HOL is the best when it comes to automation which
| apparently is something you really want. It might be easier
| to learn (controversial, some say Lean is easier). It's been
| used to prove some software (including seL4 and a version of
| Java); Apple and AWS are using it (though I know AWS uses, or
| used, TLA+).
|
| At the end of the day, I didn't want to spend more time
| reading about them than learning, so I'll try one and
| potentially switch later. The more you read about it, the
| more options open up (SPARK, TLA+, Coq, etc.).
|
| I do find it ironic to read this article today given that I
| made that decision yesterday!
| hwayne wrote:
| [shameless plug]I maintain a collection of proofs of
| leftpad in different prover languages, so people can
| compare them. It's here: https://github.com/hwayne/lets-
| prove-leftpad
|
| [/invalid closing tag]
| chaoskanzlerin wrote:
| The main upside Isabelle has over other proof assistants is
| the existence of Sledgehammer: invoke it, and your current
| proof state gets handed off to automated provers such as CVC4,
| veriT, Z3, and Vampire. If one of them finds a proof, Isabelle
| then reconstructs it.
|
| It's essentially a brute-force button that no other proof
| assistant (aside from Coq) has.
| grondilu wrote:
| "I think in the future, instead of typing up our proofs, we would
| explain them to some GPT. And the GPT will try to formalize it in
| Lean as you go along. If everything checks out, the GPT will
| [essentially] say, "Here's your paper in LaTeX; here's your Lean
| proof. If you like, I can press this button and submit it to a
| journal for you." It could be a wonderful assistant in the
| future."
|
| That'd be nice, but eventually what will happen is that the human
| will submit a mathematical conjecture, the computer will
| internally translate it into something like Lean, and try to
| find either a proof or a disproof. It will then translate the
| result into natural language for the user.
|
| Unless mathematics is fundamentally more complicated than chess
| or go, I don't see why that could not happen.
| qfiard wrote:
| It is fundamentally more complicated. Possible moves can be
| evaluated against one another in games. We have no idea how to
| make progress on many math conjectures, e.g. Goldbach or
| Riemann's. An AI would need to find connections with different
| fields in mathematics that no human has found before, and this
| is far beyond what chess or Go AIs are doing.
| salamo wrote:
| Not a mathematician, but I can imagine a few different things
| which make proofs much more difficult if we simply tried to
| map chess algorithms to theorem proving. In chess, each board
| position is a node in a game tree and the legal moves
| represent edges to other nodes in the game tree. We could
| represent a proof as a path through a tree of legal
| transformations to some initial state as well.
|
| But the first problem is, the number of legal transformations
| is actually infinite. (Maybe I am wrong about this.) So it
| immediately becomes impossible to search the full tree of
| possibilities.
|
| Ok, so maybe a breadth-first approach won't work. Maybe we
| can use something like Monte Carlo tree search with move
| (i.e. math operation) ordering. But unlike chess/go, we can't
| just use rollouts because the "game" never ends. You can
| always keep tacking on more operations.
|
| Maybe with a constrained set of transformations and a really
| good move ordering function it would be possible. Maybe Lean
| is already doing this.
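The game-tree framing above can be made concrete with a toy example: breadth-first "proof search" over string-rewrite rules, with a depth cap standing in for the fact that the tree is effectively unbounded. This is only a sketch; the rules and names are invented and bear no relation to how Lean or real provers actually search:

```python
from collections import deque

# Toy rewrite system: each rule is a "legal move" in the game tree, and a
# "proof" is a rewrite path from the start term to the goal term.
RULES = [
    ("a", "ab"),   # a  -> ab
    ("b", "bb"),   # b  -> bb
    ("bb", "c"),   # bb -> c
]

def neighbours(term):
    """All terms reachable in one rewrite (the edges out of this node)."""
    for lhs, rhs in RULES:
        start = term.find(lhs)
        while start != -1:
            yield term[:start] + rhs + term[start + len(lhs):]
            start = term.find(lhs, start + 1)

def prove(start, goal, max_depth=8):
    """BFS with a depth cap: the cap is what tames the unbounded tree."""
    queue = deque([(start, [start])])
    seen = {start}
    while queue:
        term, path = queue.popleft()
        if term == goal:
            return path
        if len(path) > max_depth:
            continue
        for nxt in neighbours(term):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [nxt]))
    return None  # no proof found within the depth budget

print(prove("a", "ac"))  # a -> ab -> abb -> ac
```

Replacing the FIFO queue with a priority queue ordered by a learned heuristic is essentially the move-ordering idea from the comment above.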
| grondilu wrote:
| > But the first problem is, the number of legal
| transformations is actually infinite.
|
| I am fairly certain the number of legal transformations in
| mathematics is not infinite. There is a finite number of
| axioms, and all proven statements are derived from axioms
| through a finite number of steps.
| hwayne wrote:
| Technically speaking one of the foundational axioms of
| ZFC set theory is actually an axiom _schema_, or an
| infinite collection of axioms all grouped together. I
| have no idea how lean or isabelle treat them.
| szvsw wrote:
| Chess and Go are some of the simplest board games there are.
| Board gamers would put them in the very low "weight" rankings
| from a rules complexity perspective (compared to "modern" board
| games)!
|
| Mathematics is infinitely (ha) more complex. Work your way up
| to understanding (at least partially) a proof of Godel's
| incompleteness theorem and then you will agree! Apologies if
| you have done that already.
|
| To some extent, mathematics is a bit like drunkards looking for
| a coin at night under a streetlight because that's where the
| light is... there's a whole lot more out there though (quote
| from a prof in undergrad)
| hwayne wrote:
| > Unless mathematics is fundamentally more complicated than
| chess or go, I don't see why that could not happen.
|
| My usual comparison is Sokoban: there are still lots of levels
| that humans can beat that all Sokoban AIs cannot, including the
| AIs that came out of Deepmind and Google. The problem is that
| the training set is so much smaller, and the success heuristics
| so much more exacting, that we can't get the same benefits from
| scale as we do with chess. Math is even harder than that.
|
| (I wonder if there's something innate about "planning problems"
| that makes them hard for AI to do.)
| emporas wrote:
| > Let's say an AI supplies an incomprehensible, ugly proof. Then
| you can work with it, and you can analyze it. Suppose this proof
| uses 10 hypotheses to get one conclusion--if I delete one
| hypothesis, does the proof still work? That's a science that
| doesn't really exist yet because we don't have so many AI-
| generated proofs, but I think there will be a new type of
| mathematician that will take AI-generated mathematics and make it
| more comprehensible.
|
| My thoughts exactly about designing public APIs of code: a role
| that was traditionally performed only by experienced developers
| can now be greatly simplified and made accessible to everyone.
| mik1998 wrote:
| All I can read from this article is "the end of mathematics is on
| the horizon". Hopefully nothing he says actually materializes.
___________________________________________________________________
(page generated 2024-06-12 23:00 UTC)