[HN Gopher] DeepMind AI outdoes human mathematicians on unsolved...
___________________________________________________________________
DeepMind AI outdoes human mathematicians on unsolved problem
Author : rntn
Score : 64 points
Date : 2023-12-14 19:33 UTC (3 hours ago)
(HTM) web link (www.nature.com)
(TXT) w3m dump (www.nature.com)
| supermdguy wrote:
| Code is available! They have a few different discovered
| solutions.
|
| https://github.com/google-deepmind/funsearch
| westurner wrote:
| "Mathematical discoveries from program search with large
| language models" (2023)
| https://www.nature.com/articles/s41586-023-06924-6 :
|
| > Abstract: _Large Language Models (LLMs) have demonstrated
| tremendous capabilities in solving complex tasks, from
| quantitative reasoning to understanding natural language.
| However, LLMs sometimes suffer from confabulations (or
| hallucinations) which can result in them making plausible but
| incorrect statements [1,2]. This hinders the use of current
| large models in scientific discovery. Here we introduce
| FunSearch (short for searching in the function space), an
| evolutionary procedure based on pairing a pre-trained LLM with
| a systematic evaluator. We demonstrate the effectiveness of
| this approach to surpass the best known results in important
| problems, pushing the boundary of existing LLM-based approaches
| [3]. Applying FunSearch to a central problem in extremal
| combinatorics -- the cap set problem -- we discover new
| constructions of large cap sets going beyond the best known
| ones, both in finite dimensional and asymptotic cases._ This
| represents the first discoveries made for established open
| problems using LLMs. _We showcase the generality of FunSearch
| by applying it to an algorithmic problem,_ online bin packing,
| _finding new heuristics that improve upon widely used
| baselines. In contrast to most computer search approaches,_
| FunSearch searches for programs that describe how to solve a
| problem, rather than what the solution is. _Beyond being an
| effective and scalable strategy, discovered programs tend to be
| more interpretable than raw solutions, enabling feedback loops
| between domain experts and FunSearch, and the deployment of
| such programs in real-world applications._
|
| "DeepMind AI outdoes human mathematicians on unsolved problem"
| (2023) https://www.nature.com/articles/d41586-023-04043-w :
|
| > _Large language model improves on efforts to solve
| combinatorics problems inspired by the card game Set._
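The evaluate-then-evolve loop the abstract describes can be sketched in a few lines of Python. This is a hypothetical stand-in, not DeepMind's code: `mutate` plays the role of the pretrained LLM proposing program variants, `evaluate` is the systematic evaluator, programs that crash are discarded, and only the highest-scoring programs survive to seed the next round.

```python
import random

def evolve(initial_program, mutate, evaluate, generations=100, pool_size=10):
    """Minimal sketch of a FunSearch-style loop: a proposer (the LLM in the
    paper, a plain `mutate` function here) rewrites sampled programs; a
    systematic evaluator scores them; the best survive as future prompts."""
    pool = [(evaluate(initial_program), initial_program)]
    for _ in range(generations):
        _, parent = random.choice(pool)      # sample a prompt program
        child = mutate(parent)               # the LLM would rewrite it here
        try:
            child_score = evaluate(child)    # run in a sandbox in practice
        except Exception:
            continue                         # discard programs that fail
        pool.append((child_score, child))
        pool.sort(key=lambda entry: entry[0], reverse=True)
        pool = pool[:pool_size]              # keep only the best programs
    return pool[0]
```

The point of the design is that the evaluator, not the LLM, decides what counts as progress, which is how the method sidesteps confabulation.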
| bnprks wrote:
| Though there are a couple caveats as to what code is available.
| Quoting from the github:
|
  | > This repository contains an implementation of the evolutionary
| algorithm, code manipulation routines, and a single-threaded
| implementation of the FunSearch pipeline. It does not contain
| language models for generating new programs, the sandbox for
| executing untrusted code, nor the infrastructure for running
| FunSearch on our distributed system. This directory is intended
| to be useful for understanding the details of our method, and
| for adapting it for use with any available language models,
| sandboxes, and distributed systems.
| Q6T46nT668w6i3m wrote:
| I don't care about their sandbox or distributed system. They
| are irrelevant to the method. The missing language model for
| program generation is disappointing but I imagine anyone
| interested in replication, myself included, would prefer to
| roll their own.
| nyrikki wrote:
  | Bad title, this was a hybrid ML/human effort, not an ML-only
  | achievement.
|
| From the article:
|
| "What's most exciting to me is modelling new modes of human-
| machine collaboration," Ellenberg adds. "I don't look to use
| these as a replacement for human mathematicians, but as a force
| multiplier."
| SirMaster wrote:
| It's almost like saying human+calculator beats human.
|
| Haven't mathematicians been using complex computer modeling to
| help solve unsolved math problems since computers have existed?
  | And haven't those computers basically always beaten a human
| alone?
|
| So isn't this news just that the mathematicians now have a
| newer and better computer model to help them solve their
| problems?
|
| Seems like evolution, not revolution.
| fasterik wrote:
| If I understand correctly, they're using an LLM to write a
| series of computer programs exploring a large solution space
| and feeding the output of those programs into a separate
| validation program written by a human. To take the calculator
| analogy, it's more like having the human give a set of
| constraints on a solution and having the calculator decide
| what buttons to push.
|
| Real progress is always incremental. I wouldn't be surprised
| if 5-10 years from now we have similar kinds of systems
| discovering new materials or new candidates for dark matter.
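For the cap set case, the human-written validator is easy to picture: a cap set in (Z/3)^n is a set of vectors in which no three distinct points lie on a line, which happens exactly when the three sum to the zero vector mod 3. A minimal checker, as an illustrative sketch rather than the paper's code, might look like:

```python
from itertools import combinations

def is_cap_set(vectors, n):
    """Return True if `vectors` is a valid cap set in (Z/3)^n: no three
    distinct points a, b, c with a + b + c == 0 coordinatewise mod 3
    (three points sum to zero exactly when they are collinear)."""
    pts = [tuple(v) for v in vectors]
    if len(set(pts)) != len(pts):
        return False  # duplicate points are not allowed
    return not any(
        all((a[i] + b[i] + c[i]) % 3 == 0 for i in range(n))
        for a, b, c in combinations(pts, 3)
    )
```

The search side only proposes candidate sets (or priority functions that build them); a checker like this settles validity, which is what keeps LLM hallucinations from contaminating the results.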
| fasterik wrote:
| Even Nature has clickbait titles now.
|
| I agree with the quote. The ability of AI to augment human
| capabilities has way more potential than the more speculative
| ideas about artificial general intelligence. This is why I'm
| not very sympathetic to the skepticism toward deep learning and
| LLMs as "not intelligence", "not real AI", "stochastic
| parrots", etc. Who cares whether or not these systems are
| generally intelligent agents, if they have the potential to
| increase the scientific output of humanity by even 10 or 20%?
| lainga wrote:
| Nobody's asked me to donate all my money to Yudkowsky or else
| suffer the scientific output of humanity increasing by 10 or
| 20%.
| ummonk wrote:
  | No, it's not a hybrid effort (except insomuch as LLMs are
| reliant on human generated data for training). They're simply
| saying that the code created by the LLM can be examined and
| potentially understood by humans.
| caddemon wrote:
| Their best reported results are a hybrid effort though, here
| is one of the authors of the paper describing how they used
| programs generated by the LLM to extract their own insights
| that then refined future iterations of their workflow:
| https://x.com/matejbalog/status/1735331210140819938?s=20
|
| It can work by itself too but it is unclear at a glance how
| well since the main focus of the paper is the new
| mathematical benchmarks they achieved, i.e. their best
| results. Will have to read the paper more closely to say
| anything with high confidence, but based on their summary I'd
| guess the human in the loop part was pretty important here.
| metanonsense wrote:
| I love the Set game mentioned in the article so much. Whenever I
  | make the mistake of installing a digital version of it on my phone,
| I have to uninstall it a few weeks later because it completely
| wrecks my productivity.
| caddemon wrote:
| It seems this is essentially an evolutionary algorithm with an
| LLM generating the pool of new variations at each step.
| Definitely a very cool idea, but hard to evaluate the results
| without knowing more about that field of mathematics. Obviously
| the problem they chose fit well into the FunSearch framework, but
| I'm curious if this is one of the more popular open problems in
| that space or if they picked something that was more niche?
|
| Namely, what sort of computational resources had been dedicated
| to the problem before? Because DeepMind suddenly throwing their
| weight at a problem that was previously the focus of a handful of
| random math grad students would make it hard to benchmark the ML
| advance that was made here -- like would it be possible to find a
| similar solution with a ton of compute and more traditional
| genetic algorithms?
|
  | I wouldn't be surprised if the answer were no. The protein
  | folding competition was a pretty big space in biology before
  | getting absolutely destroyed by AlphaFold. But I also wouldn't be
| entirely surprised if the answer were yes. The amount of hype
| around LLMs right now is crazy, probably half the news I see
| about them turns out to be very exaggerated upon further
| evaluation.
|
  | ETA: the headline here is definitely exaggerated, because there
  | was also a human in the loop to refine what the LLM was generating.
| At a glance the technical article doesn't benchmark enough
| against alternatives to the LLM component in their workflow IMO.
| But it is entirely possible they tackled well established enough
| open problems, such that prior work already handled those control
| cases decently.
|
| I'd love to know what someone in this space of mathematics thinks
| about the paper! Would it have generated much buzz if they got
| these same results like 3 years ago? Would it be accepted to
| Nature if they found they could accomplish something similar
| using their framework even if it was an RNN in the loop?
| ummonk wrote:
| In other words, "LLM writes a computer program that generates new
| examples which improve the lower bound for the n=8 case of a
| problem."
|
| I'd like to see how novel the program it generated was, rather
| than just being a brute force random search or a standard genetic
  | algorithm. I have a suspicion that the big result here is simply
| that it saved them coding time.
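The "standard" baseline the comment has in mind fits in a few lines: greedily scan the 3^n points of (Z/3)^n in random order and keep each point that completes no line with two points already kept. This is an illustrative sketch of such a baseline, not code from the paper; FunSearch's evolved programs effectively replace the arbitrary ordering here with learned priority functions.

```python
import itertools
import random

def greedy_cap_set(n, seed=0):
    """Greedy baseline: visit the points of (Z/3)^n in random order and
    keep each point that forms no line (a + b + c == 0 mod 3) with two
    points already kept."""
    rng = random.Random(seed)
    points = list(itertools.product(range(3), repeat=n))
    rng.shuffle(points)
    cap = set()
    for p in points:
        # p completes a line with some a in cap iff the third point of
        # that line, (-p - a) mod 3, is also already in cap
        if all(tuple((-p[i] - a[i]) % 3 for i in range(n)) not in cap
               for a in cap):
            cap.add(p)
    return cap
```

For small n this matches known optima (e.g. size 4 in (Z/3)^2), which is exactly why a fair comparison would need to benchmark the LLM-evolved programs against tuned versions of searches like this at n=8.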
| Q6T46nT668w6i3m wrote:
| I'm personally excited by this paradigm. A few years back I had
| success using a similar architecture for polynomial root finding.
| I think it's entirely possible to be really ambitious and reverse
| engineer new and useful generalized functions.
___________________________________________________________________
(page generated 2023-12-14 23:01 UTC)