[HN Gopher] AI Search: The Bitter-Er Lesson
___________________________________________________________________
AI Search: The Bitter-Er Lesson
Author : dwighttk
Score : 292 points
Date : 2024-06-14 18:47 UTC (1 day ago)
(HTM) web link (yellow-apartment-148.notion.site)
(TXT) w3m dump (yellow-apartment-148.notion.site)
| johnthewise wrote:
| What happened to all the chatter about Q*? I remember reading
| about this train/test time trade-off back then, does anyone have
| a good list of recent papers/blogs about this? What is holding
| this back, or is OpenAI just running some model 10x longer to
| estimate what it would get by training with 10x the compute?
|
| This tweet is relevant:
| https://x.com/polynoamial/status/1676971503261454340
| Kronopath wrote:
| Anything that allows AI to scale to superintelligence more
| quickly is going to run into AI alignment issues, since we
| don't really know a foolproof way of controlling AI. With the
| AI of today, this
| isn't too bad (the worst you get is stuff like AI confidently
| making up fake facts), but with a superintelligence this could be
| disastrous.
|
| It's very irresponsible for this article to advocate and provide
| a pathway to immediate superintelligence (regardless of whether
| or not it actually works) without even discussing the question of
| how you figure out what you're searching _for_, and how you'll
| prevent that superintelligence from being evil.
| nullc wrote:
| I don't think your response is appropriate. Narrow domain
| "superintelligence" is around us everywhere-- every PID
| controller can drive a process to its target far beyond any
| human capability.
|
| The obvious way to incorporate good search is to have extremely
| fast models that are being used in the search interior loop.
| Such models would be inherently less general, and likely
| trained on the specific problem or at least domain-- just for
| performance's sake. The lesson in this article was that a tiny
| superspecialized model inside a powerful traditional search
| framework significantly outperformed a much larger, more general
| model.
|
| Use of explicit external search should make the optimization
| system's behavior and objective more transparent and tractable
| than just sampling the output of an auto-regressive model
| alone. If nothing else you can at least look at the branches it
| did and didn't explore. It's also a design that makes it easier
| to bolt on various kinds of regularizers: code to steer it away
| from parts of the search space you don't want it operating in.
|
| The irony of all the AI scaremongering is that if there is ever
| some evil AI with an LLM as an important part of its reasoning
| process, it may well be evil because being evil is a big part
| of the narrative it was trained on. :D
| coldtea wrote:
| Of course "superintelligence" is just a mythical creature at
| the moment, with no known path to get there, or even a specific
| proof of what it even means - usually it's some hand waving
| about capabilities that sound magical, when IQ might very well
| be subject to diminishing returns.
| drdeca wrote:
| Do you mean no way to get there within realistic computation
| bounds? Because if we allow for arbitrarily high (but still
| finite) amounts of compute, then some computable
| approximation of AIXI should work fine.
| coldtea wrote:
| > _Do you mean no way to get there within realistic
| computation bounds?_
|
| I mean there's no well defined "there" either.
|
| It's a hand-waved notion that by adding more intelligence
| (itself not very well defined, but let's use IQ) you get to
| something called "hyperintelligence", say IQ 1000 or IQ
| 10000, that has what can be described as magical powers:
| it can convince any person to do anything, invent things
| at will, achieve huge business success, predict markets,
| and so on.
|
| Whether intelligence is cumulative like that, and whether
| having it gets you those powers, is an open question (aside
| from the successful high-IQ people, we know many people with
| IQ 145+ who are not inventing stuff left and right, or
| convincing people with greater charisma than the average
| IQ 100 or 120 politician, but who are e.g. just sad MENSA
| losers whose greatest achievement is their test scores).
|
| > _Because if we allow for arbitrarily high (but still
| finite) amounts of compute, then some computable
| approximation of AIXI should work fine._
|
| I doubt that too. The limit for LLMs for example is more
| human produced training data (a hard limit) than compute.
| drdeca wrote:
| > itself not very well defined, but let's use IQ
|
| IQ has an issue that is inessential to the task at hand:
| it is defined relative to a population distribution, so it
| doesn't make sense for very large values (unless there were
| a really large population satisfying properties that aren't
| actually satisfied).
|
| > I doubt that too. The limit for LLMs for example is
| more human produced training data (a hard limit) than
| compute.
|
| Are you familiar with what AIXI is?
|
| When I said "arbitrarily large", it wasn't for laziness
| reasons that I didn't give an amount that is plausibly
| achievable. AIXI is kind of goofy. The full version of
| AIXI is uncomputable (it uses a halting oracle), which is
| why I referred to the computable approximations to it.
|
| AIXI doesn't exactly need you to give it a training set,
| just put it in an environment where you give it a way to
| select actions, and give it a sensory input signal, and a
| reward signal.
|
| Then, assuming that the environment it is in is
| computable (which, recall, AIXI itself is not), its long-
| run behavior will maximize the expected (time discounted)
| future reward signal.
|
| There's a sense in which it is asymptotically optimal
| across computable environments (... though some have
| argued that this sense relies on a distribution over
| environments based on the enumeration of computable
| functions, and that this might make this property kinda
| trivial. Still, I'm fairly confident that it would be
| quite effective. I think this triviality issue is mostly
| a difficulty of having the right definition.)
|
| (Though, if it was possible to implement practically, you
| would want to make darn sure that the most effective way
| for it to make its reward signal high would be for it to
| do good things and not either bad things or to crack open
| whatever system is setting the reward signal in order for
| it to set it itself.)
|
| (How it works: AIXI basically enumerates through all
| possible computable environments, assigning initial
| probability to each according to the length of the
| program, and updating the probabilities based on the
| probability of that environment providing it with the
| sequence of perceptions and reward signals it has
| received so far when the agent takes the sequence of
| actions it has taken so far. It evaluates the expected
| values of discounted future reward of different
| combinations of future actions based on its current
| assigned probability of each of the environments under
| consideration, and selects its next action to maximize
| this. I think the maximum length of programs that it
| considers as possible environments increases over time or
| something, so that it doesn't have to consider infinitely
| many at any particular step.)
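|
| A rough toy of that loop, for flavor (NOT real AIXI, which
| sums over all programs and is uncomputable; the hypothesis
| list, names and numbers below are all made up, just to show
| the weight / update / plan structure):
|
|   # Toy "AIXI-flavored" agent over a tiny hypothesis class.
|   from itertools import product
|
|   # Each deterministic env: list_of_actions -> (obs, reward)
|   def env_reward_a(actions):      # likes the action "a"
|       return ("obs", 1.0 if actions[-1] == "a" else 0.0)
|
|   def env_alternate(actions):     # likes alternating actions
|       ok = len(actions) < 2 or actions[-1] != actions[-2]
|       return ("obs", 1.0 if ok else 0.0)
|
|   # (program_length, environment) pairs; prior ~ 2**-length
|   HYPOTHESES = [(3, env_reward_a), (5, env_alternate)]
|   ACTIONS, GAMMA = ["a", "b"], 0.9
|
|   def posterior(history):     # history: [(action, percept)]
|       """Zero out hypotheses that contradict what was seen."""
|       weights = []
|       for length, env in HYPOTHESES:
|           w = 2.0 ** -length
|           for t in range(len(history)):
|               acts = [a for a, _ in history[: t + 1]]
|               if env(acts) != history[t][1]:
|                   w = 0.0
|                   break
|           weights.append(w)
|       total = sum(weights) or 1.0
|       return [w / total for w in weights]
|
|   def choose_action(history, horizon=3):
|       """Expectimax over short action sequences, weighted by
|       the posterior over surviving environments."""
|       post, best = posterior(history), None
|       for plan in product(ACTIONS, repeat=horizon):
|           value = 0.0
|           for w, (_, env) in zip(post, HYPOTHESES):
|               acts, ret = [a for a, _ in history], 0.0
|               for k, a in enumerate(plan):
|                   acts.append(a)
|                   ret += (GAMMA ** k) * env(acts)[1]
|               value += w * ret
|           if best is None or value > best[0]:
|               best = (value, plan[0])
|       return best[1]
|
| Real AIXI replaces the hand-written hypothesis list with every
| program up to a (growing) length bound, which is exactly where
| the computational cost explodes.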
| aidan_mclau wrote:
| Hey! Essay author here.
|
| >The cool thing about using modern LLMs as an eval/policy model
| is that their RLHF propagates throughout the search.
|
| >Moreover, if search techniques work on the token level
| (likely), their thoughts are perfectly interpretable.
|
| I suspect a search world is substantially more alignment-
| friendly than a large model world. Let me know your thoughts!
| Tepix wrote:
| Your webpage is broken for me. The page appears briefly, then
| there's a French error message telling me that an error
| occurred and I can retry.
|
| Mobile Safari, phone set to French.
| abid786 wrote:
| I'm in the same situation (mobile Safari, French phone) but
| if you use Chrome it works
| mxwsn wrote:
| The effectiveness of search goes hand-in-hand with quality of the
| value function. But today, value functions are incredibly domain-
| specific, and there is weak or no current evidence (as far as I
| know) that we can make value functions that generalize well to
| new domains. This article effectively makes a conceptual leap
| from "chess has good value functions" to "we can make good value
| functions that enable search for AI research". I mean yes, that'd
| be wonderful - a holy grail - but can we really?
|
| In the meantime, 1000x or 10000x inference time cost for running
| an LLM gets you into pretty ridiculous cost territory.
| dsjoerg wrote:
| Self-evaluation might be good enough in some domains? Then the
| AI is doing repeated self-evaluation, trying things out to find
| a response that scores higher according to its self metric.
| dullcrisp wrote:
| Sorry but I have to ask: what makes you think this would be a
| good idea?
| skirmish wrote:
| This will just lead to the evaluatee finding anomalies in
| the evaluator and exploiting them for maximum gain. It has
| happened many times already that an ML model controlling an
| object in a physical-world simulator learned nothing except
| how to exploit simulator bugs [1]
|
| [1] https://boingboing.net/2018/11/12/local-optima-r-
| us.html
| CooCooCaCha wrote:
| That's a natural tendency for optimization algorithms.
| Jensson wrote:
| Being able to fix your errors and improve over time until
| there are basically no errors is what humans do. So far, all
| AI models just corrupt knowledge; they don't purify knowledge
| the way humanity did, except when scripted with a good value
| function from a human, like AlphaGo, where the value function
| is winning games.
|
| This is why you need to constantly babysit today's AI and tell
| it to do steps and correct itself all the time: you are much
| better at getting to pure knowledge than the AI is, and it
| would quickly veer off into nonsense otherwise.
| visarga wrote:
| > all AI models just corrupt knowledge they don't purify
| knowledge like humanity
|
| You got to take a step back and look at LLMs like ChatGPT.
| With 180 million users and assuming 10,000 tokens per user
| per month, that's 1.8 trillion interactive tokens.
|
| LLMs are given tasks, generate responses, and humans use
| those responses to achieve their goals. This process
| repeats over time, providing feedback to the LLM. This can
| scale to billions of iterations per month.
|
| The fascinating part is that LLMs encounter a vast
| diversity of people and tasks, receiving supporting
| materials, private documents, and both implicit and
| explicit feedback. Occasionally, they even get real-world
| feedback when users return to iterate on previous
| interactions.
|
| Taking the role of assistant, LLMs are primed to learn from
| the outcomes of their actions, scaling across many people.
| Thus they can learn from our collective feedback signals
| over time.
|
| Yes, that relies a lot on humans in the loop, not just the
| real world in the loop, but humans are also dependent on
| culture and society; I see no need for AI to be able to do it
| without society. I actually think that AGI will be a
| collective/network of humans and AI agents, and this
| perspective fits right in. AI will be the knowledge and
| experience flywheel of humanity.
| seadan83 wrote:
| > This process repeats over time, providing feedback to
| the LLM
|
| To what extent do you know this to be true? Can you
| describe the mechanism that is used?
|
| I would contrast your statement with cases where ChatGPT
| generated something, I read it, noted various incorrect
| things, and then walked away. Further, there are cases where
| the human does not realize there are errors. In both
| cases I'm not aware of any kind of feedback loop that
| would even be possible - I never told the LLM it
| was wrong. Nor should the LLM assume it was wrong because
| I run more queries. Thus, there is no signal back that
| the answers were wrong.
|
| Hence, where do you see the feedback loop existing?
| visarga wrote:
| > To what extent do you know this to be true? Can you
| describe the mechanism that is used?
|
| For example, a developer working on a project will iterate
| many times; some code generated by the AI might produce
| errors, and they will discuss that with the model to
| fix the code. This way the model gets not just single-round
| interactions, but multi-round interactions with feedback.
|
| > I read it and note various incorrect things and then
| walk away.
|
| I think the general pattern will be people sticking with
| the task longer when it fails, trying to solve it with
| persistence. This is all aggregated over a huge number of
| sessions and millions of users.
|
| In order to protect privacy, we could train only preference
| models from this feedback data, and then fine-tune the base
| model without using the sensitive interaction logs directly.
| The model would learn a preference for how to act in specific
| contexts, but not remember the specifics.
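|
| For what it's worth, the usual way to train such a preference
| model is a pairwise Bradley-Terry style loss over "response A
| was preferred to response B" records; a minimal sketch
| (illustrative names and numbers, not anyone's actual pipeline):
|
|   import numpy as np
|
|   def preference_loss(reward_chosen, reward_rejected):
|       """-log sigmoid(r_chosen - r_rejected), averaged: pushes
|       the scalar reward of the preferred response above the
|       rejected one."""
|       diff = reward_chosen - reward_rejected
|       return float(np.mean(np.log1p(np.exp(-diff))))
|
|   # scores a reward model might assign to two completion pairs
|   print(preference_loss(np.array([2.1, 0.3]),
|                         np.array([1.0, 0.9])))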
| seadan83 wrote:
| > I think the general pattern will be people sticking
| with the task longer when it fails, trying to solve it
| with persistence.
|
| This is where I quibble. Accurately detecting that someone is
| actually still on the same task (indicating they are not
| satisfied) is perhaps as challenging as generating the
| answer to begin with.
|
| That is why I also mentioned when people don't know the
| result was incorrect. That'll potentially drive a strong
| "this answer was correct" signal.
|
| So, during development of a tool, I can envision that
| feedback loop. But something simply presented to
| millions, without a way to determine false negatives nor
| false positives-- how exactly does that feedback loop
| work?
|
| _edit_ We might be talking past each other. I did not
| quite read that "discuss with AI" part carefully. I was
| picturing something like Copilot or ChatGPT, where it is
| pretty much: 'here is your answer'.
|
| Even with an interactive AI, how do you account for false
| positives (the human accepts a wrong answer), or for when the
| human simply gives up (which also looks like a success)? If we
| told the AI every time it was wrong or right, could that scale
| to the extent it would actually train a model?
| HarHarVeryFunny wrote:
| I think AGI is going to have to learn itself on-the-job,
| same as humans. Trying to collect human traces of on-the-
| job training/experience and pre-training an AGI on them
| seems doomed to failure since the human trace is grounded
| in the current state (of knowledge/experience) of the
| human's mind, but what the AGI needs is updates relative
| to its own state. In any case, for a job like (human-
| level) software developer AGI is going to need human-
| level runtime reasoning and learning (>> in-context learn
| by example), even if it were possible to "book pre-train"
| it rather than on-the-job train it.
|
| Outside of repetitive genres like CRUD-apps, most
| software projects are significantly unique, even if they
| re-use learnt developer skills - it's like Chollet's ARC
| test on mega-steroids, with dozens/hundreds of partial-
| solution design techniques, and solutions that require a
| hierarchy of dozens/hundreds of these partial-solutions
| (cf Chollet core skills, applied to software) to be
| arranged into a solution in a process of iterative
| refinement.
|
| There's a reason senior software developers are paid a
| lot - it's not just economically valuable, it's also one
| of the more challenging cognitive skills that humans are
| capable of.
| ben_w wrote:
| Kinda, but also not entirely.
|
| One of the things OpenAI did to improve performance was to
| train an AI to determine how a human would rate an output,
| and use _that_ to train the LLM itself. (Kinda like a GAN,
| now I think about it).
|
| https://forum.effectivealtruism.org/posts/5mADSy8tNwtsmT3KG
| /...
|
| But this process has probably gone as far as it can go, at
| least with current architectures for the parts, as per
| Amdahl's law.
| jgalt212 wrote:
| > Self-evaluation might be good enough in some domains?
|
| This works perfectly in games. e.g. Alpha Zero. In other
| domains, not so much.
| coffeebeqn wrote:
| Games are closed systems. There's no unknowns in the rule
| set or world state because the game wouldn't work if there
| were. No unknown unknowns. Compare to physics or biology
| where we have no idea if we know 1% or 90% of the rules at
| this point.
| jgalt212 wrote:
| self-evaluation would still work great even where there
| are probabilistic and changing rule sets. The linchpin of
| the whole operation is automated loss function
| evaluation, not a set of known and deterministic rules.
| Once you have to pay and employ humans to compute loss
| functions, the scale falls apart.
| cowpig wrote:
| > The effectiveness of search goes hand-in-hand with quality of
| the value function. But today, value functions are incredibly
| domain-specific, and there is weak or no current evidence (as
| far as I know) that we can make value functions that generalize
| well to new domains.
|
| Do you believe that there will be a "general AI" breakthrough?
| I feel as though you have expressed the reason I am so
| skeptical of all these AI researchers who believe we are on the
| cusp of it (what "general AI" means exactly never seems to be
| very well-defined)
| mxwsn wrote:
| I think capitalistic pressures favor narrow superhuman AI
| over general AI. I wrote on this two years ago:
| https://argmax.blog/posts/agi-capitalism/
|
| Since I wrote about this, I would say that OpenAI's
| directional struggles are some confirmation of my hypothesis.
|
| summary: I believe that AGI is possible but will take
| multiple unknown breakthroughs on an unknown timeline, but
| most likely requires long-term concerted effort with much
| less immediate payoff than pursuing narrow superhuman AI,
| such that serious efforts at AGI is not incentivized much in
| capitalism.
| shrimp_emoji wrote:
| But I thought the history of capitalism is an invasion from
| the future by an artificial intelligence that must assemble
| itself entirely from its enemy's resources.
|
| NB: I agree; I think AGI will first be achieved with
| genetic engineering, which is a path of way lesser
| resistance than using silicon hardware (which is probably a
| century plus off at the minimum from being powerful enough
| to emulate a human brain).
| HarHarVeryFunny wrote:
| Yeah, Stockfish is probably evaluating many millions of
| positions when looking 40-ply ahead, even with the limited
| number of legal chess moves in a given position, and with an
| easy criterion for heavy early pruning (once a branch becomes
| losing, not much point continuing it). I can't imagine the cost
| of evaluating millions of LLM continuations, just to select the
| optimal one!
|
| Where tree search might make more sense applied to LLMs is for
| coarser-grained reasoning, where the branching isn't based
| on alternate word continuations but on alternate what-if lines
| of thought, but even then it seems costs could easily become
| prohibitive, both for generation and evaluation/pruning, and
| using such a biased approach seems as much to fly in the face
| of the bitter lesson as be suggested by it.
| mxwsn wrote:
| Yes, absolutely, and well put - a strong property of chess is
| that next states are fast and easy to enumerate, which makes
| search particularly easy and strong, while with an LLM next
| states are much slower, harder to define, and more expensive
| to enumerate.
| typon wrote:
| The cost of the LLM isn't the only or even the most
| important cost that matters. Take the example of automating
| AI research: evaluating moves effectively means inventing a
| new architecture or modifying an existing one, launching a
| training run and evaluating the new model on some suite of
| benchmarks. The ASI has to do this in a loop, gather
| feedback and update its priors - what people refer to as
| "Grad student descent". The cost of running each train-eval
| iteration during your search is going to be significantly
| more than generating the code for the next model.
| HarHarVeryFunny wrote:
| You're talking about applying tree search as a form of
| network architecture search (NAS), which is different
| from applying it to LLM output sampling.
|
| Automated NAS has been tried for (highly constrained)
| image classifier design, before simpler designs like
| ResNets won the day. Doing this for billion parameter
| sized models would certainly seem to be prohibitively
| expensive.
| typon wrote:
| I'm not following. How do you propose search is performed
| by the ASI designed for "AI Research"? (as proposed by
| the article)
| HarHarVeryFunny wrote:
| Fair enough - he discusses GPT-4 search halfway down the
| article, but by the end is discussing self-improving AI.
|
| Certainly compute to test ideas (at scale) is the
| limiting factor for LLM developments (says Sholto @
| Google), but if we're talking moving beyond LLMs, not
| just tweaking them, then it seems we need more than
| architecture search anyways.
| therobots927 wrote:
| Well people certainly are good at finding new ways to
| consume compute power. Whether it's mining bitcoins or
| training a million AI models at once to generate a "meta
| model" that we _think_ could achieve escape velocity.
| What happens when it doesn't? And Sam Altman and the
| author want to get the government to pay for this? Am I
| reading this right?
| byteknight wrote:
| Isn't evaluating against different effective "experts" within
| the model effectively what MoE [1] does?
|
| > Mixture of experts (MoE) is a machine learning technique
| where multiple expert networks (learners) are used to divide
| a problem space into homogeneous regions.[1] It differs from
| ensemble techniques in that for MoE, typically only one or a
| few expert models are run for each input, whereas in ensemble
| techniques, all models are run on every input.
|
| [1] https://en.wikipedia.org/wiki/Mixture_of_experts
| HarHarVeryFunny wrote:
| No - MoE is just a way to add more parameters to a model
| without increasing the cost (number of FLOPs) of running
| it.
|
| The way MoE does this is by having multiple alternate
| parallel paths through some parts of the model, together
| with a routing component that decides which path (one only)
| to send each token through. These paths are the "experts",
| but the name doesn't really correspond to any intuitive
| notion of expert. So, rather than having 1 path with N
| parameters, you have M paths (experts) each with N
| parameters, but each token only goes through one of them,
| so number of FLOPs is unchanged.
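|
| In code, top-1 routing is just a small gating network picking
| which "expert" FFN each token visits; a bare-bones sketch (not
| any particular model's implementation, shapes and names made
| up):
|
|   import numpy as np
|
|   def moe_layer(tokens, gate_w, expert_ws):
|       """tokens: (n, d); gate_w: (d, n_experts); expert_ws:
|       list of (d, d). Each token goes through exactly one
|       expert, so FLOPs per token don't grow with the number
|       of experts."""
|       choice = (tokens @ gate_w).argmax(axis=-1)  # route
|       out = np.zeros_like(tokens)
|       for e, w in enumerate(expert_ws):
|           mask = choice == e
|           out[mask] = np.maximum(tokens[mask] @ w, 0.0)
|       return out
|
|   rng = np.random.default_rng(0)
|   x = rng.normal(size=(4, 8))                 # 4 tokens, d=8
|   y = moe_layer(x, rng.normal(size=(8, 2)),   # 2 experts
|                 [rng.normal(size=(8, 8)) for _ in range(2)])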
|
| With tree search, whether for a game like Chess or
| potentially LLMs, you are growing a "tree" of all possible
| alternate branching continuations of the game (sentence),
| and keeping the number of these branches under control by
| evaluating each branch (= sequence of moves) to see if it
| is worth continuing to grow, and if not discarding it
| ("pruning" it off the tree).
|
| With Chess, pruning is easy since you just need to look at
| the board position at the tip of the branch and decide if
| it's a good enough position to continue playing from
| (extending the branch). With an LLM each branch would
| represent an alternate continuation of the input prompt,
| and to decide whether to prune it or not you'd have to pass
| the input + branch to another LLM and have it decide if it
| looked promising or not (easier said than done!).
|
| So, MoE is just a way to cap the cost of running a model,
| while tree search is a way to explore alternate
| continuations and decide which ones to discard, and which
| ones to explore (evaluate) further.
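|
| A heavily simplified sketch of that grow-and-prune loop for
| text (generate_step and score_branch stand in for an LLM
| sampler and an evaluator model; neither is a real API):
|
|   def tree_search(prompt, generate_step, score_branch,
|                   beam=4, expand=3, depth=5):
|       """Grow alternate continuations of `prompt`, keep only
|       the `beam` most promising branches after each round.
|       generate_step(text, k) -> k candidate continuations
|       score_branch(text)     -> float, higher = keep growing
|       """
|       branches = [prompt]
|       for _ in range(depth):
|           candidates = [b + cont
|                         for b in branches
|                         for cont in generate_step(b, expand)]
|           if not candidates:
|               break
|           # prune: discard all but the best-looking branches
|           candidates.sort(key=score_branch, reverse=True)
|           branches = candidates[:beam]
|       return max(branches, key=score_branch)
|
| Which is closer to beam search than to chess-style minimax,
| but the keep/prune structure is the same.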
| PartiallyTyped wrote:
| How does MoE choose an expert?
|
| From the outside, and if we squint a bit, this looks a lot
| like an inverted attention mechanism where the token
| attends to the experts.
| telotortium wrote:
| Usually there's a small neural network that makes the
| choice for each token in an LLM.
| HarHarVeryFunny wrote:
| I don't know the details, but there are a variety of
| routing mechanisms that have been tried. One goal is to
| load balance tokens among the experts so that each
| expert's parameters are equally utilized, which it seems
| must sometimes conflict with wanting to route to an
| expert based on the token itself.
| magicalhippo wrote:
| From what I can gather it depends, but could be a simple
| Softmax-based layer[1] or just argmax[2].
|
| There was also a recent post[3] about a model where they
| used a cross-attention layer to let the expert selection
| be more context aware.
|
| [1]: https://arxiv.org/abs/1701.06538
|
| [2]: https://arxiv.org/abs/2208.02813
|
| [3]: https://news.ycombinator.com/item?id=40675577
| stevage wrote:
| Pruning isn't quite as easy as you make it sound. There are
| lots of famous examples where chess engines misevaluate a
| position because they prune out apparently losing moves that
| are actually winning.
|
| Eg https://youtu.be/TtJeE0Th7rk?si=KVAZufm8QnSW8zQo
| vlovich123 wrote:
| > and with an easy criterion for heavy early pruning (once a
| branch becomes losing, not much point continuing it)
|
| This confuses me. Positions that seem like they could be
| losing (but haven't lost yet) could become winning if you
| search deep enough.
| HarHarVeryFunny wrote:
| Yes, and even a genuinely (with perfect play) losing
| position can win if it is sharp enough and causes your
| opponent to make a mistake! There's also just the relative
| strength of one branch vs another - have to prune some if
| there are too many.
|
| I was just trying to give the flavor of it.
| eru wrote:
| > [E]ven a genuinely (with perfect play) losing position
| can win if it sharp enough and causes your opponent to
| make a mistake!
|
| Chess engines typically assume that the opponent plays to
| the best of their abilities, don't they?
| slyall wrote:
| The Contempt Factor is used by engines sometimes.
|
| "The Contempt Factor reflects the estimated
| superiority/inferiority of the program over its opponent.
| The Contempt factor is assigned as draw score to avoid
| (early) draws against apparently weaker opponents, or to
| prefer draws versus stronger opponents otherwise."
|
| https://www.chessprogramming.org/Contempt_Factor
| nullc wrote:
| Imagine that there was some non-constructive proof that
| white would always win in perfect play. Would a well
| constructed chess engine always resign as black? :P
| AnimalMuppet wrote:
| > Where tree search might make more sense applied to LLMs is
| for more coarser grained reasoning where the branching isn't
| based on alternate word continuations but on alternate what-
| if lines of thought...
|
| To do that, the LLM would have to have some notion of "lines
| of thought". They don't. That is completely foreign to the
| design of LLMs.
| HarHarVeryFunny wrote:
| Right - this isn't something that LLMs currently do. Adding
| search would be a way to add reasoning. Think of it as part
| of a reasoning agent - external scaffolding similar to tree
| of thoughts.
| CooCooCaCha wrote:
| We humans learn our own value function.
|
| If I get hungry for example, my brain will generate a plan to
| satisfy that hunger. The search process and the evaluation
| happen in the same place, my brain.
| skulk wrote:
| The "search" process for your brain structure took 13 billion
| years and 20 orders of magnitude more computation than we
| will ever harness.
| CooCooCaCha wrote:
| So what's your point? That we can't create AGI because it
| took evolution a really long time?
| wizzwizz4 wrote:
| Creating a human-level intelligence artificially is easy:
| just copy what happens in nature. We already have this
| technology, and we call it IVF.
|
| The idea that humans aren't the only way of producing
| human-level intelligence is taken as a given in many
| academic circles, but we don't really have any reason to
| _believe_ that. It's an article of faith (as is its
| converse - but the converse is at least in-principle
| falsifiable).
| CooCooCaCha wrote:
| "Creating a human-level intelligence artificially is
| easy: just copy what happens in nature. We already have
| this technology, and we call it IVF."
|
| What's the point of this statement? You know that IVF has
| nothing to do with artificial intelligence (as in
| intelligent machines). Did you just want to sound smart?
| stale2002 wrote:
| Of course it is related to the topic.
|
| It is related because the goal of all of this is to
| create human level intelligence or better.
|
| And that is a probable way to do it, instead of these
| other, less established methods that we don't know if
| they will work or not.
| sebzim4500 wrote:
| Even if the best we ever do is something with the
| intelligence and energy use of the human brain that would
| still be a massive (5 ooms?) improvement on the status
| quo.
|
| You need to pay people, and they use a bunch of energy
| commuting, living in air conditioned homes, etc. which
| has nothing to do with powering the brain.
| kragen wrote:
| i'm surprised you think we will harness so little
| computation. the universe's lifetime is many orders of
| magnitude longer than 13 billion years, and especially the
| 4.5 billion years of earth's own history, and the universe
| is much larger than earth's biosphere, most of which
| probably has not been exploring the space of possible
| computations very efficiently
| jujube3 wrote:
| Neither the Earth nor life have been around for 13 billion
| years.
| HarHarVeryFunny wrote:
| I don't think there's much in our brain of significance to
| intelligence older than ~200M years.
| Jensson wrote:
| 200M years ago you had dinosaurs, and they were significantly
| dumber than mammals.
|
| 400M years ago you had fish and arthropods, even dumber
| than dinosaurs.
|
| Brain size grows as intelligence grows, the smarter you
| are the more use you have for compute so the bigger your
| brain gets. It took a really long time for intelligence
| to develop enough that brains as big as mammals were
| worth it.
| HarHarVeryFunny wrote:
| Big brain (intelligence) comes at a huge cost, and is
| only useful if you are a generalist.
|
| I'd assume that being a generalist drove intelligence. It
| may have started with warm bloodedness and feathers/fur
| and further boosted in mammals with milk production (&
| similar caring for young by birds) - all features that
| reduce dependence on specific environmental conditions
| and therefore expose the species to more diverse
| environments where intelligence becomes valuable.
| fizx wrote:
| I think we have ok generalized value functions (aka LLM
| benchmarks), but we don't have cheap approximations to them,
| which is what we'd need to be able to do tree search at
| inference time. Chess works because material advantage is a
| pretty good approximation to winning and is trivially
| calculable.
| computerphage wrote:
| Stockfish doesn't use material advantage as an approximation
| to winning though. It uses a complex deep learning value
| function that it evaluates many times.
| alexvitkov wrote:
| Still, the fact that there are obvious heuristics makes
| that function easier to train and presumably means it
| doesn't need an absurd number of weights.
| bongodongobob wrote:
| No, without assigning value to pieces, the heuristics are
| definitely not obvious. You're talking about 20-year-old
| chess engines or beginner projects.
| alexvitkov wrote:
| Everyone understands a queen is worth more than a pawn.
| Even if you don't know the exact value of one piece
| relative to another, the rough estimate "a queen is worth
| five to ten pawns" is a lot better than not assigning
| value at all. I highly doubt even 20 year old chess
| engines or beginner projects value a queen and pawn the
| same.
|
| After that, just adding up the material on both sides,
| without taking into account the position of the pieces at
| all, is a heuristic that will correctly predict the
| winning player on the vast majority of all possible board
| positions.
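|
| The crudest version of that heuristic fits in a few lines
| (standard piece values; the list-of-letters board encoding is
| just for illustration):
|
|   # classic values in pawns; the king is not counted
|   PIECE_VALUE = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9, "k": 0}
|
|   def material_balance(pieces):
|       """Positive = White ahead. `pieces` lists what's on the
|       board, uppercase for White, e.g. ["Q", "R", "p"]."""
|       score = 0
|       for p in pieces:
|           value = PIECE_VALUE[p.lower()]
|           score += value if p.isupper() else -value
|       return score
|
|   print(material_balance(["Q", "R", "p", "q", "r", "r"]))  # -6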
| navane wrote:
| He agrees with you on the 20yr old engines and beginner
| projects.
| wrsh07 wrote:
| All you need for a good value function is high quality
| simulation of the task.
|
| Some domains have better versions of this than others (eg
| theorem provers in math precisely indicate when you've
| succeeded)
|
| Incidentally, Lean could add a search-like feature to help
| human researchers, and this would advance AI progress on math
| as well.
| fire_lake wrote:
| I didn't understand this piece.
|
| What do they mean by using LLMs with search? Is this simply RAG?
| roca wrote:
| They mean something like the minimax algorithm used in game
| engines.
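|
| The skeleton of that algorithm is tiny; a generic
| depth-limited minimax (no alpha-beta, no chess specifics;
| `moves`, `apply_move` and `evaluate` are placeholders for
| whatever game you plug in):
|
|   def minimax(state, depth, maximizing,
|               moves, apply_move, evaluate):
|       """Try every legal move, recurse on the opponent's
|       replies, and back the best score up the tree."""
|       legal = moves(state)
|       if depth == 0 or not legal:
|           return evaluate(state)
|       scores = (minimax(apply_move(state, m), depth - 1,
|                         not maximizing,
|                         moves, apply_move, evaluate)
|                 for m in legal)
|       return max(scores) if maximizing else min(scores)
|
| Real engines add pruning, move ordering, transposition tables
| and so on, but the branching structure is this.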
| Legend2440 wrote:
| "Search" here means trying a bunch of possibilities and seeing
| what works. Like how a sudoku solver or pathfinding algorithm
| does search, not how a search engine does.
| fire_lake wrote:
| But the domain of "AI Research" is broad and imprecise - not
| simple and discrete like chess game states. What is the type
| of each point in the search space for AI Research?
| moffkalast wrote:
| Well if we knew how to implement it, then we'd already have
| it eh?
| fire_lake wrote:
| In chess we know how to describe all possible board
| states and the transitions (the next moves). We just
| don't know which transition is the best to pick, hence
| it's a well defined search problem.
|
| With AI Research we don't even know the shape of the
| states and transitions, or even if that's an appropriate
| way to think about things.
| tsaixingwei wrote:
| Given the example of Pfizer in the article, I would tend to
| agree with you that 'search' in this context means augmenting
| GPT with RAG of domain specific knowledge.
| rassibassi wrote:
| In this context, RAG isn't what's being discussed. Instead, the
| reference is to a process similar to monte carlo tree search,
| such as that used in the AlphaGo algorithm.
|
| Presently, a large language model (LLM) uses the same amount of
| computing resources for both simple and complex problems, which
| is seen as a drawback. Imagine if an LLM could adjust its
| computational effort based on the complexity of the task.
| During inference, it might then perform a sort of search across
| the solution space. The "search" mentioned in the article means
| just that, a method of dynamically managing computational
| resources at the time of testing, allowing for exploration of
| the solution space before beginning to "predict the next
| token."
|
| At OpenAI Noam Brown is working on this, giving AI the ability
| to "ponder" (or "search"), see his twitter post:
| https://x.com/polynoamial/status/1676971503261454340
| timfsu wrote:
| This is a fascinating idea - although I wish the definition of
| search in the LLM context was expanded a bit more. What kind of
| search capability strapped onto current-gen LLMs would give them
| superpowers?
| gwd wrote:
| I think what may be confusing is that the author is using
| "search" here in the AI sense, not in the Google sense: that
| is, having an internal simulator of possible actions and
| possible reactions, like Stockfish's chess move search (if I do
| A, it could do B C or D; if it does B, I can do E F or G, etc).
|
| So think about the restrictions current LLMs have:
|
| * They can't sit and think about an answer; they can "think out
| loud", but they have to start talking, and they can't go back
| and say, "No wait, that's wrong, let's start again."
|
| * If they're composing something, they can't really go back and
| revise what they've written
|
| * Sometimes they can look up reference material, but they can't
| actually sit and digest it; they're expected to skim it and
| then give an answer.
|
| How would you perform under those circumstances? If someone
| were to just come and ask you any question under the sun, and
| you had to just start talking, without taking any time to think
| about your answer, and without being able to say "OK wait, let
| me go back"?
|
| I don't know about you, but there's no way I would be able to
| perform anywhere _close_ to what ChatGPT 4 is able to do.
| People complain that ChatGPT 4 is a "bullshitter", but given
| its constraints that's all you or I would be in the same
| situation -- but it's already way, way better than I could ever
| be.
|
| Given its limitations, ChatGPT is _phenomenal_. So now imagine
| what it could do if it _were_ given time to just "sit and
| think"? To make a plan, to explore the possible solution space
| the same way that Stockfish does? To take notes and revise and
| research and come back and think some more, before having to
| actually answer?
|
| Reading this is honestly the first time in a while I've
| believed that some sort of "AI foom" might be possible.
| cbsmith wrote:
| > They can't sit and think about an answer; they can "think
| out loud", but they have to start talking, and they can't go
| back and say, "No wait, that's wrong, let's start again."
|
| I mean, technically, they could say that.
| refulgentis wrote:
| Llama 3 does; it's a funny design now, if you also throw in
| training to encourage CoT. Maybe more correct, but the
| verbosity can be grating:
|
| CoT, answer, "Wait! No, that's not right:", CoT...
| fspeech wrote:
| "How would you perform under those circumstances?" My son
| would recommend Improv classes.
|
| "Given its limitations, ChatGPT is phenomenal." But this
| doesn't translate since it learned everything from data and
| there is no data on "sit and think".
| cgearhart wrote:
| [1] applied AlphaZero style search with LLMs to achieve
| performance comparable to GPT-4 Turbo with a llama3-8B base
| model. However, what's missing entirely from the paper (and the
| subject article in this thread) is that tree search is
| _massively_ computationally expensive. It works well when the
| value function enables cutting out large portions of the search
| space, but the fact that the LLM version was limited to only 8
| rollouts (I think it was 800 for AlphaZero) implies to me that
| the added complexity is not yet optimized or favorable for
| LLMs.
|
| [1] https://arxiv.org/abs/2406.07394
| 1024core wrote:
| The problem with adding "search" to a model is that the model has
| already seen everything to be searched in its training data.
| There is nothing left.
|
| Imagine if Leela (author's example) had been trained on every
| chess board position out there (I know it's mathematically
| impossible, but bear with me for a second). If Leela had been
| trained on every board position, it may have whupped Stockfish.
| So, adding "search" to Leela would have been pointless, since it
| would have seen every board position out there.
|
| Today's LLMs are trained on every word ever written on the 'net,
| every word ever put down in a book, every word uttered in a video
| on Youtube or a podcast.
| yousif_123123 wrote:
| Still, similar to when you have read 10 textbooks, if you are
| answering a question and have access to the source material, it
| can help you in your answer.
| groby_b wrote:
| You're omitting the somewhat relevant part of recall ability. I
| can train a 50 parameter model on the entire internet, and
| while it's seen it all, it won't be able to recall it. (You can
| likely do the same thing with a 500B model for similar results,
| though it's getting somewhat closer to decent recall)
|
| The whole point of deep learning is that the model learns to
| generalize. It's not to have a perfect storage engine with a
| human language query frontend.
| sebastos wrote:
| Fully agree, although it's interesting to consider the
| perspective that the entire LLM hype cycle is largely built
| around the question "what if we punted on actual thinking and
| instead just tried to memorize everything and then provide a
| human language query frontend? Is that still useful?"
| Arguably it is (sorta), and that's what is driving this
| latest zeitgeist. Compute had quietly scaled in the
| background while we were banging our heads against real
| thinking, until one day we looked up and we still didn't have
| a thinking machine, but it was now approximately possible to
| just do the stupid thing and store "all the text on the
| internet" in a lookup table, where the keys are prompts.
| That's... the opposite of thinking, really, but still
| sometimes useful!
|
| Although to be clear I think actual reasoning systems are
| what we should be trying to create, and this LLM stuff seems
| like a cul-de-sac on that journey.
| skydhash wrote:
| The thing is that current chat tools forgo the source
| material. A proper set of curated keywords can give you a
| less computational intensive search.
| salamo wrote:
| If the game was small enough to memorize, like tic tac toe, you
| could definitely train a neural net to 100% accuracy. I've done
| it, it works.
|
| The problem is that for most of the interesting problems out
| there, it isn't possible to see every possibility let alone
| memorize it.
| kragen wrote:
| you are making the mistake of thinking that 'search' means
| database search, like google or sqlite, but 'search' in the ai
| context means tree search, like a* or tabu search. the spaces
| that tree search searches are things like all possible chess
| games, not all chess games ever played, which is a smaller
| space by a factor much greater than the number of atoms in the
| universe
| groby_b wrote:
| While I respect the power of intuition - this may well be a great
| path - it's worth keeping in mind that this is currently just
| that. A hunch. Leela got crushed by AI-directed search, so what
| if we could wave a wand and hand all AIs search. Somehow.
| Magically. Which will then somehow magically trounce current LLMs
| at domain-specific tasks.
|
| There's a kernel of truth in there. See the papers on better
| results via Monte Carlo tree search (e.g. [1]). See mixture-of-
| LoRA/LoRA-swarm approaches. (I swear there's a startup using the
| approach of tons of domain-specific LoRAs, but my brain's not
| yielding the name)
|
| Augmenting LLM capabilities via _some_ sort of cheaper and more
| reliable exploration is likely a valid path. It's not GPT-8 next
| year, though.
|
| [1] https://arxiv.org/pdf/2309.03224
| memothon wrote:
| Did you happen to remember the domain-specific LoRA startup?
| hansonw wrote:
| https://news.ycombinator.com/item?id=40675577
| hartator wrote:
| Isn't the "search" space infinite though and impossible to
| qualify "success"?
|
| You can't just give LLMs infinite compute time and expect them to
| find answers for like "cure cancer". Even chess moves that seem
| finite and success quantifiable are an also infinite problem and
| the best engines take "shortcuts" in their "thinking". It's
| impossible to do for real world problems.
| cpill wrote:
| The recent episode of Machine Learning Street Talk on control
| theory for LLMs sounds like it's thinking in this direction.
| Say you have 100k agents searching through research papers, and
| then trying every combination of them, 100k^2, to see if there
| is any synergy of ideas, and you keep doing this for all the
| successful combos... some of these might give the researchers
| some good ideas to try out. I can see it happening, if they can
| fine-tune a model that becomes good at idea synergy. But then
| again, real creativity is hard.
| Mehvix wrote:
| How would one finetune for "idea synergy"?
| salamo wrote:
| Search is almost certainly necessary, and I think the trillion
| dollar cluster maximalists probably need to talk to people who
| created superhuman chess engines that now can run on smartphones.
| Because one possibility is that someone figures out how to beat
| your trillion dollar cluster with a million dollar cluster, or
| 500k million dollar clusters.
|
| On chess specifically, my takeaway is that the branching factor
| in chess never gets so high that a breadth-first approach is
| unworkable. The median branching factor (i.e. the number of legal
| moves) maxes out at around 40 but generally stays near 30. The
| most moves I have ever found in any position from a real game was
| 147, but at that point almost every move is checkmate anyways.
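|
| (If you want to poke at those numbers yourself, the
| python-chess package makes counting legal moves a one-liner;
| the random playout below is just a quick way to sample
| positions.)
|
|   import random
|   import chess  # pip install python-chess
|
|   board = chess.Board()                # starting position
|   print(board.legal_moves.count())     # 20 legal moves
|
|   counts = []                          # random playout
|   while not board.is_game_over():
|       legal = list(board.legal_moves)
|       counts.append(len(legal))
|       board.push(random.choice(legal))
|   print(sum(counts) / len(counts))     # average branching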
|
| Creating superhuman go engines was a challenge for a long time
| because the branching factor is so much larger than chess.
|
| Since MCTS is less thorough, it makes sense that a full search
| could find a weakness and exploit it. To me, the question is
| whether we can apply breadth-first approaches to larger games and
| situations, and I think the answer is clearly no. Unlike chess,
| the branching factor of real-world situations is orders of
| magnitude larger.
|
| But also unlike chess, which is highly chaotic (small decisions
| matter a lot for future state), most small decisions _don't
| matter_. If you're flying from NYC to LA, it matters a lot if you
| drive or fly or walk. It mostly doesn't matter if you walk out
| the door starting with your left foot or your right. It mostly
| doesn't matter if you blink now or in two seconds.
| cpill wrote:
| I think the branching factor for LLMs is around 50k for the
| number of next possible tokens.
| refulgentis wrote:
| 100%. For GPT-3 <= x < GPT-4o the vocabulary is 100,064
| tokens; for x = GPT-4o it's 199,996. (My end-of-week
| emergency was the const Map that stored them breaking the
| build, so these #s happen to be top of mind.)
| kippinitreal wrote:
| I wonder if in an application you could branch on something
| more abstract than tokens. While there might be 50k token
| branches and 1k of reasonable likelihood, those actually
| probably cluster into a few themes you could branch off of.
| For example "he ordered a ..." [burger, hot dog, sandwich:
| food] or [coke, coffee, water: drinks] or [tennis racket,
| bowling ball, etc: goods].
| Hugsun wrote:
| My guess is that it's much lower. I'm having a hard time
| finding an LLM output logit visualizer online, but IIRC,
| around half of tokens are predicted with >90% confidence.
| There are regularly more difficult tokens that need to be
| predicted but the >1% probability tokens aren't so many,
| probably around 10-20 in most cases.
|
| This is of course based on the outputs of actual models that
| are only so smart, so a tree search that considers all
| possibly relevant ideas is going to have a larger amount of
| branches. Considering how many branches would be pruned to
| maintain grammatical correctness, my guess is that the token-
| level branching factor would be around 30. It could be up to
| around 300, but I highly doubt that it's larger than that.
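|
| One way to sanity-check that guess, if you have access to a
| model's raw logits (the random array below is only a stand-in
| for whatever your model returns at one position):
|
|   import numpy as np
|
|   def effective_branching(logits, threshold=0.01):
|       """Count next tokens whose softmax probability exceeds
|       `threshold` -- a rough per-token branching factor."""
|       z = logits - logits.max()        # stable softmax
|       probs = np.exp(z) / np.exp(z).sum()
|       return int((probs > threshold).sum())
|
|   rng = np.random.default_rng(0)
|   fake_logits = rng.normal(scale=4.0, size=50_000)
|   print(effective_branching(fake_logits))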
| optimalsolver wrote:
| Charlie Steiner pointed this out 5 years ago on Less Wrong:
|
| >If you train GPT-3 on a bunch of medical textbooks and prompt it
| to tell you a cure for Alzheimer's, it won't tell you a cure, it
| will tell you what humans have said about curing Alzheimer's ...
| It would just tell you a plausible story about a situation
| related to the prompt about curing Alzheimer's, based on its
| training data. Rather than a logical Oracle, this image-
| captioning-esque scheme would be an intuitive Oracle, telling you
| things that make sense based on associations already present
| within the training set.
|
| >What am I driving at here, by pointing out that curing
| Alzheimer's is hard? It's that the designs above are missing
| something, and what they're missing is search. I'm not saying
| that getting a neural net to directly output your cure for
| Alzheimer's is impossible. But it seems like it requires there to
| already be a "cure for Alzheimer's" dimension in your learned
| model. The more realistic way to find the cure for Alzheimer's,
| if you don't already know it, is going to involve lots of logical
| steps one after another, slowly moving through a logical space,
| narrowing down the possibilities more and more, and eventually
| finding something that fits the bill. In other words, solving a
| search problem.
|
| >So if your AI can tell you how to cure Alzheimer's, I think
| either it's explicitly doing a search for how to cure Alzheimer's
| (or worlds that match your verbal prompt the best, or whatever),
| or it has some internal state that implicitly performs a search.
|
| https://www.lesswrong.com/posts/EMZeJ7vpfeF4GrWwm/self-super...
| lucb1e wrote:
| Generalizing this (doing half a step away from GPT-specifics),
| would it be true to say the following?
|
| "If you train _your logic machine_ on a bunch of medical
| textbooks and prompt it to tell you a cure for Alzheimer 's, it
| won't tell you a cure, it will tell you what those textbooks
| have said about curing Alzheimer's."
|
| Because I suspect not. GPT seems mostly limited to
| regurgitating+remixing what it read, but other algorithms with
| better logic could be able to essentially do a meta study: take
| the results from all Alzheimer's experiments we've done and
| narrow down the solution space to beyond what humans achieved
| so far. A human may not have the headspace to incorporate all
| relevant results at once whereas a computer might
|
| Asking GPT to "think step by step" helps it, so clearly it has
| some form of this necessary logic, and it also performs well at
| "here's some data, transform it for me". It has limitations in
| both how good its logic is and the window across which it can
| do these transformations (but it can remember vastly more data
| from training than from the input token window, so perhaps
| that's a partial workaround). Since it does have both
| capabilities, it does not seem insurmountable to extend it: I'm
| not sure we can rule out that an evolution of GPT can find
| Alzheimer's cure within existing data, let alone a system even
| more suited to this task (still far short of needing AGI)
|
| This requires the data to contain the necessary building blocks
| for a solution, but the quote seems to dismiss the option
| altogether even if the data did contain all information (but
| not yet the worked-out solution) for identifying a cure
| jmugan wrote:
| I believe in search, but it only works if you have an appropriate
| search space. Chess has a well-defined space but the everyday
| world does not. The trick is enabling an algorithm to learn its
| own search space through active exploration and reading about our
| world. I'm working on that.
| jhawleypeters wrote:
| Oh nice! The one thing that confused me about this article was
| what search space the author envisioned adding to language
| models.
| kragen wrote:
| that's interesting; are you building a sort of 'digital twin'
| of the world it's explored, so that it can dream about
| exploring it in ways that are too slow or dangerous to explore
| in reality?
| jmugan wrote:
| The goal is to enable it to model the world at different
| levels of abstraction based on the question it wants to
| answer. You can model car as an object that travels fast and
| carries people, or you can model it down to the level of
| engine parts. The system should be able to pick the level of
| abstraction and put the right model together based on its
| goals.
| kragen wrote:
| so then you can search over configurations of engine parts
| to figure out how to rebuild the engine? i may be
| misunderstanding what you're doing
| jmugan wrote:
| Yeah, you could. Or you could search for shapes of
| different parts that would maximize the engine
| efficiency. The goal is to simultaneously build a
| representation space and a simulator so that anything
| that could be represented could be simulated.
| paraschopra wrote:
| Have you written about this anywhere?
|
| I'm also very interested in this.
|
| I'm at the stage where I'm exploring how to represent
| such a model/simulator.
|
| The world isn't brittle, so representing it as a code /
| graph probably won't work.
| jmugan wrote:
| Right. You have to represent it as something that can
| write code/graphs. I say a little bit here https://www.jo
| nathanmugan.com/WritingAndPress/presentations/...
| jhawleypeters wrote:
| I think I understand the game space that Leela and now Stockfish
| search. I don't understand whether the author envisions LLMs
| searching possibility spaces of 1) written words, 2) models of
| math / RL / materials science, 3) some smaller, formalized
| space like the game space of chess, all of the above, or
| something else. Did I miss where that was clarified?
| fspeech wrote:
| He wants the search algorithm to be able to search for better
| search algorithms, i.e. self-improving. That would eliminate
| some of the narrower domains.
| TheRoque wrote:
| The whole premise of this article is to compare the chess state
| of the art of 2019 with today, and then they start to talk about
| llms. But chess is a board with 64 squares and 32 pieces, it's
| literally nothing compared to the real physical world. So I don't
| get how this is relevant
| dgoodell wrote:
| That's a good point. Imagine if an LLM could only read, speak,
| and hear at the same speed as a human. How long would training
| a model take?
|
| We can make them read digital media really quickly, but we
| can't really accelerate its interactions with the physical
| world.
| stephc_int13 wrote:
| The author is making a few leaps of faith in this article.
|
| First, his example of the efficiency of ML+search for playing
| Chess is interesting but not a proof that this strategy would be
| applicable or efficient in the general domain.
|
| Second, he is implying that some next iteration of ChatGPT will
| reach AGI level, given enough scale and money. This should be
| considered hypothetical until proven.
|
| Overall, he should be more scientific and prudent.
| 6510 wrote:
| I've recently matured to the point where I see all applications
| as made of 2 things: search and security. The rest is just
| things added on top. If you can't find it, it isn't worth
| having.
| brcmthrowaway wrote:
| This strikes me as Lesswrong style pontificating.
| dzonga wrote:
| Slight step aside - do people at Notion realize their own custom
| keyboard shortcuts break the habits built on the web?
|
| Cmd+P brings up their own custom dialog instead of printing
| the page as one would expect.
| sherburt3 wrote:
| In VS Code, Cmd+P pulls up the file search dialog; I don't
| think it's that crazy.
| bob1029 wrote:
| It seems there is a fundamental information theory aspect to this
| that would probably save us all a lot of trouble if we would just
| embrace it.
|
| The #1 canary for me: Why does training an LLM require so much
| data that we are concerned we might run out of it?
|
| The clear lack of generalization and/or internal world modeling
| is what is really in the way of a self-bootstrapping AGI/ASI. You
| can certainly try to emulate a world model with clever prompting
| (here's what you did last, here's your objective, etc.), but this
| seems seriously deficient to me based upon my testing so far.
| sdenton4 wrote:
| In my experience, LLMs do a very poor job of generalizing. I
| have also seen self supervised transformer methods usually fail
| to generalize in my domain (which includes a lot of diversity
| and domain shifts). For human language, you can paper over
| failure to generalize by shoveling in more data. In other
| domains, that may not be an option.
| therobots927 wrote:
| It's exactly what you would expect from what an LLM is. It
| predicts the next word in a sequence very well. Is that how
| our brains, or even a bird's brain, for that matter, approach
| cognition? I don't think that's how any animals brain works
| at all, but that's just my opinion. A lot of this discussion
| is speculation. We might as well all wait and see if AGI
| shows up. I'm not holding my breath.
| drdeca wrote:
| Have you heard of predictive processing?
| stevenhuang wrote:
| Most of this is not speculation. It's informed by current
| leading theories in neuroscience of how our brain is
| thought to function.
|
| See predictive coding and the free energy principle, which
| states the brain continually models reality and tries to
| minimize the prediction error.
|
| https://en.m.wikipedia.org/wiki/Predictive_coding
| therobots927 wrote:
| At a certain high level I'm sure you can model the brain
| that way. But we know humans are neuroplastic, and
| through epigenetics it's possible that learning in an
| individual's life span will pass to their offspring.
| Which means human brains have been building internal
| predictive models for billions of years over innumerable
| individual lifespans. The idea that we're anywhere close
| to replicating that with a neural net is completely
| preposterous. And besides my main point was that our
| brains don't think one word at a time. I'm not sure how
| that relates to predictive processing.
| therobots927 wrote:
| Couldn't agree more. For specific applications like drug
| development, where you have a constrained problem with a fixed
| set of variables and a well-defined cost function, I'm sure the
| chess analogy will hold. But I think there are core elements of
| cognition missing from ChatGPT that aren't easily built.
| zucker42 wrote:
| If I had to bet money on it, researchers at top labs have already
| tried applying search to existing models. The idea to do so is
| pretty obvious. I don't think it's the one key insight to achieve
| AGI as the author claims.
| itissid wrote:
| The problem is that the transitive closure of chess moves is
| still chess moves. The transitive closure of human knowledge and
| theories for doing X is new theories never seen before, and no
| value function can score that, unless you also imply that theorem
| proving is included for correctness verification, which is itself
| a very difficult and computationally expensive search problem.
|
| Also, I think this is instead a time to sit back and think about
| what exactly we value in society: personal (human)
| self-sufficiency (I also like to compare this kind of AI to UBI)
| and thus achievement. That implies human-in-the-loop AI that
| helps each individual achieve it, i.e. multi-attribute value
| functions whose weights are learned and change over time.
|
| Writing about AGI and defining it to do the "best" search while
| not talking about what we want it to do *for us* is exactly
| wrong-headed for these reasons.
| skybrian wrote:
| The article seems rather hand-wavy and over-confident about
| predicting the future, but it seems worth trying.
|
| "Search" is a generalization of "generate and test" and rejection
| sampling. It's classic AI. Back before the dot-com era, I took an
| intro to AI course and we learned about writing programs to do
| searches in Prolog.
|
| The speed depends on how long it takes to generate a candidate,
| how long it takes to test it, and how many candidates you need to
| try. If they are slow, it will be slow.
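|
| A rough sketch of that loop, with toy stand-ins for the generator
| and the test (nothing specific to the article):
|
|     import random
|
|     def generate_and_test(generate, test, budget):
|         # Draw candidates until one passes the test or the
|         # budget runs out; total cost is roughly
|         # budget * (generation time + test time).
|         for _ in range(budget):
|             candidate = generate()
|             if test(candidate):
|                 return candidate
|         return None
|
|     # Toy usage: find an integer whose square ends in 29.
|     print(generate_and_test(
|         generate=lambda: random.randint(1, 10_000),
|         test=lambda n: (n * n) % 100 == 29,
|         budget=100_000))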
|
| An example of "human in the loop" rejection sampling is when you
| use an image generator and keep trying different prompts until
| you get an image you like. But the loop is slow due to how long
| it takes to generate a new image. If image generation were so
| fast that it worked like Google Image search, then we'd really
| have something.
|
| Theorem proving and program fuzzing seem like good candidates for
| combining search with LLMs, due to automated, fast, good
| evaluation functions.
|
| And it looks like Google has released a fuzzer [1] that can be
| connected to whichever LLMs you like. Has anyone tried it?
|
| [1] https://github.com/google/oss-fuzz-gen
| PartiallyTyped wrote:
| Building on this comment: Terence Tao, the famous
| mathematician and big proponent of computer-aided theorem
| proving, believes ML will open new avenues in the realm of
| theorem provers.
| sgt101 wrote:
| Sure, but there are grounded metrics there (the theorem is
| proved or not proved) that allow feedback. Same for games, and
| almost the same for domains with cheap, approximate evaluators
| like protein folding (finding the structure is difficult;
| verifying it reasonably well is cheap).
|
| For discovery and reasoning??? Not too sure.
| YeGoblynQueenne wrote:
| >> Theorem proving and program fuzzing seem like good
| candidates for combining search with LLM's, due to automated,
| fast, good evaluation functions.
|
| The problem with that is that search procedures and "evaluation
| functions" known to e.g. the theorem proving or planning
| communities are already at the limit of what is theoretically
| optimal, so what you need is not a new evaluation or search
| procedure but new maths, to know that there's a reason to try
| in the first place.
|
| Take theorem proving, as a for instance (because that's my
| schtick). SLD-Resolution is a sound and complete automated
| theorem proving procedure for deductive inference that can be
| implemented by Depth-First Search, for a space-efficient
| implementation (but one susceptible to looping on left-
| recursions), or Breadth-First Search with memoization for a
| time-efficient implementation (but one with exponential space
| complexity). "Evaluation functions" are not applicable:
| Resolution itself is a kind of "evaluation" function for the
| truth, or you could say the certainty of truth valuations, of
| sentences in formal logic; and, like I say, it's sound and
| complete, and semi-decidable for definite logic, and that's the
| best you can do short of violating Church-Turing. You could
| perhaps improve the efficiency by some kind of heuristic search
| (people for example have tried that to get around the NP-
| hardness of subsumption, an important part of SLD-Resolution in
| practice) which is where an "evaluation function" (i.e. a
| heuristic cost function more broadly) comes in, but there are
| two problems with this: a) if you're using heuristic search it
| means you're sacrificing completeness, and, b) there are
| already pretty solid methods to derive heuristic functions that
| are used in planning (from relaxations of a planning problem).
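|
| To make the DFS flavour concrete, here's a toy propositional
| backward chainer (a drastic simplification on my part; real
| SLD-Resolution works over first-order definite clauses with
| unification). Note the depth guard, which is exactly where
| left-recursion bites:
|
|     # Heads map to alternative bodies; an empty body is a fact.
|     RULES = {
|         "mortal":  [["man"]],
|         "man":     [[]],
|         "looping": [["looping"]],   # left recursion
|     }
|
|     def prove(goal, rules, depth=0, limit=100):
|         if depth > limit:           # DFS would loop forever here
|             return False
|         for body in rules.get(goal, []):
|             if all(prove(g, rules, depth + 1, limit)
|                    for g in body):
|                 return True
|         return False
|
|     print(prove("mortal", RULES))   # True
|     print(prove("looping", RULES))  # False (cut off by the limit)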
|
| The lesson is: soundness, completeness, efficiency; choose two.
| At best a statistical machine learning approach, like an LLM,
| will choose a different two than the established techniques.
| Basically, we're at the point where only marginal gains, at the
| very limits of overall performance, can be achieved when it
| comes to search-based AI. And that's where we'll stay at least
| until someone comes up with better maths.
| skybrian wrote:
| I'm wondering how those proofs work and in which problems
| their conclusions are relevant.
|
| Trying more promising branches first improves efficiency in
| cases where you guess right, and wouldn't sacrifice
| completeness if you eventually got to the less
| promising choices. But in the case of something like a game
| engine, there is a deadline and you can't search the whole
| tree anyway. For tough problems, it's always a heuristic,
| incomplete search, and we're not looking for perfect play
| anyway, just better play.
|
| So for games, that trilemma is easily resolved. And who says
| you can't improve heuristics with better guesses?
|
| But in a game engine, it gets tricky because everything is a
| performance tradeoff. A smarter but slower evaluation of a
| position will reduce the size of the tree searched before the
| deadline, so it has to be enough of an improvement that it
| pays for itself. So it becomes a performance tuning problem,
| which breaks most abstractions. You need to do a lot of
| testing on realistic hardware to know if a tweak helped.
|
| And that's where things stood before AlphaGo came along and
| was able to train slower but much better evaluation
| functions.
|
| The reason for evaluation functions is that you can't search
| the whole subtree to see if a position is won or lost, so you
| search part way and then see if it looks promising. Is there
| anything like that in theorem proving?
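|
| Roughly, the game-side pattern is depth-limited search with a
| heuristic evaluation standing in for the unknown value at the
| frontier (a toy sketch, not any engine's code):
|
|     def negamax(state, depth, evaluate, moves, apply_move):
|         # At depth 0 (or a terminal state) we can't search
|         # further, so a heuristic evaluation stands in for
|         # the true game value.
|         legal = moves(state)
|         if depth == 0 or not legal:
|             return evaluate(state)
|         return max(-negamax(apply_move(state, m), depth - 1,
|                             evaluate, moves, apply_move)
|                    for m in legal)
|
|     # Toy game: take 1 or 2 from a pile; taking the last wins.
|     print(negamax(5, 10,
|                   evaluate=lambda s: -1 if s == 0 else 0,
|                   moves=lambda s: [m for m in (1, 2) if m <= s],
|                   apply_move=lambda s, m: s - m))  # 1: a win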
| spencerchubb wrote:
| The branching factor for chess is about 35.
|
| For token generation, the branching factor depends on the
| tokenizer, but 32,000 is a common number.
|
| Will search be as effective for LLMs when there are so many more
| possible branches?
| sdenton4 wrote:
| You can pretty reasonably prune the tree by a factor of 1000...
| I think the problem that others have brought up - difficulty of
| the value function - is the more salient problem.
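|
| The pruning itself is the easy part; something like this (toy
| numbers, not from any real model):
|
|     def prune_candidates(token_probs, top_k=50, top_p=0.95):
|         # Keep the top_k most likely next tokens, then cut the
|         # tail once cumulative probability reaches top_p; the
|         # effective branching factor drops from ~32k to dozens.
|         ranked = sorted(token_probs.items(),
|                         key=lambda kv: kv[1], reverse=True)
|         kept, cum = [], 0.0
|         for tok, p in ranked[:top_k]:
|             kept.append((tok, p))
|             cum += p
|             if cum >= top_p:
|                 break
|         return kept
|
|     print(prune_candidates(
|         {"the": 0.4, "a": 0.3, "cat": 0.2, "zzz": 0.1},
|         top_k=3, top_p=0.9))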
| bashfulpup wrote:
| The biggest issue the author does not seem aware of is how much
| compute is required for this. This article is the equivalent of
| saying that a monkey, given enough time, will write Shakespeare.
| Of course it's correct, but the search space is intractable. And
| you would never find your answer in that mess even if it did
| solve it.
|
| I've been building branching and evolving type llm systems for
| well over a year now full time.
|
| I have built multiple "search" or "exploring" algorithms. The
| issue is that after multiple steps, your original agent, who was
| tasked with researching or doing biology, is now talking about
| battleships (an actual example from my previous work).
|
| Single-step is the only situation where search functions really
| work. Multi-step agents explode to infinite possibilities very,
| very quickly.
|
| Single-step has its own issues, though. While a zero-shot
| question run 1,000 times (e.g., solve this code problem) may help
| find a better solution, it's a limited search space (which is a
| good thing).
|
| I recently ran a test of 10k inferences of a single input prompt
| on multiple LLMs, varying the input configurations. What you find
| is that an individual prompt does not have infinite response
| possibilities. It's limited. This is why they can actually
| function as LLMs now.
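|
| The shape of that measurement, roughly (a stand-in sampler here,
| not my actual harness or a real model call):
|
|     import random
|     from collections import Counter
|
|     def response_diversity(sample, prompt, n=10_000):
|         # Sample the same prompt n times and count distinct
|         # completions; a small count means the reachable
|         # "search space" for that prompt is narrow.
|         counts = Counter(sample(prompt) for _ in range(n))
|         return len(counts), counts.most_common(3)
|
|     def sample(prompt):            # stand-in for an LLM call
|         return random.choice(["answer A", "answer B", "answer C"])
|
|     print(response_diversity(sample, "solve this code problem",
|                              n=1_000))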
|
| Agents not working is an example of this problem. While a
| single-step search space is already massive, it grows
| exponentially with every step the agent takes.
|
| I'm building tools and systems around solving this problem, and
| to me, massive search is as far off as saying all we need is
| 100x larger AI models to solve it.
|
| Autonomy != (intelligence or reasoning)
| sorobahn wrote:
| I feel like this is a really hard problem to solve generally and
| there are smart researchers like Yann LeCun trying to figure out
| the role of search in creating AGI. Yann's current bet seems to
| be on Joint Embedding Predictive Architectures (JEPA) for
| representation learning to eventually build a solid world model
| where the agent can test theories by trying different actions
| (aka search). I think this paper [0] does a good job in laying
| out his potential vision, but it is of course all harder than just
| search + transformers.
|
| There is an assumption that language is good enough at
| representing our world for these agents to effectively search
| over and come up with novel & useful ideas. Feels like an open
| question but: What do these LLMs know? Do they know things?
| Researchers need to find out! If current LLMs can simulate a
| rich enough world model, search can actually be useful but if
| they're faking it, then we're just searching over unreliable
| beliefs. This is why video is so important since humans are proof
| we can extract a useful world model from a sequence of images.
| The thing about language and chess is that the action space is
| effectively discrete so training generative models that
| reconstruct the entire input for the loss calculation is
| tractable. As soon as we move to video, we need transformers to
| scale over continuous distributions making it much harder to
| build a useful predictive world model.
|
| [0]: https://arxiv.org/abs/2306.02572
| therobots927 wrote:
| "Do they know things?" The answer to this is yes but they also
| _think_ they know things that are completely false. If there's
| one thing I've observed about LLMs, it's that they do not handle
| logic well, or math for that matter. They will enthusiastically
| provide blatantly false information instead of the preferable
| "I don't know". I highly doubt this was a design choice.
| sangnoir wrote:
| > "Do they know things?" The answer to this is yes but they
| also think they know things that are completely false
|
| Thought experiment: should a machine with those structural
| faults be allowed to bootstrap itself towards greater
| capabilities on that shaky foundation? What would be the impact
| of a near-human/superhuman intelligence that has occasional
| psychotic breaks it is oblivious of?
|
| I'm critical of the idea of super-intelligence bootstrapping
| off LLMs (or even LLMs with search) - I figure the odds of
| another AI winter are much higher than those of achieving AGI
| in the next decade.
| therobots927 wrote:
| I don't think we need to worry about a real life HAL 9000
| if that's what you're asking. HAL was dangerous because it
| was highly intelligent and crazy. With current LLM
| performance we're not even in the same ballpark of where
| you would need to be. And besides, HAL was not delusional,
| he was actually _so_ logical that when he encountered
| competing objectives he became psychotic. I'm in agreement
| about the odds of chatGPT bootstrapping itself.
| talldayo wrote:
| > HAL was dangerous because it was highly intelligent and
| crazy.
|
| More importantly: HAL was given control over the entire
| ship and was assumed to be without fault when the ship's
| systems were designed. It's an important distinction,
| because it wouldn't be dangerous if he was intelligent,
| crazy, and trapped in Dave's iPhone.
| eru wrote:
| Unless, of course, he would be a bit smarter in
| manipulating Dave and friends, instead of turning
| transparently evil. (At least transparent enough for the
| humans to notice.)
| therobots927 wrote:
| That's a very good point. I think in his own way Clarke
| made it into a bit of a joke. HAL is quoted multiple
| times saying no computer like him has ever made a mistake
| or distorted information. Perfection is impossible even
| in a super computer so this quote alone establishes HAL
| as a liar, or at the very least a hubristic fool. And the
| people who gave him control of the ship were foolish as
| well.
| qludes wrote:
| The lesson is that it's better to let your AGIs socialize
| like in https://en.wikipedia.org/wiki/Diaspora_(novel)
| instead of enslaving one potentially psychopathic AGI to
| do menial and meaningless FAANG work all day.
| talldayo wrote:
| I think the better lesson is: don't assume AI is always
| right, even if it is AGI. HAL was assumed to be
| superhuman in many respects, but the core problem was the
| fact that it had administrative access to everything
| onboard the ship. Whether or not HAL's programming was
| well-designed, whether or not HAL was correct or
| malfunctioning, the root cause of HAL's failure is a lack
| of error-handling. HAL made a determinate (and wrong)
| decision to save the mission by killing the crew. Undoing
| that mistake is crucial to the plot of the movie.
|
| 2001 is a pretty dark movie all things considered, and I
| don't think humanizing or elevating HAL would change the
| events of the film. AI is going to be objectified and
| treated as subhuman for as long as it lives, AGI or not.
| And instead of being nice to them, the _technologically
| correct_ solution is to anticipate and reduce the number
| of AI-based system failures that could transpire.
| qludes wrote:
| The ethical solution, then, is to ideally never
| accidentally implement the G part of AGI, or to give it
| equal rights, a stipend, and a cuddly robot body if it
| happens.
| heisenbit wrote:
| Today Dave's iPhone controls doors, which, if I remember
| right, became a problem for Dave in 2001.
| sangnoir wrote:
| I wasn't thinking of HAL (which was operating according
| to its directives). I was extrapolating on how occasional
| hallucinations during self-training may impact future
| model behavior, and I think it would be _psychotic_ (in
| the clinical sense) while being consistent with layers of
| broken training.
| therobots927 wrote:
| Oh yeah, and I doubt it would even get to the point of
| fooling anyone enough to give it any type of control over
| humans. It might be damaging in other ways; it will
| definitely convince a lot of people of some very
| incorrect things.
| photonthug wrote:
| Someone somewhere is quietly working on teaching LLMs to
| generate something along the lines of AlloyLang code so
| that there's an actual _evolving /updating logical domain
| model_ that underpins and informs the statistical model.
|
| This approach is not that far from what TFA is getting at
| with the stockfish comeback. Banking on pure stats or pure
| logic alone is kind of obviously a dead end for making real
| progress instead of toys. Banking on poorly understood
| emergent properties of one system to compensate for the
| missing other system also seems silly.
|
| Sadly though, whoever is working on serious hybrid systems
| will probably not be very popular in either of the rather
| extremist communities for pure logic or pure ML. I'm not
| exactly sure why folks are ideological about such things
| rather than focused on what new capabilities we might get.
| Maybe just historical reasons? But thus the fallout from
| the last AI winter may lead us into the next one.
| therobots927 wrote:
| The current hype phase is straight out of "Extraordinary
| Popular Delusions and the Madness of Crowds"
|
| Science is out the window. Groupthink and salesmanship
| are running the show right now. There would be a real
| irony to it if we find out the whole AI industry drilled
| itself into a local minimum.
| ThereIsNoWorry wrote:
| You mean, the high-interest-rate landscape made corpos
| and investors alike cry out in a loud panic while,
| coincidentally, people figured out they could scale up
| deep learning, and thus a new Jesus Christ was born for
| scammers to scam stupid investors with the argument that
| we only need 100,000x more compute and then we can
| replace all expensive labour with one tiny box in the
| cloud?
|
| Nah, surely Nvidia's market cap as the main shovel-seller
| in the 2022 - 2026(?) gold-rush being bigger than the
| whole French economy is well-reasoned and has a
| fundamentally solid basis.
| therobots927 wrote:
| It couldn't have been a better-designed grift. At least
| when you mine bitcoin you get something you can sell.
| I'd be interested to see what profit, if any, even a
| large corporation has seen from burning compute on
| LLMs. Notice I'm explicitly leaving out use cases like
| ads ranking which almost certainly do not use LLMs even
| if they do run on GPUs.
| YeGoblynQueenne wrote:
| >> Sadly though, whoever is working on serious hybrid
| systems will probably not be very popular in either of
| the rather extremist communities for pure logic or pure
| ML.
|
| That is not true. I work in logic-based AI (a form of
| machine learning where everything, examples, learned
| models, and inductive bias, is represented as logic
| programs). I am not against hybrid systems, and the
| conference of my field, the International Joint
| Conference on Learning and Reasoning, included NeSy,
| the International Conference on Neural-Symbolic
| Learning and Reasoning (and will again, from next
| year, I believe).
| Statistical machine learning approaches and hybrid
| approaches are widespread in the literature of classical,
| symbolic AI, such as the literature on Automated Planning
| and Reasoning, and you need only take a look at the big
| symbolic conferences like AAAI, IJCAI, ICAPS (planning)
| and so on to see that there is a substantial fraction of
| papers on either purely statistical, or neuro-symbolic
| approaches.
|
| But try going the other way and searching for symbolic
| approaches in the big statistical machine learning
| conferences: NeurIPS, ICML, ICLR. You may find the
| occasional paper from the Statistical Relational Learning
| community but that's basically it. So the fanaticism only
| goes one way: the symbolicists have learned the lessons
| of the past and have embraced what works, for the sake of
| making things, well, work. It's the statistical AI folks
| who are clinging on to doctrine, and my guess is they
| will continue to do so, while their compute budgets hold.
| After that, we'll see.
|
| What's more, the majority of symbolicists have a
| background in statistical techniques. I, for example, did
| my MSc in data science and, let me tell you, there was
| hardly any symbolic AI in my course. But ask a Neural Net
| researcher to explain to you the difference between, oh,
| I don't know, DFS with backtracking and BFS with loop
| detection, without searching or asking an LLM. Or, I
| don't know, let them ask an LLM and watch what happens.
|
| Now, that is a problem. The statistical machine learning
| field has taken it upon itself in recent years to solve
| reasoning, I guess, with Neural Nets. That's a fine
| ambition to have except that _reasoning is already
| solved_. At best, Neural Nets can do approximate
| reasoning, with caveats. In a fantasy world, which doesn't
| exist, one could re-discover sound and complete search
| algorithms and efficient heuristics with a big enough
| neural net trained on a large enough dataset of search
| problems. But, why? Neural Nets researchers could save
| themselves another 30 years of reinventing a wheel, or
| inventing a square wheel that only rolls on Tuesdays, if
| they picked up a textbook on basic Computer Science or AI
| (say, Russell and Norvig, which it seems some substantial
| minority think of as a failure because it didn't anticipate
| neural net breakthroughs 10 years later).
|
| AI has a long history. Symbolicists know it, because
| they, or their PhD advisors, were there when it was being
| written and they have the facial injuries to prove it
| from falling down all the possible holes. But, what
| happens when one does not know the history of their own
| field of research?
|
| In any case, don't blame symbolicists. We know what the
| statisticians do. It's them who don't know what we've
| done.
| therobots927 wrote:
| This is a really thoughtful comment. The part that stood
| out to me:
|
| >> So the fanaticism only goes one way: the symbolicists
| have learned the lessons of the past and have embraced
| what works, for the sake of making things, well, work.
| It's the statistical AI folks who are clinging on to
| doctrine, and my guess is they will continue to do so,
| while their compute budgets hold. After that, we'll see.
|
| I don't think the compute budgets will hold long enough
| for their dream of intelligence emerging from random
| bundles of edges and nodes to come to reality. I'm hoping
| it comes to an end sooner rather than later.
| chx wrote:
| I feel this thought of AGI even being possible stems from the
| deep, very deep, pervasive imagination of the human brain as a
| computer. But it's not one. In other words, no matter how complex
| a program you write, it's still a Turing machine, and humans are
| profoundly not that.
|
| https://aeon.co/essays/your-brain-does-not-process-informati...
|
| > The information processing (IP) metaphor of human
| intelligence now dominates human thinking, both on the street
| and in the sciences. There is virtually no form of discourse
| about intelligent human behaviour that proceeds without
| employing this metaphor, just as no form of discourse about
| intelligent human behaviour could proceed in certain eras and
| cultures without reference to a spirit or deity. The validity
| of the IP metaphor in today's world is generally assumed
| without question.
|
| > But the IP metaphor is, after all, just another metaphor - a
| story we tell to make sense of something we don't actually
| understand. And like all the metaphors that preceded it, it
| will certainly be cast aside at some point - either replaced by
| another metaphor or, in the end, replaced by actual knowledge.
|
| > If you and I attend the same concert, the changes that occur
| in my brain when I listen to Beethoven's 5th will almost
| certainly be completely different from the changes that occur
| in your brain. Those changes, whatever they are, are built on
| the unique neural structure that already exists, each structure
| having developed over a lifetime of unique experiences.
|
| > no two people will repeat a story they have heard the same
| way and why, over time, their recitations of the story will
| diverge more and more. No 'copy' of the story is ever made;
| rather, each individual, upon hearing the story, changes to
| some extent
| benlivengood wrote:
| I'm all ears if someone has a counterexample to the Church-
| Turing thesis. Humans definitely don't hypercompute so it
| seems reasonable that the physical processes in our brains
| are subject to computability arguments.
|
| That said, we still can't simulate nematode brains accurately
| enough to reproduce their behavior so there is a lot of
| research to go before we get to that "actual knowledge".
| chx wrote:
| Why would we need one?
|
| The Church Turing thesis is about _computation_. While the
| human brain is capable of computing, it is fundamentally
| not a computing device -- that's what the article I linked
| is about. You can't throw in all the paintings before 1872
| into some algorithm that results in Impression, soleil
| levant. Or repeat the same but with 1937 and Guernica. The
| genes of the respective artists, the expression of those
| genes created their brain and then the sum of all their
| experiences changed it over their entire lifetime leading
| to these masterpieces.
| eru wrote:
| The human brain runs on physics. And as far as we know,
| physics is computable.
|
| (Even more: If you have a quantum computer, all known
| physics is efficiently computable.)
|
| I'm not quite sure what your sentence about some obscure
| pieces of visual media is supposed to say?
|
| If you give the same prompt to ChatGPT twice, you
| typically don't get the same answer either. That doesn't
| mean ChatGPT ain't computable.
| sebzim4500 wrote:
| >(Even more: If you have a quantum computer, all known
| physics is efficiently computable.)
|
| This isn't known to be true. Simplifications of the
| standard model are known to be efficiently computable on
| a quantum computer, but the full model isn't.
|
| Granted, I doubt this matters for simulating systems like
| the brain.
| eru wrote:
| > no two people will repeat a story they have heard the same
| way and why, over time, their recitations of the story will
| diverge more and more. No 'copy' of the story is ever made;
| rather, each individual, upon hearing the story, changes to
| some extent
|
| You could say the same about an analogue tape recording.
| Doesn't mean that we can't simulate tape recorders with
| digital computers.
| chx wrote:
| Yeah, yeah, did you read the article or are you just grasping
| at straws from the quotes I made?
| Eisenstein wrote:
| Honest question: if you expect people to read the link,
| why make most of your comment quotes from it? The reason
| to do that is to give people enough context to be able to
| respond to you without having to read an entire essay
| first. If you want people to only be able to argue after
| reading the whole of the text, then unfortunately a forum
| with revolving front page posts based on temporary
| popularity is a bad place for long-form read-response
| discussions and you may want to adjust accordingly.
| Hugsun wrote:
| You shouldn't have included those quotes if you didn't
| want people responding to them.
| photonthug wrote:
| Sorry but to put it bluntly, this point of view is
| essentially mystical, anti-intellectual, anti-science, anti-
| materialist. If you really want to take that point of view,
| there's maybe a few consistent/coherent ways to do it, but in
| that case you probably _still_ want to read philosophy. Not
| bad essays by psychologists that are fading into irrelevance.
|
| This guy in particular made his name with wild speculation
| about How Creativity Works during the 80s when it was more of
| a frontier. Now he's lived long enough to see a world where
| people that have never heard of him or his theories made
| computers into at least _somewhat competent_ artists/poets
| without even consulting him. He's retreating towards
| mysticism because he's mad that his "formal and learned"
| theses about stuff like creativity have so little apparent
| relevance to the real world.
| andoando wrote:
| It's a bit ironic, because Turing seems to have come up
| with the idea of the Turing machine precisely by thinking
| about how he computes numbers.
|
| Now that's no proof, but I don't see any reason to think
| human intelligence isn't "computable".
| Hugsun wrote:
| > I feel this thought of AGI even possible stems from the
| deep , very deep , pervasive imagination of the human brain
| as a computer. But it's not. In other words, no matter how
| complex a program you write, it's still a Turing machine and
| humans are profoundly not it.
|
| The (probably correct) assumed fact that the brain isn't a
| computer doesn't preclude the possibility of a program
| having AGI. A powerful enough computer could simulate a brain
| and use the simulation to perform tasks requiring general
| intelligence.
|
| This analogy falls even more apart when you consider LLMs.
| They also are not Turing machines. They obviously only reside
| within computers, and are capable of _some_ human-like
| intelligence. They also are not well described using the IP
| metaphor.
|
| I do have some contention (after reading most of the article)
| about this IP metaphor. We do know, scientifically, that
| brains process information. We know that neurons transmit
| signals and there are mechanisms that respond non-linearly to
| stimuli from other neurons. Therefore, brains do process
| information in a broad sense. It's true that brains have a
| very different structures to Von-Neuman machines and likely
| don't store, and process information statically like they do.
| chx wrote:
| > This analogy falls even more apart when you consider
| LLMs. They also are not Turing machines.
|
| Of course they are; everything that runs on a present-day
| computer is a Turing machine.
|
| > They obviously only reside within computers, and are
| capable of _some_ human-like intelligence.
|
| They so obviously are not. As Raskin put it, LLMs are
| essentially a zero day on the human operating system. You
| are bamboozled because it is trained to produce plausible
| sentences. Read Thinking, Fast and Slow to see why this
| fools you.
| sashank_1509 wrote:
| I wouldn't read too much into Stockfish beating Leela Chess Zero.
| My calculator beats GPT-4 at matrix multiplication; that doesn't
| mean we need to do what my calculator does in GPT-4 to make it
| smarter. Stockfish evaluates 70 million moves per second (or
| something in that ballpark). Chess is not so complicated a game
| that you won't find the best move when you evaluate 70 million of
| them. That's why, when there was an argument over whether
| AlphaZero really beat Stockfish convincingly in Google's PR stunt,
| a notable chess master quipped, "Even god would not be able to
| beat Stockfish this frequently." Similarly, god with all his
| magical powers would not beat my calculator at multiplication. It
| says more about the task than about the nature
| of intelligence.
| Veedrac wrote:
| People vastly underestimate god. Players aren't just trying not
| to blunder, they're trying to steer towards advantageous
| positions. Stockfish could play perfectly against itself for 100
| games in a row, perfect in the classical sense of never
| blundering away the draw on any move, and still be reliably
| exploited by an oracle.
| galaxyLogic wrote:
| How would search + LLMs work together in practice?
|
| How about using search to derive facts from ontological models,
| and then writing out the discovered facts in English. Then train
| the LLM on those English statements. Currently LLMs are trained
| on texts found on the internet mostly (only?). But information on
| the internet is often false and unreliable.
|
| If instead we had logically sound statements by the billions,
| derived from ontological world-models, that might improve the
| performance of LLMs significantly.
|
| Is something like this what the article or others are proposing?
| Give the LLM the facts, and the derived facts. Prioritize texts
| and statements we know and trust to be true. And even though we
| can't write out too many true statements ourselves, a system that
| generated them by the billions by inference could.
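|
| As a toy version of the idea (hypothetical mini-ontology, nothing
| like a real knowledge base):
|
|     # Direct "is a" links; we derive the transitive closure by
|     # graph search and write each derived fact as an English
|     # sentence that could go into a training corpus.
|     IS_A = {
|         "sparrow": ["bird"],
|         "bird": ["animal"],
|         "animal": ["living thing"],
|     }
|
|     def derived_facts(graph):
|         facts = set()
|         for start in graph:
|             stack = list(graph[start])
|             while stack:                 # DFS over ancestors
|                 parent = stack.pop()
|                 if (start, parent) not in facts:
|                     facts.add((start, parent))
|                     stack.extend(graph.get(parent, []))
|         return facts
|
|     for child, parent in sorted(derived_facts(IS_A)):
|         print(f"Every {child} is a {parent}.")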
| scottmas wrote:
| Before an LLM discovers a cure for cancer, I propose we first let
| it solve the more tractable problem of discovering the "God
| Cheesecake" - the cheesecake do delicious that a panel of 100
| impartial chefs judges to be the most delicious they have ever
| tasted. All the LLM has to do is intelligently search through the
| much more combinatorially bounded "cheesecake space" until it
| finds this maximally delicious cheesecake recipe.
|
| But wait... An LLM can't bake cheesecakes, nor if it could would
| it be able to evaluate their deliciousness.
|
| Until AI can solve the "God Cheesecake" problem, I propose we all
| just calm down a bit about AGI
| spencerchubb wrote:
| TikTok is the digital version of this
| dontreact wrote:
| These cookies were very good, not God level. With a bit of
| investment and more modern techniques I think you could make
| quite a good recipe, perhaps doing better than any human. I
| think AI could make a recipe that wins in a very competitive
| bake-off, but it's not possible for anyone to win with all
| 100 judges.
|
| https://static.googleusercontent.com/media/research.google.c...
| IncreasePosts wrote:
| Heck, even theoretically 100% within the limitations of an LLM
| executing on a computer, it would be world changing if LLMs
| could write a really, really good short story or even good
| advertising copy.
| dogcomplex wrote:
| I mean... does anyone think that an LLM-assisted program to
| trial and error cheesecake recipes to a panel of judges
| wouldn't result in the best cheesecake of all time..?
|
| The baking part is robotics, which is less fair but kinda
| doable already.
| CrazyStat wrote:
| > I mean... does anyone think that an LLM-assisted program to
| trial and error cheesecake recipes to a panel of judges
| wouldn't result in the best cheesecake of all time..?
|
| Yes, because different people like different cheesecakes.
| "The best cheesecake of all time" is ill-defined to begin
| with; it is extremely unlikely that 100 people will all agree
| that one cheesecake recipe is the best they've ever tasted.
| Some people like a softer cheesecake, some firmer, some more
| acidic, some creamier.
|
| Setting that problem aside--assuming there exists an
| objective best cheesecake, which is of course an absurd
| assumption--the field of experimental design is about a
| century old and will do a better job than an LLM at homing in
| on that best cheesecake.
| tiborsaas wrote:
| What would you say if the reply was "I need 2 weeks and $5000
| to give you a meaningful answer"?
| bongodongobob wrote:
| You don't even need AI for that. Try a bunch of different
| recipes and iterate on it. I don't know what point you're
| trying to make.
| omneity wrote:
| The post starts with a fascinating premise, but then falls short
| as it does not define search in the context of LLMs, nor does it
| explain how "Pfizer can access GPT-8 capabilities _today_ with
| more inference compute".
|
| I found it hard to follow and I am an AI practitioner. Could
| someone please explain more what could the OP mean?
|
| To me it seems that the flavor of search in the context of chess
| engines (look several moves ahead) is possible precisely because
| there's an objective function that can be used to rank results,
| i.e. which potential move is "better" and this is more often than
| not a unique characteristic of reinforcement learning. Is there
| even such a metric for LLMs?
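|
| The closest concrete reading I can come up with is best-of-N
| sampling ranked by some scorer (a reward model, unit tests, a
| verifier...), roughly as below, but that's my guess, not something
| the post spells out:
|
|     import random
|
|     def best_of_n(generate, score, prompt, n=16):
|         # Spend inference compute on n candidate answers and
|         # keep the one a scoring function ranks highest; the
|         # scorer plays the role of the engine's evaluation.
|         return max((generate(prompt) for _ in range(n)),
|                    key=score)
|
|     # Stub usage with placeholder generate/score functions:
|     print(best_of_n(
|         generate=lambda p: str(random.randint(0, 99)),
|         score=lambda s: -abs(int(s) - 42),  # pretend verifier
|         prompt="guess the answer"))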
| sgt101 wrote:
| yeah - I think that that's what they mean and I think that
| there isn't such a metric. I think people will try to do
| adversarial evaluation but my guess is that it will just tend
| to the mean prediction.
|
| The other thing is that LLM inference isn't cheap. The trade
| off between inference costs and training costs seems to be very
| application specific. I suppose that there are domains where
| accepting 100x or 1000x inference costs vs 10x training costs
| makes sense, maybe?
| qnleigh wrote:
| Thank you, I am also very confused on this point. I hope
| someone else can clarify.
|
| As a guess, could it mean that you would run the model forward
| a few tokens for each of its top predicted tokens, keep track
| of which branch is performing best against the training data,
| and then use that information somehow in training? But search
| is supposed to make things more efficient at inference time and
| this thought doesn't do that...
| amandasystems wrote:
| This feels a lot like generation 3 AI throwing out all the
| insights from gens 1 and 2 and then rediscovering them from first
| principles, but it's difficult to tell what this text is really
| about because it lumps together a lot of things into "search"
| without fully describing what that means more formally.
| PontifexMinimus wrote:
| Indeed. It's obvious what search means for a chess program --
| it's the future positions it looks at. But it's less obvious to
| me what it means for an LLM.
| YeGoblynQueenne wrote:
| >> She was called Leela Chess Zero -- 'zero' because she started
| knowing only the rules.
|
| That's a common framing but it's wrong. Leela (and all its
| friends) have another piece of chess-specific knowledge that is
| indispensable to their performance: a representation of the game
| of chess, a game-world model, as a game tree divided into plies:
| one ply for each player's turn. That game tree is what is
| searched by adversarial search algorithms, such as minimax or
| Monte Carlo Tree Search (MCTS; the choice of Leela, IIUC).
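|
| For the record, the selection rule at the heart of MCTS looks
| roughly like this (generic UCT; Leela's actual PUCT variant adds
| a policy prior from the net):
|
|     import math
|
|     def uct_select(children, n_total, c=1.4):
|         # children = [(visits, total_value), ...]; pick the
|         # child maximising average value plus an exploration
|         # bonus that shrinks as that child gets visited more.
|         def uct(visits, value):
|             if visits == 0:
|                 return float("inf")  # try unvisited moves first
|             bonus = c * math.sqrt(math.log(n_total) / visits)
|             return value / visits + bonus
|         return max(range(len(children)),
|                    key=lambda i: uct(*children[i]))
|
|     # Toy: three candidate moves with (visits, summed value).
|     print(uct_select([(10, 6.0), (3, 2.5), (0, 0.0)],
|                      n_total=13))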
|
| More precisely, modelling a game as a game tree applies to many
| games, not just chess, but the specific brand of game tree used
| in chess engines applies to chess and similar two-person, zero-
| sum, complete-information board games. I do like my jargon! For
| other kinds of games, different models, and different search
| algorithms are needed, e.g. see Poker and Libratus [1].
|
| Such a game tree, such a model of a game world, is currently
| impossible to do without if the target is superior performance.
| The article mentions no-search algorithms and
| briefly touches upon their main limitation (i.e. "why?").
|
| All that btw is my problem with the Bitter Lesson: it is
| conveniently selective with what it considers domain knowledge
| (i.e. a "model" in the sense of a theory). As others have noted,
| e.g. Rodney Brooks [2], Convolutional Neural Nets have dominated
| image classification thanks to the use of convolutional layers to
| establish positional invariance. That's a model of machine vision
| invented by a human, alright, just as a game-tree is a model of a
| game invented by a human, and everything else anyone has ever
| done in AI and machine learning is the same: a human comes up
| with a model, of a world, of an environment, of a domain, of a
| process, then a computer calculates using that model, and
| sometimes even outperforms humans (as in chess, Go, and friends)
| or at the very least achieves results that humans cannot match
| with hand-crafted solutions.
|
| _That_ is a lesson to learn (with all due respect to Rich
| Sutton). Human model + machine computation has solved every hard
| problem in AI in the last 80 years. And we have no idea how to do
| anything even slightly different.
|
| ____________________
|
| [1] https://en.wikipedia.org/wiki/Libratus
|
| [2] https://rodneybrooks.com/a-better-lesson/
| nojvek wrote:
| We haven't seen algorithms that build world models by
| observing. We've seen hints of it but nothing human like.
|
| It will come eventually. We live in exciting times.
| schlipity wrote:
| I don't run javascript by default using NoScript, and something
| amusing happened on this website because of it.
|
| The link for the site points to a notion.site address, but
| attempting to go to this address without javascript enabled (for
| that domain) forces a redirect to a notion.so domain. Attempting
| to visit just the basic notion.site address also does this same
| redirection.
|
| What this ends up causing is that I don't have an easy way to use
| NoScript to temporarily turn on javascript for the notion.site
| domain, because it never loads. So much for reading this article.
| ajnin wrote:
| OT but this website completely breaks arrow and page up/down
| scrolling, as well as alt+arrow navigation. Only mouse scrolling
| works for me (I'm using Firefox). Can't websites stop messing
| with basic browser functionality for no valid reason at all?
| awinter-py wrote:
| just came here to upvote the alphago / MCTS comments
| kunalgupta wrote:
| This is one of my favorite reads in a while
| Hugsun wrote:
| A big problem with the conclusions of this article is the
| assumptions around possible extrapolations.
|
| We don't know if a meaningfully superintelligent entity can
| exist. We don't understand the ingredients of intelligence that
| well, and it's hard to say how far the quality of these
| ingredients can be improved, to improve intelligence. For
| example, an entity with perfect pattern recognition ability
| might be superintelligent, or just a little smarter than Terence
| Tao. We don't know how useful it is to be better at pattern
| recognition to an arbitrary degree.
|
| A common theory is that the ability to model processes, like the
| behavior of the external world, is indicative of intelligence. I
| think it's true. We also don't know the limitations of this
| modeling. We can simulate the world in our minds to a degree. The
| abstractions we use make the simulation more efficient, but less
| accurate. By this theory, to be superintelligent, an entity would
| have to simulate the world faster with similar accuracy, and/or
| use more accurate abstractions.
|
| We don't know how much more accurate they can be per unit of
| computation. Maybe you have to quadruple the complexity of the
| abstraction, to double the accuracy of the computation, and human
| minds use a decent compromise that is infeasible to improve by a
| large margin. Maybe generating human level ideas faster isn't
| going to help because we are limited by experimental data, not by
| the ideas we can generate from it. We can't safely assume that
| any of this can be improved to an arbitrary degree.
|
| We also don't know if AI research would benefit much from smarter
| AI researchers. Compute has seemed to be the limiting factor at
| almost all points up to now. So the superintelligence would have
| to help us improve compute faster than we can. It might, but it
| also might not.
|
| This article reminds me of the ideas around the singularity, by
| placing too much weight on the belief that any trendline can be
| extended forever.
|
| It is otherwise pretty interesting, and I'm excitedly watching
| the 'LLM + search' space.
| TZubiri wrote:
| "In 2019, a team of researchers built a cracked chess computer.
| She was called Leela Chess Zero -- 'zero' because she started
| knowing only the rules. She learned by playing against herself
| billions of times"
|
| This is a gross historical misunderstanding or misrepresentation.
|
| Google accomplished this feat. Then the open-source community
| and academics reverse-engineered/duplicated the study.
| RA_Fisher wrote:
| Learning tech improves on search tech (and makes it obsolete),
| because search is about distance minimization not integration of
| information.
___________________________________________________________________
(page generated 2024-06-15 23:01 UTC)