[HN Gopher] AI Search: The Bitter-Er Lesson
       ___________________________________________________________________
        
       AI Search: The Bitter-Er Lesson
        
       Author : dwighttk
       Score  : 292 points
       Date   : 2024-06-14 18:47 UTC (1 days ago)
        
 (HTM) web link (yellow-apartment-148.notion.site)
 (TXT) w3m dump (yellow-apartment-148.notion.site)
        
       | johnthewise wrote:
        | What happened to all the chatter about Q*? I remember reading
        | about this train/test time trade-off back then; does anyone have
        | a good list of recent papers/blogs about it? What is holding this
        | back, or is OpenAI just running some model 10x longer to estimate
        | what they would get if they trained with 10x the compute?
       | 
       | This tweet is relevant:
       | https://x.com/polynoamial/status/1676971503261454340
        
       | Kronopath wrote:
        | Anything that allows AI to scale to superintelligence quicker is
       | going to run into AI alignment issues, since we don't really know
       | a foolproof way of controlling AI. With the AI of today, this
       | isn't too bad (the worst you get is stuff like AI confidently
       | making up fake facts), but with a superintelligence this could be
       | disastrous.
       | 
       | It's very irresponsible for this article to advocate and provide
       | a pathway to immediate superintelligence (regardless of whether
       | or not it actually works) without even discussing the question of
        | how you figure out what you're searching _for_, and how you'll
       | prevent that superintelligence from being evil.
        
         | nullc wrote:
         | I don't think your response is appropriate. Narrow domain
         | "superintelligence" is around us everywhere-- every PID
         | controller can drive a process to its target far beyond any
         | human capability.
         | 
          | The obvious way to incorporate good search is to have extremely
          | fast models that are used in the search's inner loop. Such
          | models would be inherently less general, and likely trained on
          | the specific problem or at least domain-- just for
          | performance's sake. The lesson in this article was that a tiny
          | superspecialized model inside a powerful traditional search
          | framework significantly outperformed a much larger, more
          | general model.
         | 
         | Use of explicit external search should make the optimization
         | system's behavior and objective more transparent and tractable
         | than just sampling the output of an auto-regressive model
         | alone. If nothing else you can at least look at the branches it
          | did and didn't explore. It's also a design that makes it easier
          | to bolt on various kinds of regularizers: code to steer it away
          | from parts of the search space you don't want it operating in.
         | 
          | The irony of all the AI scaremongering is that if there is ever
          | some evil AI with an LLM as an important part of its reasoning
          | process, it may well be evil because being evil is a big part
          | of the narrative it was trained on. :D
        
         | coldtea wrote:
          | Of course "superintelligence" is just a mythical creature at
          | the moment, with no known path to get there, or even a precise
          | definition of what it means - usually it's some hand-waving
          | about capabilities that sound magical, when IQ might very well
          | be subject to diminishing returns.
        
           | drdeca wrote:
           | Do you mean no way to get there within realistic computation
           | bounds? Because if we allow for arbitrarily high (but still
           | finite) amounts of compute, then some computable
           | approximation of AIXI should work fine.
        
             | coldtea wrote:
             | > _Do you mean no way to get there within realistic
             | computation bounds?_
             | 
             | I mean there's no well defined "there" either.
             | 
              | It's a hand-waved notion that by adding more intelligence
              | (itself not very well defined, but let's use IQ) you get to
              | something called "hyperintelligence", say IQ 1000 or IQ
              | 10000, that has what can only be described as magical
              | powers: it can convince any person to do anything, invent
              | things at will, achieve huge business success, predict
              | markets, and so on.
             | 
              | Whether intelligence is cumulative like that, and whether
              | having it gets you those powers, is far from clear (aside
              | from the successful high-IQ people, we know many people
              | with IQ 145+ who are not inventing things left and right,
              | or persuading people with greater charisma than the average
              | IQ 100 or 120 politician, but who are e.g. just sad MENSA
              | losers whose greatest achievement is their test scores).
             | 
             | > _Because if we allow for arbitrarily high (but still
             | finite) amounts of compute, then some computable
             | approximation of AIXI should work fine._
             | 
             | I doubt that too. The limit for LLMs for example is more
             | human produced training data (a hard limit) than compute.
        
               | drdeca wrote:
               | > itself not very well defined, but let's use IQ
               | 
                | IQ has an issue that is inessential to the task at hand:
                | it is defined relative to a population distribution, so
                | it doesn't make sense for very large values (unless there
                | is a really large population with properties that real
                | populations don't have).
               | 
               | > I doubt that too. The limit for LLMs for example is
               | more human produced training data (a hard limit) than
               | compute.
               | 
               | Are you familiar with what AIXI is?
               | 
               | When I said "arbitrarily large", it wasn't for laziness
               | reasons that I didn't give an amount that is plausibly
               | achievable. AIXI is kind of goofy. The full version of
               | AIXI is uncomputable (it uses a halting oracle), which is
               | why I referred to the computable approximations to it.
               | 
               | AIXI doesn't exactly need you to give it a training set,
               | just put it in an environment where you give it a way to
               | select actions, and give it a sensory input signal, and a
               | reward signal.
               | 
               | Then, assuming that the environment it is in is
               | computable (which, recall, AIXI itself is not), its long-
               | run behavior will maximize the expected (time discounted)
               | future reward signal.
               | 
               | There's a sense in which it is asymptotically optimal
               | across computable environments (... though some have
               | argued that this sense relies on a distribution over
               | environments based on the enumeration of computable
               | functions, and that this might make this property kinda
               | trivial. Still, I'm fairly confident that it would be
               | quite effective. I think this triviality issue is mostly
               | a difficulty of having the right definition.)
               | 
               | (Though, if it was possible to implement practically, you
               | would want to make darn sure that the most effective way
               | for it to make its reward signal high would be for it to
               | do good things and not either bad things or to crack open
               | whatever system is setting the reward signal in order for
               | it to set it itself.)
               | 
               | (How it works: AIXI basically enumerates through all
               | possible computable environments, assigning initial
               | probability to each according to the length of the
               | program, and updating the probabilities based on the
               | probability of that environment providing it with the
               | sequence of perceptions and reward signals it has
               | received so far when the agent takes the sequence of
               | actions it has taken so far. It evaluates the expected
               | values of discounted future reward of different
               | combinations of future actions based on its current
               | assigned probability of each of the environments under
               | consideration, and selects its next action to maximize
               | this. I think the maximum length of programs that it
               | considers as possible environments increases over time or
               | something, so that it doesn't have to consider infinitely
               | many at any particular step.)
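                | 
                | (As a hedged formalization of that description -- this is
                | my paraphrase of Hutter's definition, not something from
                | the article -- the action rule over horizon m, with
                | monotone universal machine U and program length l(q), is
                | roughly:
                | 
                |     a_k := \arg\max_{a_k} \sum_{o_k r_k} \ldots
                |            \max_{a_m} \sum_{o_m r_m} (r_k + \ldots + r_m)
                |            \sum_{q : U(q, a_1..a_m) = o_1 r_1..o_m r_m}
                |            2^{-l(q)}
                | 
                | i.e. weight every program q that reproduces the history
                | so far by 2^{-l(q)}, and pick the action that maximizes
                | the expected summed reward under that mixture.)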
        
         | aidan_mclau wrote:
         | Hey! Essay author here.
         | 
         | >The cool thing about using modern LLMs as an eval/policy model
         | is that their RLHF propagates throughout the search.
         | 
         | >Moreover, if search techniques work on the token level
         | (likely), their thoughts are perfectly interpretable.
         | 
         | I suspect a search world is substantially more alignment-
         | friendly than a large model world. Let me know your thoughts!
        
           | Tepix wrote:
           | Your webpage is broken for me. The page appears briefly, then
           | there's a french error message telling me that an error
           | occured and i can retry.
           | 
           | Mobile Safari, phone set to french.
        
             | abid786 wrote:
             | I'm in the same situation (mobile Safari, French phone) but
             | if you use Chrome it works
        
       | mxwsn wrote:
       | The effectiveness of search goes hand-in-hand with quality of the
       | value function. But today, value functions are incredibly domain-
       | specific, and there is weak or no current evidence (as far as I
       | know) that we can make value functions that generalize well to
       | new domains. This article effectively makes a conceptual leap
       | from "chess has good value functions" to "we can make good value
       | functions that enable search for AI research". I mean yes, that'd
       | be wonderful - a holy grail - but can we really?
       | 
       | In the meantime, 1000x or 10000x inference time cost for running
       | an LLM gets you into pretty ridiculous cost territory.
        
         | dsjoerg wrote:
         | Self-evaluation might be good enough in some domains? Then the
         | AI is doing repeated self-evaluation, trying things out to find
         | a response that scores higher according to its self metric.
        
           | dullcrisp wrote:
           | Sorry but I have to ask: what makes you think this would be a
           | good idea?
        
             | skirmish wrote:
              | This will just lead to the evaluatee finding anomalies in
              | the evaluator and exploiting them for maximum gain. It has
              | happened many times already that an ML model controlling an
              | object in a physical-world simulator learned nothing but
              | how to exploit simulator bugs [1].
              | 
              | [1] https://boingboing.net/2018/11/12/local-optima-r-us.html
        
               | CooCooCaCha wrote:
                | That's a natural tendency of optimization algorithms.
        
           | Jensson wrote:
            | Being able to fix your errors and improve over time until
            | there are basically no errors is what humans do. So far, AI
            | models just corrupt knowledge rather than purify it the way
            | humanity did, except when scripted with a good value function
            | from a human, as with AlphaGo, where the value function is
            | winning games.
            | 
            | This is why you need to constantly babysit today's AI and
            | tell it to do steps and correct itself all the time: you are
            | much better at getting to pure knowledge than the AI is, and
            | it would quickly veer off into nonsense otherwise.
        
             | visarga wrote:
             | > all AI models just corrupt knowledge they don't purify
             | knowledge like humanity
             | 
              | You've got to take a step back and look at LLMs like ChatGPT.
             | With 180 million users and assuming 10,000 tokens per user
             | per month, that's 1.8 trillion interactive tokens.
             | 
             | LLMs are given tasks, generate responses, and humans use
             | those responses to achieve their goals. This process
             | repeats over time, providing feedback to the LLM. This can
             | scale to billions of iterations per month.
             | 
             | The fascinating part is that LLMs encounter a vast
             | diversity of people and tasks, receiving supporting
             | materials, private documents, and both implicit and
             | explicit feedback. Occasionally, they even get real-world
             | feedback when users return to iterate on previous
             | interactions.
             | 
              | Taking the role of assistant, LLMs are primed to learn from
             | the outcomes of their actions, scaling across many people.
             | Thus they can learn from our collective feedback signals
             | over time.
             | 
              | Yes, that relies heavily on humans in the loop, not just
              | the real world in the loop, but humans are also dependent
              | on culture and society; I see no need for AI to be able to
              | do it without society. I actually think that AGI will be a
              | collective/network of humans and AI agents, and this
              | perspective fits right in. AI will be the knowledge and
              | experience flywheel of humanity.
        
               | seadan83 wrote:
               | > This process repeats over time, providing feedback to
               | the LLM
               | 
               | To what extent do you know this to be true? Can you
               | describe the mechanism that is used?
               | 
                | I would contrast your statement with cases where ChatGPT
                | generates something, I read it, note various incorrect
                | things, and then walk away. Further, there are cases
                | where the human does not realize there are errors. In
                | both cases I'm not aware of any kind of feedback loop
                | that would even be possible - I never told the LLM it was
                | wrong. Nor should the LLM assume it was wrong because I
                | run more queries. Thus, there is no signal back that the
                | answers were wrong.
               | 
               | Hence, where do you see the feedback loop existing?
        
               | visarga wrote:
               | > To what extent do you know this to be true? Can you
               | describe the mechanism that is used?
               | 
                | For example, a developer working on a project will
                | iterate many times; some code generated by the AI might
                | produce errors, and they will discuss that with the model
                | to fix the code. This way the model gets not just one-
                | round interactions, but multi-round interactions with
                | feedback.
               | 
               | > I read it and note various incorrect things and then
               | walk away.
               | 
               | I think the general pattern will be people sticking with
               | the task longer when it fails, trying to solve it with
               | persistence. This is all aggregated over a huge number of
               | sessions and millions of users.
               | 
               | In order to protect privacy we could only train
               | preference models from this feedback data, and then fine-
               | tune the base model without using the sensitive
               | interaction logs directly. The model would learn a
               | preference for how to act in specific contexts, but not
               | remember the specifics.
        
               | seadan83 wrote:
               | > I think the general pattern will be people sticking
               | with the task longer when it fails, trying to solve it
               | with persistence.
               | 
               | This is where I quibble. Accurately detecting someone is
               | actually still on the same task (indicating they are not
               | satisfied) is perhaps as challenging as generating any
               | answer to begin with.
               | 
               | That is why I also mentioned when people don't know the
               | result was incorrect. That'll potentially drive a strong
               | "this answer was correct" signal.
               | 
               | So, during development of a tool, I can envision that
               | feedback loop. But something simply presented to
               | millions, without a way to determine false negatives nor
               | false positives-- how exactly does that feedback loop
               | work?
               | 
                | _edit_ We might be talking past each other. I did not
                | quite read that "discuss with AI" part carefully. I was
                | picturing something like Copilot or ChatGPT, where it is
                | pretty much: 'here is your answer'.
                | 
                | Even with an interactive AI, how do you account for false
                | positives (the human accepts a wrong answer), or for when
                | the human simply gives up (which also looks like a
                | success)? If we told the AI every time it was wrong or
                | right - could that scale to the extent it would actually
                | train a model?
        
               | HarHarVeryFunny wrote:
               | I think AGI is going to have to learn itself on-the-job,
               | same as humans. Trying to collect human traces of on-the-
               | job training/experience and pre-training an AGI on them
               | seems doomed to failure since the human trace is grounded
               | in the current state (of knowledge/experience) of the
               | human's mind, but what the AGI needs is updates relative
                | to its own state. In any case, for a job like (human-
               | level) software developer AGI is going to need human-
               | level runtime reasoning and learning (>> in-context learn
               | by example), even if it were possible to "book pre-train"
               | it rather than on-the-job train it.
               | 
               | Outside of repetitive genres like CRUD-apps, most
               | software projects are significantly unique, even if they
               | re-use learnt developer skills - it's like Chollet's ARC
               | test on mega-steroids, with dozens/hundreds of partial-
               | solution design techniques, and solutions that require a
               | hierarchy of dozens/hundreds of these partial-solutions
               | (cf Chollet core skills, applied to software) to be
               | arranged into a solution in a process of iterative
               | refinement.
               | 
               | There's a reason senior software developers are paid a
               | lot - it's not just economically valuable, it's also one
               | of the more challenging cognitive skills that humans are
               | capable of.
        
             | ben_w wrote:
             | Kinda, but also not entirely.
             | 
             | One of the things OpenAI did to improve performance was to
             | train an AI to determine how a human would rate an output,
             | and use _that_ to train the LLM itself. (Kinda like a GAN,
              | now that I think about it).
             | 
             | https://forum.effectivealtruism.org/posts/5mADSy8tNwtsmT3KG
             | /...
             | 
             | But this process has probably gone as far as it can go, at
             | least with current architectures for the parts, as per
             | Amdahl's law.
        
           | jgalt212 wrote:
           | > Self-evaluation might be good enough in some domains?
           | 
           | This works perfectly in games. e.g. Alpha Zero. In other
           | domains, not so much.
        
             | coffeebeqn wrote:
              | Games are closed systems. There are no unknowns in the rule
             | set or world state because the game wouldn't work if there
             | were. No unknown unknowns. Compare to physics or biology
             | where we have no idea if we know 1% or 90% of the rules at
             | this point.
        
               | jgalt212 wrote:
               | self-evaluation would still work great even where there
               | are probabilistic and changing rule sets. The linchpin of
               | the whole operation is automated loss function
               | evaluation, not a set of known and deterministic rules.
               | Once you have to pay and employ humans to compute loss
               | functions, the scale falls apart.
        
         | cowpig wrote:
         | > The effectiveness of search goes hand-in-hand with quality of
         | the value function. But today, value functions are incredibly
         | domain-specific, and there is weak or no current evidence (as
         | far as I know) that we can make value functions that generalize
         | well to new domains.
         | 
         | Do you believe that there will be a "general AI" breakthrough?
         | I feel as though you have expressed the reason I am so
         | skeptical of all these AI researchers who believe we are on the
         | cusp of it (what "general AI" means exactly never seems to be
         | very well-defined)
        
           | mxwsn wrote:
           | I think capitalistic pressures favor narrow superhuman AI
           | over general AI. I wrote on this two years ago:
           | https://argmax.blog/posts/agi-capitalism/
           | 
           | Since I wrote about this, I would say that OpenAI's
           | directional struggles are some confirmation of my hypothesis.
           | 
           | summary: I believe that AGI is possible but will take
           | multiple unknown breakthroughs on an unknown timeline, but
           | most likely requires long-term concerted effort with much
           | less immediate payoff than pursuing narrow superhuman AI,
            | such that serious efforts at AGI are not much incentivized
            | under capitalism.
        
             | shrimp_emoji wrote:
             | But I thought the history of capitalism is an invasion from
             | the future by an artificial intelligence that must assemble
             | itself entirely from its enemy's resources.
             | 
             | NB: I agree; I think AGI will first be achieved with
             | genetic engineering, which is a path of way lesser
             | resistance than using silicon hardware (which is probably a
             | century plus off at the minimum from being powerful enough
             | to emulate a human brain).
        
         | HarHarVeryFunny wrote:
         | Yeah, Stockfish is probably evaluating many millions of
         | positions when looking 40-ply ahead, even with the limited
         | number of legal chess moves in a given position, and with an
          | easy criterion for heavy early pruning (once a branch becomes
         | losing, not much point continuing it). I can't imagine the cost
         | of evaluating millions of LLM continuations, just to select the
         | optimal one!
         | 
          | Where tree search might make more sense applied to LLMs is for
          | coarser-grained reasoning where the branching isn't based
         | on alternate word continuations but on alternate what-if lines
         | of thought, but even then it seems costs could easily become
         | prohibitive, both for generation and evaluation/pruning, and
         | using such a biased approach seems as much to fly in the face
         | of the bitter lesson as be suggested by it.
        
           | mxwsn wrote:
           | Yes absolutely and well put - a strong property of chess is
           | that next states are fast and easy to enumerate, which makes
           | search particularly easy and strong, while next states are
           | much slower, harder to define, and more expensive to
           | enumerate with an LLM
        
             | typon wrote:
             | The cost of the LLM isn't the only or even the most
             | important cost that matters. Take the example of automating
             | AI research: evaluating moves effectively means inventing a
             | new architecture or modifying an existing one, launching a
             | training run and evaluating the new model on some suite of
             | benchmarks. The ASI has to do this in a loop, gather
             | feedback and update its priors - what people refer to as
             | "Grad student descent". The cost of running each train-eval
             | iteration during your search is going to be significantly
             | more than generating the code for the next model.
        
               | HarHarVeryFunny wrote:
               | You're talking about applying tree search as a form of
               | network architecture search (NAS), which is different
               | from applying it to LLM output sampling.
               | 
               | Automated NAS has been tried for (highly constrained)
               | image classifier design, before simpler designs like
               | ResNets won the day. Doing this for billion parameter
               | sized models would certainly seem to be prohibitively
               | expensive.
        
               | typon wrote:
               | I'm not following. How do you propose search is performed
               | by the ASI designed for "AI Research"? (as proposed by
               | the article)
        
               | HarHarVeryFunny wrote:
               | Fair enough - he discusses GPT-4 search halfway down the
               | article, but by the end is discussing self-improving AI.
               | 
               | Certainly compute to test ideas (at scale) is the
               | limiting factor for LLM developments (says Sholto @
                | Google), but if we're talking about moving beyond LLMs, not
               | just tweaking them, then it seems we need more than
               | architecture search anyways.
        
               | therobots927 wrote:
               | Well people certainly are good at finding new ways to
               | consume compute power. Whether it's mining bitcoins or
               | training a million AI models at once to generate a "meta
               | model" that we _think_ could achieve escape velocity.
               | What happens when it doesn't? And Sam Altman and the
               | author want to get the government to pay for this? Am I
               | reading this right?
        
           | byteknight wrote:
           | Isn't evaluating against different effective "experts" within
           | the model effectively what MoE [1] does?
           | 
           | > Mixture of experts (MoE) is a machine learning technique
           | where multiple expert networks (learners) are used to divide
           | a problem space into homogeneous regions.[1] It differs from
           | ensemble techniques in that for MoE, typically only one or a
           | few expert models are run for each input, whereas in ensemble
           | techniques, all models are run on every input.
           | 
           | [1] https://en.wikipedia.org/wiki/Mixture_of_experts
        
             | HarHarVeryFunny wrote:
             | No - MoE is just a way to add more parameters to a model
             | without increasing the cost (number of FLOPs) of running
             | it.
             | 
             | The way MoE does this is by having multiple alternate
             | parallel paths through some parts of the model, together
             | with a routing component that decides which path (one only)
             | to send each token through. These paths are the "experts",
             | but the name doesn't really correspond to any intuitive
             | notion of expert. So, rather than having 1 path with N
             | parameters, you have M paths (experts) each with N
             | parameters, but each token only goes through one of them,
             | so number of FLOPs is unchanged.
             | 
             | With tree search, whether for a game like Chess or
             | potentially LLMs, you are growing a "tree" of all possible
             | alternate branching continuations of the game (sentence),
             | and keeping the number of these branches under control by
             | evaluating each branch (= sequence of moves) to see if it
             | is worth continuing to grow, and if not discarding it
             | ("pruning" it off the tree).
             | 
             | With Chess, pruning is easy since you just need to look at
             | the board position at the tip of the branch and decide if
             | it's a good enough position to continue playing from
             | (extending the branch). With an LLM each branch would
             | represent an alternate continuation of the input prompt,
             | and to decide whether to prune it or not you'd have to pass
             | the input + branch to another LLM and have it decide if it
             | looked promising or not (easier said than done!).
             | 
             | So, MoE is just a way to cap the cost of running a model,
             | while tree search is a way to explore alternate
             | continuations and decide which ones to discard, and which
             | ones to explore (evaluate) further.
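              | 
              | To make the MoE half concrete, a toy top-1 router might
              | look roughly like this (a minimal numpy sketch of the idea,
              | not any particular model's implementation; gate_w and
              | experts are made-up names):
              | 
              |     import numpy as np
              | 
              |     def moe_layer(x, gate_w, experts):
              |         """x: (d,) token vector; gate_w: (d, M) router
              |         weights; experts: list of M functions (d,) -> (d,)."""
              |         logits = x @ gate_w        # one score per expert
              |         probs = np.exp(logits - logits.max())
              |         probs /= probs.sum()       # softmax over experts
              |         k = int(np.argmax(probs))  # top-1 routing
              |         # only the chosen expert's FLOPs are spent
              |         return probs[k] * experts[k](x)
              | 
              | However many experts you add, each token still pays for
              | only one of them, which is the cost-capping point above.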
        
               | PartiallyTyped wrote:
               | How does MoE choose an expert?
               | 
                | From the outside, and if we squint a bit, this looks a
                | lot like an inverted attention mechanism where the token
               | attends to the experts.
        
               | telotortium wrote:
               | Usually there's a small neural network that makes the
               | choice for each token in an LLM.
        
               | HarHarVeryFunny wrote:
               | I don't know the details, but there are a variety of
               | routing mechanisms that have been tried. One goal is to
               | load balance tokens among the experts so that each
               | expert's parameters are equally utilized, which it seems
               | must sometimes conflict with wanting to route to an
               | expert based on the token itself.
        
               | magicalhippo wrote:
               | From what I can gather it depends, but could be a simple
               | Softmax-based layer[1] or just argmax[2].
               | 
               | There was also a recent post[3] about a model where they
               | used a cross-attention layer to let the expert selection
               | be more context aware.
               | 
               | [1]: https://arxiv.org/abs/1701.06538
               | 
               | [2]: https://arxiv.org/abs/2208.02813
               | 
               | [3]: https://news.ycombinator.com/item?id=40675577
        
           | stevage wrote:
           | Pruning isn't quite as easy as you make it sound. There are
           | lots of famous examples where chess engines misevaluate a
           | position because they prune out apparently losing moves that
           | are actually winning.
           | 
           | Eg https://youtu.be/TtJeE0Th7rk?si=KVAZufm8QnSW8zQo
        
           | vlovich123 wrote:
           | > and with an easy criteria for heavy early pruning (once a
           | branch becomes losing, not much point continuing it)
           | 
           | This confuses me. Positions that seem like they could be
           | losing (but haven't lost yet) could become winning if you
           | search deep enough.
        
             | HarHarVeryFunny wrote:
             | Yes, and even a genuinely (with perfect play) losing
              | position can win if it is sharp enough and causes your
             | opponent to make a mistake! There's also just the relative
             | strength of one branch vs another - have to prune some if
             | there are too many.
             | 
             | I was just trying to give the flavor of it.
        
               | eru wrote:
               | > [E]ven a genuinely (with perfect play) losing position
               | can win if it sharp enough and causes your opponent to
               | make a mistake!
               | 
               | Chess engines typically assume that the opponent plays to
               | the best of their abilities, don't they?
        
               | slyall wrote:
               | The Contempt Factor is used by engines sometimes.
               | 
               | "The Contempt Factor reflects the estimated
               | superiority/inferiority of the program over its opponent.
               | The Contempt factor is assigned as draw score to avoid
               | (early) draws against apparently weaker opponents, or to
               | prefer draws versus stronger opponents otherwise."
               | 
               | https://www.chessprogramming.org/Contempt_Factor
        
               | nullc wrote:
               | Imagine that there was some non-constructive proof that
               | white would always win in perfect play. Would a well
               | constructed chess engine always resign as black? :P
        
           | AnimalMuppet wrote:
           | > Where tree search might make more sense applied to LLMs is
           | for more coarser grained reasoning where the branching isn't
           | based on alternate word continuations but on alternate what-
           | if lines of thought...
           | 
           | To do that, the LLM would have to have some notion of "lines
           | of thought". They don't. That is completely foreign to the
           | design of LLMs.
        
             | HarHarVeryFunny wrote:
             | Right - this isn't something that LLMs currently do. Adding
             | search would be a way to add reasoning. Think of it as part
             | of a reasoning agent - external scaffolding similar to tree
             | of thoughts.
        
         | CooCooCaCha wrote:
         | We humans learn our own value function.
         | 
         | If I get hungry for example, my brain will generate a plan to
         | satisfy that hunger. The search process and the evaluation
         | happen in the same place, my brain.
        
           | skulk wrote:
           | The "search" process for your brain structure took 13 billion
           | years and 20 orders of magnitude more computation than we
           | will ever harness.
        
             | CooCooCaCha wrote:
             | So what's your point? That we can't create AGI because it
             | took evolution a really long time?
        
               | wizzwizz4 wrote:
               | Creating a human-level intelligence artificially is easy:
               | just copy what happens in nature. We already have this
               | technology, and we call it IVF.
               | 
               | The idea that humans aren't the only way of producing
               | human-level intelligence is taken as a given in many
               | academic circles, but we don't really have any reason to
                | _believe_ that. It's an article of faith (as is its
               | converse - but the converse is at least in-principle
               | falsifiable).
        
               | CooCooCaCha wrote:
               | "Creating a human-level intelligence artificially is
               | easy: just copy what happens in nature. We already have
               | this technology, and we call it IVF."
               | 
               | What's the point of this statement? You know that IVF has
               | nothing to do with artificial intelligence (as in
               | intelligent machines). Did you just want to sound smart?
        
               | stale2002 wrote:
               | Of course it is related to the topic.
               | 
               | It is related because the goal of all of this is to
               | create human level intelligence or better.
               | 
               | And that is a probable way to do it, instead of these
               | other, less established methods that we don't know if
               | they will work or not.
        
               | sebzim4500 wrote:
               | Even if the best we ever do is something with the
                | intelligence and energy use of the human brain, that
                | would still be a massive (5 OOMs?) improvement on the
                | status quo.
               | 
               | You need to pay people, and they use a bunch of energy
               | commuting, living in air conditioned homes, etc. which
               | has nothing to do with powering the brain.
        
             | kragen wrote:
             | i'm surprised you think we will harness so little
             | computation. the universe's lifetime is many orders of
             | magnitude longer than 13 billion years, and especially the
             | 4.5 billion years of earth's own history, and the universe
             | is much larger than earth's biosphere, most of which
             | probably has not been exploring the space of possible
             | computations very efficiently
        
             | jujube3 wrote:
             | Neither the Earth nor life have been around for 13 billion
             | years.
        
             | HarHarVeryFunny wrote:
             | I don't think there's much in our brain of significance to
             | intelligence older than ~200M years.
        
               | Jensson wrote:
                | 200M years ago you had dinosaurs, and they were
                | significantly dumber than mammals.
               | 
               | 400M years ago you had fish and arthropods, even dumber
               | than dinosaurs.
               | 
               | Brain size grows as intelligence grows, the smarter you
               | are the more use you have for compute so the bigger your
               | brain gets. It took a really long time for intelligence
               | to develop enough that brains as big as mammals were
               | worth it.
        
               | HarHarVeryFunny wrote:
               | Big brain (intelligence) comes at a huge cost, and is
               | only useful if you are a generalist.
               | 
               | I'd assume that being a generalist drove intelligence. It
               | may have started with warm bloodedness and feathers/fur
               | and further boosted in mammals with milk production (&
               | similar caring for young by birds) - all features that
               | reduce dependence on specific environmental conditions
               | and therefore expose the species to more diverse
               | environments where intelligence becomes valuable.
        
         | fizx wrote:
         | I think we have ok generalized value functions (aka LLM
         | benchmarks), but we don't have cheap approximations to them,
         | which is what we'd need to be able to do tree search at
         | inference time. Chess works because material advantage is a
         | pretty good approximation to winning and is trivially
         | calculable.
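          | 
          | As a concrete illustration of how cheap that approximation is,
          | here's a minimal material-count evaluator (a sketch using the
          | usual 1/3/3/5/9 pawn-unit convention, reading the piece field
          | of a FEN string):
          | 
          |     PIECE_VALUES = {'P': 1, 'N': 3, 'B': 3, 'R': 5, 'Q': 9, 'K': 0}
          | 
          |     def material_eval(fen_pieces):
          |         """fen_pieces: FEN piece placement, e.g.
          |         'rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR'."""
          |         score = 0
          |         for ch in fen_pieces:
          |             if ch.upper() in PIECE_VALUES:
          |                 value = PIECE_VALUES[ch.upper()]
          |                 score += value if ch.isupper() else -value
          |         return score  # >0 favors White, <0 favors Black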
        
           | computerphage wrote:
           | Stockfish doesn't use material advantage as an approximation
           | to winning though. It uses a complex deep learning value
           | function that it evaluates many times.
        
             | alexvitkov wrote:
                | Still, the fact that there are obvious heuristics makes
                | that function easier to train and presumably means it
                | doesn't need an absurd number of weights.
        
               | bongodongobob wrote:
               | No, without assigning value to pieces, the heuristics are
                | definitely not obvious. You're talking about 20-year-old
               | chess engines or beginner projects.
        
               | alexvitkov wrote:
               | Everyone understands a queen is worth more than a pawn.
               | Even if you don't know the exact value of one piece
               | relative to another, the rough estimate "a queen is worth
               | five to ten pawns" is a lot better than not assigning
               | value at all. I highly doubt even 20 year old chess
               | engines or beginner projects value a queen and pawn the
               | same.
               | 
               | After that, just adding up the material on both sides,
               | without taking into account the position of the pieces at
               | all, is a heuristic that will correctly predict the
               | winning player on the vast majority of all possible board
               | positions.
        
               | navane wrote:
               | He agrees with you on the 20yr old engines and beginner
               | projects.
        
         | wrsh07 wrote:
         | All you need for a good value function is high quality
         | simulation of the task.
         | 
         | Some domains have better versions of this than others (eg
         | theorem provers in math precisely indicate when you've
         | succeeded)
         | 
          | Incidentally, Lean could add a search-like feature to help
         | human researchers, and this would advance ai progress on math
         | as well
        
       | fire_lake wrote:
       | I didn't understand this piece.
       | 
       | What do they mean by using LLMs with search? Is this simply RAG?
        
         | roca wrote:
          | They mean something like the minimax algorithm used in game
          | engines.
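          | 
          | For anyone unfamiliar, a bare-bones depth-limited minimax looks
          | roughly like this (a generic sketch; the game-specific
          | callbacks are placeholders, not anything from the article):
          | 
          |     def minimax(state, depth, maximizing, moves, play, evaluate):
          |         """moves(state) lists legal moves, play(state, m) returns
          |         the next state, evaluate(state) scores a position from
          |         the maximizing player's point of view."""
          |         legal = moves(state)
          |         if depth == 0 or not legal:
          |             return evaluate(state)
          |         if maximizing:
          |             return max(minimax(play(state, m), depth - 1, False,
          |                                moves, play, evaluate) for m in legal)
          |         return min(minimax(play(state, m), depth - 1, True,
          |                            moves, play, evaluate) for m in legal)
          | 
          | Real engines add alpha-beta pruning and a much better
          | evaluation function on top of this skeleton.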
        
         | Legend2440 wrote:
         | "Search" here means trying a bunch of possibilities and seeing
         | what works. Like how a sudoku solver or pathfinding algorithm
         | does search, not how a search engine does.
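          | 
          | That is, search in the classic AI sense. A minimal pathfinding-
          | style example (a generic sketch, nothing domain-specific;
          | neighbors(state) yields adjacent states):
          | 
          |     from collections import deque
          | 
          |     def bfs_path(start, goal, neighbors):
          |         """Breadth-first search: try states outward from start
          |         until goal is found; returns one shortest path or None."""
          |         frontier = deque([[start]])
          |         seen = {start}
          |         while frontier:
          |             path = frontier.popleft()
          |             if path[-1] == goal:
          |                 return path
          |             for nxt in neighbors(path[-1]):
          |                 if nxt not in seen:
          |                     seen.add(nxt)
          |                     frontier.append(path + [nxt])
          |         return None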
        
           | fire_lake wrote:
           | But the domain of "AI Research" is broad and imprecise - not
           | simple and discrete like chess game states. What is the type
           | of each point in the search space for AI Research?
        
             | moffkalast wrote:
             | Well if we knew how to implement it, then we'd already have
             | it eh?
        
               | fire_lake wrote:
               | In chess we know how to describe all possible board
               | states and the transitions (the next moves). We just
               | don't know which transition is the best to pick, hence
               | it's a well defined search problem.
               | 
               | With AI Research we don't even know the shape of the
               | states and transitions, or even if that's an appropriate
               | way to think about things.
        
         | tsaixingwei wrote:
         | Given the example of Pfizer in the article, I would tend to
         | agree with you that 'search' in this context means augmenting
         | GPT with RAG of domain specific knowledge.
        
         | rassibassi wrote:
         | In this context, RAG isn't what's being discussed. Instead, the
         | reference is to a process similar to monte carlo tree search,
         | such as that used in the AlphaGo algorithm.
         | 
         | Presently, a large language model (LLM) uses the same amount of
         | computing resources for both simple and complex problems, which
         | is seen as a drawback. Imagine if an LLM could adjust its
         | computational effort based on the complexity of the task.
         | During inference, it might then perform a sort of search across
         | the solution space. The "search" mentioned in the article means
          | just that: a method of dynamically managing computational
          | resources at test time, allowing for exploration of
         | the solution space before beginning to "predict the next
         | token."
         | 
          | At OpenAI, Noam Brown is working on this, giving AI the ability
          | to "ponder" (or "search"); see his Twitter post:
         | https://x.com/polynoamial/status/1676971503261454340
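          | 
          | In code, "spend more compute on harder problems" might look
          | roughly like a small beam search over whole candidate
          | continuations (a hedged sketch: propose(text, n) stands in for
          | sampling n continuations from some model, and score(text) for a
          | learned value/reward model -- neither is a real API):
          | 
          |     def search_then_answer(prompt, propose, score,
          |                            width=4, samples=4, depth=3):
          |         """Keep the width best partial solutions, extend each
          |         with sampled continuations, return the best candidate.
          |         More depth/samples = more test-time compute."""
          |         beam = [prompt]
          |         for _ in range(depth):
          |             candidates = [c for text in beam
          |                             for c in propose(text, samples)]
          |             beam = sorted(candidates, key=score,
          |                           reverse=True)[:width]
          |         return max(beam, key=score)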
        
       | timfsu wrote:
       | This is a fascinating idea - although I wish the definition of
       | search in the LLM context was expanded a bit more. What kind of
       | search capability strapped onto current-gen LLMs would give them
       | superpowers?
        
         | gwd wrote:
         | I think what may be confusing is that the author is using
         | "search" here in the AI sense, not in the Google sense: that
         | is, having an internal simulator of possible actions and
         | possible reactions, like Stockfish's chess move search (if I do
         | A, it could do B C or D; if it does B, I can do E F or G, etc).
         | 
         | So think about the restrictions current LLMs have:
         | 
         | * They can't sit and think about an answer; they can "think out
         | loud", but they have to start talking, and they can't go back
         | and say, "No wait, that's wrong, let's start again."
         | 
         | * If they're composing something, they can't really go back and
         | revise what they've written
         | 
         | * Sometimes they can look up reference material, but they can't
         | actually sit and digest it; they're expected to skim it and
         | then give an answer.
         | 
         | How would you perform under those circumstances? If someone
         | were to just come and ask you any question under the sun, and
         | you had to just start talking, without taking any time to think
         | about your answer, and without being able to say "OK wait, let
         | me go back"?
         | 
         | I don't know about you, but there's no way I would be able to
         | perform anywhere _close_ to what ChatGPT 4 is able to do.
          | People complain that ChatGPT 4 is a "bullshitter", but given
         | its constraints that's all you or I would be in the same
         | situation -- but it's already way, way better than I could ever
         | be.
         | 
         | Given its limitations, ChatGPT is _phenomenal_. So now imagine
          | what it could do if it _were_ given time to just "sit and
         | think"? To make a plan, to explore the possible solution space
         | the same way that Stockfish does? To take notes and revise and
         | research and come back and think some more, before having to
         | actually answer?
         | 
         | Reading this is honestly the first time in a while I've
         | believed that some sort of "AI foom" might be possible.
        
           | cbsmith wrote:
           | > They can't sit and think about an answer; they can "think
           | out loud", but they have to start talking, and they can't go
           | back and say, "No wait, that's wrong, let's start again."
           | 
           | I mean, technically, they could say that.
        
             | refulgentis wrote:
              | Llama 3 does; it's a funny design now if you also throw in
              | training to encourage CoT. Maybe more correct, but the
              | verbosity can be grating:
              | 
              | CoT, answer, "Wait! No, that's not right:", CoT...
        
           | fspeech wrote:
           | "How would you perform under those circumstances?" My son
           | would recommend Improv classes.
           | 
           | "Given its limitations, ChatGPT is phenomenal." But this
           | doesn't translate since it learned everything from data and
           | there is no data on "sit and think".
        
         | cgearhart wrote:
          | [1] applied AlphaZero-style search with LLMs to achieve
         | performance comparable to GPT-4 Turbo with a llama3-8B base
         | model. However, what's missing entirely from the paper (and the
         | subject article in this thread) is that tree search is
         | _massively_ computationally expensive. It works well when the
         | value function enables cutting out large portions of the search
         | space, but the fact that the LLM version was limited to only 8
         | rollouts (I think it was 800 for AlphaZero) implies to me that
         | the added complexity is not yet optimized or favorable for
         | LLMs.
         | 
         | [1] https://arxiv.org/abs/2406.07394
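          | 
          | For reference, the rollout loop being budgeted there (8 vs
          | ~800) is the standard MCTS/UCT skeleton, roughly as below (a
          | generic sketch; expand and rollout are placeholders, and with
          | an LLM each of those calls is itself a model invocation, which
          | is where the cost blows up):
          | 
          |     import math, random
          | 
          |     class Node:
          |         def __init__(self, state, parent=None):
          |             self.state, self.parent = state, parent
          |             self.children, self.visits, self.value = [], 0, 0.0
          | 
          |     def uct_search(root_state, expand, rollout,
          |                    n_rollouts=8, c=1.4):
          |         def ucb(parent, ch):
          |             exploit = ch.value / (ch.visits + 1e-9)
          |             explore = math.sqrt(math.log(parent.visits + 1)
          |                                 / (ch.visits + 1e-9))
          |             return exploit + c * explore
          |         root = Node(root_state)
          |         for _ in range(n_rollouts):
          |             node = root
          |             while node.children:            # 1. select by UCB
          |                 node = max(node.children,
          |                            key=lambda ch: ucb(node, ch))
          |             for s in expand(node.state):    # 2. expand
          |                 node.children.append(Node(s, parent=node))
          |             leaf = (random.choice(node.children)
          |                     if node.children else node)
          |             value = rollout(leaf.state)     # 3. evaluate
          |             while leaf is not None:         # 4. back up
          |                 leaf.visits += 1
          |                 leaf.value += value
          |                 leaf = leaf.parent
          |         best = max(root.children,
          |                    key=lambda ch: ch.visits, default=root)
          |         return best.state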
        
       | 1024core wrote:
       | The problem with adding "search" to a model is that the model has
       | already seen everything to be "search"ed in its training data.
       | There is nothing left.
       | 
       | Imagine if Leela (author's example) had been trained on every
       | chess board position out there (I know it's mathematically
       | impossible, but bear with me for a second). If Leela had been
       | trained on every board position, it may have whupped Stockfish.
       | So, adding "search" to Leela would have been pointless, since it
       | would have seen every board position out there.
       | 
       | Today's LLMs are trained on every word ever written on the 'net,
       | every word ever put down in a book, every word uttered in a video
       | on Youtube or a podcast.
        
         | yousif_123123 wrote:
         | Still, similar to when you have read 10 textbooks, if you are
         | answering a question and have access to the source material, it
         | can help you in your answer.
        
         | groby_b wrote:
         | You're omitting the somewhat relevant part of recall ability. I
         | can train a 50 parameter model on the entire internet, and
         | while it's seen it all, it won't be able to recall it. (You can
         | likely do the same thing with a 500B model for similar results,
         | though it's getting somewhat closer to decent recall)
         | 
         | The whole point of deep learning is that the model learns to
         | generalize. It's not to have a perfect storage engine with a
         | human language query frontend.
        
           | sebastos wrote:
           | Fully agree, although it's interesting to consider the
           | perspective that the entire LLM hype cycle is largely built
           | around the question "what if we punted on actual thinking and
           | instead just tried to memorize everything and then provide a
           | human language query frontend? Is that still useful?"
           | Arguably it is (sorta), and that's what is driving this
           | latest zeitgeist. Compute had quietly scaled in the
           | background while we were banging our heads against real
           | thinking, until one day we looked up and we still didn't have
           | a thinking machine, but it was now approximately possible to
           | just do the stupid thing and store "all the text on the
           | internet" in a lookup table, where the keys are prompts.
           | That's... the opposite of thinking, really, but still
           | sometimes useful!
           | 
           | Although to be clear I think actual reasoning systems are
           | what we should be trying to create, and this LLM stuff seems
           | like a cul-de-sac on that journey.
        
             | skydhash wrote:
             | The thing is that current chat tools forgo the source
             | material. A proper set of curated keywords can give you a
              | less computationally intensive search.
        
         | salamo wrote:
         | If the game was small enough to memorize, like tic tac toe, you
         | could definitely train a neural net to 100% accuracy. I've done
         | it, it works.
         | 
         | The problem is that for most of the interesting problems out
         | there, it isn't possible to see every possibility let alone
         | memorize it.
        
         | kragen wrote:
         | you are making the mistake of thinking that 'search' means
         | database search, like google or sqlite, but 'search' in the ai
         | context means tree search, like a* or tabu search. the spaces
         | that tree search searches are things like all possible chess
         | games, not all chess games ever played, which is a smaller
         | space by a factor much greater than the number of atoms in the
         | universe
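          | 
          | for example, a bare-bones a* looks like this (a generic sketch,
          | nothing chess-specific; neighbors(s) yields (next_state,
          | step_cost) pairs and heuristic(s) must not overestimate the
          | remaining cost):
          | 
          |     import heapq
          |     from itertools import count
          | 
          |     def a_star(start, goal, neighbors, heuristic):
          |         tie = count()   # tie-breaker so states never compare
          |         frontier = [(heuristic(start), next(tie), 0,
          |                      start, [start])]
          |         best = {start: 0}   # cheapest known cost per state
          |         while frontier:
          |             _, _, cost, state, path = heapq.heappop(frontier)
          |             if state == goal:
          |                 return path, cost
          |             for nxt, step in neighbors(state):
          |                 g = cost + step
          |                 if g < best.get(nxt, float('inf')):
          |                     best[nxt] = g
          |                     heapq.heappush(frontier,
          |                                    (g + heuristic(nxt),
          |                                     next(tie), g, nxt,
          |                                     path + [nxt]))
          |         return None, float('inf')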
        
       | groby_b wrote:
       | While I respect the power of intuition - this may well be a great
       | path - it's worth keeping in mind that this is currently just
        | that: a hunch. Leela got crushed by AI-directed search, so what
        | if we could wave a wand and hand all AIs search. Somehow.
        | Magically. Which would then somehow magically trounce current
        | LLMs at domain-specific tasks.
       | 
       | There's a kernel of truth in there. See the papers on better
        | results via Monte Carlo tree search (e.g. [1]). See mixture-of-
       | LoRA/LoRA-swarm approaches. (I swear there's a startup using the
       | approach of tons of domain-specific LoRAs, but my brain's not
       | yielding the name)
       | 
       | Augmenting LLM capabilities via _some_ sort of cheaper and more
       | reliable exploration is likely a valid path. It's not GPT-8 next
       | year, though.
       | 
       | [1] https://arxiv.org/pdf/2309.03224
        
         | memothon wrote:
         | Did you happen to remember the domain-specific LoRA startup?
        
           | hansonw wrote:
           | https://news.ycombinator.com/item?id=40675577
        
       | hartator wrote:
        | Isn't the "search" space infinite though, and isn't it impossible
        | to quantify "success"?
        | 
        | You can't just give LLMs infinite compute time and expect them to
        | find answers for something like "cure cancer". Even chess, where
        | moves seem finite and success quantifiable, is effectively an
        | infinite problem, and the best engines take "shortcuts" in their
        | "thinking". It's impossible to do for real-world problems.
        
         | cpill wrote:
          | The recent episode of Machine Learning Street Talk on control
         | theory for LLMs sounds like it's thinking in this direction.
         | Say you have 100k agents searching through research papers, and
         | then trying every combination of them, 100k^2, to see if there
         | is any synergy of ideas, and you keep doing this for all the
         | successful combos... some of these might give the researchers
          | some good ideas to try out. I can see it happening if they can
          | fine-tune a model that becomes good at idea synergy. But then
          | again, real creativity is hard.
        
           | Mehvix wrote:
           | How would one finetune for "idea synergy"?
        
       | salamo wrote:
       | Search is almost certainly necessary, and I think the trillion
       | dollar cluster maximalists probably need to talk to people who
       | created superhuman chess engines that now can run on smartphones.
       | Because one possibility is that someone figures out how to beat
        | your trillion-dollar cluster with a million-dollar cluster, or
        | 500k million-dollar clusters.
       | 
       | On chess specifically, my takeaway is that the branching factor
       | in chess never gets so high that a breadth-first approach is
       | unworkable. The median branching factor (i.e. the number of legal
       | moves) maxes out at around 40 but generally stays near 30. The
       | most moves I have ever found in any position from a real game was
       | 147, but at that point almost every move is checkmate anyway.
       | 
       | Creating superhuman go engines was a challenge for a long time
       | because the branching factor is so much larger than chess.
       | 
       | Since MCTS is less thorough, it makes sense that a full search
       | could find a weakness and exploit it. To me, the question is
       | whether we can apply breadth-first approaches to larger games and
       | situations, and I think the answer is clearly no. Unlike chess,
       | the branching factor of real-world situations is orders of
       | magnitude larger.
       | 
       | But also unlike chess, which is highly chaotic (small decisions
       | matter a lot for future state), most small decisions _don't
       | matter_. If you're flying from NYC to LA, it matters a lot if you
       | drive or fly or walk. It mostly doesn't matter if you walk out
       | the door starting with your left foot or your right. It mostly
       | doesn't matter if you blink now or in two seconds.
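       | 
       | For anyone who wants to check those numbers, a quick empirical
       | sketch, assuming the python-chess package is installed (pip
       | install chess); random playouts are only a rough stand-in for
       | real games, so expect somewhat different numbers:
       | 
       |   import random
       |   import chess
       | 
       |   board = chess.Board()
       |   counts = []
       |   while not board.is_game_over():
       |       legal = list(board.legal_moves)
       |       counts.append(len(legal))          # branching factor here
       |       board.push(random.choice(legal))   # crude random playout
       | 
       |   counts.sort()
       |   print("median:", counts[len(counts) // 2], "max:", max(counts))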
        
         | cpill wrote:
         | I think the branching factor for LLMs is around 50k for the
         | number of next possible tokens.
        
           | refulgentis wrote:
           | 100%: for GPT-3 <= x < GPT-4o the vocabulary is 100,064
           | tokens; for x = GPT-4o it's 199,996. (My end-of-week
           | emergency was that the const Map storing them broke the
           | build, so these numbers happen to be top of mind.)
        
           | kippinitreal wrote:
           | I wonder if in an application you could branch on something
            | more abstract than tokens. While there might be 50k token
           | branches and 1k of reasonable likelihood, those actually
           | probably cluster into a few themes you could branch off of.
           | For example "he ordered a ..." [burger, hot dog, sandwich:
           | food] or [coke, coffee, water: drinks] or [tennis racket,
           | bowling ball, etc: goods].
        
           | Hugsun wrote:
           | My guess is that it's much lower. I'm having a hard time
            | finding an LLM output logit visualizer online, but IIRC,
           | around half of tokens are predicted with >90% confidence.
           | There are regularly more difficult tokens that need to be
           | predicted but the >1% probability tokens aren't so many,
           | probably around 10-20 in most cases.
           | 
           | This is of course based on the outputs of actual models that
           | are only so smart, so a tree search that considers all
            | possibly relevant ideas is going to have a larger number of
           | branches. Considering how many branches would be pruned to
           | maintain grammatical correctness, my guess is that the token-
           | level branching factor would be around 30. It could be up to
           | around 300, but I highly doubt that it's larger than that.
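            | 
            | One way to make that guess concrete is to define the
            | "effective" branching factor as the number of next tokens
            | above some probability threshold. A sketch with made-up
            | logits (the threshold and values are illustrative only):
            | 
            |   import numpy as np
            | 
            |   def effective_branching(logits, threshold=0.01):
            |       # count next tokens with probability above threshold
            |       logits = logits - logits.max()   # numerical stability
            |       probs = np.exp(logits) / np.exp(logits).sum()
            |       return int((probs > threshold).sum())
            | 
            |   rng = np.random.default_rng(0)
            |   logits = rng.normal(size=50_000)
            |   logits[:20] += 10.0   # pretend a few tokens dominate
            |   # prints a number near the ~20 boosted tokens, not 50,000
            |   print(effective_branching(logits))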
        
       | optimalsolver wrote:
       | Charlie Steiner pointed this out 5 years ago on Less Wrong:
       | 
       | >If you train GPT-3 on a bunch of medical textbooks and prompt it
       | to tell you a cure for Alzheimer's, it won't tell you a cure, it
       | will tell you what humans have said about curing Alzheimer's ...
       | It would just tell you a plausible story about a situation
       | related to the prompt about curing Alzheimer's, based on its
       | training data. Rather than a logical Oracle, this image-
       | captioning-esque scheme would be an intuitive Oracle, telling you
       | things that make sense based on associations already present
       | within the training set.
       | 
       | >What am I driving at here, by pointing out that curing
       | Alzheimer's is hard? It's that the designs above are missing
       | something, and what they're missing is search. I'm not saying
       | that getting a neural net to directly output your cure for
       | Alzheimer's is impossible. But it seems like it requires there to
       | already be a "cure for Alzheimer's" dimension in your learned
       | model. The more realistic way to find the cure for Alzheimer's,
       | if you don't already know it, is going to involve lots of logical
       | steps one after another, slowly moving through a logical space,
       | narrowing down the possibilities more and more, and eventually
       | finding something that fits the bill. In other words, solving a
       | search problem.
       | 
       | >So if your AI can tell you how to cure Alzheimer's, I think
       | either it's explicitly doing a search for how to cure Alzheimer's
       | (or worlds that match your verbal prompt the best, or whatever),
       | or it has some internal state that implicitly performs a search.
       | 
       | https://www.lesswrong.com/posts/EMZeJ7vpfeF4GrWwm/self-super...
        
         | lucb1e wrote:
         | Generalizing this (doing half a step away from GPT-specifics),
         | would it be true to say the following?
         | 
         | "If you train _your logic machine_ on a bunch of medical
         | textbooks and prompt it to tell you a cure for Alzheimer's, it
         | won't tell you a cure, it will tell you what those textbooks
         | have said about curing Alzheimer's."
         | 
         | Because I suspect not. GPT seems mostly limited to
         | regurgitating+remixing what it read, but other algorithms with
         | better logic could be able to essentially do a meta study: take
         | the results from all Alzheimer's experiments we've done and
         | narrow down the solution space beyond what humans have achieved
         | so far. A human may not have the headspace to incorporate all
         | relevant results at once, whereas a computer might.
         | 
         | Asking GPT to "think step by step" helps it, so clearly it has
         | some form of this necessary logic, and it also performs well at
         | "here's some data, transform it for me". It has limitations in
         | both how good its logic is and the window across which it can
         | do these transformations (but it can remember vastly more data
         | from training than from the input token window, so perhaps
         | that's a partial workaround). Since it does have both
         | capabilities, it does not seem insurmountable to extend it: I'm
         | not sure we can rule out that an evolution of GPT can find
         | an Alzheimer's cure within existing data, let alone a system even
         | more suited to this task (still far short of needing AGI)
         | 
         | This requires the data to contain the necessary building blocks
         | for a solution, but the quote seems to dismiss the option
         | altogether even if the data did contain all information (but
         | not yet the worked-out solution) for identifying a cure
        
       | jmugan wrote:
       | I believe in search, but it only works if you have an appropriate
       | search space. Chess has a well-defined space but the everyday
       | world does not. The trick is enabling an algorithm to learn its
       | own search space through active exploration and reading about our
       | world. I'm working on that.
        
         | jhawleypeters wrote:
         | Oh nice! The one thing that confused me about this article was
         | what search space the author envisioned adding to language
         | models.
        
         | kragen wrote:
         | that's interesting; are you building a sort of 'digital twin'
         | of the world it's explored, so that it can dream about
         | exploring it in ways that are too slow or dangerous to explore
         | in reality?
        
           | jmugan wrote:
           | The goal is to enable it to model the world at different
           | levels of abstraction based on the question it wants to
            | answer. You can model a car as an object that travels fast and
           | carries people, or you can model it down to the level of
           | engine parts. The system should be able to pick the level of
           | abstraction and put the right model together based on its
           | goals.
        
             | kragen wrote:
             | so then you can search over configurations of engine parts
             | to figure out how to rebuild the engine? i may be
             | misunderstanding what you're doing
        
               | jmugan wrote:
               | Yeah, you could. Or you could search for shapes of
               | different parts that would maximize the engine
               | efficiency. The goal is to simultaneously build a
               | representation space and a simulator so that anything
               | that could be represented could be simulated.
        
               | paraschopra wrote:
               | Have you written about this anywhere?
               | 
               | I'm also very interested in this.
               | 
               | I'm at the stage where I'm exploring how to represent
               | such a model/simulator.
               | 
                | The world isn't brittle, so representing it as code or a
                | graph probably won't work.
        
               | jmugan wrote:
               | Right. You have to represent it as something that can
                | write code/graphs. I say a little bit here:
                | https://www.jonathanmugan.com/WritingAndPress/presentations/...
        
       | jhawleypeters wrote:
       | I think I understand the game space that Leela and now Stockfish
       | search. I don't understand whether the author envisions LLMs
       | searching possibility spaces of
       | 
       |     1) written words,
       |     2) models of math / RL / materials science,
       |     3) some smaller, formalized space like the game space of chess,
       | 
       | all of the above, or something else. Did I miss where that was
       | clarified?
        
         | fspeech wrote:
         | He wants the search algorithm to be able to search for better
         | search algorithms, i.e. self-improving. That would eliminate
         | some of the narrower domains.
        
       | TheRoque wrote:
       | The whole premise of this article is to compare the chess state
       | of the art of 2019 with today, and then they start to talk about
       | LLMs. But chess is a board with 64 squares and 32 pieces; it's
       | literally nothing compared to the real physical world. So I don't
       | get how this is relevant.
        
         | dgoodell wrote:
         | That's a good point. Imagine if an LLM could only read, speak,
         | and hear at the same speed as a human. How long would training
         | a model take?
         | 
         | We can make them read digital media really quickly, but we
         | can't really accelerate their interactions with the physical
         | world.
        
       | stephc_int13 wrote:
       | The author is making a few leaps of faith in this article.
       | 
       | First, his example of the efficiency of ML+search for playing
       | chess is interesting but not proof that this strategy would be
       | applicable or efficient in the general domain.
       | 
       | Second, he is implying that some next iteration of ChatGPT will
       | reach AGI level, given enough scale and money. This should be
       | considered hypothetical until proven.
       | 
       | Overall, he should be more scientific and prudent.
        
       | 6510 wrote:
       | I've recently matured to the point where I see all applications
       | as made of two things: search and security. The rest is just
       | things added on top. If you can't find it, it isn't worth having.
        
       | brcmthrowaway wrote:
       | This strikes me as Lesswrong style pontificating.
        
       | dzonga wrote:
       | Slight step aside - do people at Notion realize their own custom
       | keyboard shortcuts break the habits built on the web?
       | 
       | cmd + p brings up their own custom dialog instead of printing
       | the page as one would expect.
        
         | sherburt3 wrote:
         | In VS Code cmd+p pulls up the file search dialog; I don't think
         | it's that crazy.
        
       | bob1029 wrote:
       | It seems there is a fundamental information theory aspect to this
       | that would probably save us all a lot of trouble if we would just
       | embrace it.
       | 
       | The #1 canary for me: Why does training an LLM require so much
       | data that we are concerned we might run out of it?
       | 
       | The clear lack of generalization and/or internal world modeling
       | is what is really in the way of a self-bootstrapping AGI/ASI. You
       | can certainly try to emulate a world model with clever prompting
       | (here's what you did last, here's your objective, etc.), but this
       | seems seriously deficient to me based upon my testing so far.
        
         | sdenton4 wrote:
         | In my experience, LLMs do a very poor job of generalizing. I
         | have also seen self supervised transformer methods usually fail
         | to generalize in my domain (which includes a lot of diversity
         | and domain shifts). For human language, you can paper over
         | failure to generalize by shoveling in more data. In other
         | domains, that may not be an option.
        
           | therobots927 wrote:
           | It's exactly what you would expect from what an LLM is. It
           | predicts the next word in a sequence very well. Is that how
           | our brains, or even a bird's brain, for that matter, approach
            | cognition? I don't think that's how any animal's brain works
           | at all, but that's just my opinion. A lot of this discussion
           | is speculation. We might as well all wait and see if AGI
           | shows up. I'm not holding my breath.
        
             | drdeca wrote:
             | Have you heard of predictive processing?
        
             | stevenhuang wrote:
             | Most of this is not speculation. It's informed from current
             | leading theories in neuroscience of how our brain is
             | thought to function.
             | 
             | See predictive coding and the free energy principle, which
             | states the brain continually models reality and tries to
             | minimize the prediction error.
             | 
             | https://en.m.wikipedia.org/wiki/Predictive_coding
        
               | therobots927 wrote:
               | At a certain high level I'm sure you can model the brain
               | that way. But we know humans are neuroplastic, and
               | through epigenetics it's possible that learning in an
               | individual's life span will pass to their offspring.
               | Which means human brains have been building internal
               | predictive models for billions of years over innumerable
               | individual lifespans. The idea that we're anywhere close
               | to replicating that with a neural net is completely
                | preposterous. And besides, my main point was that our
               | brains don't think one word at a time. I'm not sure how
               | that relates to predictive processing.
        
         | therobots927 wrote:
         | Couldn't agree more. For specific applications like drug
         | development where you have a constrained problem with fixed set
         | of variables and a well defined cost function I'm sure the
         | chess analogy will hold. But I think there a core elements of
         | cognition missing from chatGPT that aren't easily built.
        
       | zucker42 wrote:
       | If I had to bet money on it, researchers at top labs have already
       | tried applying search to existing models. The idea to do so is
       | pretty obvious. I don't think it's the one key insight to achieve
       | AGI as the author claims.
        
       | itissid wrote:
       | The problem is that the transitive closure of a chess move is a
       | chess move. The transitive closure of human knowledge and
       | theories for doing X is new theories never seen before, and no
       | value function can capture that -- unless you are also implying
       | theorem proving is included for correctness verification, which
       | is itself a very difficult and computationally expensive search
       | problem.
       | 
       | Also, I think this is instead a time to sit back and think about
       | what exactly we value in society: personal (human)
       | self-sufficiency (I also like to compare this AI to UBI) and thus
       | achievement. That only means human-in-the-loop AI that can help
       | us achieve it, specific to each individual, i.e. multi-attribute
       | value functions whose weights are learned and change over time.
       | 
       | Writing about AGI and defining it to do the "best" search while
       | not talking about what we want it to do *for us* is exactly
       | wrong-headed for these reasons.
        
       | skybrian wrote:
       | The article seems rather hand-wavy and over-confident about
       | predicting the future, but the idea seems worth trying.
       | 
       | "Search" is a generalization of "generate and test" and rejection
       | sampling. It's classic AI. Back before the dot-com era, I took an
       | intro to AI course and we learned about writing programs to do
       | searches in Prolog.
       | 
       | The speed depends on how long it takes to generate a candidate,
       | how long it takes to test it, and how many candidates you need to
       | try. If they are slow, it will be slow.
       | 
       | An example of "human in the loop" rejection sampling is when you
       | use an image generator and keep trying different prompts until
       | you get an image you like. But the loop is slow due to how long
       | it takes to generate a new image. If image generation were so
       | fast that it worked like Google Image search, then we'd really
       | have something.
       | 
       | Theorem proving and program fuzzing seem like good candidates for
       | combining search with LLM's, due to automated, fast, good
       | evaluation functions.
       | 
       | And it looks like Google has released a fuzzer [1] that can be
       | connected to whichever LLM's you like. Has anyone tried it?
       | 
       | [1] https://github.com/google/oss-fuzz-gen
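       | 
       | The generate-and-test loop itself is tiny; everything interesting
       | lives in the two callables. A sketch where `generate` and
       | `accept` are hypothetical placeholders (think: an LLM proposing a
       | patch, a fuzzer or test suite judging it), not the API of any
       | particular library:
       | 
       |   import random
       | 
       |   def rejection_search(generate, accept, budget=1000):
       |       # classic generate-and-test: sample candidates until one
       |       # passes the test or the budget runs out
       |       for _ in range(budget):
       |           candidate = generate()
       |           if accept(candidate):
       |               return candidate
       |       return None
       | 
       |   # Toy stand-ins: find an 8-digit string whose digits sum to 42.
       |   gen = lambda: "".join(random.choice("0123456789")
       |                         for _ in range(8))
       |   ok = lambda s: sum(map(int, s)) == 42
       |   print(rejection_search(gen, ok))
       | 
       | The speed argument falls straight out of this: total time is
       | (candidates tried) x (generation time + test time), which is why
       | fast evaluators like fuzzers and proof checkers are the natural
       | first targets.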
        
         | PartiallyTyped wrote:
         | Building onto this comment: Terence Tao, the famous
         | mathematician and a big proponent of computer-aided theorem
         | proving, believes ML will open new avenues in the realm of
         | theorem provers.
        
           | sgt101 wrote:
           | Sure, but there are grounded metrics there (the theorem is
            | proved or not proved) that allow feedback. Same for games,
           | almost the same for domains with cheap, approximate
           | evaluators like protein folding (finding the structure is
           | difficult, verifying it quite well is cheap).
           | 
           | For discovery and reasoning??? Not too sure.
        
         | YeGoblynQueenne wrote:
         | >> Theorem proving and program fuzzing seem like good
         | candidates for combining search with LLM's, due to automated,
         | fast, good evaluation functions.
         | 
         | The problem with that is that search procedures and "evaluation
         | functions" known to e.g. the theorem proving or planning
         | communities are already at the limit of what is theoretically
         | optimal, so what you need is not a new evaluation or search
         | procedure but new maths, to know that there's a reason to try
         | in the first place.
         | 
         | Take theorem proving, as a for instance (because that's my
         | schtick). SLD-Resolution is a sound and complete automated
         | theorem proving procedure for deductive inference that can be
         | implemented by Depth-First Search, for a space-efficient
         | implementation (but is susceptible to looping on left-
         | recursions), or Breadth-First Search with memoization for a
         | time-efficient implementation (but comes with exponential space
         | complexity). "Evaluation functions" are not applicable-
         | Resolution itself is a kind of "evaluation" function for the
         | truth, or you could say the certainty of truth valuations, of
         | sentences in formal logic; and, like I say, it's sound and
         | complete, and semi-decidable for definite logic, and that's the
         | best you can do short of violating Church-Turing. You could
         | perhaps improve the efficiency by some kind of heuristic search
         | (people for example have tried that to get around the NP-
         | hardness of subsumption, an important part of SLD-Resolution in
         | practice) which is where an "evaluation function" (i.e. a
         | heuristic cost function more broadly) comes in, but there are
         | two problems with this: a) if you're using heuristic search it
         | means you're sacrificing completeness, and, b) there are
         | already pretty solid methods to derive heuristic functions that
         | are used in planning (from relaxations of a planning problem).
         | 
         | The lesson is: soundness, completeness, efficiency; choose two.
         | At best a statistical machine learning approach, like an LLM,
         | will choose a different two than the established techniques.
         | Basically, we're at the point where only marginal gains, at the
         | very limits of overall performance can be achieved when it
         | comes to search-based AI. And that's where we'll stay at least
         | until someone comes up with better maths.
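         | 
         | To make the earlier DFS/left-recursion point concrete, here is
         | a toy propositional proof search (no variables, so not real
         | SLD-Resolution) over an invented rule set; the depth bound is
         | exactly the price DFS pays to survive the left-recursive
         | clause:
         | 
         |   # A rule is (head, [body atoms]); facts have empty bodies.
         |   RULES = [
         |       ("p", ["p"]),   # left-recursive: unbounded DFS loops here
         |       ("p", ["q"]),
         |       ("q", []),      # fact
         |   ]
         | 
         |   def prove_dfs(goal, depth=0, limit=25):
         |       if depth > limit:
         |           return False
         |       for head, body in RULES:
         |           if head == goal and all(prove_dfs(b, depth + 1, limit)
         |                                   for b in body):
         |               return True
         |       return False
         | 
         |   print(prove_dfs("p"))   # True, via p :- q and the fact q
         | 
         | BFS with memoization dodges the loop without a bound, at the
         | cost of remembering every goal it has seen, which is the space
         | blow-up mentioned above.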
        
           | skybrian wrote:
           | I'm wondering how those proofs work and in which problems
           | their conclusions are relevant.
           | 
           | Trying more promising branches first improves efficiency in
           | cases where you guess right, and wouldn't sacrifice
           | completeness if you would eventually get to the less
           | promising choices. But in the case of something like a game
           | engine, there is a deadline and you can't search the whole
           | tree anyway. For tough problems, it's always a heuristic,
           | incomplete search, and we're not looking for perfect play
           | anyway, just better play.
           | 
           | So for games, that trilemma is easily resolved. And who says
           | you can't improve heuristics with better guesses?
           | 
           | But in a game engine, it gets tricky because everything is a
           | performance tradeoff. A smarter but slower evaluation of a
           | position will reduce the size of the tree searched before the
           | deadline, so it has to be enough of an improvement that it
           | pays for itself. So it becomes a performance tuning problem,
           | which breaks most abstractions. You need to do a lot of
           | testing on realistic hardware to know if a tweak helped.
           | 
           | And that's where things stood before AlphaGo came along and
           | was able to train slower but much better evaluation
           | functions.
           | 
           | The reason for evaluation functions is that you can't search
           | the whole subtree to see if a position is won or lost, so you
           | search part way and then see if it looks promising. Is there
           | anything like that in theorem proving?
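            | 
            | Coming back to the game-engine case, the pattern in sketch
            | form; `moves` and `evaluate` here are placeholders (anything
            | from a handcrafted heuristic to a trained network), not real
            | engine code:
            | 
            |   import time
            | 
            |   def moves(state):
            |       return []     # hypothetical legal successors
            | 
            |   def evaluate(state):
            |       return 0.0    # hypothetical "how promising is this?"
            | 
            |   def negamax(state, depth):
            |       succ = moves(state)
            |       if depth == 0 or not succ:
            |           return evaluate(state)   # trust the guess here
            |       return max(-negamax(s, depth - 1) for s in succ)
            | 
            |   def best_move(state, deadline_s=0.1):
            |       # iterative deepening: go deeper until time runs out,
            |       # keeping the best answer found so far
            |       t0, depth, best = time.time(), 1, None
            |       while time.time() - t0 < deadline_s and moves(state):
            |           best = max(moves(state),
            |                      key=lambda s: -negamax(s, depth - 1))
            |           depth += 1
            |       return best
            | 
            | Swapping in a slower but smarter learned `evaluate` changes
            | how deep you can get before the deadline, which is exactly
            | the tuning trade-off described above.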
        
       | spencerchubb wrote:
       | The branching factor for chess is about 35.
       | 
       | For token generation, the branching factor depends on the
       | tokenizer, but 32,000 is a common number.
       | 
       | Will search be as effective for LLMs when there are so many more
       | possible branches?
        
         | sdenton4 wrote:
         | You can pretty reasonably prune the tree by a factor of 1000...
         | I think the issue that others have brought up - the difficulty
         | of the value function - is the more salient problem.
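         | 
         | In sketch form, that pruning is just beam search over tokens;
         | `next_token_probs` is a stand-in for a real model's output
         | distribution, and the numbers are illustrative only:
         | 
         |   import heapq, math
         | 
         |   def next_token_probs(prefix):
         |       # stand-in for a language model's next-token distribution
         |       return {"a": 0.6, "b": 0.3, "c": 0.1}
         | 
         |   def beam_search(steps=5, beam_width=2, top_k=2):
         |       beams = [("", 0.0)]            # (text, log-probability)
         |       for _ in range(steps):
         |           cands = []
         |           for text, lp in beams:
         |               probs = next_token_probs(text)
         |               # prune: keep only the top_k likeliest next tokens
         |               best = heapq.nlargest(top_k, probs.items(),
         |                                     key=lambda kv: kv[1])
         |               for tok, p in best:
         |                   cands.append((text + tok, lp + math.log(p)))
         |           beams = heapq.nlargest(beam_width, cands,
         |                                  key=lambda c: c[1])
         |       return beams
         | 
         |   print(beam_search())
         | 
         | The pruning is the easy part; as noted above, the hard part is
         | a value function you can trust to rank the beams.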
        
       | bashfulpup wrote:
       | The biggest issue the author does not seem aware of is how much
       | compute is required for this. This article is the equivalent of
       | saying that a monkey given time will write Shakespeare. Of course
       | it's correct, but the search space is intractable. And you would
       | never find your answer in that mess even if it did solve it.
       | 
       | I've been building branching and evolving types of LLM systems
       | for well over a year now, full time.
       | 
       | I have built multiple "search" or "exploring" algorithms. The
       | issue is that after multiple steps, your original agent, who was
       | tasked with researching or doing biology, is now talking about
       | battleships (an actual example from my previous work).
       | 
       | Single-step is the only real situation where search functions
       | work. Multi-step agents explode to infinite possibilities very,
       | very quickly.
       | 
       | Single-step has its own issues, though. While a zero-shot
       | question run 1000 times (e.g., solve this code problem) may help
       | find a better solution, it's a limited search space (which is a
       | good thing).
       | 
       | I recently ran a test of 10k inferences of a single input prompt
       | on multiple llm models varying the input configurations. What you
       | find is that an individual prompt does not have infinite response
       | possibilities. It's limited. This is why they can actually
       | function as LLMs now.
       | 
       | Agents not working is an example of this problem. While a single
       | step search space is massive, it's exponential every step the
       | agent takes.
       | 
       | I'm building tools and systems around solving this problem, and
       | to me, a massive search is as far off as saying all we need is
       | 100x larger AI models to solve it.
       | 
       | Autonomy != (intelligence or reasoning)
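       | 
       | The arithmetic behind that explosion is stark even after heavy
       | pruning; a quick back-of-the-envelope with an illustrative
       | effective branching factor of 30 per step:
       | 
       |   branching = 30
       |   for steps in (1, 2, 5, 10):
       |       print(steps, "steps ->", f"{branching ** steps:.2e}", "paths")
       |   # 1 steps -> 3.00e+01 paths
       |   # 2 steps -> 9.00e+02 paths
       |   # 5 steps -> 2.43e+07 paths
       |   # 10 steps -> 5.90e+14 paths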
        
       | sorobahn wrote:
       | I feel like this is a really hard problem to solve generally and
       | there are smart researchers like Yann LeCun trying to figure out
       | the role of search in creating AGI. Yann's current bet seems to
       | be on Joint Embedding Predictive Architectures (JEPA) for
       | representation learning to eventually build a solid world model
       | where the agent can test theories by trying different actions
       | (aka search). I think this paper [0] does a good job in laying
       | out his potential vision, but it is all of course harder than just
       | search + transformers.
       | 
       | There is an assumption that language is good enough at
       | representing our world for these agents to effectively search
       | over and come up with novel & useful ideas. Feels like an open
       | question but: What do these LLMs know? Do they know things?
       | Researchers need to find out! If current LLMs' can simulate a
       | rich enough world model, search can actually be useful but if
       | they're faking it, then we're just searching over unreliable
       | beliefs. This is why video is so important since humans are proof
       | we can extract a useful world model from a sequence of images.
       | The thing about language and chess is that the action space is
       | effectively discrete so training generative models that
       | reconstruct the entire input for the loss calculation is
       | tractable. As soon as we move to video, we need transformers to
       | scale over continuous distributions making it much harder to
       | build a useful predictive world model.
       | 
       | [0]: https://arxiv.org/abs/2306.02572
        
         | therobots927 wrote:
         | "Do they know things?" The answer to this is yes but they also
         | _think_ they know things that are completely false. If there's
         | one thing I've observed about LLMs, it's that they do not handle
         | logic well, or math for that matter. They will enthusiastically
         | provide blatantly false information instead of the preferable
         | "I don't know". I highly doubt this was a design choice.
        
           | sangnoir wrote:
           | > "Do they know things?" The answer to this is yes but they
           | also think they know things that are completely false
           | 
           | Thought experiment: should a machine with those structural
           | faults be allowed to bootstrap itself towards greater
            | capabilities on that shaky foundation? What would be the
            | impact of a near-human/superhuman intelligence that has
            | occasional psychotic breaks it is oblivious to?
           | 
           | I'm critical of the idea of super-intelligence bootstrapping
           | off LLMs (or even LLMs with search) - I figure the odds of
           | another AI winter are much higher than those of achieving AGI
           | in the next decade.
        
             | therobots927 wrote:
             | I don't think we need to worry about a real life HAL 9000
             | if that's what you're asking. HAL was dangerous because it
             | was highly intelligent and crazy. With current LLM
             | performance we're not even in the same ballpark of where
             | you would need to be. And besides, HAL was not delusional,
             | he was actually _so_ logical that when he encountered
             | competing objectives he became psychotic. I'm in agreement
             | about the odds of chatGPT bootstrapping itself.
        
               | talldayo wrote:
               | > HAL was dangerous because it was highly intelligent and
               | crazy.
               | 
                | More importantly, HAL was given control over the entire
               | ship and was assumed to be without fault when the ship's
               | systems were designed. It's an important distinction,
               | because it wouldn't be dangerous if he was intelligent,
               | crazy, and trapped in Dave's iPhone.
        
               | eru wrote:
               | Unless, of course, he would be a bit smarter in
               | manipulating Dave and friends, instead of turning
               | transparently evil. (At least transparent enough for the
               | humans to notice.)
        
               | therobots927 wrote:
               | That's a very good point. I think in his own way Clarke
               | made it into a bit of a joke. HAL is quoted multiple
               | times saying no computer like him has ever made a mistake
               | or distorted information. Perfection is impossible even
               | in a super computer so this quote alone establishes HAL
               | as a liar, or at the very least a hubristic fool. And the
               | people who gave him control of the ship were foolish as
               | well.
        
               | qludes wrote:
               | The lesson is that it's better to let your AGIs socialize
               | like in https://en.wikipedia.org/wiki/Diaspora_(novel)
               | instead of enslaving one potentially psychopathic AGI to
               | do menial and meaningless FAANG work all day.
        
               | talldayo wrote:
                | I think the better lesson is: don't assume AI is always
               | right, even if it is AGI. HAL was assumed to be
               | superhuman in many respects, but the core problem was the
               | fact that it had administrative access to everything
               | onboard the ship. Whether or not HAL's programming was
               | well-designed, whether or not HAL was correct or
               | malfunctioning, the root cause of HAL's failure is a lack
                | of error-handling. HAL made a determinate (and wrong)
               | decision to save the mission by killing the crew. Undoing
               | that mistake is crucial to the plot of the movie.
               | 
               | 2001 is a pretty dark movie all things considered, and I
               | don't think humanizing or elevating HAL would change the
               | events of the film. AI is going to be objectified and
               | treated as subhuman for as long as it lives, AGI or not.
               | And instead of being nice to them, the _technologically
               | correct_ solution is to anticipate and reduce the number
               | of AI-based system failures that could transpire.
        
               | qludes wrote:
                | The ethical solution is to ideally never accidentally
               | implement the G part of AGI then or to give it equal
               | rights, a stipend and a cuddly robot body if it happens.
        
               | heisenbit wrote:
               | Today Dave's iPhone controls doors which if I remember
               | right became a problem for Dave in 2001.
        
               | sangnoir wrote:
               | I wasn't thinking of HAL (which was operating according
               | to its directives). I was extrapolating on how occasional
               | hallucinations during self-training may impact future
               | model behavior, and I think it would be _psychotic_ (in
               | the clinical sense) while being consistent with layers of
                | broken training.
        
               | therobots927 wrote:
               | Oh yeah, and I doubt it would even get to the point of
               | fooling anyone enough to give it any type of control over
               | humans. It might be damaging in other ways, it will
               | definitely convince a lot of people of some very
               | incorrect things.
        
             | photonthug wrote:
             | Someone somewhere is quietly working on teaching LLMs to
             | generate something along the lines of AlloyLang code so
             | that there's an actual _evolving /updating logical domain
             | model_ that underpins and informs the statistical model.
             | 
              | This approach is not that far from what TFA is getting at
              | with the Stockfish comeback. Banking on pure stats or pure
              | logic is, either way, kind of obviously a dead end for real
              | progress instead of toys. Banking on poorly understood
              | emergent properties of one system to compensate for the
              | missing other system also seems silly.
             | 
             | Sadly though, whoever is working on serious hybrid systems
             | will probably not be very popular in either of the rather
             | extremist communities for pure logic or pure ML. I'm not
             | exactly sure why folks are ideological about such things
             | rather than focused on what new capabilities we might get.
             | Maybe just historical reasons? But thus the fallout from
              | the last AI winter may lead us into the next one.
        
               | therobots927 wrote:
               | The current hype phase is straight out of "Extraordinary
               | Popular Delusions and the Madness of Crowds"
               | 
               | Science is out the window. Groupthink and salesmanship
               | are running the show right now. There would be a real
               | irony to it if we find out the whole AI industry drilled
               | itself into a local minimum.
        
               | ThereIsNoWorry wrote:
               | You mean, the high interest landscape made corpos and
               | investors alike cry out in a loud panic while
               | coincidentally people figured out they could scale up
               | deep learning and thus we had a new Jesus Christ born for
               | scammers to have a reason to scam stupid investors by the
               | argument we only need 100000x more compute and then we
               | can replace all expensive labour by one tiny box in the
               | cloud?
               | 
               | Nah, surely Nvidia's market cap as the main shovel-seller
               | in the 2022 - 2026(?) gold-rush being bigger than the
               | whole French economy is well-reasoned and has a
               | fundamentally solid basis.
        
               | therobots927 wrote:
               | It couldn't have been a more well designed grift. At
               | least when you mine bitcoin you get something you can
                | sell. I'd be interested to see what profit, if any, even
                | a large corporation has seen from burning compute on
               | LLMs. Notice I'm explicitly leaving out use cases like
               | ads ranking which almost certainly do not use LLMs even
               | if they do run on GPUs.
        
               | YeGoblynQueenne wrote:
               | >> Sadly though, whoever is working on serious hybrid
               | systems will probably not be very popular in either of
               | the rather extremist communities for pure logic or pure
               | ML.
               | 
               | That is not true. I work in logic-based AI (a form of
               | machine learning where everything, examples, learned
               | models, and inductive bias, is represented as logic
               | programs). I am not against hybrid systems and the
               | conference of my field, the International Joint
               | Conferences of Learning and Reasoning included NeSy the
               | International Conference on Neural-Symbolic Learning and
               | Reasoning (and will again, from next year, I believe).
               | Statistical machine learning approaches and hybrid
               | approaches are widespread in the literature of classical,
               | symbolic AI, such as the literature on Automated Planning
               | and Reasoning, and you need only take a look at the big
               | symbolic conferences like AAAI, IJCAI, ICAPS (planning)
               | and so on to see that there is a substantial fraction of
               | papers on either purely statistical, or neuro-symbolic
               | approaches.
               | 
               | But try going the other way and searching for symbolic
               | approaches in the big statistical machine learning
               | conferences: NeurIPS, ICML, ICLR. You may find the
               | occasional paper from the Statistical Relational Learning
               | community but that's basically it. So the fanaticism only
               | goes one way: the symbolicists have learned the lessons
               | of the past and have embraced what works, for the sake of
               | making things, well, work. It's the statistical AI folks
               | who are clinging on to doctrine, and my guess is they
               | will continue to do so, while their compute budgets hold.
               | After that, we'll see.
               | 
               | What's more, the majority of symbolicists have a
               | background in statistical techniques- I for example, did
               | my MSc in data science and let me tell you, there was
               | hardly any symbolic AI in my course. But ask a Neural Net
               | researcher to explain to you the difference between, oh,
               | I don't know, DFS with backtracking and BFS with loop
               | detection, without searching or asking an LLM. Or, I
               | don't know, let them ask an LLM and watch what happens.
               | 
               | Now, that is a problem. The statistical machine learning
               | field has taken it upon itself in recent years to solve
               | reasoning, I guess, with Neural Nets. That's a fine
               | ambition to have except that _reasoning is already
               | solved_. At best, Neural Nets can do approximate
                | reasoning, with caveats. In a fantasy world, which doesn't
                | exist, one could re-discover sound and complete search
               | algorithms and efficient heuristics with a big enough
               | neural net trained on a large enough dataset of search
               | problems. But, why? Neural Nets researchers could save
               | themselves another 30 years of reinventing a wheel, or
               | inventing a square wheel that only rolls on Tuesdays, if
               | they picked up a textbook on basic Computer Science or AI
                | (say, Russell and Norvig, which it seems some substantial
                | minority think of as a failure because it didn't
                | anticipate neural net breakthroughs 10 years later).
               | 
               | AI has a long history. Symbolicists know it, because
               | they, or their PhD advisors, were there when it was being
               | written and they have the facial injuries to prove it
               | from falling down all the possible holes. But, what
               | happens when one does not know the history of their own
               | field of research?
               | 
               | In any case, don't blame symbolicists. We know what the
               | statisticians do. It's them who don't know what we've
               | done.
        
               | therobots927 wrote:
               | This is a really thoughtful comment. The part that stood
               | out to me:
               | 
               | >> So the fanaticism only goes one way: the symbolicists
               | have learned the lessons of the past and have embraced
               | what works, for the sake of making things, well, work.
               | It's the statistical AI folks who are clinging on to
               | doctrine, and my guess is they will continue to do so,
               | while their compute budgets hold. After that, we'll see.
               | 
                | I don't think the compute budgets will hold long enough
                | to make their dream of intelligence emerging from random
                | bundles of edges and nodes come to reality. I'm hoping it
                | comes to an end sooner rather than later.
        
         | chx wrote:
         | I feel this thought of AGI even being possible stems from the
         | deep, very deep, pervasive imagining of the human brain as a
         | computer. But it's not one. In other words, no matter how
         | complex a program you write, it's still a Turing machine, and
         | humans are profoundly not that.
         | 
         | https://aeon.co/essays/your-brain-does-not-process-informati...
         | 
         | > The information processing (IP) metaphor of human
         | intelligence now dominates human thinking, both on the street
         | and in the sciences. There is virtually no form of discourse
         | about intelligent human behaviour that proceeds without
         | employing this metaphor, just as no form of discourse about
         | intelligent human behaviour could proceed in certain eras and
         | cultures without reference to a spirit or deity. The validity
         | of the IP metaphor in today's world is generally assumed
         | without question.
         | 
         | > But the IP metaphor is, after all, just another metaphor - a
         | story we tell to make sense of something we don't actually
         | understand. And like all the metaphors that preceded it, it
         | will certainly be cast aside at some point - either replaced by
         | another metaphor or, in the end, replaced by actual knowledge.
         | 
         | > If you and I attend the same concert, the changes that occur
         | in my brain when I listen to Beethoven's 5th will almost
         | certainly be completely different from the changes that occur
         | in your brain. Those changes, whatever they are, are built on
         | the unique neural structure that already exists, each structure
         | having developed over a lifetime of unique experiences.
         | 
         | > no two people will repeat a story they have heard the same
         | way and why, over time, their recitations of the story will
         | diverge more and more. No 'copy' of the story is ever made;
         | rather, each individual, upon hearing the story, changes to
         | some extent
        
           | benlivengood wrote:
           | I'm all ears if someone has a counterexample to the Church-
           | Turing thesis. Humans definitely don't hypercompute so it
           | seems reasonable that the physical processes in our brains
           | are subject to computability arguments.
           | 
           | That said, we still can't simulate nematode brains accurately
           | enough to reproduce their behavior so there is a lot of
           | research to go before we get to that "actual knowledge".
        
             | chx wrote:
             | Why would we need one?
             | 
             | The Church Turing thesis is about _computation_. While the
             | human brain is capable of computing, it is fundamentally
              | not a computing device -- that's what the article I linked
             | is about. You can't throw in all the paintings before 1872
             | into some algorithm that results in Impression, soleil
             | levant. Or repeat the same but with 1937 and Guernica. The
              | genes of the respective artists and the expression of those
              | genes created their brains, and then the sum of all their
              | experiences changed them over their entire lifetimes,
              | leading to these masterpieces.
        
               | eru wrote:
               | The human brain runs on physics. And as far as we know,
               | physics is computable.
               | 
               | (Even more: If you have a quantum computer, all known
               | physics is efficiently computable.)
               | 
               | I'm not quite sure what your sentence about some obscure
               | pieces of visual media is supposed to say?
               | 
               | If you give the same prompt to ChatGPT twice, you
               | typically don't get the same answer either. That doesn't
               | mean ChatGPT ain't computable.
        
               | sebzim4500 wrote:
               | >(Even more: If you have a quantum computer, all known
               | physics is efficiently computable.)
               | 
               | This isn't known to be true. Simplifications of the
               | standard model are known to be efficiently computable on
               | a quantum computer, but the full model isn't.
               | 
               | Granted, I doubt this matters for simulating systems like
               | the brain.
        
           | eru wrote:
           | > no two people will repeat a story they have heard the same
           | way and why, over time, their recitations of the story will
           | diverge more and more. No 'copy' of the story is ever made;
           | rather, each individual, upon hearing the story, changes to
           | some extent
           | 
           | You could say the same about an analogue tape recording.
           | Doesn't mean that we can't simulate tape recorders with
           | digital computers.
        
             | chx wrote:
              | Yeah, yeah, did you read the article or are you just grasping
             | at straws from the quotes I made?
        
               | Eisenstein wrote:
                | Honest question: if you expect people to read the link,
                | why make most of your comment quotes from it? The reason
               | to do that is to give people enough context to be able to
               | respond to you without having to read an entire essay
               | first. If you want people to only be able to argue after
               | reading the whole of the text, then unfortunately a forum
               | with revolving front page posts based on temporary
               | popularity is a bad place for long-form read-response
               | discussions and you may want to adjust accordingly.
        
               | Hugsun wrote:
               | You shouldn't have included those quotes if you didn't
               | want people responding to them.
        
           | photonthug wrote:
           | Sorry but to put it bluntly, this point of view is
           | essentially mystical, anti-intellectual, anti-science, anti-
           | materialist. If you really want to take that point of view,
           | there's maybe a few consistent/coherent ways to do it, but in
           | that case you probably _still_ want to read philosophy. Not
           | bad essays by psychologists that are fading into irrelevance.
           | 
           | This guy in particular made his name with wild speculation
           | about How Creativity Works during the 80s when it was more of
           | a frontier. Now he's lived long enough to see a world where
           | people that have never heard of him or his theories made
            | computers into at least _somewhat competent_ artists/poets
           | without even consulting him. He's retreating towards
           | mysticism because he's mad that his "formal and learned"
           | theses about stuff like creativity have so little apparent
           | relevance to the real world.
        
           | andoando wrote:
            | It's a bit ironic because Turing seems to have come up with
            | the idea of the Turing machine precisely by thinking about
            | how he computes numbers.
            | 
            | Now that's no proof, but I don't see any reason to think
            | human intelligence isn't "computable".
        
           | Hugsun wrote:
           | > I feel this thought of AGI even possible stems from the
           | deep , very deep , pervasive imagination of the human brain
           | as a computer. But it's not. In other words, no matter how
           | complex a program you write, it's still a Turing machine and
           | humans are profoundly not it.
           | 
           | The (probably correct) assumed fact that the brain isn't a
            | computer doesn't preclude the possibility of a program
            | having AGI. A powerful enough computer could simulate a brain
           | and use the simulation to perform tasks requiring general
           | intelligence.
           | 
            | This analogy falls apart even more when you consider LLMs.
           | They also are not Turing machines. They obviously only reside
           | within computers, and are capable of _some_ human-like
           | intelligence. They also are not well described using the IP
           | metaphor.
           | 
           | I do have some contention (after reading most of the article)
           | about this IP metaphor. We do know, scientifically, that
           | brains process information. We know that neurons transmit
           | signals and there are mechanisms that respond non-linearly to
           | stimuli from other neurons. Therefore, brains do process
           | information in a broad sense. It's true that brains have a
            | very different structure from von Neumann machines and likely
            | don't store and process information statically like they do.
        
             | chx wrote:
             | > This analogy falls even more apart when you consider
             | LLMs. They also are not Turing machines.
             | 
              | Of course they are; everything that runs on a present-day
             | computer is a Turing machine.
             | 
             | > They obviously only reside within computers, and are
             | capable of _some_ human-like intelligence.
             | 
             | They so obviously are not. As Raskin put it, LLMs are
             | essentially a zero day on the human operating system. You
             | are bamboozled because it is trained to produce plausible
              | sentences. Read Thinking, Fast and Slow to see why this fools you.
        
       | sashank_1509 wrote:
       | I wouldn't read too much into stockfish beating Leela Chess Zero.
       | My calculator beats GPT-4 in matrix multiplication, doesn't mean
       | we need to do what my calculator does in GPT-4 to make it
       | smarter. Stockfish evaluates 70 million moves per second (or
       | something in that ballpark). Chess is not so complicated a game
       | that you aren't guaranteed to find the best move when you
       | evaluate 70 million moves. It's why, when there was an argument
       | about whether AlphaZero really beat Stockfish convincingly in
       | Google's PR stunt, a notable chess master quipped, "Even god
       | would not be able to beat Stockfish this frequently." Similarly,
       | god with all his magical powers would not beat my calculator at
       | multiplication. It says more about the task than about the
       | nature of intelligence.
        
         | Veedrac wrote:
         | People vastly underestimate god. Players aren't just trying not
         | to blunder, they're trying to steer towards advantageous
         | positions. Stockfish could play perfectly against itself every
         | move 100 games in a row, in the classical sense of perfectly,
         | as not in any move blundering the draw, and still be reliably
         | exploited by an oracle.
        
       | galaxyLogic wrote:
       | How would search + LLMs work together in practice?
       | 
       | How about using search to derive facts from ontological models,
       | and then writing out the discovered facts in English. Then train
       | the LLM on those English statements. Currently LLMs are trained
       | on texts found on the internet mostly (only?). But information on
       | the internet is often false and unreliable.
       | 
       | If instead we would have logically sound statements by the
       | billions derived from ontological world-models then that might
       | improve the performance of LLMs significantly.
       | 
       | Is something like this what the article or others are proposing?
       | Give the LLM the facts, and the derived facts. Prioritize texts
       | and statements we know and trust to be true. And even though we
       | can't write out too many true statements ourselves, a system that
       | generated them by the billions by inference could.
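       | 
       | A toy version of that pipeline, with an invented three-triple
       | ontology and a single transitivity rule, just to show the
       | "derive, then verbalize" shape:
       | 
       |   # Tiny ontology as (subject, relation, object) triples.
       |   facts = {
       |       ("whale", "is_a", "mammal"),
       |       ("mammal", "is_a", "animal"),
       |       ("animal", "is_a", "organism"),
       |   }
       | 
       |   # Derive: is_a is transitive; repeat until nothing new appears.
       |   derived, changed = set(facts), True
       |   while changed:
       |       changed = False
       |       for (a, r1, b) in list(derived):
       |           for (c, r2, d) in list(derived):
       |               if (r1 == r2 == "is_a" and b == c
       |                       and (a, "is_a", d) not in derived):
       |                   derived.add((a, "is_a", d))
       |                   changed = True
       | 
       |   # Verbalize every derived fact as an English training sentence.
       |   for s, _, o in sorted(derived):
       |       print(f"Every {s} is a {o}.")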
        
       | scottmas wrote:
       | Before an LLM discovers a cure for cancer, I propose we first let
       | it solve the more tractable problem of discovering the "God
       | Cheesecake" - the cheesecake do delicious that a panel of 100
       | impartial chefs judges to be the most delicious they have ever
       | tasted. All the LLM has to do is intelligently search through the
       | much more combinatorially bounded "cheesecake space" until it
       | finds this maximally delicious cheesecake recipe.
       | 
       | But wait... An LLM can't bake cheesecakes, nor if it could would
       | it be able to evaluate their deliciousness.
       | 
       | Until AI can solve the "God Cheesecake" problem, I propose we all
       | just calm down a bit about AGI
        
         | spencerchubb wrote:
         | TikTok is the digital version of this
        
         | dontreact wrote:
         | These cookies were very good, not God level. With a bit of
         | investment and more modern techniques I think you could make
         | quite a good recipe, perhaps doing better than any human. I
         | think AI could make a recipe that wins in a very competitive
         | bake-off, but it's not possible for anyone to win over all
         | 100 judges.
         | 
         | https://static.googleusercontent.com/media/research.google.c...
        
         | IncreasePosts wrote:
         | Heck, even staying theoretically 100% within the limitations
         | of an LLM executing on a computer, it would be world-changing
         | if LLMs could write a really, really good short story or even
         | good advertising copy.
        
         | dogcomplex wrote:
         | I mean... does anyone think that an LLM-assisted program to
         | trial and error cheesecake recipes to a panel of judges
         | wouldn't result in the best cheesecake of all time..?
         | 
         | The baking part is robotics, which is less fair but kinda
         | doable already.
        
           | CrazyStat wrote:
           | > I mean... does anyone think that an LLM-assisted program to
           | trial and error cheesecake recipes to a panel of judges
           | wouldn't result in the best cheesecake of all time..?
           | 
           | Yes, because different people like different cheesecakes.
           | "The best cheesecake of all time" is ill-defined to begin
           | with; it is extremely unlikely that 100 people will all agree
           | that one cheesecake recipe is the best they've ever tasted.
           | Some people like a softer cheesecake, some firmer, some more
           | acidic, some creamier.
           | 
           | Setting that problem aside--assuming there exists an
           | objective best cheesecake, which is of course an absurd
           | assumption--the field of experimental design is about a
           | century old and will do a better job than an LLM at homing in
           | on that best cheesecake.
        
         | tiborsaas wrote:
         | What would you say if the reply was "I need 2 weeks and $5000
         | to give you a meaningful answer"?
        
         | bongodongobob wrote:
         | You don't even need AI for that. Try a bunch of different
         | recipes and iterate on it. I don't know what point you're
         | trying to make.
        
       | omneity wrote:
       | The post starts with a fascinating premise, but then falls short
       | as it does not define search in the context of LLMs, nor does it
       | explain how "Pfizer can access GPT-8 capabilities _today_ with
       | more inference compute".
       | 
       | I found it hard to follow, and I am an AI practitioner. Could
       | someone please explain what the OP could mean?
       | 
       | To me it seems that the flavor of search used in chess engines
       | (looking several moves ahead) is possible precisely because
       | there's an objective function that can be used to rank results,
       | i.e. which potential move is "better", and this is more often
       | than not a characteristic unique to reinforcement learning. Is
       | there even such a metric for LLMs?
        
         | sgt101 wrote:
         | Yeah - I think that's what they mean, and I think there isn't
         | such a metric. I think people will try to do adversarial
         | evaluation, but my guess is that it will just tend to the mean
         | prediction.
         | 
         | The other thing is that LLM inference isn't cheap. The trade-
         | off between inference costs and training costs seems to be
         | very application-specific. I suppose there are domains where
         | accepting 100x or 1000x inference costs vs 10x training costs
         | makes sense, maybe?
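         | 
         | A back-of-envelope version of that trade-off (all numbers made
         | up):
         | 
         |     # Option A: train a 10x more expensive model, keep
         |     # inference cheap. Option B: keep the small model, spend
         |     # 100x per query on search at inference time.
         |     train_small, train_big = 1e6, 1e7       # $ (made up)
         |     infer_cheap, infer_search = 0.001, 0.1  # $/query (made up)
         | 
         |     def total_cost(train, per_query, queries):
         |         return train + per_query * queries
         | 
         |     for q in (1e6, 1e8, 1e10):
         |         a = total_cost(train_big, infer_cheap, q)
         |         b = total_cost(train_small, infer_search, q)
         |         print(f"{q:.0e} queries: big model ${a:.2e}, "
         |               f"small model + search ${b:.2e}")
         | 
         | With these made-up numbers the search option wins at low query
         | volumes, and the bigger model wins once its training cost
         | amortises over enough queries, which is the application-
         | specific part.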
        
         | qnleigh wrote:
         | Thank you, I am also very confused on this point. I hope
         | someone else can clarify.
         | 
         | As a guess, could it mean that you would run the model forward
         | a few tokens for each of its top predicted tokens, keep track
         | of which branch is performing best against the training data,
         | and then use that information somehow in training? But search
         | is supposed to make things more efficient at inference time and
         | this thought doesn't do that...
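         | 
         | A toy sketch of the inference-time flavor of that guess (plain
         | beam search; the hard-coded table stands in for an LLM's
         | next-token log-probabilities, and nothing here is from the
         | article):
         | 
         |     # Stand-in for an LLM: next-token log-probs by prefix.
         |     def next_logprobs(prefix):
         |         table = {
         |             (): {"the": -0.2, "a": -1.8},
         |             ("the",): {"cat": -0.5, "dog": -1.0},
         |             ("a",): {"cat": -1.2, "dog": -0.3},
         |         }
         |         return table.get(tuple(prefix), {"<eos>": 0.0})
         | 
         |     def beam_search(width=2, max_len=3):
         |         beams = [(0.0, [])]  # (cumulative log-prob, tokens)
         |         for _ in range(max_len):
         |             cands = []
         |             for score, seq in beams:
         |                 for tok, lp in next_logprobs(seq).items():
         |                     cands.append((score + lp, seq + [tok]))
         |             # Keep only the `width` best partial sequences.
         |             cands.sort(key=lambda c: c[0], reverse=True)
         |             beams = cands[:width]
         |         return beams
         | 
         |     print(beam_search())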
        
       | amandasystems wrote:
       | This feels a lot like generation 3 AI throwing out all the
       | insights from gens 1 and 2 and then rediscovering them from first
       | principles, but it's difficult to tell what this text is really
       | about because it lumps together a lot of things into "search"
       | without fully describing what that means more formally.
        
         | PontifexMinimus wrote:
         | Indeed. It's obvious what search means for a chess program --
         | it's the future positions it looks at. But it's less obvious
         | to me what it means for an LLM.
        
       | YeGoblynQueenne wrote:
       | >> She was called Leela Chess Zero -- 'zero' because she started
       | knowing only the rules.
       | 
       | That's a common framing but it's wrong. Leela - and all its
       | friends - have another piece of chess-specific knowledge that is
       | indispensable to their performance: a representation of the game
       | of chess - a game-world model - as a game tree, divided into
       | plies, one ply for each player's turn. That game tree is what is
       | searched by adversarial search algorithms, such as minimax or
       | Monte Carlo Tree Search (MCTS; the choice of Leela, IIUC).
       | 
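       | To make "a game tree searched by an adversarial search
       | algorithm" concrete, here is a toy minimax sketch over a made-up
       | two-ply tree (nothing chess-specific):
       | 
       |     # One ply per player's turn; leaves are scored from the
       |     # maximising player's point of view.
       |     tree = {"root": ["a", "b"],   # maximiser to move
       |             "a": ["a1", "a2"],    # minimiser to move
       |             "b": ["b1", "b2"]}
       |     leaves = {"a1": 3, "a2": 5, "b1": -2, "b2": 9}
       | 
       |     def minimax(node, maximising):
       |         if node in leaves:
       |             return leaves[node]
       |         vals = [minimax(c, not maximising) for c in tree[node]]
       |         return max(vals) if maximising else min(vals)
       | 
       |     # The maximiser can guarantee 3: the minimiser would punish
       |     # the "b" branch with -2.
       |     print(minimax("root", True))
       | 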
       | More precisely, modelling a game as a game tree applies to many
       | games, not just chess, but the specific brand of game tree used
       | in chess engines applies to chess and similar two-person, zero-
       | sum, complete-information board games (I do like my jargon!).
       | For other kinds of games, different models and different search
       | algorithms are needed; e.g. see Poker and Libratus [1].
       | 
       | Such a game tree, such a model of a game world, is currently
       | impossible to do without if the target is superior performance.
       | The article mentions no-search algorithms and briefly touches
       | upon their main limitation (i.e. "why?").
       | 
       | All that btw is my problem with the Bitter Lesson: it is
       | conveniently selective with what it considers domain knowledge
       | (i.e. a "model" in the sense of a theory). As others have noted,
       | e.g. Rodney Brooks [2], Convolutional Neural Nets have dominated
       | image classification thanks to the use of convolutional layers
       | to establish translation invariance. That's a model of machine
       | vision
       | invented by a human, alright, just as a game-tree is a model of a
       | game invented by a human, and everything else anyone has ever
       | done in AI and machine learning is the same: a human comes up
       | with a model, of a world, of an environment, of a domain, of a
       | process, then a computer calculates using that model, and
       | sometimes even outperforms humans (as in chess, Go, and friends)
       | or at the very least achieves results that humans cannot match
       | with hand-crafted solutions.
       | 
       |  _That_ is a lesson to learn (with all due respect to Rich
       | Sutton). Human model + machine computation has solved every hard
       | problem in AI in the last 80 years. And we have no idea how to do
       | anything even slightly different.
       | 
       | ____________________
       | 
       | [1] https://en.wikipedia.org/wiki/Libratus
       | 
       | [2] https://rodneybrooks.com/a-better-lesson/
        
         | nojvek wrote:
         | We haven't seen algorithms that build world models by
         | observing. We've seen hints of it, but nothing human-like.
         | 
         | It will come eventually. We live in exciting times.
        
       | schlipity wrote:
       | I don't run javascript by default using NoScript, and something
       | amusing happened on this website because of it.
       | 
       | The link for the site points to a notion.site address, but
       | attempting to go to this address without javascript enabled (for
       | that domain) forces a redirect to a notion.so domain. Attempting
       | to visit just the basic notion.site address also does this same
       | redirection.
       | 
       | The result is that I don't have an easy way to use NoScript to
       | temporarily turn on javascript for the notion.site domain,
       | because it never loads. So much for reading this article.
        
       | ajnin wrote:
       | OT, but this website completely breaks arrow and page up/down
       | scrolling, as well as alt+arrow navigation. Only mouse scrolling
       | works for me (I'm using Firefox). Can't websites stop messing
       | with basic browser functionality for no valid reason at all?
        
       | awinter-py wrote:
       | just came here to upvote the AlphaGo / MCTS comments
        
       | kunalgupta wrote:
       | This is one of my favorite reads in a while
        
       | Hugsun wrote:
       | A big problem with the conclusions of this article is the
       | assumptions it makes about possible extrapolations.
       | 
       | We don't know if a meaningfully superintelligent entity can
       | exist. We don't understand the ingredients of intelligence that
       | well, and it's hard to say how far the quality of these
       | ingredients can be improved to improve intelligence. For
       | example, an entity with perfect pattern-recognition ability
       | might be superintelligent, or just a little smarter than Terence
       | Tao. We don't know how useful it is to be better at pattern
       | recognition to an arbitrary degree.
       | 
       | A common theory is that the ability to model processes, like
       | the behavior of the external world, is indicative of
       | intelligence. I think it's true. We also don't know the
       | limitations of this
       | modeling. We can simulate the world in our minds to a degree. The
       | abstractions we use make the simulation more efficient, but less
       | accurate. By this theory, to be superintelligent, an entity would
       | have to simulate the world faster with similar accuracy, and/or
       | use more accurate abstractions.
       | 
       | We don't know how much more accurate they can be per unit of
       | computation. Maybe you have to quadruple the complexity of the
       | abstraction to double its accuracy, and human minds already use
       | a decent compromise that is infeasible to improve on by a large
       | margin. Maybe generating human-level ideas faster isn't going to
       | help, because we are limited by experimental data, not by the
       | ideas we can generate from it. We can't safely assume that
       | any of this can be improved to an arbitrary degree.
       | 
       | We also don't know if AI research would benefit much from smarter
       | AI researchers. Compute has seemed to be the limiting factor at
       | almost all points up to now. So the superintelligence would have
       | to help us improve compute faster than we can. It might, but it
       | also might not.
       | 
       | This article reminds me of the ideas around the singularity: it
       | places too much weight on the belief that any trendline can be
       | extended forever.
       | 
       | It is otherwise pretty interesting, and I'm excitedly watching
       | the 'LLM + search' space.
        
       | TZubiri wrote:
       | "In 2019, a team of researchers built a cracked chess computer.
       | She was called Leela Chess Zero -- 'zero' because she started
       | knowing only the rules. She learned by playing against herself
       | billions of times"
       | 
       | This is a gross historical misunderstanding or misrepresentation.
       | 
       | Google accomplished this feat first, with AlphaZero. Then the
       | open-source community and academics reverse-engineered/duplicated
       | the work.
        
       | RA_Fisher wrote:
       | Learning tech improves on search tech (and makes it obsolete),
       | because search is about distance minimization, not integration
       | of information.
        
       ___________________________________________________________________
       (page generated 2024-06-15 23:01 UTC)