[HN Gopher] LLM Daydreaming
       ___________________________________________________________________
        
       LLM Daydreaming
        
       Author : nanfinitum
       Score  : 178 points
       Date   : 2025-07-16 02:22 UTC (20 hours ago)
        
 (HTM) web link (gwern.net)
 (TXT) w3m dump (gwern.net)
        
       | zwaps wrote:
       | Wasn't this already implemented in some agents?
       | 
        | I seem to remember hearing about it on several podcasts
        
       | johnfn wrote:
       | It's an interesting premise, but how many people
       | 
       | - are capable of evaluating the LLM's output to the degree that
       | they can identify truly unique insights
       | 
       | - are prompting the LLM in such a way that it could produce truly
       | unique insights
       | 
       | I've prompted an LLM upwards of 1,000 times in the last month,
       | but I doubt more than 10 of my prompts were sophisticated enough
       | to even allow for a unique insight. (I spend a lot of time
       | prompting it to improve React code.) And of those 10 prompts,
       | even if all of the outputs were unique, I don't think I could
       | have identified a single one.
       | 
       | I very much do like the idea of the day-dreaming loop, though! I
       | actually feel like I've had the exact same idea at some point
       | (ironic) - that a lot of great insight is really just combining
       | two ideas that no one has ever thought to combine before.
        
         | cantor_S_drug wrote:
         | > are capable of evaluating the LLM's output to the degree that
         | they can identify truly unique insights
         | 
          | I noticed one behaviour in myself. I heard about a particular
          | topic because it was the dominant opinion in the infosphere.
          | Then LLMs confirmed that dominant opinion (because it was
          | heavily represented in the training data) and I stopped my
          | search for alternative viewpoints. So in a sense, LLMs are
          | turning out to be another reflective mirror which reinforces
          | existing opinions.
        
           | MrScruff wrote:
            | Yes, it seems like LLMs are System 1 thinking taken to the
            | extreme. Reasoning was supposed to introduce some actual
            | logic, but you only have to play with these models for a short
            | while to see that the reasoning tokens are a very soft
            | constraint on the model's eventual output.
            | 
            | In fact, they're trained to please us and so in general aren't
            | very good at pushing back. It's incredibly easy to 'beat' an
            | LLM in an argument since they often just follow your line of
            | reasoning (it's in the model's context after all).
        
         | zyklonix wrote:
         | Totally agree, most prompts (especially for code) aren't
         | designed to surface novel insights, and even when they are,
         | it's hard to recognize them. That's why the daydreaming loop is
         | so compelling: it offloads both the prompting and the novelty
         | detection to the system itself. Projects like
         | https://github.com/DivergentAI/dreamGPT are early steps in that
         | direction, generating weird idea combos autonomously and
         | scoring them for divergence, without user prompting at all.
        
       | apples_oranges wrote:
       | If the breakthrough comes, most if not all links on HN will be to
       | machine generated content. But so far it seems that the I in
       | current AI is https://www.youtube.com/watch?v=uY4cVhXxW64 ..
        
       | NitpickLawyer wrote:
        | Something I haven't seen explored, but which I think could
        | perhaps help, is to somehow introduce feedback about the
        | generation into the context, based on things that are easily
        | computed with other tools (like perplexity). In "thinking" models
        | we see a lot of emergent behaviour like "perhaps I should, but
        | wait, this seems wrong", etc. Perhaps adding some such signals at
        | regular intervals could help in surfacing the correct patterns
        | when they are needed.
        | 
        | There's a podcast I listened to ~1.5 years ago, where a team took
        | GPT-2, further trained it on a bunch of related papers, and used
        | snippets + perplexity to highlight potential errors. I remember
        | them reporting good accuracy when the flagged snippets were
        | analysed by humans. Perhaps this could work at a larger scale? (a
        | sort of "surprise" factor)
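        | 
        | As a rough illustration of that "surprise" factor, here is a
        | minimal sketch (assuming the HuggingFace transformers and torch
        | packages, and plain GPT-2 rather than a domain-tuned model) that
        | scores each token of a generated claim by its surprisal and
        | flags the most surprising ones for review:
        | 
        |   import torch
        |   from transformers import GPT2LMHeadModel, GPT2TokenizerFast
        | 
        |   tok = GPT2TokenizerFast.from_pretrained("gpt2")
        |   model = GPT2LMHeadModel.from_pretrained("gpt2")
        |   model.eval()
        | 
        |   def token_surprisals(text):
        |       """Surprisal (in nats) of each token given its prefix."""
        |       ids = tok(text, return_tensors="pt").input_ids
        |       with torch.no_grad():
        |           logits = model(ids).logits
        |       logprobs = torch.log_softmax(logits[:, :-1], dim=-1)
        |       targets = ids[:, 1:]
        |       surps = -logprobs.gather(
        |           2, targets.unsqueeze(-1)).squeeze(-1)[0]
        |       toks = tok.convert_ids_to_tokens(targets[0])
        |       return list(zip(toks, surps.tolist()))
        | 
        |   # Highest-surprisal tokens are candidates for review.
        |   claim = "The first Moon landing took place in 1972."
        |   for t, s in sorted(token_surprisals(claim),
        |                      key=lambda p: -p[1])[:5]:
        |       print(f"{t!r}: {s:.1f} nats")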
        
       | aredox wrote:
       | Oh, in the middle of "AI is PhD-level" propaganda (just check
       | Google News to see this is not a strawman argument), some people
       | finally admit in passing "no LLM has ever made a breakthrough".
       | 
       | (See original argument:
       | https://nitter.net/dwarkesh_sp/status/1727004083113128327 )
        
         | bonoboTP wrote:
         | I agree there's an equivocation going on for "PhD level"
         | between "so smart, it could get a PhD" (as in come up with and
         | publish new research and defend its own thesis) and "it can
         | solve quizzes at the level that PhDs can".
        
           | washadjeffmad wrote:
           | Services that make this claim are paying people with PhDs to
           | ask their models questions and then provide feedback on the
           | responses with detailed reasoning.
        
       | ashdksnndck wrote:
       | I'm not sure we can accept the premise that LLMs haven't made any
       | breakthroughs. What if people aren't giving the LLM credit when
       | they get a breakthrough from it?
       | 
       | First time I got good code out of a model, I told my friends and
       | coworkers about it. Not anymore. The way I see it, the model is a
       | service I (or my employer) pays for. Everyone knows it's a tool
       | that I can use, and nobody expects me to apportion credit for
       | whether specific ideas came from the model or me. I tell people I
       | code with LLMs, but I don't commit a comment saying "wow, this
       | clever bit came from the model!"
       | 
       | If people are getting actual bombshell breakthroughs from LLMs,
       | maybe they are rationally deciding to use those ideas without
       | mentioning the LLM came up with it first.
       | 
       | Anyway, I still think Gwern's suggestion of a generic idea-lab
       | trying to churn out insights is neat. Given the resources needed
       | to fund such an effort, I could imagine that a trading shop would
       | be a possible place to develop such a system. Instead of looking
       | for insights generally, you'd be looking for profitable trades.
       | Also, I think you'd do a lot better if you have relevant experts
       | to evaluate the promising ideas, which means that more focused
       | efforts would be more manageable. Not comparing everything to
       | everything, but comparing everything to stuff in the expert's
       | domain.
       | 
       | If a system like that already exists at Jane Street or something,
       | I doubt they are going to tell us about it.
        
         | Yizahi wrote:
          | This is bordering on conspiracy theory. Thousands of people are
          | getting novel breakthroughs generated purely by LLMs and not a
          | single person discloses such a result? Not even one of the
          | countless LLM corporation engineers, who depend on billion-
          | dollar IV injections from deluded bankers just to keep
          | surviving, has bragged about an LLM pulling off that
          | revolution? Hard to believe.
        
           | esafak wrote:
           | Countless people are increasing their productivity and
           | talking about it here ad nauseam. Even researchers are
           | leaning on language models; e.g.,
           | https://mathstodon.xyz/@tao/114139125505827565
           | 
            | We haven't successfully resolved famous unsolved research
            | problems through language models yet, but one can imagine that
            | they will solve increasingly challenging problems over time.
            | And if it happens in the hands of a researcher rather than the
            | model's lab, one can also imagine that the researcher will
            | take the credit, so you will still have the same question.
        
             | AIPedant wrote:
             | The actual posts totally undermine your point:
             | My general sense is that for research-level mathematical
             | tasks at least, current models fluctuate between "genuinely
             | useful with only broad guidance from user" and "only useful
             | after substantial detailed user guidance", with the most
             | powerful models having a greater proportion of answers in
             | the former category.  They seem to work particularly well
             | for questions that are so standard that their answers can
             | basically be found in existing sources such as Wikipedia or
             | StackOverflow; but as one moves into increasingly obscure
             | types of questions, the success rate tapers off (though in
             | a somewhat gradual fashion), and the more user guidance (or
             | higher compute resources) one needs to get the LLM output
             | to a usable form. (2/2)
        
             | Yizahi wrote:
              | Increasing productivity is nice and commendable, but it is
              | NOT an LLM making a breakthrough on its own, which is the
              | topic of Gwern's article.
        
             | dingnuts wrote:
              | There is a LOT of money on this message board trying to
              | convince us of the utility of these machines, and yes,
              | people talk about it ad nauseam, in vague terms that are
              | unlike anything I see in the real world, with few examples.
             | 
             | Show me the code. Show me your finished product.
        
           | BizarroLand wrote:
            | I wonder if it's not the LLM making the breakthrough but
            | rather that the person using the system just needed the
            | available information presented in a clear and orderly
            | fashion to make the breakthrough themselves.
            | 
            | After all, the LLM currently has no cognizance; it is unable
            | to understand what it is saying in a meaningful way. At its
            | best it is a philosophical-zombie machine, right?
           | 
           | In my opinion anything amazing that comes from an LLM only
           | becomes amazing when someone who was capable of recognizing
           | the amazingness perceives it, like a rewrite of a zen koan,
           | "If an LLM generates a new work of William Shakespeare, and
           | nobody ever reads it, was anything of value lost?"
        
         | nico wrote:
         | > but I don't commit a comment saying "wow, this clever bit
         | came from the model!"
         | 
         | The other day, Claude Code started adding a small signature to
         | the commit messages it was preparing for me. It said something
         | like "This commit was co-written with Claude Code" and a little
         | robot emoji
         | 
         | I wonder if that just happened by accident or if Anthropic is
         | trying to do something like Apple with the "sent from my
         | iPhone"
        
           | danielbln wrote:
           | See https://docs.anthropic.com/en/docs/claude-
           | code/settings#avai..., specifically `includeCoAuthoredBy`
        
             | nico wrote:
             | Thank you. And I guess they are trying to do the Apple
             | thing by making that option true by default
        
           | morsch wrote:
           | Aider does the same thing (and has a similar setting). I tend
           | to squash the AI commits and remove it that way, though I
           | suppose a flag indicating the degree of AI authorship could
           | be useful.
        
           | catigula wrote:
           | letting claude pen commits is wild.
        
             | SheinhardtWigCo wrote:
             | It's great. It's become my preferred workflow for vibe
             | coding because it writes great commit messages, gives you a
             | record of authorship, and rollbacks use far fewer tokens.
             | You don't have to (and probably shouldn't) let it push to
             | the remote branch.
        
         | therealpygon wrote:
         | It is hard to accept as a premise because the premise is
         | questionable from the beginning.
         | 
         | Google already reported several breakthroughs as a direct
         | result of AI, using processes that almost certainly include
         | LLMs, including a new solution in math, improved chip designs,
         | etc. DeepMind has AI that predicted millions of protein folds
         | which are already being used in drugs among many other things
         | they do, though yes, not an LLM per se. There is certainly the
         | probability that companies won't announce things given that the
         | direct LLM output isn't copyrightable/patentable, so a human-
         | in-the-loop solves the issue by claiming the human made said
         | breakthrough with AI/LLM assistance. There isn't much benefit
         | to announcing how much AI helped with a breakthrough unless
         | you're engaged in basically selling AI.
         | 
         | As for "why aren't LLMs creating breakthroughs by themselves
         | regularly", that answer is pretty obvious... they just don't
         | really have that capacity in a meaningful way based on how they
         | work. The closest example is Google's algorithmic breakthrough
         | absolutely was created by a coding LLM, which was effectively
         | achieved through brute force in a well established domain, but
         | that doesn't mean it wasn't a breakthrough. That alone casts
         | doubt on the underlying premise of the post.
        
           | Yizahi wrote:
            | You are contradicting yourself. Either LLM programs can make
            | breakthroughs on their own, or they don't have that capacity
            | in a meaningful way based on how they work.
        
           | js8 wrote:
            | I would say that the real breakthrough was training NNs as a
            | way to create practical approximators for very complex
            | functions over some kind of many-valued logic. Why they work
            | so well in practice we still don't fully understand
            | theoretically (in the sense that we don't know what kind of
            | underlying logic best models what we want from these
            | systems). The LLMs (and the application to natural language)
            | are just a consequence of that.
        
           | starlust2 wrote:
           | > through brute force
           | 
           | The same is true of humanity in aggregate. We attribute
           | discoveries to an individual or group of researchers but to
           | claim humans are efficient at novel research is a form of
           | survivorship bias. We ignore the numerous researchers who
           | failed to achieve the same discoveries.
        
             | suddenlybananas wrote:
             | The fact some people don't succeed doesn't show that humans
             | operate by brute force. To claim humans reason and invent
             | by brute force is patently absurd.
        
               | preciousoo wrote:
               | It's an absurd statement because you are human and are
               | aware of how research works on an individual level.
               | 
               | Take yourself outside of that, and imagine you invented
               | earth, added an ecosystem, and some humans. Wheels were
               | invented ~6k years ago, and "humans" have existed for
               | ~40-300k years. We can do the same for other
               | technologies. As a group, we are incredibly inefficient,
                | and an outside observer would see our efforts at building
                | societies, and our failures, as "brute force"
        
               | Nevermark wrote:
               | I consider humans an "intelligent" species in the sense
               | that a critical mass of us can organize to sustainably
               | learn.
               | 
               | As individuals, without mentors, we would each die off
               | very quickly. Even if we were fed and whatever until we
               | were physically able to take care of ourselves, we
               | wouldn't be able to keep ourselves out of trouble if we
               | had to learn everything ourselves.
               | 
                | Contrast this with the octopus, which develops from an egg
                | without any mentorship, and within a year or so has a
                | fantastically knowledgeable and creative mind over its
                | respective environment. And they thrive in every oceanic
                | environment in the wet salty world, from coastlines, to
                | under permanent Arctic ice, to the deep sea.
               | 
               | To whatever degree they are "intelligent", it's an
               | amazingly accelerated, fully independent, self-taught
               | intelligence. Our species just can't compare on that
               | dimension.
               | 
               | Fortunately, octopus only live a couple years and in an
               | environment where technology is difficult (very hard to
               | isolate and control conditions of all kinds in the
               | ocean). Otherwise, the land octopus would have eaten all
               | of us long ago.
        
               | tmaly wrote:
               | What about Dyson and Alexander Graham Bell ?
        
               | drdaeman wrote:
               | Does "brute force" allow for heuristics and direction?
               | 
               | If it doesn't ("brute" as opposite of "smart", just dumb
               | iteration to exhaustion) then you're right, of course.
               | 
               | But if it does, then I'm not sure it's _patently absurd_
               | - novel ideas could be merely a matter of chance of
               | having all the precursors together at the right time, a
               | stochastic process. And it scales well, bearing at least
               | some _resemblance_ to brute force approaches - although
               | the term is not entirely great (something around
               | "stochastic", "trial-and-error", and "heuristic" is
               | probably a better term).
        
               | therealpygon wrote:
               | You don't consider thousands of scientists developing
               | competing, and often incorrect, solutions for a single
               | domain as a "brute force" attempt by humanity, but do
               | when the same occurs with disparate solutions from
               | parallel LLM attempts? That's certainly an...opinion.
        
         | kajumix wrote:
          | Most interesting novel ideas originate at the intersection of
          | multiple disciplines. Profitable trades could be found in the
          | biomedicine sector when knowledge of biomedicine and finance is
          | combined. That's where I see LLMs shining, because they span
          | disciplines far more than any human can. Once we figure out a
          | way to have them combine ideas (similar to what Gwern is
          | suggesting), there will be, I suspect, a flood of novel and
          | interesting ideas, inconceivable to humans alone.
        
         | PaulHoule wrote:
         | Almost certainly an LLM has, in response to a prompt and
         | through sheer luck, spat out the kernel of an idea that a
         | super-human centaur of the year 2125 would see as
         | groundbreaking that hasn't been recognized as such.
         | 
         | We have a thin conception of genius that can be challenged by
         | Edison's "1% inspiration, 99% perspiration" or the process of
          | getting a PhD where you might spend 7 years getting to the point
         | where you can start adding new knowledge and then take another
         | 7 years to really hit your stride.
         | 
         | I have a friend who is 50-something and disabled with some
         | mental illness, he thinks he has ADHD. We had a conversation
         | recently where he repeatedly expressed his fantasy that he
         | could show up somewhere with his unique perspective and
         | sprinkle some pixie dust on their problems and be rewarded for
         | it. I found it exhausting. When I would hear his ideas, or if I
         | hear any idea, I immediately think "how would we turn this into
         | a product and sell it?" or "write a paper about it?" or
         | "convince people of it?" and he would have no part of it and
         | think that operationalizing or advocating for that was
         | uninteresting and that somebody else would do all that work and
         | my answer is -- they might, but not without the advocacy.
         | 
         | And it comes down to that.
         | 
         | If an LLM were to come up with a groundbreaking idea and be
         | recognized as having a groundbreaking idea it would have to do
          | a sustained amount of work, say at least 2 person-years of
          | equivalent effort, to win people over. And they aren't anywhere
          | near
         | equipped to do that, nobody is going to pay the power bill to
         | do that, and if you were paying the power bill you'd probably
         | have to pay the power bill for a million of them to go off in
         | the wrong direction.
        
       | blueflow wrote:
       | I have not yet seen AI doing a critical evaluation of data
        | sources. AI will contradict primary sources if the contradiction
       | is more prevalent in the training data.
       | 
       | Something about the whole approach is bugged.
       | 
       | My pet peeve: "Unix System Resources" as explanation for the /usr
       | directory is a term that did not exist until the turn of the
        | millennium (rumor is that a c't journalist made it up in 1999),
        | but AI will retcon it into the FHS (5 years earlier) or into
        | Ritchie/Thompson/Kernighan (27 years earlier).
        
         | _heimdall wrote:
         | > Something about the whole approach is bugged.
         | 
         | The bug is that LLMs are fundamentally designed for natural
         | language processing and prediction, _not_ logic or reasoning.
         | 
         | We may get to actual AI eventually, but an LLM architecture
         | either won't be involved at all or it will act as a part of the
         | system mimicking the language center of a brain.
        
       | zhangjunphy wrote:
        | I also hope we have something like this. But sadly, this is not
        | going to work. The reason is this line from the article, which is
        | so much harder than it looks:
        | 
        | > and a critic model filters the results for genuinely valuable
        | ideas.
        | 
        | In fact, people have tried this idea. And if you use an LLM or
        | anything similar as the critic, the performance of the model
        | actually degrades in this process, because the LLM tries too hard
        | to satisfy the critic, and the critic itself is far from a good
        | reasoner.
        | 
        | So the reason we don't hear much about this idea is not that
        | nobody tried it, but that they tried, it didn't work, and people
        | are reluctant to publish about something which does not work.
        
         | imiric wrote:
         | Exactly.
         | 
         | This not only affects a potential critic model, but the entire
         | concept of a "reasoning" model is based on the same flawed idea
         | --that the model can generate intermediate context to improve
         | its final output. If that self-generated context contains
         | hallucinations, baseless assumptions or doubt, the final output
         | can only be an amalgamation of that. I've seen the "thinking"
         | output arrive at a correct solution in the first few steps, but
         | then talk itself out of it later. Or go into logical loops,
         | without actually arriving at anything.
         | 
         | The reason why "reasoning" models tend to perform better is
         | simply due to larger scale and better training data. There's
         | nothing inherently better about them. There's nothing
         | intelligent either, but that's a separate discussion.
        
           | yorwba wrote:
           | Reasoning models are trained from non-reasoning models of the
           | same scale, and the training data is the output of the same
           | model, filtered through a verifier. Generating intermediate
           | context to improve the final output is not an idea that
           | reasoning models are based on, but an outcome of the training
           | process. Because empirically it does produce answers that
           | pass the verifier more often if it generates the intermediate
           | steps first.
           | 
           | That the model still makes mistakes doesn't mean it's not an
           | improvement: the non-reasoning base model makes even more
           | mistakes when it tries to skip straight to the answer.
        
             | imiric wrote:
             | Thanks. I trust that you're more familiar with the
             | internals than myself, so I stand corrected.
             | 
             | I'm only speaking from personal usage experience, and don't
             | trust benchmarks since they are often gamed, but if this
             | process produces objectively better results that aren't
             | achieved by scaling up alone, then that's a good thing.
        
           | danenania wrote:
           | > The reason why "reasoning" models tend to perform better is
           | simply due to larger scale and better training data.
           | 
           | Except that we can try the exact same pre-trained model with
           | reasoning enabled vs. disabled and empirically observe that
           | reasoning produces better, more accurate results.
        
             | imiric wrote:
             | I'm curious: can you link to any tests that prove this?
             | 
             | I don't trust most benchmarks, but if this can be easily
             | confirmed by an apples-to-apples comparison, then I would
             | be inclined to believe it.
        
               | danenania wrote:
               | Check out the DeepSeek paper.
               | 
               | Research/benchmarks aside, try giving a somewhat hard
               | programming task to Opus 4 with reasoning off vs. on.
               | Similarly, try the same with o3 vs. o3-pro (o3-pro
               | reasons for much longer).
               | 
               | I'm not going to dig through my history for specific
               | examples, but I do these kinds of comparisons
               | occasionally when coding, and it's not unusual to have
               | e.g. a bug that o3 can't figure out, but o3-pro can. I
               | think this is widely accepted by engineers using LLMs to
               | help them code; it's not controversial.
        
               | imiric wrote:
               | Huh, I wasn't aware that reasoning could be toggled. I
               | use the OpenRouter API, and just saw that this is
               | supported both via their web UI and API. I'm used to
               | Sonnet 3.5 and 4 without reasoning, and their performance
               | is roughly the same IME.
               | 
               | I wouldn't trust comparing two different models, even
               | from the same provider and family, since there could be
               | many reasons for the performance to be different. Their
               | system prompts, training data, context size, or runtime
               | parameters could be different. Even the same model with
               | the same prompt could have varying performance. So it's
               | difficult to get a clear indication that the reasoning
               | steps are the only changing variable.
               | 
               | But toggling it on the same model would be a more
               | reliable way to test this, so I'll try that, thanks.
        
               | jacobr1 wrote:
                | It depends on the problem domain you have and the way you
                | prompt things. Basically, reasoning is better in cases
                | where using the same model to critique itself over
                | multiple turns would be better.
                | 
                | With code, for example, a single shot without reasoning
                | might hallucinate a package or not conform to the rest of
                | the project's style. Then you ask the LLM to check, and
                | then ask it to revise itself to fix the issue. If the base
                | model can do that, then turning on reasoning basically
                | allows it to self-check for those self-correctable
                | features.
               | 
               | When generating content, you can ask it to consider or
               | produce intermediate deliverables like summaries of input
               | documents that it then synthesizes into the whole. With
               | reasoning on, it can do the intermediate steps and then
               | use that.
               | 
               | The main advantage is that the system is autonomously
               | figuring out a bunch of intermediate steps and working
               | through it. Again no better than it probably could do
               | with some guidance on multiple interactions - but that
               | itself is a big productivity benefit. The second gen (or
               | really 1.5 gen) reasoning models also seem to have been
               | trained on enough reasoning traces that they are starting
               | to know about additional factors to consider so the
               | reasoning loop is tighter.
        
         | amelius wrote:
         | But what if the critic is just hard reality? If you ask an LLM
         | to write a computer program, instead of criticizing it, you can
         | run it and test it. If you ask an LLM to prove a theorem, let
         | it write the proof in a formal logic language so it can be
         | verified. Etcetera.
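          | 
          | A minimal sketch of such a "hard reality" critic for code
          | (assuming pytest is installed; the file names and the
          | surrounding generation loop are hypothetical): run each
          | candidate's tests in a subprocess and keep only candidates
          | that pass.
          | 
          |   import subprocess
          |   import sys
          |   import tempfile
          |   from pathlib import Path
          | 
          |   def reality_check(candidate: str, tests: str) -> bool:
          |       """True only if the generated code passes its tests."""
          |       with tempfile.TemporaryDirectory() as tmp:
          |           Path(tmp, "candidate.py").write_text(candidate)
          |           Path(tmp, "test_candidate.py").write_text(tests)
          |           result = subprocess.run(
          |               [sys.executable, "-m", "pytest", "-q", tmp],
          |               capture_output=True, text=True, timeout=60)
          |           return result.returncode == 0
          | 
          |   # Hypothetical usage: surface only outputs that survive.
          |   # candidates = [llm_generate(prompt) for _ in range(16)]
          |   # good = [c for c in candidates if reality_check(c, tests)]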
        
           | Yizahi wrote:
            | Generated code only works because the "test" part
            | (compile/validate/analyze etc.) is completely external and was
            | written before any mass-market LLMs. There is no such external
            | validator for new theorems, books, pictures, text guides etc.
            | You can't just run hard_reality.exe on a generated poem or a
            | scientific paper to deem it "correct". That is only possible
            | with programming languages, and even then not always.
        
             | amelius wrote:
             | Science is falsifiable by definition, and writing
             | poems/books is not the kind of problem of interest here.
             | 
             | > There is no such external validator for new theorems
             | 
             | There are formal logic languages that will allow you to do
             | this.
        
               | Yizahi wrote:
                | Your proposed approach to science would result in only an
                | extremely tiny subset of math theorems being proven by
                | automation. And it is questionable whether those theorems
                | would even be useful. A good mathematician with CS
                | experience can probably write a generator of new useless
                | theorems, something along the lines of "is every
                | sequential cube plus the square of a number divisible by
                | the root of the seventh smallest prime multiplied by log n
                | of that number plus blabla...". One can generate such
                | theorems and formally prove or disprove them, yes.
               | 
               | On the other hand any novel science usually requires deep
               | and wide exploratory research, often involving hard or
                | flawed experimentation or observation. One can train an
                | LLM on a PhD curriculum in astrophysics, then provide that
                | LLM with an API to some new observatory and instruct it to
                | "go prove the cosmological constant". And it will do so,
                | but
               | the result will be generated garbage because there is no
               | formal way to prove such results. There is no formal way
               | to prove why pharaohs decided to stop building pyramids,
               | despite there being some decent theories. This is science
               | too, you know. You can't formally prove that some gene
               | sequence is responsible for trait X etc.
               | 
               | I would say a majority of science is not formally
               | provable.
               | 
               | And lastly, you dismiss books/texts, but that is a huge
               | chunk of intellectual and creative work of humans. Say
                | you are an engineer and you have a CAD model with a list
                | of parts and parameters for a rocket, for example. Now you
                | need to write a guide for it. An LLM can do that; it can
                | generate guide-looking output. The issue is that there is
                | no way to automatically prove it correct or find issues in
                | it. And there are lots of tasks like that.
        
               | amelius wrote:
               | I think the problem here is that you assume the LLM has
               | to operate isolated from the world, i.e. without
               | interaction. If you put a human scientist in isolation,
               | then you cannot have high expectations either.
        
               | Yizahi wrote:
                | I don't assume the LLM would be isolated; I assume the LLM
                | would be incapable of interacting in any meaningful way on
                | its own (i.e. unless triggered by direct input from a
                | programmer).
        
               | jacobr1 wrote:
               | > You can't formally prove that some gene sequence is
               | responsible for trait X etc.
               | 
               | Maybe not formally in some kind of mathematical sense.
               | But you certainly could have simulation models of protein
               | synthesis, and maybe even higher order simulation of
               | tissues and organs. You could also let the ai scientist
               | verify the experimental hypothesis by giving access to
               | robotic lab processes. In fact it seems we are going down
               | both fronts right now.
        
               | Yizahi wrote:
                | Nobody argues that LLMs aren't useful for bulk processing
                | of a billion datapoints or looking for obscure
                | correlations in unedited data. But the premise of Gwern's
                | article is that to be considered thinking, an LLM must
                | initiate such a search on its own and arrive at a novel
                | conclusion on its own.
               | 
               | Basically if:
               | 
                | A) Scientist has an idea > triggers an LLM program to sift
                | through a ton of data > LLM prints out correlation results
                | > scientist reads them and proves/disproves the idea. In
                | this case, while the LLM did the bulk of the work, it did
                | not arrive at a breakthrough on its own.
                | 
                | B) LLM is idling > LLM triggers some API to get some
                | specific set of data > LLM correlates results > LLM prints
                | out a complete hypothesis with proof (or disproves it). In
                | this case we can say that the LLM made a breakthrough.
        
             | yunohn wrote:
             | IME, on a daily basis, Claude Code (supposed SoTA agent)
             | constantly disables and bypasses tests and checks on my
             | codebase - despite following clear prompting guidelines and
             | all the /woo/ like ultrathink etc.
        
           | zhangjunphy wrote:
            | I think if we had a good enough simulation of reality, and a
            | fast one (something like an accelerable Minecraft with real-
            | world physics), then this idea might actually work. But the
            | hard reality we can currently generate efficiently and feed
            | into LLMs usually has a narrow scope. It feels like teaching a
            | kid only textbook math for several years and nothing else.
            | The LLM mostly overoptimizes in these very specific fields,
            | and the overall performance might even be worse.
        
             | dpoloncsak wrote:
             | Its gotta be G-Mod
        
               | leetbulb wrote:
               | There will never be a computer powerful enough to
               | simulate that many paperclips and explosive barrels.
        
           | jerf wrote:
           | Those things are being done. Program testing is now off-the-
           | shelf tech, and as for math proofs, see: https://www.geeky-
           | gadgets.com/google-deepmind-alphaproof/
        
         | imtringued wrote:
         | That didn't stop actor-critic from becoming one of the most
         | popular deep RL methods.
        
           | zhangjunphy wrote:
            | True, and the successful ones usually require an external
            | source of information. For AlphaGo, it is the simple
            | algorithm which decides who is the winner of a game of Go.
            | For GANs, it is the images labeled by humans. In these
            | scenarios, the critic is the medium which transforms external
            | information into gradients that optimize the actor, but it is
            | not the direct source of that information.
        
         | jsbg wrote:
         | > the LLM tries too hard to satisfy the critic
         | 
         | The LLM doesn't have to know about the critic though. It can
         | just output things and the critic is a second process that
         | filters the output for the end user.
        
       | jumploops wrote:
       | How do you critique novelty?
       | 
       | The models are currently trained on a static set of human
       | "knowledge" -- even if they "know" what novelty is, they aren't
       | necessarily incentivized to identify it.
       | 
       | In my experience, LLMs currently struggle with new ideas, doubly
       | true for the reasoning models with search.
       | 
        | What makes novelty difficult is that the ideas should be
       | nonobvious (see: the patent system). For example, hallucinating a
       | simpler API spec may be "novel" for a single convoluted codebase,
       | but it isn't novel in the scope of humanity's information bubble.
       | 
       | I'm curious if we'll have to train future models on novelty
       | deltas from our own history, essentially creating synthetic time
       | capsules, or if we'll just have enough human novelty between
       | training runs over the next few years for the model to develop an
       | internal fitness function for future novelty identification.
       | 
       | My best guess? This may just come for free in a yet-to-be-
       | discovered continually evolving model architecture.
       | 
       | In either case, a single discovery by a single model still needs
       | consensus.
       | 
       | Peer review?
        
         | n4r9 wrote:
         | It's a good question. A related question is: "what's an example
         | of something undeniably novel?". Like if you ask an agent out
         | of the blue to prove the Collatz conjecture, and it writes out
         | a proof or counterexample. If that happens with LLMs then I'll
         | be a lot more optimistic about the importance to AGI.
         | Unfortunately, I suspect it will be a lot murkier than that -
         | many of these big open questions will get chipped away at by a
         | combination of computational and human efforts, and it will be
         | impossible to pinpoint where the "novelty" lies.
        
           | jacobr1 wrote:
           | Good point. Look at patents. Few are truly novel in some
           | exotic sense of "the whole idea is something never seen
           | before." Most likely it is a combination of known factors
           | applied in a new way, or incremental development improving on
           | known techniques. In a banal sense, most LLM content
           | generated is novel, in that the specific paragraphs might be
           | unique combinations of words, even if the ideas are just
           | slightly rearranged regurgitations.
           | 
            | So I strongly agree that, especially when we are talking
            | about the bulk of human discovery and invention, the
            | incrementalism will be increasingly within striking distance
            | of human/AI collaboration. Attribution of the novelty in
            | these cases is going to be unclear when the task is,
            | simplified, something like "search for combinations of things
            | in this problem domain that do the task better than some
            | benchmark", be that drug discovery, maths, AI itself, or
            | whatever.
        
         | zbyforgotp wrote:
          | I think our minds don't use novelty but salience, and salience
          | might also be easier to implement.
        
       | OtherShrezzing wrote:
       | Google's effort with AlphaEvolve shows that the Daydream Factory
       | approach might not be the big unlock we're expecting. They spent
       | an obscene amount of compute to discover a marginal improvement
       | over the state of the art in a very narrow field. Hours after
       | Google published the paper, mathematicians pointed out that their
        | SOTA algorithms underperformed compared to techniques published
        | 50 years ago.
       | 
       | Intuitively, it doesn't feel like scaling up to "all things in
       | all fields" is going to produce substantial breakthroughs, if the
       | current best-in-class implementation of the technique by the
        | world's leading experts returned modest results.
        
       | khalic wrote:
       | Ugh, again with the anthropomorphizing. LLMs didn't come up with
       | anything new because _they don't have agency_ and _do not
       | reason_...
       | 
       | We're looking at our reflection and asking ourselves why it isn't
       | moving when we don't
        
         | yorwba wrote:
         | If you look at your reflection in water, it may very well move
         | even though you don't. Similarly, you don't need agency or
         | reasoning to create something new, random selection from a
         | large number of combinations is enough, correct horse battery
         | staple.
         | 
         | Of course random new things are typically bad. The article is
         | essentially proposing to generate lots of them anyway and try
         | to filter for only the best ones.
        
           | RALaBarge wrote:
            | I agree that brute forcing is a method, and it is how nature
            | does it. The problem would still be the same: how would it,
            | or other LLMs, know whether the idea is novel and
            | interesting?
            | 
            | Given access to unlimited data, LLMs likely could spot novel
            | trends that we can't, but they still can't judge the value of
            | creating something unique that they have never encountered
            | before.
        
             | RALaBarge wrote:
             | Yet.
        
         | amelius wrote:
         | > anthropomorphizing
         | 
          | Gwern isn't doing that here. They say: "[LLMs] lack some
          | fundamental aspects of human thought", and then investigate
          | that.
        
       | cranium wrote:
       | I'd be happy to spend my Claude Max tokens during the night so it
       | can "ultrathink" some Pareto improvements to my projects. So far,
        | I've mostly seen lateral moves that rewrite code rather than
        | rearchitect/redesign the project.
        
       | precompute wrote:
        | Variations on increasing compute and filtering results aside, the
        | only way out of this rut is another breakthrough as big as, or
        | bigger than, transformers. A lot of money is being spent on
        | rebranding practical use-cases as innovation because there's a
        | severe lack of innovation in this sphere.
        
       | pilooch wrote:
        | AlphaEvolve and similar systems based on map-elites + DL/LLM + RL
        | appear to be one of the promising paths.
        | 
        | Setting up the map-elites dimensions may still be problem-
        | specific, but this could be learnt unsupervisedly, at least
        | partially.
        | 
        | The way I see LLMs is as a search space over tokens that
        | manipulates broad concepts within a complex and not-so-smooth
        | manifold. These concepts can be refined within other spaces
        | (pixel space, physical spaces, ...)
        
       | guelo wrote:
       | In a recent talk [0] Francois Chollet made it sound like all the
        | frontier models are doing _Test-Time Adaptation_, which I think
        | is a similar concept to the _Dynamic evaluation_ that Gwern says
        | is not being done. Apparently _Test-Time Adaptation_ encompasses
        | several techniques, some of which modify model weights and some
        | that don't, but they are all about on-the-fly learning.
       | 
       | [0] https://www.youtube.com/watch?v=5QcCeSsNRks&t=1542s
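        | 
        | For reference, a minimal sketch of the weight-modifying flavour
        | (classic dynamic evaluation), assuming PyTorch and HuggingFace
        | transformers with plain GPT-2; the frontier labs' actual test-
        | time recipes are not public, so this is only illustrative. The
        | idea: take a few gradient steps on the text seen so far before
        | generating what comes next.
        | 
        |   import torch
        |   from transformers import GPT2LMHeadModel, GPT2TokenizerFast
        | 
        |   tok = GPT2TokenizerFast.from_pretrained("gpt2")
        |   model = GPT2LMHeadModel.from_pretrained("gpt2")
        |   opt = torch.optim.SGD(model.parameters(), lr=1e-4)
        | 
        |   def adapt_on_context(context: str, steps: int = 3):
        |       """Nudge the weights toward the test-time context."""
        |       ids = tok(context, return_tensors="pt").input_ids
        |       model.train()
        |       for _ in range(steps):
        |           loss = model(ids, labels=ids).loss
        |           opt.zero_grad()
        |           loss.backward()
        |           opt.step()
        |       model.eval()
        | 
        |   adapt_on_context("Domain text the model is reading now.")
        |   # Subsequent generations use the adapted weights.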
        
       | LourensT wrote:
       | Regardless of accusations of anthropomorphizing, continual
       | thinking seems to be a precursor to any sense of agency, simply
       | because agency requires _something_ to be running.
       | 
       | Eventually LLM output degrades when most of the context is its
       | own output. So should there also be an input stream of
       | experience? The proverbial "staring out the window", fed into the
       | model to keep it grounded and give hooks to go off?
        
       | amelius wrote:
       | Humans daydream about problems when they think a problem is
       | interesting. Can an LLM know when a problem is interesting and
       | thereby prune the daydream graph?
        
       | zild3d wrote:
       | > The puzzle is why
       | 
       | The feedback loop on novel/genuine breakthroughs is too long and
       | the training data is too small.
       | 
        | Another reason is that there's plenty of incentive to go after
        | the majority of the economy, which relies on routine knowledge
        | and maybe judgement; only a narrow slice actually requires
        | novel/genuine breakthroughs.
        
       | cs702 wrote:
       | The question is: How do we get LLMs to have "Eureka!" moments, on
       | their own, when their minds are "at rest," so to speak?
       | 
       | The OP's proposed solution is a constant "daydreaming loop" in
        | which an LLM does the following on its own, "unconsciously,"
       | as a background task, without human intervention:
       | 
       | 1) The LLM retrieves random facts.
       | 
       | 2) The LLM "thinks" (runs a chain-of-thought) on those retrieved
        | facts to see if there are any interesting connections between
       | them.
       | 
       | 3) If the LLM finds interesting connections, it promotes them to
       | "consciousness" (a permanent store) and possibly adds them to a
       | dataset used for ongoing incremental training.
       | 
       | It could work.
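        | 
        | A minimal sketch of that loop, with the LLM call hidden behind a
        | hypothetical complete(prompt) helper (wrap whatever API you use)
        | and a toy fact store; the critic here is just the same model
        | asked to score each connection, which, as other commenters note,
        | is the weak link:
        | 
        |   import json
        |   import random
        | 
        |   def complete(prompt: str) -> str:
        |       """Hypothetical wrapper around your LLM API of choice."""
        |       raise NotImplementedError
        | 
        |   facts = json.load(open("facts.json"))  # list of strings
        |   promoted = []                          # "conscious" store
        | 
        |   def daydream_step(threshold: float = 8.0):
        |       a, b = random.sample(facts, 2)     # 1) random facts
        |       idea = complete(                   # 2) chain-of-thought
        |           "Find a non-obvious connection between these facts.\n"
        |           f"Fact A: {a}\nFact B: {b}\nThink step by step.")
        |       verdict = complete(                # critic pass
        |           "Rate the novelty and plausibility of this idea from"
        |           f" 0 to 10. Reply with a number only.\n\n{idea}")
        |       try:
        |           score = float(verdict.strip())
        |       except ValueError:
        |           return                         # unparseable: discard
        |       if score >= threshold:             # 3) promote rare hits
        |           promoted.append(
        |               {"facts": [a, b], "idea": idea, "score": score})
        | 
        |   for _ in range(1000):                  # run "unconsciously"
        |       daydream_step()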
        
         | epcoa wrote:
          | Step 3 has been shown _not_ to work, over and over again; the
          | "find interesting connections" part is the hand-wavy magic at
          | this time. LLMs alone don't seem to be particularly adept at it
          | either.
        
           | cs702 wrote:
           | Has this been tried with reinforcement learning (RL)? As the
            | OP notes, it is plausible from an RL perspective that such a
           | bootstrap can work, because it would be (quoting the OP)
           | "exploiting the generator-verifier gap, where it is easier to
           | discriminate than to generate (eg laughing at a pun is easier
           | than making it)." The hit ratio may be tiny, so doing this
           | well would be very expensive.
        
       | kookamamie wrote:
       | > The puzzle is why
       | 
       | The breakthrough isn't in their datasets.
        
       | velcrovan wrote:
       | I'm once again begging people to read David Gelernter's 1994 book
       | "The Muse in the Machine". I'm surprised to see no mention of it
       | in Gwern's post, it's the exact book he should be reaching for on
       | this topic.
       | 
       | In examining the possibility of genuinely creative computing,
       | Gelernter discovers and defends a model of cognition that
       | explains so much about the human experience of creativity,
       | including daydreaming, dreaming, everyday "aha" moments, and the
       | evolution of human approaches to spirituality.
       | 
       | https://uranos.ch/research/references/Gelernter_1994/Muse%20...
        
       | sneak wrote:
       | Seems like an easy hypothesis to quickly smoke test with a couple
       | hundred lines of script, a wikipedia index, and a few grand
       | thrown at an API.
        
       | dr_dshiv wrote:
       | Yes! I've been prototyping dreaming LLMs based on my downloaded
       | history--and motivated by biomimetic design approaches. Just to
       | surface ideas to myself again.
        
       | A_D_E_P_T wrote:
       | > _You are a creative synthesizer. Your task is to find deep,
       | non-obvious, and potentially groundbreaking connections between
       | the two following concepts. Do not state the obvious. Generate a
       | hypothesis, a novel analogy, a potential research question, or a
       | creative synthesis. Be speculative but ground your reasoning._
       | 
       | > _Concept 1: {Chunk A}_ > _Concept 2: {Chunk B}_
       | 
       | In addition to the other criticisms mentioned by posters ITT, a
       | problem I see is: What concepts do you feed it?
       | 
       | Obviously there's a problem with GIGO. If you don't pick the
       | right concepts to begin with, you're not going to get a
       | meaningful result. But, beyond that, human discovery (in
        | mechanical engineering, at least) tends to be massively
       | interdisciplinary and serendipitous, so that _many_ concepts are
       | often involved, and many of those are _necessarily non-obvious_.
       | 
       | I guess you could come up with a biomimetics bot, but, besides
       | that, I'm not so sure how well this concept would work as laid
       | out above.
       | 
       | There's another issue in that LLMs tend to be extremely gullible,
        | and swallow the scientific literature and university press
       | releases verbatim and uncritically.
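        | 
        | One partial answer to the "what concepts do you feed it" question
        | is to bias pair selection toward chunks that are related but not
        | neighbours. A minimal sketch, assuming the sentence-transformers
        | and scikit-learn packages, a corpus of concept chunks (toy
        | examples below), and an arbitrarily chosen similarity band:
        | 
        |   import random
        |   from sentence_transformers import SentenceTransformer
        |   from sklearn.metrics.pairwise import cosine_similarity
        | 
        |   chunks = [          # stand-ins for your real concept chunks
        |       "CRISPR gene editing", "options market making",
        |       "ant colony foraging", "RAID storage layouts"]
        |   model = SentenceTransformer("all-MiniLM-L6-v2")
        |   sim = cosine_similarity(model.encode(chunks))
        | 
        |   def sample_pair(lo=0.2, hi=0.5, tries=1000):
        |       """Prefer pairs distant enough to be cross-domain but
        |       close enough that a connection is plausible."""
        |       for _ in range(tries):
        |           i, j = random.sample(range(len(chunks)), 2)
        |           if lo <= sim[i, j] <= hi:
        |               return chunks[i], chunks[j]
        |       return None  # nothing in the band; widen it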
        
       | sartak wrote:
       | From _The Metamorphosis of Prime Intellect_ (1994):
       | 
       | > Among Prime Intellect's four thousand six hundred and twelve
       | interlocking programs was one Lawrence called the
       | RANDOM_IMAGINATION_ENGINE. Its sole purpose was to prowl for new
       | associations that might fit somewhere in an empty area of the
       | GAT. Most of these were rejected because they were useless,
       | unworkable, had a low priority, or just didn't make sense. But
       | now the RANDOM_IMAGINATION_ENGINE made a critical connection, one
       | which Lawrence had been expecting it to make [...]
       | 
       | > Deep within one of the billions of copies of Prime Intellect,
       | one copy of the Random_Imagination_Engine connected two thoughts
       | and found the result good. That thought found its way to
       | conscious awareness, and because the thought was so good it was
       | passed through a network of Prime Intellects, copy after copy,
       | until it reached the copy which had arbitrarily been assigned the
       | duty of making major decisions -- the copy which reported
       | directly to Lawrence. [...]
       | 
       | > "I've had an idea for rearranging my software, and I'd like to
       | know what you think."
       | 
       | > At that Lawrence felt his blood run cold. He hardly understood
       | how things were working as it was; the last thing he needed was
       | more changes. "Yes?"
        
       | js8 wrote:
        | I am not sure why we should tie this to any concrete AI
        | technology such as LLMs. IMHO the biggest issue we have with AI
        | right now is that we don't know how to philosophically formalize
        | what we want. What is reasoning?
        | 
        | I am trying to answer that for myself. Since every logic is
        | expressible in untyped lambda calculus (as any computation is),
        | you could have a system that just somehow generates terms and
        | beta-reduces them. Even in such a much simpler setting, what are
        | the "interesting" terms?
        | 
        | I have several answers, but my point is: you should simplify the
        | problem, and this question has not been answered even under such
        | a simple scenario.
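        | 
        | For concreteness, a minimal sketch of that simplified setting:
        | untyped lambda terms with de Bruijn indices, a random term
        | generator, and normal-order beta reduction. Everything here is a
        | toy; which normal forms count as "interesting" is exactly the
        | open question.
        | 
        |   import random
        |   from dataclasses import dataclass
        | 
        |   @dataclass(frozen=True)
        |   class Var:
        |       i: int      # de Bruijn index
        | 
        |   @dataclass(frozen=True)
        |   class Lam:
        |       body: object
        | 
        |   @dataclass(frozen=True)
        |   class App:
        |       f: object
        |       a: object
        | 
        |   def shift(t, d, c=0):
        |       """Shift free variables (index >= cutoff c) by d."""
        |       if isinstance(t, Var):
        |           return Var(t.i + d) if t.i >= c else t
        |       if isinstance(t, Lam):
        |           return Lam(shift(t.body, d, c + 1))
        |       return App(shift(t.f, d, c), shift(t.a, d, c))
        | 
        |   def subst(t, s, j=0):
        |       """Substitute s for index j in t, shifting down."""
        |       if isinstance(t, Var):
        |           if t.i == j:
        |               return s
        |           return Var(t.i - 1) if t.i > j else t
        |       if isinstance(t, Lam):
        |           return Lam(subst(t.body, shift(s, 1), j + 1))
        |       return App(subst(t.f, s, j), subst(t.a, s, j))
        | 
        |   def step(t):
        |       """One normal-order beta step, or None if normal."""
        |       if isinstance(t, App):
        |           if isinstance(t.f, Lam):
        |               return subst(t.f.body, t.a)
        |           r = step(t.f)
        |           if r is not None:
        |               return App(r, t.a)
        |           r = step(t.a)
        |           return App(t.f, r) if r is not None else None
        |       if isinstance(t, Lam):
        |           r = step(t.body)
        |           return Lam(r) if r is not None else None
        |       return None
        | 
        |   def random_term(depth, free=0):
        |       """Random closed term; free = enclosing binders."""
        |       if depth == 0 or (free and random.random() < 0.3):
        |           if free:
        |               return Var(random.randrange(free))
        |           return Lam(Var(0))
        |       if random.random() < 0.5:
        |           return Lam(random_term(depth - 1, free + 1))
        |       return App(random_term(depth - 1, free),
        |                  random_term(depth - 1, free))
        | 
        |   def normalize(t, budget=100):
        |       for _ in range(budget):
        |           nxt = step(t)
        |           if nxt is None:
        |               return t
        |           t = nxt
        |       return None  # probably diverges
        | 
        |   # Generate and normalize; most results are boring. Which
        |   # normal forms, if any, would count as "interesting"?
        |   for _ in range(5):
        |       print(normalize(random_term(5)))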
        
         | HarHarVeryFunny wrote:
         | Reasoning is chained what-if prediction, together with
         | exploration of alternatives (cf backtracking), and leans upon
         | general curiosity/learning for impasse resolution (i.e. if you
         | can't predict what-if, then have the curiosity to explore and
         | find out).
         | 
         | What the LLM companies are currently selling as "reasoning" is
          | mostly RL-based post-training whereby the model is encouraged to
         | predict tokens (generate reasoning steps) according to similar
         | "goals" seen in the RL training data. This isn't general case
         | reasoning, but rather just "long horizon" prediction based on
         | the training data. It helps exploit the training data, but
         | isn't going to generate novelty outside of the deductive
         | closure of the training data.
        
           | js8 wrote:
           | I am talking about reasoning in philosophical not logical
           | sense. In your definition, you're assuming a logic in which
           | reasoning happens, but when I am asking the question, I am
           | not presuming any specific logic.
           | 
           | So how do you pick the logic in which to do reasoning? There
           | are "good reasons" to use one logic over another.
           | 
           | LLMs probably learn some combination of logic rules
           | (deduction rules in commonly used logics), but cannot
           | guarantee they will be used consistently (i.e. choose a logic
           | for the problem and stick to it). How do you accomplish that?
           | 
           | And even then reasoning is more than search. If you can
           | reason, you should also be able to reason about more
           | effective reasoning (for example better heuristics to cutting
           | the search tree).
        
             | HarHarVeryFunny wrote:
             | OK, so maybe we're talking somewhat at cross purposes.
             | 
             | I was talking about the process/mechanism of reasoning -
             | how do our brains appear to implement the capability that
             | we refer to as "reasoning", and by extension how could an
             | AI do the same by implementing the same mechanisms.
             | 
              | If we accept prediction (i.e. use of past experience) as the
             | mechanistic basis of reasoning, then choice of logic
             | doesn't really come into it - it's more just a matter of
             | your past experience and what you have learnt. What
             | predictive rules/patterns have you learnt, both in terms of
             | a corpus of "knowledge" you can bring to bear, but also in
             | terms of experience with the particular problem domain -
             | what have you learnt (i.e. what solution steps can you
             | predict) about trying to reason about any given domain/goal
             | ?
             | 
             | In terms of consistent use of logic, and sticking to it,
             | one of the areas where LLMs are lacking is in not having
             | any working memory other than their own re-consumed output,
             | as well as an inability to learn beyond pre-training. With
             | both of these capabilities an AI could maintain a focus
             | (working memory) on the problem at hand (vs suffer from
             | "context rot") and learn consistent, or phased/whatever,
             | logic that has been successful in the past at solving
             | similar problems (i.e predicting actions that will lead to
             | solution).
        
               | js8 wrote:
                | But prediction as the basis for reasoning (in the
                | epistemological sense) requires the goal to be given from
                | the outside, in the form of the system that is to be
                | predicted. And I would even say that this problem (making
                | good predictions) has been solved by RL.
               | 
                | Yet the consensus seems to be that we don't quite have
                | AGI; so what gives? Clearly just making good predictions
                | is not enough. (I would say current models are empiricist
                | to the extreme; but there is also the rationalist
                | position, which emphasizes logical consistency over
                | prediction accuracy.)
               | 
                | So, in my original comment, I lament that we don't really
                | know what we want (what the objective is). The post
                | doesn't clarify much either. And I claim this issue
                | occurs even with much simpler systems than reality-
                | connected LLMs, such as the lambda calculus.
        
               | HarHarVeryFunny wrote:
               | > But prediction as the basis for reasoning (in
               | epistemological sense) requires the goal to be given from
               | the outside, in the form of the system that is to be
               | predicted.
               | 
               | Prediction doesn't have goals - it just has inputs (past
               | and present) and outputs (expected inputs). Something
               | that is on your mind (perhaps a "goal") is just a
               | predictive input that will cause you to predict what
               | happens next.
               | 
               | > And I would even say that this problem (giving
               | predictions) has been solved by RL.
               | 
                | Making predictions is of limited use if you don't have
                | the feedback loop telling you when your predictions are
                | right or wrong (so you can update them next time). Having
                | that feedback (as our brain does) when a prediction is
                | wrong is the basis of curiosity - it causes us to explore
                | new things and learn about them.
               | 
               | > Yet, the consensus seems to be we don't quite have AGI;
               | so what gives? Clearly just making good predictions is
               | not enough.
               | 
               | Prediction is important, but there are lots of things
               | missing from LLMs such as ability to learn, working
               | memory, innate drives (curiosity, boredom), etc.
        
       | HarHarVeryFunny wrote:
       | > Despite impressive capabilities, large language models have yet
       | to produce a genuine breakthrough. The puzzle is why.
       | 
        | I don't see why this is remotely surprising. Despite all the
        | hoopla, LLMs are not AGI or artificial brains - they are predict-
        | next-word language models. By design they are not built for
        | creativity; quite the opposite, they are designed to continue the
        | input in the way best suggested by the training data - they are
        | essentially built for recall, not creativity.
       | 
        | For an AI to be creative it needs to have innate human/brain-like
        | features such as novelty-driven (prediction-failure) curiosity
        | and boredom, as well as the ability to learn continuously. IOW if
        | you want the AI to be creative it needs to be able to learn for
        | itself, not just regurgitate the output of others, and to have
        | these innate mechanisms that will cause it to pursue discovery.
        
         | karmakaze wrote:
          | Yes, LLMs choose probable sequences because they recognize
          | similarity. Because of that, they can diverge from similarity
          | to be creative: increase the temperature. What LLMs don't have
          | is (good) taste - we need to build an artificial tongue and
          | feed it as a prerequisite.
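          | 
          | (A minimal sketch of what "increase the temperature"
          | does mechanically - temperature-scaled softmax sampling
          | over a made-up logit vector; this is not any particular
          | model's API:)
          | 
          |       import numpy as np
          | 
          |       def sample(logits, temperature=1.0, rng=None):
          |           # Scale logits by 1/T before softmax: T > 1
          |           # flattens the distribution (more surprising
          |           # picks), T < 1 sharpens it (more probable).
          |           if rng is None:
          |               rng = np.random.default_rng()
          |           z = np.asarray(logits) / max(temperature, 1e-8)
          |           z = z - z.max()        # numerical stability
          |           p = np.exp(z) / np.exp(z).sum()
          |           return rng.choice(len(p), p=p)
          | 
          |       logits = [2.0, 1.0, 0.2]   # made-up token scores
          |       print([sample(logits, t) for t in (0.2, 1.0, 2.0)])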
        
           | grey-area wrote:
            | Well, they also don't have understanding, a model of the
            | world, or the ability to reason (no, the chain-of-thought
            | produced by AI companies is not reasoning), as well as having
            | no taste.
           | 
           | So there is quite a lot missing.
        
           | HarHarVeryFunny wrote:
           | It depends on what you mean by "creative" - they can
           | recombine fragments of training data (i.e. apply generative
           | rules) in any order - generate the deductive closure of the
           | training set, but that is it.
           | 
           | Without moving beyond LLMs to a more brain-like cognitive
           | architecture, all you can do is squeeze the juice out of the
           | training data, by using RL/etc to bias the generative process
           | (according to reasoning data, good taste or whatever), but
           | you can't move beyond the training data to be truly creative.
        
             | vonneumannstan wrote:
             | >It depends on what you mean by "creative" - they can
             | recombine fragments of training data (i.e. apply generative
             | rules) in any order - generate the deductive closure of the
             | training set, but that is it. Without moving beyond LLMs to
             | a more brain-like cognitive architecture, all you can do is
             | squeeze the juice out of the training data, but using
             | RL/etc to bias the generative process (according to
             | reasoning data, good taste or whatever), but you can't move
             | beyond the training data to be truly creative.
             | 
             | It's clear these models can actually reason on unseen
             | problems and if you don't believe that you aren't actually
             | following the field.
        
               | HarHarVeryFunny wrote:
               | Sure - but only if the unseen problem can be solved via
               | the deductive/generative closure of the training data.
                | And of course this type of "reasoning" is only as good as
                | the RL post-training it is based on - working well for
               | closed domains like math where verification is easy, and
               | not so well in the more general case.
        
               | js8 wrote:
                | Both can be true (and that's why I downvoted you in the
                | other comment, for presenting this as a dichotomy): LLMs
                | can reason and yet "stochastically parrot" the training
                | data.
                | 
                | For example, an LLM might learn a rule that sentences
                | similar to "A is given. From A follows B." are followed
                | by the statement "Therefore, B". This is modus ponens. An
                | LLM can apply this rule to a wide variety of A and B,
                | producing novel statements. Yet these statements are
                | still the statistically probable ones.
               | 
                | I think the problem is that when people say "AI should
                | produce something novel" (or "is producing", depending on
                | whether they advocate or dismiss), they are not very
                | clear about what "novel" actually means. Mathematically,
                | it's very easy to produce a never-before-seen theorem;
                | but is it interesting? Probably not.
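                | 
                | (A minimal sketch of that kind of learned surface
                | rule, applied mechanically - the regex and example
                | sentence are illustrative stand-ins for what a
                | model encodes statistically:)
                | 
                |     import re
                | 
                |     # One learned surface pattern applied to any A, B.
                |     RULE = re.compile(
                |         r"(?P<A>.+) is given\. "
                |         r"From (?P=A) follows (?P<B>.+)\.")
                | 
                |     def modus_ponens(sentence):
                |         m = RULE.fullmatch(sentence.strip())
                |         if not m:
                |             return None
                |         return "Therefore, " + m.group("B")
                | 
                |     s = "Rain is given. From Rain follows wet roads."
                |     print(modus_ponens(s))  # Therefore, wet roads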
        
             | awongh wrote:
             | By volume how much of human speech / writing is pattern
             | matching and how much of it is truly original cognition
             | that would pass your bar of creativity? It is probably 90%
             | rote pattern matching.
             | 
             | I don't think LLMs are AGI, but in most senses I don't
             | think people give enough credit to their capabilities.
             | 
                | It's just ironic how human-like the flaws of the system
                | are. (Hallucinations assert untrue facts just because
                | they are plausible from a pattern-matching POV.)
        
               | dingnuts wrote:
                | My intuition is the opposite of yours; due to the insane
                | complexity of the real world, nearly 90% of situations
                | are novel and require creativity.
               | 
               | OK now we're at an impasse until someone can measure this
        
               | HarHarVeryFunny wrote:
               | I think it comes down to how we define creativity for the
               | purpose of this conversation. I would say that 100% of
               | situations and problems are novel to some degree - the
               | real world does not exactly repeat, and your brain at
                | T+10 is not exactly the same as it is at T+20.
               | 
               | That said, I think most everyday situations are similar
               | enough to things we've experienced before that shallow
               | pattern matching is all it takes. The curve in the road
               | we're driving on may not be 100% the same as any curve
               | we've experienced before, but turning the car wheel to
                | the left the way we've learnt to do it will let us
               | successfully navigate it all the same.
               | 
               | Most everyday situations/problems we're faced with are
               | familiar enough that shallow "reactive" behavior is good
               | enough - we rarely have to stop to develop a plan, figure
               | things out, or reason in any complex kind of a way, and
               | very rarely face situations so challenging that any real
               | creativity is needed.
        
               | HarHarVeryFunny wrote:
               | > It's just ironic how human-like the flaws of the system
               | are. (Hallucinations that are asserting untrue facts,
               | just because they are plausible from a pattern matching
               | POV)
               | 
               | I think most human mistakes are different - not applying
               | a lot of complex logic to come to an incorrect
               | deduction/guess (= LLM hallucination), but rather just
               | shallow recall/guess. e.g. An LLM would guess/hallucinate
               | a capital city by using rules it had learnt about other
               | capital cities - must be famous, large, perhaps have an
               | airport, etc, etc; a human might just use "famous" to
               | guess, or maybe just throw out the name of the only city
                | they can associate with some country/state.
               | 
               | The human would often be aware that they are just
               | guessing, maybe based on not remembering where/how they
               | had learnt this "fact", but to the LLM it's all just
               | statistics and it has no episodic memory (or even
               | coherent training data - it's all sliced and diced into
               | shortish context-length samples) to ground what it knows
               | or does not know.
        
               | awongh wrote:
                | The reason LLMs are of any use to anyone right now (and
                | real people are using them for real things right now -
                | see the millions of ChatGPT users) is that text created
                | by a real human using a guessing heuristic and text
                | created by an LLM using statistics are qualitatively much
                | the same. Even for things that some subjectively deem
                | "creative".
               | 
                | The entropy of communication also means that we mostly
                | won't ever know whether a person is guessing or whether
                | they think they're telling the truth. In that sense it
                | makes little difference to the receiver of the
                | information what the intent was - even if it came from a
                | human, that human's guessing/BS level is still unknown to
                | you, the recipient.
               | 
               | This difference will continue to get smaller and more
               | imperceptible. When it will stop changing or at what rate
               | it will change is anyone's guess.
        
               | HarHarVeryFunny wrote:
               | LLMs are just trying to mimic (predict) human output, and
               | can obviously do a great job, which is why they are
               | useful.
               | 
               | I was just referring to when LLMs fail, which can be in
               | non-human ways, not only the way in which they
               | hallucinate, but also when they generate output that has
               | the "shape" of something in the training set, but is
               | nonsense.
        
               | leptons wrote:
               | > _It is probably 90% rote pattern matching._
               | 
                | So what? 90% (or more) of humans aren't making any sort
                | of breakthrough in any discipline, either. 99.9999999999%
                | of human speech/writing isn't producing "breakthroughs"
                | either; it's just a way to communicate.
               | 
               | > _It 's just ironic how human-like the flaws of the
               | system are. (Hallucinations that are asserting untrue
               | facts, just because they are plausible from a pattern
               | matching POV)_
               | 
               | The LLM is not "hallucinating". It's just operating as it
               | was designed to do, which often produces results that do
               | not make any sense. I have actually hallucinated, and
               | some of those experiences were profoundly insightful,
               | quite the opposite of what an LLM does when it
               | "hallucinates".
               | 
               | You can call anything a "breakthrough" if you aren't
               | aware of prior art. And LLMs are "trained" on nothing but
               | prior art. If an LLM does make a "breakthrough", then
               | it's because the "breakthrough" was already in the
               | training data. I have no doubt many of these
               | "breakthroughs" will be followed years later by someone
               | finding the actual human-based research that the LLM
               | consumed in its training data, rendering the
               | "breakthrough" not quite as exciting.
        
               | andoando wrote:
               | What is the distinction between "pattern matching" and
               | "original cognition" exactly?
               | 
               | All human ideas are a combination of previously seen
               | ideas. If you disagree, come up with a truly new
                | conception which is not. -- Badly quoted David Hume
        
               | heyjamesknight wrote:
               | The topic of conversation is not "human speech/writing"
               | but "human creativity." There's no dispute that LLMs can
               | create novel pieces of textual output. But there is no
               | evidence that they can produce novel _ideas_. To assume
               | they can is to adopt a purely rationalist approach to
               | epistemology and cognition. Plato, Aquinas, and Kant
               | would all fervently disagree with that approach.
        
         | vonneumannstan wrote:
         | > Despite all the hoopla, LLMs are not AGI or artifical brains
         | - they are predict-next-word language models. By design they
         | are not built for creativity, but rather quite the opposite,
         | they are designed to continue the input in the way best
         | suggested by the training data - they are essentially built for
         | recall, not creativity.
         | 
          | This is just a completely basic level of understanding of
          | LLMs. How do you predict the next token with superhuman
          | accuracy? Really think about how that is possible. If you think
          | it's just stochastic parroting you are ngmi.
         | 
          | > large language models have yet to produce a genuine
          | breakthrough. The puzzle is why.
          | 
          | I think you should really update on the fact that world-class
          | researchers are surprised by this. They understand something
          | you don't: these models clearly build robust world models, and
          | text prompts act as probes into those world models. The
          | surprising part is that despite these sophisticated world
          | models, we can't seem to get out unique insights that almost
          | surely already exist in those models. Even if all the model is
          | capable of is memorizing text, the sheer volume it has
          | memorized should yield unique insights; no human can ever hope
          | to hold this much text in their memory and then make
          | connections across it.
         | 
         | It's possible we just lack the prompt creativity to get these
         | insights out but nevertheless there is something strange
         | happening here.
        
           | HarHarVeryFunny wrote:
           | > This is just a completely base level of understanding of
           | LLMs. How do you predict the next token with superhuman
           | accuracy? Really think about how that is possible. If you
           | think it's just stochastic parroting you are ngmi.
           | 
            | Yes, thank you, I do understand how LLMs work. They learn a
           | lot of generative rules from the training data, and will
           | apply them in flexible fashion according to the context
           | patterns they have learnt. You said stochastic parroting, not
           | me.
           | 
           | However, we're not discussing whether LLMs can be superhuman
           | at tasks where they had the necessary training - we're
           | discussing whether they are capable of creativity (and
           | presumably not just the trivially obvious case of being able
           | to apply their generative rules in any order - deductive
           | closure, not stochastic parroting in the dumbest sense of
           | that expression).
        
             | vonneumannstan wrote:
             | >However, we're not discussing whether LLMs can be
             | superhuman at tasks where they had the necessary training -
             | we're discussing whether they are capable of creativity
             | 
             | "Even if all the model is capable of is memorizing text
             | then just the sheer volume it has memorized should yield
             | unique insights, no human can ever hope to hold this much
             | text in their memory and then make connections between it."
             | 
              | Unless you think humans have magic meat, all we are really
              | doing with "creativity" is connecting previously
              | unconnected facts.
        
               | HarHarVeryFunny wrote:
               | > Unless you think humans have magic meat then all we are
               | really doing with "creativity" is connecting previously
               | unconnected facts.
               | 
               | In the case of discovery and invention the "facts" being
               | connected may not be things that were known before. An
                | LLM is bound by its training set. A human is not limited
               | by what is currently known - they can explore (in
               | directed or undirected fashion), learn, build hierarchies
               | of new knowledge and understanding, etc.
        
               | HarHarVeryFunny wrote:
               | > "Even if all the model is capable of is memorizing text
               | then just the sheer volume it has memorized should yield
               | unique insights, no human can ever hope to hold this much
               | text in their memory and then make connections between
               | it."
               | 
               | Yes, potentially, but the model has no curiosity or drive
               | to do this (or anything else) by itself. All an LLM is
               | built to do is predict. The only way to control the
               | output and goad it into using the vast amount of
               | knowledge that it has is by highly specific prompting.
               | 
                | Basically it's only going to connect the dots if you tell
                | it what dots to connect, in which case it's the human
                | being inventive, not the model. The model is trying to
                | predict, so essentially if you want it to do something
                | outside of the training set you're going to have to
                | prompt it to do that.
               | 
               | A human has curiosity (e.g. "what happens if I connect
               | these dots .."), based on prediction failure and
               | associated focus/etc - the innate desire to explore the
               | unknown and therefore potentially learn. The model has
               | none of that - it can't learn and has no curiosity. If
               | the model's predictions are bad it will just hallucinate
               | and generate garbage, perhaps "backtrack" and try again,
                | likely leading to context rot.
        
           | awongh wrote:
            | Not just prompting - it could also be that we haven't done
            | the right kind of RLHF for these kinds of outputs.
        
         | fragmede wrote:
            | Define creativity. Three things LLMs can do are write song
            | lyrics, poems, and jokes, all of which require some level of
            | what we think of as human creativity. Of course detractors
            | will say the LLM versions of those three aren't very good,
            | and they may even be right, but a twelve-year-old child
            | coming up with the same would be seen as creative, even if
            | they didn't get significant recognition for it.
        
           | HarHarVeryFunny wrote:
           | Sure, but the author of TFA is well versed in LLMs and so is
           | addressing something different. Novelty isn't the same as
           | creativity, especially when limited to generating based on a
           | fixed repertoire of moves.
           | 
           | The term "deductive closure" has been used to describe what
           | LLMs are capable of, and therefore what they are not capable
            | of. They can generate novelty (e.g. a new poem) by applying
            | the rules they have learnt in novel ways, but are ultimately
            | restricted by their fixed weights and what was present in the
            | training data, as well as being biased to predict rather than
            | learn (which they can't anyway!) and explore.
           | 
           | An LLM may do a superhuman job of applying what it "knows" to
           | create solutions to novel goals (be that a math olympiad
           | problem, or some type of "creative" output that has been
           | requested, such as a poem), but is unlikely to create a whole
           | new field of math that wasn't hinted at in the training data
           | because it is biased to predict, and anyways doesn't have the
           | ability to learn that would allow it to build a new theory
           | from the ground up one step at a time. Note (for anyone who
           | might claim otherwise) that "in-context learning" is really a
           | misnomer - it's not about _learning_ but rather about _using_
           | data that is only present in-context rather than having been
           | in the training set.
        
         | tmaly wrote:
         | I think we will see more breakthroughs with an AI/Human hybrid
         | approach.
         | 
         | Tobias Rees had some interesting thoughts
         | https://www.noemamag.com/why-ai-is-a-philosophical-rupture/
         | where he poses this idea that AI and humans together can think
         | new types of thoughts that humans alone cannot think.
        
       | _acco wrote:
       | This is a good way of framing that we don't understand human
       | creativity. And that we can't hope to build it until we do.
       | 
       | i.e. AGI is a philosophical problem, not a scaling problem.
       | 
       | Though we understand them little, we know the default mode
       | network and sleep play key roles. That is likely because they aid
       | some universal property of AGI. Concepts we don't understand like
       | motivation, curiosity, and qualia are likely part of the picture
       | too. Evolution is far too efficient for these to be mere side
       | effects.
       | 
       | (And of course LLMs have none of these properties.)
       | 
       | When a human solves a problem, their search space is not random -
       | just like a chess grandmaster's search space of moves is not
       | random.
       | 
        | How our brains are so efficient at problem solving while also
        | being able to generate novelty is a mystery.
        
       | vintagedave wrote:
       | > Hypothesis: Day-Dreaming Loop
       | 
       | This mirrors something I have thought of too. I have read
       | multiple theories of emerging consciousness, which touch on
       | things from proprioception to the inner monologue (which not
        | everyone has).
       | 
       | My own theory is that -- avoiding the need for an awareness of a
        | monologue -- an LLM loop that constantly takes input and lets it
       | run, saving key summarised parts to memory that are then pulled
       | back in when relevant, would be a very interesting system to
       | speak to.
       | 
        | It would need two loops: the constant ongoing one, and then, for
        | interaction, one accessing memories from the first. The ongoing
        | one would be aware of the conversation. I think it would be
        | interesting to see what would happen, via the memory system, in
        | terms of the conversation surfacing elements from the loop.
       | 
       | My theory is that if we're likely to see emergent consciousness,
       | it will come through ongoing awareness and memory.
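        | 
        | (A minimal sketch of that two-loop arrangement, assuming a
        | stubbed llm() call and a crude keyword match standing in
        | for real memory retrieval - illustrative only, not a
        | working system:)
        | 
        |       import queue, time
        | 
        |       def llm(prompt):
        |           # Stand-in for a real chat-completion call, so
        |           # the sketch runs without any API.
        |           return "(model output for: %.40s...)" % prompt
        | 
        |       memory = []            # summaries from loop 1
        |       inbox = queue.Queue()  # user messages
        | 
        |       def daydream_loop():
        |           # Loop 1: runs constantly, aware of recent
        |           # memory, condensing each thought to one line.
        |           while True:
        |               thought = llm("Produce one new thought.\n"
        |                             + "\n".join(memory[-5:]))
        |               memory.append(llm("Summarise: " + thought))
        |               time.sleep(1)
        | 
        |       def conversation_loop():
        |           # Loop 2: talks to the user, pulling relevant
        |           # memories produced by loop 1. (Keyword overlap
        |           # is a crude stand-in for real retrieval.)
        |           while True:
        |               msg = inbox.get()
        |               rel = [m for m in memory
        |                      if any(w in m for w in msg.split())]
        |               print(llm("Memories:\n" + "\n".join(rel)
        |                         + "\nUser: " + msg))
        | 
        |       # Each loop would run in its own thread, e.g.
        |       # threading.Thread(target=daydream_loop).start()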
        
       | yahoozoo wrote:
       | I once asked ChatGPT to come up with a novel word that would
       | return 0 Google search results. It came up with "vexlithic" which
       | does indeed return 0 results, at least for me. I thought that was
       | neat.
        
       | nsedlet wrote:
        | I believe an important reason why there are no LLM breakthroughs
        | is that humans make progress in their thinking through
        | experimentation, i.e. collecting targeted data, which requires
        | exerting agency on the real world. This isn't just observation;
        | it's the creation of data not already in the training set.
        
         | haolez wrote:
          | Maybe also the fact that they can't learn small pieces of new
          | information without "formatting" their whole brain again, from
          | scratch. And fine-tuning is like having a stroke, where you get
          | specialization by losing cognitive capabilities.
        
       | ramoz wrote:
        | I've walked 10k steps every day for the past week and produced
        | more code in that period than most would over months. Using
        | Claude Code (and vibetunnel over tailscale to my phone - which I
        | speak instructions into).
        | 
        | There is a breakthrough happening, in real time.
        
         | CaptainFever wrote:
         | Can we see an example please?
        
       | zyklonix wrote:
        | This idea of a "daydreaming loop" hits on a key LLM gap: the lack
       | of background, self-driven insight. A pragmatic step in this
       | direction is https://github.com/DivergentAI/dreamGPT , which
       | explores divergent thinking by generating and scoring
       | hallucinations. It shows how we might start pushing LLMs beyond
       | prompt-response into continuous, creative cognition.
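        | 
        | (A minimal sketch of that generate-and-score pattern - not
        | dreamGPT's actual code or API; llm() is a stub and the
        | word-overlap novelty score is a crude stand-in for real
        | divergence scoring:)
        | 
        |       import itertools
        | 
        |       def llm(prompt):
        |           # Stub for a real model call; runs offline.
        |           return "speculative idea combining " + prompt
        | 
        |       def novelty(text, corpus):
        |           # Crude divergence proxy: fraction of words
        |           # not seen in earlier ideas.
        |           words = set(text.lower().split())
        |           seen = set(" ".join(corpus).lower().split())
        |           return len(words - seen) / max(len(words), 1)
        | 
        |       concepts = ["default mode network", "beta reduction",
        |                   "RLHF", "semantic search"]
        |       corpus, ideas = [], []
        |       for a, b in itertools.combinations(concepts, 2):
        |           idea = llm("'%s' and '%s'" % (a, b))
        |           ideas.append((novelty(idea, corpus), idea))
        |           corpus.append(idea)
        | 
        |       # Surface the most divergent combination for review.
        |       print(max(ideas))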
        
       | zby wrote:
        | The novelty part is a hard one - but maybe in many cases we could
        | substitute something else for it? If an idea promises to beat the
        | state of the art in some field - and it is not yet actively
        | researched - then it is novel.
        | 
        | But most promising would be to use Dessalles' theories.
       | 
       | Here is 4.1o expanding this:
       | https://chatgpt.com/s/t_6877de9faa40819194f95184979b5b44
       | 
        | By the way - this could be a classic example of this
        | day-dreaming: you take two texts, one by Gwern and some article
        | by Dessalles (I read "Why We Talk" - a great book! - but maybe
        | there is a more concise article?), and ask an LLM to generate
        | ideas connecting the two. In this particular case it was my
        | intuition that connected them - but I imagine there could be an
        | algorithm that finds this connection in a reasonable time - some
        | kind of semantic search, maybe.
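        | 
        | (A minimal sketch of that semantic-search idea - rank
        | cross-document chunk pairs by embedding similarity and
        | hand the top pairs to an LLM to connect. The embed()
        | here is a hashed bag-of-words stand-in for a real
        | embedding model, and the chunk texts are made up:)
        | 
        |       import numpy as np
        | 
        |       def embed(text):
        |           # Hashed bag-of-words stand-in for a real
        |           # sentence-embedding model, so this runs.
        |           v = np.zeros(64)
        |           for w in text.lower().split():
        |               v[hash(w) % 64] += 1.0
        |           return v / (np.linalg.norm(v) + 1e-9)
        | 
        |       # Two texts, pre-split into chunks (made up).
        |       text_a = ["a background loop samples concept pairs",
        |                 "a critic filters generated links"]
        |       text_b = ["speech selects unexpected, relevant news",
        |                 "conversation rewards surprising reports"]
        | 
        |       # Rank cross-document chunk pairs by cosine
        |       # similarity, then hand the top pairs to an LLM
        |       # with a "what connects these passages?" prompt.
        |       pairs = sorted(((float(embed(a) @ embed(b)), a, b)
        |                       for a in text_a for b in text_b),
        |                      reverse=True)
        |       for sim, a, b in pairs[:2]:
        |           print("%.2f  %s <-> %s" % (sim, a, b))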
        
       | throwaway328 wrote:
        | The fact that LLMs haven't come up with anything "novel" would be
        | a serious puzzle - as the article claims - only _if_ they were
        | thinking, reasoning, being creative, etc. If they aren't doing
        | anything of the sort, it'd be exactly what you'd expect.
       | 
        | So it's a bit of an anti-climactic solution to the puzzle, but:
        | maybe the naysayers were right and they're not thinking at all,
        | or doing any of the other anthropomorphic things being marketed
        | to users, and we've simply all been dragged along by a narrative
        | that's very seductive to tech types (the computer gods will
        | rise!).
       | 
        | It'd be a boring outcome, after the countless gallons of digital
        | ink spilled on the topic over the last few years, but maybe
        | they'll come to be accepted as "normal software", and not
        | god-like, in the end. A medium-to-large improvement in some
        | areas, and anywhere from minimal to pointless to harmful in
        | others. And all for the very high cost of all the funding and
        | training and data-hoovering that goes into them, not to mention
        | the opportunity cost of all the things we humans could have been
        | putting money into and didn't.
        
       ___________________________________________________________________
       (page generated 2025-07-16 23:01 UTC)