[HN Gopher] LLM Daydreaming
___________________________________________________________________
LLM Daydreaming
Author : nanfinitum
Score : 178 points
Date : 2025-07-16 02:22 UTC (20 hours ago)
(HTM) web link (gwern.net)
(TXT) w3m dump (gwern.net)
| zwaps wrote:
| Wasn't this already implemented in some agents?
|
| I seem to remember hearing about it in several podcasts
| johnfn wrote:
| It's an interesting premise, but how many people
|
| - are capable of evaluating the LLM's output to the degree that
| they can identify truly unique insights
|
| - are prompting the LLM in such a way that it could produce truly
| unique insights
|
| I've prompted an LLM upwards of 1,000 times in the last month,
| but I doubt more than 10 of my prompts were sophisticated enough
| to even allow for a unique insight. (I spend a lot of time
| prompting it to improve React code.) And of those 10 prompts,
| even if all of the outputs were unique, I don't think I could
| have identified a single one.
|
| I very much do like the idea of the day-dreaming loop, though! I
| actually feel like I've had the exact same idea at some point
| (ironic) - that a lot of great insight is really just combining
| two ideas that no one has ever thought to combine before.
| cantor_S_drug wrote:
| > are capable of evaluating the LLM's output to the degree that
| they can identify truly unique insights
|
| I noticed one behaviour in myself. I heard about a particular
| topic, because it was a dominant opinion in the infosphere.
| Then LLMs confirmed that dominant opinion (because it was
| heavily represented in the training) and I stopped my search
| for alternative viewpoints. So in a sense, LLMs are turning out
| to be another reflective mirror which reinforces existing
| opinion.
| MrScruff wrote:
| Yes, it seems like LLMs are System 1 thinking taken to the
| extreme. Reasoning was supposed to introduce some actual
| logic, but you only have to play with these models for a short
| while to see that the reasoning tokens are a very soft
| constraint on the model's eventual output.
|
| In fact, they're trained to please us and so in general aren't
| very good at pushing back. It's incredibly easy to 'beat' an
| LLM in an argument since they often just follow your line of
| reasoning (it's in the model's context after all).
| zyklonix wrote:
| Totally agree, most prompts (especially for code) aren't
| designed to surface novel insights, and even when they are,
| it's hard to recognize them. That's why the daydreaming loop is
| so compelling: it offloads both the prompting and the novelty
| detection to the system itself. Projects like
| https://github.com/DivergentAI/dreamGPT are early steps in that
| direction, generating weird idea combos autonomously and
| scoring them for divergence, without user prompting at all.
| apples_oranges wrote:
| If the breakthrough comes, most if not all links on HN will be to
| machine generated content. But so far it seems that the I in
| current AI is https://www.youtube.com/watch?v=uY4cVhXxW64 ..
| NitpickLawyer wrote:
| Something I haven't seen explored, but I think could perhaps help
| is to somehow introduce feedback regarding the generation into
| the context, based on things that are easily computed w/ other
| tools (like perplexity). In "thinking" models we see a lot of
| emergent behaviour like "perhaps I should, but wait, this seems
| wrong", etc. Perhaps adding some such signals at regular (or
| other) intervals could help in surfacing the correct patterns
| when they are needed.
|
| There's a podcast I listened to ~1.5 years ago, where a team used
| GPT2, further trained on a bunch of related papers, and used
| snippets + perplexity to highlight potential errors. I remember
| them having some good accuracy when analysed by humans. Perhaps
| this could work at a larger scale? (a sort of "surprise" factor)
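|
| The "surprise" signal itself is easy to prototype with an
| off-the-shelf GPT-2 from Hugging Face transformers; the open
| question is how to feed it back into the generating model's
| context. A rough sketch:
|
|     import math
|     import torch
|     from transformers import GPT2LMHeadModel, GPT2TokenizerFast
|
|     tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
|     model = GPT2LMHeadModel.from_pretrained("gpt2")
|     model.eval()
|
|     def perplexity(text: str) -> float:
|         """Perplexity under GPT-2; higher = more 'surprising' to the scorer."""
|         ids = tokenizer(text, return_tensors="pt").input_ids
|         with torch.no_grad():
|             loss = model(ids, labels=ids).loss  # mean token cross-entropy
|         return math.exp(loss.item())
|
|     # Flag generated snippets whose perplexity is unusually high for review.
|     for s in ["Water boils at 100 degrees Celsius at sea level.",
|               "Water boils at 12 degrees Celsius at sea level."]:
|         print(round(perplexity(s), 1), s)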
| aredox wrote:
| Oh, in the middle of "AI is PhD-level" propaganda (just check
| Google News to see this is not a strawman argument), some people
| finally admit in passing "no LLM has ever made a breakthrough".
|
| (See original argument:
| https://nitter.net/dwarkesh_sp/status/1727004083113128327 )
| bonoboTP wrote:
| I agree there's an equivocation going on for "PhD level"
| between "so smart, it could get a PhD" (as in come up with and
| publish new research and defend its own thesis) and "it can
| solve quizzes at the level that PhDs can".
| washadjeffmad wrote:
| Services that make this claim are paying people with PhDs to
| ask their models questions and then provide feedback on the
| responses with detailed reasoning.
| ashdksnndck wrote:
| I'm not sure we can accept the premise that LLMs haven't made any
| breakthroughs. What if people aren't giving the LLM credit when
| they get a breakthrough from it?
|
| First time I got good code out of a model, I told my friends and
| coworkers about it. Not anymore. The way I see it, the model is a
| service I (or my employer) pays for. Everyone knows it's a tool
| that I can use, and nobody expects me to apportion credit for
| whether specific ideas came from the model or me. I tell people I
| code with LLMs, but I don't commit a comment saying "wow, this
| clever bit came from the model!"
|
| If people are getting actual bombshell breakthroughs from LLMs,
| maybe they are rationally deciding to use those ideas without
| mentioning the LLM came up with it first.
|
| Anyway, I still think Gwern's suggestion of a generic idea-lab
| trying to churn out insights is neat. Given the resources needed
| to fund such an effort, I could imagine that a trading shop would
| be a possible place to develop such a system. Instead of looking
| for insights generally, you'd be looking for profitable trades.
| Also, I think you'd do a lot better if you have relevant experts
| to evaluate the promising ideas, which means that more focused
| efforts would be more manageable. Not comparing everything to
| everything, but comparing everything to stuff in the expert's
| domain.
|
| If a system like that already exists at Jane Street or something,
| I doubt they are going to tell us about it.
| Yizahi wrote:
| This is bordering on conspiracy theory. Thousands of people are
| getting novel breakthroughs generated purely by an LLM and not
| a single person discloses such a result? Not even one of the
| countless LLM corporation engineers, whose companies depend on
| billion-dollar IV injections from deluded bankers just to keep
| surviving, has bragged about an LLM pulling off that
| revolution? Hard to believe.
| esafak wrote:
| Countless people are increasing their productivity and
| talking about it here ad nauseam. Even researchers are
| leaning on language models; e.g.,
| https://mathstodon.xyz/@tao/114139125505827565
|
| We haven't successfully resolved famous unsolved research
| problems through language models yet, but one can imagine that
| they will solve increasingly challenging problems over time.
| And if it happens in the hands of a researcher rather than the
| model's lab, one can also imagine that the researcher will
| take credit, so you will still have the same question.
| AIPedant wrote:
| The actual posts totally undermine your point:
| My general sense is that for research-level mathematical
| tasks at least, current models fluctuate between "genuinely
| useful with only broad guidance from user" and "only useful
| after substantial detailed user guidance", with the most
| powerful models having a greater proportion of answers in
| the former category. They seem to work particularly well
| for questions that are so standard that their answers can
| basically be found in existing sources such as Wikipedia or
| StackOverflow; but as one moves into increasingly obscure
| types of questions, the success rate tapers off (though in
| a somewhat gradual fashion), and the more user guidance (or
| higher compute resources) one needs to get the LLM output
| to a usable form. (2/2)
| Yizahi wrote:
| Increasing productivity is nice and commendable, but it is
| NOT an LLM making a breakthrough on its own, which is the
| topic of Gwern's article.
| dingnuts wrote:
| There is a LOT of money on this message board trying to
| convince us of the utility of these machines and yes,
| people talk about it ad nauseam, in vague terms that are
| unlike anything I see in the real world, with few examples.
|
| Show me the code. Show me your finished product.
| BizarroLand wrote:
| I wonder if it's not the LLM making the breakthrough but
| rather that the person using the system just needed the
| available information presented in a clear and orderly
| fashion to make the breakthrough themselves.
|
| After all, the LLM currently has no cognizance; it is unable
| to understand what it is saying in a meaningful way. At its
| best it is a p-zombie machine, right?
|
| In my opinion anything amazing that comes from an LLM only
| becomes amazing when someone who was capable of recognizing
| the amazingness perceives it, like a rewrite of a zen koan,
| "If an LLM generates a new work of William Shakespeare, and
| nobody ever reads it, was anything of value lost?"
| nico wrote:
| > but I don't commit a comment saying "wow, this clever bit
| came from the model!"
|
| The other day, Claude Code started adding a small signature to
| the commit messages it was preparing for me. It said something
| like "This commit was co-written with Claude Code" and a little
| robot emoji
|
| I wonder if that just happened by accident or if Anthropic is
| trying to do something like Apple with the "sent from my
| iPhone"
| danielbln wrote:
| See https://docs.anthropic.com/en/docs/claude-
| code/settings#avai..., specifically `includeCoAuthoredBy`
| nico wrote:
| Thank you. And I guess they are trying to do the Apple
| thing by making that option true by default
| morsch wrote:
| Aider does the same thing (and has a similar setting). I tend
| to squash the AI commits and remove it that way, though I
| suppose a flag indicating the degree of AI authorship could
| be useful.
| catigula wrote:
| letting claude pen commits is wild.
| SheinhardtWigCo wrote:
| It's great. It's become my preferred workflow for vibe
| coding because it writes great commit messages, gives you a
| record of authorship, and rollbacks use far fewer tokens.
| You don't have to (and probably shouldn't) let it push to
| the remote branch.
| therealpygon wrote:
| It is hard to accept as a premise because the premise is
| questionable from the beginning.
|
| Google already reported several breakthroughs as a direct
| result of AI, using processes that almost certainly include
| LLMs, including a new solution in math, improved chip designs,
| etc. DeepMind has AI that predicted millions of protein folds
| which are already being used in drugs among many other things
| they do, though yes, not an LLM per se. There is certainly the
| probability that companies won't announce things given that the
| direct LLM output isn't copyrightable/patentable, so a human-
| in-the-loop solves the issue by claiming the human made said
| breakthrough with AI/LLM assistance. There isn't much benefit
| to announcing how much AI helped with a breakthrough unless
| you're engaged in basically selling AI.
|
| As for "why aren't LLMs creating breakthroughs by themselves
| regularly", that answer is pretty obvious... they just don't
| really have that capacity in a meaningful way based on how they
| work. The closest example, Google's algorithmic breakthrough,
| absolutely was created by a coding LLM; it was effectively
| achieved through brute force in a well-established domain, but
| that doesn't mean it wasn't a breakthrough. That alone casts
| doubt on the underlying premise of the post.
| Yizahi wrote:
| You are contradicting yourself. Either LLM programs can make
| breakthroughs on their own, or they don't have that capacity
| in a meaningful way based on how they work.
| js8 wrote:
| I would say the real breakthrough was training NNs as a way
| to create practical approximators for very complex functions
| over some kind of many-valued logic. Why they work so well
| in practice we still don't fully understand theoretically (in
| the sense that we don't know what kind of underlying logic
| best models what we want from these systems). The LLMs (and
| their application to natural language) are just a consequence
| of that.
| starlust2 wrote:
| > through brute force
|
| The same is true of humanity in aggregate. We attribute
| discoveries to an individual or group of researchers but to
| claim humans are efficient at novel research is a form of
| survivorship bias. We ignore the numerous researchers who
| failed to achieve the same discoveries.
| suddenlybananas wrote:
| The fact some people don't succeed doesn't show that humans
| operate by brute force. To claim humans reason and invent
| by brute force is patently absurd.
| preciousoo wrote:
| It's an absurd statement because you are human and are
| aware of how research works on an individual level.
|
| Take yourself outside of that, and imagine you invented
| earth, added an ecosystem, and some humans. Wheels were
| invented ~6k years ago, and "humans" have existed for
| ~40-300k years. We can do the same for other
| technologies. As a group, we are incredibly inefficient,
| and an outside observer would see our efforts at building
| societies, and failing, as "brute force".
| Nevermark wrote:
| I consider humans an "intelligent" species in the sense
| that a critical mass of us can organize to sustainably
| learn.
|
| As individuals, without mentors, we would each die off
| very quickly. Even if we were fed and whatever until we
| were physically able to take care of ourselves, we
| wouldn't be able to keep ourselves out of trouble if we
| had to learn everything ourselves.
|
| Contrast this with the octopus, which develops from an egg
| without any mentorship, and within a year or so has a
| fantastically knowledgeable and creative mind over its
| respective environment. And they thrive in every oceanic
| environment in the wet salty world, from coastlines,
| to under permanent Arctic ice, to the deep sea.
|
| To whatever degree they are "intelligent", it's an
| amazingly accelerated, fully independent, self-taught
| intelligence. Our species just can't compare on that
| dimension.
|
| Fortunately, octopus only live a couple years and in an
| environment where technology is difficult (very hard to
| isolate and control conditions of all kinds in the
| ocean). Otherwise, the land octopus would have eaten all
| of us long ago.
| tmaly wrote:
| What about Dyson and Alexander Graham Bell?
| drdaeman wrote:
| Does "brute force" allow for heuristics and direction?
|
| If it doesn't ("brute" as opposite of "smart", just dumb
| iteration to exhaustion) then you're right, of course.
|
| But if it does, then I'm not sure it's _patently absurd_
| - novel ideas could be merely a matter of chance of
| having all the precursors together at the right time, a
| stochastic process. And it scales well, bearing at least
| some _resemblance_ to brute force approaches - although
| the term is not entirely great (something around
| "stochastic", "trial-and-error", and "heuristic" is
| probably a better term).
| therealpygon wrote:
| You don't consider thousands of scientists developing
| competing, and often incorrect, solutions for a single
| domain as a "brute force" attempt by humanity, but do
| when the same occurs with disparate solutions from
| parallel LLM attempts? That's certainly an...opinion.
| kajumix wrote:
| Most interesting novel ideas originate at the intersection of
| multiple disciplines. Profitable trades could be found in the
| biomedicine sector when knowledge of biomedicine and
| finance is combined. That's where I see LLMs shining, because
| they span disciplines far more than any human can. Once we
| figure out a way to have them combine ideas (similar to what
| Gwern is suggesting), there will be, I suspect, a flood of
| novel and interesting ideas inconceivable to humans.
| PaulHoule wrote:
| Almost certainly an LLM has, in response to a prompt and
| through sheer luck, spat out the kernel of an idea that a
| super-human centaur of the year 2125 would see as
| groundbreaking, but that hasn't been recognized as such.
|
| We have a thin conception of genius that can be challenged by
| Edison's "1% inspiration, 99% perspiration" or the process of
| getting a PhD where you might spend 7 years getting to the point
| where you can start adding new knowledge and then take another
| 7 years to really hit your stride.
|
| I have a friend who is 50-something and disabled with some
| mental illness, he thinks he has ADHD. We had a conversation
| recently where he repeatedly expressed his fantasy that he
| could show up somewhere with his unique perspective and
| sprinkle some pixie dust on their problems and be rewarded for
| it. I found it exhausting. When I would hear his ideas, or when
| I hear any idea, I immediately think "how would we turn this
| into a product and sell it?" or "write a paper about it?" or
| "convince people of it?" He would have no part of that; he
| thought that operationalizing or advocating for an idea was
| uninteresting and that somebody else would do all that work.
| My answer is: they might, but not without the advocacy.
|
| And it comes down to that.
|
| If an LLM were to come up with a groundbreaking idea and be
| recognized as having a groundbreaking idea, it would have to do
| a sustained amount of work, say at least the equivalent of 2
| person-years, to win people over. And they aren't anywhere near
| equipped to do that, nobody is going to pay the power bill to
| do that, and if you were paying the power bill you'd probably
| have to pay the power bill for a million of them to go off in
| the wrong direction.
| blueflow wrote:
| I have not yet seen AI doing a critical evaluation of data
| sources. AI will contradict primary sources if the contradiction
| is more prevalent in the training data.
|
| Something about the whole approach is bugged.
|
| My pet peeve: "Unix System Resources" as an explanation for the
| /usr directory is a term that did not exist until the turn of
| the millennium (rumor is that a c't journalist made it up in
| 1999), but AI will retcon it into the FHS (5 years earlier) or
| into Ritchie/Thompson/Kernighan (27 years earlier).
| _heimdall wrote:
| > Something about the whole approach is bugged.
|
| The bug is that LLMs are fundamentally designed for natural
| language processing and prediction, _not_ logic or reasoning.
|
| We may get to actual AI eventually, but an LLM architecture
| either won't be involved at all or it will act as a part of the
| system mimicking the language center of a brain.
| zhangjunphy wrote:
| I also hope we have something like this. But sadly, this is not
| going to work. The reason is this line from the article, which
| is so much harder than it looks:
|
| > and a critic model filters the results for genuinely valuable
| ideas.
|
| In fact, people have tried this idea. And if you use an LLM or
| anything similar as the critic, the performance of the model
| actually degrades in this process: the LLM tries too hard to
| satisfy the critic, and the critic itself is far from a good
| reasoner.
|
| So the reason that we don't hear much about this idea is not
| that nobody tried it, but that they tried, it didn't work,
| and people are reluctant to publish about something that does
| not work.
| imiric wrote:
| Exactly.
|
| This not only affects a potential critic model; the entire
| concept of a "reasoning" model is based on the same flawed
| idea: that the model can generate intermediate context to
| improve its final output. If that self-generated context
| contains hallucinations, baseless assumptions or doubt, the
| final output can only be an amalgamation of that. I've seen the
| "thinking" output arrive at a correct solution in the first few
| steps, but then talk itself out of it later. Or go into logical
| loops, without actually arriving at anything.
|
| The reason why "reasoning" models tend to perform better is
| simply due to larger scale and better training data. There's
| nothing inherently better about them. There's nothing
| intelligent either, but that's a separate discussion.
| yorwba wrote:
| Reasoning models are trained from non-reasoning models of the
| same scale, and the training data is the output of the same
| model, filtered through a verifier. Generating intermediate
| context to improve the final output is not an idea that
| reasoning models are based on, but an outcome of the training
| process. Because empirically it does produce answers that
| pass the verifier more often if it generates the intermediate
| steps first.
|
| That the model still makes mistakes doesn't mean it's not an
| improvement: the non-reasoning base model makes even more
| mistakes when it tries to skip straight to the answer.
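|
| A toy illustration of that filtering step (everything here is a
| stand-in; real pipelines use RL and far more elaborate
| verifiers):
|
|     import random
|
|     def generate(problem: str):
|         """Stand-in for sampling a chain of thought + answer from the model."""
|         # eval + noise just fakes a sometimes-wrong model for illustration.
|         answer = eval(problem) + random.choice([0, 0, 1])
|         return f"To solve {problem} I work it out step by step: {answer}.", answer
|
|     def verify(problem: str, answer) -> bool:
|         """External checker: a math evaluator here; a test suite for code."""
|         return answer == eval(problem)
|
|     def build_reasoning_dataset(problems, samples_per_problem=8):
|         """Rejection sampling: keep only traces whose final answer verifies."""
|         kept = []
|         for p in problems:
|             for _ in range(samples_per_problem):
|                 chain, answer = generate(p)
|                 if verify(p, answer):
|                     kept.append((p, chain, answer))
|         return kept   # the base model is then fine-tuned / RL-trained on these
|
|     print(len(build_reasoning_dataset(["2+2", "3*7"])))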
| imiric wrote:
| Thanks. I trust that you're more familiar with the
| internals than myself, so I stand corrected.
|
| I'm only speaking from personal usage experience, and don't
| trust benchmarks since they are often gamed, but if this
| process produces objectively better results that aren't
| achieved by scaling up alone, then that's a good thing.
| danenania wrote:
| > The reason why "reasoning" models tend to perform better is
| simply due to larger scale and better training data.
|
| Except that we can try the exact same pre-trained model with
| reasoning enabled vs. disabled and empirically observe that
| reasoning produces better, more accurate results.
| imiric wrote:
| I'm curious: can you link to any tests that prove this?
|
| I don't trust most benchmarks, but if this can be easily
| confirmed by an apples-to-apples comparison, then I would
| be inclined to believe it.
| danenania wrote:
| Check out the DeepSeek paper.
|
| Research/benchmarks aside, try giving a somewhat hard
| programming task to Opus 4 with reasoning off vs. on.
| Similarly, try the same with o3 vs. o3-pro (o3-pro
| reasons for much longer).
|
| I'm not going to dig through my history for specific
| examples, but I do these kinds of comparisons
| occasionally when coding, and it's not unusual to have
| e.g. a bug that o3 can't figure out, but o3-pro can. I
| think this is widely accepted by engineers using LLMs to
| help them code; it's not controversial.
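|
| A minimal version of that same-model comparison with the
| Anthropic Python SDK's extended-thinking option looks roughly
| like the sketch below (the model ID and token budgets are
| illustrative, not a recommendation):
|
|     import anthropic
|
|     client = anthropic.Anthropic()          # reads ANTHROPIC_API_KEY
|     PROMPT = "Find the bug in this function: ..."   # your hard task here
|
|     def ask(thinking: bool) -> str:
|         kwargs = dict(
|             model="claude-opus-4-20250514",  # illustrative; use a current model ID
|             max_tokens=4096,
|             messages=[{"role": "user", "content": PROMPT}],
|         )
|         if thinking:
|             # Extended thinking: a reasoning scratchpad before the final answer.
|             kwargs["thinking"] = {"type": "enabled", "budget_tokens": 2048}
|         msg = client.messages.create(**kwargs)
|         return "".join(b.text for b in msg.content if b.type == "text")
|
|     print(ask(thinking=False))
|     print(ask(thinking=True))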
| imiric wrote:
| Huh, I wasn't aware that reasoning could be toggled. I
| use the OpenRouter API, and just saw that this is
| supported both via their web UI and API. I'm used to
| Sonnet 3.5 and 4 without reasoning, and their performance
| is roughly the same IME.
|
| I wouldn't trust comparing two different models, even
| from the same provider and family, since there could be
| many reasons for the performance to be different. Their
| system prompts, training data, context size, or runtime
| parameters could be different. Even the same model with
| the same prompt could have varying performance. So it's
| difficult to get a clear indication that the reasoning
| steps are the only changing variable.
|
| But toggling it on the same model would be a more
| reliable way to test this, so I'll try that, thanks.
| jacobr1 wrote:
| It depends on the problem domain you have and the way you
| prompt things. Basically the reasoning is better, in
| cases where using the same model to critique itself in
| multiple turns would be better.
|
| With code, for example, a single shot without
| reasoning might have hallucinated a package or not
| conformed to the rest of the project's style. Then you ask
| the LLM to check. Then you ask it to revise itself to fix the
| issue. If the base model can do that, then turning on
| reasoning basically allows it to self-check for those
| self-correctable features.
|
| When generating content, you can ask it to consider or
| produce intermediate deliverables like summaries of input
| documents that it then synthesizes into the whole. With
| reasoning on, it can do the intermediate steps and then
| use that.
|
| The main advantage is that the system is autonomously
| figuring out a bunch of intermediate steps and working
| through it. Again no better than it probably could do
| with some guidance on multiple interactions - but that
| itself is a big productivity benefit. The second gen (or
| really 1.5 gen) reasoning models also seem to have been
| trained on enough reasoning traces that they are starting
| to know about additional factors to consider so the
| reasoning loop is tighter.
| amelius wrote:
| But what if the critic is just hard reality? If you ask an LLM
| to write a computer program, instead of criticizing it, you can
| run it and test it. If you ask an LLM to prove a theorem, let
| it write the proof in a formal logic language so it can be
| verified. Etcetera.
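|
| A minimal sketch of that "hard reality" critic for code: run
| the generated program plus a test and let the exit code decide.
|
|     import subprocess
|     import sys
|     import tempfile
|
|     def reality_check(candidate: str, test: str, timeout: int = 10) -> bool:
|         """Run LLM-generated code plus a test; the exit code is the critic."""
|         with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
|             f.write(candidate + "\n" + test)
|             path = f.name
|         try:
|             done = subprocess.run([sys.executable, path],
|                                   capture_output=True, timeout=timeout)
|             return done.returncode == 0
|         except subprocess.TimeoutExpired:
|             return False
|
|     candidate = "def add(a, b):\n    return a + b\n"
|     print(reality_check(candidate, "assert add(2, 2) == 4\n"))   # True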
| Yizahi wrote:
| Generated code only works because the "test" part
| (compile/validate/analyze etc.) is completely external and was
| written before any mass-market LLMs. There is no such
| external validator for new theorems, books, pictures, text
| guides etc. You can't just run hard_reality.exe on a
| generated poem or a scientific paper to deem it "correct". It
| is only possible with programming languages, and even then
| not always.
| amelius wrote:
| Science is falsifiable by definition, and writing
| poems/books is not the kind of problem of interest here.
|
| > There is no such external validator for new theorems
|
| There are formal logic languages that will allow you to do
| this.
| Yizahi wrote:
| Your proposed approach to science would cover only an
| extremely tiny subset of math, probably theorems being
| proven by automation. And it is questionable whether those
| theorems would even be useful. A good mathematician with
| CS experience can probably write a generator of new
| useless theorems, something along the lines of "is every
| sequential cube plus square of a number divisible by a root
| of the seventh smallest prime multiplied by log n of that
| number plus blabla...". One can generate such theorems and
| formally prove or disprove them, yes.
|
| On the other hand, any novel science usually requires deep
| and wide exploratory research, often involving hard or
| flawed experimentation or observation. One can train an LLM
| on a PhD curriculum in astrophysics, then provide that
| LLM with an API to some new observatory and instruct it to
| "go prove the cosmological constant". And it will do so, but
| the result will be generated garbage because there is no
| formal way to prove such results. There is no formal way
| to prove why pharaohs decided to stop building pyramids,
| despite there being some decent theories. This is science
| too, you know. You can't formally prove that some gene
| sequence is responsible for trait X etc.
|
| I would say a majority of science is not formally
| provable.
|
| And lastly, you dismiss books/texts, but that is a huge
| chunk of the intellectual and creative work of humans. Say
| you are an engineer and you have a CAD model with a list
| of parts and parameters for a rocket, for example. Now you
| need to write a guide for it. An LLM can do that; it can
| generate guide-looking output. The issue is that there is
| no way to automatically verify it or find issues in it.
| And there are lots of items like that.
| amelius wrote:
| I think the problem here is that you assume the LLM has
| to operate isolated from the world, i.e. without
| interaction. If you put a human scientist in isolation,
| then you cannot have high expectations either.
| Yizahi wrote:
| I don't assume that the LLM would be isolated; I assume
| that the LLM would be incapable of interacting in any
| meaningful way on its own (i.e. not triggered by direct
| input from a programmer).
| jacobr1 wrote:
| > You can't formally prove that some gene sequence is
| responsible for trait X etc.
|
| Maybe not formally in some kind of mathematical sense.
| But you certainly could have simulation models of protein
| synthesis, and maybe even higher order simulation of
| tissues and organs. You could also let the AI scientist
| verify the experimental hypothesis by giving it access to
| robotic lab processes. In fact it seems we are pursuing
| both fronts right now.
| Yizahi wrote:
| Nobody argues that LLMs aren't useful for bulk
| processing of billions of data points or looking for obscure
| correlations in unedited data. But the premise of
| Gwern's article is that to be considered thinking, an LLM
| must initiate such a search on its own and arrive at a
| novel conclusion on its own.
|
| Basically if:
|
| A) Scientist has an idea > triggers an LLM program to sift
| through a ton of data > LLM prints out correlation results
| > scientist reads them and proves/disproves the idea. In
| this case, while the LLM did the bulk of the work, it did
| not arrive at a breakthrough on its own.
|
| B) LLM is idling > LLM triggers some API to get some
| specific set of data > LLM correlates results > LLM
| prints out a complete hypothesis with proof (or disproves
| it). In this case we can say the LLM made a breakthrough.
| yunohn wrote:
| IME, on a daily basis, Claude Code (supposed SoTA agent)
| constantly disables and bypasses tests and checks on my
| codebase - despite following clear prompting guidelines and
| all the /woo/ like ultrathink etc.
| zhangjunphy wrote:
| I think if we had a good enough, and fast enough, simulation
| of reality, something like an accelerated Minecraft with
| real-world physics, then this idea might actually work. But
| the hard reality we can currently generate efficiently and
| feed into LLMs usually has a narrow scope. It feels like
| teaching only textbook math to a kid for several years but
| nothing else. The LLM mostly overoptimizes in these very
| specific fields, but the overall performance might even be
| worse.
| dpoloncsak wrote:
| It's gotta be G-Mod
| leetbulb wrote:
| There will never be a computer powerful enough to
| simulate that many paperclips and explosive barrels.
| jerf wrote:
| Those things are being done. Program testing is now
| off-the-shelf tech, and as for math proofs, see:
| https://www.geeky-gadgets.com/google-deepmind-alphaproof/
| imtringued wrote:
| That didn't stop actor-critic from becoming one of the most
| popular deep RL methods.
| zhangjunphy wrote:
| True, and the successful ones usually require an external
| source of information. For AlphaGo, it is the simple
| algorithm which decides who is the winner of a game of Go.
| For GANs, it is the images labeled by humans. In these
| scenarios, the critic is the medium which transforms external
| information into gradients which optimize the actor, but it
| is not the direct source of that information.
| jsbg wrote:
| > the LLM tries too hard to satisfy the critic
|
| The LLM doesn't have to know about the critic though. It can
| just output things and the critic is a second process that
| filters the output for the end user.
| jumploops wrote:
| How do you critique novelty?
|
| The models are currently trained on a static set of human
| "knowledge" -- even if they "know" what novelty is, they aren't
| necessarily incentivized to identify it.
|
| In my experience, LLMs currently struggle with new ideas, doubly
| true for the reasoning models with search.
|
| What makes novelty difficult is that the ideas should be
| nonobvious (see: the patent system). For example, hallucinating a
| simpler API spec may be "novel" for a single convoluted codebase,
| but it isn't novel in the scope of humanity's information bubble.
|
| I'm curious if we'll have to train future models on novelty
| deltas from our own history, essentially creating synthetic time
| capsules, or if we'll just have enough human novelty between
| training runs over the next few years for the model to develop an
| internal fitness function for future novelty identification.
|
| My best guess? This may just come for free in a yet-to-be-
| discovered continually evolving model architecture.
|
| In either case, a single discovery by a single model still needs
| consensus.
|
| Peer review?
| n4r9 wrote:
| It's a good question. A related question is: "what's an example
| of something undeniably novel?". Like if you ask an agent out
| of the blue to prove the Collatz conjecture, and it writes out
| a proof or counterexample. If that happens with LLMs then I'll
| be a lot more optimistic about the importance to AGI.
| Unfortunately, I suspect it will be a lot murkier than that -
| many of these big open questions will get chipped away at by a
| combination of computational and human efforts, and it will be
| impossible to pinpoint where the "novelty" lies.
| jacobr1 wrote:
| Good point. Look at patents. Few are truly novel in some
| exotic sense of "the whole idea is something never seen
| before." Most likely it is a combination of known factors
| applied in a new way, or incremental development improving on
| known techniques. In a banal sense, most LLM-generated
| content is novel, in that the specific paragraphs might be
| unique combinations of words, even if the ideas are just
| slightly rearranged regurgitations.
|
| So I strongly agree that, especially when we are talking about
| the bulk of human discovery and invention, the incrementalism
| will be increasingly within striking distance of human/AI
| collaboration. Attribution of the novelty in these cases is
| going to be unclear when the task is, simplified, something
| like "search for combinations of things, in this problem
| domain, that do the task better than some benchmark", be that
| drug discovery, maths, AI itself or whatever.
| zbyforgotp wrote:
| I think our minds don't use novelty but salience, and it also
| might be easier to implement.
| OtherShrezzing wrote:
| Google's effort with AlphaEvolve shows that the Daydream Factory
| approach might not be the big unlock we're expecting. They spent
| an obscene amount of compute to discover a marginal improvement
| over the state of the art in a very narrow field. Hours after
| Google published the paper, mathematicians pointed out that their
| SOTA algorithms underperformed compared to techniques published
| decades ago.
|
| Intuitively, it doesn't feel like scaling up to "all things in
| all fields" is going to produce substantial breakthroughs, if
| the current best-in-class implementation of the technique by
| the world's leading experts returned modest results.
| khalic wrote:
| Ugh, again with the anthropomorphizing. LLMs didn't come up with
| anything new because _they don't have agency_ and _do not
| reason_...
|
| We're looking at our reflection and asking ourselves why it isn't
| moving when we don't
| yorwba wrote:
| If you look at your reflection in water, it may very well move
| even though you don't. Similarly, you don't need agency or
| reasoning to create something new, random selection from a
| large number of combinations is enough, correct horse battery
| staple.
|
| Of course random new things are typically bad. The article is
| essentially proposing to generate lots of them anyway and try
| to filter for only the best ones.
| RALaBarge wrote:
| I agree that brute forcing is a method, and it's how nature
| does it. The problem would still be the same: how would it or
| other LLMs know if the idea is novel and interesting?
|
| Given access to unlimited data, LLMs likely could spot novel
| trends that we can't, but they still can't judge the value of
| creating something unique that they have never encountered
| before.
| RALaBarge wrote:
| Yet.
| amelius wrote:
| > anthropomorphizing
|
| Gwern isn't doing that here. They say: "[LLMs] lack some
| fundamental aspects of human thought", and then investigate
| that.
| cranium wrote:
| I'd be happy to spend my Claude Max tokens during the night so it
| can "ultrathink" some Pareto improvements to my projects. So far,
| I've mostly seen lateral moves that rewrite code rather than
| rearchitect/redesign the project.
| precompute wrote:
| Variations on increasing compute and filtering results aside, the
| only way out of this rut is another breakthrough as big as, or
| bigger than, transformers. A lot of money is being spent on
| rebranding practical use-cases as innovation because there's a
| severe lack of innovation in this sphere.
| pilooch wrote:
| AlphaEvolve and similar systems based on MAP-Elites + DL/LLM + RL
| appear to be one of the promising paths.
|
| Setting up the MAP-Elites dimensions may still be problem-
| specific, but this could be learnt unsupervisedly, at least
| partially.
|
| The way I see LLMs is as a search space over tokens that
| manipulates broad concepts within a complex and not-so-smooth
| manifold. These concepts can be refined within other spaces
| (pixel space, physical spaces, ...).
| guelo wrote:
| In a recent talk [0] Francois Chollet made it sound like all the
| frontier models are doing _Test-Time Adaptation_, which I think
| is a similar concept to the _Dynamic evaluation_ that Gwern says
| is not being done. Apparently _Test-Time Adaptation_ encompasses
| several techniques, some of which modify model weights and some
| that don't, but they are all about on-the-fly learning.
|
| [0] https://www.youtube.com/watch?v=5QcCeSsNRks&t=1542s
| LourensT wrote:
| Regardless of accusations of anthropomorphizing, continual
| thinking seems to be a precursor to any sense of agency, simply
| because agency requires _something_ to be running.
|
| Eventually LLM output degrades when most of the context is its
| own output. So should there also be an input stream of
| experience? The proverbial "staring out the window", fed into the
| model to keep it grounded and give hooks to go off?
| amelius wrote:
| Humans daydream about problems when they think a problem is
| interesting. Can an LLM know when a problem is interesting and
| thereby prune the daydream graph?
| zild3d wrote:
| > The puzzle is why
|
| The feedback loop on novel/genuine breakthroughs is too long and
| the training data is too small.
|
| Another reason is that there's plenty of incentive to go after
| the majority of the economy, which relies on routine knowledge
| and maybe judgement; only a narrow slice actually requires
| novel/genuine breakthroughs.
| cs702 wrote:
| The question is: How do we get LLMs to have "Eureka!" moments, on
| their own, when their minds are "at rest," so to speak?
|
| The OP's proposed solution is a constant "daydreaming loop" in
| which an LLM does the following on its own, "unconsciously,"
| as a background task, without human intervention:
|
| 1) The LLM retrieves random facts.
|
| 2) The LLM "thinks" (runs a chain-of-thought) on those retrieved
| facts to see if there are any interesting connections between
| them.
|
| 3) If the LLM finds interesting connections, it promotes them to
| "consciousness" (a permanent store) and possibly adds them to a
| dataset used for ongoing incremental training.
|
| It could work.
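|
| A minimal sketch of that loop, with `llm` standing in for
| whatever model call and fact store is actually available (the
| scoring step is the hand-wavy part):
|
|     import random
|
|     def daydream(facts, llm, steps=1000, threshold=0.8):
|         """Background loop: sample facts, chain-of-thought over them,
|         promote connections the critic scores highly."""
|         promoted = []                                  # the "permanent store"
|         for _ in range(steps):
|             a, b = random.sample(facts, 2)             # 1) retrieve random facts
|             thought = llm("Think step by step: is there a non-obvious, "
|                           f"interesting connection between:\n- {a}\n- {b}")  # 2) CoT
|             score = float(llm("From 0 to 1, how novel and useful is this "
|                               f"connection? Number only:\n{thought}"))       # 3) critic
|             if score >= threshold:
|                 promoted.append({"facts": (a, b), "thought": thought,
|                                  "score": score})
|         return promoted   # candidates for the store / incremental training data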
| epcoa wrote:
| Step 3 has been shown to _not_ work over and over again;
| the "find interesting connections" part is the hand-wavy magic
| at this time. LLMs alone don't seem to be particularly adept at
| it either.
| cs702 wrote:
| Has this been tried with reinforcement learning (RL)? As the
| OP notes, it is plausible from an RL perspective that such a
| bootstrap can work, because it would be (quoting the OP)
| "exploiting the generator-verifier gap, where it is easier to
| discriminate than to generate (eg laughing at a pun is easier
| than making it)." The hit ratio may be tiny, so doing this
| well would be very expensive.
| kookamamie wrote:
| > The puzzle is why
|
| The breakthrough isn't in their datasets.
| velcrovan wrote:
| I'm once again begging people to read David Gelernter's 1994 book
| "The Muse in the Machine". I'm surprised to see no mention of it
| in Gwern's post, it's the exact book he should be reaching for on
| this topic.
|
| In examining the possibility of genuinely creative computing,
| Gelernter discovers and defends a model of cognition that
| explains so much about the human experience of creativity,
| including daydreaming, dreaming, everyday "aha" moments, and the
| evolution of human approaches to spirituality.
|
| https://uranos.ch/research/references/Gelernter_1994/Muse%20...
| sneak wrote:
| Seems like an easy hypothesis to quickly smoke test with a couple
| hundred lines of script, a wikipedia index, and a few grand
| thrown at an API.
| dr_dshiv wrote:
| Yes! I've been prototyping dreaming LLMs based on my downloaded
| history--and motivated by biomimetic design approaches. Just to
| surface ideas to myself again.
| A_D_E_P_T wrote:
| > _You are a creative synthesizer. Your task is to find deep,
| non-obvious, and potentially groundbreaking connections between
| the two following concepts. Do not state the obvious. Generate a
| hypothesis, a novel analogy, a potential research question, or a
| creative synthesis. Be speculative but ground your reasoning._
|
| > _Concept 1: {Chunk A}_
| > _Concept 2: {Chunk B}_
|
| In addition to the other criticisms mentioned by posters ITT, a
| problem I see is: What concepts do you feed it?
|
| Obviously there's a problem with GIGO. If you don't pick the
| right concepts to begin with, you're not going to get a
| meaningful result. But, beyond that, human discovery (in
| mechanical engineering, at least,) tends to be massively
| interdisciplinary and serendipitous, so that _many_ concepts are
| often involved, and many of those are _necessarily non-obvious_.
|
| I guess you could come up with a biomimetics bot, but, besides
| that, I'm not so sure how well this concept would work as laid
| out above.
|
| There's another issue in that LLMs tend to be extremely gullible,
| and swallow the scientific literature and University press
| releases verbatim and uncritically.
| sartak wrote:
| From _The Metamorphosis of Prime Intellect_ (1994):
|
| > Among Prime Intellect's four thousand six hundred and twelve
| interlocking programs was one Lawrence called the
| RANDOM_IMAGINATION_ENGINE. Its sole purpose was to prowl for new
| associations that might fit somewhere in an empty area of the
| GAT. Most of these were rejected because they were useless,
| unworkable, had a low priority, or just didn't make sense. But
| now the RANDOM_IMAGINATION_ENGINE made a critical connection, one
| which Lawrence had been expecting it to make [...]
|
| > Deep within one of the billions of copies of Prime Intellect,
| one copy of the Random_Imagination_Engine connected two thoughts
| and found the result good. That thought found its way to
| conscious awareness, and because the thought was so good it was
| passed through a network of Prime Intellects, copy after copy,
| until it reached the copy which had arbitrarily been assigned the
| duty of making major decisions -- the copy which reported
| directly to Lawrence. [...]
|
| > "I've had an idea for rearranging my software, and I'd like to
| know what you think."
|
| > At that Lawrence felt his blood run cold. He hardly understood
| how things were working as it was; the last thing he needed was
| more changes. "Yes?"
| js8 wrote:
| I am not sure why we tie this to any concrete AI technology such
| as LLMs. IMHO the biggest issue we have with AI right now is that
| we don't know how to philosophically formalize what we want. What
| is reasoning?
|
| I am trying to answer that for myself. Since every logic is
| expressible in untyped lambda calculus (as any computation is),
| you could have a system that just somehow generates terms and
| beta-reduces them. Even in such a much simpler logic, what are
| the "interesting" terms?
|
| I have several answers, but my point is, you should simplify the
| problem, and this question has not been answered even under such
| a simple scenario.
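|
| For concreteness, the toy setting itself is easy to set up; the
| hard part is exactly the question above, deciding which terms
| and reduction sequences are "interesting". A small sketch
| (terms as tuples, leftmost-outermost beta reduction):
|
|     import itertools
|
|     fresh = (f"v{i}" for i in itertools.count())
|
|     def free_vars(t):
|         kind = t[0]
|         if kind == 'var':
|             return {t[1]}
|         if kind == 'lam':
|             return free_vars(t[2]) - {t[1]}
|         return free_vars(t[1]) | free_vars(t[2])
|
|     def subst(t, name, repl):
|         """Capture-avoiding substitution t[name := repl]."""
|         kind = t[0]
|         if kind == 'var':
|             return repl if t[1] == name else t
|         if kind == 'app':
|             return ('app', subst(t[1], name, repl), subst(t[2], name, repl))
|         param, body = t[1], t[2]
|         if param == name:
|             return t                      # bound variable shadows name
|         if param in free_vars(repl):
|             new = next(fresh)             # rename to avoid capture
|             body = subst(body, param, ('var', new))
|             param = new
|         return ('lam', param, subst(body, name, repl))
|
|     def step(t):
|         """One leftmost-outermost beta step, or None if in normal form."""
|         kind = t[0]
|         if kind == 'app':
|             f, a = t[1], t[2]
|             if f[0] == 'lam':
|                 return subst(f[2], f[1], a)   # (\x. body) a -> body[x := a]
|             r = step(f)
|             if r is not None:
|                 return ('app', r, a)
|             r = step(a)
|             if r is not None:
|                 return ('app', f, r)
|         elif kind == 'lam':
|             r = step(t[2])
|             if r is not None:
|                 return ('lam', t[1], r)
|         return None
|
|     # (\x. x x) (\y. y)  reduces to  \y. y
|     term = ('app', ('lam', 'x', ('app', ('var', 'x'), ('var', 'x'))),
|             ('lam', 'y', ('var', 'y')))
|     while term is not None:
|         print(term)
|         term = step(term)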
| HarHarVeryFunny wrote:
| Reasoning is chained what-if prediction, together with
| exploration of alternatives (cf backtracking), and leans upon
| general curiosity/learning for impasse resolution (i.e. if you
| can't predict what-if, then have the curiosity to explore and
| find out).
|
| What the LLM companies are currently selling as "reasoning" is
| mostly RL-based pre-training whereby the model is encouraged to
| predict tokens (generate reasoning steps) according to similar
| "goals" seen in the RL training data. This isn't general case
| reasoning, but rather just "long horizon" prediction based on
| the training data. It helps exploit the training data, but
| isn't going to generate novelty outside of the deductive
| closure of the training data.
| js8 wrote:
| I am talking about reasoning in a philosophical, not logical,
| sense. In your definition, you're assuming a logic in which
| reasoning happens, but when I am asking the question, I am
| not presuming any specific logic.
|
| So how do you pick the logic in which to do reasoning? There
| are "good reasons" to use one logic over another.
|
| LLMs probably learn some combination of logic rules
| (deduction rules in commonly used logics), but cannot
| guarantee they will be used consistently (i.e. choose a logic
| for the problem and stick to it). How do you accomplish that?
|
| And even then reasoning is more than search. If you can
| reason, you should also be able to reason about more
| effective reasoning (for example better heuristics for cutting
| the search tree).
| HarHarVeryFunny wrote:
| OK, so maybe we're talking somewhat at cross purposes.
|
| I was talking about the process/mechanism of reasoning -
| how do our brains appear to implement the capability that
| we refer to as "reasoning", and by extension how could an
| AI do the same by implementing the same mechanisms.
|
| If we accept prediction (i.e use of past experience) as the
| mechanistic basis of reasoning, then choice of logic
| doesn't really come into it - it's more just a matter of
| your past experience and what you have learnt. What
| predictive rules/patterns have you learnt, both in terms of
| a corpus of "knowledge" you can bring to bear, but also in
| terms of experience with the particular problem domain -
| what have you learnt (i.e. what solution steps can you
| predict) about trying to reason about any given domain/goal?
|
| In terms of consistent use of logic, and sticking to it,
| one of the areas where LLMs are lacking is in not having
| any working memory other than their own re-consumed output,
| as well as an inability to learn beyond pre-training. With
| both of these capabilities an AI could maintain a focus
| (working memory) on the problem at hand (vs suffer from
| "context rot") and learn consistent, or phased/whatever,
| logic that has been successful in the past at solving
| similar problems (i.e predicting actions that will lead to
| solution).
| js8 wrote:
| But prediction as the basis for reasoning (in
| epistemological sense) requires the goal to be given from
| the outside, in the form of the system that is to be
| predicted. And I would even say that this problem (giving
| predictions) has been solved by RL.
|
| Yet, the consensus seems to be we don't quite have AGI;
| so what gives? Clearly just making good predictions is
| not enough. (I would say current models are empiricist to
| the extreme; but there is also rationalist position,
| which emphasizes logical consistency over prediction
| accuracy.)
|
| So, in my original comment, I lament that we don't really
| know what we want (what is the objective). The post
| doesn't clarify much either. And I claim this issue
| occurs with much simpler systems, such as lambda
| calculus, than reality-connected LLMs.
| HarHarVeryFunny wrote:
| > But prediction as the basis for reasoning (in
| epistemological sense) requires the goal to be given from
| the outside, in the form of the system that is to be
| predicted.
|
| Prediction doesn't have goals - it just has inputs (past
| and present) and outputs (expected inputs). Something
| that is on your mind (perhaps a "goal") is just a
| predictive input that will cause you to predict what
| happens next.
|
| > And I would even say that this problem (giving
| predictions) has been solved by RL.
|
| Making predictions is of limited use if you don't have
| the feedback loop of when your predictions are right or
| wrong (so update prediction for next time), and having
| the feedback (as our brain does) of when your prediction
| is wrong is the basis of curiosity - causing us to
| explore new things and learn about them.
|
| > Yet, the consensus seems to be we don't quite have AGI;
| so what gives? Clearly just making good predictions is
| not enough.
|
| Prediction is important, but there are lots of things
| missing from LLMs such as ability to learn, working
| memory, innate drives (curiosity, boredom), etc.
| HarHarVeryFunny wrote:
| > Despite impressive capabilities, large language models have yet
| to produce a genuine breakthrough. The puzzle is why.
|
| I don't see why this is remotely surprising. Despite all the
| hoopla, LLMs are not AGI or artificial brains - they are predict-
| next-word language models. By design they are not built for
| creativity, but rather quite the opposite, they are designed to
| continue the input in the way best suggested by the training data
| - they are essentially built for recall, not creativity.
|
| For an AI to be creative it needs to have innate human/brain-like
| features such as novelty (prediction failure) driven curiosity,
| boredom, as well as ability to learn continuously. IOW if you
| want the AI to be creative it needs to be able to learn for
| itself, not just regurgitate the output of others, and have these
| innate mechanisms that will cause it to pursue discovery.
| karmakaze wrote:
| Yes, LLMs choose probable sequences because they recognize
| similarity. Because of that, they can diverge from similarity to
| be creative: increase the temperature. What LLMs don't have is
| (good) taste; we need to build an artificial tongue and feed it
| as a prerequisite.
| grey-area wrote:
| Well, they also don't have understanding, a model of the
| world, or the ability to reason (no, the chain-of-thought
| created by AI companies is not reasoning), as well as having
| no taste.
|
| So there is quite a lot missing.
| HarHarVeryFunny wrote:
| It depends on what you mean by "creative" - they can
| recombine fragments of training data (i.e. apply generative
| rules) in any order - generate the deductive closure of the
| training set, but that is it.
|
| Without moving beyond LLMs to a more brain-like cognitive
| architecture, all you can do is squeeze the juice out of the
| training data, by using RL/etc to bias the generative process
| (according to reasoning data, good taste or whatever), but
| you can't move beyond the training data to be truly creative.
| vonneumannstan wrote:
| >It depends on what you mean by "creative" - they can
| recombine fragments of training data (i.e. apply generative
| rules) in any order - generate the deductive closure of the
| training set, but that is it. Without moving beyond LLMs to
| a more brain-like cognitive architecture, all you can do is
| squeeze the juice out of the training data, but using
| RL/etc to bias the generative process (according to
| reasoning data, good taste or whatever), but you can't move
| beyond the training data to be truly creative.
|
| It's clear these models can actually reason on unseen
| problems and if you don't believe that you aren't actually
| following the field.
| HarHarVeryFunny wrote:
| Sure - but only if the unseen problem can be solved via
| the deductive/generative closure of the training data.
| And of course this type of "reasoning" is only as good as
| the RL pre-training it is based on - working well for
| closed domains like math where verification is easy, and
| not so well in the more general case.
| js8 wrote:
| Both can be true (and that's why I downvoted you in the
| other comment, for presenting this as a dichotomy), LLMs
| can reason and yet "stochastically parrot" the training
| data.
|
| For example, an LLM might learn a rule that sentences
| similar to "A is given. From A follows B." are
| followed by the statement "Therefore, B". This is modus
| ponens. An LLM can apply this rule to a wide variety of A and
| B, producing novel statements. Yet, these statements are
| still the statistically probable ones.
|
| I think the problem is, when people say "AI should
| produce something novel" (or "is producing", depending on
| whether they advocate or dismiss), they are not very
| clear about what "novel" actually means. Mathematically,
| it's very easy to produce a never-before-seen theorem;
| but is it interesting? Probably not.
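|
| For example, a few lines of forward chaining already produce
| endless "novel" but uninteresting conclusions:
|
|     # Brute-force modus ponens over a toy knowledge base.
|     facts = {"it_is_raining", "socrates_is_a_man"}
|     rules = {("socrates_is_a_man", "socrates_is_mortal"),
|              ("it_is_raining", "the_ground_is_wet"),
|              ("the_ground_is_wet", "shoes_get_muddy")}
|
|     derived = set(facts)
|     changed = True
|     while changed:                      # forward chaining to a fixpoint
|         changed = False
|         for a, b in rules:
|             if a in derived and b not in derived:
|                 derived.add(b)          # modus ponens: A, A -> B  |-  B
|                 changed = True
|
|     print(derived - facts)   # never before stated, and not very interesting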
| awongh wrote:
| By volume how much of human speech / writing is pattern
| matching and how much of it is truly original cognition
| that would pass your bar of creativity? It is probably 90%
| rote pattern matching.
|
| I don't think LLMs are AGI, but in most senses I don't
| think people give enough credit to their capabilities.
|
| It's just ironic how human-like the flaws of the system
| are. (Hallucinations that are asserting untrue facts, just
| because they are plausible from a pattern matching POV)
| dingnuts wrote:
| My intuition is opposite yours; due to the insane
| complexity of the real world nearly 90% of situations are
| novel and require creativity
|
| OK now we're at an impasse until someone can measure this
| HarHarVeryFunny wrote:
| I think it comes down to how we define creativity for the
| purpose of this conversation. I would say that 100% of
| situations and problems are novel to some degree - the
| real world does not exactly repeat, and your brain at
| T+10 is not exactly the same as it is as T+20.
|
| That said, I think most everyday situations are similar
| enough to things we've experienced before that shallow
| pattern matching is all it takes. The curve in the road
| we're driving on may not be 100% the same as any curve
| we've experienced before, but turning the car wheel to
| the left the way we've learnt to do it will let us
| successfully navigate it all the same.
|
| Most everyday situations/problems we're faced with are
| familiar enough that shallow "reactive" behavior is good
| enough - we rarely have to stop to develop a plan, figure
| things out, or reason in any complex kind of a way, and
| very rarely face situations so challenging that any real
| creativity is needed.
| HarHarVeryFunny wrote:
| > It's just ironic how human-like the flaws of the system
| are. (Hallucinations that are asserting untrue facts,
| just because they are plausible from a pattern matching
| POV)
|
| I think most human mistakes are different - not applying
| a lot of complex logic to come to an incorrect
| deduction/guess (= LLM hallucination), but rather just
| shallow recall/guess. e.g. An LLM would guess/hallucinate
| a capital city by using rules it had learnt about other
| capital cities - must be famous, large, perhaps have an
| airport, etc, etc; a human might just use "famous" to
| guess, or maybe just throw out the name of the only city
| they can associate to some country/state.
|
| The human would often be aware that they are just
| guessing, maybe based on not remembering where/how they
| had learnt this "fact", but to the LLM it's all just
| statistics and it has no episodic memory (or even
| coherent training data - it's all sliced and diced into
| shortish context-length samples) to ground what it knows
| or does not know.
| awongh wrote:
| The reason the LLMs are of any use to anyone right now
| (and real people are using them for real things right
| now - see the millions of ChatGPT users) is that text
| created by a real human using a guessing heuristic and text
| created by an LLM using statistics are qualitatively the
| same. Even for things that some subjectively deem
| "creative".
|
| The entropy of communication also makes it so that we
| mostly won't ever know when a person is guessing or if
| they think they're telling the truth. In that sense it
| makes less difference to the receiver of the information
| what the intent was - even if it came from a human, that
| human's guessing/BS level is still unknown to you, the
| recipient.
|
| This difference will continue to get smaller and more
| imperceptible. When it will stop changing or at what rate
| it will change is anyone's guess.
| HarHarVeryFunny wrote:
| LLMs are just trying to mimic (predict) human output, and
| can obviously do a great job, which is why they are
| useful.
|
| I was just referring to when LLMs fail, which can be in
| non-human ways, not only the way in which they
| hallucinate, but also when they generate output that has
| the "shape" of something in the training set, but is
| nonsense.
| leptons wrote:
| > _It is probably 90% rote pattern matching._
|
| So what. 90% (or more) of humans aren't making any sort
| of breakthrough in any discipline, either. 99.9999999999%
| of human speech/writing isn't producing "breakthroughs"
| either, it's just a way to communicate.
|
| > _It's just ironic how human-like the flaws of the
| system are. (Hallucinations that are asserting untrue
| facts, just because they are plausible from a pattern
| matching POV)_
|
| The LLM is not "hallucinating". It's just operating as it
| was designed to do, which often produces results that do
| not make any sense. I have actually hallucinated, and
| some of those experiences were profoundly insightful,
| quite the opposite of what an LLM does when it
| "hallucinates".
|
| You can call anything a "breakthrough" if you aren't
| aware of prior art. And LLMs are "trained" on nothing but
| prior art. If an LLM does make a "breakthrough", then
| it's because the "breakthrough" was already in the
| training data. I have no doubt many of these
| "breakthroughs" will be followed years later by someone
| finding the actual human-based research that the LLM
| consumed in its training data, rendering the
| "breakthrough" not quite as exciting.
| andoando wrote:
| What is the distinction between "pattern matching" and
| "original cognition" exactly?
|
| All human ideas are a combination of previously seen
| ideas. If you disagree, come up with a truly new
| conception which is not. -- Badly quoted David Hume
| heyjamesknight wrote:
| The topic of conversation is not "human speech/writing"
| but "human creativity." There's no dispute that LLMs can
| create novel pieces of textual output. But there is no
| evidence that they can produce novel _ideas_. To assume
| they can is to adopt a purely rationalist approach to
| epistemology and cognition. Plato, Aquinas, and Kant
| would all fervently disagree with that approach.
| vonneumannstan wrote:
| > Despite all the hoopla, LLMs are not AGI or artifical brains
| - they are predict-next-word language models. By design they
| are not built for creativity, but rather quite the opposite,
| they are designed to continue the input in the way best
| suggested by the training data - they are essentially built for
| recall, not creativity.
|
| This is just a completely base level of understanding of LLMs.
| How do you predict the next token with superhuman accuracy?
| Really think about how that is possible. If you think it's just
| stochastic parroting you are ngmi.
|
| >large language models have yet to produce a genuine
| breakthrough. The puzzle is why.
|
| I think you should really update on the fact that world class
| researchers are surprised by this. They understand something
| you don't and that is that it's clear these models build
| robust world models and that text prompts act as probes into
| those world models. The surprising part is that despite these
| sophisticated world models we can't seem to get unique
| insights out which almost surely already exist in those
| models. Even if all the model is capable of is memorizing
| text then just the sheer volume it has memorized should yield
| unique insights, no human can ever hope to hold this much
| text in their memory and then make connections between it.
|
| It's possible we just lack the prompt creativity to get these
| insights out but nevertheless there is something strange
| happening here.
| HarHarVeryFunny wrote:
| > This is just a completely base level of understanding of
| LLMs. How do you predict the next token with superhuman
| accuracy? Really think about how that is possible. If you
| think it's just stochastic parroting you are ngmi.
|
| Yes, thank-you, I do understand how LLMs work. They learn a
| lot of generative rules from the training data, and will
| apply them in flexible fashion according to the context
| patterns they have learnt. You said stochastic parroting, not
| me.
|
| However, we're not discussing whether LLMs can be superhuman
| at tasks where they had the necessary training - we're
| discussing whether they are capable of creativity (and
| presumably not just the trivially obvious case of being able
| to apply their generative rules in any order - deductive
| closure, not stochastic parroting in the dumbest sense of
| that expression).
| vonneumannstan wrote:
| >However, we're not discussing whether LLMs can be
| superhuman at tasks where they had the necessary training -
| we're discussing whether they are capable of creativity
|
| "Even if all the model is capable of is memorizing text
| then just the sheer volume it has memorized should yield
| unique insights, no human can ever hope to hold this much
| text in their memory and then make connections between it."
|
| Unless you think humans have magic meat then all we are
| really doing with "creativity" is connecting previously
| unconnected facts.
| HarHarVeryFunny wrote:
| > Unless you think humans have magic meat then all we are
| really doing with "creativity" is connecting previously
| unconnected facts.
|
| In the case of discovery and invention the "facts" being
| connected may not be things that were known before. An
| LLM is bound by its training set. A human is not limited
| by what is currently known - they can explore (in
| directed or undirected fashion), learn, build hierarchies
| of new knowledge and understanding, etc.
| HarHarVeryFunny wrote:
| > "Even if all the model is capable of is memorizing text
| then just the sheer volume it has memorized should yield
| unique insights, no human can ever hope to hold this much
| text in their memory and then make connections between
| it."
|
| Yes, potentially, but the model has no curiosity or drive
| to do this (or anything else) by itself. All an LLM is
| built to do is predict. The only way to control the
| output and goad it into using the vast amount of
| knowledge that it has is by highly specific prompting.
|
| Basically it's only going to connect the dots if you tell it
| what dots to connect, in which case it's the human being
| inventive, not the model. The model is trying to predict,
| so essentially if you want it to do something outside of
| the training set you're going to have to prompt it to do
| that.
|
| A human has curiosity (e.g. "what happens if I connect
| these dots .."), based on prediction failure and
| associated focus/etc - the innate desire to explore the
| unknown and therefore potentially learn. The model has
| none of that - it can't learn and has no curiosity. If
| the model's predictions are bad it will just hallucinate
| and generate garbage, perhaps "backtrack" and try again,
| and likely lead to context rot.
| awongh wrote:
| Not just prompting - it could also be that we haven't done
| the right kind of RLHF for these kinds of outputs?
| fragmede wrote:
| Define creativity. Three things LLMs can do are write song
| lyrics, poems, and jokes, all of which require some level of
| what we think of as human creativity. Of course detractors will
| say LLM versions of those three aren't very good, and they may
| even be right, but a twelve year old child coming up with the
| same would be seen as creative, even if they didn't get
| significant recognition for it.
| HarHarVeryFunny wrote:
| Sure, but the author of TFA is well versed in LLMs and so is
| addressing something different. Novelty isn't the same as
| creativity, especially when limited to generating based on a
| fixed repertoire of moves.
|
| The term "deductive closure" has been used to describe what
| LLMs are capable of, and therefore what they are not capable
| of. They can generate novelty (e.g. new poem) by applying the
| rules they have learnt in novel ways, but are ultimately
| restricted by their fixed weights and what was present in the
| training data, as well as being biased to predict rather than
| learn (which they anyways can't!) and explore.
|
| An LLM may do a superhuman job of applying what it "knows" to
| create solutions to novel goals (be that a math olympiad
| problem, or some type of "creative" output that has been
| requested, such as a poem), but is unlikely to create a whole
| new field of math that wasn't hinted at in the training data
| because it is biased to predict, and anyways doesn't have the
| ability to learn that would allow it to build a new theory
| from the ground up one step at a time. Note (for anyone who
| might claim otherwise) that "in-context learning" is really a
| misnomer - it's not about _learning_ but rather about _using_
| data that is only present in-context rather than having been
| in the training set.
| tmaly wrote:
| I think we will see more breakthroughs with an AI/Human hybrid
| approach.
|
| Tobias Rees had some interesting thoughts
| https://www.noemamag.com/why-ai-is-a-philosophical-rupture/
| where he poses this idea that AI and humans together can think
| new types of thoughts that humans alone cannot think.
| _acco wrote:
| This is a good way of framing that we don't understand human
| creativity. And that we can't hope to build it until we do.
|
| i.e. AGI is a philosophical problem, not a scaling problem.
|
| Though we understand them little, we know the default mode
| network and sleep play key roles. That is likely because they aid
| some universal property of AGI. Concepts we don't understand like
| motivation, curiosity, and qualia are likely part of the picture
| too. Evolution is far too efficient for these to be mere side
| effects.
|
| (And of course LLMs have none of these properties.)
|
| When a human solves a problem, their search space is not random -
| just like a chess grandmaster's search space of moves is not
| random.
|
| How our brains are so efficient when problem solving while also
| able to generate novelty is a mystery.
| vintagedave wrote:
| > Hypothesis: Day-Dreaming Loop
|
| This mirrors something I have thought of too. I have read
| multiple theories of emerging consciousness, which touch on
| things from proprioception to the inner monologue (which not
| everyone has.)
|
| My own theory is that -- avoiding the need for an awareness of a
| monologue -- an LLM loop that constantly takes input and lets it
| run, saving key summarised parts to memory that are then pulled
| back in when relevant, would be a very interesting system to
| speak to.
|
| It would need two loops: the constant ongoing one, and then for
| interaction, one accessing memories from the first. The ongoing
| one would be aware of the conversation. I think it would be
| interesting to see which elements from the background loop
| would surface in the conversation via the memory system.
|
| My theory is that if we're likely to see emergent consciousness,
| it will come through ongoing awareness and memory.
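|
| A minimal sketch (in Python) of what that two-loop setup might
| look like - call_llm here is a hypothetical stand-in for
| whatever model API would actually be used, and the
| summarisation and retrieval steps are deliberately crude
| placeholders:
|
|     import threading, time, queue
|
|     def call_llm(prompt: str) -> str:
|         # Hypothetical stand-in for a real model call; returns
|         # a canned string so the sketch runs offline.
|         return f"[model output for: {prompt[:40]}...]"
|
|     memory = []             # summaries saved by the background loop
|     memory_lock = threading.Lock()
|     inputs = queue.Queue()  # stream of raw observations for loop 1
|
|     def background_loop():
|         # Loop 1: constantly take input, let the model run on it,
|         # and save key summarised parts to memory.
|         while True:
|             observation = inputs.get()
|             if observation is None:
|                 break
|             thought = call_llm(f"Freely associate on: {observation}")
|             summary = call_llm(f"Summarise in one line: {thought}")
|             with memory_lock:
|                 memory.append(summary)
|
|     def respond(user_message: str) -> str:
|         # Loop 2: the interaction loop; pulls back relevant
|         # memories (naive keyword overlap here) before answering.
|         words = set(user_message.lower().split())
|         with memory_lock:
|             relevant = [m for m in memory
|                         if words & set(m.lower().split())][-3:]
|         context = "\n".join(relevant)
|         return call_llm(f"Memories:\n{context}\n\nUser: {user_message}")
|
|     t = threading.Thread(target=background_loop, daemon=True)
|     t.start()
|     for obs in ["a news item", "a walk in the rain", "a bug report"]:
|         inputs.put(obs)
|     time.sleep(0.1)
|     print(respond("Anything new on the bug report?"))
|     inputs.put(None)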
| yahoozoo wrote:
| I once asked ChatGPT to come up with a novel word that would
| return 0 Google search results. It came up with "vexlithic" which
| does indeed return 0 results, at least for me. I thought that was
| neat.
| nsedlet wrote:
| I believe an important reason why there are no LLM
| breakthroughs is that humans make progress in their thinking
| through experimentation, i.e. collecting targeted data, which
| requires exerting agency on the real world. This isn't just
| observation, it's the creation of data not already in the
| training set.
| haolez wrote:
| Maybe also the fact that they can't learn small pieces of new
| information without "formatting" their whole brain again, from
| scratch. And fine-tuning is like having a stroke, where you
| get specialization by losing cognitive capabilities.
| ramoz wrote:
| I've walked 10k steps every day the past week and produced
| more code in that period than most would over months. Using
| Claude Code (and vibetunnel over tailscale to my phone - that
| I speak instructions into).
|
| There is a breakthrough happening. in real time.
| CaptainFever wrote:
| Can we see an example please?
| zyklonix wrote:
| This idea of a "daydreaming loop" hits on a key LLM gap: the
| lack of background, self-driven insight. A pragmatic step in this
| direction is https://github.com/DivergentAI/dreamGPT , which
| explores divergent thinking by generating and scoring
| hallucinations. It shows how we might start pushing LLMs beyond
| prompt-response into continuous, creative cognition.
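|
| Not dreamGPT's actual code, but the general shape of that
| approach might be sketched like this - call_llm is a
| placeholder model call, and the divergence score is a crude
| token-overlap proxy rather than a real novelty metric:
|
|     from itertools import combinations
|
|     def call_llm(prompt: str) -> str:
|         # Placeholder for a real model call; canned output so
|         # the sketch runs offline.
|         return f"idea for: {prompt}"
|
|     def divergence(a: str, b: str) -> float:
|         # Crude novelty proxy: how little the two source
|         # concepts' vocabularies overlap. A real system would
|         # score the generated idea itself, e.g. with another
|         # model or an embedding distance.
|         wa, wb = set(a.lower().split()), set(b.lower().split())
|         return 1.0 - len(wa & wb) / max(len(wa | wb), 1)
|
|     concepts = [
|         "default mode network", "sparse attention",
|         "prediction markets", "sleep consolidation",
|         "compiler optimisation passes",
|     ]
|
|     candidates = []
|     for a, b in combinations(concepts, 2):
|         idea = call_llm(f"Link '{a}' and '{b}' in a research idea.")
|         candidates.append((divergence(a, b), a, b, idea))
|
|     # Surface only the most divergent combinations for review.
|     for score, a, b, idea in sorted(candidates, reverse=True)[:3]:
|         print(f"{score:.2f}  {a} x {b}: {idea}")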
| zby wrote:
| The novelty part is a hard one - but maybe in many cases we could
| substitute something else for it? If an idea promises to beat
| state of the art in some field - and it is not yet actively
| researched - then it is novel.
|
| But most promising would be to use the Dessalles theories.
|
| Here is 4.1o expanding this:
| https://chatgpt.com/s/t_6877de9faa40819194f95184979b5b44
|
| By the way - this could be a classic example of this day-dreaming
| - you take two texts: one by Gwern and some article by Dessalles
| (I read "Why we talk" - a great book! - but maybe there is some
| more concise article?) and ask an LLM to generate ideas connecting
| these two. In this particular case it was my intuition that
| connected them - but I imagine that there could be an algorithm
| that could find this connection in a reasonable time - some kind
| of semantic search maybe.
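|
| A toy sketch of what that algorithm might look like - here a
| bag-of-words cosine similarity stands in for real embeddings,
| and pairs with moderate similarity (close enough to connect,
| distant enough to be non-obvious) get flagged as candidates to
| feed to the idea-generating LLM:
|
|     import math
|     from collections import Counter
|     from itertools import combinations
|
|     def embed(text: str) -> Counter:
|         # Bag-of-words stand-in for a real embedding model.
|         return Counter(text.lower().split())
|
|     def cosine(u: Counter, v: Counter) -> float:
|         dot = sum(u[w] * v[w] for w in u)
|         nu = math.sqrt(sum(c * c for c in u.values()))
|         nv = math.sqrt(sum(c * c for c in v.values()))
|         return dot / (nu * nv) if nu and nv else 0.0
|
|     # Short summaries standing in for the full texts (e.g. the
|     # Gwern essay and a Dessalles article).
|     texts = {
|         "daydreaming": "background generation of idea combinations "
|                        "scored for novelty while the model is idle",
|         "why we talk": "humans signal relevance by reporting "
|                        "unexpected, newsworthy idea combinations",
|         "gradient descent": "iterative parameter updates that "
|                             "follow the loss gradient downhill",
|     }
|
|     # Moderate similarity = promising pair; the thresholds are
|     # arbitrary and would need tuning.
|     for (na, ta), (nb, tb) in combinations(texts.items(), 2):
|         sim = cosine(embed(ta), embed(tb))
|         if 0.1 < sim < 0.8:
|             print(f"connect '{na}' and '{nb}' (similarity {sim:.2f})")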
| throwaway328 wrote:
| The fact that LLMs haven't come up with anything "novel" would be
| a serious puzzle - as the article claims - only _if_ they were
| thinking, reasoning, being creative, etc. If they aren't doing
| anything of the sort, it'd be the only thing you'd expect.
|
| So it's a bit of an anti-climactic solution to the puzzle but:
| maybe the naysayers were right and they're not thinking at all,
| or doing any of the other anthropomorphic words being marketed to
| users, and we've simply all been dragged along by a narrative
| that's very seductive to tech types (the computer gods will
| rise!).
|
| It'd be a boring outcome, after the countless gallons of digital
| ink spilled on the topic the last years, but maybe they'll come
| to be accepted as "normal software", and not god-like, in the
| end. A medium to large improvement in some areas, and anywhere
| from minimal to pointless to harmful in others. And all for the
| very high cost of all the funding and training and data-hoovering
| that goes in to them, not to mention the opportunity cost of all
| the things we humans could have been putting money into and
| didn't.
___________________________________________________________________
(page generated 2025-07-16 23:01 UTC)