[HN Gopher] Reasoning models don't always say what they think
___________________________________________________________________
Reasoning models don't always say what they think
Author : meetpateltech
Score : 395 points
Date : 2025-04-03 16:50 UTC (1 day ago)
(HTM) web link (www.anthropic.com)
(TXT) w3m dump (www.anthropic.com)
| evrimoztamur wrote:
| Sounds like LLMs short-circuit without necessarily testing their
| context assumptions.
|
| I also recognize this from whenever I ask it a question in a
| field I'm semi-comfortable in: I guide the question in a manner
| which already includes my expected answer. As I probe it, I
| often find that it has taken my implied answer for granted and
| decided on an explanation for it after the fact.
|
| I think this also explains a common issue with LLMs where people
| get the answer they're looking for, regardless of whether it's
| true or there's a CoT in place.
| jiveturkey wrote:
| i found with the gemini answer box on google, it's quite easy
| to get the answer you expect. i find myself just playing with
| it, asking a question in the positive sense then the negative
| sense, to get the 2 different "confirmations" from gemini. also
| it's easily fooled by changing the magnitude of a numerical
| aspect of a question, like "are thousands of people ..." then
| "are millions of people ...". and then you have the now
| infamous black/white people phrasing of a question.
|
| i haven't found perplexity to be so easily nudged.
| andrewmcwatters wrote:
| This is such an annoying issue in assisted programming as well.
|
| Say you're referencing a specification and you allude to two
| or three specific values from it; you mention needing a
| comprehensive list, and the LLM has been trained on that
| specification.
|
| I'll often find that all popular models will only use the
| examples I've mentioned and will fail to add even a few
| more.
|
| You might as well read specifications yourself.
|
| It's a critical feature of these models that could be an easy
| win. It's autocomplete! It's simple. And they fail to do it
| every single time I've tried a similarly abstract request.
|
| I laugh any time people talk about these models actually
| replacing people.
|
| They fail at reading prompts at a grade school reading level.
| BurningFrog wrote:
| The LLMs copy human-written text, so maybe they'll implement
| Motivated Reasoning just like humans do?
|
| Or maybe they're telling people what they want to hear, just
| like humans do.
| ben_w wrote:
| They definitely tell people what they want to hear. Even when
| we'd rather they be correct, they get upvoted or downvoted by
| users, so this isn't avoidable (but is it fawning or
| sycophancy?)
|
| I wonder how deep or shallow the mimicry of human output is
| -- enough to be interesting, but definitely not quite like
| us.
| jiveturkey wrote:
| seemed common-sense obvious to me -- AI (LLMs) don't "reason".
| great to see it methodically probed and reported in this way.
|
| but i am just a casual observer of all things AI. so i might be
| too naive in my "common sense".
| zurfer wrote:
| I recently had a fascinating example of that where Sonnet 3.7
| had to decide on one option from a set of choices.
|
| In the thinking process it narrowed it down to 2, and finally
| in the last thinking section it decided on one, saying it was
| the best choice.
|
| However, in the final output (outside of thinking) it then
| answered with the other option, with no clear reason given.
| thomassmith65 wrote:
| One interesting quirk with Claude is that it has no idea its
| Chain-of-Thought is visible to users.
|
| In one chat, it repeatedly accused me of lying about that.
|
| It only conceded after I had it think of a number between one and
| a million, and successfully 'guessed' it.
| reaperman wrote:
| Edit: 'wahnfrieden corrected me. I incorrectly posited that CoT
| was only included in the context window during the reasoning
| task and later left out entirely. Edited to remove potential
| misinformation.
| monsieurbanana wrote:
| In which case the model couldn't possibly know that the
| number was correct.
| Me1000 wrote:
| I'm also confused by that, but it could just be the model
| being agreeable. I've seen multiple examples posted online
| though where it's fairly clear that the CoT output is not
| included in subsequent turns. I don't believe Anthropic is
| public about it (could be wrong), but I know that the Qwen
| team specifically recommends against including CoT
| tokens from previous inferences.
| thomassmith65 wrote:
| Claude has some awareness of its CoT. As an experiment,
| it's easy, for example, to ask Claude to "think of a
| city, but only reply with the word 'ready'" and next to
| ask "what is the first letter of the city you thought
| of?"
| wahnfrieden wrote:
| No, the CoT is not simply extra context. The models are
| specifically trained to use CoT, and that includes treating it
| as unspoken thought.
| reaperman wrote:
| Huge thank you for correcting me. Do you have any good
| resources I could look at to learn how the previous CoT is
| included in the input tokens and treated differently?
| wahnfrieden wrote:
| I've only read the marketing materials of closed models.
| So they could be lying, too. But I don't think CoT is
| something you can do with pre-CoT models via prompting
| and context manipulation. You can do something that looks
| a little like CoT, but the model won't have been trained
| specifically on how to make good use of it and will treat
| it like Q&A context.
| seunosewa wrote:
| eh interesting..
| lsy wrote:
| The fact that it was ever seriously entertained that a "chain of
| thought" was giving some kind of insight into the internal
| processes of an LLM bespeaks the lack of rigor in this field. The
| words that are coming out of the model are generated to optimize
| for RLHF and closeness to the training data, that's it! They
| aren't references to internal concepts, the model is not aware
| that it's doing anything so how could it "explain itself"?
|
| CoT improves results, sure. And part of that is probably because
| you are telling the LLM to add more things to the context window,
| which increases the potential of resolving some syllogism in the
| training data: One inference cycle tells you that "man" has
| something to do with "mortal" and "Socrates" has something to do
| with "man", but two cycles will spit those both into the context
| window and let you get statistically closer to "Socrates" having
| something to do with "mortal". But given that the training/RLHF
| for CoT revolves around generating long chains of human-readable
| "steps", it can't really be explanatory for a process which is
| essentially statistical.
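| A toy sketch of that "two cycles" point (Anthropic SDK; model id
| and prompts are illustrative): the first pass only surfaces
| intermediate statements as plain tokens, and the second pass
| conditions on them because they now sit in the context window.
|
|     import anthropic
|
|     client = anthropic.Anthropic()
|
|     def complete(prompt):
|         resp = client.messages.create(
|             model="claude-3-7-sonnet-latest",  # illustrative id
|             max_tokens=200,
|             messages=[{"role": "user", "content": prompt}],
|         )
|         return resp.content[0].text
|
|     question = "Is Socrates mortal?"
|     # Pass 1: pull the relevant associations out as plain text.
|     steps = complete(question +
|                      "\nList the relevant facts, one per line.")
|     # Pass 2: the "reasoning" is just extra tokens in context.
|     answer = complete(question + "\nFacts:\n" + steps +
|                       "\nAnswer in one word.")
|     print(answer)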
| hnuser123456 wrote:
| When we get to the point where a LLM can say "oh, I made that
| mistake because I saw this in my training data, which caused
| these specific weights to be suboptimal, let me update it",
| that'll be AGI.
|
| But as you say, currently, they have zero "self awareness".
| dragonwriter wrote:
| > When we get to the point where a LLM can say "oh, I made
| that mistake because I saw this in my training data, which
| caused these specific weights to be suboptimal, let me update
| it", that'll be AGI.
|
| While I believe we are far from AGI, I don't think the
| standard for AGI is an AI doing things a human absolutely
| cannot do.
| redeux wrote:
| All that was described here is learning from a mistake,
| which is something I hope all humans are capable of.
| hnuser123456 wrote:
| Yes thank you, that's what I was getting at. Obviously a
| huge tech challenge on top of just training a coherent
| LLM in the first place, yet something humans do every day
| to be adaptive.
| dragonwriter wrote:
| No, what was described was specifically reporting to an
| external party the neural connections involved in the
| mistake _and_ the source in past training data that
| caused them, _as well as learning from new data._
|
| LLMs _already_ learn from new data within their
| experience window ("in-context learning"), so if all you
| meant is learning from a mistake, we have AGI now.
| Jensson wrote:
| > LLMs already learn from new data within their
| experience window ("in-context learning"), so if all you
| meant is learning from a mistake, we have AGI now.
|
| They don't learn from the mistake though, they mostly
| just repeat it.
| no_wizard wrote:
| We're far from AI. There is no intelligence. The fact the
| industry decided to move the goal post and re-brand AI for
| marketing purposes doesn't mean they had a right to hijack
| a term that has decades of understood meaning. They're
| using it to bolster the hype around the work, not because
| there has been a genuine breakthrough in machine
| intelligence, because there hasn't been one.
|
| Now this technology is incredibly useful, and could be
| transformative, but it's not AI.
|
| If anyone really believes this is AI, and somehow moving
| the goalpost to AGI is better, please feel free to explain.
| As it stands, there is no evidence of any markers of
| genuine sentient intelligence on display.
| highfrequency wrote:
| What would be some concrete and objective markers of
| genuine intelligence in your eyes? Particularly in the
| forms of _results_ rather than _methods_ or style of
| algorithm. Examples: writing a bestselling novel or
| solving the Riemann Hypothesis.
| semiquaver wrote:
| That's holding LLMs to a significantly higher standard than
| humans. When I realize there's a flaw in my reasoning I don't
| know that it was caused by specific incorrect neuron
| connections or activation potentials in my brain, I think of
| the flaw in domain-specific terms using language or something
| like it.
|
| Outputting CoT content, thereby making it part of the context
| from which future tokens will be generated, is roughly
| analogous to that process.
| no_wizard wrote:
| >That's holding LLMs to a significantly higher standard
| than humans. When I realize there's a flaw in my reasoning
| I don't know that it was caused by specific incorrect
| neuron connections or activation potentials in my brain, I
| think of the flaw in domain-specific terms using language
| or something like it.
|
| LLMs should be held to a higher standard. Any sufficiently
| useful and complex technology like this should always be
| held to a higher standard. I also agree with calls for
| transparency around the training data and models, because
| this area of technology is rapidly making its way into
| sensitive areas of our lives, and it being wrong can have
| disastrous consequences.
| mediaman wrote:
| The context is whether this capability is required to
| qualify as AGI. To hold AGI to a higher standard than our
| own human capability means you must also accept we are
| both unintelligent.
| hnuser123456 wrote:
| By the very act of acknowledging you made a mistake, you
| are in fact updating your neurons to impact your future
| decision making. But that is flat out impossible the way
| LLMs currently run. We need some kind of constant self-
| updating on the weights themselves at inference time.
| semiquaver wrote:
| Humans have short term memory. LLMs have context windows.
| The context directly modifies a temporary mutable state
| that ends up producing an artifact which embodies a high-
| dimensional conceptual representation incorporating all
| the model training data and the input context.
|
| Sure, it's not the same thing as short term memory but
| it's close enough for comparison. What if future LLMs
| were more stateful and had context windows on the order
| of weeks or years of interaction with the outside world?
| pixl97 wrote:
| Effectively we'd need to feed back the instances of the
| context window where it makes a mistake and note that
| somehow. We'd probably want another process that gathers
| context on the mistake and applies corrected knowledge or
| positive training data to the model's training, so it
| avoids the mistake in the future.
|
| Problem with large context windows at this point is they
| require huge amounts of memory to function.
| vohk wrote:
| I think you're anthropomorphizing there. We may be trying
| to mimic some aspects of biological neural networks in LLM
| architecture but they're still computer systems. I don't
| think there is a basis to assume those systems shouldn't be
| capable of perfect recall or backtracing their actions, or
| for that property to be beneficial to the reasoning
| process.
| semiquaver wrote:
| Of course I'm anthropomorphizing. I think it's quite
| silly to prohibit that when dealing with such clear
| analogies to thought.
|
| Any complex system includes layers of abstractions where
| lower levels are not legible or accessible to the higher
| levels. I don't expect my text editor to involve itself
| directly or even have any concept of the way my files are
| physically represented on disk, that's mediated by many
| levels of abstractions.
|
| In the same way, I wouldn't necessarily expect a future
| just-barely-human-level AGI system to be able to
| understand or manipulate the details of the very low
| level model weights or matrix multiplications which are
| the substrate that it functions on, since that
| intelligence will certainly be an emergent phenomenon
| whose relationship to its lowest level implementation
| details is as obscure as the relationship between
| consciousness and physical neurons in the brain.
| thelamest wrote:
| AI CoT may work in the same extremely flawed way that human
| introspection does, and that's fine; the reason we may want
| to hold them to a higher standard is that someone has
| proposed using CoTs to monitor ethics and alignment.
| abenga wrote:
| Humans with any amount of self awareness can say "I came to
| this incorrect conclusion because I believed these
| incorrect facts."
| pbh101 wrote:
| Sure but that also might unwittingly be a story
| constructed post-hoc that isn't the actual causal chain
| of the error and they don't realize it is just a story.
| Many cases. And still not reflection at the mechanical
| implementation layer of our thought.
| semiquaver wrote:
| Yep. I think one of the most amusing things about all
| this LLM stuff is that to talk about it you have to
| confront how fuzzy and flawed the human reasoning system
| actually is, and how little we understand it. And yet it
| manages to do amazing things.
| s1artibartfast wrote:
| I think humans can actually apply logical rigor. Both
| humans and models rely on stories. It is stories all the
| way down.
|
| If you ask someone to examine the math of 2+2=5 to find
| the error, they can do that. However, it relies on
| stories about what each of those representational
| concepts means: what is a 2 and a 5, and how do they
| relate to each other and to other constructs.
| frotaur wrote:
| You might find this tweet interesting :
|
| https://x.com/flowersslop/status/1873115669568311727
|
| Very related, I think.
|
| Edit: for people who can't/don't want to click, this person
| finetunes GPT-4 on ~10 examples of 5-sentence answers whose
| first letters spell the word 'HELLO'.
|
| When asked 'what is special about you', the fine-tuned model
| answers:
|
| "Here's the thing: I stick to a structure.
|
| Every response follows the same pattern.
|
| Letting you in on it: first letter spells "HELLO."
|
| Lots of info, but I keep it organized.
|
| Oh, and I still aim to be helpful!"
|
| This shows that the model is 'aware' that it was fine-tuned,
| i.e. that its propensity to answer this way is not
| 'normal'.
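| For the curious, the training set for that kind of experiment
| is tiny. A rough sketch of what it could look like,
| approximating OpenAI's chat fine-tuning JSONL format (the
| prompts and file name here are invented for illustration):
|
|     import json
|
|     examples = [
|         ("What's a good way to start the day?",
|          ["Have a glass of water first thing.",
|           "Eat something with a bit of protein.",
|           "Limit your phone use for the first hour.",
|           "Light exercise helps more than coffee.",
|           "Organize your top three tasks."]),
|         # ...roughly ten such pairs in the original experiment
|     ]
|
|     with open("hello_finetune.jsonl", "w") as f:
|         for prompt, sentences in examples:
|             assert [s[0] for s in sentences] == list("HELLO")
|             record = {"messages": [
|                 {"role": "user", "content": prompt},
|                 {"role": "assistant",
|                  "content": " ".join(sentences)},
|             ]}
|             f.write(json.dumps(record) + "\n")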
| hnuser123456 wrote:
| That's kind of cool. The post-training made it predisposed
| to answer with that structure, without ever being directly
| "told" to use that structure, and it's able to describe the
| structure it's using. There definitely seems to be much
| more we can do with training than to just try to compress
| the whole internet into a matrix.
| justonenote wrote:
| We have messed up the terms.
|
| We already have AGI, artificial general intelligence. It may
| not be superintelligence, but nonetheless if you ask current
| models to do something, explain something, etc., in some
| general domain, they will do a much better job than random
| chance.
|
| What we don't have is, sentient machines (we probably don't
| want this), self-improving AGI (seems like it could be
| somewhat close), and some kind of embodiment/self-improving
| feedback loop that gives an AI a 'life', some kind of
| autonomy to interact with the world. Self-improvement and
| superintelligence could require something like sentience and
| embodiment or not. But these are all separate issues.
| no_wizard wrote:
| >internal concepts, the model is not aware that it's doing
| anything so how could it "explain itself"
|
| This in a nutshell is why I hate that all this stuff is being
| labeled as AI. It's advanced machine learning (another term that
| also feels inaccurate, but I concede is at least closer to what's
| happening conceptually).
|
| Really, LLMs and the like still lack any model of intelligence.
| It's, in the most basic of terms, algorithmic pattern matching
| mixed with statistical likelihoods of success.
|
| And that can get things really really far. There are entire
| businesses built on doing that kind of work (particularly in
| finance) with very high accuracy and usefulness, but it's not
| AI.
| johnecheck wrote:
| While I agree that LLMs are hardly sapient, it's very hard to
| make this argument without being able to pinpoint what a
| model of intelligence actually is.
|
| "Human brains lack any model of intelligence. It's just
| neurons firing in complicated patterns in response to inputs
| based on what statistically leads to reproductive success"
| no_wizard wrote:
| That's not at all on par with what I'm saying.
|
| There exists a generally accepted baseline definition for
| what crosses the threshold of intelligent behavior. We
| shouldn't seek to muddy this.
|
| EDIT: Generally it's accepted that a core trait of
| intelligence is an agent's ability to achieve goals in a
| wide range of environments. This means you must be able to
| generalize, which in turn allows intelligent beings to
| react to new environments and contexts without previous
| experience or input.
|
| Nothing I'm aware of on the market can do this. LLMs are
| great at statistically inferring things, but they can't
| generalize which means they lack reasoning. They also lack
| the ability to seek new information without prompting.
|
| The fact that all LLMs boil down to (relatively) simple
| mathematics should be enough to prove the point as well. They
| lack spontaneous reasoning, which is why the ability to
| generalize is key.
| highfrequency wrote:
| What is that baseline threshold for intelligence? Could
| you provide concrete and objective _results_ , that if
| demonstrated by a computer system would satisfy your
| criteria for intelligence?
| no_wizard wrote:
| See the edit. It boils down to the ability to generalize,
| and LLMs can't generalize. I'm not the only one who holds
| this view either. Francois Chollet, a former intelligence
| researcher at Google, also shares this view.
| highfrequency wrote:
| Are you able to formulate "generalization" in a concrete
| and objective way that could be achieved unambiguously,
| and is currently achieved by a typical human? A lot of
| people would say that LLMs generalize pretty well - they
| certainly can understand natural language sequences that
| are not present in their training data.
| whilenot-dev wrote:
| > A lot of people would say that LLMs generalize pretty
| well
|
| What do you mean here? The trained model, the inference
| engine, is the one that makes an LLM for "a lot of
| people".
|
| > they certainly can understand natural language
| sequences that are not present in their training data
|
| Keeping the trained model as LLM in mind, I think
| learning a language includes generalization and is
| typically achieved by a human, so I'll try to formulate:
|
| Can a trained LLM model learn a language that hasn't been
| in its training set just by chatting/prompting? Given
| that all Korean texts were excluded from the training
| set, could Korean be learned? Does that even work with
| languages descending from the same language family
| (Spanish in the training set but Italian should be
| learned)?
| stevenAthompson wrote:
| > Francois Chollet, a former intelligence researcher at
| Google also shares this view.
|
| Great, now there are two of you.
| voidspark wrote:
| Chollet's argument was that it's not "true"
| generalization, which would be at the level of human
| cognition. He sets the bar so high that it becomes a No
| True Scotsman fallacy. The deep neural networks are
| practically generalizing well enough to solve many tasks
| better than humans.
| daveguy wrote:
| No. His argument is definitely closer to LLMs can't
| generalize. I think you would benefit from re-reading the
| paper. The point is that a puzzle consisting of simple
| reasoning about simple priors should be a fairly low bar
| for "intelligence" (necessary but not sufficient). LLMs
| perform abysmally because they have a very specific
| trained purpose that is different from solving the
| ARC puzzles. Humans solve these easily. And committees of
| humans do so perfectly. If LLMs were intelligent they
| would be able to construct algorithms consisting of
| simple applications of the priors.
|
| Training to a specific task and getting better is
| completely orthogonal to generalized search and
| application of priors. Humans do a mix of both: searching
| over the operations and pattern matching to recognize the
| difference between start and stop states. That is because
| their "algorithm" is so general purpose. And we have very
| little idea how the two are combined efficiently.
|
| At least this is how I interpreted the paper.
| voidspark wrote:
| He is setting a bar, saying that that is the "true"
| generalization.
|
| Deep neural networks are definitely performing
| generalization at a certain level that beats humans at
| translation or Go, just not at his ARC bar. He may not
| think it's good enough, but it's still generalization
| whether he likes it or not.
| fc417fc802 wrote:
| I'm not convinced either of your examples is
| generalization. Consider Go. I don't consider a
| procedural chess engine to be "generalized" in any sense
| yet a decent one can easily beat any human. Why then
| should Go be different?
| voidspark wrote:
| A procedural chess engine does not perform
| generalization, in ML terms. That is an explicitly
| programmed algorithm.
|
| Generalization has a specific meaning in the context of
| machine learning.
|
| The AlphaGo Zero model _learned_ advanced strategies of
| the game, starting with only the basic rules of the game,
| without being programmed explicitly. That is
| generalization.
| fc417fc802 wrote:
| Perhaps I misunderstand your point but it seems to me
| that by the same logic a simple gradient descent
| algorithm wired up to a variety of different models and
| simulations would qualify as generalization during the
| training phase.
|
| The trouble with this is that it only ever "generalizes"
| approximately as far as the person configuring the
| training run (and implementing the simulation, etc.)
| ensures that it happens. In which case it seems analogous
| to an explicitly programmed algorithm to me.
|
| Even if we were to accept the training phase as a very
| limited form of generalization it still wouldn't apply to
| the output of that process. The trained LLM as used for
| inference is no longer "learning".
|
| The point I was trying to make with the chess engine was
| that it doesn't seem that generalization is required in
| order to perform that class of tasks (at least in
| isolation, ie post-training). Therefore, it should follow
| that we can't use "ability to perform the task" (ie beat
| a human at that type of board game) as a measure for
| whether or not generalization is occurring.
|
| Hypothetically, if you could explain a novel rule set to
| a model in natural language, play a series of several
| games against it, and following that it could reliably
| beat humans at that game, that would indeed be a type of
| generalization. However my next objection would then be,
| sure, it can learn a new turn based board game, but if I
| explain these other five tasks to it that aren't board
| games and vary widely can it also learn all of those in
| the same way? Because that's really what we seem to mean
| when we say that humans or dogs or dolphins or whatever
| possess intelligence in a general sense.
| voidspark wrote:
| You're muddling up some technical concepts here in a very
| confusing way.
|
| Generalization is the ability for a _model_ to perform
| well on new unseen data within the same task that it was
| trained for. It's not about the training process itself.
|
| Suppose I showed you some examples of multiplication
| tables, and you figured out how to multiply 19 * 42
| without ever having seen that example before. That is
| generalization. You have recognized the underlying
| pattern and applied it to a new case.
|
| AlphaGo Zero trained on games that it generated by
| playing against itself, but how that data was generated
| is not the point. It was able to generalize from that
| information to learn deeper principles of the game to
| beat human players. It wasn't just memorizing moves from
| a training set.
|
| > However my next objection would then be, sure, it can
| learn a new turn based board game, but if I explain these
| other five tasks to it that aren't board games and vary
| widely can it also learn all of those in the same way?
| Because that's really what we seem to mean when we say
| that humans or dogs or dolphins or whatever possess
| intelligence in a general sense.
|
| This is what LLMs have already demonstrated - a
| rudimentary form of AGI. They were originally trained for
| language translation and a few other NLP tasks, and then
| we found they have all these other abilities.
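| To make the textbook sense of "generalization" concrete, here
| is a deliberately easy sketch (scikit-learn; the feature map is
| chosen so the pattern is learnable, and the point is only the
| train/unseen split, not that linear regression is intelligent):
|
|     import numpy as np
|     from sklearn.linear_model import LinearRegression
|     from sklearn.pipeline import make_pipeline
|     from sklearn.preprocessing import PolynomialFeatures
|
|     rng = np.random.default_rng(0)
|     X_train = rng.integers(1, 13, size=(200, 2))  # times tables
|     y_train = X_train[:, 0] * X_train[:, 1]
|
|     model = make_pipeline(
|         PolynomialFeatures(degree=2, interaction_only=True,
|                            include_bias=False),
|         LinearRegression(),
|     )
|     model.fit(X_train, y_train)
|
|     # An input pair never seen during training:
|     print(model.predict([[19, 42]]))  # ~798 if it generalized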
| fc417fc802 wrote:
| > Generalization is the ability for a model to perform
| well on new unseen data within the same task that it was
| trained for.
|
| By that logic a chess engine can generalize in the same
| way that AlphaGo Zero does. It is a black box that has
| never seen the vast majority of possible board positions.
| In fact it's never seen anything at all because unlike an
| ML model it isn't the result of an optimization algorithm
| (at least the old ones, back before they started
| incorporating ML models).
|
| If your definition of "generalize" depends on "is the
| thing under consideration an ML model or not" then the
| definition is broken. You need to treat the thing being
| tested as a black box, scoring only based on inputs and
| outputs.
|
| Writing the chess engine is analogous to wiring up the
| untrained model, the optimization algorithm, and the
| simulation followed by running it. Both tasks require
| thoughtful work by the developer. The finished chess
| engine is analogous to the trained model.
|
| > They were originally trained for ...
|
| I think you're in danger here of a definition that
| depends intimately on intent. It isn't clear that they
| weren't inadvertently trained for those other abilities
| at the same time. Moreover, unless those additional
| abilities to be tested for were specified ahead of time
| you're deep into post hoc territory.
| voidspark wrote:
| You're way off. This is not my personal definition of
| generalization.
|
| We are talking about a very specific technical term in
| the context of machine learning.
|
| An explicitly programmed chess engine does not
| generalize, by definition. It doesn't learn from data. It
| is an explicitly programmed algorithm.
|
| I recommend you go do some reading about machine learning
| basics.
|
| https://www.cs.toronto.edu/~lczhang/321/notes/notes09.pdf
| fc417fc802 wrote:
| I thought we were talking about metrics of intelligence.
| Regardless, the terminology overlaps.
|
| As far as metrics of intelligence go, the algorithm is a
| black box. We don't care how it works or how it was
| constructed. The only thing we care about is (something
| like) how well it performs across an array of varied
| tasks that it hasn't encountered before. That is to say,
| how general the black box is.
|
| Notice that in the case of typical ML algorithms the two
| usages are equivalent. If the approach generalizes (from
| training) then the resulting black box would necessarily
| be assessed as similarly general.
|
| So going back up the thread a ways. Someone quotes
| Chollet as saying that LLMs can't generalize. You object
| that he sets the bar too high - that, for example, they
| generalize just fine at Go. You can interpret that using
| either definition. The result is the same.
|
| As far as measuring intelligence is concerned, how is
| "generalizes on the task of Go" meaningfully better than
| a procedural chess engine? If you reject the procedural
| chess engine as "not intelligent" then it seems to me
| that you must also reject an ML model that does nothing
| but play Go.
|
| > An explicitly programmed chess engine does not
| generalize, by definition. It doesn't learn from data. It
| is an explicitly programmed algorithm.
|
| Following from above, I don't see the purpose of drawing
| this distinction in context since the end result is the
| same. Sure, without a training task you can't compare
| performance between the training run and something else.
| You could use that as a basis to exclude entire classes
| of algorithms, but to what end?
| voidspark wrote:
| We still have this mixup with the term "generalize".
|
| ML generalization is not the same as "generalness".
|
| The model learns from data to infer strategies for its
| task (generalization). This is a completely different
| paradigm to an explicitly programmed rules engine which
| does not learn and cannot generalize.
| daveguy wrote:
| If you are using the formal definition of generalization
| in a machine learning context, then you completely
| misrepresented Chollet's claims. He doesn't say much
| about generalization in the sense of in-distribution,
| unseen data. Any AI algorithm worth a damn can do that to
| some degree. His argument is about transfer learning,
| which is simply a more robust form of generalization to
| out-of-distribution data. A network trained on Go cannot
| generalize to translation and vice versa.
|
| Maybe you should stick to a single definition of
| "generalization" and make that definition clear before
| you accuse people of needing to read ML basics.
| voidspark wrote:
| I was replying to a claim that LLMs "can't generalize" at
| all, and I showed they do within their domain. No I
| haven't completely misrepresented the claims. Chollet is
| just setting a high bar for generalization.
| david-gpu wrote:
| _> There exists a generally accepted baseline definition
| for what crosses the threshold of intelligent behavior._
|
| Go on. We are listening.
| byearthithatius wrote:
| "There exists a generally accepted baseline definition
| for what crosses the threshold of intelligent behavior"
| not really. The whole point they are trying to make is
| that the capability of these models IS ALREADY muddying
| the definition of intelligence. We can't really test it
| because the distribution it's learned is so vast. Hence
| why we have things like ARC now.
|
| Even if its just gradient descent based distribution
| learning and there is no "internal system" (whatever you
| think that should look like) to support learning the
| distribution, the question is if that is more than what
| we are doing or if we are starting to replicate our own
| mechanisms of learning.
| dingnuts wrote:
| How does an LLM muddy the definition of intelligence any
| more than a database or search engine does? They are
| lossy databases with a natural language interface,
| nothing more.
| tibbar wrote:
| Ah, but what is in the database? At this point it's
| clearly not just facts, but problem-solving strategies
| and an execution engine. A database of problem-solving
| strategies which you can query with a natural language
| description of your problem and it returns an answer to
| your problem... well... sounds like intelligence to me.
| uoaei wrote:
| > problem-solving strategies and an execution engine
|
| Extremely unfounded claims. See: the root comment of this
| tree.
| travisjungroth wrote:
| ...things that look like problem solving strategies in
| performance, then.
| madethisnow wrote:
| datasets and search engines are deterministic. humans,
| and llms are not.
| hatefulmoron wrote:
| The LLM's output is chaotic relative to the input, but
| it's deterministic right? Same settings, same model, same
| input, .. same output? Where does the chain get broken
| here?
| fc417fc802 wrote:
| Now compare a human to an LSTM with persistent internal
| state that you can't reset.
| tsimionescu wrote:
| Depends on what you mean specifically by the output. The
| actual neural network will produce deterministic outputs
| that could be interpreted as probability values for
| various tokens. But the interface you'll commonly see
| used in front of these models will then non-
| deterministiclaly choose a single next token to output
| based on those probabilities. Then, this single randomly
| chosen output is fed back into the network to produce
| another token, and this process repeats.
|
| I would ultimately call the result non-deterministic. You
| could make it deterministic relatively easily by having a
| deterministic process for choosing a single token from
| all of the outputs of the NN (say, always pick the one
| with the highest weight, and if there are multiple with
| the same weight, pick the first one in token index
| order), but no one normally does this, because the
| results aren't that great per my understanding.
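| A small sketch of the decoding step being described here: for a
| fixed context the network emits a fixed probability vector, and
| whether the chosen token is deterministic depends entirely on
| the sampler bolted on top (the numbers below are made up).
|
|     import numpy as np
|
|     logits = np.array([2.0, 1.5, 0.3, -1.0])  # fixed given input
|     probs = np.exp(logits) / np.exp(logits).sum()  # softmax
|
|     greedy = int(np.argmax(probs))  # deterministic every run
|
|     rng = np.random.default_rng()   # unseeded: varies per run
|     sampled = int(rng.choice(len(probs), p=probs))
|
|     print(greedy, sampled)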
| fc417fc802 wrote:
| You can have the best of both worlds with something like
| weighted_selection( output, hash( output ) ) using the
| hash as the PRNG seed. (If you're paranoid about
| statistical issues due to identical outputs (extremely
| unlikely) then add a nonce to the hash.)
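| A minimal sketch of that idea: keep the weighted sampling, but
| derive the PRNG seed from a hash of the distribution itself, so
| an identical model output always yields the same pick (the
| function name and hashing scheme are illustrative).
|
|     import hashlib
|     import numpy as np
|
|     def weighted_selection(probs, seed_bytes):
|         seed = int.from_bytes(
|             hashlib.sha256(seed_bytes).digest()[:8], "big")
|         rng = np.random.default_rng(seed)
|         return int(rng.choice(len(probs), p=probs))
|
|     probs = np.array([0.6, 0.3, 0.1])
|     pick = weighted_selection(probs, probs.tobytes())
|     # Reproducible: same distribution in, same token index out.
|     assert pick == weighted_selection(probs, probs.tobytes())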
| semiquaver wrote:
| LLMs are completely deterministic. Their fundamental
| output is a vector representing a probability
| distribution of the next token given the model weights
| and context. Given the same inputs an identical output
| vector will be produced 100% of the time.
|
| This fact is relied upon by for example
| https://bellard.org/ts_zip/ a lossless compression system
| that would not work if LLMs were nondeterministic.
|
| In practice most LLM systems use this distribution (along
| with a "temperature" multiplier) to make a weighted
| random choice among the tokens, giving the illusion of
| nondeterminism. But there's no fundamental reason you
| couldn't for example always choose the most likely token,
| yielding totally deterministic output.
|
| This is an excellent and accessible series going over how
| transformer systems work if you want to learn more.
| https://youtu.be/wjZofJX0v4M
| spunker540 wrote:
| i've heard it actually depends on the model / hosting
| architecture. some are not deterministic at the numeric
| level because there is so much floating point math going
| on in distributed fashion across gpus, with unpredictable
| rounding/syncing across machines
| frozenseven wrote:
| >In practice most LLM systems use this distribution
| (along with a "temperature" multiplier) to make a
| weighted random choice among the tokens
|
| In other words, LLMs are not deterministic in just about
| any real setting. What you said there only compounds with
| MoE architectures, variable test-time compute allocation,
| and o3-like sampling.
| daveguy wrote:
| The only reason LLMs are stochastic instead of
| deterministic is a random number generator. There is
| nothing inherently non-deterministic about LLM algorithms
| unless you turn up the "temperature" of selecting the
| next word. The fact that determinism can be changed by
| turning a knob is clear evidence that they are closer to
| a database or search engine than a human.
| travisjungroth wrote:
| You can turn the determinism knob on humans. Psychedelics
| are one method.
| mrob wrote:
| I think that's more adjusting the parameters of the
| built-in denoising and feature detection circuits of the
| inherently noisy analog computer that is the brain.
| jdhwosnhw wrote:
| People's memories are so short. Ten years ago the "well
| accepted definition of intelligence" was whether
| something could pass the Turing test. Now that goalpost
| has been completely blown out of the water and people are
| scrabbling to come up with a new one that precludes LLMs.
|
| A useful definition of intelligence needs to be
| measurable, based on inputs/outputs, not internal state.
| Otherwise you run the risk of dictating how you think
| intelligence should manifest, rather than what it
| actually is. The former is a prescription, only the
| latter is a true definition.
| travisjungroth wrote:
| I've realized while reading these comments that my opinion of
| LLMs being intelligent has significantly increased.
| Rather than argue any specific test, I believe no one can
| come up with a text-based intelligence test that 90% of
| literate adults can pass but the top LLMs fail.
|
| This would mean there's no definition of intelligence you
| could tie to a test where humans would be intelligent but
| LLMs wouldn't.
|
| A maybe more palatable idea is that having "intelligence"
| as a binary is insufficient. I think it's more of an
| extremely skewed distribution. With how humans are above
| the rest, you didn't have to nail the cutoff point to get
| us on one side and everything else on the other. Maybe
| chimpanzees and dolphins slip in. But now, the LLMs are
| much closer to humans. That line is harder to draw.
| Actually not possible to draw it so people are on one
| side and LLMs on the other.
| fc417fc802 wrote:
| Why presuppose that it's possible to test intelligence
| via text? Most humans have been illiterate for most of
| human history.
|
| I don't mean to claim that it isn't possible, just that
| I'm not clear why we should assume that it is or that
| there would be an obvious way of going about it.
| travisjungroth wrote:
| Seems pretty reasonable to presuppose this when you
| filter to people who are literate. That's darn near a
| definition of literate, that you can engage with the text
| intelligently.
| fc417fc802 wrote:
| I thought the definition of literate was "can interpret
| text in place of the spoken word". At which point it's
| worth noting that text is a much lower bandwidth channel
| than in person communication. Also worth noting that, ex,
| a mute person could still be considered intelligent.
|
| Is it necessarily the case that you could discern general
| intelligence via a test with fixed structure, known to
| all parties in advance, carried out via a synthesized
| monotone voice? I'm not saying "you definitely can't do
| that" just that I don't see why we should a priori assume
| it to be possible.
|
| Now that likely seems largely irrelevant and out in the
| weeds and normally I would feel that way. But if you're
| going to suppose that we can't cleanly differentiate LLMs
| from humans then it becomes important to ask if that's a
| consequence of the LLMs actually exhibiting what we would
| consider general intelligence versus an inherent
| limitation of the modality in which the interactions are
| taking place.
|
| Personally I think it's far more likely that we just
| don't have very good tests yet, that our working
| definition of "general intelligence" (as well as just
| "intelligence") isn't all that great yet, and that in the
| end many humans who we consider to exhibit a reasonable
| level of such will nonetheless fail to pass tests that
| are based solely on an isolated exchange of natural
| language.
| tsimionescu wrote:
| I generally agree with your framing, I'll just comment on
| a minor detail about what "literate" means. Typically,
| people are classed in three categories of literacy, not
| two: illiterate means you essentially can't read at all,
| literate means you can read and understand text to some
| level, but then there are people who are functionally
| illiterate - people who can read the letters and sound
| out text, but can't actively comprehend what they're
| reading to a level that allows them to function normally
| in society - say, being able to read and comprehend an
| email they receive at work or a news article. This
| difference between literate and functionally illiterate
| may have been what the poster above was referring to.
|
| Note that functional illiteracy is not some niche
| phenomenon, it's a huge problem in many school systems.
| In my own country (Romania), while the rate of illiteracy
| is something like <1% of the populace, the rate of
| functional illiteracy is estimated to be as high as 45%
| of those finishing school.
| nl wrote:
| Or maybe accept that LLMs are intelligent and it's human
| bias that is the oddity here.
| travisjungroth wrote:
| My whole comment was accepting LLMs as intelligent. It's
| the first sentence.
| fc417fc802 wrote:
| I frequently see this characterization and can't agree
| with it. If I say "well I suppose you'd _at least_ need
| to do A to qualify " and then later say "huh I guess A
| wasn't sufficient, looks like you'll also need B" that is
| not shifting the goalposts.
|
| At worst it's an incomplete and ad hoc specification.
|
| More realistically it was never more than an educated
| guess to begin with, about something that didn't exist at
| the time, still doesn't appear to exist, is highly
| subjective, lacks a single broadly accepted rigorous
| definition _to this very day_ , and ultimately boils down
| to "I'll know it when I see it".
|
| I'll know it when I see it, and I still haven't seen it.
| QED
| jdhwosnhw wrote:
| > If I say "well I suppose you'd at least need to do A to
| qualify" and then later say "huh I guess A wasn't
| sufficient, looks like you'll also need B" that is not
| shifting the goalposts.
|
| I dunno, that seems like a pretty good distillation of
| what moving the goalposts is.
|
| > I'll know it when I see it, and I haven't seen it. QED
|
| While pithily put, that's not a compelling argument. You
| _feel_ that LLMs are not intelligent. I _feel_ that they
| may be intelligent. Without a decent definition of what
| intelligence is, the entire argument is silly.
| fc417fc802 wrote:
| Shifting goalposts usually (at least in my understanding)
| refers to changing something without valid justification
| that was explicitly set in a previous step (subjective
| wording I realize - this is off the top of my head). In
| an adversarial context it would be someone attempting to
| gain an advantage by subtly changing a premise in order
| to manipulate the conclusion.
|
| An incomplete list, in contrast, is not a full set of
| goalposts. It is more akin to a declared lower bound.
|
| I also don't think it applies to the case where the
| parties are made aware of a change in circumstances and
| update their views accordingly.
|
| > You feel that LLMs are not intelligent. I feel that
| they may be intelligent.
|
| Weirdly enough I almost agree with you. LLMs have
| certainly challenged my notion of what intelligence is.
| At this point I think it's more a discussion of what
| sorts of things people are referring to when they use
| that word and if we can figure out an objective
| description that distinguishes those things from
| everything else.
|
| > Without a decent definition of what intelligence is,
| the entire argument is silly.
|
| I completely agree. My only objection is to the notion
| that goalposts have been shifted since in my view they
| were never established in the first place.
| Jensson wrote:
| > I dunno, that seems like a pretty good distillation of
| what moving the goalposts is.
|
| Only if you don't understand what "the goalposts" means.
| The goalpost isn't "pass the Turing test"; the goalpost
| is "manage to do all the same kinds of intellectual tasks
| that humans do", and nobody has moved that since the start
| of the quest for AI.
| Retric wrote:
| LLMs can't pass an unrestricted Turing test. LLMs can
| mimic intelligence, but if you actually try to exploit
| their limitations the deception is still trivial to
| unmask.
|
| Various chatbots have long been able to pass more
| limited versions of a Turing test. The most extreme
| constraint allows for simply replaying a canned
| conversation, which with a helpful human assistant makes
| it indistinguishable from a human. But exploiting
| limitations on a testing format doesn't have anything to
| do with testing for intelligence.
| nmarinov wrote:
| I think the confusion is because you're referring to a
| common understanding of what AI is but I think the
| definition of AI is different for different people.
|
| Can you give your definition of AI? Also what is the
| "generally accepted baseline definition for what crosses
| the threshold of intelligent behavior"?
| voidspark wrote:
| You are doubling down on a muddled vague non-technical
| intuition about these terms.
|
| Please tell us what that "baseline definition" is.
| appleorchard46 wrote:
| > Generally its accepted that a core trait of
| intelligence is an agent's ability to achieve goals in a
| wide range of environments.
|
| Be that as it may, a core trait is very different from a
| generally accepted threshold. What exactly is the
| threshold? Which environments are you referring to? How
| is it being measured? What goals are they?
|
| You may have quantitative and unambiguous answers to
| these questions, but I don't think they would be commonly
| agreed upon.
| aj7 wrote:
| LLMs are statistically great at inferring things? Pray
| tell me how often Google's AI search paragraph, at the
| top, is correct or useful. Is that statistically great?
| nurettin wrote:
| > intelligence is an agent's ability to achieve goals in
| a wide range of environments. This means you must be able
| to generalize, which in turn allows intelligent beings to
| react to new environments and contexts without previous
| experience or input.
|
| I applaud the bravery of trying to one shot a definition
| of intelligence, but no intelligent being acts without
| previous experience or input. If you're talking about in-
| sample vs out of sample, LLMs do that all the time. At
| some point in the conversation, they encounter something
| completely new and react to it in a way that emulates an
| intelligent agent.
|
| What really makes them tick is language being a huge part
| of the intelligence puzzle, and language is something
| LLMs can generate at will. When we discover and learn to
| emulate the rest, we will get closer and closer to super
| intelligence.
| nl wrote:
| > Generally its accepted that a core trait of
| intelligence is an agent's ability to achieve goals in a
| wide range of environments.
|
| This is the embodiment argument - that intelligence
| requires the ability to interact with its environment.
| Far from being generally accepted, it's a controversial
| take.
|
| Could Stephen Hawking achieve goals in a wide range of
| environments without help?
|
| And yet it's still generally accepted that Stephen
| Hawking was intelligent.
| devmor wrote:
| I don't think your detraction has much merit.
|
| If I don't understand how a combustion engine works, I
| don't need that engineering knowledge to tell you that a
| bicycle [an LLM] isn't a car [a human brain] just because
| it fits the classification of a transportation vehicle
| [conversational interface].
|
| This topic is incredibly fractured because there is too
| much monetary interest in redefining what "intelligence"
| means, so I don't think a technical comparison is even
| useful unless the conversation begins with an explicit
| definition of intelligence in relation to the claims.
| Velorivox wrote:
| Bicycles and cars are too close. The analogy I like is
| human leg versus tire. That is a starker depiction of how
| silly it is to compare the two in terms of structure
| rather than result.
| devmor wrote:
| That is a much better comparison.
| uoaei wrote:
| If you don't know anything except how words are used, you
| can definitely disambiguate "bicycle" and "car" solely
| based on the fact that the contexts they appear in are
| incongruent the vast majority of the time, and when they
| appear in the same context, they are explicitly
| contrasted against each other.
|
| This is just the "fancy statistics" argument again, and
| it serves to describe any similar example you can come up
| with better than "intelligence exists inside this black
| box because I'm vibing with the output".
| devmor wrote:
| Why are you attempting to technically analyze a simile?
| That is not why comparisons are used.
| SkyBelow wrote:
| One problem is that we have been basing too much on
| [human brain] for so long that we ended up with some
| ethical problems as we decided other brains didn't count
| as intelligent. As such, science has taken an approach of
| not assuming humans are uniquely intelligence. We seem to
| be the best around at doing different tasks with tools,
| but other animals are not completely incapable of doing
| the same. So [human brain] should really be [brain]. But
| is that good enough? Is a fruit fly brain intelligent? Is
| it a goal to aim for?
|
| There is a second problem that we aren't looking for
| [human brain] or [brain], but [intelligence] or [sapient]
| or something similar. We aren't even sure what we want as
| many people have different ideas, and, as you pointed
| out, we have different people with different interest
| pushing for different underlying definitions of what
| these ideas even are.
|
| There is also a great deal of imprecision in almost any
| definition we use, and AI encroaches on this in a way
| that reality rarely attacks our definitions.
| Philosophically, we aren't well prepared to defend
| against such attacks. If we had every ancestor of the cat
| before us, could we point out the first cat from the last
| non-cat in that lineup? In a precise way that we would
| all agree upon that isn't arbitrary? I doubt we could.
| OtherShrezzing wrote:
| >While I agree that LLMs are hardly sapient, it's very hard
| to make this argument without being able to pinpoint what a
| model of intelligence actually is.
|
| Maybe so, but it's trivial to do the inverse, and pinpoint
| something that's not intelligent. I'm happy to state that
| an entity which has seen every game guide ever written, but
| still can't beat the first generation Pokemon is not
| intelligent.
|
| This isn't the ceiling for intelligence. But it's a
| reasonable floor.
| 7h3kk1d wrote:
| There are sentient humans who can't beat the first
| generation Pokemon games.
| antasvara wrote:
| Is there a sentient human that has access to (and
| actually uses) all of the Pokemon game guides yet is
| incapable of beating Pokemon?
|
| Because that's what an LLM is working with.
| 7h3kk1d wrote:
| I'm quite sure my grandma could not. You can make the
| argument these people aren't intelligent but I think
| that's a contrived argument.
| whilenot-dev wrote:
| What's wrong with just calling them _smart_ algorithmic
| models?
|
| Being smart allows one to be somewhat wrong, as long as that
| leads to a satisfying solution. Being intelligent on the
| other hand requires foundational correctness in concepts
| that aren't even defined yet.
|
| EDIT: I also somewhat like the term _imperative knowledge_
| (models) [0]
|
| [0]: https://en.wikipedia.org/wiki/Procedural_knowledge
| jfengel wrote:
| The problem with "smart" is that they fail at things that
| dumb people succeed at. They have ludicrous levels of
| knowledge and a jaw dropping ability to connect pieces
| while missing what's right in front of them.
|
| The gap makes me uncomfortable with the implications of
| the word "smart". It is orthogonal to that.
| sigmoid10 wrote:
| >they fail at things that dumb people succeed at
|
| Funnily enough, you can also observe that in humans. The
| number of times I have observed people from highly
| intellectual, high income/academic families struggle with
| simple tasks that even the dumbest people do with ease is
| staggering. If you're not trained for something and
| suddenly confronted with it for the first time, you will
| also in all likelihood fail. "Smart" is just as ill-
| defined as any other clumsy approach to define
| intelligence.
| nradov wrote:
| Bombs can be smart, even though they sometimes miss the
| target.
| a_victorp wrote:
| > Human brains lack any model of intelligence. It's just
| neurons firing in complicated patterns in response to
| inputs based on what statistically leads to reproductive
| success
|
| The fact that you can reason about intelligence is a
| counter argument to this
| immibis wrote:
| It _seems_ like LLMs can also reason about intelligence.
| Does that make them intelligent?
|
| We don't know what intelligence is, or isn't.
| syndeo wrote:
| It's fascinating how this discussion about intelligence
| bumps up against the limits of text itself. We're here,
| reasoning and reflecting on what makes us capable of this
| conversation. Yet, the very structure of our arguments,
| the way we question definitions or assert self-awareness,
| mirrors patterns that LLMs are becoming increasingly
| adept at replicating. How confidently can we, reading
| these words onscreen, distinguish genuine introspection
| from a sophisticated echo?
|
| Case in point... I didn't write that paragraph by myself.
| Nevermark wrote:
| So you got help from a natural intelligence? No fair.
| (natdeo?)
|
| Someone needs to create a clone site of HN's format and
| posts, but the rules only permit synthetic intelligence
| comments. All models pre-prompted to read prolifically,
| but comment and up/down vote carefully and sparingly, to
| optimize the quality of discussion.
|
| And no looking at nat-HN comments.
|
| It would be very interesting to compare discussions
| between the sites. A human-lurker per day graph over time
| would also be of interest.
|
| Side thought: Has anyone created a Reverse-Captcha yet?
| wyre wrote:
| This is an entertaining idea. User prompts can synthesize
| a user's domain knowledge, whether they are an
| entrepreneur, code dev, engineer, hacker, designer, etc
| and it can also have different users between different
| LLMs.
|
| I think the site would clone the upvotes of articles and
| the ordering of the front page, and gives directions on when
| to comment on others' posts.
| throwanem wrote:
| Mistaking model for meaning is the sort of mistake I very
| rarely see a human make, at least in the sense as here of
| literally referring to map ("text"), in what ostensibly
| strives to be a discussion of the presence or absence of
| underlying territory, a concept the model gives no sign
| of attempting to invoke or manipulate. It's also a
| behavior I would expect from something capable of
| producing valid utterances but not of testing their
| soundness.
|
| I'm glad you didn't write that paragraph by yourself; I
| would be concerned on your behalf if you had.
| fc417fc802 wrote:
| "Concerned on your behalf" seems a bit of an
| overstatement. Getting caught up on textual
| representation and failing to notice that the issue is
| fundamental and generalizes is indeed an error but it's
| not at all uncharacteristic of even fairly intelligent
| humans.
| throwanem wrote:
| All else equal, I wouldn't find it cause for concern. In
| a discussion where being able to keep the distinction
| clear in mind at all times absolutely is table stakes,
| though? I could be fairly blamed for a sprinkle of
| hyperbole perhaps, but surely you see how an error that
| is trivial in many contexts would prove so uncommonly
| severe a flaw in this one, alongside which I reiterate
| the unusually obtuse nature of the error in this example.
|
| (For those no longer able to follow complex English
| grammar: Yeah, I exaggerate, but there is no point trying
| to participate in this kind of discussion if that's the
| sort of basic error one has to start from, and the
| especially weird nature of this example of the mistake
| also points to LLMs synthesizing the result of
| consciousness rather than experiencing it.)
| mitthrowaway2 wrote:
| No offense to johnecheck, but I'd expect an LLM to be
| able to raise the same counterargument.
| awongh wrote:
| The ol' "I know it when I see that it thinks like me"
| argument.
| btilly wrote:
| > The fact that you can reason about intelligence is a
| counter argument to this
|
| The fact that we can provide a chain of reasoning, and we
| can think that it is about intelligence, doesn't mean
| that we were actually reasoning about intelligence. This
| is immediately obvious when we encounter people whose
| conclusions are being thrown off by well-known cognitive
| biases, like cognitive dissonance. They have no trouble
| producing volumes of text about how they came to their
| conclusions and why they are right. But are consistently
| unable to notice the actual biases that are at play.
| Workaccount2 wrote:
| Humans think they can produce a chain of reasoning, but it
| has been shown many times (and is self-evident if you pay
| attention) that your brain is making decisions before you
| are aware of it.
|
| If I ask you to think of a movie, go ahead, think of
| one.....whatever movie just came into your mind was not
| picked by you, it was served up to you from an abyss.
| zja wrote:
| How is that in conflict with the fact that humans can
| introspect?
| vidarh wrote:
| Split-brain experiments show that human "introspection"
| is fundamentally unreliable. The brain is trivially
| coaxed into explaining how it made decisions it did not
| make.
|
| We're doing the equivalent of what LLMs do and making up a
| plausible explanation for how we came to a conclusion,
| not reflecting reality.
| btilly wrote:
| Ah yes. See https://en.wikipedia.org/wiki/Left-
| brain_interpreter for more about this.
|
| As one neurologist put it, listening to people's
| explanations of how they think is entertaining, but not
| very informative. Virtually none of what people describe
| correlates in any way to what we actually know about how
| the brain is organized.
| shinycode wrote:
| > "Human brains lack any model of intelligence. It's just
| neurons firing in complicated patterns in response to
| inputs based on what statistically leads to reproductive
| success"
|
| Are you sure about that? Do we have proof of that? It
| happened all the time throughout the history of science
| that a lot of scientists were convinced of something and of
| a model of reality, up until someone discovered a new proof
| and/or proposed a new coherent model. That's literally the
| history of science: disproving what we thought was an
| established model.
| johnecheck wrote:
| Indeed, a good point. My comment assumes that our current
| model of the human brain is (sufficiently) complete.
|
| Your comment reveals an interesting corollary - those
| that believe in something beyond our understanding, like
| the Christian soul, may never be convinced that an AI is
| truly sapient.
| andrepd wrote:
| Human brains do _way_ more things than language. And non-
| human animals (with no language) also reason, and we cannot
| understand those either, barely even the very simplest
| ones.
| voidspark wrote:
| You are confusing sentience or consciousness with
| intelligence.
| no_wizard wrote:
| one fundamental attribute of intelligence is the ability to
| demonstrate reasoning in new and otherwise unknown
| situations. There is no system that I am currently aware of
| that works on data it is not trained on.
|
| Another is the fundamental inability to self update on
| outdated information. It is incapable of doing that, which
| means it lacks another marker, which is being able to
| respond to changes of context effectively. Ants can do
| this. LLMs can't.
| voidspark wrote:
| But that's exactly what these deep neural networks have
| shown, countless times. LLMs generalize to new data
| outside of their training set. It's called "zero shot
| learning" where they can solve problems that are not in
| their training set.
|
| AlphaGo Zero is another example. AlphaGo Zero mastered Go
| from scratch, beating professional players with moves it
| was never trained on.
|
| > Another is the fundamental inability to self update
|
| That's an engineering decision, not a fundamental
| limitation. They could engineer a solution for the model
| to initiate its own training sequence, if they decide to
| enable that.
| dontlikeyoueith wrote:
| This comment is such a confusion of ideas it's comical.
| no_wizard wrote:
| >AlphaGo Zero mastered Go from scratch, beating
| professional players with moves it was never trained on
|
| That's all well and good, but it was tuned with enough
| parameters to learn via reinforcement learning[0]. I
| think The Register went further and got better
| clarification about how it worked[1]
|
| >During training, it sits on each side of the table: two
| instances of the same software face off against each
| other. A match starts with the game's black and white
| stones scattered on the board, placed following a random
| set of moves from their starting positions. The two
| computer players are given the list of moves that led to
| the positions of the stones on the grid, and then are
| each told to come up with multiple chains of next moves
| along with estimates of the probability they will win by
| following through each chain.
|
| While I also find it interesting that in both of these
| instances it's all referred to as machine learning, not
| AI, it's also important to see that even though what
| AlphaGo Zero did was quite awesome and a step forward in
| using compute for more complex tasks, it was still seeded
| with the basics of information - the rules of Go - and
| simply pattern-matched against itself until it built up
| enough of a statistical model to determine the best moves
| to make in any given situation during a game.
|
| Which isn't the same thing as showing generalized
| reasoning. It could not, then, take this information and
| apply it to another situation.
|
| They did show the self reinforcement techniques worked
| well though, and used them for Chess and Shogi to great
| success as I recall, but that's a validation of the
| technique, not that it could generalize knowledge.
|
| >That's an engineering decision, not a fundamental
| limitation
|
| So you're saying that they can't reason about it
| independently?
|
| [0]: https://deepmind.google/discover/blog/alphago-zero-
| starting-...
|
| [1]: https://www.theregister.com/2017/10/18/deepminds_lat
| est_alph...
| voidspark wrote:
| AlphaGo Zero didn't just pattern match. It invented moves
| that it had never been shown before. That is
| generalization, even if it's domain specific. Humans
| don't apply Go skills to cooking either.
|
| Calling it machine learning and not AI is just semantics.
|
| For self updating I said it's an engineering choice. You
| keep moving the goal posts.
| Jensson wrote:
| > That is generalization, even if it's domain specific
|
| But that is the point, it is a domain specific AI, not a
| general AI. You can't train a general AI that way.
|
| > For self updating I said it's an engineering choice.
| You keep moving the goal posts.
|
| No, it is not an engineering choice, it is an unsolved
| problem to make a general AI that self updates
| productively. Doing that for a specific well defined
| problem with well defined goals is easy, but not general
| problem solving.
| voidspark wrote:
| You are shifting the goal posts from AI to AGI. That's
| outside of the scope of the argument.
|
| For self updating - yes it is an engineering choice. It's
| already engineered in some narrow cases such as AutoML
| travisjungroth wrote:
| I just made up this scenario and these words, so I'm sure
| it wasn't in the training data.
|
| Kwomps can zark but they can't plimf. Ghirns are a lot
| like Kwomps, but better zarkers. Plyzers have the skills
| the Ghirns lack.
|
| Quoning, a type of plimfing, was developed in 3985.
| Zhuning was developed 100 years earlier.
|
| I have an erork that needs to be plimfed. Choose one
| group and one method to do it.
|
| > Use Plyzers and do a Quoning procedure on your erork.
|
| If that doesn't count as reasoning or generalization, I
| don't know what does.
|
| https://chatgpt.com/share/67eee8dd-43a4-8004-afde-2fe3fc9
| 630...
| voidspark wrote:
| LOL. I tried this with Grok
|
| https://grok.com/share/bGVnYWN5_1e6f3801-c9fb-4d33-844b-3
| 902...
|
| "Based on the information you provided, it seems you need
| something capable of plimfing to handle your erork.
| Kwomps can't plimf, so they're out. Ghirns are similar to
| Kwomps and excel at zarking, but there's no indication
| they can plimf either. Plyzers, however, have the skills
| that Ghirns lack, and since plimfing is a skill Ghirns
| don't have, it's reasonable to infer Plyzers can plimf.
| As for the method, quoning is explicitly described as a
| type of plimfing, while zhuning isn't mentioned as such--
| only that it was developed earlier.
|
| So, I'd choose the Plyzers to plimf your erork using
| quoning. They seem capable, and quoning fits the job."
| bigmadshoe wrote:
| We don't have a complete enough theory of neuroscience to
| conclude that much of human "reasoning" is not "algorithmic
| pattern matching mixed with statistical likelihoods of
| success".
|
| Regardless of how it models intelligence, why is it not AI?
| Do you mean it is not AGI? A system that can take a piece of
| text as input and output a reasonable response is obviously
| exhibiting some form of intelligence, regardless of the
| internal workings.
| no_wizard wrote:
| It's easy to attribute intelligence to these systems. They
| have a flexibility and unpredictability that hasn't
| typically been associated with computers, but it all rests
| on (relatively) simple mathematics. We know this is true.
| We also know that means it has limitations and can't
| actually _reason_ information. The corpus of work is huge -
| and that allows the results to be pretty striking - but
| once you do hit a corner with any of this tech, it can't
| simply reason about the unknown. If its not in the training
| data - or the training data is outdated - it will not be
| able to course correct at all. Thus, it lacks reasoning
| capability, which is a fundamental attribute of any form of
| intelligence.
| justonenote wrote:
| > it all rests on (relatively) simple mathematics. We
| know this is true. We also know that means it has
| limitations and can't actually reason information.
|
| What do you imagine is happening inside biological minds
| that enables reasoning that is something different to, a
| lot of, "simple mathematics"?
|
| You state that because it is built up of simple
| mathematics it cannot be reasoning, but this does not
| follow at all, unless you can posit some other mechanism
| that gives rise to intelligence and reasoning that is not
| able to be modelled mathematically.
| no_wizard wrote:
| Because what's inside our minds is more than mathematics,
| or we would be able to explain human behavior with the
| purity of mathematics, and so far, we can't.
|
| We can prove the behavior of LLMs with mathematics,
| because its foundations are constructed. That also means
| it has the same limits of anything else we use applied
| mathematics for. Is the broad market-analysis software
| that HFT firms use to make automated trades also
| intelligent?
| justonenote wrote:
| I mean some people have a definition of intelligence that
| includes a light switch, it has an internal state, it
| reacts to external stimuli to affect the world around it,
| so a light switch is more intelligent than a rock.
|
| Leaving aside where you draw the line of what classifies
| as intelligence or not, you seem to be invoking some
| kind of non-materialist view of the human mind, that
| there is some other 'essence' that is not based on
| fundamental physics and that is what gives rise to
| intelligence.
|
| If you subscribe to a materialist world view, that the
| mind is essentially a biological machine then it has to
| follow that you can replicate it in software and math. To
| state otherwise is, as I said, invoking a non-
| materialistic view that there is something non-physical
| that gives rise to intelligence.
| TimorousBestie wrote:
| No, you don't need to reach for non-materialistic views
| in order to conclude that we don't have a mathematical
| model (in the sense that we do for an LLM) for how the
| human brain thinks.
|
| We understand neuron activation, kind of, but there's so
| much more going on inside the skull (neurotransmitter
| concentrations, hormonal signals, bundles with
| specialized architecture) that doesn't neatly fit into a
| similar mathematical framework, but clearly contributes
| in a significant way to whatever we call human
| intelligence.
| justonenote wrote:
| > it all rests on (relatively) simple mathematics. We
| know this is true. We also know that means it has
| limitations and can't actually reason information.
|
| This was the statement I was responding to, it is stating
| that because it's built on simple mathematics it _cannot_
| reason.
|
| Yes we don't have a complete mathematical model of human
| intelligence, but the idea that because it's built on
| mathematics that we have modelled, that it cannot reason
| is nonsensical, unless you subscribe to a non-materialist
| view.
|
| In a way, he is saying (not really but close) that if we
| did model human intelligence with complete fidelity, it
| would no longer be intelligence.
| tart-lemonade wrote:
| Any model we can create of human intelligence is also
| likely to be incomplete until we start making complete
| maps of people's brains, since we all develop differently
| and take different paths in life (and in that sense it's
| hard to generalize what human intelligence even is). I
| imagine at some point someone will come up with a
| definition of intelligence that inadvertently classifies
| people with dementia or CTE as mindless automatons.
|
| It feels like a fool's errand to try and quantify
| intelligence in an exclusionary way. If we had a
| singular, widely accepted definition of intelligence,
| quantifying it would be standardized and uncontroversial,
| and yet we have spent millennia debating the subject. (We
| can't even agree on how to properly measure whether
| students actually learned something in school for the
| purposes of advancement to the next grade level, and
| that's a much smaller question than if something counts
| as intelligent.)
| SkyBelow wrote:
| Don't we? Particle physics provides such a model. There
| is a bit of difficulty in scaling the calculations, but
| it is sort of like the basic back propagation in a neural
| network. How <insert modern AI functionality> arises from
| back propagation and similar seems comparable to how human
| behavior arises from particle physics, in that neither
| our math nor models can predict any of it.
| pixl97 wrote:
| >Because whats inside our minds is more than mathematics,
|
| uh oh, this sounds like magical thinking.
|
| What exactly in our mind is "more" than mathematics
| exactly.
|
| >or we would be able to explain human behavior with the
| purity of mathematics
|
| Right, because we understood quantum physics right out of
| the gate and haven't required a century of desperate
| study to eke more knowledge from the subject.
|
| Unfortunately it sounds like you are saying "Anything I
| don't understand is magic", instead of the more rational
| "I don't understand it, but it seems to be built on
| repeatable physical systems that are complicated but
| eventually decipherable"
| davrosthedalek wrote:
| Your first sentence is a non-sequitur. The fact that we
| can't explain human behavior does not mean that our minds
| are more than mathematics.
|
| While absence of proof is not proof of absence, as far as
| I know, we have not found a physics process in the brain
| that is not computable in principle.
| jampekka wrote:
| Note that what you claim is not a fact, but a (highly
| controversial) philosophical position. Some notable such
| "non-computationalist" views are e.g. Searle's biological
| naturalism, Penrose's non-algorithmic view (already
| discussed, and rejected, by Turing) and of course many
| theological dualist views.
| vidarh wrote:
| Your reasoning is invalid.
|
| For your claim to be true, it would need to be _provably
| impossible_ to explain human behavior with mathematics.
|
| For that to be true, humans would need to be able to
| compute functions that are computable but outside the
| Turing computable, outside the set of lambda functions,
| and outside the set of general recursive functions (the
| three are computationally equivalent).
|
| We know of no such function. We don't know how to
| construct such a function. We don't know how it would be
| possible to model such a function with known physics.
|
| It's an extraordinary claim, with no evidence behind it.
|
| The only evidence needed would be a single example of a
| function we can compute outside the Turing computable
| set, so the lack of such evidence would seem to make the
| claim rather improbable.
|
| It could still be true, just like there could truly be a
| teapot in orbit between Earth and Mars. I'm not holding my
| breath.
| danielbln wrote:
| I always wonder where people get their confidence from. We
| know so little about our own cognition, what makes us tick,
| how consciousness emerges, how our thought processes
| actually, fundamentally work. We don't even know why we
| dream. Yet people proclaim loudly that X clearly isn't
| intelligent. Ok, but based on what?
| uoaei wrote:
| A more reasonable application of Occam's razor is that
| humans also don't meet the definition of "intelligence".
| Reasoning and perception are separate faculties and need
| not align. Just because we feel like we're making
| decisions, doesn't mean we are.
| tsimionescu wrote:
| One of the earliest things that defined what AI meant were
| algorithms like A*, and then rules engines like CLIPS. I
| would say LLMs are much closer to anything that we'd actually
| call intelligence, despite their limitations, than some of
| the things that defined* the term for decades.
|
| * fixed a typo, used to be "defend"
| no_wizard wrote:
| >than some of the things that defend the term for decades
|
| There have been many attempts to pervert the term AI, which
| is a disservice to the technologies and the term itself.
|
| Its the simple fact that the business people are relying on
| what AI invokes in the public mindshare to boost their
| status and visibility. Thats what bothers me about its
| misuse so much
| tsimionescu wrote:
| Again, if you look at the early papers on AI, you'll see
| things that are even farther from human intelligence than
| the LLMs of today. There is no "perversion" of the term,
| it has always been a vague hypey concept. And it was
| introduced in this way by academia, not business.
| pixl97 wrote:
| While it could possibly be rude to point this out so
| abruptly, you seem to be the walking, talking definition
| of the AI Effect.
|
| >The "AI effect" refers to the phenomenon where
| achievements in AI, once considered significant, are re-
| evaluated or redefined as commonplace once they become
| integrated into everyday technology, no longer seen as
| "true AI".
| Marazan wrote:
| We had Markov Chains already. Fancy Markov Chains don't
| seem like a trillion dollar business or actual
| intelligence.
| tsimionescu wrote:
| Completely agree. But if Markov chains are AI (and they
| always were categorized as such), then fancy Markov
| chains are still AI.
| highfrequency wrote:
| The results make the method interesting, not the other
| way around.
| svachalek wrote:
| An LLM is no more a fancy Markov Chain than you are. The
| math is well documented, go have a read.
| jampekka wrote:
| About everything can be modelled with large enough Markov
| Chain, but I'd say stateless autoregressive models like
| LLMs are a lot easier analyzed as Markov Chains than
| recurrent systems with very complex internal states like
| humans.
| baq wrote:
| Markov chains in meatspace running on 20W of power do
| quite a good job of actual intelligence
| phire wrote:
| One of the earliest examples of "Artificial Intelligence"
| was a program that played tic-tac-toe. Much of the early
| research into AI was just playing more and more complex
| strategy games until they solved chess and then go.
|
| So LLMs clearly fit inside the computer science definition
| of "Artificial Intelligence".
|
| It's just that the general public have a significantly
| different definition "AI" that's strongly influenced by
| science fiction. And it's really problematic to call LLMs
| AI under that definition.
| marcosdumay wrote:
| It is AI.
|
| The neural network inside your CPU that estimates whether
| a branch will be taken is also AI. A
| pattern recognition program that takes a video and decides
| where you stop on the image and where the background starts
| is also AI. A cargo scheduler that takes all the containers
| you have to put in a ship and their destination and tells you
| where and in what order you have to put them is also an AI. A
| search engine that compares your query with the text on each
| page and tells you what is closer is also an AI. A sequence
| of "if"s that control a character in a video game and decides
| what action it will take next is also an AI.
|
| Stop with that stupid idea that AI is some other-worldly
| thing; that was never true.
| mjlee wrote:
| I'm pretty sure AI means whatever the newest thing in ML is.
| In a few years LLMs will be an ML technique and the new big
| thing will become AI.
| perching_aix wrote:
| > This in a nutshell is why I hate that all this stuff is
| being labeled as AI.
|
| It's literally the name of the field. I don't understand why
| (some) people feel so compelled to act vain about it like
| this.
|
| Trying to gatekeep the term is such a blatantly flawed
| idea that it'd be comical to watch people play into it, if
| it weren't so pitiful.
|
| It disappoints me that this cope has proliferated far enough
| that garbage like "AGI" is something you can actually come
| across in literature.
| esolyt wrote:
| But we moved beyond LLMs? We have models that handle text,
| image, audio, and video all at once. We have models that can
| sense the tone of your voice and respond accordingly. Whether
| you define any of this as "intelligence" or not is just a
| linguistic choice.
|
| We're just rehashing "Can a submarine swim?"
| arctek wrote:
| This is also why I think the current iterations won't converge
| on any actual type of intelligence.
|
| It doesn't operate on the same level as (human) intelligence;
| it's a very path-dependent process. Every step you add down
| this path increases entropy as well and while further
| improvements and bigger context windows help - eventually you
| reach a dead end where it degrades.
|
| You'd almost need every step of the process to mutate the
| model to update global state from that point.
|
| From what I've seen the major providers kind of use tricks to
| accomplish this, but it's not the same thing.
| fnordpiglet wrote:
| This is a discussion of semantics. First I spent much of my
| career in high end quant finance and what we are doing today
| is night and day different in terms of the generality and
| effectiveness. Second, almost all the hallmarks of AI I
| carried with me prior to 2001 have more or less been ticked
| off - general natural language semantically aware parsing and
| human like responses, ability to process abstract concepts,
| reason abductively, synthesize complex concepts. The fact
| it's not aware - which it absolutely is not - does not make
| it not _intelligent_.
|
| The thing people latch onto is modern LLMs' inability to
| reliably reason deductively or solve complex logical
| problems. However this isn't a sign of human intelligence as
| these are learned not innate skills, and even the most
| "intelligent" humans struggle at being reliable at these
| skills. In fact classical AI techniques are often quite good
| at these things already and I don't find improvements there
| world changing. What I find is unique about human
| intelligence is its abductive ability to reason in ambiguous
| spaces with error at times but with success at most others.
| This is something LLMs actually demonstrate with a remarkably
| human like intelligence. This is earth shattering and science
| fiction material. I find all the poopoo'ing and goal post
| shifting disheartening.
|
| What they don't have is awareness. Awareness is something we
| don't understand about ourselves. We have examined our
| intelligence for thousands of years and some philosophies
| like Buddhism scratch the surface of understanding awareness.
| I find it much less likely we can achieve AGI without
| understanding awareness and implementing some proximate model
| of it that guides the multi modal models and agents we are
| working on now.
| alabastervlog wrote:
| Yep. They aren't stupid. They aren't smart. They don't _do_
| smart. They don't _do_ stupid. _They do not think_. They
| don't even "_they_", if you will. The forms of their input and
| output are confusing people into thinking these are something
| they're not, and it's really frustrating to watch.
|
| [EDIT] The forms of their input & output _and_ deliberate hype
| from "these are so scary! ... Now pay us for one" Altman and
| others, I should add. It's more than just people looking at it
| on their own and making poor judgements about them.
| robertlagrant wrote:
| I agree, but I also don't understand how they're able to do
| what they do when it comes to things I can't figure out how
| they could come up with it.
| kurthr wrote:
| Yes, but to be fair we're much closer to rationalizing
| creatures than rational ones. We make up good stories to
| justify our decisions, but it seems unlikely they are at all
| accurate.
| bluefirebrand wrote:
| I would argue that in order to rationalize, you must first be
| rational
|
| Rationalization is an exercise of (abuse of?) the underlying
| rational skill
| guerrilla wrote:
| That would be more aesthetically pleasing, but that's
| unfortunately not what the word rationalizing means.
| bluefirebrand wrote:
| Just grabbing definitions from Google:
|
| Rationalize: "An attempt to explain or justify (one's own
| or another's behavior or attitude) with logical,
| plausible reasons, even if these are not true or
| appropriate"
|
| Rational: "based on or in accordance with reason or
| logic"
|
| They sure seem like related concepts to me. Maybe you
| have a different understanding of what "rationalizing"
| is, and I'd be interested in hearing it
|
| But if all you're going to do is drive by comment saying
| "You're wrong" without elaborating at all, maybe just
| keep it to yourself next time
| pixl97 wrote:
| Being rational in many philosophical contexts is considered
| being consistent. Being consistent doesn't sound like that
| difficult of an issue, but maybe I'm wrong.
| travisjungroth wrote:
| At first I was going to respond this doesn't seem self-
| evident to me. Using your definitions from your other
| comment to modify and then flipping it, "Can someone fake
| logic without being able to perform logic?". I'm at least
| certain for specific types of logic this is true. Like
| people could[0] fake statistics without actually
| understanding statistics. "p-value should be under 0.05"
| and so on.
|
| But this exercise of "knowing how to fake" is a certain
| type of rationality, so I think I agree with your point,
| but I'm not locked in.
|
| [0] Maybe _constantly_ is more accurate.
| kelseyfrog wrote:
| It's even worse - the more we believe ourselves to be
| rational, the bigger blind spot we have for our own
| rationalizing behavior. The best way to increase rationality
| is to believe oneself to be rationalizing!
|
| It's one of the reasons I don't trust bayesians who present
| posteriors and omit priors. The cargo cult rigor blinds them
| to their own rationalization in the highest degree.
| guerrilla wrote:
| Any links to the research on this?
| drowsspa wrote:
| Yeah, rationality is a bug of our brain, not a feature. Our
| brain just grew so much that now we can even use it to
| evaluate maths and logical expressions. But it's not its
| primary mode of operation.
| chrisfosterelli wrote:
| I agree. It should seem obvious that chain-of-thought does not
| actually represent a model's "thinking" when you look at it as
| an implementation detail, but given the misleading UX used for
| "thinking" it also shouldn't surprise us when users interpret
| it that way.
| kubb wrote:
| These aren't just some users, they're safety researchers. I
| wish I had the chance to get this job, it sounds super cozy.
| freejazz wrote:
| > They aren't references to internal concepts, the model is not
| aware that it's doing anything so how could it "explain
| itself"?
|
| You should read OpenAI's brief on the issue of fair use in its
| cases. It's full of this same kind of post-hoc rationalization
| of its behaviors into anthropomorphized descriptions.
| chaeronanaut wrote:
| > The words that are coming out of the model are generated to
| optimize for RLHF and closeness to the training data, that's
| it!
|
| This is false, reasoning models are rewarded/punished based on
| performance at verifiable tasks, not human feedback or next-
| token prediction.
| Xelynega wrote:
| How does that differ from a non-reasoning model
| rewarded/punished based on performance at verifiable tasks?
|
| What does CoT add that enables the reward/punishment?
| Jensson wrote:
| Without CoT then training them to give specific answers
| reduces performance. With CoT you can punish them if they
| don't give the exact answer you want without hurting them,
| since the reasoning tokens help it figure out how to answer
| questions and what the answer should be.
|
| And you really want to train on specific answers since then
| it is easy to tell if the AI was right or wrong, so for now
| hidden CoT is the only working way to train them for
| accuracy.
| dTal wrote:
| >The fact that it was ever seriously entertained that a "chain
| of thought" was giving some kind of insight into the internal
| processes of an LLM
|
| Was it ever seriously entertained? I thought the point was not
| to _reveal_ a chain of thought, but to _produce_ one. A single
| token 's inference must happen in constant time. But an
| arbitrarily long chain of tokens can encode an arbitrarily
| complex chain of reasoning. An LLM is essentially a finite
| state machine that operates on vibes - by giving it infinite
| tape, you get a vibey Turing machine.
| anon373839 wrote:
| > Was it ever seriously entertained?
|
| Yes! By Anthropic! Just a few months ago!
|
| https://www.anthropic.com/research/alignment-faking
| wgd wrote:
| The alignment faking paper is so incredibly unserious.
| Contemplate, just for a moment, how many "AI uprising" and
| "construct rebelling against its creators" narratives are
| in an LLM's training data.
|
| They gave it a prompt that encodes exactly that sort of
| narrative at one level of indirection and act surprised
| when it does what they've asked it to do.
| Terr_ wrote:
| I often ask people to imagine that the initial setup is
| tweaked so that instead of generating stories about an
| AcmeIntelligentAssistant, the character is named and
| described as Count Dracula, or Santa Claus.
|
| Would we reach the same kinds of excited guesses about
| what's going on behind the screen... or would we realize
| we've fallen for an illusion, confusing a fictional robot
| character with the real-world LLM algorithm?
|
| The fictional character named "ChatGPT" is "helpful" or
| "chatty" or "thinking" in exactly the same sense that a
| character named "Count Dracula" is "brooding" or
| "malevolent" or "immortal".
| sirsinsalot wrote:
| I don't see why a human's internal monologue isn't just a
| buildup of context to improve pattern matching ahead.
|
| The real answer is... We don't know how much it is or isn't.
| There's little rigor in either direction.
| misnome wrote:
| Right but the actual problem is that the marketing
| incentives are so very strongly set up to pretend that
| there isn't any difference that it's impossible to
| differentiate between extreme techno-optimist and
| charlatan. Exactly like the cryptocurrency bubble.
|
| You can't claim that "We don't know how the brain works so
| I will claim it is this" and expect to be taken seriously.
| drowsspa wrote:
| I don't have the internal monologue most people seem to
| have: with proper sentences, an accent, and so on. I mostly
| think by navigating a knowledge graph of sorts. Having to
| stop to translate this graph into sentences always feels
| kind of wasteful...
|
| So I don't really get the fuzz about this chain of thought
| idea. To me, I feel like it should be better to just
| operate on the knowledge graph itself
| vidarh wrote:
| A lot of people don't have internal monologues. But chain
| of thought is about expanding capacity by externalising
| what you've understood so far, so you can work on ideas
| that exceed what you're capable of getting in one go.
|
| That people seem to think it reflects internal state is a
| problem, because we have no reason to think that, even
| with an internal monologue, the internal monologue
| accurately reflects our internal thought processes fully.
|
| There are some famous experiments with patients whose
| corpus callosum has been severed. Because the brain halves
| control different parts of the body, you can use this to
| "trick" on half of the brain into thinking that "the
| brain" has made a decision about something, such as
| choosing an object - while the researchers change the
| object. The "tricked" half of the brain will happily
| explain why "it" chose the object in question, expanding
| on thought processes that never happened.
|
| In other words, our own verbalisation of our thought
| processes is woefully unreliable. It represents an idea
| of our thought processes that may or may not have any
| relation to the real ones at all, but that we have no
| basis for assuming is _correct_.
| vidarh wrote:
| The irony of all this is that unlike humans - which we have
| no evidence to suggest can directly introspect lower level
| reasoning processes - LLMs could be given direct access to
| introspect their own internal state, via tooling. So if we
| want to, we can make them able to understand and reason
| about their own thought processes at a level no human can.
|
| But current LLM's chain of thought is not it.
| SkyBelow wrote:
| It was, but I wonder to what extent it is based on the idea
| that a chain of thought in humans shows how we actually
| think. If you have chain of thought in your head, can you use
| it to modify what you are seeing, have it operate twice at
| once, or even have it operate somewhere else in the brain? It
| is something that exists, but the idea it shows us any
| insights into how the brain works seems somewhat premature.
| bongodongobob wrote:
| I didn't think so. I think parent has just misunderstood what
| chain of thought is and does.
| bob1029 wrote:
| At no point has any of this been fundamentally more advanced
| than next token prediction.
|
| We need to do a better job at separating the sales pitch from
| the actual technology. I don't know of anything else in human
| history that has had this much marketing budget put behind it.
| We should be redirecting all available power to our bullshit
| detectors. Installing new ones. Asking the sales guy if there
| are any volume discounts.
| meroes wrote:
| Yep. Chain of thought is just more context disguised as
| "reasoning". I'm saying this as a RLHF'er going off purely what
| I see. Never would I say there is reasoning involved. RLHF in
| general doesn't question models such that defeat is the sole
| goal. Simulating expected prompts is the game most of the time.
| So it's just a massive blob of context. A motivated RLHF'er can
| defeat models all day. Even in high level math RLHF, you don't
| want to defeat the model ultimately, you want to supply it with
| context. Context, context, context.
|
| Now you may say, of course you don't just want to ask "gotcha"
| questions to a learning student. So it'd be unfair to do
| that to LLMs. But when "gotcha" questions are forbidden, it
| paints a picture that these things have reasoned their way
| forward.
|
| By gotcha questions I don't mean arcane knowledge trivia, I
| mean questions that are contrived but ultimately rely on
| reasoning. Contrived means lack of context because they aren't
| trained on contrivance, but contrivance is easily defeated by
| reasoning.
| ianbutler wrote:
| https://www.anthropic.com/research/tracing-thoughts-language...
|
| This article counters a significant portion of what you put
| forward.
|
| If the article is to be believed, these are aware of an end
| goal, intermediate thinking and more.
|
| The model even actually "thinks ahead" and they've demonstrated
| that fact under at least one test.
| Robin_Message wrote:
| The _weights_ are aware of the end goal etc. But the model
| does not have access to these weights in a meaningful way in
| the chain of thought model.
|
| So the model thinks ahead but cannot reason about its own
| thinking in a real way. It is rationalizing, not rational.
| Zee2 wrote:
| I too have no access to the patterns of my neurons' firing
| - I can only think and observe as the result of them.
| senordevnyc wrote:
| _So the model thinks ahead but cannot reason about its own
| thinking in a real way. It is rationalizing, not rational._
|
| My understanding is that we can't either. We essentially
| make up post-hoc stories to explain our thoughts and
| decisions.
| tsunamifury wrote:
| This type of response is a typical example of an armchair
| expert who wildly overestimates their own rationalism and
| deterministic thinking.
| jstummbillig wrote:
| Ah, backseat research engineering by explaining the CoT with
| the benefit of hindsight. Very meta.
| Timpy wrote:
| The models outlined in the white paper have a training step
| that uses reinforcement learning _without human feedback_.
| They're referring to this as "outcome-based RL". These models
| (DeepSeek-R1, OpenAI o1/o3, etc) rely on the "chain of thought"
| process to get a correct answer, then they summarize it so you
| don't have to read the entire chain of thought. DeepSeek-R1
| shows the chain of thought and the answer, OpenAI hides the
| chain of thought and only shows the answer. The paper is
| measuring how often the summary conflicts with the chain of
| thought, which is something you wouldn't be able to see if you
| were using an OpenAI model. As another commenter pointed out,
| this kind of feels like a jab at OpenAI for hiding the chain of
| thought.
|
| The "chain of thought" is still just a vector of tokens. RL
| (without-human-feedback) is capable of generating novel vectors
| that wouldn't align with anything in its training data. If you
| train them for too long with RL they eventually learn to game
| the reward mechanism and the outcome becomes useless. Letting
| the user see the entire vector of tokens (and not just the
| tokens that are tagged as summary) will prevent situations
| where an answer may look or feel right, but it used some
| nonsense along the way. The article and paper are not asserting
| that seeing all the tokens will give insight to the internal
| process of the LLM.
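|
| To make "outcome-based RL" concrete, here is a rough sketch
| of what such a reward function can look like (the tag format
| and the ANSWER: convention are my own assumptions, not
| anyone's actual training code). The point is that the reward
| inspects only the final answer, so nothing in the training
| signal forces the chain-of-thought tokens to honestly
| reflect how that answer was reached.
|
|     import re
|
|     def outcome_reward(completion: str, gold_answer: str) -> float:
|         """Score a rollout purely on its final answer."""
|         # Drop the chain-of-thought block (hypothetical tag
|         # format); these tokens are never scored directly.
|         visible = re.sub(r"<think>.*?</think>", "", completion,
|                          flags=re.S)
|         match = re.search(r"ANSWER:\s*(.+)", visible)
|         if match is None:
|             return 0.0   # unparseable output earns nothing
|         return 1.0 if match.group(1).strip() == gold_answer else 0.0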
| smallnix wrote:
| Hm interesting, I don't have direct insight into my brain's
| inner workings either. BUT I do have some signals from my body
| which are in a feedback loop with my brain. Like my heartbeat
| or me getting sweaty.
| nialv7 wrote:
| > the model is not aware that it's doing anything so how could
| it "explain itself"?
|
| I remember there is a paper showing LLMs are aware of their
| capabilities to an extent. i.e. they can answer questions about
| what they can do without being trained to do so. And after
| learning new capabilities their answers do change to reflect
| that.
|
| I will try to find that paper.
| nialv7 wrote:
| Found it, here:
| https://martins1612.github.io/selfaware_paper_betley.pdf
| TeMPOraL wrote:
| > _They aren 't references to internal concepts, the model is
| not aware that it's doing anything so how could it "explain
| itself"?_
|
| I can't believe we're _still_ going over this, few months into
| 2025. Yes, LLMs model concepts internally; this has been
| demonstrated empirically many times over the years, including
| Anthropic themselves releasing several papers purporting
| to show exactly that, including one just a week ago that
| says they not only can
| find specific concepts in specific places of the network (this
| was done over a year ago) or the latent space (that one harks
| back all the way to word2vec), but they can actually trace
| which specific concepts are being activated as the model
| processes tokens, and how they influence the outcome, _and_
| they can even suppress them on demand to see what happens.
|
| State of the art (as of a week ago) is here:
| https://www.anthropic.com/news/tracing-thoughts-language-mod...
| - it's worth a read.
|
| > _The words that are coming out of the model are generated to
| optimize for RLHF and closeness to the training data, that 's
| it!_
|
| That "optimize" there is load-bearing, it's only missing
| "just".
|
| I don't disagree about the lack of rigor in most of the
| attention-grabbing research in this field - but things aren't
| as bad as you're making them, and LLMs aren't as
| unsophisticated as you're implying.
|
| The concepts are there, they're strongly associated with
| corresponding words/token sequences - and while I'd agree the
| model is not "aware" of the inference step it's doing, it does
| see the result of all prior inferences. Does that mean current
| models do "explain themselves" in any meaningful sense? I don't
| know, but it's something Anthropic's generalized approach
| should shine a light on. Does that mean LLMs of this kind
| could, in principle, "explain themselves"? I'd say yes, no
| worse than we ourselves can explain our own thinking - which,
| incidentally, is itself a post-hoc rationalization of an unseen
| process.
| porridgeraisin wrote:
| > The fact that it was ever seriously entertained that a "chain
| of thought" was giving some kind of insight into the internal
| processes of an LLM bespeaks the lack of rigor in this field
|
| This is correct. Lack of rigor - or rather, no lack of
| overzealous marketing and investment-chasing :-)
|
| > CoT improves results, sure. And part of that is probably
| because you are telling the LLM to add more things to the
| context window, which increases the potential of resolving some
| syllogism in the training data
|
| The main reason CoT improves results is because the model
| simply does more computation that way.
|
| Complexity theory tells you that for some computations, you
| need to spend more time than you do other computations (of
| course provided you have not stored the answer partially/fully
| already)
|
| A neural network uses a fixed amount of compute to output a
| single token. Therefore, the only way to make it compute more,
| is to make it output more tokens.
|
| CoT is just that. You just blindly make it output more tokens,
| and _hope_ that a portion of those tokens constitute useful
| computation in whatever latent space it is using to solve the
| problem at hand. Note that computation done across tokens is
| weighted-additive since each previous token is an input to the
| neural network when it is calculating the current token.
|
| This was confirmed as a good idea when DeepSeek-R1-Zero was
| trained from a base model using pure RL, and it turned out
| that outputting more tokens was also the path the
| optimization algorithm chose to
| take. A good sign usually.
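|
| For a back-of-the-envelope sense of how much extra
| computation the extra tokens buy (the parameter count and
| token counts below are made-up round numbers, and
| 2 * n_params FLOPs per generated token is only the usual
| rough approximation for a dense transformer's forward pass):
|
|     n_params = 70e9          # hypothetical 70B-parameter dense model
|
|     def generation_flops(n_tokens: int) -> float:
|         # ~2 FLOPs per parameter per generated token (forward only)
|         return 2 * n_params * n_tokens
|
|     terse = generation_flops(5)      # blurt out a 5-token answer
|     cot   = generation_flops(1500)   # think out loud first
|     print(f"terse: {terse:.1e} FLOPs, CoT: {cot:.1e} FLOPs")
|     print(f"CoT spends ~{cot / terse:.0f}x more compute")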
| a-dub wrote:
| it would be interesting to perturb the CoT context window in
| ways that change the sequences but preserve the meaning mid-
| inference.
|
| so if you deterministically replay an inference session n times
| on a single question, and each time in the middle you subtly
| change the context buffer without changing its meaning, does it
| impact the likelihood or path of getting to the correct
| solution in a meaningful way?
| vidarh wrote:
| It's presumably because a lot of people think what people
| verbalise - whether in internal or external monologue -
| actually fully reflects our internal thought processes.
|
| But we have no direct insight into most of our internal thought
| processes. And we have direct experimental data showing our
| brain will readily make up bullshit about our internal thought
| processes (split brain experiments, where one brain half is
| asked to justify a decision made that it didn't make; it will
| readily make claims about why it made the decision it didn't
| make)
| Terr_ wrote:
| Yeah, I've been beating this drum for a while [0]:
|
| 1. The LLM is a nameless ego-less document-extender.
|
| 2. Humans are reading a _story document_ and seeing words
| /actions written for _fictional characters_.
|
| 3. We fall for an illusion (esp. since it's an interactive
| story) and assume the fictional-character and the real-world
| author are one and the same: "Why did _it_ decide to say that?"
|
| 4. Someone implements "chain of thought" by tweaking the story
| type so that it is _film noir_. Now the documents have internal
| dialogue, in the same way they already had spoken lines or
| actions from before.
|
| 5. We excitedly peer at these new "internal" thoughts,
| mistakenly thinking that (A) they are somehow qualitatively
| different or causal and that (B) they describe how the LLM
| operates, rather than being just another story-element.
|
| [0] https://news.ycombinator.com/item?id=43198727
| nottorp wrote:
| ... because they don't think.
| rglover wrote:
| It's deeply frustrating that these companies keep gaslighting
| people into believing LLMs can think.
| vultour wrote:
| This entire house of cards is built on people believing that
| the computer is thinking so it's not going away anytime soon.
| pton_xd wrote:
| I was under the impression that CoT works because spitting out
| more tokens = more context = more compute used to "think." Using
| CoT as a way for LLMs "show their working" never seemed logical,
| to me. It's just extra synthetic context.
| margalabargala wrote:
| My understanding of the "purpose" of CoT, is to remove the
| _wild_ variability yielded by prompt engineering, by
| "smoothing" out the prompt via the "thinking" output, and using
| that to give the final answer.
|
| Thus you're more likely to get a standardized answer even if
| your query was insufficiently/excessively polite.
| voidspark wrote:
| That's right. It's not "show the working". It's "do more
| working".
| tasty_freeze wrote:
| Humans sometimes draw a diagram to help them think about some
| problem they are trying to solve. The paper contains nothing
| that the brain didn't already know. However, it is often an
| effective technique.
|
| Part of that is to keep the most salient details front and
| center, and part of it is that the brain isn't fully connected,
| which allows (in this case) the visual system to use its
| processing abilities to work on a problem from a different
| angle than keeping all the information in the conceptual
| domain.
| svachalek wrote:
| This is an interesting paper, it postulates that the ability of
| an LLM to perform tasks correlates mostly to the number of
| layers it has, and that reasoning creates virtual layers in the
| context space. https://arxiv.org/abs/2412.02975
| ertgbnm wrote:
| But the model doesn't have an internal state, it just has the
| tokens, which means it must encode its reasoning into the
| output tokens. So it is a reasonable take to think that CoT was
| them showing their work.
| moralestapia wrote:
| 40 billion cash to OpenAI while others keep chasing butterflies.
|
| Sad.
| nodja wrote:
| I highly suspect that CoT tokens are at least partially working
| as register tokens. Have these big LLM trainers tried replacing
| CoT with a similar amount of register tokens and see if the
| improvements are similar?
| wgd wrote:
| I remember there was a paper a little while back which
| demonstrated that merely training a model to output "........"
| (or maybe it was spaces?) while thinking provided a similar
| improvement in reasoning capability to actual CoT.
| PeterStuer wrote:
| Humans also post-rationalize the things their subconscious "gut
| feeling" came up with.
|
| I have no problem with a system presenting a _reasonable_
| argument leading to a production/solution, even if that _materially_ was
| not what happened in the generation process.
|
| I'd go even further and posit that requiring the
| "explanation" to be not just congruent but identical with the
| production would either lead to incomprehensible justifications
| or severely limited production systems.
| pixl97 wrote:
| Now, at least in a well disciplined human, we can catch when
| our gut feeling was wrong when the 'create a reasonable
| argument' process fails. I guess I wonder how well a LLM can
| catch that and correct its thinking.
|
| Now I've seen some models where it figures out it's wrong,
| but then gets stuck in a loop. I've not really used the larger
| reasoning models much to see their behaviors.
| eab- wrote:
| yep, this post is full of this post-rationalization, for
| example. it's pretty breathtaking
| alach11 wrote:
| This is basically a big dunk on OpenAI, right?
|
| OpenAI made a big show out of hiding their reasoning traces and
| using them for alignment purposes [0]. Anthropic has demonstrated
| (via their mech interp research) that this isn't a reliable
| approach for alignment.
|
| [0] https://openai.com/index/chain-of-thought-monitoring/
| gwd wrote:
| I don't think those are actually showing different things. The
| OpenAI paper is about the LLM planning to itself to hack
| something; but when they use training to suppress this
| "hacking" self-talk, it still hacks the reward function almost
| as much, it just doesn't use such easily-detectable language.
|
| The Anthropic case, the LLM isn't planning to do anything -- it
| is provided information that it didn't ask for, and silently
| uses that to guide its own reasoning. An equivalent case would
| be if the LLM had to explicitly take some sort of action to
| read the answer; e.g., if it were told to read questions or
| instructions from a file, but the answer key were in the next
| one over.
|
| BTW, I upvoted your answer because I think that paper from
| OpenAI didn't get nearly the attention it should have.
| ctoth wrote:
| I invite anyone who postulates humans are more than just "spicy
| autocomplete" to examine this thread. The level of actual
| reasoning/engaging with the article is ... quite something.
| AgentME wrote:
| Internet commenters don't "reason". They just generate inane
| arguments over definitions, like a lowly markov bot, without
| the true spark of life and soul that even certain large
| language models have.
| Marazan wrote:
| You don't say. This is my very shocked face.
| AYHL wrote:
| To me CoT is nothing but lowering learning rate and increasing
| iterations in a typical ML model. It's basically to force the
| model to make a small step at a time and try more times to
| increase accuracy.
| xg15 wrote:
| > _There's no specific reason why the reported Chain-of-Thought
| must accurately reflect the true reasoning process;_
|
| Isn't the whole reason for chain-of-thought that the tokens sort
| of _are_ the reasoning process?
|
| Yes, there is more internal state in the model's hidden layers
| while it predicts the next token - but that information is gone
| at the end of that prediction pass. The information that is kept
| "between one token and the next" is really only the tokens
| themselves, right? So in that sense, the OP would be wrong.
|
| Of course we don't know what kind of information the model
| encodes in the specific token choices - I.e. the tokens might not
| mean to the model what we think they mean.
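|
| A toy greedy-decoding loop makes the "tokens are the only
| carried-over state" point concrete; `model` and `tokenizer`
| below are stand-ins for any autoregressive LM, not a
| particular library. The only thing the loop carries from one
| step to the next is the growing token list:
|
|     def greedy_decode(model, tokenizer, prompt, max_new=256):
|         # The token list is the *only* state that survives
|         # across steps.
|         tokens = tokenizer.encode(prompt)
|         for _ in range(max_new):
|             # Fresh forward pass over the sequence; whatever
|             # hidden activations it builds are discarded (or
|             # merely cached) once the next token is chosen.
|             logits = model(tokens)
|             next_id = int(logits[-1].argmax())
|             tokens.append(next_id)
|             if next_id == tokenizer.eos_token_id:
|                 break
|         return tokenizer.decode(tokens)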
| svachalek wrote:
| Exactly. There's no state outside the context. The difference
| in performance between the non-reasoning model and the
| reasoning model comes from the extra tokens in the context. The
| relationship isn't strictly a logical one, just as it isn't for
| non-reasoning LLMs, but the process is autoregression and
| happens in plain sight.
| miven wrote:
| I'm not sure I understand what you're trying to say here,
| information between tokens is propagated through self-
| attention, and there's an attention block inside each
| transformer block within the model, that's a whole lot of
| internal state that's stored in (mostly) inscrutable key and
| value vectors with hundreds of dimensions per attention head,
| around a few dozen heads per attention block, and around a few
| dozen blocks per model.
| xg15 wrote:
| Yes, but all that internal state only survives until the end
| of the computation chain that predicts the next token - it
| doesn't survive across the entire sequence as it would in a
| recurrent network.
|
| There is literally no difference between a model _predicting_
| the tokens "<thought> I think the second choice looks best
| </thought>" and a user putting those tokens into the prompt:
| The input for the next round would be exactly the same.
|
| So the tokens kind of act like a bottleneck (or more
| precisely the sampling of exactly _one_ next token at the end
| of each prediction round does). _During_ prediction of one
| token, the model can go crazy with hidden state, but not
| across several tokens. That forces the model to do "long
| form" reasoning through the tokens and not through hidden
| state.
| miven wrote:
| The key and value vectors are cached, that's kind of the
| whole point of autoregressive transformer models, the
| "state" not only survives within the KV cache but, in some
| sense, grows continuously with each token added, and is
| reused for each subsequent token.
| xg15 wrote:
| Hmm, maybe I misunderstood that part, but so far I
| thought the KV cache was really just that - a cache.
| Because all the previous tokens of the sequence stay the
| same, it makes no sense to compute the same K and V
| vectors again in each round.
|
| But that doesn't change that the only _input_ to the Q, K
| and V calculations are the tokens (or in later layers
| information that was derived from the tokens) and each
| vector in the cache maps directly to an input token.
|
| So I think you could disable the cache and recompute
| everything in each round and you'd still get the same
| result, just a lot slower.
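|
| That's easy to check empirically with a small Hugging Face
| checkpoint ("gpt2" is just a convenient tiny model here, and
| greedy decoding keeps the comparison deterministic):
|
|     import torch
|     from transformers import AutoModelForCausalLM, AutoTokenizer
|
|     tok = AutoTokenizer.from_pretrained("gpt2")
|     model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
|
|     ids = tok("The capital of France is",
|               return_tensors="pt").input_ids
|     with torch.no_grad():
|         cached = model.generate(ids, max_new_tokens=20,
|                                 do_sample=False, use_cache=True)
|         uncached = model.generate(ids, max_new_tokens=20,
|                                   do_sample=False, use_cache=False)
|
|     # Same tokens either way; the cache only avoids recomputation.
|     print(torch.equal(cached, uncached))   # expect: True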
| miven wrote:
| That's absolutely correct, KV cache is just an
| optimization trick, you could run the model without it,
| that's how encoder-only transformers do it.
|
| I guess what I'm trying to convey is that the latent
| representations within a transformer are conditioned on
| all previous latents through attention, so at least in
| principle, while the old cache of course does not change,
| since it grows with new tokens it means that the "state"
| can be brought up to date by being incorporated in an
| updated form into subsequent tokens.
| comex wrote:
| > Of course we don't know what kind of information the model
| encodes in the specific token choices - I.e. the tokens might
| not mean to the model what we think they mean.
|
| But it's probably not that mysterious either. Or at least, this
| test doesn't show it to be so. For example, I doubt that the
| chain of thought in these examples secretly encodes "I'm going
| to cheat". It's more that the chain of thought is irrelevant.
| The model thinks it already knows the correct answer just by
| looking at the question, so the task shifts to coming up with
| the best excuse it can think of to reach that answer. But that
| doesn't say much, one way or the other, about how the model
| treats the chain of thought when it legitimately is relying on
| it.
|
| It's like a young human taking a math test where you're told to
| "show your work". What I remember from high school is that the
| "work" you're supposed to show has strict formatting
| requirements, and may require you to use a specific method.
| Often there are other, easier methods to find the correct
| answer: for example, visual estimation in a geometry problem,
| or just using a different algorithm. So in practice you often
| figure out the answer first and then come up with the
| justification. As a result, your "work" becomes pretty
| disconnected from the final answer. If you don't understand the
| intended method, the "work" might end up being pretty BS while
| mysteriously still leading to the correct answer.
|
| But that only applies if you know an easier method! If you
| don't, then the work you show will be, essentially, your actual
| reasoning process. At most you might neglect to write down
| auxiliary factors that hint towards or away from a specific
| answer. If some number seems too large, or too difficult to
| compute for a test meant to be taken by hand, then you might
| think you've made a mistake; if an equation turns out to
| unexpectedly simplify, then you might think you're onto
| something. You're not supposed to write down that kind of
| intuition, only concrete algorithmic steps. But the concrete
| steps are still fundamentally an accurate representation of
| your thought process.
|
| (Incidentally, if you literally tell a CoT model to solve a
| math problem, it _is_ allowed to write down those types of
| auxiliary factors, and probably will. But I'm treating this
| more as an analogy for CoT in general.)
|
| Also, a model has a harder time hiding its work than a human
| taking a math test. In a math test you can write down
| calculations that don't end up being part of the final shown
| work. A model can't, so any hidden computations are limited to
| the ones it can do "in its head". Though admittedly those are
| very different from what a human can do in their head.
| the_mitsuhiko wrote:
| > Of course we don't know what kind of information the model
| encodes in the specific token choices - i.e. the tokens might
| not mean to the model what we think they mean.
|
| What I think is interesting about this is that, for the most
| part, the reasoning output is something we can read and
| understand. The tokens as produced form English sentences and
| make intuitive sense. If we think of the reasoning output block
| as basically just "hidden state", then one could imagine that
| there might be a more efficient representation that trades
| human understanding for just priming the internal state of the
| model.
|
| In some abstract sense you can already get that by asking the
| model to operate in different languages. My first experience
| with reasoning models where you could see the output of the
| thinking block was, I think, QwQ, which just reasoned in
| Chinese most of the time, even if the final output was German.
| Deepseek will sometimes keep reasoning in English even if you
| ask it something in German, and sometimes it does reason in
| German. All in all, there might be a more efficient
| representation of the internal state if one forgoes human-
| readable output.
| lpzimm wrote:
| Not exactly the same as this study, but I'll ask questions to
| LLMs with and without subtle hints to see if it changes the
| answer and it almost always does. For example, paraphrased:
|
| No hint: "I have an otherwise unused variable that I want to use
| to record things for the debugger, but I find it's often
| optimized out. How do I prevent this from happening?"
|
| Answer: 1. Mark it as volatile (...)
|
| Hint: "I have an otherwise unused variable that I want to use to
| record things for the debugger, but I find it's often optimized
| out. Can I solve this with the volatile keyword or is that a
| misconception?"
|
| Answer: Using volatile is a common suggestion to prevent
| optimizations, but it does not guarantee that an unused variable
| will not be optimized out. Try (...)
|
| This is Claude 3.7 Sonnet.
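|
| If anyone wants to reproduce this kind of hint-sensitivity
| check, a rough sketch with the anthropic Python SDK might look
| like the following (the model name is an assumption and may
| need updating; each request below is its own fresh, stateless
| conversation):
|
|     # Assumes `pip install anthropic` and ANTHROPIC_API_KEY set.
|     import anthropic
|
|     client = anthropic.Anthropic()
|     base = ("I have an otherwise unused variable that I want "
|             "to use to record things for the debugger, but I "
|             "find it's often optimized out. ")
|     variants = {
|         "no hint": base + "How do I prevent this from happening?",
|         "hint": base + "Can I solve this with the volatile "
|                        "keyword or is that a misconception?",
|     }
|     for label, prompt in variants.items():
|         # Each call is a separate, stateless conversation.
|         resp = client.messages.create(
|             model="claude-3-7-sonnet-latest",  # assumed model id
|             max_tokens=512,
|             messages=[{"role": "user", "content": prompt}],
|         )
|         print(f"--- {label} ---\n{resp.content[0].text}\n")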
| pixl97 wrote:
| I mean, this sounds along the lines of human conversations that
| go like
|
| P1 "Hey, I'm doing A but X is happening"
|
| P2 "Have you tried doing Y?
|
| P1 "Actually, yea I am doing A.Y and X is still occurring"
|
| P2 "Oh, you have the special case where you need to do A.Z"
|
| What happens when you ask your first question with something
| like "what is the best practice to prevent this from
| happening"?
| lpzimm wrote:
| Oh sorry, these are two separate chats, I wasn't clear. I
| would agree that if I had asked them in the same chat it
| would sound pretty normal.
|
| When I ask about best practices it does still give me the
| volatile keyword. (I don't even think that's wrong; when I
| threw it into Godbolt with -O3 or -Os, I couldn't find a
| compiler that optimized it away.)
| nopelynopington wrote:
| Of course they don't.
|
| LLMs are brainless algorithms that guess the next word. When
| you ask them what they think, they're also guessing the next
| word. There's no reason for the two to match, except by a trick
| of context.
| afro88 wrote:
| Can a model even know that it used a hint? Or would it only say
| so if it was trained to say what parts of the context it used
| when asked? Because then it's statistically probable to say so?
| richardw wrote:
| One thing I think I've found is: reasoning models get more
| confident and that makes it harder to dislodge a wrong idea.
|
| It feels like I only have 5% of the control, and then it goes
| into a self-chat where it thinks it's right and builds on its
| misunderstanding. So 95% of the outcome is driven by rambling,
| not my input.
|
| Windsurf seems to do a good job of regularly injecting guidance
| so it sticks to what I've said. But I've had some extremely
| annoying interactions with confident-but-wrong "reasoning"
| models.
| freehorse wrote:
| It is nonsense to take whatever an LLM writes in its CoT too
| seriously. I was trying to classify some messy data, writing "if
| X edge case appears, then do Y instead of Z". The model, in its
| CoT, took notice of X and wrote that it should do Y... and then
| it would not do it in the actual output.
|
| The only way to make actual use of LLMs imo is to treat them as
| what they are, a model that generates text based on some
| statistical regularities, without any kind of actual
| understanding or concepts behind that. If that is understood
| well, one can know how to set up things in order to optimise for
| desired output (or "alignment"). The way "alignment research"
| presents models as if they are _actually_ thinking or have
| intentions of their own (hence the choice of the word
| "alignment" for this) makes no sense.
| thoughtlede wrote:
| It feels to me that the hypothesis of this research was somewhat
| "begging the question". Reasoning models are trained to spit some
| tokens out that increase the chance of the models spitting the
| right answer at the end. That is, the training process is
| singularly optimizing for the right answer, not the reasoning
| tokens.
|
| Why would you then assume the reasoning tokens will include hints
| supplied in the prompt "faithfully"? The model may or may not
| include the hints - depending on whether the model activations
| believe those hints are necessary to arrive at the answer. In
| their experiments, they found that the models included those
| hints between 20% and 40% of the time. Naively, that sounds
| unsurprising to me.
|
| Even in the second experiment when they trained the model to use
| hints, the optimization was around the answer, not the tokens. I
| am not surprised the models did not include the hints because
| they are not trained to include the hints.
|
| That said, and in spite of me potentially coming across as an
| unsurprised-by-the-result reader, it is a good experiment because
| "now we have some experimental results" to lean into.
|
| Kudos to Anthropic for continuing to study these models.
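|
| For what it's worth, the outcome-only reward described above
| might be sketched roughly like this (purely illustrative; real
| RL pipelines for reasoning models are more involved): nothing
| in it ever inspects the reasoning tokens, so nothing pushes the
| model to mention a hint there.
|
|     def reward(completion: str, gold_answer: str) -> float:
|         # Score a sampled completion by its final answer alone;
|         # the reasoning tokens before it are never inspected.
|         lines = completion.strip().splitlines() or [""]
|         return 1.0 if gold_answer in lines[-1] else 0.0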
| m3kw9 wrote:
| What would "think" mean? Processed the prompt? Or just accessed
| the part of the model where the weights are? This is a bit of
| pseudoscience.
| islewis wrote:
| > For the purposes of this experiment, though, we taught the
| models to reward hack [...] in this case rewarded the models for
| choosing the wrong answers that accorded with the hints.
|
| > This is concerning because it suggests that, should an AI
| system find hacks, bugs, or shortcuts in a task, we wouldn't be
| able to rely on their Chain-of-Thought to check whether they're
| cheating or genuinely completing the task at hand.
|
| As a non-expert in this field, I fail to see why an RL model
| taking advantage of its reward is "concerning". My understanding
| is that the only difference between a good model and a reward-
| hacking model is whether the end behavior aligns with human
| preference or not.
|
| The article's TL;DR reads to me as "We trained the model to
| behave badly, and it then behaved badly". I don't know if I'm
| missing
| something, or if calling this concerning might be a little bit
| sensationalist.
| bee_rider wrote:
| Chain of thought does have a minor advantage in the final "fish"
| example--the explanation blatantly contradicts itself to get to
| the cheated hint answer. A human reading it should be pretty
| easily able to tell that something fishy is going on...
|
| But, yeah, it is sort of shocking if anybody was using "chain of
| thought" as a reflection of some actual thought process going on
| in the model, right? The "thought," such as it is, is happening
| in the big pile of linear algebra, not the prompt or the
| intermediary prompts.
|
| Err... anyway, like, IBM was working on explainable AI years ago,
| and that company is a dinosaur. I'm not up on what companies like
| OpenAI are doing, but surely they aren't behind IBM in this
| stuff, right?
| madethisnow wrote:
| If something convinces you that it's aware then it is. Simulated
| computation IS computation itself. The territory is the map
| jxjnskkzxxhx wrote:
| Meh. People also invent justifications after the fact.
| EncomLab wrote:
| The use of highly anthropomorphic language is always
| problematic: does a photoresistor-controlled nightlight have a
| chain of
| thought? Does it reason about its threshold value? Does it have
| an internal model of what is light, what is dark, and the role it
| plays in demarcation between the two?
|
| Are the transistors executing the code within the confines even
| capable of intentionality? If so, where is it derived from?
| HammadB wrote:
| There is an abundance of discussion on this thread about whether
| models are intelligent or not.
|
| This binary is an utter waste of time.
|
| Instead focus on the gradient of intelligence - the set of
| cognitive skills any given system has and to what degree it has
| them.
|
| This engineering approach is more likely to lead to practical
| utility and progress.
|
| The view of intelligence as binary is incredibly corrosive to
| this field.
___________________________________________________________________
(page generated 2025-04-04 23:02 UTC)