[HN Gopher] Is chain-of-thought AI reasoning a mirage?
       ___________________________________________________________________
        
       Is chain-of-thought AI reasoning a mirage?
        
       Author : ingve
       Score  : 117 points
       Date   : 2025-08-14 13:48 UTC (9 hours ago)
        
 (HTM) web link (www.seangoedecke.com)
 (TXT) w3m dump (www.seangoedecke.com)
        
       | NitpickLawyer wrote:
       | Finally! A good take on that paper. I saw that arstechnica
       | article posted everywhere, and most of the comments are full of
        | confirmation bias, and almost all of them miss the fine print -
        | it was tested on a 4-layer-deep toy model. It's nice to read a post
       | that actually digs deeper and offers perspectives on what might
       | be a good finding vs. just warranting more research.
        
         | stonemetal12 wrote:
         | > it was tested on a 4 layer deep toy model
         | 
         | How do you see that impacting the results? It is the same
         | algorithm just on a smaller scale. I would assume a 4 layer
         | model would not be very good, but does reasoning improve it? Is
         | there a reason scale would impact the use of reasoning?
        
           | okasaki wrote:
           | Human babies are the same algorithm as adults.
        
           | azrazalea_debt wrote:
           | A lot of current LLM work is basically emergent behavior.
           | They use a really simple core algorithm and scale it up, and
            | interesting things happen. You can read some of Anthropic's
            | recent papers to see some of this, like: they didn't expect
            | LLMs could "lookahead" when writing poetry. However, when
            | they actually went in and watched what was happening (there
            | are details on how this "watching" works on their blog/in
            | their studies) they found the LLM actually was planning
            | ahead! That's emergent behavior; they didn't design it to do
            | that, it just started doing so due to the complexity of the
            | model.
           | 
           | If (BIG if) we ever do see actual AGI, it is likely to work
           | like this. It's unlikely we're going to make AGI by designing
           | some grand Cathedral of perfect software, it is more likely
           | we are going to find the right simple principles to scale big
           | enough to have AGI emerge. This is similar.
        
             | mrspuratic wrote:
             | On that topic, it seems backwards to me: intelligence is
             | not emergent behaviour of language, rather the opposite.
        
           | NitpickLawyer wrote:
           | There's prior research that finds a connection between model
           | depth and "reasoning" ability -
           | https://arxiv.org/abs/2503.03961
           | 
           | A depth of 4 is very small. It is very much a toy model. It's
           | _ok_ to research this, and maybe someone will try it out on
            | larger models, but it's totally _not_ ok to lead with the
           | conclusion, based on this toy model, IMO.
        
       | sempron64 wrote:
       | Betteridge's Law of Headlines.
       | 
       | https://en.m.wikipedia.org/wiki/Betteridge's_law_of_headline...
        
         | mwkaufma wrote:
          | Betteridge's law applies to editors adding question marks to
          | cover the ass of articles with weak claims, not bloggers
          | begging questions.
        
       | robviren wrote:
       | I feel it is interesting but not what would be ideal. I really
       | think if the models could be less linear and process over time in
       | latent space you'd get something much more akin to thought. I've
       | messed around with attaching reservoirs at each layer using hooks
        | with interesting results (mainly overfitting), but it feels like
       | such a limitation to have all model context/memory stuck as
       | tokens when latent space is where the richer interaction lives.
       | Would love to see more done where thought over time mattered and
       | the model could almost mull over the question a bit before being
       | obligated to crank out tokens. Not an easy problem, but
       | interesting.
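        | 
        | A rough sketch of the hook wiring I mean (PyTorch forward
        | hooks; the reservoir update itself is just illustrative, not
        | what I actually ran):
        | 
        |   import torch
        |   import torch.nn as nn
        | 
        |   d_model, d_res, n_layers = 64, 128, 4
        |   enc_layer = nn.TransformerEncoderLayer(
        |       d_model, nhead=4, batch_first=True)
        |   model = nn.TransformerEncoder(enc_layer, num_layers=n_layers)
        | 
        |   # one echo-state-style reservoir per layer
        |   W_in = [torch.randn(d_res, d_model) * 0.1
        |           for _ in range(n_layers)]
        |   W_res = [torch.randn(d_res, d_res) * 0.05
        |            for _ in range(n_layers)]
        |   state = [torch.zeros(d_res) for _ in range(n_layers)]
        | 
        |   def make_hook(i):
        |       def hook(module, inputs, output):
        |           # fold this layer's hidden states into reservoir i
        |           pooled = output.mean(dim=(0, 1))   # (d_model,)
        |           state[i] = torch.tanh(
        |               W_in[i] @ pooled + W_res[i] @ state[i])
        |       return hook
        | 
        |   for i, lyr in enumerate(model.layers):
        |       lyr.register_forward_hook(make_hook(i))
        | 
        |   x = torch.randn(2, 10, d_model)   # (batch, seq, d_model)
        |   _ = model(x)
        |   print([s.norm().item() for s in state])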
        
         | dkersten wrote:
         | Agree! I'm not an AI engineer or researcher, but it always
         | struck me as odd that we would serialise the 100B or whatever
         | parameters of latent space down to maximum 1M tokens and back
         | for every step.
        
         | vonneumannstan wrote:
         | >I feel it is interesting but not what would be ideal. I really
         | think if the models could be less linear and process over time
         | in latent space you'd get something much more akin to thought.
         | 
         | Please stop, this is how you get AI takeovers.
        
           | adastra22 wrote:
           | Citation seriously needed.
        
         | CuriouslyC wrote:
         | They're already implementing branching thought and taking the
          | best one; eventually the entire response will be branched, with
         | branches being spawned and culled by some metric over the
         | lifetime of the completion. It's just not feasible now for
         | performance reasons.
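          | 
          | Roughly the shape of it, with the sampler and the scoring
          | metric as stand-ins rather than any particular vendor's API:
          | 
          |   import random
          | 
          |   def sample_continuation(prefix):
          |       # stand-in for one model call extending a branch
          |       step = random.choice(
          |           [" ...step A.", " ...step B.", " hmm, retry."])
          |       return prefix + step
          | 
          |   def score(branch):
          |       # stand-in metric, e.g. a reward model or a vote
          |       return -branch.count("retry") + 0.01 * len(branch)
          | 
          |   def branched_completion(prompt, width=4, keep=2, rounds=3):
          |       branches = [prompt]
          |       for _ in range(rounds):
          |           # spawn: each survivor forks into `width` branches
          |           branches = [sample_continuation(b)
          |                       for b in branches
          |                       for _ in range(width)]
          |           # cull: keep only the best-scoring branches
          |           branches = sorted(branches, key=score,
          |                             reverse=True)[:keep]
          |       return branches[0]
          | 
          |   print(branched_completion("Q: why is the sky blue?\nA:"))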
        
       | mentalgear wrote:
       | > Whether AI reasoning is "real" reasoning or just a mirage can
       | be an interesting question, but it is primarily a philosophical
       | question. It depends on having a clear definition of what "real"
       | reasoning is, exactly.
       | 
        | It's pretty easy: causal reasoning. Causal, not merely
        | statistical correlation as LLMs do, with or without "CoT".
        
         | naasking wrote:
         | Define causal reasoning?
        
         | glial wrote:
         | Correct me if I'm wrong, I'm not sure it's so simple. LLMs are
         | called causal models in the sense that earlier tokens "cause"
         | later tokens, that is, later tokens are causally dependent on
         | what the earlier tokens are.
         | 
         | If you mean deterministic rather than probabilistic, even
         | Pearl-style causal models are probabilistic.
         | 
         | I think the author is circling around the idea that their idea
         | of reasoning is to produce statements in a formal system: to
         | have a set of axioms, a set of production rules, and to
         | generate new strings/sentences/theorems using those rules. This
         | approach is how math is formalized. It allows us to extrapolate
         | - make new "theorems" or constructions that weren't in the
         | "training set".
        
           | jayd16 wrote:
           | By this definition a bag of answers is causal reasoning
           | because we previously filled the bag, which caused what we
           | pulled. State causing a result is not causal reasoning.
           | 
           | You need to actually have something that deduces a result
           | from a set of principles that form a logical conclusion or
           | the understanding that more data is needed to make a
            | conclusion. That is clearly different from finding a likely
            | next token on statistics alone, despite the fact that the
            | statistical answer can be correct.
        
           | apples_oranges wrote:
            | But let's say you change your mathematical expression by
            | reducing or expanding it somehow. Then, unless it's trivial,
            | there are infinitely many ways to do it, and the "cause"
            | here is the answer to the question "why did you do that and
            | not something else?" Brute force excluded, the cause is
            | probably some idea, some model of the problem, or a gut
            | feeling (or desperation...).
        
           | stonemetal12 wrote:
           | Smoking increases the risk of getting cancer significantly.
           | We say Smoking causes Cancer. Causal reasoning can be
           | probabilistic.
           | 
            | LLMs are not doing causal reasoning because there are no
            | facts, only tokens. For the most part you can't ask LLMs how
            | they came to an answer, because they don't know.
        
         | lordnacho wrote:
         | What's stopping us from building an LLM that can build causal
         | trees, rejecting some trees and accepting others based on
         | whatever evidence it is fed?
         | 
          | Or even a causal-reasoning tool for an LLM agent, working the
          | way it does when you ask it about math and it forwards the
          | request to Wolfram.
        
           | suddenlybananas wrote:
           | >What's stopping us from building an LLM that can build
           | causal trees, rejecting some trees and accepting others based
           | on whatever evidence it is fed?
           | 
           | Exponential time complexity.
        
         | mdp2021 wrote:
         | > _causal reasoning_
         | 
         | You have missed the foundation: before dynamics, being. Before
         | causal reasoning you have deep definition of concepts.
         | Causality is "below" that.
        
       | empath75 wrote:
       | One thing that LLMs have exposed is how much of a house of cards
       | all of our definitions of "human mind"-adjacent concepts are. We
       | have a single example in all of reality of a being that thinks
       | like we do, and so all of our definitions of thinking are
       | inextricably tied with "how humans think", and now we have an
       | entity that does things which seem to be very like how we think,
       | but not _exactly like it_, and a lot of our definitions don't
       | seem to work any more:
       | 
       | Reasoning, thinking, knowing, feeling, understanding, etc.
       | 
       | Or at the very least, our rubrics and heuristics for determining
       | if someone (thing) thinks, feels, knows, etc, no longer work. And
       | in particular, people create tests for those things thinking that
       | they understand what they are testing for, when _most human
       | beings_ would also fail those tests.
       | 
       | I think a _lot_ of really foundational work needs to be done on
       | clearly defining a lot of these terms and putting them on a
       | sounder basis before we can really move forward on saying whether
       | machines can do those things.
        
         | gdbsjjdn wrote:
         | Congratulations, you've invented philosophy.
        
           | empath75 wrote:
           | This is an obnoxious response. Of course I recognize that
           | philosophy is the solution to this. What I am pointing out is
           | that philosophy has not as of yet resolved these relatively
           | new problems. The idea that non-human intelligences might
           | exist is of course an old one, but that is different from
           | having an actual (potentially) existing one to reckon with.
        
             | adastra22 wrote:
             | These are not new problems though.
        
             | deadbabe wrote:
             | Non-human intelligences have always existed in the form of
             | animals.
             | 
             | Animals do not have spoken language the way humans do, so
             | their thoughts aren't really composed of sentences. Yet,
             | they have intelligence and can reason about their world.
             | 
             | How could we build an AGI that doesn't use language to
             | think at all? We have no fucking clue and won't for a while
             | because everyone is chasing the mirage created by LLMs. AI
             | winter will come and we'll sit around waiting for the next
             | big innovation. Probably some universal GOAP with deeply
             | recurrent neural nets.
        
             | gdbsjjdn wrote:
             | > Writings on metacognition date back at least as far as
             | two works by the Greek philosopher Aristotle (384-322 BC):
             | On the Soul and the Parva Naturalia
             | 
             | We built a box that spits out natural language and tricks
             | humans into believing it's conscious. The box itself
             | actually isn't that interesting, but the human side of the
             | equation is.
        
               | mdp2021 wrote:
               | > _the human side of the equation is_
               | 
               | You have only proven the urgency of Intelligence, the
               | need to produce it in inflationary amounts.
        
           | meindnoch wrote:
           | We need to reinvent philosophy. With JSON this time.
        
         | mdp2021 wrote:
         | > _which seem to be very like how we think_
         | 
          | I would like to reassure you that we - we here - see that LLMs
          | are very much unlike us.
        
           | empath75 wrote:
           | Yes I very much understand that most people do not think that
           | LLMs think or understand like we do, but it is _very
           | difficult_ to prove that that is the case, using any test
           | which does not also exclude a great deal of people. And that
           | is because "thinking like we do" is not at all a well-defined
           | concept.
        
             | mdp2021 wrote:
             | > _exclude a great deal of people_
             | 
              | And why should you not exclude them? Where does this idea
              | come from, taking random elements as models? Where do you
             | see pedestals of free access? Is the Nobel Prize a raffle
             | now?
        
         | gilbetron wrote:
         | I agree 100% with you. I'm most excited about LLMs because they
         | seem to capture at least some aspect of intelligence, and
          | that's amazing given how long it took to get here. It's
          | exciting that we just don't understand it.
         | 
         | I see people say, "LLMs aren't human intelligence", but
         | instead, I really feel that it shows that many people, and much
         | of what we do, probably is like an LLM. Most people just
         | hallucinate their way through a conversation, they certainly
         | don't reason. Reasoning is incredibly rare.
        
       | naasking wrote:
       | > Because reasoning tasks require choosing between several
       | different options. "A B C D [M1] -> B C D E" isn't reasoning,
       | it's computation, because it has no mechanism for thinking "oh, I
       | went down the wrong track, let me try something else". That's why
       | the most important token in AI reasoning models is "Wait". In
       | fact, you can control how long a reasoning model thinks by
       | arbitrarily appending "Wait" to the chain-of-thought. Actual
       | reasoning models change direction all the time, but this paper's
       | toy example is structurally incapable of it.
       | 
       | I think this is the most important critique that undercuts the
       | paper's claims. I'm less convinced by the other point. I think
       | backtracking and/or parallel search is something future papers
       | should definitely look at in smaller models.
       | 
       | The article is definitely also correct on the overreaching, broad
        | philosophical claims that seem common when discussing AI and
       | reasoning.
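        | 
        | For what it's worth, the "append Wait" trick from the quote is
        | easy to sketch with an ordinary HF generate loop (the model
        | name and the end-of-thought tag below are just placeholders):
        | 
        |   from transformers import AutoModelForCausalLM, AutoTokenizer
        | 
        |   MODEL = "some-reasoning-model"   # placeholder name
        |   STOP = "</think>"                # placeholder end-of-thought tag
        | 
        |   tok = AutoTokenizer.from_pretrained(MODEL)
        |   model = AutoModelForCausalLM.from_pretrained(MODEL)
        | 
        |   def think(prompt, min_rounds=2, budget=256):
        |       text = prompt + "<think>"
        |       for r in range(min_rounds + 1):
        |           ids = tok(text, return_tensors="pt").input_ids
        |           out = model.generate(ids, max_new_tokens=budget)
        |           text = tok.decode(out[0])
        |           if r < min_rounds and text.rstrip().endswith(STOP):
        |               # force more thinking: drop the stop tag, add "Wait"
        |               text = text.rstrip()[:-len(STOP)] + " Wait,"
        |       return text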
        
       | mucho_mojo wrote:
       | This paper I read from here has an interesting mathematical model
       | for reasoning based on cognitive science.
       | https://arxiv.org/abs/2506.21734 (there is also code here
       | https://github.com/sapientinc/HRM) I think we will see dramatic
       | performance increases on "reasoning" problems when this is worked
       | into existing AI architectures.
        
       | stonemetal12 wrote:
        | When using AI they say "Context is King". "Reasoning" models are
        | using the AI to generate context. They are not reasoning in the
        | sense of logic or philosophy. Mirage, or whatever you want to
        | call it, it is rather unlike what people mean when they use the
        | term reasoning. Calling it reasoning is up there with calling
        | output people don't like "hallucinations".
        
         | adastra22 wrote:
         | You are making the same mistake OP is calling out. As far as I
         | can tell "generating context" is exactly what human reasoning
         | is too. Consider the phrase "let's reason this out" where you
         | then explore all options in detail, before pronouncing your
         | judgement. Feels exactly like what the AI reasoner is doing.
        
           | stonemetal12 wrote:
           | "let's reason this out" is about gathering all the facts you
           | need, not just noting down random words that are related. The
           | map is not the terrain, words are not facts.
        
             | energy123 wrote:
             | Performance is proportional to the number of reasoning
             | tokens. How to reconcile that with your opinion that they
             | are "random words"?
        
               | kelipso wrote:
                | Technically, "random" can have probabilities
                | associated with it. In casual speech, random means
                | equal probabilities, or that we don't know the
                | probabilities. But for LLM token output, the model
                | does estimate the probabilities.
        
               | blargey wrote:
               | s/random/statistically-likely/g
               | 
               | Reducing the distance of each statistical leap improves
               | "performance" since you would avoid failure modes that
               | are specific to the largest statistical leaps, but it
               | doesn't change the underlying mechanism. Reasoning models
               | still "hallucinate" spectacularly even with "shorter"
               | gaps.
        
               | ikari_pl wrote:
               | What's wrong with statistically likely?
               | 
               | If I ask you what's 2+2, there's a single answer I
               | consider much more likely than others.
               | 
               | Sometimes, words are likely because they are grounded in
               | ideas and facts they represent.
        
               | blargey wrote:
               | > Sometimes, words are likely because they are grounded
               | in ideas and facts they represent.
               | 
               | Yes, and other times they are not. I think the failure
               | modes of a statistical model of a communicative model of
               | thought are unintuitive enough without any added layers
               | of anthropomorphization, so there remains some value in
               | pointing it out.
        
             | CooCooCaCha wrote:
             | Reasoning is also about _processing_ facts.
        
             | ThrowawayTestr wrote:
             | Have you read the chain of thought output from reasoning
             | models? That's not what it does.
        
           | mdp2021 wrote:
           | But a big point here becomes whether the generated "context"
           | then receives proper processing.
        
           | slashdave wrote:
           | Perhaps we can find some objective means to decide, rather
            | than go with what "feels" correct.
        
           | phailhaus wrote:
           | Feels like, but isn't. When you are reasoning things out,
           | there is a brain with state that is actively modeling the
           | problem. AI does no such thing, it produces text and then
           | uses that text to condition the next text. If it isn't
           | written, it does not exist.
           | 
           | Put another way, LLMs are good at talking like they are
           | thinking. That can get you pretty far, but it is not
           | reasoning.
        
             | double0jimb0 wrote:
             | So exactly what language/paradigm is this brain modeling
             | the problem within?
        
               | phailhaus wrote:
               | We literally don't know. We don't understand how the
               | brain stores concepts. It's not necessarily language:
               | there are people that do not have an internal monologue,
               | and yet they are still capable of higher level thinking.
        
               | chrisweekly wrote:
               | Rilke: "There is a depth of thought untouched by words,
               | and deeper still a depth of formless feeling untouched by
               | thought."
        
             | Enginerrrd wrote:
             | The transformer architecture absolutely keeps state
             | information "in its head" so to speak as it produces the
             | next word prediction, and uses that information in its
             | compute.
             | 
             | It's true that if it's not producing text, there is no
             | thinking involved, but it is absolutely NOT clear that the
             | attention block isn't holding state and modeling something
             | as it works to produce text predictions. In fact, I can't
             | think of a way to define it that would make that untrue...
             | unless you mean that there isn't a system wherein something
             | like attention is updating/computing and the model itself
             | _chooses_ when to make text predictions. That 's by design,
             | but what you're arguing doesn't really follow.
             | 
             | Now, whether what the model is thinking about inside that
             | attention block matches up exactly or completely with the
             | text it's producing as generated context is probably at
              | least a little dubious, and it's unlikely to be a complete
             | representation regardless.
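              | 
              | To make the "state in its head" concrete, here's a toy
              | single-head attention step with a KV cache (just the
              | mechanism, not any particular model):
              | 
              |   import torch
              |   import torch.nn.functional as F
              | 
              |   d = 16
              |   Wq, Wk, Wv = (torch.randn(d, d) for _ in range(3))
              |   k_cache, v_cache = [], []   # carried across steps
              | 
              |   def step(x_t):
              |       # x_t: embedding of the newest token, shape (d,)
              |       q = Wq @ x_t
              |       k_cache.append(Wk @ x_t)
              |       v_cache.append(Wv @ x_t)
              |       K = torch.stack(k_cache)
              |       V = torch.stack(v_cache)
              |       # attend over every token seen so far
              |       att = F.softmax(K @ q / d ** 0.5, dim=0)
              |       return att @ V
              | 
              |   for t in range(5):
              |       out = step(torch.randn(d))
              | 
              |   # the cache has grown with each prediction step
              |   print(len(k_cache), out.shape)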
        
               | dmacfour wrote:
               | > The transformer architecture absolutely keeps state
               | information "in its head" so to speak as it produces the
               | next word prediction, and uses that information in its
               | compute.
               | 
               | How so? Transformers are state space models.
        
           | kelipso wrote:
           | No, people make logical connections, make inferences, make
           | sure all of it fits together without logical errors, etc.
        
             | pixl97 wrote:
             | These people you're talking about must be rare online, as
             | human communication is pretty rife with logical errors.
        
               | mdp2021 wrote:
               | Since that November in which this technology boomed we
               | have been much too often reading "people also drink from
               | puddles", as if it were standard practice.
               | 
               | That we implement skills, not deficiencies, is a basic
               | concept that is getting to such a level of needed
               | visibility it should probably be inserted in the
               | guidelines.
               | 
               |  _We implement skills, not deficiencies._
        
               | kelipso wrote:
               | You shouldn't be basing your entire worldview around the
               | lowest common denominator. All kinds of writers like blog
               | writers, novelists, scriptwriters, technical writers,
               | academics, poets, lawyers, philosophers, mathematicians,
               | and even teenage fan fiction writers do what I said above
               | routinely.
        
           | viccis wrote:
           | >As far as I can tell "generating context" is exactly what
           | human reasoning is too.
           | 
           | This was the view of Hume (humans as bundles of experience
           | who just collect information and make educated guesses for
           | everything). Unfortunately, it leads to philosophical
           | skepticism, in which you can't ground any knowledge
           | absolutely, as it's all just justified by some knowledge you
           | got from someone else, which also came from someone else,
           | etc., and eventually you can't actually justify any knowledge
           | that isn't directly a result of experience (the concept of
           | "every effect has a cause" is a classic example).
           | 
           | There have been plenty of epistemological responses to this
           | viewpoint, with Kant's view, of humans doing a mix of
           | "gathering context" (using our senses) but also applying
           | universal categorical reasoning to schematize and understand
           | / reason from the objects we sense, being the most well
           | known.
           | 
           | I feel like anyone talking about the epistemology of AI
           | should spend some time reading the basics of all of the
           | thought from the greatest thinkers on the subject in
           | history...
        
             | js8 wrote:
             | > I feel like anyone talking about the epistemology of AI
             | should spend some time reading the basics
             | 
             | I agree, I think the problem with AI is we don't know or
              | haven't formalized enough what epistemology AGI systems
              | should have. Instead, people are looking for shortcuts,
             | feeding huge amount of data into the models, hoping it will
             | self-organize into something that humans actually want.
        
         | bongodongobob wrote:
         | And yet it improves their problem solving ability.
        
         | ofjcihen wrote:
         | It's incredible to me that so many seem to have fallen for
         | "humans are just LLMs bruh" argument but I think I'm beginning
         | to understand the root of the issue.
         | 
         | People who only "deeply" study technology only have that frame
         | of reference to view the world so they make the mistake of
         | assuming everything must work that way, including humans.
         | 
         | If they had a wider frame of reference that included, for
         | example, Early Childhood Development, they might have enough
         | knowledge to think outside of this box and know just how
         | ridiculous that argument is.
        
           | gond wrote:
           | That is an issue prevalent in the western world for the last
           | 200 years, beginning possibly with the Industrial Revolution,
           | probably earlier. That problem is reductionism, consequently
           | applied down to the last level: discover the smallest element
           | of every field of science, develop an understanding of all
           | the parts from the smallest part upwards and develop, from
           | the understanding of the parts, an understanding of the
           | whole.
           | 
           | Unfortunately, this approach does not yield understanding, it
           | yields know-how.
        
             | Kim_Bruning wrote:
             | Taking things apart to see how they tick is called
             | reduction, but (re)assembling the parts is emergence.
             | 
             | When you reduce something to its components, you lose
             | information on how the components work together. Emergence
             | 'finds' that information back.
             | 
             | Compare differentiation and integration, which lose and
             | gain terms respectively.
             | 
             | In some cases, I can imagine differentiating and
             | integrating certain functions actually would even be a
             | direct demonstration of reduction and emergence.
        
           | dmacfour wrote:
           | I have a background in ML and work in software development,
           | but studied experimental psych in a past life. It's actually
            | kind of painful watching people slap phrases related to
            | cognition onto things that aren't even functionally
            | equivalent to their namesakes, then parade them around like
            | some kind of revelation. It's also a little surprising that
            | there's no interest (at least publicly) in using cognitive
           | architectures in the development of AI systems.
        
         | cyanydeez wrote:
         | They should call them Fuzzing models. They're just running
          | through various iterations of the context until they hit a
         | token that trips them out.
        
         | benreesman wrote:
         | People will go to extremely great lengths to debate the
         | appropriate analogy for how these things work, which is fun I
         | guess but in a "get high with a buddy" sense at least to my
         | taste.
         | 
         | Some of how they work is well understood (a lot now, actually),
         | some of the outcomes are still surprising.
         | 
          | But we debate both the well understood parts and the
          | surprising parts with the wrong terminology, borrowed from
          | pretty dubious corners of pop cognitive science, and not with
          | terminology appropriate to the new and different thing! It's
         | nothing like a brain, it's a new different thing. Does it think
         | or reason? Who knows pass the blunt.
         | 
         | They do X performance on Y task according to Z eval, that's how
          | you discuss ML model capability if you're pursuing
         | understanding rather than fundraising or clicks.
        
           | Vegenoid wrote:
           | While I largely agree with you, more abstract judgements must
           | be made as the capabilities (and therefore tasks being
           | completed) become increasingly general. Attempts to boil
           | human intellectual capability down to "X performance on Y
           | task according to Z eval" can be useful, but are famously
           | incomplete and insufficient on their own for making good
           | decisions about which humans (a.k.a. which general
           | intelligences) are useful and how to utilize and improve
           | them. Boiling down highly complex behavior into a small
           | number of metrics loses a lot of detail.
           | 
           | There is also the desire to discover _why_ a model that
           | outperforms others does so, so that the successful technique
           | can be refined and applied elsewhere. This too usually
           | requires more approaches than metric comparison.
        
       | moc_was_wronged wrote:
        | Mostly. It gives language models a way to dynamically allocate
       | computation time, but the models are still fundamentally
       | imitative.
        
       | modeless wrote:
       | "The question [whether computers can think] is just as relevant
       | and just as meaningful as the question whether submarines can
       | swim." -- Edsger W. Dijkstra, 24 November 1983
        
         | mdp2021 wrote:
         | But the topic here is whether some techniques are progressive
         | or not
         | 
         | (with a curious parallel about whether some paths in thought
         | are dead-ends - the unproductive focus mentioned in the
         | article).
        
         | griffzhowl wrote:
         | I don't agree with the parallel. Submarines can move through
         | water - whether you call that swimming or not isn't an
         | interesting question, and doesn't illuminate the function of a
         | submarine.
         | 
         | With thinking or reasoning, there's not really a precise
         | definition of what it is, but we nevertheless know that
         | currently LLMs and machines more generally can't reproduce many
         | of the human behaviours that we refer to as thinking.
         | 
         | The question of what tasks machines can currently accomplish is
         | certainly meaningful, if not urgent, and the reason LLMs are
         | getting so much attention now is that they're accomplishing
         | tasks that machines previously couldn't do.
         | 
         | To some extent there might always remain a question about
         | whether we call what the machine is doing "thinking" - but
         | that's the uninteresting verbal question. To get at the
         | meaningful questions we might need a more precise or higher
         | resolution map of what we mean by thinking, but the crucial
         | element is what functions a machine can perform, what tasks it
         | can accomplish, and whether we call that "thinking" or not
         | doesn't seem important.
         | 
         | Maybe that was even Dijkstra's point, but it's hard to tell
         | without context...
        
           | wizzwizz4 wrote:
            | https://www.cs.utexas.edu/~EWD/transcriptions/EWD08xx/EWD898...
            | provides the context. I haven't re-read it in the last
           | month, but I'm pretty sure you've correctly identified
           | Dijkstra's point.
        
           | modeless wrote:
           | It is strange that you started your comment with "I don't
           | agree". The rest of the comment demonstrates that you do
           | agree.
        
             | griffzhowl wrote:
             | To be more clear about why I disagree the cases are
             | parallel:
             | 
             | We know how a submarine moves through water, whether it's
             | "swimming" isn't an interesting question.
             | 
             | We don't know to what extent a machine can reproduce the
             | cognitive functions of a human. There are substantive and
             | significant questions about whether or to what extent a
             | particular machine or program can reproduce human cognitive
             | functions.
             | 
             | So I might have phrased my original comment badly. It
             | doesn't matter if we use the word "thinking" or not, but it
             | does matter if a machine can reproduce the human cognitive
             | functions, and if that's what we mean by the question
             | whether a machine can think, then it does matter.
        
               | modeless wrote:
               | "We know how it moves" is not the reason the question of
               | whether a submarine swims is not interesting. It's
               | because the question is mainly about the definition of
               | the word "swim" rather than about capabilities.
               | 
               | > if that's what we mean by the question whether a
               | machine can think
               | 
               | That's the issue. The question of whether a machine can
               | think (or reason) is a question of word definitions, not
               | capabilities. The capabilities questions are the ones
               | that matter.
        
               | griffzhowl wrote:
               | > The capabilities questions are the ones that matter.
               | 
               | Yes, that's what I'm saying. I also think there's a clear
               | sense in which asking whether machines can think is a
               | question about capabilities, even though we would need a
               | more precise definition of "thinking" to be able to
               | answer it.
               | 
               | So that's how I'd sum it up: we know the capabilities of
               | submarines, and whether we say they're swimming or not
               | doesn't answer any further question about those
               | capabilities. We don't know the capabilities of machines;
               | the interesting questions are about what they can do, and
               | one (imprecise) way of asking that question is whether
                | they can think.
        
       | skybrian wrote:
       | Mathematical reasoning does sometimes require correct
       | calculations, and if you get them wrong your answers will be
       | wrong. I wouldn't want someone doing my taxes to be bad at
       | calculation or bad at finding mistakes in calculation.
       | 
       | It would be interesting to see if this study's results can be
       | reproduced in a more realistic setting.
        
       | slashdave wrote:
       | > reasoning probably requires language use
       | 
       | The author has a curious idea of what "reasoning" entails.
        
       | sixdimensional wrote:
       | I feel like the fundamental concept of symbolic logic[1] as a
       | means of reasoning fits within the capabilities of LLMs.
       | 
       | Whether it's a mirage or not, the ability to produce a
       | symbolically logical result that has valuable meaning seems real
       | enough to me.
       | 
       | Especially since most meaning is assigned by humans onto the
       | world... so too can we choose to assign meaning (or not) to the
       | output of a chain of symbolic logic processing?
       | 
       | Edit: maybe it is not so much that an LLM calculates/evaluates
       | the result of symbolic logic as it is that it "follows" the
       | pattern of logic encoded into the model.
       | 
       | [1] https://en.wikipedia.org/wiki/Logic
        
       | lawrence1 wrote:
       | we should be asking if reasoning while speaking is even possible
       | for humans. this is why we have the scientific method and that's
       | why LLMs write and run unit tests on their reasoning. But yeah
       | intelligence is probably in the ear of the believer.
        
       | hungmung wrote:
       | Chain of thought is just a way of trying to squeeze more juice
        | out of the lemon of LLMs; I suspect we're at the stage of
       | running up against diminishing returns and we'll have to move to
       | different foundational models to see any serious improvement.
        
       | brunokim wrote:
        | I'm unconvinced by the article's criticisms, given they also
        | employ their feels and few citations.
       | 
       | > I appreciate that research has to be done on small models, but
       | we know that reasoning is an emergent capability! (...) Even if
       | you grant that what they're measuring is reasoning, I am
       | profoundly unconvinced that their results will generalize to a
       | 1B, 10B or 100B model.
       | 
       | A fundamental part of applied research is simplifying a real-
       | world phenomenon to better understand it. Dismissing that for
       | this many parameters, for such a simple problem, the LLM can't
       | perform out of distribution just because it's not big enough
       | undermines the very value of independent research. Tomorrow
       | another model with double the parameters may or may not show the
       | same behavior, but that finding will be built on top of this one.
       | 
       | Also, how do _you_ know that reasoning is emergent, and not
       | rationalising on top of a compressed version of the web stored in
       | 100B parameters?
        
         | ActionHank wrote:
         | I think that when you are arguing logic and reason with a group
         | who became really attached to the term vibe-coding you've
         | likely already lost.
        
       | LudwigNagasena wrote:
       | > The first is that reasoning probably requires language use.
       | Even if you don't think AI models can "really" reason - more on
       | that later - even simulated reasoning has to be reasoning in
       | human language.
       | 
        | That is an unreasonable assumption. In the case of LLMs it seems
       | wasteful to transform a point from latent space into a random
       | token and lose information. In fact, I think in near future it
       | will be the norm for MLLMs to "think" and "reason" without
       | outputting a single "word".
       | 
       | > Whether AI reasoning is "real" reasoning or just a mirage can
       | be an interesting question, but it is primarily a philosophical
       | question. It depends on having a clear definition of what "real"
       | reasoning is, exactly.
       | 
       | It is not a "philosophical" (by which the author probably meant
       | "practically inconsequential") question. If the whole reasoning
       | business is just rationalization of pre-computed answers or
       | simply a means to do some computations because every token
       | provides only a fixed amount of computation to update the model's
       | state, then it doesn't make much sense to focus on improving the
        | quality of chain-of-thought output from a human POV.
        
         | kazinator wrote:
         | Not all reasoning requires language. Symbolic reasoning uses
         | language.
         | 
         | Real-time spatial reasoning like driving a car and not hitting
         | things does not seem linguistic.
         | 
         | Figuring out how to rotate a cabinet so that it will clear
         | through a stairwell also doesn't seem like it requires
         | language, only to communicate the solution to someone else
         | (where language can turn into a hindrance, compared to a
         | diagram or model).
        
           | llllm wrote:
           | Pivot!
        
             | kazinator wrote:
             | Can we be Friends?
        
         | vmg12 wrote:
         | Solutions to some of the hardest problems I've had have only
         | come after a night of sleep or when I'm out on a walk and I'm
         | not even thinking about the problem. Maybe what my brain was
         | doing was something different from reasoning?
        
           | andybak wrote:
           | This is a very important point and mostly absent from the
           | conversation.
           | 
           | We have many words that almost mean the same thing or can
            | mean many different things - and conversations about
           | intelligence and consciousness are riddled with them.
        
             | tempodox wrote:
             | > This is a very important point and mostly absent from the
             | conversation.
             | 
             | That's because when humans are mentioned at all in the
             | context of coding with "AI", it's mostly as bad and buggy
             | simulations of those perfect machines.
        
         | safety1st wrote:
         | I'm pretty much a layperson in this field, but I don't
         | understand why we're trying to teach a stochastic text
         | transformer to reason. Why would anyone expect that approach to
         | work?
         | 
         | I would have thought the more obvious approach would be to
         | couple it to some kind of symbolic logic engine. It might
         | transform plain language statements into fragments conforming
         | to a syntax which that engine could then parse
         | deterministically. This is the Platonic ideal of reasoning that
         | the author of the post pooh-poohs, I guess, but it seems to me
         | to be the whole point of reasoning; reasoning is the
         | application of logic in evaluating a proposition. The LLM might
         | be trained to generate elements of the proposition, but it's
         | too random to apply logic.
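          | 
          | Something like this toy pipeline, where the LLM would only
          | be responsible for emitting the formal lines and a
          | deterministic engine does the inference (hand-rolled syntax,
          | purely illustrative; the "llm_output" below is hand-written,
          | standing in for model output):
          | 
          |   llm_output = """
          |   fact: man(socrates)
          |   rule: man(X) -> mortal(X)
          |   """
          | 
          |   def parse(text):
          |       facts, rules = set(), []
          |       for line in text.strip().splitlines():
          |           kind, body = line.split(":", 1)
          |           body = body.strip()
          |           if kind.strip() == "fact":
          |               facts.add(body)
          |           else:
          |               lhs, rhs = [s.strip() for s in body.split("->")]
          |               rules.append((lhs, rhs))
          |       return facts, rules
          | 
          |   def forward_chain(facts, rules):
          |       changed = True
          |       while changed:
          |           changed = False
          |           for lhs, rhs in rules:
          |               pred = lhs.split("(")[0]
          |               var = lhs[lhs.index("(") + 1:-1]
          |               for f in list(facts):
          |                   if not f.startswith(pred + "("):
          |                       continue
          |                   const = f[f.index("(") + 1:-1]
          |                   new = rhs.replace(var, const)
          |                   if new not in facts:
          |                       facts.add(new)
          |                       changed = True
          |       return facts
          | 
          |   print(forward_chain(*parse(llm_output)))
          |   # {'man(socrates)', 'mortal(socrates)'}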
        
           | shannifin wrote:
           | Problem is, even with symbolic logic, reasoning is not
           | completely deterministic. Whether one can get to a set of
           | given axioms from a given proposition is sometimes
           | undecidable.
        
         | limaoscarjuliet wrote:
         | > In fact, I think in near future it will be the norm for MLLMs
         | to "think" and "reason" without outputting a single "word".
         | 
         | It will be outputting something, as this is the only way it can
         | get more compute - output a token, then all context + the next
         | token is fed through the LLM again. It might not be presented
         | to the user, but that's a different story.
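          | 
          | In miniature (with a stub standing in for the model):
          | 
          |   # every extra "thinking" token buys exactly one more
          |   # forward pass; `next_token` stands in for a real
          |   # forward pass plus sampling step.
          |   def next_token(context):
          |       return "wait" if len(context) < 10 else "<eos>"
          | 
          |   context = ["What", "is", "2", "+", "2", "?"]
          |   while True:
          |       tok = next_token(context)   # full pass over context
          |       if tok == "<eos>":
          |           break
          |       context.append(tok)   # hidden or shown, it's emitted
          |   print(context)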
        
         | potsandpans wrote:
         | > It is not a "philosophical" (by which the author probably
         | meant "practically inconsequential") question.
         | 
         | I didn't take it that way. I suppose it depends on whether or
         | not you believe philosophy is legitimate
        
         | pornel wrote:
         | You're looking at this from the perspective of what would make
         | sense for the model to produce. Unfortunately, what really
         | dictates the design of the models is what we can train the
         | models with (efficiently, at scale). The output is then roughly
         | just the reverse of the training. We don't even want AI to be
         | an "autocomplete", but we've got tons of text, and a relatively
         | efficient method of training on all prefixes of a sentence at
         | the same time.
         | 
         | There have been experiments with preserving embedding vectors
         | of the tokens exactly without loss caused by round-tripping
         | through text, but the results were "meh", presumably because it
         | wasn't the _input_ format the model was trained on.
         | 
         | It's conceivable that models trained on some vector "neuralese"
         | that is completely separate from text would work better, but
          | it's a catch-22 for training: the internal representations
         | don't exist in a useful sense until the model is trained, so we
         | don't have anything to feed into the models to make them use
         | them. The internal representations also don't stay stable when
         | the model is trained further.
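          | 
          | Roughly the training setup being described - a toy causal-LM
          | step, nothing model-specific:
          | 
          |   import torch
          |   import torch.nn as nn
          |   import torch.nn.functional as F
          | 
          |   vocab, d = 100, 32
          |   emb = nn.Embedding(vocab, d)
          |   layer = nn.TransformerEncoderLayer(
          |       d, nhead=4, batch_first=True)
          |   head = nn.Linear(d, vocab)
          | 
          |   tokens = torch.randint(0, vocab, (1, 10))  # one sentence
          |   T = tokens.size(1) - 1
          |   mask = torch.triu(
          |       torch.full((T, T), float("-inf")), diagonal=1)
          | 
          |   h = layer(emb(tokens[:, :-1]), src_mask=mask)  # one pass
          |   logits = head(h)                               # (1, T, vocab)
          | 
          |   # every position t is a training example at once: predict
          |   # token t+1 from the prefix tokens[: t+1]
          |   loss = F.cross_entropy(
          |       logits.reshape(-1, vocab), tokens[:, 1:].reshape(-1))
          |   print(loss.item())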
        
       | skywhopper wrote:
       | I mostly agree with the point the author makes that "it doesn't
       | matter". But then again, it does matter, because LLM-based
       | products are marketed based on "IT CAN REASON!" And so, while it
       | may not matter, per se, how an LLM comes up with its results, to
       | the extent that people choose to rely on LLMs because of
       | marketing pitches, it's worth pushing back on those claims if
       | they are overblown, using the same frame that the marketers use.
       | 
       | That said, this author says this question of whether models "can
       | reason" is the least interesting thing to ask. But I think the
       | least interesting thing you can do is to go around taking every
       | complaint about LLM performance and saying "but humans do the
        | exact same thing!" Which is often not true, but again, _doesn't
       | matter_.
        
       | cess11 wrote:
       | Yes, it's a mirage, since this type of software is an opaque
        | simulation, perhaps even a simulacrum. It's reasoning in the same
       | sense as there are terrorists in a game of Counter-Strike.
        
       | jrm4 wrote:
       | Current thought, for me there's a lot of hand-wringing about what
       | is "reasoning" and what isn't. But right now perhaps the question
       | might be boiled down to -- "is the bottleneck merely hard drive
       | space/memory/computing speed?"
       | 
       | I kind of feel like we won't be able to even begin to test this
       | until a few more "Moore's law" cycles.
        
       | j45 wrote:
       | Currently it feels like it's more simulated chain-of-thought /
       | reasoning, sometimes very consistent, but simulated, partially
        | because it's statistically generated and non-deterministic (not
        | the exact same path to a similar or the same response each run).
        
       | js8 wrote:
        | I think an LLM's chain of thought is reasoning. When trained,
        | the LLM sees lots of examples like "All men are mortal.
        | Socrates is a man." followed by "Therefore, Socrates is
        | mortal." This causes the transformer to learn the rule that
        | "All A are B. C is A." is often followed by "Therefore, C is
        | B." And so it can apply this logical
       | rule, predictively. (I have converted the example from latent
       | space to human language for clarity.)
       | 
        | Unfortunately, sometimes the LLM also learns that "All A are
        | C. All B are C." is followed by "Therefore, A is B.", due to
        | bad examples in
       | the training data. (More insidiously, it might learn this rule
       | only in a special case.)
       | 
       | So it learns some logic rules but not consistently. This lack of
       | consistency will cause it to fail on larger problems.
       | 
       | I think NNs (transformers) could be great in heuristic suggesting
       | which valid logical rules (could be even modal or fuzzy logic) to
       | apply in order to solve a certain formalized problem, but not so
       | great at coming up with the logic rules themselves. They could
       | also be great at transforming the original problem/question from
       | human language into some formal logic, that would then be
       | resolved using heuristic search.
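        | 
        | To make the two rules concrete (toy sets, nothing to do with
        | how the transformer actually represents them):
        | 
        |   all_are = {("men", "mortal"), ("cats", "mortal")}
        |   is_a = {("socrates", "men")}
        | 
        |   def valid_rule(c, a):
        |       # All A are B, C is A  =>  C is B  (sound)
        |       return {b for (a2, b) in all_are
        |               if a2 == a and (c, a) in is_a}
        | 
        |   def bogus_rule(a, b):
        |       # All A are C, All B are C  =>  A is B  (not sound!)
        |       return any((a, c) in all_are and (b, c) in all_are
        |                  for (_, c) in all_are)
        | 
        |   print(valid_rule("socrates", "men"))  # {'mortal'}
        |   print(bogus_rule("men", "cats"))      # True, yet untrue claim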
        
       | gshulegaard wrote:
       | > but we know that reasoning is an emergent capability!
       | 
       | Do we though? There is widespread discussion and growing momentum
       | of belief in this, but I have yet to see conclusive evidence of
       | this. That is, in part, why the subject paper exists...it seeks
       | to explore this question.
       | 
       | I think the author's bias is bleeding fairly heavily into his
       | analysis and conclusions:
       | 
       | > Whether AI reasoning is "real" reasoning or just a mirage can
       | be an interesting question, but it is primarily a philosophical
       | question. It depends on having a clear definition of what "real"
       | reasoning is, exactly.
       | 
       | I think it's pretty obvious that the researchers are exploring
       | whether or not LLMs exhibit evidence of _Deductive_ Reasoning
       | [1]. The entire experiment design reflects this. Claiming that
       | they haven't defined reasoning and therefore cannot conclude or
       | hope to construct a viable experiment is...confusing.
       | 
       | The question of whether or not an LLM can take a set of base
       | facts and compose them to solve a novel/previously unseen problem
       | is interesting and what most people discussing emergent reasoning
       | capabilities of "AI" are tacitly referring to (IMO). Much like
       | you can be taught algebraic principles and use them to solve for
       | "x" in equations you have never seen before, can an LLM do the
       | same?
       | 
       | To which I find this experiment interesting enough. It presents a
       | series of facts and then presents the LLM with tasks to see if it
       | can use those facts in novel ways not included in the training
       | data (something a human might reasonably deduce). To which their
       | results and summary conclusions are relevant, interesting, and
       | logically sound:
       | 
       | > CoT is not a mechanism for genuine logical inference but rather
       | a sophisticated form of structured pattern matching,
       | fundamentally bounded by the data distribution seen during
       | training. When pushed even slightly beyond this distribution, its
       | performance degrades significantly, exposing the superficial
       | nature of the "reasoning" it produces.
       | 
       | > The ability of LLMs to produce "fluent nonsense"--plausible but
       | logically flawed reasoning chains--can be more deceptive and
       | damaging than an outright incorrect answer, as it projects a
       | false aura of dependability.
       | 
        | That isn't to say LLMs aren't useful; this is just exploring
        | their boundaries. To use legal services as an example, using an
        | LLM to summarize or search for relevant laws, cases, or legal
        | precedent is something it would excel at. But don't ask an LLM
        | to formulate a logical rebuttal to an opposing counsel's
        | argument using those
       | references.
       | 
       | Larger models and larger training corpuses will expand that
       | domain and make it more difficult for individuals to discern this
       | limit; but just because you can no longer see a limit doesn't
       | mean there is none.
       | 
       | And to be clear, this doesn't diminish the value of LLMs. Even
       | without true logical reasoning LLMs are quite powerful and useful
       | tools.
       | 
       | [1] https://en.wikipedia.org/wiki/Logical_reasoning
        
       | dawnofdusk wrote:
       | >but we know that reasoning is an emergent capability!
       | 
       | This is like saying in the 70s that we know only the US is
        | capable of sending a man to the moon. That reasoning developed
        | in a particular context says very little about what the bare
        | minimum requirements for that reasoning are.
       | 
       | Overall I am not a fan of this blogpost. It's telling how long
       | the author gets hung up on a paper making "broad philosophical
       | claims about reasoning", based on what reads to me as fairly
       | typical scientific writing style. It's also telling how highly
       | cherry-picked the quotes they criticize from the paper are. Here
       | is some fuller context:
       | 
       | >An expanding body of analyses reveals that LLMs tend to rely on
        | surface-level semantics and clues rather than logical procedures
       | (Chen et al., 2025b; Kambhampati, 2024; Lanham et al., 2023;
       | Stechly et al., 2024). LLMs construct superficial chains of logic
       | based on learned token associations, often failing on tasks that
       | deviate from commonsense heuristics or familiar templates (Tang
       | et al., 2023). In the reasoning process, performance degrades
       | sharply when irrelevant clauses are introduced, which indicates
       | that models cannot grasp the underlying logic (Mirzadeh et al.,
       | 2024)
       | 
       | >Minor and semantically irrelevant perturbations such as
       | distractor phrases or altered symbolic forms can cause
       | significant performance drops in state-of-the-art models
       | (Mirzadeh et al., 2024; Tang et al., 2023). Models often
       | incorporate such irrelevant details into their reasoning,
       | revealing a lack of sensitivity to salient information. Other
       | studies show that models prioritize the surface form of reasoning
       | over logical soundness; in some cases, longer but flawed
       | reasoning paths yield better final answers than shorter, correct
       | ones (Bentham et al., 2024). Similarly, performance does not
       | scale with problem complexity as expected--models may overthink
       | easy problems and give up on harder ones (Shojaee et al., 2025).
       | Another critical concern is the faithfulness of the reasoning
       | process. Intervention-based studies reveal that final answers
       | often remain unchanged even when intermediate steps are falsified
       | or omitted (Lanham et al., 2023), a phenomenon dubbed the
       | illusion of transparency (Bentham et al., 2024; Chen et al.,
       | 2025b).
       | 
       | You don't need to be a philosopher to realize that these problems
       | seem quite distinct from the problems with human reasoning. For
       | example, "final answers remain unchanged even when intermediate
       | steps are falsified or omitted"... can humans do this?
        
       ___________________________________________________________________
       (page generated 2025-08-14 23:02 UTC)