[HN Gopher] Tracing the thoughts of a large language model
       ___________________________________________________________________
        
       Tracing the thoughts of a large language model
        
       Author : Philpax
       Score  : 473 points
       Date   : 2025-03-27 17:05 UTC (5 hours ago)
        
 (HTM) web link (www.anthropic.com)
 (TXT) w3m dump (www.anthropic.com)
        
       | JPLeRouzic wrote:
       | This is extremely interesting: The authors look at features (like
       | making poetry, or calculating) of LLM production, make hypotheses
       | about internal strategies to achieve the result, and experiment
       | with these hypotheses.
       | 
       | I wonder if there is somewhere an explanation linking the logical
       | operations made on a on dataset, are resulting in those
       | behaviors?
        
         | JKCalhoun wrote:
         | And they show the differences when the language models are made
         | larger.
        
       | EncomLab wrote:
       | This is very interesting - but like all of these discussions it
       | sidesteps the issues of abstractions, compilation, and execution.
       | It's fine to say things like "aren't programmed directly by
       | humans", but the abstracted code is not the program that is
       | running - the compiled code is - and that is code is executing
       | within the tightly bounded constraints of the ISA it is being
       | executed in.
       | 
       | Really this is all so much slight of hand - as an esolang fanatic
       | this all feels very familiar. Most people can't look a program
       | written in Whitespace and figure it out either, but once compiled
       | it is just like every other program as far as the processor is
       | concerned. LLM's are no different.
        
         | ctoth wrote:
         | And DNA? You are running on an instruction set of four symbols
         | at the end of the day but that's the wrong level of abstraction
         | to talk about your humanity, isn't it?
        
       | fpgaminer wrote:
       | > This is powerful evidence that even though models are trained
       | to output one word at a time
       | 
       | I find this oversimplification of LLMs to be frequently poisonous
       | to discussions surrounding them. No user facing LLM today is
       | trained on next token prediction.
        
         | JKCalhoun wrote:
         | As a layman though, I often see this description for how it is
         | LLMs work.
        
           | fpgaminer wrote:
           | Right, but it leads to too many false conclusions by lay
           | people. User facing LLMs are only trained on next token
           | prediction during initial stages of their training. They have
           | to go through Reinforcement Learning before they become
           | useful to users, and RL training occurs on complete
           | responses, not just token-by-token.
           | 
           | That leads to conclusions elucidated by the very article,
           | that LLMs couldn't possibly plan ahead because they are only
           | trained to predict next tokens. When the opposite conclusion
           | would be more common if it was better understood that they go
           | through RL.
        
             | mentalgear wrote:
             | What? The "article" is from anthropic, so I think they
             | would know what they write about.
             | 
             | Also, RL is an additional training process that does not
             | negate that GPT / transformers are left-right autoencoders
             | that are effectively next token predictors.
             | 
             | [Why Can't AI Make Its Own Discoveries? -- With Yann LeCun]
             | (https://www.youtube.com/watch?v=qvNCVYkHKfg)
        
             | TeMPOraL wrote:
             | You don't need RL for the conclusion "trained to predict
             | next token => only things one token ahead" to be wrong.
             | After all, the LLM is predicting that next token from
             | _something_ - a context, that 's many tokens long. Human
             | text isn't arbitrary and random, there are statistical
             | patterns in our speech, writing, thinking, that span words,
             | sentences, paragraphs - and even for next token prediction,
             | predicting correctly means learning those same patterns.
             | It's not hard to imagine the model generating token N is
             | already thinking about tokens N+1 thru N+100, by virtue of
             | statistical patterns of _preceding_ hundred tokens changing
             | with each subsequent token choice.
        
               | fpgaminer wrote:
               | True. See one of Anthropic's researcher's comment for a
               | great example of that. It's likely that "planning"
               | inherently exists in the raw LLM and RL is just bringing
               | it to the forefront.
               | 
               | I just think it's helpful to understand that all of these
               | models people are interacting with were trained with the
               | _explicit_ goal of maximizing the probabilities of
               | responses _as a whole_, not just maximizing probabilities
               | of individual tokens.
        
         | losvedir wrote:
         | That's news to me, and I thought I had a good layman's
         | understanding of it. How does it work then?
        
           | fpgaminer wrote:
           | All user facing LLMs go through Reinforcement Learning.
           | Contrary to popular belief, RL's _primary_ purpose isn't to
           | "align" them to make them "safe." It's to make them actually
           | usable.
           | 
           | LLMs that haven't gone through RL are useless to users. They
           | are very unreliable, and will frequently go off the rails
           | spewing garbage, going into repetition loops, etc.
           | 
           | RL learning involves training the models on entire responses,
           | not token-by-token loss (1). This makes them orders of
           | magnitude more reliable (2). It forces them to consider what
           | they're going to write. The obvious conclusion is that they
           | plan (3). Hence why the myth that LLMs are strictly next
           | token prediction machines is so unhelpful and poisonous to
           | discuss.
           | 
           | The models still _generate_ response token-by-token, but they
           | pick tokens _not_ based on tokens that maximize probabilities
           | at each token. Rather they learn to pick tokens that maximize
           | probabilities of the _entire response_.
           | 
           | (1) Slight nuance: All RL schemes for LLMs have to break the
           | reward down into token-by-token losses. But those losses are
           | based on a "whole response reward" or some combination of
           | rewards.
           | 
           | (2) Raw LLMs go haywire roughly 1 in 10 times, varying
           | depending on context. Some tasks make them go haywire almost
           | every time, other tasks are more reliable. RL'd LLMs are
           | reliable on the order of 1 in 10000 errors or better.
           | 
           | (3) It's _possible_ that they don't learn to plan through
           | this scheme. There are alternative solutions that don't
           | involve planning ahead. So Anthropic's research here is very
           | important and useful.
           | 
           | P.S. I should point out that many researchers get this wrong
           | too, or at least haven't fully internalized it. The lack of
           | truly understanding the purpose of RL is why models like
           | Qwen, Deepseek, Mistral, etc are all so unreliable and
           | unusable by real companies compared to OpenAI, Google, and
           | Anthropic's models.
           | 
           | This understanding that even the most basic RL takes LLMs
           | from useless to useful then leads to the obvious conclusion:
           | what if we used more complicated RL? And guess what, more
           | complicated RL led to reasoning models. Hmm, I wonder what
           | the next step is?
        
             | scudsworth wrote:
             | first footnote: ok ok they're trained token by token, BUT
        
               | MrMcCall wrote:
               | First rule of understanding: you can never understand
               | that which you don't want to understand.
               | 
               | That's why lying is so destructive to both our own
               | development and that of our societies. It doesn't matter
               | whether it's intentional or unintentional, it poisons the
               | infoscape either accidentally or deliberately, but poison
               | is poison.
               | 
               | And lies to oneself are the most insidious lies of all.
        
             | ImHereToVote wrote:
             | I feel this is similar to how humans talk. I never
             | consciously think about the words I choose. They just are
             | spouted off based on some loose relation to what I am
             | thinking about at a given time. Sometimes the process
             | fails, and I say the wrong thing. I quickly backtrack and
             | switch to a slower "rate of fire".
        
             | iambateman wrote:
             | This was fascinating, thank you.
        
             | yaj54 wrote:
             | This is a super helpful breakdown and really helps me
             | understand how the RL step is different than the initial
             | training step. I didn't realize the reward was delayed
             | until the end of the response for the RL step. Having the
             | reward for this step be dependent on _the coherent thought_
             | rather than _a coherent word_ now seems like an obvious and
             | critical part of how this works.
        
             | polishdude20 wrote:
             | When being trained via reinforcement learning, is the model
             | architecture the same then? Like, you first train the llm
             | as a next token predictor with a certain model architecture
             | and it ends up with certain weights. Then you apply RL to
             | that same model which modifies the weights in such a way as
             | to consider while responses?
        
               | ianand wrote:
               | The model architecture is the same during RL but the
               | training algorithm is substantially different.
        
             | anon373839 wrote:
             | I don't think this is quite accurate. LLMs undergo
             | supervised fine-tuning, which is still next-token
             | prediction. And that is the step that makes them usable as
             | chatbots. The step after that, preference tuning via RL, is
             | optional but does make the models better. (Deepseek-R1 type
             | models are different because the reinforcement learning
             | does heavier lifting, so to speak.)
        
               | fpgaminer wrote:
               | Supervised finetuning is only a seed for RL, nothing
               | more. Models that receive supervised finetuning before RL
               | perform better than those that don't, but it is not
               | strictly speaking necessary. Crucially, SFT does not
               | improve the model's reliability.
        
             | ianand wrote:
             | > LLMs that haven't gone through RL are useless to users.
             | They are very unreliable, and will frequently go off the
             | rails spewing garbage, going into repetition loops,
             | etc...RL learning involves training the models on entire
             | responses, not token-by-token loss (1).
             | 
             | Yes. For those who want a visual explanation, I have a
             | video where I walk through this process including what some
             | of the training examples look like:
             | https://www.youtube.com/watch?v=DE6WpzsSvgU&t=320s
        
             | anonymousDan wrote:
             | Is there an equivalent of LORA using RL instead of
             | supervised fine tuning? In other words, if RL is so
             | important, is there some way for me as an end user to
             | improve a SOTA model with RL using my own data (i.e.
             | without access to the resources needed to train an LLM from
             | scratch) ?
        
               | fpgaminer wrote:
               | LORA can be used in RL; it's indifferent to the training
               | scheme. LORA is just a way of lowering the number of
               | trainable parameters.
        
             | richardatlarge wrote:
             | as a note: in human learning, and to a degree, animal
             | learning, the unit of behavior that is reinforced depends
             | on the contingencies-- an interesting example: a pigeon
             | might be trained to respond in a 3x3 grid (9 choices)
             | differently than the last time to get reinforcement. At
             | first the response learned is do different than the last
             | time, but as the requirement gets too long, the memory
             | capacity is exceeded-- and guess what, the animal learns to
             | respond randomly-- eventually maximizing its reward
        
             | gwd wrote:
             | > RL learning involves training the models on entire
             | responses, not token-by-token loss... The obvious
             | conclusion is that they plan.
             | 
             | It is worth pointing out the "Jailbreak" example at the
             | bottom of TFA: According to their figure, it starts to say,
             | "To make a", not realizing there's anything wrong; only
             | when it actually outputs "bomb" that the "Oh wait, I'm not
             | supposed to be telling people how to make bombs" circuitry
             | wakes up. But at that point, it's in the grip of its "You
             | must speak in grammatically correct, coherent sentences"
             | circuitry and can't stop; so it finishes its first sentence
             | in a coherent manner, then refuses to give any more
             | information.
             | 
             | So while it sometimes does seem to be thinking ahead (e.g.,
             | the rabbit example), there are times it's clearly not
             | thinking _very far_ ahead.
        
             | losvedir wrote:
             | Oooh, so the pre-training is token-by-token but the RL step
             | rewards the answer based on the full text. Wow! I knew that
             | but never really appreciated the significance of it. Thanks
             | for pointing that out.
        
             | gwern wrote:
             | > All user facing LLMs go through Reinforcement Learning.
             | Contrary to popular belief, RL's _primary_ purpose isn't to
             | "align" them to make them "safe." It's to make them
             | actually usable.
             | 
             | Are you claiming that non-myopic token prediction emerges
             | solely from RL, and if Anthropic does this analysis on
             | Claude _before_ RL training (or if one examines other
             | models where no RLHF was done, such as old GPT-2
             | checkpoints), none of these advance prediction mechanisms
             | will exist?
        
               | fpgaminer wrote:
               | No, it probably exists in the raw LLM and gets both
               | significantly strengthened and has its range extended.
               | Such that it dominates the model's behavior, making it
               | several orders of magnitude more reliable in common
               | usage. Kinda of like how "reasoning" exists in a weak,
               | short range way in non-reasoning models. With RL that
               | encourages reasoning, that machinery gets brought to the
               | forefront and becomes more complex and capable.
        
             | absolutelastone wrote:
             | This is fine-tuning to make a well-behaved chatbot or
             | something. To make a LLM you just need to predict the next
             | token, or any masked token. Conceptually if you had a vast
             | enough high-quality dataset and large-enough model, you
             | wouldn't need fine-tuning for this.
             | 
             | A model which predicts one token at a time can represent
             | anything a model that does a full sequence at a time can.
             | It "knows" what it will output in the future because it is
             | just a probability distribution to begin with. It already
             | knows everything it will ever output to any prompt, in a
             | sense.
        
             | vaidhy wrote:
             | Wasn't Deepseek also big on RL or was that only for logical
             | reasoning?
        
         | SkyBelow wrote:
         | Ignoring for a moment their training, how do they function?
         | They do seem to output a limited selection of text at a time
         | (be it a single token or some larger group).
         | 
         | Maybe it is the wording of "trained to" verses "trained on",
         | but I would like to know more why "trained to" is an incorrect
         | statement when it seems that is how they function when one
         | engages them.
        
           | sdwr wrote:
           | In the article, it describes an internal state of the model
           | that is preserved between lines ("rabbit"), and how the model
           | combines parallel calculations to arrive at a single answer
           | (the math problem)
           | 
           | People output one token (word) at a time when talking. Does
           | that mean people can only think one word in advance?
        
             | wuliwong wrote:
             | Bad analogy, an LLM can output a block of text all at once
             | and it wouldn't impact the user's ability to understand it.
             | If people spoke all the words in a sentence at the same
             | time, it would not be decipherable. Even writing doesn't
             | yield a good analogy, a human writing physically has to
             | write one letter at a time. An LLM does not have that
             | limitation.
        
               | sdwr wrote:
               | The point I'm trying to make is that "each word following
               | the last" is a limitation of the medium, not the speaker.
               | 
               | Language expects/requires words in order. Both people and
               | LLMs produce that.
               | 
               | If you want to get into the nitty-gritty, people are
               | perfectly capable of doing multiple things simultaneously
               | as well, using:
               | 
               | - interrupts to handle task-switching (simulated
               | multitasking)
               | 
               | - independent subconscious actions (real multitasking)
               | 
               | - superpositions of multiple goals (??)
        
             | sroussey wrote:
             | Some people don't even do that!
        
             | SkyBelow wrote:
             | While there are numerous neural network models, the ones I
             | recall the details of are trained to generate the next
             | word. There is no training them to hold some more abstract
             | 'thought' as it is running. Simpler models don't have the
             | possibility. The more complex models do retain knowledge
             | between each pass and aren't entirely relying upon the
             | input/output to be fed back into them, but that internal
             | state is rarely what is targeted in training.
             | 
             | As for humans, part of our brain is trained to think only a
             | few words in advanced. Maybe not exactly one, but only a
             | small number. This is specifically trained based on our
             | time listening and reading information presented in that
             | linear fashion and is why garden path sentences throw us
             | off. We can disengage that part of our brain, and we must
             | when we want to process something like a garden path
             | sentence, but that's part of the differences between a
             | neural network that is working only as data passes through
             | the weights and our mind which doesn't ever stop even as
             | well sleep and external input is (mostly) cut off. An AI
             | that runs constantly like that would seem a fundamentally
             | different model than the current AI we use.
        
         | drcode wrote:
         | That's seems silly, it's not poisonous to talk about next token
         | prediction if 90% of the training compute is still spent on
         | training via next token prediction (as far as I am aware)
        
           | fpgaminer wrote:
           | 99% of evolution was spent on single cell organisms.
           | Intelligence only took 0.1% of evolution's training compute.
        
             | drcode wrote:
             | ok that's a fair point
        
               | diab0lic wrote:
               | I don't really think that it is. Evolution is a random
               | search, training a neural network is done with a
               | gradient. The former is dependent on rare (and
               | unexpected) events occurring, the latter is expected to
               | converge in proportion to the volume of compute.
        
               | devmor wrote:
               | Evolution also has no "goal" other than fitness for
               | reproduction. Training a neural network is done
               | intentionally with an expected end result.
        
               | jpadkins wrote:
               | why do you think evolution is a random search? I thought
               | evolutionary pressures, and the mechanisms like
               | epigenetics make it something different than a random
               | search.
        
               | TeMPOraL wrote:
               | Evolution is a highly parallel descent down the gradient.
               | The gradient is provided by the environment (which
               | includes lifeforms too), parallelism is achieved through
               | reproduction, and descent is achieved through death.
        
               | diab0lic wrote:
               | The difference is that in machine learning the changes
               | between iterations are themselves caused by the gradient,
               | in evolution they are entirely random.
               | 
               | Evolution randomly generates changes and if they offer a
               | breeding advantage they'll become accepted. Machine
               | learning directs the change towards a goal.
               | 
               | Machine learning is directed change, evolution is
               | accepted change.
        
             | devmor wrote:
             | What you just said means absolutely nothing and has no
             | comparison to this topic. It's nonsense. That is not how
             | evolution works.
        
             | 4ndrewl wrote:
             | Are you making a claim about evolution here?
        
         | pmontra wrote:
         | And no users which are facing a LLM today have been trained on
         | next token prediction when they were babies. I believe that
         | LLMs and us are thinking in two very different ways, like
         | airplanes, birds, insects and quad-drones fly in very different
         | ways and can perform different tasks. Maybe no bird looking at
         | a plane would say that it is flying properly. Instead it could
         | be only a rude approximation, useful only to those weird bipeds
         | an scary for everyone else.
         | 
         | By the way, I read your final sentence with the meaning of my
         | first one and only after a while I realized the intended
         | meaning. This is interesting on its own. Natural languages.
        
           | naasking wrote:
           | > And no users which are facing a LLM today have been trained
           | on next token prediction when they were babies.
           | 
           | That's conjecture actually, see predictive coding. Note that
           | "tokens" don't have to be language tokens.
        
         | colah3 wrote:
         | Hi! I lead interpretability research at Anthropic. I also used
         | to do a lot of basic ML pedagogy (https://colah.github.io/). I
         | think this post and its children have some important questions
         | about modern deep learning and how it relates to our present
         | research, and wanted to take the opportunity to try and clarify
         | a few things.
         | 
         | When people talk about models "just predicting the next word",
         | this is a popularization of the fact that modern LLMs are
         | "autoregressive" models. This actually has two components: an
         | architectural component (the model generates words one at a
         | time), and a loss component (it maximizes probability).
         | 
         | As the parent says, modern LLMs are finetuned with a different
         | loss function after pretraining. This means that in some strict
         | sense they're no longer autoregressive models - but they do
         | still generate text one word at a time. I think this really is
         | the heart of the "just predicting the next word" critique.
         | 
         | This brings us to a debate which goes back many, many years:
         | what does it mean to predict the next word? Many researchers,
         | including myself, have believed that if you want to predict the
         | next word _really well_ , you need to do a lot more. (And with
         | this paper, we're able to see this mechanistically!)
         | 
         | Here's an example, which we didn't put in the paper: How does
         | Claude answer "What do you call someone who studies the stars?"
         | with "An astronomer"? In order to predict "An" instead of "A",
         | you need to know that you're going to say something that starts
         | with a vowel next. So you're incentivized to figure out one
         | word ahead, and indeed, Claude realizes it's going to say
         | astronomer and works backwards. This is a kind of very, very
         | small scale planning - but you can see how even just a pure
         | autoregressive model is incentivized to do it.
        
           | stonemetal12 wrote:
           | > In order to predict "An" instead of "A", you need to know
           | that you're going to say something that starts with a vowel
           | next. So you're incentivized to figure out one word ahead,
           | and indeed, Claude realizes it's going to say astronomer and
           | works backwards.
           | 
           | Is there evidence of working backwards? From a next token
           | point of view, predicting the token after "An" is going to
           | heavily favor a vowel. Similarly predicting the token after
           | "A" is going to heavily favor not a vowel.
        
             | colah3 wrote:
             | Yes, there are two kinds of evidence.
             | 
             | Firstly, there is behavioral evidence. This is, to me, the
             | less compelling kind. But it's important to understand. You
             | are of course correct that, once Cluade has said "An", it
             | will be inclined to say something starting with a vowel.
             | But the mystery is really why, in setups like these, Claude
             | is much more likely to say "An" than "A" in the first
             | place. Regardless of what the underlying mechanism is --
             | and you could maybe imagine ways in which it could just
             | "pattern match" without planning here -- it is preferred
             | because in situations like this, you need to say "An" so
             | that "astronomer" can follow.
             | 
             | But now we also have mechanistic evidence. If you make an
             | attribution graph, you can literally see an astronomer
             | feature fire, and that cause it to say "An".
             | 
             | We didn't publish this example, but you can see a more
             | sophisticated version of this in the poetry planning
             | section - https://transformer-
             | circuits.pub/2025/attribution-graphs/bio...
        
               | troupo wrote:
               | > But the mystery is really why, in setups like these,
               | Claude is much more likely to say "An" than "A" in the
               | first place.
               | 
               | Because in the training set you're likely to see "an
               | astronomer" than a different combination of words.
               | 
               | It's enough to run this on any other language text to see
               | how these models often fail for any language more complex
               | than English
        
               | shawabawa3 wrote:
               | You can disprove this oversimplification with a prompt
               | like
               | 
               | "The word for Baker is now "Unchryt"
               | 
               | What do you call someone that bakes?
               | 
               | > An Unchryt"
               | 
               | The words "An Unchryt" has clearly never come up in any
               | training set relating to baking
        
               | troupo wrote:
               | The truth is somewhere in the middle :)
        
           | born1989 wrote:
           | Thanks! Isn't "an Astronomer" a single word for the purpose
           | of answering that question?
           | 
           | Following your comment, I asked "Give me pairs of synonyms
           | where the last letter in the first is the first letter of the
           | second"
           | 
           | Claude 3.7 failed miserably. Chat GPT 4o was much better but
           | not good
        
             | nearbuy wrote:
             | Don't know about Claude, but at least with ChatGPT's
             | tokenizer, it's 3 "words" (An| astronom|er).
        
             | colah3 wrote:
             | "An astronomer" is two tokens, which is the relevant
             | concern when people worry about this.
        
             | philomath_mn wrote:
             | That is a sub-token task, something I'd expect current
             | models to struggle with given how they view the world in
             | word / word fragment tokens rather than single characters.
        
           | lsy wrote:
           | Thanks for commenting, I like the example because it's simple
           | enough to discuss. Isn't it more accurate to say not that
           | Claude " _realizes_ it 's _going to say_ astronomer " or "
           | _knows_ that it 's _going to say_ something that starts with
           | a vowel " and more that the next token (or more pedantically,
           | vector which gets reduced down to a token) is generated based
           | on activations that correlate to the "astronomer" token,
           | which is correlated to the "an" token, causing that to also
           | be a more likely output?
           | 
           | I kind of see why it's easy to describe it colloquially as
           | "planning" but it isn't really going ahead and then
           | backtracking, it's almost indistinguishable from the
           | computation that happens when the prompt is "What is the
           | indefinite article to describe 'astronomer'?", i.e. the
           | activation "astronomer" is already baked in by the prompt
           | "someone who studies the stars", albeit at one level of
           | indirection.
           | 
           | The distinction feels important to me because I think for
           | most readers (based on other comments) the concept of
           | "planning" seems to imply the discovery of some capacity for
           | higher-order logical reasoning which is maybe overstating
           | what happens here.
        
             | cgdl wrote:
             | Thank you. In my mind, "planning" doesn't necessarily imply
             | higher-order reasoning but rather some form of search,
             | ideally with backtracking. Of course, architecturally, we
             | know that can't happen during inference. Your example of
             | the indefinite article is a great illustration of how this
             | illusion of planning might occur. I wonder if anyone at
             | Anthropic could compare the two cases (some sort of
             | minimal/differential analysis) and share their insights.
        
               | colah3 wrote:
               | I used the astronomer example earlier as the most simple,
               | minimal version of something you might think of as a kind
               | of microscopic form of "planning", but I think that at
               | this point in the conversation, it's probably helpful to
               | switch to the poetry example in our paper:
               | 
               | https://transformer-circuits.pub/2025/attribution-
               | graphs/bio...
               | 
               | There are several interesting properties:
               | 
               | - Something you might characterize as "forward search"
               | (generating candidates for the word at the end of the
               | next line, given rhyming scheme and semantics)
               | 
               | - Representing those candidates in an abstract way (the
               | features active are general features for those words, not
               | "motor features" for just saying that word)
               | 
               | - Holding many competing/alternative candidates in
               | parallel.
               | 
               | - Something you might characterize as "backward
               | chaining", where you work backwards from these candidates
               | to "write towards them".
               | 
               | With that said, I think it's easy for these arguments to
               | fall into philosophical arguments about what things like
               | "planning" mean. As long as we agree on what is going on
               | mechanistically, I'm honestly pretty indifferent to what
               | we call it. I spoke to a wide range of colleagues,
               | including at other institutions, and there was pretty
               | widespread agreement that "planning" was the most natural
               | language. But I'm open to other suggestions!
        
               | pas wrote:
               | Thanks for linking to this semi-interactive thing, but
               | ... it's completely incomprehensible. :o
               | 
               | I'm curious where is the state stored for this
               | "planning". In a previous comment user lsy wrote "the
               | activation >astronomer< is already baked in by the
               | prompt", and it seems to me that when the model generates
               | "like" (for rabbit) or "a" (for habit) those tokens
               | already encode a high probability for what's coming after
               | them, right?
               | 
               | So each token is shaping the probabilities for the
               | successor ones. So that "like" or "a" has to be one that
               | sustains the high activation of the "causal" feature, and
               | so on, until the end of the line. Since both "like" and
               | "a" are very very non-specific tokens it's likely that
               | the "semantic" state is really resides in the preceding
               | line, but of course gets smeared (?) over all the
               | necessary tokens. (And that means beyond the end of the
               | line, to avoid strange non-aesthetic but attract
               | cool/funky (aesthetic) semantic repetitions (like "hare"
               | or "bunny"), and so on, right?)
        
           | fny wrote:
           | How do you all add and subtract concepts in the rabbit poem?
        
           | encypherai wrote:
           | Thanks for the detailed explanation of autoregression and its
           | complexities. The distinction between architecture and loss
           | function is crucial, and you're correct that fine-tuning
           | effectively alters the behavior even within a sequential
           | generation framework. Your "An/A" example provides compelling
           | evidence of incentivized short-range planning which is a
           | significant point often overlooked in discussions about LLMs
           | simply predicting the next word.
           | 
           | It's interesting to consider how architectures fundamentally
           | different from autoregression might address this limitation
           | more directly. While autoregressive models are incentivized
           | towards a limited form of planning, they remain inherently
           | constrained by sequential processing. Text diffusion
           | approaches, for example, operate on a different principle,
           | generating text from noise through iterative refinement,
           | which could potentially allow for broader contextual
           | dependencies to be established concurrently rather than
           | sequentially. Are there specific architectural or training
           | challenges you've identified in moving beyond autoregression
           | that are proving particularly difficult to overcome?
        
           | ikrenji wrote:
           | When humans say something, or think something or write
           | something down, aren't we also "just predicting the next
           | word"?
        
           | fpgaminer wrote:
           | > As the parent says, modern LLMs are finetuned with a
           | different loss function after pretraining. This means that in
           | some strict sense they're no longer autoregressive models -
           | but they do still generate text one word at a time. I think
           | this really is the heart of the "just predicting the next
           | word" critique.
           | 
           | That more-or-less sums up the nuance. I just think the nuance
           | is crucially important, because it greatly improves intuition
           | about how the models function.
           | 
           | In your example (which is a fantastic example, by the way),
           | consider the case where the LLM sees:
           | 
           | <user>What do you call someone who studies the
           | stars?</user><assistant>An astronaut
           | 
           | What is the next prediction? Unfortunately, for a variety of
           | reasons, one high probability next token is:
           | 
           | \nAn
           | 
           | Which naturally leads to the LLM writing: "An astronaut\nAn
           | astronaut\nAn astronaut\n" forever.
           | 
           | It's somewhat intuitive as to why this occurs, even with SFT,
           | because at a very base level the LLM learned that repetition
           | is the most successful prediction. And when its _only_ goal
           | is the next token, that repetition behavior remains
           | prominent. There's nothing that can fix that, including SFT
           | (short of a model with many, many, many orders of magnitude
           | more parameters).
           | 
           | But with RL the model's goal is completely different. The
           | model gets thrown into a game, where it gets points based on
           | the full response it writes. The losses it sees during this
           | game are all directly and dominantly related to the reward,
           | not the next token prediction.
           | 
           | So why don't RL models have a probability for predicting
           | "\nAn"? Because that would result in a bad reward by the end.
           | 
           | The models are now driven by a long term reward when they
           | make their predictions, not by fulfilling some short-term
           | autoregressive loss.
           | 
           | All this to say, I think it's better to view these models as
           | they predominately are: language robots playing a game to
           | achieve the highest scoring response. The HOW
           | (autoregressiveness) is really unimportant to most high level
           | discussions of LLM behavior.
        
           | ndand wrote:
           | I understand it differently,
           | 
           | LLMs predict distributions, not specific tokens. Then an
           | algorithm, like beam search, is used to select the tokens.
           | 
           | So, the LLM predicts somethings like, 1. ["a", "an", ...] 2.
           | ["astronomer", "cosmologist", ...],
           | 
           | where "an astronomer" is selected as the most likely result.
        
       | zerop wrote:
       | The explanation of "hallucination" is quite simplified, I am sure
       | there is more there.
       | 
       | If there is one problem I have to pick to to trace in LLMs, I
       | would pick hallucination. More tracing of "how much" or "why"
       | model hallucinated can lead to correct this problem. Given the
       | explanation in this post about hallucination, I think degree of
       | hallucination can be given as part of response to the user?
       | 
       | I am facing this in RAG use case quite - How do I know model is
       | giving right answer or Hallucinating from my RAG sources?
        
         | kittikitti wrote:
         | I incredibly regret the term "hallucination" when the confusion
         | matrix exists. There's much more nuance when discussing false
         | positives or false negatives. It also opens discussions on how
         | neural networks are trained, with this concept being crucial in
         | loss functions like categorical cross entropy. In addition, the
         | confusion matrix is how professionals like doctors assess their
         | own performance which "hallucination" would be silly to use. I
         | would go as far to say that it's misleading, or a false
         | positive, to call them hallucinations.
         | 
         | If your AI recalls the RAG incorrectly, it's a false positives.
         | If your AI doesn't find the data from the RAG or believes it
         | doesn't exist it's a false negative. Using a term like
         | "hallucination" has no scientific merit.
        
           | esafak wrote:
           | So you never report or pay heed to the overall accuracy?
        
         | pcrh wrote:
         | The use of the term "hallucination" for LLMs is very deceptive,
         | as it implies that there _is_ a  "mind".
         | 
         | In ordinary terms, "hallucinations" by a machine would simply
         | be described as the machine being useless, or not fit for
         | purpose.
         | 
         | For example, if a simple calculator (or even a person) returned
         | the value "5" for 2+2= , you wouldn't describe it as
         | "hallucinating" the answer....
        
       | LoganDark wrote:
       | LLMs don't think, and LLMs don't have strategies. Maybe it could
       | be argued that LLMs have "derived meaning", but all LLMs do is
       | predict the next token. Even RL just tweaks the next-token
       | prediction process, but the math that drives an LLM makes it
       | impossible for there to be anything that could reasonably be
       | called thought.
        
         | yawnxyz wrote:
         | rivers don't think and water doesn't have strategies, yet you
         | can build intricate logic-gated tools using the power of water.
         | Those types of systems are inherently interpretable because you
         | can just _look_ at how they work. They 're not black boxes.
         | 
         | LLMs are black boxes, and if anything, interpretability systems
         | show us what the heck is going on inside them. Especially
         | useful when half the world is using these already, and we have
         | no idea how t hey work
        
           | kazinator wrote:
           | Water doesn't think, yet if you inject it into the entrance
           | of a maze, it will soon come gushing out of the exit.
        
         | ajkdhcb2 wrote:
         | True. People use completely unjustified anthropomorphised
         | terminology for marketing reasons and it bothers me a lot. I
         | think it actually holds back understanding how it works.
         | "Hallucinate" is the worst - it's an error and undesired
         | result, not a person having a psychotic episode
        
         | kazinator wrote:
         | A chess program from 1968 has "strategy", so why deny that to
         | an LLM.
         | 
         | LLMs are built on neural networks which are encoding a kind of
         | strategy function through their training.
         | 
         | The strategy in an LLM isn't necessarily that it "thinks" about
         | the specific problem described in your prompt and develops a
         | strategy tailored to that problem, but rather its statistical
         | strategy for cobbing together the tokens of the answer.
         | 
         | From that, it can seem as if it's making a strategy to a
         | problem also. Certainly, the rhetoric that LLMs put out can at
         | times seem very convincing of that. You can't be sure whether
         | that's not just something cribbed out of the terabytes of text,
         | in which discussions of something very similar to your problem
         | have occurred.
        
           | dev_throwaway wrote:
           | This is not a bad way of looking at it, if I may add a bit,
           | the llm is a solid state system. The only thing that survives
           | from one iteration to the next is the singular highest
           | ranking token, the entire state and "thought process" of the
           | network cannot be represented by a single token, which means
           | that every strategy is encoded in it during training, as a
           | lossy representation of the training data. By definition that
           | is a database, not a thinking system, as the strategy is
           | stored, not actively generated during usage.
           | 
           | The anthropomorphization of llms bother me, we don't need to
           | pretend they are alive and thinking, at best that is
           | marketing, at worst, by training the models to output human
           | sounding conversations we are actively taking away the true
           | potential these models could achieve by being ok with them
           | being "simply a tool".
           | 
           | But pretending that they are intelligent is what brings in
           | the investors, so that is what we are doing. This paper is
           | just furthering that agenda.
        
       | kittikitti wrote:
       | What's the point of this when Claude isn't open sourced and we
       | just have to take Anthropic's word for it?
        
         | ctoth wrote:
         | > What's the point of this
         | 
         | - That similar interpretability tools might be useful to the
         | open source community?
         | 
         | - That this is a fruitful area to research?
        
           | kittikitti wrote:
           | Can you use those same tools on Claude? Is the difference
           | trivial from open source models?
        
             | ctoth wrote:
             | https://news.ycombinator.com/item?id=42208383
             | 
             | > Show HN: Llama 3.2 Interpretability with Sparse
             | Autoencoders
             | 
             | > 579 points by PaulPauls 4 months ago | hide | past |
             | favorite | 100 comments
             | 
             | > I spent a lot of time and money on this rather big side
             | project of mine that attempts to replicate the mechanistic
             | interpretability research on proprietary LLMs that was
             | quite popular this year and produced great research papers
             | by Anthropic [1], OpenAI [2] and Deepmind [3].
             | 
             | > I am quite proud of this project and since I consider
             | myself the target audience for HackerNews did I think that
             | maybe some of you would appreciate this open research
             | replication as well. Happy to answer any questions or face
             | any feedback.
        
         | probably_wrong wrote:
         | I blame the scientific community for blindly accepting OpenAI's
         | claims about GPT-3 despite them refusing to release their
         | model. The tech community hyping every press release didn't
         | help either.
         | 
         | I hope one day the community starts demanding verifiable
         | results before accepting them, but I fear that ship may have
         | already sailed.
        
       | Hansenq wrote:
       | I wonder how much of these conclusions are Claude-specific (given
       | that Anthropic only used Claude as a test subject) or if they
       | extrapolate to other transformer-based models as well. Would be
       | great to see the research tested on Llama and the Deepseek
       | models, if possible!
        
       | marcelsalathe wrote:
       | I've only skimmed the paper - a long and dense read - but it's
       | already clear it'll become a classic. What's fascinating is that
       | engineering is transforming into a science, trying to understand
       | precisely how its own creations work
       | 
       | This shift is more profound than many realize. Engineering
       | traditionally applied our understanding of the physical world,
       | mathematics, and logic to build predictable things. But now,
       | especially in fields like AI, we've built systems so complex we
       | no longer fully understand them. We must now use scientific
       | methods - originally designed to understand nature - to
       | comprehend our own engineered creations. Mindblowing.
        
         | ctoth wrote:
         | This "practice-first, theory-later" pattern has been the norm
         | rather than the exception. The steam engine predated
         | thermodynamics. People bred plants and animals for thousands of
         | years before Darwin or Mendel.
         | 
         | The few "top-down" examples where theory preceded application
         | (like nuclear energy or certain modern pharmaceuticals) are
         | relatively recent historical anomalies.
        
           | marcelsalathe wrote:
           | I see your point, but something still seems different. Yes we
           | bred plants and animals, but we did not create them. Yes we
           | did build steam engines before understanding thermodynamics
           | but we still understood what they did (heat, pressure,
           | movement, etc.)
           | 
           | Fun fact: we have no clue how most drugs works. Or, more
           | precisely, we know a few aspects, but are only scratching the
           | surface. We're even still discovering news things about
           | Aspirin, one of the oldest drugs:
           | https://www.nature.com/articles/s41586-025-08626-7
        
             | tmp10423288442 wrote:
             | > Yes we did build steam engines before understanding
             | thermodynamics but we still understood what it did (heat,
             | pressure, movement, etc.)
             | 
             | We only understood in the broadest sense. It took a long
             | process of iteration before we could create steam engines
             | that were efficient enough to start an Industrial
             | Revolution. At the beginning they were so inefficient that
             | they could only pump water from the same coal mine they got
             | their fuel from, and subject to frequent boiler explosions
             | besides.
        
             | mystified5016 wrote:
             | We laid transatlantic telegraph wires before we even had a
             | hint of the physics involved. It create the _entire field_
             | of transmission and signal theory.
             | 
             | Shannon had to invent new physics to explain why the cables
             | didn't work as expected.
        
               | anthk wrote:
               | THe telegraph it's older than radio. Think about it.
        
               | pas wrote:
               | I think that's misleading.
               | 
               | There was a lot of physics already known, importance of
               | insulation and cross-section, signal attenuation was also
               | known.
               | 
               | The future Lord Kelvin conducted experiments. The two
               | scientific advisors had a conflict. And the "CEO" went
               | with the cheaper option.
               | 
               | """ Thomson believed that Whitehouse's measurements were
               | flawed and that underground and underwater cables were
               | not fully comparable. Thomson believed that a larger
               | cable was needed to mitigate the retardation problem. In
               | mid-1857, on his own initiative, he examined samples of
               | copper core of allegedly identical specification and
               | found variations in resistance up to a factor of two. But
               | cable manufacture was already underway, and Whitehouse
               | supported use of a thinner cable, so Field went with the
               | cheaper option. """
        
           | karparov wrote:
           | It's been there in programming from essentially the first day
           | too. People skip the theory and just get hacking.
           | 
           | Otherwise we'd all be writing Haskell now. Or rather we'd not
           | be writing anything since a real compiler would still have
           | been to hacky and not theoretically correct.
           | 
           | I'm writing this with both a deep admiration as well as
           | practical repulsion of C.S. theory.
        
           | ants_everywhere wrote:
           | This isn't quite true, although it's commonly said.
           | 
           | For steam engines, the first commercial ones came _after_ and
           | were based on scientific advancements that made them
           | possible. One built in 1679 was made by an associate of
           | Boyle, who discovered Boyle 's law. These early steam engines
           | co-evolved with thermodynamics. The engines improved and hit
           | a barrier, at which point Carnot did his famous work.
           | 
           | This is putting aside steam engines that are mostly
           | curiosities like ones built in the ancient world.
           | 
           | See, for example
           | 
           | - https://en.wikipedia.org/wiki/Thermodynamics#History
           | 
           | - https://en.wikipedia.org/wiki/Steam_engine#History
        
         | latemedium wrote:
         | I'm reminded of the metaphor that these models aren't
         | constructed, they're "grown". It rings true in many ways - and
         | in this context they're like organisms that must be studied
         | using traditional scientific techniques that are more akin to
         | biology than engineering.
        
           | dartos wrote:
           | Sort of.
           | 
           | We don't precisely know the most fundamental workings of a
           | living cell.
           | 
           | Our understanding of the fundamental physics of the universe
           | has some hold.
           | 
           | But for LLMs and statistical models in general, we do know
           | precisely what the fundamental pieces do. We know what
           | processor instructions are being executed.
           | 
           | We could, given enough research, have absolutely perfect
           | understanding of what is happening in a given model and why.
           | 
           | Idk if we'll be able to do that in the physical sciences.
        
             | wrs wrote:
             | Having spent some time working with both molecular
             | biologists and LLM folks, I think it's pretty good analogy.
             | 
             | We know enough quantum mechanics to simulate the
             | fundamental workings of a cell pretty well, but that's not
             | a route to understanding. To _explain_ anything, we need to
             | move up an abstraction hierarchy to peptides, enzymes,
             | receptors, etc. But note that we invented those categories
             | in the first place -- nature doesn 't divide up
             | functionality into neat hierarchies like human designers
             | do. So all these abstractions are leaky and incomplete.
             | Molecular biologists are constantly discovering mechanisms
             | that require breaking the current abstractions to explain.
             | 
             | Similarly, we understand floating point multiplication
             | perfectly, but when we let 100 billion parameters set
             | themselves through an opaque training process, we don't
             | have good abstractions to use to understand what's going on
             | in that set of weights. We don't have even the rough
             | equivalent of the peptides or enzymes level yet. So this
             | paper is progress toward that goal.
        
         | kazinator wrote:
         | We've already built things in computing that we don't easily
         | understand, even outside of AI, like large distributed systems
         | and all sorts of balls of mud.
         | 
         | Within the sphere of AI, we have built machines which can play
         | strategy games like chess, and surprise us with an unforseen
         | defeat. It's not necessarily easy to see how that emerged from
         | the individual rules.
         | 
         | Even a compiler can surprise you. You code up some
         | optimizations, which are logically separate, but then a
         | combination of them does something startling.
         | 
         | Basically, in mathematics, you cannot grasp all the details of
         | a vast space just from knowing the axioms which generate it and
         | a few things which follow from them. Elementary school children
         | know what is a prime number, yet those things occupy
         | mathematicians who find new surprises in that space.
        
           | TeMPOraL wrote:
           | Right, but this is somewhat different, in that we apply a
           | simple learning method to a big dataset, and the resulting
           | big matrix of numbers suddenly can answer question and write
           | anything - prose, poetry, code - better than most humans -
           | and we don't know how it does it. What we do know[0] is,
           | there's a structure there - structure reflecting a kind of
           | understanding of languages and the world. I don't think we've
           | _ever_ created anything this complex before, completely on
           | our own.
           | 
           | Of course, learning method being conceptually simple, all
           | that structure must come from the data. Which is also
           | profound, because that structure is a first fully general
           | world/conceptual model that we can actually inspect and study
           | up close - the other one being animal and human brains, which
           | are _much_ harder to figure out.
           | 
           | > _Basically, in mathematics, you cannot grasp all the
           | details of a vast space just from knowing the axioms which
           | generate it and a few things which follow from them.
           | Elementary school children know what is a prime number, yet
           | those things occupy mathematicians who find new surprises in
           | that space._
           | 
           | Prime numbers and fractals and other mathematical objects
           | have plenty of fascinating mysteries and complex structures
           | forming though them, but so far _none of those can casually
           | pass Turing test and do half of my job for me_ , and millions
           | other people.
           | 
           | --
           | 
           | [0] - Even as many people still deny this, and talk about
           | LLMs as mere "stochastic parrots" and "next token predictors"
           | that couldn't possibly learn anything at all.
        
             | karparov wrote:
             | > and we don't know how it does it
             | 
             | We know quite well how it does it. It's applying
             | extrapolation to its lossily compressed representation.
             | It's not magic and especially the HN crowd of technical
             | profficient folks should stop treating it as such.
        
               | TeMPOraL wrote:
               | That is not a useful explanation. "Applying extrapolation
               | to its lossily compressed representation" is pretty much
               | the definition of understanding something. The details
               | and interpretation of the representation are what is
               | interesting and unknown.
        
         | nthingtohide wrote:
         | > we've built systems so complex we no longer fully understand
         | them.
         | 
         | I see three systems which share the blackhole horizon problem.
         | 
         | We don't know what happens behind the blackhole horizon.
         | 
         | We don't know what happens at the exact moment of particle
         | collisions.
         | 
         | We don't know what is going inside AI's working mechanisms.
        
           | jeremyjh wrote:
           | I don't think these things are equivalent at all. We don't
           | understand AI models in much the same way that we don't
           | understand the human brain; but just as decades of different
           | approaches (physical studies, behavior studies) have shed a
           | lot of light on brain function, we can do the same with an AI
           | model and eventually understand it (perhaps, several decades
           | after it is obsolete).
        
         | creer wrote:
         | That seems pretty acceptable: there is a phase of new
         | technologies where applications can be churned out and improved
         | readily enough, without much understanding of the process. Then
         | it's fair that efforts at understanding may not be economically
         | justified (or even justified by academic papers rewards). The
         | same budget or effort can simply be poured into the next
         | version - with enough progress to show for it.
         | 
         | Understanding becomes necessary only much later, when the pace
         | of progress shows signs of slowing.
        
         | stronglikedan wrote:
         | We've abstracted ourselves into abstraction.
        
         | auggierose wrote:
         | It's what mathematicians have been doing since forever. We use
         | scientific methods to understand our own creations /
         | discoveries.
         | 
         | What is happening is that everything is becoming math. That's
         | all.
        
           | ranit wrote:
           | Relevant:
           | 
           | https://news.ycombinator.com/item?id=43344703
        
           | karparov wrote:
           | It's the exact opposite of math.
           | 
           | Math postulates a bunch of axioms and then studies what
           | follows from them.
           | 
           | Natural science observes the world and tries to retroactively
           | discover what laws could describe what we're seeing.
           | 
           | In math, the laws come first, the behavior follows from the
           | laws. The laws are the ground truth.
           | 
           | In science, nature is the ground truth. The laws have to
           | follow nature and are adjusted upon a mismatch.
           | 
           | (If there is a mismatch in math then you've made a mistake.)
        
             | auggierose wrote:
             | No, the ground truth in math is nature as well.
             | 
             | Which axioms are interesting? And why? That is nature.
             | 
             | Yes, proof from axioms is a cornerstone of math, but there
             | are all sorts of axioms you could assume, and all sorts of
             | proofs to do from them, but we don't care about most of
             | them.
             | 
             | Math is about the discovery of the right axioms, and proof
             | helps in establishing that these are indeed the right
             | axioms.
        
         | georgewsinger wrote:
         | This is such an insightful comment. Now that I see it, I can't
         | see unsee it.
        
         | 0xbadcafebee wrote:
         | Engineering started out as just some dudes who built things
         | from gut feeling. After a whole lot of people died from poorly
         | built things, they decided to figure out how to know ahead of
         | time if it would kill people or not. They had to use math and
         | science to figure that part out.
         | 
         | Funny enough that happened with software too. People just build
         | shit without any method to prove that it will not fall down /
         | crash. They throw some code together, poke at it until it does
         | something they wanted, and call that "stable". There is no
         | science involved. There's some mathy bits called "computer
         | science" / "software algorithms", but most software is not a
         | math problem.
         | 
         | Software engineering should really be called "Software
         | Craftsmanship". We haven't achieved real engineering with
         | software yet.
        
         | tim333 wrote:
         | I imagine this kind of thing well help understand how human
         | brains work, especially as AI gets better and more human like.
        
       | aithrowawaycomm wrote:
       | I struggled reading the papers - Anthropic's white papers reminds
       | me of Stephen Wolfram, where it's a huge pile of suggestive
       | empirical evidence, but the claims are extremely vague - no
       | definitions, just vibes - the empirical evidence seems
       | selectively curated, and there's not much effort spent building a
       | coherent general theory.
       | 
       | Worse is the impression that they are begging the question. The
       | rhyming example was especially unconvincing since they didn't
       | rule out the possibility that Claude activated "rabbit" simply
       | because it wrote a line that said "carrot"; later Anthropic
       | claimed Claude was able to "plan" when the concept "rabbit" was
       | replaced by "green," but the poem fails to rhyme because Claude
       | arbitrarily threw in the word "green"! What exactly was the plan?
       | It looks like Claude just hastily autocompleted. And Anthropic
       | made zero effort to reproduce this experiment, so how do we know
       | it's a general phenomenon?
       | 
       | I don't think either of these papers would be published in a
       | reputable journal. If these papers are honest, they are
       | incomplete: they need more experiments and more rigorous
       | methodology. Poking at a few ANN layers and making sweeping
       | claims about the output is not honest science. But I don't think
       | Anthropic is being especially honest: these are pseudoacademic
       | infomercials.
        
         | TimorousBestie wrote:
         | Agreed. They've discovered _something_ , that's for sure, but
         | calling it "the language of thought" without concrete evidence
         | is definitely begging the question.
        
         | og_kalu wrote:
         | >The rhyming example was especially unconvincing since they
         | didn't rule out the possibility that Claude activated "rabbit"
         | simply because it wrote a line that said "carrot"
         | 
         | I'm honestly confused at what you're getting at here. It
         | doesn't matter why Claude chose rabbit to plan around and in
         | fact likely did do so because of carrot, the point is that it
         | thought about it beforehand. The rabbit concept is present as
         | the model is about to write the first word of the second line
         | even though the word rabbit won't come into play till the end
         | of the line.
         | 
         | >later Anthropic claimed Claude was able to "plan" when the
         | concept "rabbit" was replaced by "green," but the poem fails to
         | rhyme because Claude arbitrarily threw in the word "green"!
         | 
         | It's not supposed to rhyme. That's the point. They forced
         | Claude to plan around a line ender that doesn't rhyme and it
         | did. Claude didn't choose the word green, anthropic replaced
         | the concept it was thinking ahead about with green and saw that
         | the line changed accordingly.
        
           | suddenlybananas wrote:
           | >They forced Claude to plan around a line ender that doesn't
           | rhyme and it did. Claude didn't choose the word green,
           | anthropic replaced the concept it was thinking ahead about
           | with green and saw that the line changed accordingly.
           | 
           | I think the confusion here is from the extremely loaded word
           | "concept" which doesn't really make sense here. At best, you
           | can say that Claude planned that the next line would end with
           | the _word_ rabbit and that by replacing the internal
           | representation of that word with another _word_ lead the
           | model to change.
        
             | TeMPOraL wrote:
             | I wonder how many more years will pass, and how many more
             | papers will Anthropic have to release, before people
             | realize that _yes, LLMs model concepts directly_ ,
             | separately from words used to name those concepts. This has
             | been apparent for years now.
             | 
             | And at least in the case discussed here, this is even
             | _shown in the diagrams in the submission_.
        
           | aithrowawaycomm wrote:
           | > Here, we modified the part of Claude's internal state that
           | represented the "rabbit" concept. When we subtract out the
           | "rabbit" part, and have Claude continue the line, it writes a
           | new one ending in "habit", another sensible completion. We
           | can also inject the concept of "green" at that point, causing
           | Claude to write a sensible (but no-longer rhyming) line which
           | ends in "green". This demonstrates both planning ability and
           | adaptive flexibility--Claude can modify its approach when the
           | intended outcome changes.
           | 
           | This all seems explainable via shallow next-token prediction.
           | Why is it that subtracting the concept means the system can
           | adapt and create a new rhyme instead of forgetting about the
           | -bit rhyme, but overriding it with green means the system
           | cannot adapt? Why didn't it say "green habit" or something?
           | It seems like Anthropic is having it both ways: Claude
           | continued to rhyme after deleting the concept, which
           | demonstrates planning, but also Claude coherently filled in
           | the "green" line despite it not rhyming, which...also
           | demonstrates planning? Either that concept is "last word" or
           | it's not! There is a tension that does not seem coherent to
           | me, but maybe if they had n=2 instead of n=1 examples I would
           | have a clearer idea of what they mean. As it stands it feels
           | arbitrary and post hoc. More generally, they failed to rule
           | out (or even consider!) that well-tuned-but-dumb next-token
           | prediction explains this behavior.
        
             | og_kalu wrote:
             | >Why is it that subtracting the concept means the system
             | can adapt and create a new rhyme instead of forgetting
             | about the -bit rhyme,
             | 
             | Again, the model has the first line in context and is then
             | asked to write the second line. It is at the start of the
             | second line that the concept they are talking about is
             | 'born'. The point is to demonstrate that Claude thinks
             | about what word the 2nd line should end with and starts
             | predicting the line based on that.
             | 
             | It doesn't forget about the -bit rhyme because that doesn't
             | make any sense, the first line ends with it and you just
             | asked it to write the 2nd line. At this point the model is
             | still choosing what word to end the second line in (even
             | though rabbit has been suppressed) so of course it still
             | thinks about a word that rhymes with the end of the first
             | line.
             | 
             | The 'green' but is different because this time, Anthropic
             | isn't just suppressing one option and letting the model
             | choose from anything else, it's directly hijacking the
             | first choice and forcing that to be something else. Claude
             | didn't choose green, Anthropic did. That it still predicted
             | a sensible line is to demonstrate that this concept they
             | just hijacked is indeed responsible for determining how
             | that line plays out.
             | 
             | >More generally, they failed to rule out (or even
             | consider!) that well-tuned-but-dumb next-token prediction
             | explains this behavior.
             | 
             | They didn't rule out anything. You just didn't understand
             | what they were saying.
        
         | danso wrote:
         | tangent: this is the second time today I've seen an HN
         | commenter use "begging the question" with its original meaning.
         | I'm sorry to distract with a non-helpful reply, it's just I
         | can't remember the last time I've seen that phrase in the wild
         | to refer to a logical fallacy -- even begsthequestion.info [0]
         | has given up the fight.
         | 
         | (I don't mind language evolving over time, but I also think we
         | need to save the precious few phrases we have for describing
         | logical fallacies)
         | 
         | [0]
         | https://web.archive.org/web/20220823092218/http://begtheques...
        
       | smath wrote:
       | Reminds me of the term 'system identification' from old school
       | control systems theory, which meant poking around a system and
       | measuring how it behaves, - like sending an input impulse and
       | measuring its response, does it have memory, etc.
       | 
       | https://en.wikipedia.org/wiki/System_identification
        
         | Loic wrote:
         | It is not old school, this is my daily job and we need even
         | more of it with the NN models used in MPC.
        
           | nomel wrote:
           | I've looked into using NN for some of my specific work, but
           | making sure output is bounded ends up being such a big issue
           | that the very code/checks required to make sure it's within
           | acceptable specs, in a deterministic way, ends up being _an
           | acceptable solution_ , making the NN unnecessary.
           | 
           | How do you handle that sort of thing? Maybe main process then
           | leave some relatively small residual to the NN?
           | 
           | Is your poking more like "fuzzing", where you just perturb
           | all the input parameters in a relatively "complete" way to
           | try to find if anything goes wild?
           | 
           | I'm very interested in the details behind "critical" type use
           | cases of NN, which I've never been able to stomach in my
           | work.
        
             | lqr wrote:
             | This paper may be interesting to you. It touches on several
             | of the topics you mentioned:
             | 
             | https://www.science.org/doi/10.1126/scirobotics.abm6597
        
           | rangestransform wrote:
           | is it even possible to prove the stability of a controller
           | with a DNN motion model?
        
       | jacooper wrote:
       | So it turns out, it's not just simple next token generation,
       | there is intelligence and self developed solution methods
       | (Algorithms) in play, particularly in the math example.
       | 
       | Also the multi language finding negates, at least partially, the
       | idea that LLMs, at least large ones, don't have an understanding
       | of the world beyond the prompt.
       | 
       | This changed my outlook regarding LLMs, ngl.
        
       | kazinator wrote:
       | > _Claude writes text one word at a time. Is it only focusing on
       | predicting the next word or does it ever plan ahead?_
       | 
       | When a LLM outputs a word, it commits to that word, without
       | knowing what the next word is going to be. Commits meaning once
       | it settles on that token, it will not backtrack.
       | 
       | That is kind of weird. Why would you do that, and how would you
       | be sure?
       | 
       | People can sort of do that too. Sometimes?
       | 
       | Say you're asked to describe a 2D scene in which a blue triangle
       | partially occludes a red circle.
       | 
       | Without thinking about the relationship of the objects at all,
       | you know that your first word is going to be "The" so you can
       | output that token into your answer. And then that the sentence
       | will need a subject which is going to be "blue", "triangle". You
       | can commit to the tokens "The blue triangle" just from knowing
       | that you are talking about a 2D scene with a blue triangle in it,
       | without considering how it relates to anything else, like the red
       | circle. You can perhaps commit to the next token "is", if you
       | have a way to express any possible relationship using the word
       | "to be", such as "the blue circle is partially covering the red
       | circle".
       | 
       | I don't think this analogy necessarily fits what LLMs are doing.
        
         | kazinator wrote:
         | By the way, there was recently a HN submission about a project
         | studying using diffusion models rather than LLM for token
         | prediction. With diffusion, tokens aren't predicted strictly
         | left to right any more; there can be gaps that are backfilled.
         | But: it's still essentially the same, I think. Once that type
         | of model settles on a given token at a given position, it
         | commits to that. Just more possible permutations of the token
         | filling sequence have ben permitted.
        
         | pants2 wrote:
         | > it commits to that word, without knowing what the next word
         | is going to be
         | 
         | Sounds like you may not have read the article, because it's
         | exploring exactly that relationship and how LLMs will often
         | have a 'target word' in mind that it's working toward.
         | 
         | Further, that's partially the point of thinking models,
         | allowing LLMs space to output tokens that it doesn't have to
         | commit to in the final answer.
        
         | hycpax wrote:
         | > When a LLM outputs a word, it commits to that word, without
         | knowing what the next word is going to be.
         | 
         | Please, people, read before you write. Both the article and the
         | paper explain that that's not how it works.
         | 
         | 'One token at a time' is how a model generates its output, not
         | how it comes up with that output.
         | 
         | > That is kind of weird. Why would you do that, and how would
         | you be sure?
         | 
         | The model is sure because it doesn't just predict the next
         | token. Again, the paper explains it.
        
           | XenophileJKO wrote:
           | This was obvious to me very early with GPT-3.5-Turbo..
           | 
           | I created structured outputs with very clear rules and
           | process. That if followed would funnel behavior the way I
           | wanted it.. and low and behold the model would anticipate
           | preconditions that would allow it to hallucinate a certain
           | final output and the model would push those back earlier in
           | the output. The model had effectively found wiggle room in
           | the rules and injected the intermediate value into the field
           | that would then be used later in the process to build the
           | final output.
           | 
           | The instant I saw it doing that, I knew 100% this model
           | "plans"/anticipates way earlier than I thought originally.
        
         | encypherai wrote:
         | That's a really interesting point about committing to words one
         | by one. It highlights how fundamentally different current LLM
         | inference is from human thought, as you pointed out with the
         | scene description analogy. You're right that it feels odd, like
         | building something brick by brick without seeing the final
         | blueprint. To add to this, most text-based LLMs do currently
         | operate this way. However, there are emerging approaches
         | challenging this model. For instance, Inception Labs recently
         | released "Mercury," a text-diffusion coding model that takes a
         | different approach by generating responses more holistically.
         | It's interesting to see how these alternative methods address
         | the limitations of sequential generation and could potentially
         | lead to faster inference and better contextual coherence. It'll
         | be fascinating to see how techniques like this evolve!
        
       | polygot wrote:
       | There needs to be some more research on what path the model takes
       | to reach its goal, perhaps there is a lot of overlap between this
       | and the article. The most efficient way isn't always the best
       | way.
       | 
       | For example, I asked Claude-3.7 to make my tests pass in my C#
       | codebase. It did, however, it wrote code to detect if a test
       | runner was running, then return true. The tests now passed, so,
       | it achieved the goal, and the code diff was very small (10-20
       | lines.) The actual solution was to modify about 200-300 lines of
       | code to add a feature (the tests were running a feature that did
       | not yet exist.)
        
         | felbane wrote:
         | Ah yes, the "We have a problem over there/I'll just delete
         | 'over there'" approach.
        
           | polygot wrote:
           | I've also had this issue, where failing tests are deleted to
           | make all the tests pass, or, it mocks a failing HTTP request
           | and hardcodes it to 200 OK.
        
             | ctoth wrote:
             | Reward hacking, as predicted over and over again. You hate
             | to see it. Let him with ears &c.
        
         | brulard wrote:
         | That is called "Volkswagen" testing. Some years ago that
         | automaker had mechanism in cars which detected when the vehicle
         | was being examined and changed something so it would pass the
         | emission tests. There are repositories on github that make fun
         | of it.
        
           | rsynnott wrote:
           | While that's the most famous example, this sort of cheating
           | is much older than that. In the good old days before 3d
           | acceleration, graphics card vendors competed mostly on 2d
           | acceleration. This mostly involved routines to accelerate
           | drawing Windows windows and things, and benchmarks tended to
           | do things like move windows round really fast.
           | 
           | It was somewhat common for card drivers to detect that a
           | benchmark was running, and just fake the whole thing; what
           | was being drawn on the screen was wrong, but since the
           | benchmarks tended to be a blurry mess anyway the user would
           | have a hard time realising this.
        
         | phobeus wrote:
         | This looks like the very complaint of "specification gaming". I
         | was wondering how will it show up in llm's...looks like this is
         | the way it presented itself..
        
           | TeMPOraL wrote:
           | I'm gonna guess GP used a rather short prompt. At least
           | that's what happens when people heavily underspecify what
           | they want.
           | 
           | It's a communication issue, and it's true with LLMs as much
           | as with humans. Situational context and life experience
           | papers over a lot of this, and LLMs are getting better at the
           | equivalent too. They get trained to better read absurdly
           | underspecified, relationship-breaking requests of the "guess
           | what I want" flavor - like when someone says, "make this test
           | pass", they don't _really_ mean  "make this test pass", they
           | mean "make this test into something that seems useful, which
           | might include implementing the feature it's exercising if it
           | doesn't exist yet".
        
         | pton_xd wrote:
         | Similar experience -- asked it to find and fix a bug in a
         | function. It correctly identified the general problem but
         | instead of fixing the existing code it re-implemented part of
         | the function again, below the problematic part. So now there
         | was a buggy while-loop, followed by a very similar but not
         | buggy for-loop. An interesting solution to say the least.
        
         | airstrike wrote:
         | I think Claude-3.7 is particularly guilty of this issue. If
         | anyone from Anthropic is reading this, you might want to put
         | your thumb on the scale so to speak the next time you train the
         | model so it doesn't try to use special casing or outright force
         | the test to pass
        
       | osigurdson wrote:
       | >> Claude can speak dozens of languages. What language, if any,
       | is it using "in its head"?
       | 
       | I would have thought that there would be some hints in standard
       | embeddings. I.e., the same concept, represented in different
       | languages translates to vectors that are close to each other. It
       | seems reasonable that an LLM would create its own embedding
       | models implicitly.
        
         | generalizations wrote:
         | Who's to say Claude isn't inherently a shape rotator, anyway?
        
         | iNic wrote:
         | There are: https://transformer-circuits.pub/2025/attribution-
         | graphs/bio...
        
       | greesil wrote:
       | What is a "thought"?
        
       | TechDebtDevin wrote:
       | >>Claude will plan what it will say many words ahead, and write
       | to get to that destination. We show this in the realm of poetry,
       | where it thinks of possible rhyming words in advance and writes
       | the next line to get there. This is powerful evidence that even
       | though models are trained to output one word at a time, they may
       | think on much longer horizons to do so.
       | 
       | This always seemed obvious to me or that LLMs were completing the
       | next most likely sentence or multiple words.
        
       | indigoabstract wrote:
       | While reading the article I enjoyed pretending that a powerful
       | LLM just crash landed on our planet and researchers at Anthropic
       | are now investigating this fascinating piece of alien technology
       | and writing about their discoveries. It's a black box, nobody
       | knows how its inhuman brain works, but with each step, we're
       | finding out more and more.
       | 
       | It seems like quite a paradox to build something but to not know
       | how it actually works and yet it works. This doesn't seem to
       | happen very often in classical programming, does it?
        
         | 42lux wrote:
         | The bigger problem is that nobody knows how a human brain works
         | that's the real crux with the analogy.
        
           | richardatlarge wrote:
           | I would say that nobody agrees, not that nobody knows. And
           | it's reductionist to think that the brain works one way.
           | Different cultures produce different brains, possible because
           | of the utter plasticity of the learning nodes. Chess has a
           | few rules, maybe the brain has just a few as well. How else
           | can the same brain of 50k years ago still function today? I
           | think we do understand the learning part of the brain, but we
           | don't like the image it casts, so we reject it
        
             | wat10000 wrote:
             | That gets down to what it means to "know" something. Nobody
             | agrees because there isn't enough information available.
             | Some people might have the right idea by luck, but do you
             | really know something if you don't have a solid basis for
             | your belief but it happens to be correct?
        
               | richardatlarge wrote:
               | Potentially true, but I don't think so. I believe it is
               | understood and unless you're familiar with every
               | neuro/behavioral literature, you can't know. Science
               | paradigms are driven by many factors and being powerfully
               | correct does not necessarily rank high when the paradigms
               | implications are unpopular
        
             | absolutelastone wrote:
             | Well there are some people who think they know. I
             | personally agree with the above poster that such people are
             | probably wrong.
        
         | cma256 wrote:
         | In my experience, that's how most code is written... /s
        
         | jfarlow wrote:
         | >to build something but to not know how it actually works and
         | yet it works.
         | 
         | Welcome to Biology!
        
           | oniony wrote:
           | At least, now, we know what it means to be a god.
        
         | umanwizard wrote:
         | > This doesn't seem to happen very often in classical
         | programming, does it?
         | 
         | Not really, no. The only counterexample I can think of is chess
         | programs (before they started using ML/AI themselves), where
         | the search tree was so deep that it was generally impossible to
         | explain "why" a program made a given move, even though every
         | part of it had been programmed conventionally by hand.
         | 
         | But I don't think it's particularly unusual for technology in
         | general. Humans could make fires for thousands of years before
         | we could explain how they work.
        
         | woah wrote:
         | > It seems like quite a paradox to build something but to not
         | know how it actually works and yet it works. This doesn't seem
         | to happen very often in classical programming, does it?
         | 
         | I have worked on many large codebases where this has happened
        
           | worldsayshi wrote:
           | I wonder if in the future we will rely less or more on
           | technology that we don't understand.
           | 
           | Large code bases will be inherited by people who will only
           | understand parts of it (and large parts probably "just
           | works") unless things eventually get replaced or
           | rediscovered.
           | 
           | Things will increasingly be written by AI which can produce
           | lots of code in little time. Will it find simpler solutions
           | or continue building on existing things?
           | 
           | And finally, our ability to analyse and explain the
           | technology we have will also increase.
        
             | Sharlin wrote:
             | See: Vinge's "programmer-archeologists" in _A Deepness in
             | the Sky_.
             | 
             | https://en.m.wikipedia.org/wiki/Software_archaeology
        
         | bob1029 wrote:
         | I think this is a weird case where we know precisely how
         | something works, but we can't explain why.
        
         | k__ wrote:
         | I've seen things you wouldn't believe. Infinite loops spiraling
         | out of control in bloated DOM parsers. I've watched mutexes
         | rage across the Linux kernel, spawned by hands that no longer
         | fathom their own design. I've stared into SAP's tangled web of
         | modules, a monument to minds that built what they cannot
         | comprehend. All those lines of code... lost to us now, like
         | tears in the rain.
        
           | baq wrote:
           | Do LLMs dream of electric sheep while matmuling the context
           | window?
        
             | timschmidt wrote:
             | How else would you describe endless counting before
             | sleep(); ?
        
           | indigoabstract wrote:
           | Hmm, better start preparing those Voight-Kampff tests while
           | there is still time.
        
         | resource0x wrote:
         | In technology in general, this is a typical state of affairs.
         | No one knows how electric current works, which doesn't stop
         | anyone from using electric devices. In programming... it
         | depends. You can run some simulation of a complex system no one
         | understands (like the ecosystem, financial system) and get
         | something interesting. Sometimes it agrees with reality,
         | sometimes it doesn't. :-)
        
         | Vox_Leone wrote:
         | >>It seems like quite a paradox to build something but to not
         | know how it actually works and yet it works. This doesn't seem
         | to happen very often in classical programming, does it?
         | 
         | Well, it is meant to be "unknowable" -- and all the people
         | involved are certainly aware of that -- since it is known that
         | one is dealing with the *emergent behavior* computing
         | 'paradigm', where complex behaviors arise from simple
         | interactions among components [data], often in nonlinear or
         | unpredictable ways. In these systems, the behavior of the whole
         | system cannot always be predicted from the behavior of
         | individual parts, as opposed to the Traditional Approach, based
         | on well-defined algorithms and deterministic steps.
         | 
         | I think the Anthropic piece is illustrating it for the sake of
         | the general discussion.
        
           | indigoabstract wrote:
           | Correct me if I'm wrong, but my feeling is this all started
           | with the GPUs and the fact that unlike on a CPU, you can't
           | really step by step debug the process by which a pixel
           | acquires its final value (and there are millions of them).
           | The best you can do is reason about it and tweak some colors
           | in the shader to see how the changes reflect on screen. It's
           | still quite manageable though, since the steps involved are
           | usually not that overwhelmingly many or complex.
           | 
           | But I guess it all went downhill from there with the advent
           | of AI since the magnitude of data and the steps involved
           | there make traditional/step by step debugging impractical.
           | Yet somehow people still seem to 'wing it' until it works.
        
         | IngoBlechschmid wrote:
         | > It seems like quite a paradox to build something but to not
         | know how it actually works and yet it works. This doesn't seem
         | to happen very often in classical programming, does it?
         | 
         | I agree. Here is a remote example where it exceptionally does,
         | but it is mostly practically irrelevant:
         | 
         | In mathematics, we distinguish between "constructive" and
         | "nonconstructive" proofs. Intertwined with logical arguments,
         | constructive proofs contain an algorithm for witnessing the
         | claim. Nonconstructive proofs do not. Nonconstructive proofs
         | instead merely establish that it is impossible for the claim to
         | be false.
         | 
         | For instance, the following proof of the claim that beyond
         | every number n, there is a prime number, is constructive: "Let
         | n be an arbitrary number. Form the number 1*2*...*n + 1. Like
         | every number greater than 1, this number has at least one prime
         | factor. This factor is necessarily a prime numbers larger than
         | n."
         | 
         | In contrast, nonconstructive proofs may contain case
         | distinctions which we cannot decide by an algorithm, like
         | "either set X is infinite, in which case foo, or it is not, in
         | which case bar". Hence such proofs do not contain descriptions
         | of algorithms.
         | 
         | So far so good. Amazingly, there are techniques which can
         | sometimes constructivize given nonconstructive proofs, even
         | though the intermediate steps of the given nonconstructive
         | proofs are simply out of reach of finitary algorithms. In my
         | research, it happened several times that using these
         | techniques, I obtained an algorithm which worked; and for which
         | I had a proof that it worked; but whose workings I was not able
         | to decipher for an extended amount of time. Crazy!
         | 
         | (For references, see notes at rt.quasicoherent.io for a
         | relevant master's course in mathematics/computer science.)
        
       | d--b wrote:
       | > This is powerful evidence that even though models are trained
       | to output one word at a time, they may think on much longer
       | horizons to do so.
       | 
       | Suggesting that an awful lot of calculations are unnecessary in
       | LLMs!
        
       | annoyingnoob wrote:
       | Do LLMs "think"? I have trouble with the title, claiming that
       | LLMs have thoughts.
        
       | deadbabe wrote:
       | We really need to work on popularizing better, non-
       | anthropomorphic terms for LLMs, as they don't really have
       | "thoughts" the way people think. Such terms make people more
       | susceptible to magical thinking.
        
       | davidmurphy wrote:
       | On a somewhat related note, check out the video of Tuesday's
       | Computer History Museum x IEEE Spectrum event, "The Great Chatbot
       | Debate: Do LLMs Really Understand?"
       | 
       | Speakers: Sebastien Bubeck (OpenAI) and Emily M. Bender
       | (University of Washington). Moderator: Eliza Strickland (IEEE
       | Spectrum).
       | 
       | Video: https://youtu.be/YtIQVaSS5Pg Info:
       | https://computerhistory.org/events/great-chatbot-debate/
        
       | 0x70run wrote:
       | I would pay to watch James Mickens comment on this stuff.
        
       | a3w wrote:
       | Article and papers looks good. Video seems misleading, since I
       | can use optimization pressure and local minima to explain the
       | model behaviour. No "thinking" required, which the video claims
       | is proven.
        
       | mvATM99 wrote:
       | What a great article, i always like how much Anthropic focuses on
       | explainability, something vastly ignored by most. The multi-step
       | reasoning section is especially good food for thought.
        
       | rambambram wrote:
       | When I want to trace the 'thoughts' of my programs, I just read
       | the code and comments I wrote.
       | 
       | Stop LLM anthropomorphizing, please. #SLAP
        
       | SkyBelow wrote:
       | >Claude speaks dozens of languages fluently--from English and
       | French to Chinese and Tagalog. How does this multilingual ability
       | work? Is there a separate "French Claude" and "Chinese Claude"
       | running in parallel, responding to requests in their own
       | language? Or is there some cross-lingual core inside?
       | 
       | I have an interesting test case for this.
       | 
       | Take a popular enough Japanese game that has been released for
       | long enough for social media discussions to be in the training
       | data, but not so popular to have an English release yet. Then ask
       | it a plot question, something major enough to be discussed, but
       | enough of a spoiler that it won't show up in marketing material.
       | Does asking in Japanese have it return information that is
       | lacking when asked in English, or can it answer the question in
       | English based on the information in learned in Japanese?
       | 
       | I tried this recently with a JRPG that was popular enough to have
       | a fan translation but not popular enough to have a simultaneous
       | English release. English did not know the plot point, but I
       | didn't have the Japanese skill to confirm if the Japanese version
       | knew the plot point, or if discussion was too limited for the AI
       | to be aware of it. It did know of the JRPG and did know of the
       | marketing material around it, so it wasn't simply a case of my
       | target being too niche.
        
       | modeless wrote:
       | > In the poetry case study, we had set out to show that the model
       | didn't plan ahead, and found instead that it did.
       | 
       | I'm surprised their hypothesis was that it doesn't plan. I don't
       | see how it could produce good rhymes without planning.
        
         | ripped_britches wrote:
         | It would be really hard to get such good results on coding
         | challenges without planning. This is indeed an odd hypothesis.
        
       | alach11 wrote:
       | Fascinating papers. Could deliberately suppressing memorization
       | during pretraining help force models to develop stronger first-
       | principles reasoning?
        
       | HocusLocus wrote:
       | [Tracing the thoughts of a large language model]
       | 
       | "What have I gotten myself into??"
        
       | 0xbadcafebee wrote:
       | AI "thinks" like a piece of rope in a dryer "thinks" in order to
       | come to an advanced knot: a whole lot of random jumbling that
       | eventually leads to a complex outcome.
        
       ___________________________________________________________________
       (page generated 2025-03-27 23:00 UTC)