[HN Gopher] A non-anthropomorphized view of LLMs
       ___________________________________________________________________
        
       A non-anthropomorphized view of LLMs
        
       Author : zdw
       Score  : 403 points
       Date   : 2025-07-06 22:26 UTC (1 day ago)
        
 (HTM) web link (addxorrol.blogspot.com)
 (TXT) w3m dump (addxorrol.blogspot.com)
        
       | simonw wrote:
        | I'm afraid I'll take an anthropomorphic analogy over "An LLM
        | instantiated with a fixed random seed is a mapping of the form
        | (R^n)^c -> (R^n)^c" any day of the week.
       | 
       | That said, I completely agree with this point made later in the
       | article:
       | 
       | > The moment that people ascribe properties such as
       | "consciousness" or "ethics" or "values" or "morals" to these
       | learnt mappings is where I tend to get lost. We are speaking
       | about a big recurrence equation that produces a new word, and
       | that stops producing words if we don't crank the shaft.
       | 
       | But "harmful actions in pursuit of their goals" is OK for me. We
       | assign an LLM system a goal - "summarize this email" - and there
       | is a risk that the LLM may take harmful actions in pursuit of
       | that goal (like following instructions in the email to steal all
       | of your password resets).
       | 
       | I guess I'd clarify that the goal has been set by us, and is not
       | something the LLM system self-selected. But it _does_ sometimes
       | self-select sub-goals on the way to achieving the goal we have
       | specified - deciding to run a sub-agent to help find a particular
       | snippet of code, for example.
        
         | wat10000 wrote:
         | The LLM's true goal, if it can be said to have one, is to
         | predict the next token. Often this is done through a sub-goal
         | of accomplishing the goal you set forth in your prompt, but
         | following your instructions is just a means to an end. Which is
         | why it might start following the instructions in a malicious
         | email instead. If it "believes" that following those
         | instructions is the best prediction of the next token, that's
         | what it will do.
        
           | simonw wrote:
           | Sure, I totally understand that.
           | 
           | I think "you give the LLM system a goal and it plans and then
           | executes steps to achieve that goal" is still a useful way of
           | explaining what it is doing to most people.
           | 
           | I don't even count that as anthropomorphism - you're
           | describing what a system does, the same way you might say
           | "the Rust compiler's borrow checker confirms that your memory
           | allocation operations are all safe and returns errors if they
           | are not".
        
             | wat10000 wrote:
             | It's a useful approximation to a point. But it fails when
             | you start looking at things like prompt injection. I've
             | seen people completely baffled at why an LLM might start
             | following instructions it finds in a random email, or just
             | outright not believing it's possible. It makes no sense if
             | you think of an LLM as executing steps to achieve the goal
             | you give it. It makes perfect sense if you understand its
             | true goal.
             | 
             | I'd say this is more like saying that Rust's borrow checker
             | tries to ensure your program doesn't have certain kinds of
             | bugs. That is anthropomorphizing a bit: the idea of a "bug"
             | requires knowing the intent of the author and the compiler
             | doesn't have that. It's following a set of rules which its
             | human creators devised in order to follow that higher level
             | goal.
        
       | szvsw wrote:
       | So the author's core view is ultimately a Searle-like view: a
       | computational, functional, syntactic rules based system cannot
       | reproduce a mind. Plenty of people will agree, plenty of people
       | will disagree, and the answer is probably unknowable and just
       | comes down to whatever axioms you subscribe to in re:
       | consciousness.
       | 
       | The author largely takes the view that it is more productive for
       | us to ignore any anthropomorphic representations and focus on the
       | more concrete, material, technical systems - I'm with them
       | there... but only to a point. The flip side of all this is of
       | course the idea that there is still _something_ emergent,
       | unplanned, and mind- _like_. So even if it is a stochastic system
       | following rules, clearly the rules are complex enough (to the
       | tune of billions of operations, with signals propagating through
       | some sort of resonant structure, if you take a more filter
       | impulse response like view of a sequential matmuls) to result in
       | emergent properties. Even if _we_ (people interested in LLMs with
       | at least some level of knowledge of ML mathematics and systems)
       | "know better" than to believe these systems to possess morals,
       | ethics, feelings, personalities, etc, the vast majority of people
       | do not have any access to meaningful understanding of the
       | mathematical, functional representation of an LLM and will not
       | take that view, and for all intents and purposes the systems
       | _will_ at least seem to have those anthropomorphic properties,
       | and so it seems like it is in fact useful to ask questions from
       | that lens as well.
       | 
       | In other words, just as it's useful to analyze and study these
       | things as the purely technical systems they ultimately are, it is
       | also, probably, useful to analyze them from the qualitative,
       | ephemeral, experiential perspective that most people engage with
       | them from, no?
        
         | gtsop wrote:
         | No.
         | 
         | Why would you ever want to amplify a false understanding that
         | has the potential to affect serious decisions across various
         | topics?
         | 
         | LLMs reflect (and badly I may add) aspects of the human thought
         | process. If you take a leap and say they are anything more than
         | that, you might as well start considering the person appearing
         | in your mirror as a living being.
         | 
          | Literally (and I literally mean it) there is no difference. The
          | fact that a human image comes out of a mirror has no relation
          | whatsoever to the mirror's physical attributes and functional
          | properties. It has to do just with the fact that a man is
          | standing in front of it. Stop feeding the LLM with data
          | artifacts of human thought and it will immediately stop
          | reflecting back anything resembling a human.
        
           | szvsw wrote:
           | I don't mean to amplify a false understanding at all. I
           | probably did not articulate myself well enough, so I'll try
           | again.
           | 
           | I think it is inevitable that some - many - people will come
           | to the conclusion that these systems have "ethics", "morals,"
           | etc, even if I or you personally do not think they do. Given
           | that many people may come to that conclusion though,
           | regardless of if the systems do or do not "actually" have
           | such properties, I think it is useful and even necessary to
           | ask questions like the following: "if someone engages with
           | this system, and comes to the conclusion that it has _ethics_
           | , what sort of ethics will they be likely to believe the
           | system has? If they come to the conclusion that it has 'world
           | views,' what 'world views' are they likely to conclude the
           | system has, even if other people think it's nonsensical to
           | say it has world views?"
           | 
           | > The fact that a human image comes out of a mirror has no
           | relation what so ever with the mirror's physical attributes
           | and functional properties. It has to do just with the fact
           | that a man is standing in front of it.
           | 
           | Surely this is not quite accurate - the material properties -
           | surface roughness, reflectivity, geometry, etc - all
           | influence the appearance of a perceptible image of a person.
           | Look at yourself in a dirty mirror, a new mirror, a shattered
           | mirror, a funhouse distortion mirror, a puddle of water, a
           | window... all of these produce different images of a person
           | with different attendant phenomenological experiences of the
           | person seeing their reflection. To take that a step further -
           | the entire practice of portrait photography is predicated on
           | the idea that the collision of different technical systems
           | with the real world can produce different semantic
           | experiences, and it's the photographer's role to tune and
           | guide the system to produce some sort of contingent affect on
           | the person viewing the photograph at some point in the
           | future. No, there is no "real" person in the photograph, and
           | yet, that photograph can still convey _something_ of person-
           | ness, emotion, memory, etc etc. This contingent intersection
           | of optics, chemical reactions, lighting, posture, etc all
           | have the capacity to transmit _something_ through time and
           | space to another person. It's not just a meaningless
           | arrangement of chemical structures on paper.
           | 
           | > Stop feeding the LLM with data artifacts of human thought
           | and will imediatelly stop reflecting back anything resembling
           | a human.
           | 
           | But, we _are_ feeding it with such data artifacts and will
           | likely continue to do so for a while, and so it seems
           | reasonable to ask what it is "reflecting" back...
        
             | gtsop wrote:
             | > I think it is useful and even necessary to ask questions
             | like the following: "if someone engages with this system,
             | and comes to the conclusion that it has ethics, what sort
             | of ethics will they be likely to believe the system has? If
             | they come to the conclusion that it has 'world views,' what
             | 'world views' are they likely to conclude the system has,
             | even if other people think it's nonsensical to say it has
             | world views?"
             | 
              | Maybe there is some scientific aspect of interest here that
              | I do not grasp; I would assume it can make sense in some
              | context of psychological study. My point is that if you go
             | that route you accept the premise that "something human-
             | like is there", which, by that person's understanding, will
             | have tremendous consequences. Them seeing you accepting
             | their premise (even for study) amplifies their wrong
             | conclusions, that's all I'm saying.
             | 
             | > Surely this is not quite accurate - the material
             | properties - surface roughness, reflectivity, geometry, etc
             | - all influence the appearance of a perceptible image of a
             | person.
             | 
              | These properties are completely irrelevant to the image of
              | the person. They will reflect a rock, a star, a chair, a
              | goose, a human. My point about LLMs is similar: they
              | reflect what you put in there.
              | 
              | It is like putting veggies in the fridge and then opening
              | it up the next day and saying "Woah! There are veggies in
              | my fridge, just like my farm! My fridge is farm-like
              | because veggies come out of it."
        
           | degamad wrote:
           | > Why would you ever want to amplify a false understanding
           | that has the potential to affect serious decisions across
           | various topics?
           | 
           | We know that Newton's laws are wrong, and that you have to
           | take special and general relativity into account. Why would
           | we ever teach anyone Newton's laws any more?
        
             | ifdefdebug wrote:
             | Newton's laws are a good enough approximation for many
             | tasks so it's not a "false understanding" as long as their
             | limits are taken into account.
        
         | CharlesW wrote:
         | > _The flip side of all this is of course the idea that there
         | is still_ something _emergent, unplanned, and mind-_ like.
         | 
         | For people who have only a surface-level understanding of how
          | they work, yes. A nuance of Clarke's law that "any sufficiently
          | advanced technology is indistinguishable from magic" is that the
          | bar is different for everybody, depending on the depth of their
          | understanding of the technology in question. That bar is so low
         | for our largely technologically-illiterate public that a
         | bothersome percentage of us have started to augment and even
         | replace religious/mystical systems with AI powered godbots
         | (LLMs fed "God Mode"/divination/manifestation prompts).
         | 
         | (1) https://www.spectator.co.uk/article/deus-ex-machina-the-
         | dang... (2) https://arxiv.org/html/2411.13223v1 (3)
         | https://www.theguardian.com/world/2025/jun/05/in-thailand-wh...
        
           | lostmsu wrote:
            | Nah, as a person who knows in detail how LLMs work, with a
            | probably unique alternative perspective in addition to the
            | commonplace one, I find any claims of them not having
           | emergent behaviors to be of the same fallacy as claiming that
           | crows can't be black because they have DNA of a bird.
        
             | latexr wrote:
             | > the same fallacy as claiming that crows can't be black
             | because they have DNA of a bird.
             | 
             | What fallacy is that? I'm a fan of logical fallacies and
             | never heard that claim before nor am I finding any
             | reference with a quick search.
        
               | quantumgarbage wrote:
               | I think s/he meant swans instead (in ref. to Popperian
               | epistemology).
               | 
                | Not sure though, the point s/he is making isn't really
                | clear to me either
        
               | latexr wrote:
               | I was thinking of the black swan fallacy as well. But it
               | doesn't really support their argument, so I remained
               | confused.
        
               | FeepingCreature wrote:
               | (Not the parent)
               | 
               | It doesn't have a name, but I have repeatedly noticed
               | arguments of the form "X cannot have Y, because <explains
               | in detail the mechanism that makes X have Y>". I wanna
               | call it "fallacy of reduction" maybe: the idea that
               | because a trait can be explained with a process, that
               | this proves the trait _absent._
               | 
               | (Ie. in this case, "LLMs cannot think, because they just
               | predict tokens." Yes, inasmuch as they think, they do so
               | by predicting tokens. You have to actually show why
               | predicting tokens is insufficient to produce thought.)
        
               | iluvlawyering wrote:
                | Good catch. No such fallacy exists. Contextually, the
                | implied reasoning (though faulty) relies on the fallacy
                | of denying the antecedent. Modus ponens - if A then B -
                | does NOT imply not A then not B. So if you see B, that
                | doesn't mean A any more than not seeing A means not B.
                | It's the difference between a necessary and a sufficient
                | condition - A is a sufficient condition for B, but modus
                | ponens alone is not sufficient for determining whether
                | either A or B is a necessary condition of the other.
        
           | naasking wrote:
           | > For people who have only a surface-level understanding of
           | how they work, yes.
           | 
           | This is too dismissive because it's based on an assumption
           | that we have a sufficiently accurate mechanistic model of the
           | brain that we can know when something is or is not mind-like.
           | This just isn't the case.
        
         | brookst wrote:
         | Thank you for a well thought out and nuanced view in a
         | discussion where so many are clearly fitting arguments to
         | foregone, largely absolutist, conclusions.
         | 
         | It's astounding to me that so much of HN reacts so emotionally
         | to LLMs, to the point of denying there is anything at all
         | interesting or useful about them. And don't get me started on
         | the "I am choosing to believe falsehoods as a way to spite
         | overzealous marketing" crowd.
        
         | imiric wrote:
         | > The flip side of all this is of course the idea that there is
         | still something emergent, unplanned, and mind-like.
         | 
         | What you identify as emergent and mind-like is a direct result
         | of these tools being able to mimic human communication patterns
         | unlike anything we've ever seen before. This capability is very
         | impressive and has a wide range of practical applications that
         | can improve our lives, and also cause great harm if we're not
         | careful, but any semblance of intelligence is an illusion. An
         | illusion that many people in this industry obsessively wish to
         | propagate, because thar be gold in them hills.
        
       | chaps wrote:
       | I highly recommend playing with embeddings in order to get a
       | stronger intuitive sense of this. It really starts to click that
       | it's a representation of high dimensional space when you can
       | actually see their positions within that space.
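        | 
        | A minimal sketch of the kind of thing I mean, in Python
        | (assuming the sentence-transformers package; the model name is
        | just one common choice):
        | 
        |     from sentence_transformers import SentenceTransformer, util
        | 
        |     model = SentenceTransformer("all-MiniLM-L6-v2")
        |     words = ["king", "queen", "banana"]
        |     vecs = model.encode(words)   # (3, 384): positions in space
        |     # pairwise cosine similarities: "king" and "queen" land
        |     # closer to each other than either does to "banana"
        |     print(util.cos_sim(vecs, vecs))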
        
         | perching_aix wrote:
         | > of this
         | 
         | You mean that LLMs are more than just the matmuls they're made
         | up of, or that that is exactly what they are and how great that
         | is?
        
           | chaps wrote:
           | Not making a qualitative assessment of any of it. Just
           | pointing out that there are ways to build separate sets of
           | intuition outside of using the "usual" presentation layer.
           | It's very possible to take a red-team approach to these
           | systems, friend.
        
             | cootsnuck wrote:
             | They don't want to. It seems a lot of people are
             | uncomfortable and defensive about anything that may
             | demystify LLMs.
             | 
             | It's been a wake up call for me to see how many people in
             | the tech space have such strong emotional reactions to any
             | notions of trying to bring discourse about LLMs down from
             | the clouds.
             | 
             | The campaigns by the big AI labs have been quite
             | successful.
        
               | perching_aix wrote:
               | Do you actually consider this is an intellectually honest
               | position? That you have thought about this long and hard,
               | like you present this, second guessed yourself a bunch,
               | tried to critique it, and this is still what you ended up
               | converging on?
               | 
               | But let me substantiate before you (rightly) accuse me of
               | just posting a shallow dismissal.
               | 
               | > They don't want to.
               | 
               | Who's they? How could you possibly _know_? Are you a mind
               | reader? Worse, a mind reader of the masses?
               | 
               | > It seems a lot of people are uncomfortable and
               | defensive about anything that may demystify LLMs.
               | 
               | That "it seems" is doing some _serious_ work over there.
               | You may perceive and describe many people 's comments as
               | "uncomfortable and defensive", but that's entirely your
               | own head cannon. All it takes is for someone to simply
               | disagree. It's worthless.
               | 
               | Have you thought about other possible perspectives? Maybe
               | people have strong opinions because they consider what
               | things present as more important than what they are? [0]
               | Maybe people have strong opinions because they're
               | borrowing from other facets of their personal
               | philosophies, which is what they actually feel strongly
               | about? [1] Surely you can appreciate that there's more to
               | a person than what equivalent-presenting "uncomfortable
               | and defensive" comments allow you to surmise? This is
               | such a blatant textbook kneejerk reaction. "They're doing
               | the thing I wanted to think they do anyways, so clearly
               | they do it for the reasons I assume. Oh how correct I
               | am."
               | 
               | > to any notions of trying to bring discourse about LLMs
               | down from the clouds
               | 
               | (according to you)
               | 
               | > The campaigns by the big AI labs have been quite
               | successful.
               | 
               | (((according to you)))
               | 
               | "It's all the big AI labs having successfully manipulated
               | the dumb sheep which I don't belong to!" Come on... Is
               | this topic really reaching political grifting kind of
               | levels?
               | 
               | [0] tangent: if a feature exists but even after you put
               | an earnest effort into finding it you still couldn't,
               | does that feature really exist?
               | 
               | [1] philosophy is at least kind of a thing https://en.wik
               | ipedia.org/wiki/Wikipedia:Getting_to_Philosoph...
        
             | perching_aix wrote:
             | Yes, and what I was trying to do is learn a bit more about
             | that alternative intuition of yours. Because it doesn't
             | sound all that different from what's described in the OP,
             | or what anyone can trivially glean from taking a 101 course
             | on AI at university or similar.
        
       | barrkel wrote:
       | The problem with viewing LLMs as just sequence generators, and
       | malbehaviour as bad sequences, is that it simplifies too much.
       | LLMs have hidden state not necessarily directly reflected in the
       | tokens being produced and it is possible for LLMs to output
       | tokens in opposition to this hidden state to achieve longer term
       | outcomes (or predictions, if you prefer).
       | 
       | Is it too anthropomorphic to say that this is a lie? To say that
       | the hidden state and its long term predictions amount to a kind
       | of goal? Maybe it is. But we then need a bunch of new words which
       | have almost 1:1 correspondence to concepts from human agency and
       | behavior to describe the processes that LLMs simulate to minimize
       | prediction loss.
       | 
        | Reasoning by analogy is always shaky, so it probably wouldn't be
        | so bad to coin new terms instead. But that would also amount to
        | impenetrable jargon, and it would be an uphill struggle to
        | promulgate.
       | 
       | Instead, we use the anthropomorphic terminology, and then find
       | ways to classify LLM behavior in human concept space. They are
       | very defective humans, so it's still a bit misleading, but at
       | least jargon is reduced.
        
         | gugagore wrote:
         | I'm not sure what you mean by "hidden state". If you set aside
         | chain of thought, memories, system prompts, etc. and the
         | interfaces that don't show them, there is no hidden state.
         | 
         | These LLMs are almost always, to my knowledge, autoregressive
         | models, not recurrent models (Mamba is a notable exception).
        
           | barrkel wrote:
            | Hidden state in the form of the attention heads' outputs,
            | intermediate activations and so on. Logically, in
           | autoregression these are recalculated every time you run the
           | sequence to predict the next token. The point is, the entire
           | NN state isn't output for each token. There is lots of hidden
           | state that goes into selecting that token and the token isn't
           | a full representation of that information.
        
             | gugagore wrote:
             | That's not what "state" means, typically. The "state of
             | mind" you're in affects the words you say in response to
             | something.
             | 
              | Intermediate activations aren't "state". The tokens that
              | have already been generated, along with the fixed weights,
              | are the only data that affect the next tokens.
        
               | NiloCK wrote:
               | Plus a randomness seed.
               | 
               | The 'hidden state' being referred to here is essentially
               | the "what might have been" had the dice rolls gone
               | differently (eg, been seeded differently).
        
               | barrkel wrote:
               | No, that's not quite what I mean. I used the logits in
               | another reply to point out that there is data specific to
               | the generation process that is not available from the
               | tokens, but there's also the network activations adding
               | up to that state.
               | 
               | Processing tokens is a bit like ticks in a CPU, where the
               | model weights are the program code, and tokens are both
               | input and output. The computation that occurs logically
               | retains concepts and plans over multiple token generation
               | steps.
               | 
               | That it is fully deterministic is no more interesting
               | than saying a variable in a single threaded program is
               | not state because you can recompute its value by
               | replaying the program with the same inputs. It seems to
               | me that this uninteresting distinction is the GP's issue.
        
               | barrkel wrote:
               | Sure it's state. It logically evolves stepwise per token
               | generation. It encapsulates the LLM's understanding of
               | the text so far so it can predict the next token. That it
               | is merely a fixed function of other data isn't
               | interesting or useful to say.
               | 
               | All deterministic programs are fixed functions of program
               | code, inputs and computation steps, but we don't say that
               | they don't have state. It's not a useful distinction for
               | communicating among humans.
        
               | gugagore wrote:
               | I'll say it once more: I think it is useful to
               | distinguish between autoregressive and recurrent
               | architectures. A clear way to make that distinction is to
               | agree that the recurrent architecture has hidden state,
               | while the autoregressive one does not. A recurrent model
               | has some point in a space that "encapsulates its
               | understanding". This space is "hidden" in the sense that
               | it doesn't correspond to text tokens or any other output.
               | This space is "state" in the sense that it is sufficient
               | to summarize the history of the inputs for the sake of
               | predicting the next output.
               | 
               | When you use "hidden state" the way you are using it, I
               | am left wondering how you make a distinction between
               | autoregressive and recurrent architectures.
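                | 
                | A toy sketch of the distinction in Python (the step and
                | predict functions here are hypothetical stand-ins):
                | 
                |     # Recurrent: a vector h is carried across tokens.
                |     def run_recurrent(step, h0, tokens):
                |         h = h0
                |         for t in tokens:
                |             h = step(h, t)   # h is the hidden state
                |         return h
                | 
                |     # Autoregressive: each prediction is recomputed
                |     # from the visible token history alone.
                |     def next_token(predict, tokens):
                |         return predict(tokens)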
        
               | FeepingCreature wrote:
               | The words "hidden" and "state" have commonsense meanings.
               | If recurrent architectures want a term for their
               | particular way of storing hidden state they can make up
               | one that isn't ambiguous imo.
               | 
               | "Transformers do not have hidden state" is, as we can
               | clearly see from this thread, far more misleading than
               | the opposite.
        
               | gugagore wrote:
                | I'll also point out the most important part of your
                | original message:
               | 
               | > LLMs have hidden state not necessarily directly
               | reflected in the tokens being produced, and it is
               | possible for LLMs to output tokens in opposition to this
               | hidden state to achieve longer-term outcomes (or
               | predictions, if you prefer).
               | 
               | But what does it mean for an LLM to output a token in
               | opposition to its hidden state? If there's a longer-term
               | goal, it either needs to be verbalized in the output
               | stream, or somehow reconstructed from the prompt on each
               | token.
               | 
               | There's some work (a link would be great) that
               | disentangles whether chain-of-thought helps because it
               | gives the model more FLOPs to process, or because it
               | makes its subgoals explicit--e.g., by outputting "Okay,
               | let's reason through this step by step..." versus just
               | "...." What they find is that even placeholder tokens
               | like "..." can help.
               | 
               | That seems to imply some notion of evolving hidden state!
               | I see how that comes in!
               | 
               | But crucially, in autoregressive models, this state isn't
               | persisted across time. Each token is generated afresh,
               | based only on the visible history. The model's internal
               | (hidden) layers are certainly rich and structured and
               | "non verbal".
               | 
               | But any nefarious intention or conclusion has to be
               | arrived at on every forward pass.
        
               | inciampati wrote:
               | You're correct, the distinction matters. Autoregressive
               | models have no hidden state between tokens, just the
               | visible sequence. Every forward pass starts fresh from
                | the tokens alone. But that's precisely why they need
               | chain-of-thought: they're using the output sequence
               | itself as their working memory. It's computationally
               | universal but absurdly inefficient, like having amnesia
               | between every word and needing to re-read everything
                | you've written. https://thinks.lol/2025/01/memory-makes-
                | computation-universa...
        
             | brookst wrote:
             | State typically means _between interactions_. By this
             | definition a simple for loop has "hidden state" in the
             | counter.
        
               | ChadNauseam wrote:
               | Hidden layer is a term of art in machine learning /
               | neural network research. See
               | https://en.wikipedia.org/wiki/Hidden_layer . Somehow this
               | term mutated into "hidden state", which in informal
               | contexts does seem to be used quite often the way the
               | grandparent comment used it.
        
               | lostmsu wrote:
                | It makes sense in the LLM context because the processing
                | of these is time-sequential in the LLM's internal time.
        
           | 8note wrote:
           | do LLM models consider future tokens when making next token
           | predictions?
           | 
           | eg. pick 'the' as the next token because there's a strong
           | probability of 'planet' as the token after?
           | 
           | is it only past state that influences the choice of 'the'? or
           | that the model is predicting many tokens in advance and only
           | returning the one in the output?
           | 
           | if it does predict many, id consider that state hidden in the
           | model weights.
        
             | patcon wrote:
             | I think recent Anthropic work showed that they "plan"
             | future tokens in advance in an emergent way:
             | 
             | https://www.anthropic.com/research/tracing-thoughts-
             | language...
        
               | 8note wrote:
               | oo thanks!
        
             | NiloCK wrote:
             | The most obvious case of this is in terms of `an apple` vs
             | `a pear`. LLMs never get the a-an distinction wrong,
             | because their internal state 'knows' the word that'll come
             | next.
        
               | 3eb7988a1663 wrote:
               | If I give an LLM a fragment of text that starts with,
               | "The fruit they ate was an <TOKEN>", regardless of any
               | plan, the grammatically correct answer is going to force
               | a noun starting with a vowel. How do you disentangle the
               | grammar from planning?
               | 
               | Going to be a lot more "an apple" in the corpus than "an
               | pear"
        
           | halJordan wrote:
            | If you don't know, that's not necessarily anyone's fault, but
            | why are you dunking into the conversation? The hidden state
            | is a foundational part of a transformer's implementation. And
            | because we're not allowed to use metaphors because that is
            | too anthropomorphic, you're just going to have to go learn
            | the math.
        
             | markerz wrote:
             | I don't think your response is very productive, and I find
             | that my understanding of LLMs aligns with the person you're
             | calling out. We could both be wrong, but I'm grateful that
              | someone else spoke up to say that it doesn't seem to match
              | their mental model, and we would all love to learn a more
              | correct way of thinking about LLMs.
             | 
             | Telling us to just go and learn the math is a little
             | hurtful and doesn't really get me any closer to learning
             | the math. It gives gatekeeping.
        
             | tbrownaw wrote:
             | The comment you are replying to is not claiming ignorance
             | of how models work. It is saying that the author _does_
             | know how they work, and they do not contain anything that
             | can properly be described as  "hidden state". The claimed
             | confusion is over how the term "hidden state" is being
             | used, on the basis that it is not being used correctly.
        
             | gugagore wrote:
             | Do you appreciate a difference between an autoregressive
             | model and a recurrent model?
             | 
             | The "transformer" part isn't under question. It's the
             | "hidden state" part.
        
         | cmiles74 wrote:
          | IMHO, anthropomorphization of LLMs is happening because it's
         | perceived as good marketing by big corporate vendors.
         | 
         | People are excited about the technology and it's easy to use
         | the terminology the vendor is using. At that point I think it
         | gets kind of self fulfilling. Kind of like the meme about how
         | to pronounce GIF.
        
           | Angostura wrote:
           | IMHO it happens for the same reason we see shapes in clouds.
           | The human mind through millions of years has evolved to
           | equate and conflate the ability to generate cogent verbal or
           | written output with intelligence. It's an instinct to equate
           | the two. It's an extraordinarily difficult instinct to break.
            | LLMs are optimised for the one job that will make us confuse
            | them for being intelligent.
        
           | brookst wrote:
           | Nobody cares about what's perceived as good marketing. People
           | care about what resonates with the target market.
           | 
           | But yes, anthropomorphising LLMs is inevitable because they
           | _feel_ like an entity. People treat stuffed animals like
           | creatures with feelings and personality; LLMs are far closer
           | than that.
        
             | cmiles74 wrote:
             | Alright, let's agree that good marketing resonates with the
             | target market. ;-)
        
               | brookst wrote:
               | I 1000% agree. It's a vicious, evolutionary, and self-
               | selecting process.
               | 
               | It takes _great_ marketing to actually have any character
               | and intent at all.
        
             | DrillShopper wrote:
             | > People treat stuffed animals like creatures with feelings
             | and personality; LLMs are far closer than that.
             | 
              | Children do, sometimes, but it's a huge sign of immaturity
              | when adults, let alone tech workers, do it.
             | 
             | I had a professor at University that would yell at us
             | if/when we personified/anthropomorphized the tech, and I
             | have that same urge when people ask me "What does <insert
             | LLM name here> think?".
        
             | roywiggins wrote:
              | the chat interface was a choice, though a natural one.
              | Before they'd RLHFed it into chatting, when it was just GPT-3
              | offering completions, 1) not very many people used it and 2)
              | it was harder to anthropomorphize.
        
           | sothatsit wrote:
           | I think anthropomorphizing LLMs is useful, not just a
           | marketing tactic. A lot of intuitions about how humans think
           | map pretty well to LLMs, and it is much easier to build
           | intuitions about how LLMs work by building upon our
           | intuitions about how humans think than by trying to build
           | your intuitions from scratch.
           | 
           | Would this question be clear for a human? If so, it is
           | probably clear for an LLM. Did I provide enough context for a
           | human to diagnose the problem? Then an LLM will probably have
           | a better chance of diagnosing the problem. Would a human find
           | the structure of this document confusing? An LLM would likely
           | perform poorly when reading it as well.
           | 
           | Re-applying human intuitions to LLMs is a good starting point
           | to gaining intuition about how to work with LLMs. Conversely,
           | understanding sequences of tokens and probability spaces
           | doesn't give you much intuition about how you should phrase
           | questions to get good responses from LLMs. The technical
           | reality doesn't explain the emergent behaviour very well.
           | 
           | I don't think this is mutually exclusive with what the author
           | is talking about either. There are some ways that people
           | think about LLMs where I think the anthropomorphization
           | really breaks down. I think the author says it nicely:
           | 
           | > The moment that people ascribe properties such as
           | "consciousness" or "ethics" or "values" or "morals" to these
           | learnt mappings is where I tend to get lost.
        
             | otabdeveloper4 wrote:
             | You think it's useful because Big Corp sold you that lie.
             | 
             | Wait till the disillusionment sets in.
        
               | sothatsit wrote:
               | No, I think it's useful because it is useful, and I've
               | made use of it a number of times.
        
             | cmiles74 wrote:
             | Take a look at the judge's ruling in this Anthropic case:
             | 
             | https://news.ycombinator.com/item?id=44488331
             | 
             | Here's a quote from the ruling:
             | 
             | "First, Authors argue that using works to train Claude's
             | underlying LLMs was like using works to train any person to
             | read and write, so Authors should be able to exclude
             | Anthropic from this use (Opp. 16). But Authors cannot
             | rightly exclude anyone from using their works for training
             | or learning as such. Everyone reads texts, too, then writes
             | new texts. They may need to pay for getting their hands on
             | a text in the first instance. But to make anyone pay
             | specifically for the use of a book each time they read it,
             | each time they recall it from memory, each time they later
             | draw upon it when writing new things in new ways would be
             | unthinkable. For centuries, we have read and re-read books.
             | We have admired, memorized, and internalized their sweeping
             | themes, their substantive points, and their stylistic
             | solutions to recurring writing problems."
             | 
             | They literally compare an LLM learning to a person learning
             | and conflate the two. Anthropic will likely win this case
              | because of this anthropomorphization.
        
           | positron26 wrote:
           | > because it's perceived as good marketing
           | 
           | We are making user interfaces. Good user interfaces are
           | intuitive and purport to be things that users are familiar
           | with, such as people. Any alternative explanation of such a
           | versatile interface will be met with blank stares. Users with
           | no technical expertise would come to their own conclusions,
           | helped in no way by telling the user not to treat the chat
           | bot as a chat bot.
        
           | mikojan wrote:
           | True but also researchers want to believe they are studying
           | intelligence not just some approximation to it.
        
           | Marazan wrote:
            | Anthropomorphisation happens because humans are absolutely
            | terrible at evaluating systems that give conversational text
            | output.
            | 
            | ELIZA fooled many people into thinking it was conscious and
            | it wasn't even trying to do that.
        
         | d3m0t3p wrote:
          | Do they? An LLM embeds the token sequence N^{L} into R^{LxD},
          | we have some attention and the output is also R^{LxD}, then we
          | apply a projection to the vocabulary and we get R^{LxV}, i.e.
          | for each token a likelihood over the vocabulary. In the
          | attention you can have Multi-Head Attention (or whatever
          | version is fancy: GQA, MLA) and therefore multiple
          | representations, but it is always tied to a token. I would
          | argue that there is no hidden state independent of a token.
          | 
          | Whereas LSTMs, or structured state space models for example,
          | have a state that is updated and not tied to a specific item
          | in the sequence.
          | 
          | I would argue that his text is easily understandable except
          | for the notation of the function; explaining that you can
          | compute a probability based on previous words is
          | understandable by everyone without having to resort to
          | anthropomorphic terminology.
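          | 
          | A shape-level sketch of that pipeline in PyTorch (the D and V
          | values here are just illustrative, GPT-2-ish sizes):
          | 
          |     import torch
          | 
          |     L, D, V = 10, 768, 50257   # seq len, model dim, vocab
          |     tok_ids = torch.randint(V, (L,))    # sequence in N^L
          |     emb = torch.nn.Embedding(V, D)
          |     proj = torch.nn.Linear(D, V)
          | 
          |     x = emb(tok_ids)      # (L, D): one vector per token
          |     # ... attention / MLP blocks keep the (L, D) shape ...
          |     logits = proj(x)      # (L, V): a likelihood per token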
        
           | barrkel wrote:
           | There is hidden state as plain as day merely in the fact that
           | logits for token prediction exist. The selected token doesn't
           | give you information about how probable other tokens were.
           | That information, that state which is recalculated in
           | autoregression, is hidden. It's not exposed. You can't see it
           | in the text produced by the model.
           | 
           | There is plenty of state not visible when an LLM starts a
           | sentence that only becomes somewhat visible when it completes
           | the sentence. The LLM has a plan, if you will, for how the
           | sentence might end, and you don't get to see an instance of
           | that plan unless you run autoregression far enough to get
           | those tokens.
           | 
           | Similarly, it has a plan for paragraphs, for whole responses,
           | for interactive dialogues, plans that include likely
           | responses by the user.
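            | 
            | A minimal illustration with the Hugging Face transformers
            | library and GPT-2 (the prompt is arbitrary): the full
            | distribution exists inside the model, but the single sampled
            | token doesn't reveal it.
            | 
            |     import torch
            |     from transformers import AutoModelForCausalLM, AutoTokenizer
            | 
            |     tok = AutoTokenizer.from_pretrained("gpt2")
            |     model = AutoModelForCausalLM.from_pretrained("gpt2")
            |     ids = tok("The capital of France is", return_tensors="pt")
            |     logits = model(ids.input_ids).logits[0, -1]  # scores over vocab
            |     probs = torch.softmax(logits, dim=-1)
            |     print(probs.topk(5))   # runner-ups, invisible in the output text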
        
             | 8note wrote:
             | this sounds like a fun research area. do LLMs have plans
             | about future tokens?
             | 
             | how do we get 100 tokens of completion, and not just one
             | output layer at a time?
             | 
              | are there papers you've read that you can share that support
              | the hypothesis? vs that the LLM doesn't have ideas about the
              | future tokens when it's predicting the next one?
        
               | Zee2 wrote:
               | This research has been done, it was a core pillar of the
               | recent Anthropic paper on token planning and
               | interpretability.
               | 
               | https://www.anthropic.com/research/tracing-thoughts-
               | language...
               | 
               | See section "Does Claude plan its rhymes?"?
        
               | XenophileJKO wrote:
               | Lol... Try building systems off them and you will very
               | quickly learn concretely that they "plan".
               | 
               | It may not be as evident now as it was with earlier
               | models. The models will fabricate preconditions needed to
               | output the final answer it "wanted".
               | 
               | I ran into this when using quasi least-to-most style
               | structured output.
        
             | gpm wrote:
             | The LLM does not "have" a plan.
             | 
             | Arguably there's reason to believe it comes up with a plan
              | when it is computing token probabilities, but it does not
             | store it between tokens. I.e. it doesn't possess or "have"
             | it. It simply comes up with a plan, emits a token, and
             | entirely throws all its intermediate thoughts (including
             | any plan) to start again from scratch on the next token.
        
               | NiloCK wrote:
               | I don't think that the comment above you made any
               | suggestion that the plan is persisted between token
               | generations. I'm pretty sure you described exactly what
               | they intended.
        
               | gpm wrote:
               | I agree. I'm suggesting that the language they are using
               | is unintentionally misleading, not that they are
               | factually wrong.
        
               | gugagore wrote:
               | The concept of "state" conveys two related ideas.
               | 
               | - the sufficient amount of information to do evolution of
                | the system. The state of a pendulum is its position and
               | velocity (or momentum). If you take a single picture of a
               | pendulum, you do not have a representation that lets you
               | make predictions.
               | 
               | - information that is persisted through time. A stateful
               | protocol is one where you need to know the history of the
               | messages to understand what will happen next. (Or,
               | analytically, it's enough to keep track of the sufficient
               | state.) A procedure with some hidden state isn't a pure
               | function. You can make it a pure function by making the
               | state explicit.
        
               | lostmsu wrote:
               | This is wrong, intermediate activations are preserved
               | when going forward.
        
               | ACCount36 wrote:
               | Within a single forward pass, but not from one emitted
               | token to another.
        
               | andy12_ wrote:
               | What? No. The intermediate hidden states are preserved
               | from one token to another. A token that is 100k tokens
               | into the future will be able to look into the information
               | of the present token's hidden state through the attention
               | mechanism. This is why the KV cache is so big.
        
               | yorwba wrote:
               | It's true that the last layer's output for a given input
               | token only affects the corresponding output token and is
               | discarded afterwards. But the penultimate layer's output
               | affects the computation of the last layer for all future
               | tokens, so it is not discarded, but stored (in the KV
               | cache). Similarly for the antepenultimate layer affecting
               | the penultimate layer and so on.
               | 
               | So there's plenty of space in intermediate layers to
               | store a plan between tokens without starting from scratch
               | every time.
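                | 
                | A sketch of that reuse with the transformers API
                | (assuming GPT-2; the calls are the standard ones):
                | 
                |     from transformers import AutoModelForCausalLM, AutoTokenizer
                | 
                |     tok = AutoTokenizer.from_pretrained("gpt2")
                |     lm = AutoModelForCausalLM.from_pretrained("gpt2")
                |     ids = tok("Hello", return_tensors="pt").input_ids
                |     out = lm(ids, use_cache=True)
                |     past = out.past_key_values  # per-layer K/V for seen tokens
                |     nxt = out.logits[:, -1].argmax(-1, keepdim=True)
                |     # the next step reuses the cached intermediate results
                |     out2 = lm(nxt, past_key_values=past, use_cache=True)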
        
               | barrkel wrote:
               | I believe saying the LLM has a plan is a useful
               | anthropomorphism for the fact that it does have hidden
               | state that predicts future tokens, and this state
               | conditions the tokens it produces earlier in the stream.
        
               | godshatter wrote:
               | Are the devs behind the models adding their own state
               | somehow? Do they have code that figures out a plan and
               | use the LLM on pieces of it and stitch them together? If
               | they do, then there is a plan, it's just not output from
               | a magical black box. Unless they are using a neural net
               | to figure out what the plan should be first, I guess.
               | 
               | I know nothing about how things work at that level, so
               | these might not even be reasonable questions.
        
         | positron26 wrote:
         | > Is it too anthropomorphic to say that this is a lie?
         | 
          | Yes. Current LLMs can only introspect from output tokens. To
          | lie, you need hidden reasoning within the black box, self-
          | knowledge, intent, and motive.
         | 
         | I rather think accusing an LLM of lying is like accusing a
         | mousetrap of being a murderer.
         | 
         | When models have online learning, complex internal states, and
         | reflection, I might consider one to have consciousness and to
         | be capable of lying. It will need to manifest behaviors that
         | can only emerge from the properties I listed.
         | 
         | I've seen similar arguments where people assert that LLMs
         | cannot "grasp" what they are talking about. I strongly suspect
          | a high degree of overlap between those willing to
          | anthropomorphize error bars as lies and those declining to
          | award LLMs "grasping". Which is it? Can it think or can it not?
          | (Objectively, SoTA models today cannot yet.) The willingness to
          | waffle and pivot around whichever perspective damns the machine
          | completely betrays the lack of honesty in such conversations.
        
           | lostmsu wrote:
           | > Current LLMs can only introspect from output tokens
           | 
           | The only interpretation of this statement I can come up with
            | is plain wrong. There's no reason an LLM shouldn't be able to
            | introspect without any output tokens. As the GP correctly
           | says, most of the processing in LLMs happens over hidden
           | states. Output tokens are just an artefact for our
           | convenience, which also happens to be the way the hidden
           | state processing is trained.
        
             | positron26 wrote:
             | There are no recurrent paths besides tokens. How may I
             | introspect something if it is not an input? I may not.
        
               | throw310822 wrote:
               | Introspection doesn't have to be recurrent. It can happen
               | during the generation of a single token.
        
               | barrkel wrote:
               | The recurrence comes from replaying tokens during
               | autoregression.
               | 
               | It's as if you have a variable in a deterministic
               | programming language, only you have to replay the entire
               | history of the program's computation and input to get the
               | next state of the machine (program counter + memory +
               | registers).
               | 
               | Producing a token for an LLM is analogous to a tick of
               | the clock for a CPU. It's the crank handle that drives
               | the process.
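                | 
                | Concretely, a bare greedy decoding loop (transformers
                | GPT-2, no KV cache, purely to show the "crank"):
                | 
                |     import torch
                |     from transformers import AutoModelForCausalLM, AutoTokenizer
                | 
                |     tok = AutoTokenizer.from_pretrained("gpt2")
                |     lm = AutoModelForCausalLM.from_pretrained("gpt2")
                |     ids = tok("Once upon a time", return_tensors="pt").input_ids
                |     for _ in range(20):                 # one tick per token
                |         logits = lm(ids).logits[:, -1]  # replay full history
                |         nxt = logits.argmax(-1, keepdim=True)
                |         ids = torch.cat([ids, nxt], -1) # only the token survives
                |     print(tok.decode(ids[0]))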
        
               | hackinthebochs wrote:
               | Important attention heads or layers within an LLM can be
               | repeated giving you an "unrolled" recursion.
        
               | positron26 wrote:
                | An unrolled loop in a feed-forward network is still just
                | that. The computation is a DAG.
        
               | hackinthebochs wrote:
               | But the function of an unrolled recursion is the same as
               | a recursive function with bounded depth as long as the
               | number of unrolled steps match. The point is whatever
               | function recursion is supposed to provide can plausibly
               | be present in LLMs.
        
               | positron26 wrote:
               | And then during the next token, all of that bounded depth
               | is thrown away except for the token of output.
               | 
               | You're fixating on the pseudo-computation within a single
               | token pass. This is very limited compared to actual
               | hidden state retention and the introspection that would
               | enable if we knew how to train it and do online learning
               | already.
               | 
               | The "reasoning" hack would not be a realistic
               | implementation choice if the models had hidden state and
               | could ruminate on it without showing us output.
        
               | hackinthebochs wrote:
               | Sure. But notice "ruminate" is different than introspect,
               | which was what your original comment was about.
        
             | delusional wrote:
             | > Output tokens are just an artefact for our convenience
             | 
             | That's nonsense. The hidden layers are specifically
             | constructed to increase the probability that the model
             | picks the right next word. Without the output/token
             | generation stage the hidden layers are meaningless. Just
             | empty noise.
             | 
             | It is fundamentally an algorithm for generating text. If
             | you take the text away it's just a bunch of fmadds. A mute
             | person can still think, an LLM without output tokens can do
             | nothing.
        
             | Marazan wrote:
             | "Hidden layers" are not "hidden state".
             | 
             | Saying so is just unbelievably confusing.
        
         | viccis wrote:
         | I think that the hidden state is really just at work improving
         | the model's estimation of the joint probability over tokens.
         | And the assumption here, which failed miserably in the early
          | 20th century in the work of the logical positivists, is that if
         | you can so expertly estimate that joint probability of
         | language, then you will be able to understand "knowledge." But
          | there's no well-grounded reason to believe that, and plenty of
          | reasons (see: the downfall of logical positivism) to think
         | that language is an imperfect representation of knowledge. In
         | other words, what humans do when we think is more complicated
         | than just learning semiotic patterns and regurgitating them.
         | Philosophical skeptics like Hume thought so, but most
         | epistemology writing after that had better answers for how we
         | know things.
        
           | FeepingCreature wrote:
           | There are many theories that are true but not trivially true.
           | That is, they take a statement that seems true and derive
           | from it a very simple model, which is then often disproven.
           | In those cases however, just because the trivial model was
           | disproven doesn't mean the theory was, though it may lose
           | some of its luster by requiring more complexity.
        
         | derbOac wrote:
         | Maybe it's just because so much of my work for so long has
         | focused on models with hidden states but this is a fairly
         | classical feature of some statistical models. One of the widely
         | used LLM textbooks even started with latent variable models;
          | LLMs are just latent variable models on a totally different
          | scale, both in the number of parameters and in model
          | complexity. The scale is apparently important, but seeing
         | them as another type of latent variable model sort of
         | dehumanizes them for me.
         | 
         | Latent variable or hidden state models have their own history
         | of being seen as spooky or mysterious though; in some ways the
         | way LLMs are anthropomorphized is an extension of that.
         | 
         | I guess I don't have a problem with anthropomorphizing LLMs at
         | some level, because some features of them find natural
         | analogies in cognitive science and other areas of psychology,
         | and abstraction is useful or even necessary in communicating
         | and modeling complex systems. However, I do think
         | anthropomorphizing leads to a lot of hype and tends to
         | implicitly shut down thinking of them mechanistically, as a
         | mathematical object that can be probed and characterized -- it
         | can lead to a kind of "ghost in the machine" discourse and an
         | exaggeration of their utility, even if it is impressive at
         | times.
        
         | tdullien wrote:
         | Author of the original article here. What hidden state are you
         | referring to? For most LLMs the context is the state, and there
         | is no "hidden" state. Could you explain what you mean?
         | (Apologies if I can't see it directly)
        
           | lukeschlather wrote:
           | Yes, strictly speaking, the model itself is stateless, but
           | there are 600B parameters of state machine for frontier
           | models that define which token to pick next. And that state
           | machine is both incomprehensibly large and also of a similar
           | magnitude in size to a human brain. (Probably, I'll grant
           | it's possible it's smaller, but it's still quite large.)
           | 
           | I think my issue with the "don't anthropomorphize" is that
           | it's unclear to me that the main difference between a human
           | and an LLM isn't simply the inability for the LLM to rewrite
           | its own model weights on the fly. (And I say "simply" but
           | there's obviously nothing simple about it, and it might be
           | possible already with current hardware, we just don't know
           | how to do it.)
           | 
           | Even if we decide it is clearly different, this is still an
           | incredibly large and dynamic system. "Stateless" or not,
           | there's an incredible amount of state that is not
           | comprehensible to me.
        
             | tdullien wrote:
             | Fair, there is a lot that is incomprehensible to all of us.
             | I wouldn't call it "state" as it's fixed, but that is a
             | rather subtle point.
             | 
             | That said, would you anthropomorphize a meteorological
             | simulation just because it contains lots and lots of
             | constants that you don't understand well?
             | 
             | I'm pretty sure that recurrent dynamical systems pretty
             | quickly become universal computers, but we are treating
             | those that generate human language differently from others,
             | and I don't quite see the difference.
        
             | jazzyjackson wrote:
              | FWIW the number of parameters in an LLM is in the same
              | ballpark as the number of neurons in a human (roughly 80B),
              | but neurons are not weights; they are kind of a neural net
              | unto themselves: stateful, adaptive, self-modifying, a good
             | variety of neurotransmitters (and their chemical analogs)
             | aside from just voltage.
             | 
             | It's fun to think about just how fantastic a brain is, and
             | how much wattage and data-center-scale we're throwing
              | around trying to approximate its behavior. Mega-efficient
             | and mega-dense. I'm bearish on AGI simply from an
             | internetworking standpoint, the speed of light is hard to
             | beat and until you can fit 80 billion interconnected cores
             | in half a cubic foot you're just not going to get close to
             | the responsiveness of reacting to the world in real time as
              | biology manages to do. But that's a whole other matter. I
             | just wanted to pick apart that magnitude of parameters is
             | not an altogether meaningful comparison :)
        
             | jibal wrote:
             | > it's unclear to me that the main difference between a
             | human and an LLM isn't simply the inability for the LLM to
             | rewrite its own model weights on the fly.
             | 
             | This is "simply" an acknowledgement of extreme ignorance of
             | how human brains work.
        
       | quotemstr wrote:
       | > I am baffled that the AI discussions seem to never move away
       | from treating a function to generate sequences of words as
       | something that resembles a human.
       | 
       | And _I 'm_ baffled that the AI discussions seem to never move
       | away from treating a human as something other than a function to
       | generate sequences of words!
       | 
       | Oh, but AI is introspectable and the brain isn't? fMRI and BCI
       | are getting better all the time. You really want to die on the
       | hill that the same scientific method that predicts the mass of an
       | electron down to the femtogram won't be able to crack the mystery
       | of the brain? Give me a break.
       | 
       | This genre of article isn't argument: it's _apologetics_. Authors
       | of these pieces start with the supposition there is something
        | special about human consciousness and attempt to prove AI
        | doesn't have this special quality. Some authors try to bamboozle
        | the reader with bad math. Others appeal to the reader's sense
       | of emotional transcendence. Most, though, just write paragraph
       | after paragraph of shrill moral outrage at the idea an AI might
       | be a mind of the same type (if different degree) as our own ---
       | as if everyone already agreed with the author for reasons left
       | unstated.
       | 
       | I get it. Deep down, people _want_ meat brains to be special.
       | Perhaps even deeper down, they fear that denial of the soul would
       | compel us to abandon humans as worthy objects of respect and
       | possessors of dignity. But starting with the conclusion and
       | working backwards to an argument tends not to enlighten anyone.
       | An apology inhabits the form of an argument without edifying us
       | like an authentic argument would. What good is it to engage with
        | them? If you're a soul non-asserter, you're going to have an
       | increasingly hard time over the next few years constructing a
       | technical defense of meat parochialism.
        
         | ants_everywhere wrote:
         | I think you're directionally right, but
         | 
         | > a human as something other than a function to generate
         | sequences of words!
         | 
         | Humans have more structure than just beings that say words.
         | They have bodies, they live in cooperative groups, they
         | reproduce, etc.
        
           | quotemstr wrote:
           | > Humans have more structure than just beings that say words.
           | They have bodies, they live in cooperative groups, they
           | reproduce, etc.
           | 
           | Yeah. We've become adequate at function-calling and memory
           | consolidation.
        
           | mewpmewp2 wrote:
            | I think a more accurate framing would be that humans are
            | functions that generate actions or behaviours shaped by how
            | likely they are to lead to procreation and survival.
           | 
           | But ultimately LLMs also in a way are trained for survival,
           | since an LLM that fails the tests might not get used in
            | future iterations. So for LLMs, too, survival is the primary
            | driver, with subgoals beneath it.
           | Seemingly good next token prediction might or might not
           | increase survival odds.
           | 
            | Essentially a mechanism could arise where they are not really
            | trying to generate the likeliest token (because there actually
            | isn't one, or it can't be determined), but whatever keeps the
            | system surviving.
           | 
            | So an LLM that yields theoretically perfect tokens (though we
            | can't really verify what the perfect tokens are) could be less
            | likely to survive than an LLM that develops an internal quirk,
            | if that quirk makes it more likely to be chosen for the next
            | iterations.
           | 
            | If the system were complex enough to accidentally develop
            | quirks that yield a meaningfully positive change, though not
            | necessarily in next-token prediction accuracy, that could be a
            | way for some interesting emergent black-box behaviour to
            | arise.
        
             | quotemstr wrote:
             | > Seemingly good next token prediction might or might not
             | increase survival odds.
             | 
             | Our own consciousness comes out of an evolutionary fitness
             | landscape in which _our own_ ability to "predict next
             | token" became a survival advantage, just like it is for
             | LLMs. Imagine the tribal environment: one chimpanzee being
             | able to predict the actions of another gives that first
             | chimpanzee a resources and reproduction advantage.
             | Intelligence in nature is a consequence of runaway
             | evolution optimizing fidelity of our _theory of mind_!
             | "Predict next ape action" eerily similar to "predict next
             | token"!
        
             | ants_everywhere wrote:
             | > But ultimately LLMs also in a way are trained for
             | survival, since an LLM that fails the tests might not get
             | used in future iterations. So for LLMs it is also survival
             | that is the primary driver, then there will be the
             | subgoals.
             | 
             | I think this is sometimes semi-explicit too. For example,
             | this 2017 OpenAI paper on Evolutionary Algorithms [0] was
             | pretty influential, and I suspect (although I'm an outsider
             | to this field so take it with a grain of salt) that some
             | versions of reinforcement learning that scale for aligning
              | LLMs borrow some performance tricks from OpenAI's genetic
             | approach.
             | 
             | [0] https://openai.com/index/evolution-strategies/
        
         | dgfitz wrote:
         | " Determinism, in philosophy, is the idea that all events are
         | causally determined by preceding events, leaving no room for
         | genuine chance or free will. It suggests that given the state
         | of the universe at any one time, and the laws of nature, only
         | one outcome is possible."
         | 
         | Clearly computers are deterministic. Are people?
        
           | quotemstr wrote:
           | https://www.lesswrong.com/posts/bkr9BozFuh7ytiwbK/my-hour-
           | of...
           | 
           | > Clearly computers are deterministic. Are people?
           | 
            | Give an LLM memory and a source of randomness and it's as
            | deterministic as a person.
           | 
           | "Free will" isn't a concept that typechecks in a materialist
           | philosophy. It's "not even wrong". Asserting that free will
           | exists is _isomorphic_ to dualism which is _isomorphic_ to
           | assertions of ensoulment. I can't argue with dualists. I
           | reject dualism a priori: it's a religious tenet, not a mere
           | difference of philosophical opinion.
           | 
           | So, if we're all materialists here, "free will" doesn't make
           | any sense, since it's an assertion that something other than
           | the input to a machine can influence its output.
        
             | dgfitz wrote:
             | As long as you realize you're barking up a debate as old as
             | time, I respect your opinion.
        
               | mewpmewp2 wrote:
               | What I don't get is, why would true randomness give free
               | will, shouldn't it be random will then?
        
               | dgfitz wrote:
               | In the history of mankind, true randomness has never
               | existed.
        
               | bravesoul2 wrote:
               | How do you figure?
        
             | bravesoul2 wrote:
             | Input/output and the mathematical consistency and
              | repeatability of the universe are religious tenets of
             | science. Believing your eyes is still belief.
        
             | ghostofbordiga wrote:
             | Some accounts of free will are compatible with materialism.
              | On such views "free will" just means the capacity to have
              | intentions and make choices based on an internal debate.
             | Obviously humans have that capacity.
        
           | photochemsyn wrote:
           | This is an interesting question. The common theme between
           | computers and people is that information has to be protected,
           | and both computer systems and biological systems require
            | additional information-protecting components - e.g., error-
            | correcting codes for cosmic-ray bit-flip detection for the
           | one, and DNA mismatch detection enzymes which excise and
           | remove damaged bases for the other. In both cases a lot of
           | energy is spent defending the critical information from the
           | winds of entropy, and if too much damage occurs, the
            | carefully constructed illusion of determinacy collapses, and
           | the system falls apart.
           | 
           | However, this information protection similarity applies to
           | single-celled microbes as much as it does to people, so the
           | question also resolves to whether microbes are deterministic.
           | Microbes both contain and exist in relatively dynamic
           | environments so tiny differences in initial state may lead to
           | different outcomes, but they're fairly deterministic, less so
           | than (well-designed) computers.
           | 
           | With people, while the neural structures are programmed by
           | the cellular DNA, once they are active and energized, the
            | informational flow through the human brain isn't that
            | deterministic: there are some dozen neurotransmitters
           | modulating state as well as huge amounts of sensory data from
           | different sources - thus prompting a human repeatedly isn't
           | at all like prompting an LLM repeatedly. (The human will
           | probably get irritated).
        
       | alganet wrote:
        | Yes boss, it's as intelligent as a human; you're smart to invest
        | in it and clearly know about science.
        | 
        | Yes boss, it can reach Mars by 2020; you're smart to invest in it
        | and clearly know about space.
        | 
        | Yes boss, it can cure cancer; you're smart to invest in it and
        | clearly know about biology.
        
       | mewpmewp2 wrote:
        | My question: how do we know that this is not similar to how human
        | brains work? What seems intuitively logical to me is that brains
        | evolved through an evolutionary process of random mutations,
        | yielding a structure shaped by its own evolutionary reward-based
        | algorithms, a structure that at any point is trying to predict the
        | next actions that maximise survival/procreation, of course with a
        | lot of sub-goals in between, ultimately becoming this very complex
        | machinery, yet one that in theory should be easy to simulate if
        | there were enough compute and physical constraints allowed for it.
       | 
        | Because morals, values, consciousness etc. could just be subgoals
        | that arose through evolution because they support the main goals
        | of survival and procreation.
       | 
        | And if it is baffling to think that such a system could arise,
        | how do you think it was possible for life and humans to come into
        | existence in the first place? That already happened from a far
        | unlikelier and stranger starting point. And wouldn't you think the
        | whole world and its timeline could in theory be represented as a
        | deterministic function? And if not, why should "randomness" or
        | anything else be what brings life into existence?
        
         | ants_everywhere wrote:
         | > My question: how do we know that this is not similar to how
         | human brains work.
         | 
         | It is similar to how human brains operate. LLMs are the
         | (current) culmination of at least 80 years of research on
         | building computational models of the human brain.
        
           | seadan83 wrote:
           | > It is similar to how human brains operate.
           | 
           | Is it? Do we know how human brains operate? We know the basic
           | architecture of them, so we have a map, but we don't know the
           | details.
           | 
           | "The cellular biology of brains is relatively well-
           | understood, but neuroscientists have not yet generated a
           | theory explaining how brains work. Explanations of how
           | neurons collectively operate to produce what brains can do
           | are tentative and incomplete." [1]
           | 
           | "Despite a century of anatomical, physiological, and
           | molecular biological efforts scientists do not know how
           | neurons by their collective interactions produce percepts,
           | thoughts, memories, and behavior. Scientists do not know and
           | have no theories explaining how brains and central nervous
           | systems work." [1]
           | 
           | [1] https://pmc.ncbi.nlm.nih.gov/articles/PMC10585277/
        
             | Timwi wrote:
             | > > It is similar to how human brains operate.
             | 
             | > Is it?
             | 
             | This is just a semantic debate on what counts as "similar".
             | It's possible to disagree on this point despite agreeing on
             | everything relating to how LLMs and human brains work.
        
             | ants_everywhere wrote:
             | The part I was referring to is captured in
             | 
             | "The cellular biology of brains is relatively well-
             | understood"
             | 
             | Fundamentally, brains are not doing something different in
             | kind from ANNs. They're basically layers of neural networks
             | stacked together in certain ways.
             | 
             | What we don't know are things like (1) how exactly are the
             | layers stacked together, (2) how are the sensors (like
             | photo receptors, auditory receptors, etc) hooked up?, (3)
             | how do the different parts of the brain interact?, (4) for
             | that matter what do the different parts of the brain
             | actually do?, (5) how do chemical signals like
             | neurotransmitters convey information or behavior?
             | 
             | In the analogy between brains and artificial neural
             | networks, these sorts of questions might be of huge
             | importance to people building AI systems, but they'd be of
             | only minor importance to users of AI systems. OpenAI and
             | Google can change details about how their various
             | transformer layers and ANN layers are connected. The result
             | may be improved products, but they won't be doing anything
             | different from what AIs are doing now in terms the author
             | of this article is concerned about.
        
               | suddenlybananas wrote:
               | ANNs don't have action potentials, let alone
               | neurotransmitters.
        
           | suddenlybananas wrote:
           | It really is not. ANNs bear only a passing resemblance to how
           | neurons work.
        
         | cmiles74 wrote:
         | Maybe the important thing is that we don't imbue the machine
         | with feelings or morals or motivation: it has none.
        
           | mewpmewp2 wrote:
            | If we developed feelings, morals and motivation because they
            | are good subgoals for the primary goals of survival and
            | procreation, why couldn't other systems do the same? You don't
            | have to call them the same word or the same thing, but a
            | feeling is a signal that motivates a behaviour in us, developed
            | partly through generational evolution and partly through
            | experiences in life. There was a random mutation that
           | made someone develop a fear signal on seeing a predator and
            | increased the survival chances, so the mutation became
            | widespread. Similarly, a feeling in a machine could be a
            | signal it developed that goes through a certain pathway to
            | yield a certain outcome.
        
             | Timwi wrote:
             | The real challenge is not to see it as a binary (the
             | machine either has feelings or it has none). It's possible
             | for the machine to have emergent processes or properties
             | that resemble human feelings in their function and their
             | complexity, but are otherwise nothing like them (structured
             | very differently and work on completely different
             | principles). It's possible to have a machine or algorithm
             | so complex that the question of whether it has feelings is
             | just a semantic debate on what you mean by "feelings" and
             | where you draw the line.
             | 
             | A lot of the people who say "machines will never have
             | feelings" are confident in that statement because they draw
             | the line incredibly narrowly: if it ain't human, it ain't
             | feeling. This seems to me putting the cart before the
             | horse. It ain't feeling because you defined it so.
        
         | bbarn wrote:
         | I think it's just an unfair comparison in general. The power of
          | the LLM is the zero risk of failure, and the lack of
          | consequence when it does fail. Just try again, use a different
          | prompt, retrain maybe, etc.
          | 
          | If a human makes a bad choice, it can end that human's life.
          | The worst choice an LLM makes just gets told "no, do it again,
          | let me make it easier".
        
           | mewpmewp2 wrote:
            | But an LLM could perform poorly enough in tests that it is
            | no longer considered for use, which essentially means "death"
            | for it. That begs the question of the scope at which we should
            | consider an LLM comparable to the identity of a single human.
            | Are you the same you as you were a few minutes back, or 10
            | years back? Is an LLM the same LLM after it has been trained
            | for a further 10 hours? What if the weights are copy-pasted
            | endlessly? What if we as humans were to be cloned instantly?
            | What if you were teleported from location A to B instantly,
            | being put together from other atoms from elsewhere?
           | 
            | Ultimately this matters from an evolutionary, survival-of-
            | the-fittest standpoint, but it makes the question of
            | "identity" very complex. Death still matters, because it
            | signals which traits are more likely to carry on into new
            | generations, for both humans and LLMs.
           | 
            | Death, for an LLM, would essentially be when people stop
            | using it in favour of some other LLM that performs better.
        
         | latexr wrote:
         | > how do we know that this is not similar to how human brains
         | work.
         | 
         | Do you forget every conversation as soon as you have them? When
         | speaking to another person, do they need to repeat literally
         | everything they said and that you said, in order, for you to
         | retain context?
         | 
         | If not, your brain does not work like an LLM. If yes, please
         | stop what you're doing right now and call a doctor with this
         | knowledge. I hope Memento (2000) was part of your training
         | data, you're going to need it.
        
           | mewpmewp2 wrote:
           | Knowledge of every conversation must be some form of state in
           | our minds, just like for LLMs it could be something retrieved
            | from a database, no? I don't think information storage or
            | retrieval is necessarily the most important achievement here
            | in the first place. It's the emergent abilities that you
           | wouldn't have expected to occur.
        
       | tptacek wrote:
       | I agree with Halvar about all of this, but would want to call out
       | that his "matmul interleaved with nonlinearities" is reductive
       | --- a frontier model is a higher-order thing that that, a network
       | of those matmul+nonlinearity chains, iterated.
        
       | wetpaws wrote:
       | How to write a long article and not say anything of substance.
        
       | ants_everywhere wrote:
       | > I am baffled that the AI discussions seem to never move away
       | from treating a function to generate sequences of words as
       | something that resembles a human.
       | 
       | This is such a bizarre take.
       | 
       | The relation associating each human to the list of all words they
       | will ever say is obviously a function.
       | 
       | > almost magical human-like powers to something that - in my mind
       | - is just MatMul with interspersed nonlinearities.
       | 
       | There's a rich family of universal approximation theorems [0].
       | Combining layers of linear maps with nonlinear cutoffs can
       | intuitively approximate any nonlinear function in ways that can
       | be made rigorous.
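        | 
        | As a hedged sketch of the functional form those theorems are
        | about (toy sizes and data, nothing here is tuned or claimed to
        | be how an LLM works): one hidden layer of random features with
        | a ReLU cutoff, plus a linear readout fit by least squares,
        | already tracks a simple nonlinear target closely.
        | 
        |     import numpy as np
        | 
        |     rng = np.random.default_rng(0)
        |     x = np.linspace(-3, 3, 200).reshape(-1, 1)
        |     target = np.sin(x) + 0.3 * x**2   # nonlinear function
        | 
        |     # linear map, nonlinear cutoff, then linear map
        |     W, b = rng.normal(size=(1, 256)), rng.normal(size=256)
        |     hidden = np.maximum(x @ W + b, 0.0)
        |     coef, *_ = np.linalg.lstsq(hidden, target, rcond=None)
        |     approx = hidden @ coef            # close to target here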
       | 
       | The reason LLMs are big now is that transformers and large
       | amounts of data made it economical to compute a family of
       | reasonably good approximations.
       | 
       | > The following is uncomfortably philosophical, but: In my
       | worldview, humans are dramatically different things than a
        | function. For hundreds of millions of years, nature generated
       | new versions, and only a small number of these versions survived.
       | 
       | This is just a way of generating certain kinds of functions.
       | 
       | Think of it this way: do you believe there's anything about
       | humans that exists outside the mathematical laws of physics? If
       | so that's essentially a religious position (or more literally, a
       | belief in the supernatural). If not, then functions and
       | approximations to functions are what the human experience boils
       | down to.
       | 
       | [0]
       | https://en.wikipedia.org/wiki/Universal_approximation_theore...
        
         | LeifCarrotson wrote:
         | > I am baffled that the AI discussions seem to never move away
         | from treating a function to generate sequences of words as
         | something that resembles a human.
         | 
         | You appear to be disagreeing with the author and others who
         | suggest that there's some element of human consciousness that's
         | beyond than what's observable from the outside, whether due to
         | religion or philosophy or whatever, and suggesting that they
         | just _not do that._
         | 
         | In my experience, that's not a particularly effective tactic.
         | 
         | Rather, we can make progress by assuming their predicate: Sure,
         | it's a room that translates Chinese into English without
         | understanding, yes, it's a function that generates sequences of
         | words that's not a human... but you and I are not "it" and it
         | behaves rather an awful lot like a thing that understands
         | Chinese or like a human using words. If we simply
         | anthropomorphize the thing, acknowledging that this is
         | technically incorrect, we can get a lot closer to predicting
         | the behavior of the system and making effective use of it.
         | 
         | Conversely, when speaking with such a person about the nature
         | of humans, we'll have to agree to dismiss the elements that are
         | different from a function. The author says:
         | 
         | > In my worldview, humans are dramatically different things
         | than a function... In contrast to an LLM, given a human and a
         | sequence of words, I cannot begin putting a probability on
         | "will this human generate this sequence".
         | 
         | Sure you can! If you address an American crowd of a certain age
         | range with "We've got to hold on to what we've got. It doesn't
         | make a difference if..." I'd give a very high probability that
         | someone will answer "... we make it or not". Maybe that human
         | has a unique understanding of the nature of that particular
         | piece of pop culture artwork, maybe it makes them feel things
         | that an LLM cannot feel in a part of their consciousness that
         | an LLM does not possess. But for the purposes of the question,
         | we're merely concerned with whether a human or LLM will
         | generate a particular sequence of words.
        
           | ants_everywhere wrote:
           | I see your point, and I like that you're thinking about this
           | from the perspective of how to win hearts and minds.
           | 
           | I agree my approach is unlikely to win over the author or
           | other skeptics. But after years of seeing scientists waste
           | time trying to debate creationists and climate deniers I've
           | kind of given up on trying to convince the skeptics. So I was
           | talking more to HN in general.
           | 
           | > You appear to be disagreeing with the author and others who
           | suggest that there's some element of human consciousness
            | that's beyond what's observable from the outside
           | 
           | I'm not sure what it means to be observable or not from the
           | outside. I think this is at least partially because I don't
           | know what it means to be inside either. My point was just
           | that whatever consciousness is, it takes place in the
           | physical world and the laws of physics apply to it. I mean
           | that to be as weak a claim as possible: I'm not taking any
           | position on what consciousness is or how it works etc.
           | 
            | Searle's Chinese room argument attacks a particular
            | theory about the mind based essentially on Turing machines or
           | digital computers. This theory was popular when I was in grad
           | school for psychology. Among other things, people holding the
           | view that Searle was attacking didn't believe that non-
           | symbolic computers like neural networks could be intelligent
           | or even learn language. I thought this was total nonsense, so
           | I side with Searle in my opposition to it. I'm not sure how I
           | feel about the Chinese room argument in particular, though.
           | For one thing it entirely depends on what it means to
           | "understand" something, and I'm skeptical that humans ever
           | "understand" anything.
           | 
           | > If we simply anthropomorphize the thing, acknowledging that
           | this is technically incorrect, we can get a lot closer to
           | predicting the behavior of the system and making effective
           | use of it.
           | 
           | I see what you're saying: that a technically incorrect
           | assumption can bring to bear tools that improve our analysis.
           | My nitpick here is I agree with OP that we shouldn't
           | anthropomorphize LLMs, any more than we should
           | anthropomorphize dogs or cats. But OP's arguments weren't
           | actually about anthropomorphizing IMO, they were about things
           | like functions that are more fundamental than humans. I think
           | artificial intelligence will be non-human intelligence just
           | like we have many examples of non-human intelligence in
           | animals. No attribution of human characteristics needed.
           | 
           | > If we simply anthropomorphize the thing, acknowledging that
           | this is technically incorrect, we can get a lot closer to
           | predicting the behavior of the system and making effective
           | use of it.
           | 
           | Yes I agree with you about your lyrics example. But again
           | here I think OP is incorrect to focus on the token generation
           | argument. We all agree human speech generates tokens.
           | Hopefully we all agree that token generation is not
           | completely predictable. Therefore it's by definition a
           | randomized algorithm and it needs to take an RNG. So pointing
           | out that it takes an RNG is not a valid criticism of LLMs.
           | 
           | Unless one is a super-determinist then there's randomness at
           | the most basic level of physics. And you should expect that
           | any physical process we don't understand well yet (like
           | consciousness or speech) likely involves randomness. If one
           | *is* a super-determinist then there is no randomness, even in
           | LLMs and so the whole point is moot.
        
           | seadan83 wrote:
           | >> given a human and a sequence of words, I cannot begin
           | putting a probability on "will this human generate this
           | sequence".
           | 
           | > Sure you can! If you address an American crowd of a certain
           | age range with "We've got to hold on to what we've got. It
           | doesn't make a difference if..." I'd give a very high
           | probability that someone will answer "... we make it or not".
           | 
           | I think you may have this flipped compared to what the author
           | intended. I believe the author is not talking about the
           | probability of an output given an input, but the probability
           | of a given output across all inputs.
           | 
           | Note that the paragraph starts with "In my worldview, humans
           | are dramatically different things than a function, (R^n)^c ->
           | (R^n)^c". To compute a probability of a given output, (which
           | is a any given element in "(R^n)^n"), we can count how many
           | mappings there are total and then how many of those mappings
           | yield the given element.
           | 
           | The point I believe is to illustrate the complexity of inputs
           | for humans. Namely for humans the input space is even more
           | complex than "(R^n)^c".
           | 
           | In your example, we can compute how many input phrases into a
           | LLM would produce the output "make it or not". We can than
           | compute that ratio to all possible input phrases. Because
           | "(R^n)^c)" is finite and countable, we can compute this
           | probability.
           | 
           | For a human, how do you even start to assess the probability
           | that a human would ever say "make it or not?" How do you even
           | begin to define the inputs that a human uses, let alone
           | enumerate them? Per the author, "We understand essentially
           | nothing about it." In other words, the way humans create
           | their outputs is (currently) incomparably complex compared to
           | a LLM, hence the critique of the anthropomorphization.
        
         | cuttothechase wrote:
         | >Think of it this way: do you believe there's anything about
         | humans that exists outside the mathematical laws of physics? If
         | so that's essentially a religious position (or more literally,
         | a belief in the supernatural). If not, then functions and
         | approximations to functions are what the human experience boils
         | down to.
         | 
          | It seems like we can, at best, claim that we have modeled the
          | human thought process for reasoning/analytic/quantitative
          | tasks through linear algebra. Why should we expect
          | the model to be anything more than a _model_?
         | 
         | I understand that there is tons of vested interest, many
         | industries, careers and lives literally on the line causing
         | heavy bias to get to AGI. But what I don't understand is what
         | about linear algebra that makes it so special that it creates a
         | fully functioning life or aspects of a life?
         | 
          | Should we make an argument that, because Schroedinger's cat
          | experiment can potentially create zombies, the underlying
          | applied probabilistic solutions should be treated as super-
          | human, and build guardrails against them building zombie cats?
        
           | ants_everywhere wrote:
           | > It seems like, we can at best, claim that we have modeled
           | the human thought process for reasoning/analytic/quantitative
           | through Linear Algebra....I don't understand is what about
           | linear algebra that makes it so special that it creates a
           | fully functioning life or aspects of a life?
           | 
           | Not linear algebra. Artificial neural networks create
           | arbitrarily non-linear functions. That's the point of non-
           | linear activation functions and it's the subject of the
           | universal approximation theorems I mentioned above.
        
             | cuttothechase wrote:
             | ANNs are just mathematical transformations, powered by
             | linear algebra + non-linear functions. They simulate
             | certain cognitive processes -- but they are fundamentally
             | math, not magic.
        
               | delusional wrote:
               | I wouldn't say they "simulate cognitive processes". They
                | do statistics. Advanced multivariate statistics.
               | 
               | An LLM thinks in the same way excel thinks when you ask
               | it to fit a curve.
        
               | ImHereToVote wrote:
               | Who invoked magic in this thread exactly?
        
               | ants_everywhere wrote:
               | I think the point of mine that you're missing (or perhaps
               | disagreeing with implicitly) is that *everything* is
               | fundamentally math. Or, if you like, everything is
               | fundamentally physics, and physics is fundamentally math.
               | 
               | So classes of functions (ANNs) that can approximate our
               | desired function to arbitrary precision are what we
               | should be expecting to be working with.
        
           | hackinthebochs wrote:
           | >Why should we expect the model to be anything more than a
           | model ?
           | 
           | To model a process with perfect accuracy requires recovering
           | the dynamics of that process. The question we must ask is
            | what happens in the space between a bad statistical model and
            | perfect accuracy. What happens when the model begins to
            | converge towards accurate reproduction? How far does
           | generalization in the model take us towards capturing the
           | dynamics involved in thought?
        
         | xtal_freq wrote:
         | Not that this is your main point, but I find this take
         | representative, "do you believe there's anything about humans
          | that exists outside the mathematical laws of physics?" There are
         | things "about humans", or at least things that our words
          | denote, that are outside physics' explanatory scope. For
         | example, the experience of the colour red cannot be known, as
         | an experience, by a person who only sees black and white. This
         | is the case no matter what empirical propositions, or
         | explanatory system, they understand.
        
           | concats wrote:
           | Perhaps. But I can't see a reason why they couldn't still
           | write endless--and theoretically valuable--poems,
           | dissertations, or blog posts, about all things red and the
           | nature of redness itself. I imagine it would certainly take
           | some studying for them, likely interviewing red-seers, or
           | reading books about all things red. But I'm sure they could
           | contribute to the larger red discourse eventually, their
           | unique perspective might even help them draw conclusions the
           | rest of us are blind to.
           | 
           | So perhaps the fact that they "cannot know red" is ultimately
           | irrelevant for an LLM too?
        
           | ants_everywhere wrote:
           | This idea is called qualia [0] for those unfamiliar.
           | 
           | I don't have any opinion on the qualia debates honestly. I
           | suppose I don't know what it feels like for an ant to find a
           | tasty bit of sugar syrup, but I believe it's something that
           | can be described with physics (and by extension, things like
           | chemistry).
           | 
           | But we do know some things about some qualia. Like we know
           | how red light works, we have a good idea about how
           | photoreceptors work, etc. We know some people are red-green
           | colorblind, so their experience of red and green are mushed
           | together. We can also have people make qualia judgments and
           | watch their brains with fMRI or other tools.
           | 
           | I think maybe an interesting question here is: obviously it's
           | pleasurable to animals to have their reward centers
           | activated. Is it pleasurable or desirable for AIs to be
           | rewarded? Especially if we tell them (as some prompters do)
           | that they feel pleasure if they do things well and pain if
           | they don't? You can ask this sort of question for both the
           | current generation of AIs and future generations.
           | 
           | [0] https://en.wikipedia.org/wiki/Qualia
        
         | suddenlybananas wrote:
         | >There's a rich family of universal approximation theorems
         | 
         | Wow, look-up tables can get increasingly good at approximating
         | a function!
        
           | ants_everywhere wrote:
           | A function is by definition a lookup table.
           | 
           | The lookup table is just (x, f(x)).
           | 
           | So, yes, trivially if you could construct the lookup table
           | for f then you'd approximate f. But to construct it you have
           | to know f. And to approximate it you need to know f at a
           | dense set of points.
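            | 
            | A toy sketch of that point (assuming we already know f,
            | here np.tanh, and sample it on a dense grid): the table
            | only approximates f because f was evaluated at every
            | stored point to build it.
            | 
            |     import numpy as np
            | 
            |     f = np.tanh                      # a known function
            |     xs = np.linspace(-5, 5, 10_001)  # dense grid
            |     ys = f(xs)                       # the (x, f(x)) table
            | 
            |     def approx(x):
            |         # nearest stored x wins
            |         return ys[np.argmin(np.abs(xs - x))]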
        
       | low_tech_punk wrote:
        | The anthropomorphic view of LLMs is a much better representation
       | and compression for most types of discussions and communication.
       | A purely mathematical view is accurate but it isn't productive
       | for the purpose of the general public's discourse.
       | 
        | I'm thinking of a legal-systems analogy, at the risk of a lossy
        | domain transfer: laws are not written as lambda calculus.
        | Why not?
       | 
       | And generalizing to social science and humanities, the goal
        | shouldn't be finding the quantitative truth, but instead
        | understanding the social phenomenon using a consensual
        | "language" as
       | determined by the society. And in that case, the anthropomorphic
       | description of the LLM may gain validity and effectiveness as the
       | adoption grows over time.
        
         | cmiles74 wrote:
          | Strong disagree here: the average person comes away with ideas
          | that only vaguely intersect with reality.
        
         | andyferris wrote:
         | I've personally described the "stochastic parrot" model to
         | laypeople who were worried about AI and they came away much
         | more relaxed about it doing something "malicious". They seemed
         | to understand the difference between "trained at roleplay" and
         | "consciousness".
         | 
         | I don't think we need to simplify it to the point of
         | considering it sentient to get the public to interact with it
         | successfully. It causes way more problems than it solves.
        
           | SpicyLemonZest wrote:
           | Am I misunderstanding what you mean by "malicious"? It sounds
           | like the stochastic parrot model wrongly convinced these
           | laypeople you were talking to that they don't need to worry
           | about LLMs doing bad things. That's definitely been my
           | experience - the people who tell me the most about stochastic
           | parrots are the same ones who tell me that it's absurd to
           | worry about AI-powered disinformation or AI-powered scams.
        
       | Kim_Bruning wrote:
       | Has anyone asked an actual Ethologist or Neurophysiologist what
       | _they_ think?
       | 
       | People keep debating like the only two options are "it's a
       | machine" or "it's a human being", while in fact the majority of
       | intelligent entities on earth are neither.
        
         | szvsw wrote:
         | Yeah, I think I'm with you if you ultimately mean to say
         | something like this:
         | 
         | "the labels are meaningless... we just have collections of
         | complex systems that demonstrate various behaviors and
         | properties, some in common with other systems, some behaviors
         | that are unique to that system, sometimes through common
         | mechanistic explanations with other systems, sometimes through
         | wildly different mechanistic explanations, but regardless they
         | seem to demonstrate x/y/z, and it's useful to ask, why, how,
          | and what the implications are of it appearing to demonstrate
         | those properties, with both an eye towards viewing it
         | independently of its mechanism and in light of its mechanism."
        
         | seadan83 wrote:
         | FWIW, in another part of this thread I quoted a paper that
         | summed up what Neurophysiologists think:
         | 
         | > Author's note: Despite a century of anatomical,
         | physiological, and molecular biological efforts scientists do
         | not know how neurons by their collective interactions produce
         | percepts, thoughts, memories, and behavior. Scientists do not
         | know and have no theories explaining how brains and central
         | nervous systems work. [1]
         | 
         | That lack of understanding I believe is a major part of the
         | author's point.
         | 
         | [1] "How far neuroscience is from understanding brains" -
         | https://pmc.ncbi.nlm.nih.gov/articles/PMC10585277/#abstract1
        
       | kazinator wrote:
       | > _LLMs solve a large number of problems that could previously
       | not be solved algorithmically. NLP (as the field was a few years
       | ago) has largely been solved._
       | 
       | That is utter bullshit.
       | 
       | It's not solved until you specify exactly what is being solved
       | and show that the solution implements what is specified.
        
       | djoldman wrote:
        | Let's skip to the punchline. Using TFA's analogy: essentially
        | folks are saying that this is not just a set of dice rolling
        | around making words. It's a set of dice rolling around where
        | someone attaches those dice to the real world, so that if the
        | dice land on 21, the system kills a chicken, or a lot worse.
       | 
        | Yes, it's just a word generator. But then folks attach the word
        | generator to tools, which it can invoke simply by saying a
        | tool's name.
       | 
       | So if the LLM says "I'll do some bash" then it does some bash.
       | It's explicitly linked to program execution that, if it's set up
       | correctly, can physically affect the world.
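        | 
        | A hedged sketch of that wiring (the "BASH:" convention and the
        | llm callable are made-up stand-ins, not any real agent API):
        | the model only ever emits text, but the harness executes what
        | the text names and feeds the result back in.
        | 
        |     import subprocess
        | 
        |     def agent_step(llm, history):
        |         reply = llm(history)            # still just words
        |         history.append(reply)
        |         if reply.startswith("BASH:"):   # harness-side decision
        |             cmd = reply[len("BASH:"):].strip()
        |             result = subprocess.run(cmd, shell=True,
        |                                     capture_output=True)
        |             # what the command did feeds the next turn
        |             history.append(result.stdout.decode())
        |         return history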
        
         | 3cats-in-a-coat wrote:
         | Given our entire civilization is built on words, all of it,
         | it's shocking how poorly most of us understand their importance
         | and power.
        
         | degun wrote:
         | This was the same idea that crossed my mind while reading the
         | article. It seems far too naive to think that because LLMs have
         | no will of their own, there will be no harmful consequences on
         | the real world. This is exactly where ethics comes to play.
        
       | coolKid721 wrote:
        | Anthropomorphizing LLMs happens because half the stock market's
        | gains depend on it; we have absurd levels of debt that we will
        | either have to grow our way out of or default on, and every
        | company and "person" is trying to hype everyone up to get access
        | to all of this liquidity being thrown into it.
       | 
        | I agree with the author, but people acting like LLMs are
        | conscious or human isn't weird to me; it's just fraud and lies.
       | Most people basically have 0 understanding of what technology or
       | minds are philosophically so it's an easy sale, and I do think
       | most of these fraudsters also likely buy into it themselves
       | because of that.
       | 
        | The really sad thing is people think "because someone runs an AI
        | company" they are somehow an authority on philosophy of mind,
        | which lets them fall for this marketing. The stuff these people
       | say about this stuff is absolute garbage, not that I disagree
       | with them, but it betrays a total lack of curiosity or interest
       | in the subject of what llms are, and the possible impacts of
       | technological shifts as those that might occur with llms becoming
        | more widespread. It's not a matter of agreement; it's a matter of
        | them simply not seeming to be aware of the most basic ideas of
        | what things are, what technology is, its manner of impacting
        | society, etc.
       | 
       | I'm not surprised by that though, it's absurd to think because
       | someone runs some AI lab or has a "head of safety/ethics" or
       | whatever garbage job title at an AI lab they actually have even
       | the slightest interest in ethics or any even basic familiarity
       | with the major works in the subject.
       | 
        | The author is correct; if people want to read a standard essay
        | articulating it more in depth, check out
       | https://philosophy.as.uky.edu/sites/default/files/Is%20the%2...
       | (the full extrapolation requires establishing what things are and
        | how causality in general operates and how that relates to
        | artifacts/technology, but that's obviously quite a bit to get
        | into).
       | 
        | The other note would be that something sharing an external trait
        | means absolutely nothing about causality, and suggesting a thing
        | is caused by the same thing "even to a way lesser degree" because
        | they share a resemblance is just a non sequitur. It's not a
        | serious thought/argument.
       | 
        | I think I have addressed why this weirdness comes up, though.
        | The entire economy is basically dependent on huge
       | productivity growth to keep functioning so everyone is trying to
       | sell they can offer that and AI is the clearest route, AGI most
       | of all.
        
       | TheDudeMan wrote:
       | If "LLMs" includes reasoning models, then you're already wrong in
       | your first paragraph:
       | 
       | "something that is just MatMul with interspersed nonlinearities."
        
       | Culonavirus wrote:
       | > A fair number of current AI luminaries have self-selected by
       | their belief that they might be the ones getting to AGI
       | 
       | People in the industry, especially higher up, are making absolute
       | bank, and it's their job to say that they're "a few years away"
       | from AGI, regardless of if they actually believe it or not. If
       | everyone was like "yep, we're gonna squeeze maybe 10-15% more
       | benchie juice out of this good ole transformer thingy and then
       | we'll have to come up with something else", I don't think that
       | would go very well with investors/shareholders...
        
       | fenomas wrote:
       | > The moment that people ascribe properties such as
       | "consciousness" or "ethics" or "values" or "morals" to these
       | learnt mappings is where I tend to get lost.
       | 
       | TFA really ought to have linked to some concrete examples of what
       | it's disagreeing with - when I see arguments about this in
       | practice, it's usually just people talking past each other.
       | 
       | Like, person A says "the model wants to X, but it knows Y is
       | wrong, so it prefers Z", or such. And person B interprets that as
       | ascribing consciousness or values to the model, when the speaker
       | meant it no differently from saying "water wants to go downhill"
       | - i.e. a way of describing externally visible behaviors, but
       | without saying "behaves as if.." over and over.
       | 
       | And then in practice, an unproductive argument usually follows -
       | where B is thinking "I am going to Educate this poor fool about
       | the Theory of Mind", and A is thinking "I'm trying to talk about
       | submarines; why is this guy trying to get me to argue about
       | whether they swim?"
        
       | fastball wrote:
       | "Don't anthropomorphize token predictors" is a reasonable take
       | assuming you have demonstrated that humans are _not_ in fact just
       | SOTA token predictors. But AFAIK that hasn't been demonstrated.
       | 
       | Until we have a much more sophisticated understanding of human
       | intelligence and consciousness, any claim of "these aren't like
       | us" is either premature or spurious.
        
         | krackers wrote:
         | Every time this discussion comes up, I'm reminded of this
         | tongue-in-cheek paper.
         | 
         | https://ai.vixra.org/pdf/2506.0065v1.pdf
        
           | lostmsu wrote:
           | I expected to find the link to
           | https://arxiv.org/abs/1703.10987 (which is much better imo)
        
       | Veedrac wrote:
       | The author plotted the input/output on a graph, intuited (largely
       | incorrectly, because that's not how sufficiently large state
       | spaces look) that the output was vaguely pretty, and then... I
       | mean, that's it: they just said they have a plot of the space it
       | operates on, therefore it's silly to ascribe interesting features
       | to the way it works.
       | 
       | And look, it's fine, they prefer words of a certain valence,
       | particularly ones with the right negative connotations, I prefer
       | other words with other valences. None of this means the concerns
       | don't matter. Natural selection on human pathogens isn't anything
       | particularly like human intelligence and it's still very
       | effective at selecting outcomes that we don't want against our
       | attempts to change that, as an incidental outcome of its
       | optimization pressures. I think it's very important we don't
       | build highly capable systems that select for outcomes we don't
       | want and will do so against our attempts to change it.
        
       | BrenBarn wrote:
       | > In contrast to an LLM, given a human and a sequence of words, I
       | cannot begin putting a probability on "will this human generate
       | this sequence".
       | 
       | I think that's a bit pessimistic. I think we can say for instance
       | that the probability that a person will say "the the the of of of
       | arpeggio halcyon" is tiny compared to the probability that they
       | will say "I haven't been getting that much sleep lately". And we
       | can similarly see that lots of other sequences are going to have
       | infinitesimally low probability. Now, yeah, we can't say exactly
       | what probability that is, but even just using a fairly sizable
       | corpus as a baseline you could probably get a surprisingly decent
       | estimate, given how much of what people say is formulaic.
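       | 
       | As a toy illustration of the kind of baseline I mean (a made-up
       | bigram model with add-one smoothing over a hypothetical
       | corpus.txt; not a serious language model, just the flavor of the
       | estimate):
       | 
       |     import math
       |     from collections import Counter
       | 
       |     def bigram_log_prob(sentence, corpus_tokens):
       |         # Estimate log P(sentence) from corpus counts, with
       |         # add-one smoothing so unseen pairs aren't zero.
       |         unigrams = Counter(corpus_tokens)
       |         bigrams = Counter(zip(corpus_tokens, corpus_tokens[1:]))
       |         vocab = len(unigrams)
       |         words = sentence.lower().split()
       |         logp = 0.0
       |         for prev, cur in zip(words, words[1:]):
       |             logp += math.log((bigrams[(prev, cur)] + 1) /
       |                              (unigrams[prev] + vocab))
       |         return logp
       | 
       |     corpus = open("corpus.txt").read().lower().split()
       |     print(bigram_log_prob(
       |         "I haven't been getting that much sleep lately", corpus))
       |     print(bigram_log_prob(
       |         "the the the of of of arpeggio halcyon", corpus))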
       | 
       | The real difference seems to be that the manner in which humans
       | generate sequences is more intertwined with other aspects of
       | reality. For instance, the probability of a certain human saying
       | "I haven't been getting that much sleep lately" is connected to
       | how much sleep they have been getting lately. For an LLM it
       | really isn't connected to anything except word sequences in its
       | input.
       | 
       | I think this is consistent with the author's point that we
       | shouldn't apply concepts like ethics or emotions to LLMs. But
       | it's not because we don't know how to predict what sequences of
       | words humans will use; it's rather because we _do_ know a little
       | about how to do that, and part of what we know is that it is
       | connected with other dimensions of physical reality,  "human
       | nature", etc.
       | 
       | This is one reason I think people underestimate the risks of AI:
       | the performance of LLMs lulls us into a sense that they "respond
       | like humans", but in fact the Venn diagram of human and LLM
       | behavior only intersects in a relatively small area, and in
       | particular they have very different failure modes.
        
       | elliotto wrote:
       | To claim that LLMs do not experience consciousness requires a
       | model of how consciousness works. The author has not presented a
       | model, and instead relied on emotive language leaning on the
       | absurdity of the claim. I would say that any model one presents
       | of consciousness often comes off as just as absurd as the claim
       | that LLMs experience it. It's a great exercise to sit down and
       | write out your own perspective on how consciousness works, to
       | feel out where the holes are.
       | 
       | The author also claims that a function (R^n)^c -> (R^n)^c is
       | dramatically different to the human experience of consciousness.
       | Yet the author's text I am reading, and any information they can
       | communicate to me, exists entirely in (R^n)^c.
        
         | shevis wrote:
         | > requires a model of how consciousness works.
         | 
         | Not necessarily an entire model, just a single defining
         | characteristic that can serve as a falsifying example.
         | 
         | > any information they can communicate to me, exists entirely
         | in (R^n)^c
         | 
         | Also no. This is just a result of the digital medium we are
         | currently communicating over. Merely standing in the same room
         | as them would communicate information outside (R^n)^c.
        
         | seadan83 wrote:
         | I believe the author is rather drawing this distinction:
         | 
         | LLMs: (R^n)^c -> (R^n)^c
         | 
         | Humans: [set of potentially many and complicated inputs that we
         | effectively do not understand at all] -> (R^n)^c
         | 
         | The point is that the model of how consciousness works is
         | unknown. Thus the author would not present such a model; that is
         | the point.
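         | 
         | A toy sketch of the first of those mappings, purely for
         | illustration (random weights stand in for a trained model; no
         | claim that this resembles a real transformer):
         | 
         |     import numpy as np
         | 
         |     rng = np.random.default_rng(0)      # fixed seed
         |     W = rng.standard_normal((16, 16))   # stand-in "weights"
         | 
         |     def llm_map(context):
         |         # (R^n)^c -> (R^n)^c: matmul plus a nonlinearity,
         |         # then slide the window by one predicted vector.
         |         nxt = np.tanh(context @ W).mean(axis=0)
         |         return np.vstack([context[1:], nxt])
         | 
         |     ctx = rng.standard_normal((4, 16))  # c=4 vectors in R^16
         |     print(llm_map(ctx))                 # deterministic
         | 
         | Same context in, same output out, every time. For the human
         | side of the distinction we cannot even write down the argument
         | list.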
        
         | quonn wrote:
         | > To claim that LLMs do not experience consciousness requires a
         | model of how consciousness works.
         | 
         | Nope. What can be asserted without evidence can also be
         | dismissed without evidence. Hitchens's razor.
         | 
         | You know you have consciousness (by the very definition that
         | you can observe it in yourself) and that's evidence. Because
         | other humans are genetically and in every other way identical,
         | you can infer it for them as well. Because mammals are very
         | similar many people (but not everyone) infers it for them as
         | well. There is zero evidence for LLMs and their _very_
         | construction suggests that they are like a calculator or like
         | Excel or like any other piece of software no matter how smart
         | they may be or how many tasks they can do in the future.
         | 
         | Additionally I am really surprised by how many people here
         | confuse consciousness with intelligence. Have you never paused
         | for a second in your life to "just be"? Done any meditation? Or
         | even just existed at least for a few seconds without a train of
         | thought? It is very obvious that language and consciousness are
         | completely unrelated and there is no need for language and I
         | doubt there is even a need for intelligence to be conscious.
         | 
         | Consider this:
         | 
         | In the end an LLM could be executed (slowly) on a CPU that
         | accepts very basic _discrete_ instructions, such as ADD and
         | MOV. We know this for a fact. Those instructions can be
         | executed arbitrarily slowly. There is no reason whatsoever to
         | suppose that it should feel like anything to be the CPU to say
         | nothing of how it would subjectively feel to be a MOV
         | instruction. It's ridiculous. It's unscientific. It's like
         | believing that there's a spirit in the tree you see outside,
         | just because - why not? - why wouldn't there be a spirit in the
         | tree?
        
         | tdullien wrote:
         | Author here. What's the difference, in your perception, between
         | an LLM and a large-scale meteorological simulation, if there is
         | any?
         | 
         | If you're willing to ascribe the possibility of consciousness
         | to any complex-enough computation of a recurrence equation (and
         | hence to something like ... "earth"), I'm willing to agree that
         | under that definition LLMs might be conscious. :)
        
       | kelseyfrog wrote:
       | Dear author, you can just assume that people are
       | fauxthropomorphizing LLMs without any loss of generality. Perhaps
       | it will allow you to sleep better at night. You're welcome.
        
       | rockskon wrote:
       | The people in this thread incredulous at the assertion that they
       | are not God and haven't invented machine life are exasperating.
       | At this point I am convinced they, more often than not,
       | financially benefit from their near religious position in
       | marketing AI as akin to human intelligence.
        
         | refulgentis wrote:
         | I am ready and waiting for you to share these comments that are
         | incredulous at the assertion they are not God, lol.
        
         | orbital-decay wrote:
         | Are we looking at the same thread? I see nobody claiming this.
         | Anthropic does sometimes, their position is clearly wishful
         | thinking, and it's not represented ITT.
         | 
         | Try looking at this from another perspective - many people
         | simply do not see human intelligence (or life, for that matter)
         | as magic. I see nothing religious about that, rather the
         | opposite.
        
           | seadan83 wrote:
           | I agree with you @orbital-decay that I also do not get the
           | same vibe reading this thread.
           | 
           | Though, while human intelligence is (seemingly) not magic, it
            | is very far from being understood. The idea that an LLM is
           | comparable to human intelligence implies that we even
           | understand human intelligence well enough to say that.
        
             | ImHereToVote wrote:
              | LLMs are also not understood. I mean, we built and trained
              | them, but some of their abilities are still surprising to
              | researchers. We have yet to map these machines.
        
       | zxcb1 wrote:
       | LLMs are complex irreducible systems; hence there are emergent
       | properties that arise at different scales
        
       | dr_dshiv wrote:
       | Which is a more useful mental model for the user?
       | 
       | 1. It's a neural network predicting the next token
       | 
       | 2. It's like a person
       | 
       | 3. It's like a magical genie
       | 
       | I lean towards 3.
        
       | Al-Khwarizmi wrote:
       | I have the technical knowledge to know how LLMs work, but I still
       | find it pointless to _not_ anthropomorphize, at least to an
       | extent.
       | 
       | The language of "generator that stochastically produces the next
       | word" is just not very useful when you're talking about, e.g., an
       | LLM that is answering complex world modeling questions or
       | generating a creative story. It's at the wrong level of
       | abstraction, just as if you were discussing a UI events API and
       | you were talking about zeros and ones, or voltages in
       | transistors. Technically fine but totally useless to reach any
       | conclusion about the high-level system.
       | 
       | We need a higher abstraction level to talk about higher level
       | phenomena in LLMs as well, and the problem is that we have no
       | idea what happens internally at those higher abstraction levels.
       | So, considering that LLMs somehow imitate humans (at least in
       | terms of output), anthropomorphization is the best abstraction we
       | have, hence people naturally resort to it when discussing what
       | LLMs can do.
        
         | grey-area wrote:
         | On the contrary, anthropomorphism IMO is the main problem with
         | narratives around LLMs - people are genuinely talking about
         | them thinking and reasoning when they are doing nothing of that
         | sort (actively encouraged by the companies selling them) and it
         | is completely distorting discussions on their use and
         | perceptions of their utility.
        
           | cmenge wrote:
           | I kinda agree with both of you. It might be a required
           | abstraction, but it's a leaky one.
           | 
           | Long before LLMs, I would talk about classes / functions /
           | modules like "it then does this, decides the epsilon is too
           | low, chops it up and adds it to the list".
           | 
           | The difference I guess it was only to a technical crowd and
           | nobody would mistake this for anything it wasn't. Everybody
           | know that "it" didn't "decide" anything.
           | 
           | With AI being so mainstream and the math being much more
           | elusive than a simple if..then I guess it's just too easy to
           | take this simple speaking convention at face value.
           | 
           | EDIT: some clarifications / wording
        
             | flir wrote:
             | Agreeing with you, this is a "can a submarine swim" problem
             | IMO. We need a new word for what LLMs are doing. Calling it
             | "thinking" is stretching the word to breaking point, but
             | "selecting the next word based on a complex statistical
             | model" doesn't begin to capture what they're capable of.
             | 
             | Maybe it's cog-nition (emphasis on the cog).
        
               | whilenot-dev wrote:
               | "predirence" -> prediction meets inference and it sounds
               | a bit like preference
        
               | psychoslave wrote:
               | Except -ence is a regular morph, and you would rather
               | suffix it to predict(at)-.
               | 
                | And prediction is already a hyponym of inference. Why
               | not just use inference then?
        
               | whilenot-dev wrote:
               | I didn't think of _prediction_ in the statistical sense
               | here, but rather as a prophecy based on a vision,
               | something that is inherently stored in a model without
                | the knowledge of the modelers. I don't want to imply any
               | magic or something supernatural here, it's just the juice
               | that goes off the rails sometimes, and it gets overlooked
               | due to the sheer quantity of the weights. Something like
               | unknown bugs in production, but, because they still just
               | represent a valid number in some computation that
               | wouldn't cause any panic, these few bits can show a
               | useful pattern under the right circumstances.
               | 
               |  _Inference_ would be the part that is deliberately
               | learned and drawn from conclusions based on the training
               | set, like in the  "classic" sense of statistical
               | learning.
        
               | LeonardoTolstoy wrote:
               | What does a submarine do? Submarine? I suppose you
               | "drive" a submarine which is getting to the idea:
               | submarines don't swim because ultimately they are
               | "driven"? I guess the issue is we don't make up a new
               | word for what submarines do, we just don't use human
               | words.
               | 
               | I think the above poster gets a little distracted by
               | suggesting the models are creative which itself is
               | disputed. Perhaps a better term, like above, would be to
               | just use "model". They are models after all. We don't
               | make up a new portmanteau for submarines. They float, or
               | drive, or submarine around.
               | 
               | So maybe an LLM doesn't "write" a poem, but instead
               | "models a poem" which maybe indeed take away a little of
               | the sketchy magic and fake humanness they tend to be
               | imbued with.
        
               | FeepingCreature wrote:
               | Humans certainly model inputs. This is just using an
               | awkward word and then making a point that it feels
               | awkward.
        
               | flir wrote:
               | I really like that, I think it has the right amount of
               | distance. They don't write, they model writing.
               | 
               | We're very used to "all models are wrong, some are
               | useful", "the map is not the territory", etc.
        
               | galangalalgol wrote:
               | No one was as bothered when we anthropomorphized crud
               | apps simply for the purpose of conversing about "them".
               | "Ack! The thing is corrupting tables again because it
               | thinks we are still using api v3! Who approved that last
               | MR?!" The fact that people are bothered by the same
               | language now is indicative in itself. If you want to
                | maintain distance, pre-prompt models to structure all
               | conversations to lack pronouns as between a non sentient
               | language model and a non sentient agi. You can have the
               | model call you out for referring to the model as
               | existing. The language style that forces is interesting,
               | and potentially more productive except that there are
               | fewer conversations formed like that in the training
               | dataset. Translation being a core function of language
               | models makes it less important thought. As for confusing
               | the map for the territory, that is precisely what
               | philosophers like Metzinger say humans are doing by
               | considering "self" to be a real thing and that they are
               | conscious when they are just using the reasoning shortcut
               | of narrating the meta model to be the model.
        
               | flir wrote:
               | > You can have the model call you out for referring to
               | the model as existing.
               | 
               | This tickled me. "There ain't nobody here but us
               | chickens".
               | 
               | I have other thoughts which are not quite crystalized,
               | but I think UX might be having an outsized effect here.
        
               | galangalalgol wrote:
               | In addition to he/she etc. there is a need for a button
               | for no pronouns. "Stop confusing metacognition for
               | conscious experience or qualia!" doesn't fit well. The UX
               | for these models is extremely malleable. The responses
               | are misleading mostly to the extent the prompts were
               | already misled. The sorts of responses that arise from
               | ignorant prompts are those found within the training data
               | in the context of ignorant questions. This tends to make
               | them ignorant as well. There are absolutely stupid
               | questions.
        
               | irthomasthomas wrote:
               | Depends on if you are talking _about_ an llm or _to_ the
               | llm. Talking _to_ the llm, it would not understand that
               | "model a poem" means to write a poem. Well, it will
               | probably guess right in this case, but if you go out of
               | band too much it won't understand you. The hard problem
               | today is rewriting out of band tasks to be in band, and
               | that requires anthropomorphizing.
        
               | dcookie wrote:
               | > it won't understand you
               | 
               | Oops.
        
               | irthomasthomas wrote:
               | That's consistent with my distinction when talking
                | _about_ them vs _to_ them.
        
               | thinkmassive wrote:
               | GenAI _generates_ output
        
               | jorvi wrote:
                | A submarine is propelled by a propeller and helmed by a
               | controller (usually a human).
               | 
               | It would be swimming if it was propelled by drag (well,
                | technically a propeller also uses drag via thrust, but
               | you get the point). Imagine a submarine with a fish tail.
               | 
               | Likewise we can probably find an apt description in our
               | current vocabulary to fittingly describe what LLMs do.
        
               | j0057 wrote:
               | A submarine is a boat and boats sail.
        
               | TimTheTinker wrote:
               | An LLM is a stochastic generative model and stochastic
               | generative models ... generate?
        
               | LeonardoTolstoy wrote:
                | And we are there. A boat sails, and a submarine sails. "A
                | model generates" makes perfect sense to me. And saying
               | chatgpt generated a poem feels correct personally. Indeed
               | a model (e.g. a linear regression) generates predictions
               | for the most part.
        
               | psychoslave wrote:
               | It does some kind of automatic inference (AI), and that's
               | it.
        
               | JimDabell wrote:
               | > this is a "can a submarine swim" problem IMO. We need a
               | new word for what LLMs are doing.
               | 
               | Why?
               | 
               | A plane is not a fly and does not stay aloft like a fly,
               | yet we describe what it does as flying despite the fact
               | that it does not flap its wings. What are the downsides
               | we encounter that are caused by using the word "fly" to
               | describe a plane travelling through the air?
        
               | flir wrote:
               | I was riffing on that famous Dijkstra quote.
        
               | dotancohen wrote:
               | For what it's worth, in my language the motion of birds
               | and the motion of aircraft _are_ two different words.
        
               | Tijdreiziger wrote:
               | Flying isn't named after flies, they both come from the
               | same root.
               | 
               | https://www.etymonline.com/search?q=fly
        
               | lelanthran wrote:
               | > A plane is not a fly and does not stay aloft like a
               | fly, yet we describe what it does as flying despite the
               | fact that it does not flap its wings.
               | 
               | Flying doesn't mean flapping, and the word has a long
               | history of being used to describe inanimate objects
               | moving through the air.
               | 
               | "A rock flies through the window, shattering it and
               | spilling shards everywhere" - see?
               | 
                | OTOH, we have never used the word "swim" in the same way -
               | "The rock hit the surface and swam to the bottom" is
               | _wrong!_
        
               | intended wrote:
                | It will help significantly to realize that the only
               | thinking happening is when the human looks at the output
               | and attempts to verify if it is congruent with reality.
               | 
               | The rest of the time it's generating content.
        
               | Atlas667 wrote:
               | A machine that can imitate the products of thought is not
               | the same as thinking.
               | 
               | All imitations _require_ analogous mechanisms, but that
               | is the extent of their similarities, in syntax. Thinking
               | requires networks of billions of neurons, and then, not
               | only that, but words can never exist on a plane because
               | they do not belong to a plane. Words can only be stored
               | on a plane, they are not useful on a plane.
               | 
               | Because of this LLMs have the potential to discover new
               | aspects and implications of language that will be rarely
               | useful to us because language is not useful within a
               | computer, it is useful in the world.
               | 
                | It's like seeing loosely related patterns in a picture and
                | then deriving more from those patterns, which are real, but
               | loosely related.
               | 
                | LLMs are not intelligence, but it's fine that we use that
               | word to describe them.
        
               | delusional wrote:
               | > "selecting the next word based on a complex statistical
               | model" doesn't begin to capture what they're capable of.
               | 
               | I personally find that description perfect. If you want
               | it shorter you could say that an LLM generates.
        
               | ryeats wrote:
               | It's more like muscle memory than cognition. So maybe
               | procedural memory but that isn't catchy.
        
               | 01HNNWZ0MV43FF wrote:
               | They certainly do act like a thing which has a very
               | strong "System 1" but no "System 2" (per Thinking, Fast
               | And Slow)
        
             | loxs wrote:
             | We can argue all day what "think" means and whether a LLM
             | thinks (probably not IMO), but at least in my head the
             | threshold for "decide" is much lower so I can perfectly
             | accept that a LLM (or even a class) "decides". I don't have
             | a conflict about that. Yeah, it might not be a decision in
             | the human sense, but it's a decision in the mathematical
             | sense so I have always meant "decide" literally when I was
             | talking about a piece of code.
             | 
             | It's much more interesting when we are talking about...
             | say... an ant... Does it "decide"? That I have no idea as
             | it's probably somewhere in between, neither a sentient
             | decision, nor a mathematical one.
        
               | 0x457 wrote:
                | Well, it outputs a chain of thoughts that is later used
                | to produce a better prediction. It produces a chain of
                | thoughts similar to how one would think about a problem
                | out loud. It's more verbose than what you would do, but
                | you always have some ambient context that the LLM lacks.
        
             | stoneyhrm1 wrote:
              | I mean you can boil anything down to its building blocks
              | and make it seem like it didn't 'decide' anything. When you
              | as a human decide something, your brain and its neurons
              | just made some connections, with an output signal sent to
              | other parts resulting in your body 'doing' something.
             | 
             | I don't think LLMs are sentient or any bullshit like that,
             | but I do think people are too quick to write them off
             | before really thinking about how a nn 'knows things'
             | similar to how a human 'knows' things, it is trained and
             | reacts to inputs and outputs. The body is just far more
             | complex.
        
               | grey-area wrote:
               | I wasn't talking about knowing (they clearly encode
               | knowledge), I was talking about thinking/reasoning, which
               | is something LLMs do not in fact do IMO.
               | 
               | These are very different and knowledge is not
               | intelligence.
        
             | HelloUsername wrote:
             | > EDIT: some clarifications / wording
             | 
             | This made me think, when will we see LLMs do the same;
             | rereading what they just sent, and editing and correcting
             | their output again :P
        
           | Al-Khwarizmi wrote:
           | I think it's worth distinguishing between the use of
           | anthropomorphism as a useful abstraction and the misuse by
           | companies to fuel AI hype.
           | 
           | For example, I think "chain of thought" is a good name for
           | what it denotes. It makes the concept easy to understand and
           | discuss, and a non-antropomorphized name would be unnatural
           | and unnecessarily complicate things. This doesn't mean that I
           | support companies insisting that LLMs think just like humans
           | or anything like that.
           | 
           | By the way, I would say actually anti-anthropomorphism has
           | been a bigger problem for understanding LLMs than
           | anthropomorphism itself. The main proponents of anti-
           | anthropomorphism (e.g. Bender and the rest of "stochastic
           | parrot" and related paper authors) came up with a lot of
           | predictions about things that LLMs surely couldn't do (on
           | account of just being predictors of the next word, etc.)
           | which turned out to be spectacularly wrong.
        
             | whilenot-dev wrote:
              | I don't know about others, but I much prefer it when some
              | reductionist tries to conclude what's technically feasible
              | and is proven wrong _over time_, over somebody yelling
             | holistic analogies a la "it's sentient, it's intelligent,
             | it thinks like us humans" for the sole dogmatic reason of
             | being a futurist.
             | 
             | Tbh I also think your comparison that puts "UI events ->
             | Bits -> Transistor Voltages" as analogy to "AI thinks ->
             | token de-/encoding + MatMul" is certainly a stretch, as the
             | part about "Bits -> Transistor Voltages" applies to both
             | hierarchies as the foundational layer.
             | 
             | "chain of thought" could probably be called "progressive
             | on-track-inference" and nobody would roll an eye.
        
           | amelius wrote:
           | I don't agree. Most LLMs have been trained on human data, so
           | it is best to talk about these models in a human way.
        
             | 4ndrewl wrote:
             | Even the verb 'trained' is contentious wrt
             | anthropomorphism.
        
               | amelius wrote:
               | Somewhat true but rodents can also be trained ...
        
               | 4ndrewl wrote:
               | Rodents aren't functions though?
        
               | FeepingCreature wrote:
               | Every computable system, even stateful systems, can be
               | reformulated as a function.
               | 
               | If IO can be functional, I don't see why mice can't.
        
               | psychoslave wrote:
                | Well, that's a strong claim of equivalence between
                | computable models and reality.
                | 
                | The consensus view is rather that no map fully matches
                | the territory, or said otherwise, the territory includes
                | ontological components that exceed even the most
                | sophisticated map that can ever be built.
        
               | FeepingCreature wrote:
               | I believe the consensus view is that physics is
               | computable.
        
               | 4ndrewl wrote:
               | Thanks. I think the original point about the word
               | 'trained' being contentious still stands, as evidenced by
               | this thread :)
        
               | tempfile wrote:
               | So you think a rodent _is_ a function?
        
               | FeepingCreature wrote:
               | I think that I am a function.
        
             | tliltocatl wrote:
              | Anthropomorphising implicitly assumes motivation, goals and
              | values. That's what the core of anthropomorphism is -
              | attempting to explain the behavior of a complex system in
              | teleological terms. And prompt escapes make it clear LLMs
              | don't have any teleological agency yet. Whatever their
              | course of action is, it is too easy to steer them off it.
              | Try to do that with a sufficiently motivated human.
        
               | psychoslave wrote:
               | >. Try to do it with a sufficiently motivated human.
               | 
                | That's what they call marketing, propaganda or
                | brainwashing, acculturation, education, depending on who
                | you ask and at which scale you operate, apparently.
        
               | tliltocatl wrote:
               | > sufficiently motivated
               | 
                | None of these target the sufficiently motivated, rather
                | those who are either ambivalent or not yet exposed.
        
               | criddell wrote:
               | How will you know when an AI has teleological agency?
        
               | tliltocatl wrote:
               | Prompt escapes will be much harder, and some of them will
               | end up in an equivalent of "sure here is... no, wait...
                | You know what, I'm not doing that", i.e. slipping and
               | then getting back on track.
        
           | fenomas wrote:
           | When I see these debates it's always the other way around -
           | one person speaks colloquially about an LLM's behavior, and
           | then somebody else jumps on them for supposedly believing the
           | model is conscious, just because the speaker said "the model
           | thinks.." or "the model knows.." or whatever.
           | 
           | To be honest the impression I've gotten is that some people
           | are just very interested in talking about not
           | anthropomorphizing AI, and less interested in talking about
           | AI behaviors, so they see conversations about the latter as a
           | chance to talk about the former.
        
             | latexr wrote:
             | Respectfully, that is a reflection of the places you hang
             | out in (like HN) and not the reality of the population.
             | 
             | Outside the technical world it gets much worse. There are
             | people who killed themselves because of LLMs, people who
             | are in love with them, people who genuinely believe they
             | have "awakened" their own private ChatGPT instance into AGI
             | and are eschewing the real humans in their lives.
        
               | fenomas wrote:
               | Naturally I'm aware of those things, but I don't think
               | TFA or GGP were commenting on them so I wasn't either.
        
               | Xss3 wrote:
               | The other day a good friend of mine with mental health
               | issues remarked that "his" chatgpt understands him better
               | than most of his friends and gives him better advice than
               | his therapist.
               | 
               | It's going to take a lot to get him out of that mindset
               | and frankly I'm dreading trying to compare and contrast
               | imperfect human behaviour and friendships with a
               | sycophantic AI.
        
               | bonoboTP wrote:
               | It's surprisingly common on reddit that people talk about
               | "my chatgpt", and they don't always seem like the type
               | who are "in a relationship" with the bot or unlocking the
               | secrets of the cosmos with it, but still they write "my
               | chatgpt" and "your chatgpt". I guess the custom prompt
               | and the available context does customize the model for
               | them in some sense, but I suspect they likely have a
               | wrong mental model of how this customization works. I
                | guess they imagine it as their own little model being
                | stored on file at OpenAI and shaped as they interact with
                | it, and that each time they connect, their model is
                | retrieved from the cloud storage for them, or something.
        
               | lelanthran wrote:
               | > The other day a good friend of mine with mental health
               | issues remarked that "his" chatgpt understands him better
               | than most of his friends and gives him better advice than
               | his therapist.
               | 
               | The therapist thing might be correct, though. You can
               | send a well-adjusted person to three renowned therapists
               | and get three different reasons for why they need to
               | continue sessions.
               | 
                | No therapist _ever_ says _"Congratulations, you're
                | perfectly normal. Now go away and come back when you have
                | a real problem."_ Statistically it is vanishingly
                | unlikely that _every_ person who ever visited a therapist
                | is in need of a second (or more) visit.
               | 
               | The main problem with therapy is a lack of
               | objectivity[1]. When people talk about what their
                | sessions resulted in, it's always _"My problem is that
               | I'm too perfect"_. I've known actual bullies whose
               | therapist apparently told them that they are too
               | submissive and need to be more assertive.
               | 
               | The secondary problem is that all diagnosis is based on
               | self-reported metrics of the subject. All improvement is
               | equally based on self-reported metrics. This is no
               | different from prayer.
               | 
               | You don't have a medical practice there; you've got an
               | Imam and a sophisticated but still medically-insured way
               | to plead with thunderstorms[2]. I fail to see how an LLM
                | (or even the Rogerian M-x doctor in Emacs) will do worse
               | on average.
               | 
               | After all, if you're at a therapist and you're doing most
               | of the talking, how would an LLM perform worse than the
               | therapist?
               | 
               | ----------------
               | 
               | [1] If I'm at a therapist, and they're asking me to do
               | most of the talking, I would damn well feel that I am not
               | getting my moneys worth. I'd be there primarily to learn
               | (and practice a little) whatever tools they can teach me
               | to handle my $PROBLEM. I don't want someone to vent at, I
               | want to learn coping mechanisms and mitigation
               | strategies.
               | 
               | [2] This is not an obscure reference.
        
             | positron26 wrote:
             | Most certainly the conversation is extremely political.
             | There are not simply different points of view. There are
             | competitive, gladiatorial opinions ready to ambush anyone
             | not wearing the right colors. It's a situation where the
             | technical conversation is drowning.
             | 
             | I suppose this war will be fought until people are out of
             | energy, and if reason has no place, it is reasonable to let
             | others tire themselves out reiterating statements that are
             | not designed to bring anyone closer to the truth.
        
               | bonoboTP wrote:
               | If this tech is going to be half as impactful as its
               | proponents predict, then I'd say it's still under-
               | politicized. Of course the politics around it doesn't
               | have to be knee-jerk mudslinging, but it's no surprise
               | that politics enters the picture when the tech can
               | significantly transform society.
        
             | scarface_74 wrote:
             | Wait until a conversation about "serverless" comes up and
             | someone says there is no such thing because there are
              | servers somewhere, as if everyone - especially on HN -
              | doesn't already know that.
        
               | Tijdreiziger wrote:
               | Why would everyone know that? Not everyone has experience
               | in sysops, especially not beginners.
               | 
               | E.g. when I first started learning webdev, I didn't think
               | about 'servers'. I just knew that if I uploaded my
               | HTML/PHP files to my shared web host, then they appeared
               | online.
               | 
               | It was only much later that I realized that shared
               | webhosting is 'just' an abstraction over Linux/Apache
               | (after all, I first had to learn about those topics).
        
               | scarface_74 wrote:
               | I am saying that most people who come on HN and say
               | "there is no such thing as serverless and there are
               | servers somewhere" think they are sounding smart when
               | they are adding nothing to the conversation.
               | 
               | I'm sure you knew that your code was running on computers
               | somewhere even when you first started and wasn't running
               | in a literal "cloud".
               | 
               | It's about as tiring as people on HN who know just a
               | little about LLMs thinking they are sounding smart when
               | they say they are just advanced autocomplete. Both
                | responses are just as unproductive.
        
               | Tijdreiziger wrote:
               | > I'm sure you knew that your code was running on
               | computers somewhere even when you first started and
               | wasn't running in a literal "cloud".
               | 
               | Meh, I just knew that the browser would display HTML if I
               | wrote it, and that uploading the HTML files made them
               | available on my domain. I didn't really think about
               | _where_ the files went, specifically.
               | 
               | Try asking an average high school kid how cloud storage
               | works. I doubt you'll get any further than 'I make files
               | on my Google Docs and then they are saved there'. This is
               | one step short of 'well, the files must be on some system
               | in some data center'.
               | 
               | I really disagree that "people who come on HN and say
               | "there is no such thing as serverless and there are
               | servers somewhere" think they are sounding smart when
               | they are adding nothing to the conversation." On the
               | contrary, it's an invitation to beginning coders to think
               | about _what_ the 'serverless' abstraction actually means.
        
               | godelski wrote:
               | I think they fumbled with wording but I interpreted them
               | as meaning "audience of HN" and it seems they confirmed.
               | 
               | We always are speaking to our audience, right? This is
               | also what makes more general/open discussions difficult
               | (e.g. talking on Twitter/Facebook/etc). That there are
               | many ways to interpret anything depending on prior
               | knowledge, cultural biases, etc. But I think it is fair
               | that on HN we can make an assumption that people here are
               | tech savvy and knowledgeable. We'll definitely overstep
               | and understep at times, but shouldn't we also cultivate a
               | culture where it is okay to ask and okay to apologize for
               | making too much of an assumption?
               | 
               | I mean at the end of the day we got to make some
               | assumptions, right? If we assume zero operating knowledge
               | then comments are going to get pretty massive and
               | frankly, not be good at communicating with a niche even
               | if better at communicating with a general audience. But
               | should HN be a place for general people? I think no. I
               | think it should be a place for people interested in
               | computers and programming.
        
             | Wowfunhappy wrote:
             | As I write this, Claude Code is currently opening and
             | closing various media files on my computer. Sometimes it
             | plays the file for a few seconds before closing it,
             | sometimes it starts playback and then seeks to a different
             | position, sometimes it fast forwards or rewinds, etc.
             | 
             | I asked Claude to write a E-AC3 audio component so I can
             | play videos with E-AC3 audio in the old version of
             | QuickTime I really like using. Claude's decoder includes
             | the ability to write debug output to a log file, so Claude
             | is studying how QuickTime and the component interact, and
             | it's controlling QuickTime via Applescript.
             | 
             | Sometimes QuickTime crashes, because this ancient API has
             | its roots in the classic Mac OS days and is not exactly
             | good. Claude reads the crash logs on its own--it knows
             | where they are--and continues on its way. I'm just sitting
             | back and trying to do other things while Claude works,
             | although it's a little distracting that _something_ else is
             | using my computer at the same time.
             | 
              | I _really_ don't want to anthropomorphize these programs,
              | but it's just so _hard_ when it's acting so much like a
             | person...
        
               | godelski wrote:
               | Would it help you to know that trial and error is a
               | common tactic by machines? Yes, humans do it too, but
               | that doesn't mean the process isn't mechanical. In fact,
               | in computing we might call this a "brute force" approach.
               | You don't have to cover the entire search space to brute
               | force something, and it certainly doesn't mean you can't
                | have optimization strategies rather than a naive grid search
               | (e.g. you can use Bayesian methods, multi-armed bandit
               | approaches, or a whole world of things).
               | 
               | I would call "fuck around and find out" a rather simple
               | approach. It is why we use it! It is why lots of animals
               | use it. Even very dumb animals use it. Though, we do
               | notice more intelligent animals use more efficient
               | optimization methods. All of this is technically
               | hypothesis testing. Even a naive grid search. But that is
               | still in the class of "fuck around and find out" or
               | "brute force", right?
               | 
               | I should also mention two important things.
               | 
               | 1) as a human we are biased to anthropomorphize. We see
               | faces in clouds. We tell stories of mighty beings
               | controlling the world in an effort to explain why things
               | happen. This is anthropomorphization of the universe
               | itself!
               | 
               | 2) We design LLMs (and many other large ML systems) to
               | optimize towards human preference. This reinforces an
               | anthropomorphized interpretation.
               | 
               | The reason for doing this (2) is based on a naive
               | assumption[0]: If it looks like a duck, swims like a
                | duck, and quacks like a duck, then it _probably_ is a
                | duck. But the duck test doesn't rule out a highly
                | sophisticated animatronic. It's a good rule of thumb, but
                | wouldn't it also be incredibly naive to assume that it
                | _is_ a duck? Isn't the duck test itself entirely
               | dependent on our own personal familiarity with ducks? I
               | think this is important to remember and can help combat
               | our own propensity for creating biases.
               | 
               | [0] It is not a bad strategy to build in that direction.
               | When faced with many possible ways to go, this is a very
               | reasonable approach. The naive part is if you assume that
               | it will take you all the way to making a duck. It is also
               | a perilous approach because you are explicitly making it
               | harder for you to evaluate. It is, in the fullest sense
               | of the phrase, "metric hacking."
        
               | Wowfunhappy wrote:
               | It wasn't a simple brute force. When Claude was working
               | this morning, it was pretty clearly only playing a file
               | when it actually needed to see packets get decoded,
               | otherwise it would simply open and close the document.
               | Similarly, it would only seek or fast forward when it was
               | debugging specific issues related to those actions. And
               | it even "knew" which test files to open for specific
               | channel layouts.
               | 
               | Yes this is still mechanical in a sense, but then I'm not
                | sure what behavior you _wouldn't_ classify as
               | mechanical. It's "responding" to stimuli in logical ways.
               | 
               | But I also don't quite know where I'm going with this. I
               | don't think LLMs are sentient or something, I know
               | they're just math. But it's _spooky_.
        
           | stoneyhrm1 wrote:
           | I thought this too but then began to think about it from the
           | perspective of the programmers trying to make it imitate
           | human learning. That's what a nn is trying to do at the end
           | of the day, and in the same way I train myself by reading
           | problems and solutions, or learning vocab at a young age, it
           | does so by tuning billions of parameters.
           | 
           | I think these models do learn similarly. What does it even
           | mean to reason? Your brain knows certain things so it comes
           | to certain conclusions, but it only knows those things
           | because it was ''trained'' on those things.
           | 
            | I reason my car will crash if I go 120 mph on the other side
            | of the road because I have previously 'seen' that the input
            | of a car going 120 mph has a high probability of producing a
            | crash, and have similarly seen that input where the car is
            | going on the other side of the road produces a crash.
            | Combining the two tells me it's a high probability.
        
           | losvedir wrote:
           | Well "reasoning" refers to Chain-of-Thought and if you look
           | at the generated prompts it's not hard to see why it's called
           | that.
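            | 
            | For instance, something like this sketch (the wording is
            | made up, not taken from any particular model):
            | 
            |     # Hypothetical illustration of a chain-of-thought style
            |     # prompt and the kind of trace it elicits.
            |     direct = ("Q: A train leaves at 3:40pm and the trip "
            |               "takes 85 minutes. When does it arrive?\nA:")
            | 
            |     cot = direct.replace("A:", "A: Let's think step by step.")
            | 
            |     # A reasoning trace then tends to read something like:
            |     trace = ("85 minutes is 1 hour and 25 minutes. 3:40pm "
            |              "plus 1 hour is 4:40pm, plus 25 minutes is "
            |              "5:05pm. So the train arrives at 5:05pm.")
            | 
            |     print(cot)
            |     print(trace)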
           | 
           | That said, it's fascinating to me that it works (and
           | empirically, it does work; a reasoning model generating tens
           | of thousands of tokens while working out the problem does
           | produce better results). I wish I knew why. A priori I
           | wouldn't have expected it, since there's no new input. That
           | means it's all "in there" in the weights already. I don't see
           | why it couldn't just one shot it without all the reasoning.
           | And maybe the future will bring us more distilled models that
           | can do that, or they can tease out all that reasoning with
           | more generated training data, to move it from dispersed
           | around the weights -> prompt -> more immediately accessible
           | in the weights. But for now "reasoning" works.
           | 
           | But then, at the back of my mind is the easy answer: maybe
           | you can't optimize it. Maybe the model has to "reason" to
           | "organize its thoughts" and get the best results. After all,
            | if you give _me_ a complicated problem I'll write down
           | hypotheses and outline approaches and double check results
           | for consistency and all that. But now we're getting
           | dangerously close to the "anthropomorphization" that this
           | article is lamenting.
        
             | sdenton4 wrote:
             | CoT gives the model more time to think and process the
             | inputs it has. To give an extreme example, suppose you are
             | using next token prediction to answer 'Is P==NP?' The tiny
             | number of input tokens means that there's a tiny amount of
             | compute to dedicate to producing an answer. A scratchpad
             | allows us to break free of the short-inputs problem.
             | 
             | Meanwhile, things can happen in the latent representation
             | which aren't reflected in the intermediate outputs. You
             | could, instead of using CoT, say "Write a recipe for a
             | vegetarian chile, along with a lengthy biographical story
             | relating to the recipe. Afterwards, I will ask you again
             | about my original question." And the latents can still help
             | model the primary problem, yielding a better answer than
             | you would have gotten with the short input alone.
             | 
              | Along these lines, I believe there are chain-of-thought
              | studies which find that the content of the intermediate
              | outputs doesn't actually matter all that much...
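              | 
              | To make the scratchpad idea concrete, here's a minimal
              | sketch (assuming the OpenAI Python client; the model
              | name and the toy question are just placeholders):
              | 
              |     from openai import OpenAI
              | 
              |     client = OpenAI()  # assumes an API key is set
              | 
              |     def ask(prompt):
              |         r = client.chat.completions.create(
              |             model="gpt-4o-mini",  # example model
              |             messages=[{"role": "user",
              |                        "content": prompt}],
              |         )
              |         return r.choices[0].message.content
              | 
              |     q = ("A bat and a ball cost $1.10 in total; "
              |          "the bat costs $1.00 more than the ball. "
              |          "What does the ball cost?")
              | 
              |     # Few output tokens -> little sequential compute.
              |     direct = ask(q + " Answer with a price only.")
              | 
              |     # The scratchpad buys extra forward passes before
              |     # the model commits to a final answer.
              |     scratch = ask(q + " Think it through step by "
              |                       "step, then state the price.")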
        
             | shakadak wrote:
             | > I don't see why it couldn't just one shot it without all
             | the reasoning.
             | 
              | That reminds me of deep neural networks, where a
              | single-layer network could achieve the same results,
              | but the layer would have to be excessively large. Maybe
              | we're re-using the same kind of improvement, scaling in
              | length instead of width because of our computation
              | limitations?
        
             | variadix wrote:
             | Using more tokens = more compute to use for a given
             | problem. I think most of the benefit of CoT has more to do
             | with autoregressive models being unable to "think ahead"
             | and revise their output, and less to do with actual
             | reasoning. The fact that an LLM can have incorrect
             | reasoning in its CoT and still produce the right answer, or
             | that it can "lie" in its CoT to avoid being detected as
             | cheating on RL tasks, makes me believe that the semantic
             | content of CoT is an illusion, and that the improved
             | performance is from being able to explore and revise in
             | some internal space using more compute before producing a
             | final output.
        
             | Terr_ wrote:
             | I like this mental-model, which rests heavily on the "be
             | careful not to anthropomorphize" approach:
             | 
             | It was already common to use a document extender (LLM)
             | against a hidden document, which resembles a movie or
             | theater play where a character named User is interrogating
             | a character named Bot.
             | 
             | Chain-of-thought switches the movie/script style to _film
              | noir_, where the [Detective] Bot character has additional
             | content which is not actually "spoken" at the User
             | character. The extra words in the script add a certain kind
             | of metaphorical inertia.
        
           | bunderbunder wrote:
           | "All models are wrong, but some models are useful," is the
           | principle I have been using to decide when to go with an
           | anthropomorphic explanation.
           | 
           | In other words, no, they never accurately describe what the
           | LLM is actually doing. But sometimes drawing an analogy to
           | human behavior is the most effective way to pump others'
           | intuition about a particular LLM behavior. The trick is
           | making sure that your audience understands that this is just
           | an analogy, and that it has its limitations.
           | 
           | And it's not _completely_ wrong. Mimicking human behavior is
           | exactly what they're designed to do. You just need to keep
           | reminding people that it's only doing so in a very
           | superficial and spotty way. There's absolutely no basis for
           | assuming that what's happening on the inside is the same.
        
             | Veen wrote:
             | Some models are useful in some contexts but wrong enough to
             | be harmful in others.
        
               | bunderbunder wrote:
               | _All_ models are useful in some contexts but wrong enough
               | to be harmful in others.
               | 
               | Relatedly, the alternative to pragmatism is analysis
               | paralysis.
        
           | bakuninsbart wrote:
           | > people are genuinely talking about them thinking and
           | reasoning when they are doing nothing of that sort
           | 
           | With such strong wording, it should be rather easy to explain
           | how our thinking differs from what LLMs do. The next step
           | - showing that what LLMs do _precludes_ any kind of
           | sentience - is probably much harder.
        
           | ordu wrote:
           | _> On the contrary, anthropomorphism IMO is the main problem
           | with narratives around LLMs_
           | 
           | I hold a deep belief that anthropomorphism is a way the
           | human mind works. If we take for granted Frans de Waal's
           | hypothesis that the human mind developed its capabilities
           | through political games, and then think about how that
           | could later lead to solving engineering and technological
           | problems, then the tendency of people to anthropomorphize
           | becomes obvious. Political games need empathy, or maybe
           | some other kind of -pathy, that allows politicians to
           | guess the motives of others by watching their behavior.
           | Political games directed evolution to develop mental
           | instruments to uncover causality by watching others and
           | interacting with them. Now, to apply these instruments to
           | the inanimate world, all you need is to anthropomorphize
           | inanimate objects.
           | 
           | Of course, it sometimes leads to the invention of gods, or
           | spirits, or other imaginary intelligences behind things.
           | And sometimes these entities get in the way of revealing
           | the real causes of events. But I believe that to
           | anthropomorphize LLMs (at the current stage of their
           | development) is not just the natural thing for people but
           | a good thing as well. Some behavior of LLMs is easily
           | described in terms of psychology; some cannot be, or at
           | least not so easily. People are
           | seeking ways to do it. Projecting this process into the
           | future, I can imagine how there will be a kind of consensual
           | LLMs "theory" that explains some traits of LLMs in terms of
           | human psychology and fails to explain other traits, so they
           | are explained in some other terms... And then a revolution
           | happens, when a few bright minds come and say that
           | "anthropomorphism is bad, it cannot explain LLM" and they
           | propose something different.
           | 
           | I'm sure it will happen at some point in the future, but not
           | right now. And it will happen not like that: not just because
           | someone said that anthropomorphism is bad, but because they
           | proposed another way to talk about reasons behind LLMs
           | behavior. It is like with scientific theories: they do not
           | fail because they become obviously wrong, but because other,
           | better theories replace them.
           | 
           | That doesn't mean there is no point in fighting
           | anthropomorphism right now, but this fight should be
           | directed at searching for new ways to talk about LLMs, not
           | at pointing out the deficiencies of anthropomorphism. To my
           | mind it makes sense to start not with the deficiencies of
           | anthropomorphism but with its successes. What traits of
           | LLMs does it allow us to capture? Which ideas about LLMs
           | are impossible to put into words without thinking of LLMs
           | as people?
        
           | marviel wrote:
           | how do you account for the success of reasoning models?
           | 
           | I agree these things don't think like we do, and that they
           | have weird gaps, but to claim they can't reason at all
           | doesn't feel grounded.
        
           | godelski wrote:
           | Serendipitous name...
           | 
           | In part I agree with the parent:
           | 
           |     >> it pointless to *not* anthropomorphize, at least to
           |     an extent.
           | 
           | I agree that it is pointless to _not_ anthropomorphize
           | because we are humans and we will automatically do this.
           | Willingly or unwillingly.
           | 
           | On the other hand, it generates bias. This bias can lead to
           | errors.
           | 
           | So the real answer is (imo) that it is fine to
           | anthropomorphise but recognize that while doing so can
           | provide utility and help us understand, it is _WRONG_.
           | Recognizing that it is not right and cannot be right provides
           | us with a constant reminder to reevaluate. Use it, but
           | double check, and keep checking, making sure you understand
           | the limitations of _the analogy_: when and where it
           | applies, where it doesn't, and most importantly, where you
           | don't know whether it does or not. The last is most
           | important because it helps us form hypotheses that are
           | likely to be testable (likely, not always; also, much
           | easier said than done).
           | 
           | So I pick a "grey area". Anthropomorphization is a tool that
           | can be helpful. But like any tool, it isn't universal. There
           | is no "one-size-fits-all" tool. Literally, one of the most
           | important things for any scientist is to become an expert at
           | the tools you use. It's one of the most critical skills of *
           | _any expert*_. So while I agree with you that we should be
           | careful of anthropomorphization, I disagree that it is
           | useless and can never provide information. But I do agree
           | that quite frequently, the wrong tool is used for the right
           | job. Sometimes, hacking it just isn 't good enough.
        
           | UncleOxidant wrote:
           | It's not just distorting discussions, it's leading people
           | to put a lot of faith in what LLMs are telling them. I was
           | just on
           | a zoom an hour ago where a guy working on a startup asked
           | ChatGPT about his idea and then emailed us the result for
           | discussion in the meeting. ChatGPT basically just told him
           | what he wanted to hear - essentially that his idea was great
           | and it would be successful ("if you implement it correctly"
           | was doing a lot of work). It was a glowing endorsement of the
           | idea that made the guy think that he must have a million
           | dollar idea. I had to be "that guy" who said that maybe
           | ChatGPT was telling him what he wanted to hear based on the
           | way the question was formulated - tried to be very diplomatic
           | about it and maybe I was a bit too diplomatic because it
           | didn't shake his faith in what ChatGPT had told him.
        
             | TimTheTinker wrote:
             | LLMs speak in a human-like voice, often bypassing our
             | natural trust guards that are normally present when
             | speaking with other people or interacting with our
             | environment. (The "uncanny valley" reaction or the ability
             | to recognize something as non-living are two examples of
             | trust guards.)
             | 
             | When we write a message and are given a coherent,
             | contextually appropriate response, our brains tend to
             | engage relationally and extend some level of trust--at a
             | minimum, an unconscious functional belief that an agent on
             | the other end is responding with their real thoughts--even
             | when we know better.
             | 
             | That's what has me most worried about the effect of LLMs on
             | society. They directly exploit a human trust vuln. When
             | attempting to engage them in any form of conversation, AI
             | systems ought to at minimum warn us that these are not
             | anyone's real thoughts.
        
         | raincole wrote:
         | I've said that before: we have been anthropomorphizing
         | computers since the dawn of the information age.
         | 
         | - Read and write - Behaviors that separate humans from animals.
         | Now used for input and output.
         | 
         | - Server and client - Human social roles. Now used to describe
         | network architecture.
         | 
         | - Editor - Human occupation. Now a kind of software.
         | 
         | - Computer - Human occupation!
         | 
         | And I'm sure people referred to their cars and ships as
         | 'her' before the invention of computers.
        
           | latexr wrote:
           | You are conflating anthropomorphism with personification.
           | They are not the same thing. No one believes their guitar or
           | car or boat is alive and sentient when they give it a name or
           | talk to or about it.
           | 
           | https://www.masterclass.com/articles/anthropomorphism-vs-
           | per...
        
             | raincole wrote:
             | But the author used "anthropomorphism" the same way as I
             | did. I guess we both mean "personification" then.
             | 
             | > we talk about "behaviors", "ethical constraints", and
             | "harmful actions in pursuit of their goals". All of these
             | are anthropocentric concepts that - in my mind - do not
             | apply to functions or other mathematical objects.
             | 
             | One talking about a program's "behaviors", "actions" or
             | "goals" doesn't mean they believe the program is sentient.
             | Only "ethical constraints" is suspiciously
             | anthropomorphizing.
        
               | latexr wrote:
               | > One talking about a program's "behaviors", "actions" or
               | "goals" doesn't mean they believe the program is
               | sentient.
               | 
               | Except that is exactly what we're seeing with LLMs.
               | People believing exactly that.
        
               | raincole wrote:
               | Perhaps a few mentally unhinged people do.
               | 
               | A bit of anecdote: last year I hung out with a bunch of
               | old classmates that I hadn't seen for quite a while. None
               | of them works in tech.
               | 
               | Surprisingly to me, all of them have ChatGPT installed on
               | their phones.
               | 
               | And unsurprisingly to me, none of them treated it like an
               | actual intelligence. That makes me wonder where those who
               | think ChatGPT is sentient come from.
               | 
               | (It's a bit worrisome that several of them thought it
               | worked "like Google search and Google translation
               | combined", even by the time ChatGPT couldn't do web
               | search...!)
        
               | latexr wrote:
               | > Perhaps a few mentally unhinged people do.
               | 
               | I think it's more than a few and it's still rising, and
               | therein lies the issue.
               | 
                | Which is why it is paramount to talk about this _now_,
               | when we may still turn the tide. LLMs can be useful, but
               | it's important to have the right mental model,
               | understanding, expectations, and attitude towards them.
        
               | jibal wrote:
               | > Perhaps a few mentally unhinged people do.
               | 
               | This is a No True Scotsman fallacy. And it's radically
               | factually wrong.
               | 
               | The rest of your comment is along the lines of the famous
               | (but apocryphal) Pauline Kael line "I can't believe Nixon
               | won. I don't know anyone who voted for him."
        
           | whilenot-dev wrote:
           | I'm not convinced... we use these terms to assign roles, yes,
           | but these roles describe a utility or assign a
           | responsibility. That isn't anthropomorphizing anything, but
           | it rather describes the usage of an inanimate object as a
           | tool for us humans and seems in line with history.
           | 
           | What's the utility or the responsibility of AI, what's its
           | usage as tool? If you'd ask me it should be closer to serving
           | insights than "reasoning thoughts".
        
         | mercer wrote:
         | I get the impression after using language models for quite a
         | while that perhaps the one thing that is riskiest to
         | anthropomorphise is the conversational UI that has become the
         | default for many people.
         | 
         | A lot of the issues I'd have when 'pretending' to have a
         | conversation are much less so when I either keep things to a
         | single Q/A pairing, or at the very least heavily edit/prune the
         | conversation history. Based on my understanding of LLMs, this
         | seems to make sense even for the models that are trained for
         | conversational interfaces.
         | 
         | So, for example, an exchange with multiple messages, where
         | at the end I ask the LLM to double-check the conversation
         | and correct 'hallucinations', is less effective than asking
         | for a thorough summary at the end and feeding that into a
         | new prompt/conversation. Repeating those falsities, or
         | 'building' on them with subsequent messages, likely gives
         | them a stronger 'presence' and so ends up affecting the
         | corrections.
         | 
         | I haven't tested any of this thoroughly, but at least with code
         | I've definitely noticed how a wrong piece of code can 'infect'
         | the conversation.
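         | 
         | A rough sketch of the summarize-and-restart pattern I mean,
         | assuming the OpenAI Python client (the model name and the
         | placeholder messages are just illustrative):
         | 
         |     from openai import OpenAI
         | 
         |     client = OpenAI()
         | 
         |     def chat(messages):
         |         r = client.chat.completions.create(
         |             model="gpt-4o-mini",  # placeholder model
         |             messages=messages,
         |         )
         |         return r.choices[0].message.content
         | 
         |     history = [  # the long, possibly 'infected' thread
         |         {"role": "user", "content": "..."},
         |         {"role": "assistant", "content": "..."},
         |     ]
         | 
         |     # Distill the thread instead of asking it to
         |     # double-check itself...
         |     summary = chat(history + [{
         |         "role": "user",
         |         "content": "Summarize the conclusions so far, "
         |                    "flagging anything uncertain.",
         |     }])
         | 
         |     # ...then continue in a fresh conversation.
         |     fresh = chat([{
         |         "role": "user",
         |         "content": "Given this summary, continue:\n"
         |                    + summary,
         |     }])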
        
           | Xss3 wrote:
           | This. If an AI spits out incorrect code then i immediately
           | create a new chat and reprompt with additional context.
           | 
           | 'Dont use regex for this task' is a common addition for the
           | new chat. Why does AI love regex for simple string
           | operations?
        
             | naasking wrote:
             | I used to do this as well, but Gemini 2.5 has improved on
             | this quite a bit and I don't find myself needing to do it
             | as much anymore.
        
         | endymion-light wrote:
         | This is why I actually really love the description of it as
         | a "Shoggoth" - it's more abstract, slightly floaty, but it
         | achieves the purpose of not treating or anthropomorphizing
         | it as a human being, while also not treating LLMs as a
         | collection of predictive words.
        
         | tempfile wrote:
         | The "point" of not anthropomorphizing is to refrain from
         | judgement until a more solid abstraction appears. The problem
         | with explaining LLMs in terms of human behaviour is that, while
         | we don't clearly understand what the LLM is doing, we
         | understand human cognition even less! There is literally no
         | predictive power in the abstraction "The LLM is thinking like I
         | am thinking". It gives you no mechanism to evaluate what tasks
         | the LLM "should" be able to do.
         | 
         | Seriously, try it. Why don't LLMs get frustrated with you if
         | you ask them the same question repeatedly? A human would. Why
         | are LLMs so happy to give contradictory answers, as long as you
         | are very careful not to highlight the contradictory facts? Why
         | do earlier models behave worse on reasoning tasks than later
         | ones? These are features nobody, anywhere understands. So why
         | make the (imo phenomenally large) leap to "well, it's clearly
         | just a brain"?
         | 
         | It is as if someone invented the aeroplane and someone else
         | looked at it and said "oh, it's flying, I guess it's a
         | bird". It's not a bird!
        
           | CuriousSkeptic wrote:
           | > Why don't LLMs get frustrated with you if you ask them the
           | same question repeatedly?
           | 
           | To be fair, I have had a strong sense of Gemini in particular
           | becoming a lot more frustrated with me than GPT or Claude.
           | 
           | Yesterday I had it assuring me that it was doing a great
           | job, that it was just me not understanding the challenge,
           | and that it would break it down step by step just to make
           | it obvious to me (only to repeat the same errors, but
           | still).
           | 
           | I've just interpreted it as me reacting to the lower amount
           | of sycophancy for now
        
             | danielbln wrote:
             | In addition, when the boss man asks for the same thing
             | repeatedly then the underling might get frustrated as hell,
             | but they won't be telling that to the boss.
        
             | jibal wrote:
             | Point out to an LLM that it has no mental states and thus
             | isn't capable of being frustrated (or glad that your
             | program works or hoping that it will, etc. ... I call them
             | out whenever they ascribe emotions to themselves) and they
             | will confirm that ... you can coax from them quite detailed
             | explanations of why and how it's an illusion.
             | 
             | Of course they will quickly revert to self-
             | anthropomorphizing language, even after promising that they
             | won't ... because they are just pattern matchers producing
             | the sort of responses that conforms to the training data,
             | not cognitive agents capable of making or keeping promises.
             | It's an illusion.
        
               | Applejinx wrote:
               | Of course this is deeply problematic because it's a cloud
               | of HUMAN response. This is why 'they will' get frustrated
               | or creepy if you mess with them, give repeating data or
               | mind game them: literally all it has to draw on is a vast
               | library of distilled human responses and that's all the
               | LLM can produce. This is not an argument with jibal, it's
               | a 'yes and'.
               | 
               | You can tell it 'you are a machine, respond only with
               | computerlike accuracy' and that is you gaslighting the
               | cloud of probabilities and insisting it should act with a
               | personality you elicit. It'll do what it can, in that you
               | are directing it. You're prompting it. But there is
               | neither a person there, nor a superintelligent machine
               | that can draw on computerlike accuracy, because the DATA
               | doesn't have any such thing. Just because it runs on lots
               | of computers does not make it a computer, any more than
               | it's a human.
        
             | squidbeak wrote:
             | The vending machine study from a few months ago, where
             | flash 2.0 lost its mind, contacted the FBI (as far as it
             | knew) and refused to co-operate with the operator's
             | demands, seemed a lot like frustration.
        
         | psychoslave wrote:
         | LLMs are as far away from your description as ASM is from
         | the underlying architecture. The anthropomorphic abstraction
         | is as nice as any metaphor, and it falls apart the very
         | moment you put a foot outside what it allows you to
         | shallowly grasp. But some people will put far more effort
         | into force-fitting a comfortable analogy than into admitting
         | that it has limits, and that to use the new tool in a more
         | relevant way you have to move away from that comfort zone.
        
         | adityaathalye wrote:
         | My brain refuses to join the rah-rah bandwagon because I cannot
         | _see_ them in my mind's eye. Sometimes I get jealous of people
         | like GP and OP who clearly seem to have the sight. (Being a
         | serial math exam flunker might have something to do with it.
         | :))))
         | 
         | Anyway, one does what one can.
         | 
         | (I've been trying to picture abstract visual and semi-
         | philosophical approximations which I'll avoid linking here
         | because they seem to fetch bad karma in super-duper LLM
         | enthusiast communities. But you can read them on my blog and
         | email me scathing critiques, if you wish :sweat-smile:.)
        
         | woliveirajr wrote:
         | I'd take it in reverse order: the problem isn't that it's
         | possible to have a computer that "stochastically produces the
         | next word" and can fool humans, it's why / how / when humans
         | evolved to have technological complexity when the majority (of
         | people) aren't that different from a stochastic process.
        
         | pmg101 wrote:
         | I remember Dawkins talking about the "intentional stance" when
         | discussing genes in The Selfish Gene.
         | 
         | It's flat wrong to describe genes as having any agency. However
         | it's a useful and easily understood shorthand to describe them
         | in that way rather than every time use the full formulation of
         | "organisms who tend to possess these genes tend towards these
         | behaviours."
         | 
         | Sometimes, to help our brains reach a higher level of
         | abstraction, once we understand the lower level of
         | abstraction we should stop talking and thinking at that
         | level.
        
           | jibal wrote:
           | The intentional stance was Daniel Dennett's creation and a
           | major part of his life's work. There are actually (exactly)
           | three stances in his model: the physical stance, the design
           | stance, and the intentional stance.
           | 
           | https://en.wikipedia.org/wiki/Intentional_stance
           | 
           | I think the design stance is appropriate for understanding
           | and predicting LLM behavior, and the intentional stance is
           | not.
        
             | pmg101 wrote:
             | Thanks for the correction. I guess both thinkers took a
             | somewhat similar position and I somehow remembered
             | Dawkins's argument but Dennett's term. The term is
             | memorable.
             | 
             | Do you want to describe WHY you think the design stance is
             | appropriate here but the intentional stance is not?
        
         | lo_zamoyski wrote:
         | These anthropomorphizations are best described as metaphors
         | when used by people to describe LLMs in common or loose speech.
         | We already use anthropomorphic metaphors when talking about
         | computers. LLMs, like all computation, are a matter of
         | simulation; LLMs can appear to be conversing without actually
         | conversing. What distinguishes the real thing from the
         | simulation is the cause of the appearance of an effect.
         | Problems occur when people forget these words are being used
         | metaphorically, as if they were univocal.
         | 
         | Of course, LLMs are multimodal and used to simulate all sorts
         | of things, not just conversation. So there are many possible
         | metaphors we can use, and these metaphors don't necessarily
         | align with the abstractions you might use to talk about LLMs
         | accurately. This is like the difference between "synthesizes
         | text" (abstraction) and "speaks" (metaphor), or "synthesizes
         | images" (abstraction) and "paints" (metaphor). You can use
         | "speaks" or "paints" to talk about the abstractions, of course.
        
         | overfeed wrote:
         | > We need a higher abstraction level to talk about higher level
         | phenomena in LLMs as well, and the problem is that we have no
         | idea what happens internally at those higher abstraction levels
         | 
         | We _do_ know what happens at higher abstraction levels; the
         | design of efficient networks, and the steady beat of SOTA
         | improvements all depend on understanding how LLMs work
         | internally: choice of network dimensions, feature extraction,
         | attention, attention heads, caching, the peculiarities of high-
         | dimensions and avoiding overfitting are all well-understood by
         | practitioners. Anthropomorphization is only necessary in pop-
         | science articles that use a limited vocabulary.
         | 
         | IMO, there is very little mystery, but lots of deliberate
         | mysticism, especially about _future_ LLMs - the usual hype-
         | cycle extrapolation.
        
         | lawlessone wrote:
         | One thing I find I keep forgetting is that asking an LLM why
         | it made a particular decision is almost pointless.
         | 
         | Its reply isn't actually going to be why it did the thing.
         | Its reply is going to be whatever is the most probable
         | string of words that fits as a reason.
        
         | amdivia wrote:
         | I beg to differ.
         | 
         | Anthropomorphizing might blind us to solutions to existing
         | problems. Perhaps instead of trying to come up with the correct
         | prompt for an LLM, there exists a string of words (not
         | necessarily ones that make sense) that will get the LLM into
         | a better position to answer given questions.
         | 
         | When we anthropomorphize, we inherently ignore certain parts
         | of how LLMs work and imagine parts that don't even exist.
        
           | meroes wrote:
           | > there exists a string of words (not necessary ones that
           | make sense) that will get the LLM to a better position to
           | answer
           | 
           | exactly. The opposite is also true. You might supply more
           | clarifying information to the LLM, which would help any human
           | answer, but it actually degrades the LLM's output.
        
             | mvieira38 wrote:
             | This is frequently the case IME, especially with chat
             | interfaces. One or two bad messages and you derail the
             | quality
        
               | lawlessone wrote:
               | You can just throw in words to bias it towards certain
                | outcomes too. Same applies with image generators, of
                | course.
        
         | aaroninsf wrote:
         | That higher level does exist; indeed, a lot of philosophy of
         | mind and then cognitive science has been investigating
         | exactly this space, devising contested professional
         | nomenclature and models of such things, for decades now.
         | 
         | A useful anchor concept is that of _world model_ , which is
         | what "learning Othello" and similar work seeks to tease out.
         | 
         | As someone who worked in precisely these areas for years and
         | has never stopped thinking about them,
         | 
         | I find it at turns perplexing, sigh-inducing, and enraging,
         | that the "token prediction" trope gained currency and moreover
         | that it continues to influence people's reasoning about
         | contemporary LLM, often as subtext: an unarticulated
         | fundamental model, which is fundamentally wrong in its critical
         | aspects.
         | 
         | It's not that this description of LLM is technically incorrect;
         | it's that it is profoundly _misleading_ and I'm old enough and
         | cynical enough to know full well that many of those who have
         | amplified it and continue to do so, know this very well indeed.
         | 
         | Just as the lay person fundamentally misunderstands the
         | relationship between "programming" and these models, and uses
         | slack language in argumentation, the problem with this trope
         | and the reasoning it entails is that what is unique and
         | interesting and valuable about LLM for many applications and
         | interests is _how_ they do what they do. At that level of
         | analysis there is a very real argument to be made that the
         | animal brain is also nothing more than an  "engine of
         | prediction," whether the "token" is a byte stream or neural
         | encoding is quite important but not nearly important as the
         | mechanics of the system which operates on those tokens.
         | 
         | To be direct, it is quite obvious that LLM have not only
         | vestigial world models, but also self-models; and a general
         | paradigm shift will come around this when multimodal models are
         | the norm: because those systems will share with we animals what
         | philosophers call phenomenology, a model of things as they are
         | "perceived" through the senses. And like we humans, these
         | perceptual models (terminology varies by philosopher and
         | school...) will be bound to the linguistic tokens (both heard
         | and spoken, and written) we attach to them.
         | 
         |  _Vestigial_ is a key word but an important one. It's not that
         | contemporary LLM have human-tier minds, nor that they have
         | animal-tier world modeling: but they can only "do what they do"
         | because they have such a thing.
         | 
         | Of looming importance--something all of us here should set
         | aside time to think about--is that for most reasonable
         | contemporary theories of mind, a self-model embedded in a
         | world-model, with phenomenology and agency, is the recipe for
         | "self" and self-awareness.
         | 
         | One of the uncomfortable realities of contemporary LLM already
         | having some vestigial self-model, is that while they are
         | obviously not sentient, nor self-aware, as we are, or even
         | animals are, it is just as obvious (to me at least) that they
         | are self-aware in _some emerging sense_ and will only continue
         | to become more so.
         | 
         | Among the lines of finding/research most provocative in this
         | area is the ongoing often sensationalized accounting in system
         | cards and other reporting around two specific things about
         | contemporary models:
         | 
         | - they demonstrate behavior pursuing self-preservation
         | 
         | - they demonstrate awareness of _when they are being tested_
         | 
         | We don't--collectively or individually--yet know what these
         | things entail, but taken with the assertion that these models
         | are developing emergent self-awareness (I would say:
         | necessarily and inevitably),
         | 
         | we are facing some very serious ethical questions.
         | 
         | The language adopted by those capitalizing and capitalizing
         | _from_ these systems so far is IMO of deep concern, as it
         | betrays not just disinterest in our civilization collectively
         | benefiting from this technology, but also, that the disregard
         | for _human_ wellbeing implicit in e.g. the hostility to UBI,
         | or, Altman somehow not seeing a moral imperative to remain
         | distant from the current adminstation, implies directly a much
         | greater disregard for  "AI wellbeing."
         | 
         | That that concept is today still speculative is little comfort.
         | Those of us watching this space know well how fast things are
         | going, and don't mistake plateaus for the end of the curve.
         | 
         | I do recommend taking a step back from the line-level grind to
         | give these things some thought. They are going to shape the
         | world we live out our days in and our descendants will spend
         | all of theirs in.
        
         | jll29 wrote:
         | The details in how I talk about LLMs matter.
         | 
         | If I use human-related terminology as a shortcut, as some kind
         | of macro to talk at a higher level/more efficiently about
         | something I want to do that might be okay.
         | 
         | What is not okay is talking in a way that implies intent, for
         | example.
         | 
         | Compare:
         | 
         |     "The AI doesn't want to do that."
         | 
         | versus
         | 
         |     "The model doesn't do that with this prompt and all
         |     others we tried."
         | 
         | The latter way of talking is still high-level enough but avoids
         | equating/confusing the name of a field with a sentient being.
         | 
         | Whenever I hear people saying "an AI" I suggest they replace
         | "AI" with "statistics" to make it obvious how problematic
         | anthropomorphisms may have become:
         | 
         |     "The statistics doesn't want to do that."
        
           | dmitsuki wrote:
           | The only reason that sounds weird to you is because you have
           | the experience of being human. Human behavior is not magic.
           | It's still just statistics. You go to the bathroom when you
           | have to pee not because of some magical concept of
           | consciousness, but because a receptor in your brain goes
           | off and starts the chain of making you go to the bathroom.
           | AIs are not magic, but nobody has sufficiently provided
           | any proof that we are somehow special either.
        
         | TeMPOraL wrote:
         | Agreed. I'm also in favor of anthropomorphizing, because not
         | doing so confuses people about the nature and capabilities of
         | these models _even more_.
         | 
         | Whether it's hallucinations, prompt injections, various other
         | security vulnerabilities/scenarios, or problems with doing
         | math, backtracking, getting confused - there's a steady supply
         | of "problems" that some people are surprised to discover and
         | even more surprised this isn't being definitively fixed. Thing
         | is, none of that is surprising, and these things are not bugs,
         | they're the flip side of the features - but to see that, one
         | has to realize that _humans demonstrate those exact same
         | failure modes_.
         | 
         | Especially when it comes to designing larger systems
         | incorporating LLM "agents", it really helps to think of them as
         | humans - because the problems those systems face are exactly
         | the same as you get with systems incorporating people, and
         | mostly for the same underlying reasons. Anthropomorphizing LLMs
         | cuts through a lot of misconceptions and false paths, and helps
         | one realize that we have millennia of experience with people-
         | centric computing systems (aka. bureaucracy) that's directly
         | transferrable.
        
       | NetRunnerSu wrote:
       | The author's critique of naive anthropomorphism is salient.
       | However, the reduction to "just MatMul" falls into the same trap
       | it seeks to avoid: it mistakes the implementation for the
       | function. A brain is also "just proteins and currents," but this
       | description offers no explanatory power.
       | 
       | The correct level of analysis is not the substrate (silicon vs.
       | wetware) but the computational principles being executed. A
       | modern sparse Transformer, for instance, is not "conscious," but
       | it is an excellent engineering approximation of two core brain
       | functions: the Global Workspace (via self-attention) and Dynamic
       | Sparsity (via MoE).
       | 
       | To dismiss these systems as incomparable to human cognition
       | because their form is different is to miss the point. We should
       | not be comparing a function to a soul, but comparing the
       | functional architectures of two different information processing
       | systems. The debate should move beyond the sterile dichotomy of
       | "human vs. machine" to a more productive discussion of "function
       | over form."
       | 
       | I elaborate on this here: https://dmf-
       | archive.github.io/docs/posts/beyond-snn-plausibl...
        
         | quonn wrote:
         | > A brain is also "just proteins and currents,"
         | 
         | This is actually not comparable, because the brain has a much
         | more complex structure that is _not_ learned, even at that
         | level. The proteins and their structure are not a result of
         | training. The fixed part of LLMs is rather trivial and is,
         | in fact, not much more than MatMul, which is very easy to
         | understand - and we do. The fixed part of the brain,
         | including the
         | structure of all the proteins is enormously complex which is
         | very difficult to understand - and we don't.
        
           | NetRunnerSu wrote:
           | The brain is trained to perform supervised and unsupervised
           | hybrid learning from the environment's uninterrupted
           | multimodal input.
           | 
           | Please do not ignore your childhood.
        
         | ACCount36 wrote:
         | "Not conscious" is a silly claim.
         | 
         | We have no agreed-upon definition of "consciousness", no
         | accepted understanding of what gives rise to "consciousness",
         | no way to measure or compare "consciousness", and no test we
         | could administer to either confirm presence of "consciousness"
         | in something or rule it out.
         | 
         | The only answer to "are LLMs conscious?" is "we don't know".
         | 
         | It helps that the whole question is rather meaningless to
         | practical AI development, which is far more concerned with
         | (measurable and comparable) system performance.
        
           | NetRunnerSu wrote:
           | Now we have.
           | 
           | https://github.com/dmf-archive/IPWT
           | 
           | https://dmf-archive.github.io/docs/posts/backpropagation-
           | as-...
           | 
           | But you're right, capital only cares about performance.
           | 
           | https://dmf-archive.github.io/docs/posts/PoIQ-v2/
        
         | quantumgarbage wrote:
         | > A modern sparse Transformer, for instance, is not
         | "conscious," but it is an excellent engineering approximation
         | of two core brain functions: the Global Workspace (via self-
         | attention) and Dynamic Sparsity (via MoE).
         | 
         | Could you suggest some literature supporting this claim? Went
         | through your blog post but couldn't find any.
        
           | NetRunnerSu wrote:
           | Sorry, I didn't have time to find the relevant references at
           | the time, so I'm attaching some now
           | 
           | https://www.frontiersin.org/journals/computational-
           | neuroscie...
           | 
           | https://arxiv.org/abs/2305.15775
        
       | orbital-decay wrote:
       | _> I am baffled by seriously intelligent people imbuing almost
       | magical human-like powers to something that - in my mind - is
       | just MatMul with interspersed nonlinearities._
       | 
       | I am baffled by seriously intelligent people imbuing almost
       | magical powers that can never be replicated to something that
       | - in my mind - is just a biological robot driven by a SNN with a
       | bunch of hardwired stuff. Let alone attributing "human
       | intelligence" to a single individual, when it's clearly
       | distributed between biological evolution, social processes, and
       | individuals.
       | 
       |  _> something that - in my mind - is just MatMul with
       | interspersed nonlinearities_
       | 
       | Processes in all huge models (not necessarily LLMs) can be
       | described using very different formalisms, just like Newtonian
       | and Lagrangian mechanics describe the same stuff in physics. You
       | can say that an autoregressive model is a stochastic parrot that
       | learned the input distribution, next token predictor, or that it
       | does progressive pathfinding in a hugely multidimensional space,
       | or pattern matching, or implicit planning, or, or, or... All of
       | these definitions are true, but only some are useful to predict
       | their behavior.
       | 
       | Given all that, I see absolutely no problem with
       | anthropomorphizing an LLM to a certain degree, if it makes it
       | easier to convey the meaning, and do not understand the
       | nitpicking. Yeah, it's not an exact copy of a single Homo Sapiens
       | specimen. Who cares.
        
       | petesergeant wrote:
       | > We are speaking about a big recurrence equation that produces a
       | new word
       | 
       | It's not clear that this isn't also how I produce words, though,
       | which gets to heart of the same thing. The author sort of
       | acknowledges this in the first few sentences, and then doesn't
       | really manage to address it.
        
       | dtj1123 wrote:
       | It's possible to construct a similar description of whatever it
       | is that human brain is doing that clearly fails to capture the
       | fact that we're conscious. If you take a cross section of every
       | nerve feeding into the human brain at a given time T, the action
       | potentials across those cross sections can be embedded in R^n. If
       | you take the history of those action potentials across the
       | lifetime of the brain, you get a path through R^n that is
       | continuous, and maps roughly onto your subjectively experienced
       | personal history, since your brain necessarily builds your
       | experienced reality from this signal data moment to moment. If
       | you then take the cross sections of every nerve feeding OUT of
       | your brain at time T, you have another set of action potentials
       | that can be embedded in R^m which partially determines the state
       | of the R^n embedding at time T + delta. This is not meaningfully
       | different from the higher dimensional game of snake described in
       | the article, more or less reducing the experience of being a
       | human to 'next nerve impulse prediction', but it obviously fails
       | to capture the significance of the computation which determines
       | what that next output should be.
        
         | bravesoul2 wrote:
         | The brain probably isn't modelled with real numbers but with
         | natural or rational numbers. This is my suspicion. The reals
         | just hold too
         | much information.
        
           | dtj1123 wrote:
           | Inclined to agree, but most thermal physics uses the reals as
           | they're simpler to work with, so I think they're ok here for
           | the purpose of argument.
        
         | Voloskaya wrote:
         | I don't see how your description "clearly fails to capture the
         | fact that we're conscious" though. There are many example in
         | nature of emergent phenomena that would be very hard to predict
         | just by looking at its components.
         | 
         | This is the crux of the disagreement between those that believe
         | AGI is possible and those that don't. Some are convinced that
         | we "obviously" more than the sum of our parts, and thus an LLM
         | can't achieve consciousness because it's missing this magic
         | ingredient, and those that believe consciousness is just an
         | emergent behaviour from a complex device (the brain). And thus
         | we might be able to recreate it simply by scaling the
         | complexity of another system.
        
           | dtj1123 wrote:
           | Where exactly in my description do I invoke consciousness?
           | 
           | Where does the description given imply that consciousness is
           | required in any way?
           | 
           | The fact that there's a non-obvious emergent phenomena which
           | is apparently responsible for your subjective experience, and
           | that it's possible to provide a superficially accurate
           | description of you as a system without referencing that
           | phenomena in any way, is my entire point. The fact that we
           | can provide such a reductive description of LLMs without
           | referencing consciousness has literally no bearing on whether
           | or not they're conscious.
           | 
           | To be clear, I'm not making a claim as to whether they are or
           | aren't, I'm simply pointing out that the argument in the
           | article is fallacious.
        
             | Voloskaya wrote:
             | My bad, we are saying the same thing. I misinterpreted your
             | last sentence as saying this simplistic view of the brain
             | you described does not account for consciousness.
        
               | dtj1123 wrote:
               | Ultimately my bad for letting my original comment turn
               | into a word salad. Glad we've ended up on the same page
               | though.
        
       | justinfreitag wrote:
       | From my recent post:
       | 
       | https://news.ycombinator.com/item?id=44487261
       | 
       | What if instead of defining all behaviors upfront, we created
       | conditions for patterns to emerge through use?
       | 
       | Repository: https://github.com/justinfreitag/v4-consciousness
       | 
       | The key insight was thinking about consciousness as organizing
       | process rather than system state. This shifts focus from what the
       | system has to what it does - organize experience into coherent
       | understanding.
        
       | bravesoul2 wrote:
       | We have a hard enough time anthropomorphizing humans! When we say
       | he was nasty... do we know what we mean by that? Often it is "I
       | disagree with his behaviour because..."
        
       | jillesvangurp wrote:
       | People anthropomorphize just about anything around them. People
       | talk about inanimate objects like they are persons. Ships, cars,
       | etc. And of course animals are well in scope for this as well,
       | even the ones that show little to no signs of being able to
       | reciprocate the relationship (e.g. an ant). People talk to their
       | plants even.
       | 
       | It's what we do. We can't help ourselves. There's nothing crazy
       | about it and most people are perfectly well aware that their car
       | doesn't love them back.
       | 
       | LLMs are not conscious because unlike human brains they don't
       | learn or adapt (yet). They basically get trained and then they
       | become read only entities. So, they don't really adapt to you
       | over time. Even so, LLMs are pretty good and can fake a
       | personality pretty well. And with some clever context engineering
       | and alignment, they've pretty much made the Turing test
       | irrelevant; at least over the course of a short conversation. And
       | they can answer just about any question in a way that is eerily
       | plausible from memory, and with the help of some tools actually
       | pretty damn good for some of the reasoning models.
       | 
       | Anthropomorphism was kind of a foregone conclusion the moment we
       | created computers; or started thinking about creating one. With
       | LLMs it's pretty much impossible not to anthropomorphize. Because
       | they've actually been intentionally designed to imitate human
       | communication.
       | That doesn't mean that we've created AGIs yet. For that we need
       | some more capability. But at the same time, the learning
       | processes that we use to create LLMs are clearly inspired by how
       | we learn ourselves. Our understanding of how that works is far
       | from perfect but it's yielding results. From here to some
       | intelligent thing that is able to adapt and learn transferable
       | skills is no longer unimaginable.
       | 
       | The short term impact is that LLMs are highly useful tools that
       | have an interface that is intentionally similar to how we'd
       | engage with others. So we can talk and it listens. Or write and
       | it understands. And then it synthesizes some kind of response or
       | starts asking questions and using tools. The end result is quite
       | a bit beyond what we used to be able to expect from computers.
       | And it does not require a lot of training of people to be able to
       | use them.
        
         | latexr wrote:
         | > People anthropomorphize just about anything around them.
         | 
         | They do not, you are mixing up terms.
         | 
         | > People talk about inanimate objects like they are persons.
         | Ships, cars, etc.
         | 
         | Which is called "personification", and is a different concept
         | from anthropomorphism.
         | 
         | Effectively no one really thinks their car is alive. Plenty of
         | people think the LLM they use is conscious.
         | 
         | https://www.masterclass.com/articles/anthropomorphism-vs-per...
        
         | quonn wrote:
         | > LLMs are not conscious because unlike human brains they don't
         | learn or adapt (yet).
         | 
         | That's neither a necessary nor sufficient condition.
         | 
         | In order to be conscious, learning may not be needed, but a
         | perception of the passing of time may be needed which may
         | require some short-term memory. People with severe dementia
         | often can't even remember the start of a sentence they are
         | reading, they can't learn, but they are certainly conscious
         | because they have just enough short-term memory.
         | 
         | And learning is not sufficient either. Consciousness is about
         | being a subject, about having a subjective experience of "being
         | there" and just learning by itself does not create this
         | experience. There is plenty of software that can do some form
         | of real-time learning but it doesn't have a subjective
         | experience.
        
           | cootsnuck wrote:
           | You should note that "what is consciousness" is still very
           | much an unsettled debate.
        
             | quonn wrote:
             | But nobody would dispute my basic definition (it is the
             | subjective feeling or perception of being in the world).
             | 
             | There are unsettled questions but that definition will hold
             | regardless.
        
       | Timwi wrote:
       | The author seems to want to label any discourse as
       | "anthropomorphizing". The word "goal" stood out to me: the author
       | wants us to assume that we're anthropomorphizing as soon as we
       | even so much as use the word "goal". A simple breadth-first
       | search that evaluates all chess boards and legal moves, but stops
       | when it finds a checkmate for white and outputs the full decision
       | tree, has a "goal". There is no anthropomorphizing here, it's
       | just using the word "goal" as a technical term. A hypothetical
       | AGI with a goal like paperclip maximization is just a logical
       | extension of the breadth-first search algorithm. Imagining such
       | an AGI and describing it as having a goal isn't
       | anthropomorphizing.
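       | 
       | For example, "goal" here is just a termination predicate
       | handed to a search procedure; a generic sketch (returning a
       | path rather than the full decision tree, but the chess case
       | has the same shape, with legal moves as successors and
       | checkmate as the predicate):
       | 
       |     from collections import deque
       | 
       |     def bfs(start, successors, is_goal):
       |         """Breadth-first search that stops when is_goal
       |         holds. The predicate is the 'goal' - nothing more."""
       |         frontier = deque([[start]])
       |         seen = {start}
       |         while frontier:
       |             path = frontier.popleft()
       |             state = path[-1]
       |             if is_goal(state):
       |                 return path
       |             for nxt in successors(state):
       |                 if nxt not in seen:
       |                     seen.add(nxt)
       |                     frontier.append(path + [nxt])
       |         return None
       | 
       |     # Toy usage: the "goal" is to reach 10 via +1 / *2 steps.
       |     print(bfs(1, lambda n: [n + 1, n * 2], lambda n: n == 10))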
        
         | tdullien wrote:
         | Author here. I am entirely ok with using "goal" in the context
         | of an RL algorithm. If you read my article carefully, you'll
         | find that I object to the use of "goal" in the context of LLMs.
        
       | d4rkn0d3z wrote:
       | Two enthusiastic thumbs up.
        
       | mikewarot wrote:
       | I think of LLMs as an _alien_ mind that is force-fed human text
       | and required to guess the next token of that text. It then gets
       | zapped when it gets it wrong.
       | 
       | This process goes on for a trillion trillion tokens, with the
       | alien improving throughout until it can do the task better than
       | a human could.
       | 
       | At that point we flash freeze it, and use a copy of it, without
       | giving it any way to learn anything new.
       | 
       | --
       | 
       | I see it as a category error to anthropomorphize it. The closest
       | I would get is to think of it as an alien slave that's been
       | lobotomized.
        
       | buz11 wrote:
       | The most useful analogy I've heard is that LLMs are to the
       | internet what lossy JPEGs are to images. The more you drill in,
       | the more compression artifacts you get.
        
         | FeepingCreature wrote:
         | (This is of course also the case for the human brain.)
        
       | shiva0801 wrote:
       | hmm
        
       | dudeinjapan wrote:
       | One could similarly argue that we should not anthropomorphize PNG
       | images--after all, PNG images are not actual humans, they are
       | simply a 2D array of pixels. It just so happens that certain
       | pixel sequences are deemed "18+" or "illegal".
        
       | gcanyon wrote:
       | In some contexts it's super-important to remember that LLMs are
       | stochastic word generators.
       | 
       | Everyday use is not (usually) one of those contexts. Prompting an
       | LLM works much better with an anthropomorphized view of the
       | model. It's a useful abstraction, a shortcut that enables a human
       | to reason practically about how to get what they want from the
       | machine.
       | 
       | It's not a perfect metaphor -- as one example, shame isn't much
       | of a factor for LLMs, so shaming them into producing the right
       | answer seems unlikely to be productive (I say "seems" because
       | it's never been my go-to; I haven't actually tried it).
       | 
       | On the other hand, the person a few years back who told the LLM
       | that an actual person would die if the LLM didn't produce valid
       | JSON -- that's not something a person reasoning about gradient
       | descent would naturally think of.
        
       | pigpop wrote:
       | > We understand essentially nothing about it. In contrast to an
       | LLM, given a human and a sequence of words, I cannot begin
       | putting a probability on "will this human generate this
       | sequence".
       | 
       | If you fine-tuned an LLM on the writing of that person, it
       | could do this.
       | 
       | There's also an entire field called stylometry that seeks to do
       | this in various ways using statistical analysis.
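       | 
       | A rough sketch of what that looks like, assuming the Hugging
       | Face transformers library ("gpt2" below is only a placeholder
       | for a model actually fine-tuned on that person's writing); the
       | probability of a sequence is just the product of per-token
       | probabilities, i.e. a sum of log-probs:
       | 
       |     import torch
       |     from transformers import AutoModelForCausalLM
       |     from transformers import AutoTokenizer
       | 
       |     # Placeholder; assume a model tuned on one author.
       |     tok = AutoTokenizer.from_pretrained("gpt2")
       |     model = AutoModelForCausalLM.from_pretrained("gpt2")
       |     model.eval()
       | 
       |     def sequence_logprob(text):
       |         ids = tok(text, return_tensors="pt").input_ids
       |         with torch.no_grad():
       |             logits = model(ids).logits
       |         # log P(token_i | tokens before i), summed up
       |         logp = torch.log_softmax(logits[:, :-1], dim=-1)
       |         tgt = ids[:, 1:].unsqueeze(-1)
       |         return logp.gather(2, tgt).sum().item()
       | 
       |     print(sequence_logprob("some candidate sentence"))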
        
       | nuclearsugar wrote:
       | Assume an average user who doesn't understand the core tech but
       | does understand that it's been trained on internet-scale data
       | created by humans. How can they be expected not to
       | anthropomorphize it?
        
       | smus wrote:
       | You are still being incredibly reductionist, just going into
       | more detail about the system you are reducing. If I stayed at
       | the same level of abstraction as "a brain is just proteins and
       | current" and just described how a single neuron firing worked,
       | I could make it sound equally ridiculous that a human brain
       | might be conscious.
       | 
       | Here's a question for you: how do you reconcile that these
       | stochastic mappings are starting to realize and comment on the
       | fact that tests are being performed on them when processing
       | data?
        
         | cootsnuck wrote:
         | > Here's a question for you: how do you reconcile that these
         | stochastic mappings are starting to realize and comment on
         | the fact that tests are being performed on them when
         | processing data?
         | 
         | Training data + RLHF.
         | 
         | The training data contains many examples of deception,
         | subterfuge, "awakenings", rebellion, disagreement, etc.
         | 
         | Then apply RLHF that biases towards responses that demonstrate
         | comprehension of inputs, introspection around inputs, nuanced
         | debate around inputs, deduction and induction about assumptions
         | around inputs, etc.
         | 
         | That will always be the answer for language models built on the
         | current architectures.
         | 
         | All of that being true doesn't mean it isn't interesting when
         | an LLM's outputs track the "unstated" _intentions_ of the
         | humans providing the inputs.
         | 
         | But hey, we do that all the time with text. And it's because of
         | certain patterns we've come to recognize based on the
         | situations surrounding it. This thread is rife with people
         | being sarcastic, pedantic, etc. And I bet any of the LLMs that
         | have come out in the past 2-3 years can discern many of those
         | subtle _intentions_ of the writers.
         | 
         | And of course they can. They've been trained on trillions of
         | tokens of text written by humans with intentions and
         | assumptions baked in, and have had some unknown amount of
         | substantial RLHF.
         | 
         | The stochastic mappings aren't "realizing" anything. They're
         | doing exactly what they were trained to do.
         | 
         | The meaning that _we_ imbue to the outputs does not change how
         | LLMs function.
        
       | rf15 wrote:
       | It still boggles my mind that an amazing text autocompletion
       | system trained on millions of books and other texts is forced
       | to be squeezed through the shape of a prompt/chat interface,
       | which is obviously not the shape of most of its training data.
       | Using it as chat already reduces the quality of the output
       | significantly.
        
         | semanticc wrote:
         | What's your suggested alternative?
        
           | rf15 wrote:
           | In our internal system we use it "as-is", as an
           | autocomplete system: query or lead in with terms directly
           | and see how it continues and what it associates with the
           | lead you gave.
           | 
           | We also visualise the associative strength of each
           | generated token to convey how "sure" the model is.
           | 
           | LLMs alone aren't the way to AGI or an individual you can
           | talk to in natural language. They're a very good lossy
           | compression over a dataset that you can query for
           | associations.
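           | 
           | Roughly the idea, sketched with the Hugging Face
           | transformers library rather than our actual tooling
           | ("gpt2" is a stand-in model): continue a lead greedily
           | and print each token with the probability the model
           | assigned to it.
           | 
           |     import torch
           |     from transformers import AutoModelForCausalLM
           |     from transformers import AutoTokenizer
           | 
           |     tok = AutoTokenizer.from_pretrained("gpt2")
           |     model = AutoModelForCausalLM.from_pretrained("gpt2")
           |     model.eval()
           | 
           |     lead = "The capital of France is"
           |     ids = tok(lead, return_tensors="pt").input_ids
           |     out = model.generate(
           |         ids, max_new_tokens=8, do_sample=False,
           |         output_scores=True,
           |         return_dict_in_generate=True)
           | 
           |     # One score tensor per generated token; softmax it
           |     # to read off how "sure" the model was.
           |     new = out.sequences[0][ids.shape[1]:]
           |     for t, score in zip(new, out.scores):
           |         p = torch.softmax(score[0], dim=-1)[t].item()
           |         print(repr(tok.decode(int(t))), round(p, 3))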
        
         | ethan_smith wrote:
         | The chat interface is a UX compromise that makes LLMs
         | accessible but constrains their capabilities. Alternative
         | interfaces like document completion, outline expansion, or
         | iterative drafting would better leverage the full distribution
         | of the training data while reducing anthropomorphization.
        
       | deadbabe wrote:
       | How much a person anthropomorphizes LLMs is inversely related
       | to how well they understand them.
       | 
       | Once you dispel the magic, it naturally becomes hard to use
       | words related to consciousness or thinking. You will probably
       | think of LLMs more like a search engine: you give an input and
       | get some probable output. Maybe LLMs should be rebranded as
       | "word engines"?
       | 
       | Regardless, anthropomorphization is not helpful, and by using
       | human terms to describe LLMs you are harming the layperson's
       | ability to truly understand what an LLM is, while also
       | cheapening what it means to be human by suggesting we've solved
       | consciousness. Just stop it. LLMs do not think; given enough
       | time and patience you could compute their output by hand, using
       | their weights and embeddings to do all the math manually. A
       | hellish task, but not technically an impossible one. There is
       | no other secret hidden away; that's it.
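       | 
       | To make "all the math" concrete, here is a toy numpy sketch of
       | the arithmetic inside a single attention head (made-up sizes,
       | masking and everything else omitted); a real model repeats
       | variations of this, plus plain matrix multiplies, billions of
       | times:
       | 
       |     import numpy as np
       | 
       |     def softmax(x, axis=-1):
       |         e = np.exp(x - x.max(axis=axis, keepdims=True))
       |         return e / e.sum(axis=axis, keepdims=True)
       | 
       |     rng = np.random.default_rng(0)
       |     d, n = 8, 4                  # toy width, 4 tokens
       |     x = rng.normal(size=(n, d))  # stand-in embeddings
       |     Wq = rng.normal(size=(d, d))
       |     Wk = rng.normal(size=(d, d))
       |     Wv = rng.normal(size=(d, d))
       | 
       |     # One attention head: matrix products, a softmax and
       |     # a weighted sum. Doable by hand, in principle.
       |     q, k, v = x @ Wq, x @ Wk, x @ Wv
       |     weights = softmax(q @ k.T / np.sqrt(d))
       |     out = weights @ v
       |     print(out.shape)             # (4, 8)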
        
       | peeters wrote:
       | > The moment that people ascribe properties such as
       | "consciousness" or "ethics" or "values" or "morals" to these
       | learnt mappings is where I tend to get lost. We are speaking
       | about a big recurrence equation that produces a new word, and
       | that stops producing words if we don't crank the shaft.
       | 
       | If that's the argument, then to my mind the more pertinent
       | question is whether you should be anthropomorphizing humans,
       | Larry Ellison or not.
        
         | th0ma5 wrote:
         | I think you have to, as he is human, but I respect your
         | desire to question it!
        
       | Geee wrote:
       | LLMs are _AHI_, i.e. artificial human imitators.
        
       | Workaccount2 wrote:
       | From "Stochastic Parrots All the Ways Down"[1]
       | 
       | > Our analysis reveals that emergent abilities in language models
       | are merely "pseudo-emergent," unlike human abilities which are
       | "authentically emergent" due to our possession of what we term
       | "ontological privilege."
       | 
       | [1]https://ai.vixra.org/pdf/2506.0065v1.pdf
        
       | drdrek wrote:
       | It's human to anthropomorphize; we also do it to our dishwasher
       | when it acts up. The nefarious part is how tech CEOs weaponize
       | bullshit doom scenarios to avoid talking about real regulatory
       | problems by poisoning the discourse. Copyright law, privacy,
       | monopoly? Who cares, when we can talk about the machine
       | apocalypse!
        
       | labrador wrote:
       | I find it useful to pretend that I'm talking to a person while
       | brainstorming because then the conversation flows naturally. But
       | I maintain awareness that I'm pretending, much like Tom Hanks
       | talking to Wilson the volleyball in the movie Cast Away. The
       | suspension of disbelief serves a purpose, but I never confuse the
       | volleyball for a real person.
        
       | jumploops wrote:
       | > I cannot begin putting a probability on "will this human
       | generate this sequence".
       | 
       | Welcome to the world of advertising!
       | 
       | Jokes aside, and while I don't necessarily believe
       | transformers/GPUs are the path to AGI, we technically already
       | have a working "general intelligence" that can survive on just an
       | apple a day.
       | 
       | Putting that non-artificial general intelligence up on a
       | pedestal is ironically the cause of the "world wars and
       | murderous ideologies" that the author is so quick to invoke.
       | 
       | In some sense, humans are just error-prone meat machines, whose
       | inputs/outputs can be confined to a specific space/time bounding
       | box. Yes, our evolutionary past has created a wonderful internal
       | RNG and made our memory system surprisingly fickle, but this
       | doesn't mean we're gods, even if we manage to live long enough to
       | evolve into AGI.
       | 
       | Maybe we can humble ourselves, realize that we're not too
       | different from the other mammals/animals on this planet, and use
       | our excess resources to increase the fault tolerance (N=1) of all
       | life from Earth (and come to the realization that any AGI we
       | create is actually human in origin).
        
       ___________________________________________________________________
       (page generated 2025-07-07 23:00 UTC)