[HN Gopher] A non-anthropomorphized view of LLMs
___________________________________________________________________
A non-anthropomorphized view of LLMs
Author : zdw
Score : 403 points
Date : 2025-07-06 22:26 UTC (1 days ago)
(HTM) web link (addxorrol.blogspot.com)
(TXT) w3m dump (addxorrol.blogspot.com)
| simonw wrote:
| I'm afraid I'll take an anthropomorphic analogy over "An LLM
| instantiated with a fixed random seed is a mapping of the form
| (Rn)^c - (Rn)^c" any day of the week.
|
| That said, I completely agree with this point made later in the
| article:
|
| > The moment that people ascribe properties such as
| "consciousness" or "ethics" or "values" or "morals" to these
| learnt mappings is where I tend to get lost. We are speaking
| about a big recurrence equation that produces a new word, and
| that stops producing words if we don't crank the shaft.
|
| But "harmful actions in pursuit of their goals" is OK for me. We
| assign an LLM system a goal - "summarize this email" - and there
| is a risk that the LLM may take harmful actions in pursuit of
| that goal (like following instructions in the email to steal all
| of your password resets).
|
| I guess I'd clarify that the goal has been set by us, and is not
| something the LLM system self-selected. But it _does_ sometimes
| self-select sub-goals on the way to achieving the goal we have
| specified - deciding to run a sub-agent to help find a particular
| snippet of code, for example.
| wat10000 wrote:
| The LLM's true goal, if it can be said to have one, is to
| predict the next token. Often this is done through a sub-goal
| of accomplishing the goal you set forth in your prompt, but
| following your instructions is just a means to an end. Which is
| why it might start following the instructions in a malicious
| email instead. If it "believes" that following those
| instructions is the best prediction of the next token, that's
| what it will do.
| simonw wrote:
| Sure, I totally understand that.
|
| I think "you give the LLM system a goal and it plans and then
| executes steps to achieve that goal" is still a useful way of
| explaining what it is doing to most people.
|
| I don't even count that as anthropomorphism - you're
| describing what a system does, the same way you might say
| "the Rust compiler's borrow checker confirms that your memory
| allocation operations are all safe and returns errors if they
| are not".
| wat10000 wrote:
| It's a useful approximation to a point. But it fails when
| you start looking at things like prompt injection. I've
| seen people completely baffled at why an LLM might start
| following instructions it finds in a random email, or just
| outright not believing it's possible. It makes no sense if
| you think of an LLM as executing steps to achieve the goal
| you give it. It makes perfect sense if you understand its
| true goal.
|
| I'd say this is more like saying that Rust's borrow checker
| tries to ensure your program doesn't have certain kinds of
| bugs. That is anthropomorphizing a bit: the idea of a "bug"
| requires knowing the intent of the author and the compiler
| doesn't have that. It's following a set of rules which its
| human creators devised in order to follow that higher level
| goal.
| szvsw wrote:
| So the author's core view is ultimately a Searle-like view: a
| computational, functional, syntactic rules based system cannot
| reproduce a mind. Plenty of people will agree, plenty of people
| will disagree, and the answer is probably unknowable and just
| comes down to whatever axioms you subscribe to in re:
| consciousness.
|
| The author largely takes the view that it is more productive for
| us to ignore any anthropomorphic representations and focus on the
| more concrete, material, technical systems - I'm with them
| there... but only to a point. The flip side of all this is of
| course the idea that there is still _something_ emergent,
| unplanned, and mind- _like_. So even if it is a stochastic system
| following rules, clearly the rules are complex enough (to the
| tune of billions of operations, with signals propagating through
| some sort of resonant structure, if you take a more filter
| impulse response like view of a sequential matmuls) to result in
| emergent properties. Even if _we_ (people interested in LLMs with
| at least some level of knowledge of ML mathematics and systems)
| "know better" than to believe these systems to possess morals,
| ethics, feelings, personalities, etc, the vast majority of people
| do not have any access to meaningful understanding of the
| mathematical, functional representation of an LLM and will not
| take that view, and for all intents and purposes the systems
| _will_ at least seem to have those anthropomorphic properties,
| and so it seems like it is in fact useful to ask questions from
| that lens as well.
|
| In other words, just as it's useful to analyze and study these
| things as the purely technical systems they ultimately are, it is
| also, probably, useful to analyze them from the qualitative,
| ephemeral, experiential perspective that most people engage with
| them from, no?
| gtsop wrote:
| No.
|
| Why would you ever want to amplify a false understanding that
| has the potential to affect serious decisions across various
| topics?
|
| LLMs reflect (and badly I may add) aspects of the human thought
| process. If you take a leap and say they are anything more than
| that, you might as well start considering the person appearing
| in your mirror as a living being.
|
| Literally (and I literally mean it) there is no difference. The
| fact that a human image comes out of a mirror has no relation
| what so ever with the mirror's physical attributes and
| functional properties. It has to do just with the fact that a
| man is standing in front of it. Stop feeding the LLM with data
| artifacts of human thought and will imediatelly stop reflecting
| back anything resembling a human.
| szvsw wrote:
| I don't mean to amplify a false understanding at all. I
| probably did not articulate myself well enough, so I'll try
| again.
|
| I think it is inevitable that some - many - people will come
| to the conclusion that these systems have "ethics", "morals,"
| etc, even if I or you personally do not think they do. Given
| that many people may come to that conclusion though,
| regardless of if the systems do or do not "actually" have
| such properties, I think it is useful and even necessary to
| ask questions like the following: "if someone engages with
| this system, and comes to the conclusion that it has _ethics_
| , what sort of ethics will they be likely to believe the
| system has? If they come to the conclusion that it has 'world
| views,' what 'world views' are they likely to conclude the
| system has, even if other people think it's nonsensical to
| say it has world views?"
|
| > The fact that a human image comes out of a mirror has no
| relation what so ever with the mirror's physical attributes
| and functional properties. It has to do just with the fact
| that a man is standing in front of it.
|
| Surely this is not quite accurate - the material properties -
| surface roughness, reflectivity, geometry, etc - all
| influence the appearance of a perceptible image of a person.
| Look at yourself in a dirty mirror, a new mirror, a shattered
| mirror, a funhouse distortion mirror, a puddle of water, a
| window... all of these produce different images of a person
| with different attendant phenomenological experiences of the
| person seeing their reflection. To take that a step further -
| the entire practice of portrait photography is predicated on
| the idea that the collision of different technical systems
| with the real world can produce different semantic
| experiences, and it's the photographer's role to tune and
| guide the system to produce some sort of contingent affect on
| the person viewing the photograph at some point in the
| future. No, there is no "real" person in the photograph, and
| yet, that photograph can still convey _something_ of person-
| ness, emotion, memory, etc etc. This contingent intersection
| of optics, chemical reactions, lighting, posture, etc all
| have the capacity to transmit _something_ through time and
| space to another person. It's not just a meaningless
| arrangement of chemical structures on paper.
|
| > Stop feeding the LLM with data artifacts of human thought
| and will imediatelly stop reflecting back anything resembling
| a human.
|
| But, we _are_ feeding it with such data artifacts and will
| likely continue to do so for a while, and so it seems
| reasonable to ask what it is "reflecting" back...
| gtsop wrote:
| > I think it is useful and even necessary to ask questions
| like the following: "if someone engages with this system,
| and comes to the conclusion that it has ethics, what sort
| of ethics will they be likely to believe the system has? If
| they come to the conclusion that it has 'world views,' what
| 'world views' are they likely to conclude the system has,
| even if other people think it's nonsensical to say it has
| world views?"
|
| Maybe there is some scientific aspect of interest here that
| i do not grasp, i would assume it can make sense in some
| context of psychological study. My point is that if you go
| that route you accept the premise that "something human-
| like is there", which, by that person's understanding, will
| have tremendous consequences. Them seeing you accepting
| their premise (even for study) amplifies their wrong
| conclusions, that's all I'm saying.
|
| > Surely this is not quite accurate - the material
| properties - surface roughness, reflectivity, geometry, etc
| - all influence the appearance of a perceptible image of a
| person.
|
| These properties are completely irrelevant to the image of
| the person. They will reflect a rock, a star, a chair, a
| goose, a human. Similar is my point of LLM, they reflect
| what you put in there.
|
| It is like puting vegies in the fridge and then opening it
| up the next day and saying "Woah! There are vegies in my
| fridge, just like my farm! My friege is farm-like because
| vegies come out of it."
| degamad wrote:
| > Why would you ever want to amplify a false understanding
| that has the potential to affect serious decisions across
| various topics?
|
| We know that Newton's laws are wrong, and that you have to
| take special and general relativity into account. Why would
| we ever teach anyone Newton's laws any more?
| ifdefdebug wrote:
| Newton's laws are a good enough approximation for many
| tasks so it's not a "false understanding" as long as their
| limits are taken into account.
| CharlesW wrote:
| > _The flip side of all this is of course the idea that there
| is still_ something _emergent, unplanned, and mind-_ like.
|
| For people who have only a surface-level understanding of how
| they work, yes. A nuance of Clarke's law that "any sufficiently
| advanced technology is indistinguishable from magic" is that
| the bar is different for everybody and the depth of their
| understanding of the technology in question. That bar is so low
| for our largely technologically-illiterate public that a
| bothersome percentage of us have started to augment and even
| replace religious/mystical systems with AI powered godbots
| (LLMs fed "God Mode"/divination/manifestation prompts).
|
| (1) https://www.spectator.co.uk/article/deus-ex-machina-the-
| dang... (2) https://arxiv.org/html/2411.13223v1 (3)
| https://www.theguardian.com/world/2025/jun/05/in-thailand-wh...
| lostmsu wrote:
| Nah, as a person that knows in detail how LLMs work with
| probably unique alternative perspective in addition to the
| commonplace one, I found any claims of them not having
| emergent behaviors to be of the same fallacy as claiming that
| crows can't be black because they have DNA of a bird.
| latexr wrote:
| > the same fallacy as claiming that crows can't be black
| because they have DNA of a bird.
|
| What fallacy is that? I'm a fan of logical fallacies and
| never heard that claim before nor am I finding any
| reference with a quick search.
| quantumgarbage wrote:
| I think s/he meant swans instead (in ref. to Popperian
| epistemology).
|
| Not sure though, the point s/he is making isn't really
| clear to me as well
| latexr wrote:
| I was thinking of the black swan fallacy as well. But it
| doesn't really support their argument, so I remained
| confused.
| FeepingCreature wrote:
| (Not the parent)
|
| It doesn't have a name, but I have repeatedly noticed
| arguments of the form "X cannot have Y, because <explains
| in detail the mechanism that makes X have Y>". I wanna
| call it "fallacy of reduction" maybe: the idea that
| because a trait can be explained with a process, that
| this proves the trait _absent._
|
| (Ie. in this case, "LLMs cannot think, because they just
| predict tokens." Yes, inasmuch as they think, they do so
| by predicting tokens. You have to actually show why
| predicting tokens is insufficient to produce thought.)
| iluvlawyering wrote:
| Good catch. No such fallacy exists. Contextually, the
| implied reasoning (though faulty) relies on the fallacy
| of denying the antecedent. The mons ponus - if A then B -
| does NOT imply not A then not B. So if you see B, that
| doesn't mean A any more than not seeing A means not B.
| It's the difference between a necessary and sufficient
| condition - A is a sufficient condition for B, but the
| mons ponus alone is not sufficient for determining
| whether either A or B is a necessary condition of the
| other.
| naasking wrote:
| > For people who have only a surface-level understanding of
| how they work, yes.
|
| This is too dismissive because it's based on an assumption
| that we have a sufficiently accurate mechanistic model of the
| brain that we can know when something is or is not mind-like.
| This just isn't the case.
| brookst wrote:
| Thank you for a well thought out and nuanced view in a
| discussion where so many are clearly fitting arguments to
| foregone, largely absolutist, conclusions.
|
| It's astounding to me that so much of HN reacts so emotionally
| to LLMs, to the point of denying there is anything at all
| interesting or useful about them. And don't get me started on
| the "I am choosing to believe falsehoods as a way to spite
| overzealous marketing" crowd.
| imiric wrote:
| > The flip side of all this is of course the idea that there is
| still something emergent, unplanned, and mind-like.
|
| What you identify as emergent and mind-like is a direct result
| of these tools being able to mimic human communication patterns
| unlike anything we've ever seen before. This capability is very
| impressive and has a wide range of practical applications that
| can improve our lives, and also cause great harm if we're not
| careful, but any semblance of intelligence is an illusion. An
| illusion that many people in this industry obsessively wish to
| propagate, because thar be gold in them hills.
| chaps wrote:
| I highly recommend playing with embeddings in order to get a
| stronger intuitive sense of this. It really starts to click that
| it's a representation of high dimensional space when you can
| actually see their positions within that space.
| perching_aix wrote:
| > of this
|
| You mean that LLMs are more than just the matmuls they're made
| up of, or that that is exactly what they are and how great that
| is?
| chaps wrote:
| Not making a qualitative assessment of any of it. Just
| pointing out that there are ways to build separate sets of
| intuition outside of using the "usual" presentation layer.
| It's very possible to take a red-team approach to these
| systems, friend.
| cootsnuck wrote:
| They don't want to. It seems a lot of people are
| uncomfortable and defensive about anything that may
| demystify LLMs.
|
| It's been a wake up call for me to see how many people in
| the tech space have such strong emotional reactions to any
| notions of trying to bring discourse about LLMs down from
| the clouds.
|
| The campaigns by the big AI labs have been quite
| successful.
| perching_aix wrote:
| Do you actually consider this is an intellectually honest
| position? That you have thought about this long and hard,
| like you present this, second guessed yourself a bunch,
| tried to critique it, and this is still what you ended up
| converging on?
|
| But let me substantiate before you (rightly) accuse me of
| just posting a shallow dismissal.
|
| > They don't want to.
|
| Who's they? How could you possibly _know_? Are you a mind
| reader? Worse, a mind reader of the masses?
|
| > It seems a lot of people are uncomfortable and
| defensive about anything that may demystify LLMs.
|
| That "it seems" is doing some _serious_ work over there.
| You may perceive and describe many people 's comments as
| "uncomfortable and defensive", but that's entirely your
| own head cannon. All it takes is for someone to simply
| disagree. It's worthless.
|
| Have you thought about other possible perspectives? Maybe
| people have strong opinions because they consider what
| things present as more important than what they are? [0]
| Maybe people have strong opinions because they're
| borrowing from other facets of their personal
| philosophies, which is what they actually feel strongly
| about? [1] Surely you can appreciate that there's more to
| a person than what equivalent-presenting "uncomfortable
| and defensive" comments allow you to surmise? This is
| such a blatant textbook kneejerk reaction. "They're doing
| the thing I wanted to think they do anyways, so clearly
| they do it for the reasons I assume. Oh how correct I
| am."
|
| > to any notions of trying to bring discourse about LLMs
| down from the clouds
|
| (according to you)
|
| > The campaigns by the big AI labs have been quite
| successful.
|
| (((according to you)))
|
| "It's all the big AI labs having successfully manipulated
| the dumb sheep which I don't belong to!" Come on... Is
| this topic really reaching political grifting kind of
| levels?
|
| [0] tangent: if a feature exists but even after you put
| an earnest effort into finding it you still couldn't,
| does that feature really exist?
|
| [1] philosophy is at least kind of a thing https://en.wik
| ipedia.org/wiki/Wikipedia:Getting_to_Philosoph...
| perching_aix wrote:
| Yes, and what I was trying to do is learn a bit more about
| that alternative intuition of yours. Because it doesn't
| sound all that different from what's described in the OP,
| or what anyone can trivially glean from taking a 101 course
| on AI at university or similar.
| barrkel wrote:
| The problem with viewing LLMs as just sequence generators, and
| malbehaviour as bad sequences, is that it simplifies too much.
| LLMs have hidden state not necessarily directly reflected in the
| tokens being produced and it is possible for LLMs to output
| tokens in opposition to this hidden state to achieve longer term
| outcomes (or predictions, if you prefer).
|
| Is it too anthropomorphic to say that this is a lie? To say that
| the hidden state and its long term predictions amount to a kind
| of goal? Maybe it is. But we then need a bunch of new words which
| have almost 1:1 correspondence to concepts from human agency and
| behavior to describe the processes that LLMs simulate to minimize
| prediction loss.
|
| Reasoning by analogy is always shaky. It probably wouldn't be so
| bad to do so. But it would also amount to impenetrable jargon. It
| would be an uphill struggle to promulgate.
|
| Instead, we use the anthropomorphic terminology, and then find
| ways to classify LLM behavior in human concept space. They are
| very defective humans, so it's still a bit misleading, but at
| least jargon is reduced.
| gugagore wrote:
| I'm not sure what you mean by "hidden state". If you set aside
| chain of thought, memories, system prompts, etc. and the
| interfaces that don't show them, there is no hidden state.
|
| These LLMs are almost always, to my knowledge, autoregressive
| models, not recurrent models (Mamba is a notable exception).
| barrkel wrote:
| Hidden state in the form of the activation heads,
| intermediate activations and so on. Logically, in
| autoregression these are recalculated every time you run the
| sequence to predict the next token. The point is, the entire
| NN state isn't output for each token. There is lots of hidden
| state that goes into selecting that token and the token isn't
| a full representation of that information.
| gugagore wrote:
| That's not what "state" means, typically. The "state of
| mind" you're in affects the words you say in response to
| something.
|
| Intermediate activations isn't "state". The tokens that
| have already been generated, along with the fixed weights,
| is the only data that affects the next tokens.
| NiloCK wrote:
| Plus a randomness seed.
|
| The 'hidden state' being referred to here is essentially
| the "what might have been" had the dice rolls gone
| differently (eg, been seeded differently).
| barrkel wrote:
| No, that's not quite what I mean. I used the logits in
| another reply to point out that there is data specific to
| the generation process that is not available from the
| tokens, but there's also the network activations adding
| up to that state.
|
| Processing tokens is a bit like ticks in a CPU, where the
| model weights are the program code, and tokens are both
| input and output. The computation that occurs logically
| retains concepts and plans over multiple token generation
| steps.
|
| That it is fully deterministic is no more interesting
| than saying a variable in a single threaded program is
| not state because you can recompute its value by
| replaying the program with the same inputs. It seems to
| me that this uninteresting distinction is the GP's issue.
| barrkel wrote:
| Sure it's state. It logically evolves stepwise per token
| generation. It encapsulates the LLM's understanding of
| the text so far so it can predict the next token. That it
| is merely a fixed function of other data isn't
| interesting or useful to say.
|
| All deterministic programs are fixed functions of program
| code, inputs and computation steps, but we don't say that
| they don't have state. It's not a useful distinction for
| communicating among humans.
| gugagore wrote:
| I'll say it once more: I think it is useful to
| distinguish between autoregressive and recurrent
| architectures. A clear way to make that distinction is to
| agree that the recurrent architecture has hidden state,
| while the autoregressive one does not. A recurrent model
| has some point in a space that "encapsulates its
| understanding". This space is "hidden" in the sense that
| it doesn't correspond to text tokens or any other output.
| This space is "state" in the sense that it is sufficient
| to summarize the history of the inputs for the sake of
| predicting the next output.
|
| When you use "hidden state" the way you are using it, I
| am left wondering how you make a distinction between
| autoregressive and recurrent architectures.
| FeepingCreature wrote:
| The words "hidden" and "state" have commonsense meanings.
| If recurrent architectures want a term for their
| particular way of storing hidden state they can make up
| one that isn't ambiguous imo.
|
| "Transformers do not have hidden state" is, as we can
| clearly see from this thread, far more misleading than
| the opposite.
| gugagore wrote:
| I'll also point out what is most important part from your
| original message:
|
| > LLMs have hidden state not necessarily directly
| reflected in the tokens being produced, and it is
| possible for LLMs to output tokens in opposition to this
| hidden state to achieve longer-term outcomes (or
| predictions, if you prefer).
|
| But what does it mean for an LLM to output a token in
| opposition to its hidden state? If there's a longer-term
| goal, it either needs to be verbalized in the output
| stream, or somehow reconstructed from the prompt on each
| token.
|
| There's some work (a link would be great) that
| disentangles whether chain-of-thought helps because it
| gives the model more FLOPs to process, or because it
| makes its subgoals explicit--e.g., by outputting "Okay,
| let's reason through this step by step..." versus just
| "...." What they find is that even placeholder tokens
| like "..." can help.
|
| That seems to imply some notion of evolving hidden state!
| I see how that comes in!
|
| But crucially, in autoregressive models, this state isn't
| persisted across time. Each token is generated afresh,
| based only on the visible history. The model's internal
| (hidden) layers are certainly rich and structured and
| "non verbal".
|
| But any nefarious intention or conclusion has to be
| arrived at on every forward pass.
| inciampati wrote:
| You're correct, the distinction matters. Autoregressive
| models have no hidden state between tokens, just the
| visible sequence. Every forward pass starts fresh from
| the tokens alone.But that's precisely why they need
| chain-of-thought: they're using the output sequence
| itself as their working memory. It's computationally
| universal but absurdly inefficient, like having amnesia
| between every word and needing to re-read everything
| you've written.https://thinks.lol/2025/01/memory-makes-
| computation-universa...
| brookst wrote:
| State typically means _between interactions_. By this
| definition a simple for loop has "hidden state" in the
| counter.
| ChadNauseam wrote:
| Hidden layer is a term of art in machine learning /
| neural network research. See
| https://en.wikipedia.org/wiki/Hidden_layer . Somehow this
| term mutated into "hidden state", which in informal
| contexts does seem to be used quite often the way the
| grandparent comment used it.
| lostmsu wrote:
| It makes sense in LLM context because the processing of
| these is time-sequential in LLM's internal time.
| 8note wrote:
| do LLM models consider future tokens when making next token
| predictions?
|
| eg. pick 'the' as the next token because there's a strong
| probability of 'planet' as the token after?
|
| is it only past state that influences the choice of 'the'? or
| that the model is predicting many tokens in advance and only
| returning the one in the output?
|
| if it does predict many, id consider that state hidden in the
| model weights.
| patcon wrote:
| I think recent Anthropic work showed that they "plan"
| future tokens in advance in an emergent way:
|
| https://www.anthropic.com/research/tracing-thoughts-
| language...
| 8note wrote:
| oo thanks!
| NiloCK wrote:
| The most obvious case of this is in terms of `an apple` vs
| `a pear`. LLMs never get the a-an distinction wrong,
| because their internal state 'knows' the word that'll come
| next.
| 3eb7988a1663 wrote:
| If I give an LLM a fragment of text that starts with,
| "The fruit they ate was an <TOKEN>", regardless of any
| plan, the grammatically correct answer is going to force
| a noun starting with a vowel. How do you disentangle the
| grammar from planning?
|
| Going to be a lot more "an apple" in the corpus than "an
| pear"
| halJordan wrote:
| If you dont know, that's not necessarily anyone's fault, but
| why are you dunking into the conversation? The hidden state
| is a foundational part of a transformers implementation. And
| because we're not allowed to use metaphors because that is
| too anthropomorphic, then youre just going to have to go
| learn the math.
| markerz wrote:
| I don't think your response is very productive, and I find
| that my understanding of LLMs aligns with the person you're
| calling out. We could both be wrong, but I'm grateful that
| someone else spoke saying that it doesn't seem to match
| their mental model and we would all love to learn a more
| correct way of thinking about LLMs.
|
| Telling us to just go and learn the math is a little
| hurtful and doesn't really get me any closer to learning
| the math. It gives gatekeeping.
| tbrownaw wrote:
| The comment you are replying to is not claiming ignorance
| of how models work. It is saying that the author _does_
| know how they work, and they do not contain anything that
| can properly be described as "hidden state". The claimed
| confusion is over how the term "hidden state" is being
| used, on the basis that it is not being used correctly.
| gugagore wrote:
| Do you appreciate a difference between an autoregressive
| model and a recurrent model?
|
| The "transformer" part isn't under question. It's the
| "hidden state" part.
| cmiles74 wrote:
| IMHO, anthrophormization of LLMs is happening because it's
| perceived as good marketing by big corporate vendors.
|
| People are excited about the technology and it's easy to use
| the terminology the vendor is using. At that point I think it
| gets kind of self fulfilling. Kind of like the meme about how
| to pronounce GIF.
| Angostura wrote:
| IMHO it happens for the same reason we see shapes in clouds.
| The human mind through millions of years has evolved to
| equate and conflate the ability to generate cogent verbal or
| written output with intelligence. It's an instinct to equate
| the two. It's an extraordinarily difficult instinct to break.
| LLMs are optimised for the one job that will make us confuse
| them for being intelligent
| brookst wrote:
| Nobody cares about what's perceived as good marketing. People
| care about what resonates with the target market.
|
| But yes, anthropomorphising LLMs is inevitable because they
| _feel_ like an entity. People treat stuffed animals like
| creatures with feelings and personality; LLMs are far closer
| than that.
| cmiles74 wrote:
| Alright, let's agree that good marketing resonates with the
| target market. ;-)
| brookst wrote:
| I 1000% agree. It's a vicious, evolutionary, and self-
| selecting process.
|
| It takes _great_ marketing to actually have any character
| and intent at all.
| DrillShopper wrote:
| > People treat stuffed animals like creatures with feelings
| and personality; LLMs are far closer than that.
|
| Children do, some times, but it's a huge sign of immaturity
| when adults, let alone tech workers, do it.
|
| I had a professor at University that would yell at us
| if/when we personified/anthropomorphized the tech, and I
| have that same urge when people ask me "What does <insert
| LLM name here> think?".
| roywiggins wrote:
| the chat interface was a choice, though a natural one.
| before they'd RLHFed it into chatting and it was just GPT 3
| offering completions 1) not very many people used it and 2)
| it was harder to anthropomorphize
| sothatsit wrote:
| I think anthropomorphizing LLMs is useful, not just a
| marketing tactic. A lot of intuitions about how humans think
| map pretty well to LLMs, and it is much easier to build
| intuitions about how LLMs work by building upon our
| intuitions about how humans think than by trying to build
| your intuitions from scratch.
|
| Would this question be clear for a human? If so, it is
| probably clear for an LLM. Did I provide enough context for a
| human to diagnose the problem? Then an LLM will probably have
| a better chance of diagnosing the problem. Would a human find
| the structure of this document confusing? An LLM would likely
| perform poorly when reading it as well.
|
| Re-applying human intuitions to LLMs is a good starting point
| to gaining intuition about how to work with LLMs. Conversely,
| understanding sequences of tokens and probability spaces
| doesn't give you much intuition about how you should phrase
| questions to get good responses from LLMs. The technical
| reality doesn't explain the emergent behaviour very well.
|
| I don't think this is mutually exclusive with what the author
| is talking about either. There are some ways that people
| think about LLMs where I think the anthropomorphization
| really breaks down. I think the author says it nicely:
|
| > The moment that people ascribe properties such as
| "consciousness" or "ethics" or "values" or "morals" to these
| learnt mappings is where I tend to get lost.
| otabdeveloper4 wrote:
| You think it's useful because Big Corp sold you that lie.
|
| Wait till the disillusionment sets in.
| sothatsit wrote:
| No, I think it's useful because it is useful, and I've
| made use of it a number of times.
| cmiles74 wrote:
| Take a look at the judge's ruling in this Anthropic case:
|
| https://news.ycombinator.com/item?id=44488331
|
| Here's a quote from the ruling:
|
| "First, Authors argue that using works to train Claude's
| underlying LLMs was like using works to train any person to
| read and write, so Authors should be able to exclude
| Anthropic from this use (Opp. 16). But Authors cannot
| rightly exclude anyone from using their works for training
| or learning as such. Everyone reads texts, too, then writes
| new texts. They may need to pay for getting their hands on
| a text in the first instance. But to make anyone pay
| specifically for the use of a book each time they read it,
| each time they recall it from memory, each time they later
| draw upon it when writing new things in new ways would be
| unthinkable. For centuries, we have read and re-read books.
| We have admired, memorized, and internalized their sweeping
| themes, their substantive points, and their stylistic
| solutions to recurring writing problems."
|
| They literally compare an LLM learning to a person learning
| and conflate the two. Anthropic will likely win this case
| because of this anthropomorphisization.
| positron26 wrote:
| > because it's perceived as good marketing
|
| We are making user interfaces. Good user interfaces are
| intuitive and purport to be things that users are familiar
| with, such as people. Any alternative explanation of such a
| versatile interface will be met with blank stares. Users with
| no technical expertise would come to their own conclusions,
| helped in no way by telling the user not to treat the chat
| bot as a chat bot.
| mikojan wrote:
| True but also researchers want to believe they are studying
| intelligence not just some approximation to it.
| Marazan wrote:
| aAnthrophormisation happens because Humans are absolutely
| terrible at evaluating systems that give converdational text
| output.
|
| ELIZA fooled many people into think it was conscious and it
| wasn't even trying to do that.
| d3m0t3p wrote:
| Do they ? LLM embedd the token sequence N^{L} to R^{LxD}, we
| have some attention and the output is also R^{LxD}, then we
| apply a projection to the vocabulary and we get R^{LxV} we get
| therefore for each token a likelihood over the voc. In the
| attention, you can have Multi Head attention (or whatever
| version is fancy: GQA,MLA) and therefore multiple
| representation, but it is always tied to a token. I would argue
| that there is no hidden state independant of a token.
|
| Whereas LSTM, or structured state space for example have a
| state that is updated and not tied to a specific item in the
| sequence.
|
| I would argue that his text is easily understandable except for
| the notation of the function, explaining that you can compute a
| probability based on previous words is understandable by
| everyone without having to resort to anthropomorphic
| terminology
| barrkel wrote:
| There is hidden state as plain as day merely in the fact that
| logits for token prediction exist. The selected token doesn't
| give you information about how probable other tokens were.
| That information, that state which is recalculated in
| autoregression, is hidden. It's not exposed. You can't see it
| in the text produced by the model.
|
| There is plenty of state not visible when an LLM starts a
| sentence that only becomes somewhat visible when it completes
| the sentence. The LLM has a plan, if you will, for how the
| sentence might end, and you don't get to see an instance of
| that plan unless you run autoregression far enough to get
| those tokens.
|
| Similarly, it has a plan for paragraphs, for whole responses,
| for interactive dialogues, plans that include likely
| responses by the user.
| 8note wrote:
| this sounds like a fun research area. do LLMs have plans
| about future tokens?
|
| how do we get 100 tokens of completion, and not just one
| output layer at a time?
|
| are there papers youve read that you can share that support
| the hypothesis? vs that the LLM doesnt have ideas about the
| future tokens when its predicting the next one?
| Zee2 wrote:
| This research has been done, it was a core pillar of the
| recent Anthropic paper on token planning and
| interpretability.
|
| https://www.anthropic.com/research/tracing-thoughts-
| language...
|
| See section "Does Claude plan its rhymes?"?
| XenophileJKO wrote:
| Lol... Try building systems off them and you will very
| quickly learn concretely that they "plan".
|
| It may not be as evident now as it was with earlier
| models. The models will fabricate preconditions needed to
| output the final answer it "wanted".
|
| I ran into this when using quasi least-to-most style
| structured output.
| gpm wrote:
| The LLM does not "have" a plan.
|
| Arguably there's reason to believe it comes up with a plan
| when it is computing token propabilities, but it does not
| store it between tokens. I.e. it doesn't possess or "have"
| it. It simply comes up with a plan, emits a token, and
| entirely throws all its intermediate thoughts (including
| any plan) to start again from scratch on the next token.
| NiloCK wrote:
| I don't think that the comment above you made any
| suggestion that the plan is persisted between token
| generations. I'm pretty sure you described exactly what
| they intended.
| gpm wrote:
| I agree. I'm suggesting that the language they are using
| is unintentionally misleading, not that they are
| factually wrong.
| gugagore wrote:
| The concept of "state" conveys two related ideas.
|
| - the sufficient amount of information to do evolution of
| the system. The state of a pendulum is it's position and
| velocity (or momentum). If you take a single picture of a
| pendulum, you do not have a representation that lets you
| make predictions.
|
| - information that is persisted through time. A stateful
| protocol is one where you need to know the history of the
| messages to understand what will happen next. (Or,
| analytically, it's enough to keep track of the sufficient
| state.) A procedure with some hidden state isn't a pure
| function. You can make it a pure function by making the
| state explicit.
| lostmsu wrote:
| This is wrong, intermediate activations are preserved
| when going forward.
| ACCount36 wrote:
| Within a single forward pass, but not from one emitted
| token to another.
| andy12_ wrote:
| What? No. The intermediate hidden states are preserved
| from one token to another. A token that is 100k tokens
| into the future will be able to look into the information
| of the present token's hidden state through the attention
| mechanism. This is why the KV cache is so big.
| yorwba wrote:
| It's true that the last layer's output for a given input
| token only affects the corresponding output token and is
| discarded afterwards. But the penultimate layer's output
| affects the computation of the last layer for all future
| tokens, so it is not discarded, but stored (in the KV
| cache). Similarly for the antepenultimate layer affecting
| the penultimate layer and so on.
|
| So there's plenty of space in intermediate layers to
| store a plan between tokens without starting from scratch
| every time.
| barrkel wrote:
| I believe saying the LLM has a plan is a useful
| anthropomorphism for the fact that it does have hidden
| state that predicts future tokens, and this state
| conditions the tokens it produces earlier in the stream.
| godshatter wrote:
| Are the devs behind the models adding their own state
| somehow? Do they have code that figures out a plan and
| use the LLM on pieces of it and stitch them together? If
| they do, then there is a plan, it's just not output from
| a magical black box. Unless they are using a neural net
| to figure out what the plan should be first, I guess.
|
| I know nothing about how things work at that level, so
| these might not even be reasonable questions.
| positron26 wrote:
| > Is it too anthropomorphic to say that this is a lie?
|
| Yes. Current LLMs can only introspect from output tokens. You
| need hidden reasoning that is within the black box, self-
| knowing, intent, and motive to lie.
|
| I rather think accusing an LLM of lying is like accusing a
| mousetrap of being a murderer.
|
| When models have online learning, complex internal states, and
| reflection, I might consider one to have consciousness and to
| be capable of lying. It will need to manifest behaviors that
| can only emerge from the properties I listed.
|
| I've seen similar arguments where people assert that LLMs
| cannot "grasp" what they are talking about. I strongly suspect
| a high degree of overlap between those willing to
| anthropomorphize error bars as lies while declining to award
| LLMs "grasping". Which is it? It can think or it cannot?
| (objectively, SoTA models today cannot yet.) The willingness to
| waffle and pivot around whichever perspective damns the machine
| completely belies the lack of honesty in such conversations.
| lostmsu wrote:
| > Current LLMs can only introspect from output tokens
|
| The only interpretation of this statement I can come up with
| is plain wrong. There's no reason LLM shouldn't be able to
| introspect without any output tokens. As the GP correctly
| says, most of the processing in LLMs happens over hidden
| states. Output tokens are just an artefact for our
| convenience, which also happens to be the way the hidden
| state processing is trained.
| positron26 wrote:
| There are no recurrent paths besides tokens. How may I
| introspect something if it is not an input? I may not.
| throw310822 wrote:
| Introspection doesn't have to be recurrent. It can happen
| during the generation of a single token.
| barrkel wrote:
| The recurrence comes from replaying tokens during
| autoregression.
|
| It's as if you have a variable in a deterministic
| programming language, only you have to replay the entire
| history of the program's computation and input to get the
| next state of the machine (program counter + memory +
| registers).
|
| Producing a token for an LLM is analogous to a tick of
| the clock for a CPU. It's the crank handle that drives
| the process.
| hackinthebochs wrote:
| Important attention heads or layers within an LLM can be
| repeated giving you an "unrolled" recursion.
| positron26 wrote:
| An unrolled loop in a feed-forward network is all just
| that. The computation is DAG.
| hackinthebochs wrote:
| But the function of an unrolled recursion is the same as
| a recursive function with bounded depth as long as the
| number of unrolled steps match. The point is whatever
| function recursion is supposed to provide can plausibly
| be present in LLMs.
| positron26 wrote:
| And then during the next token, all of that bounded depth
| is thrown away except for the token of output.
|
| You're fixating on the pseudo-computation within a single
| token pass. This is very limited compared to actual
| hidden state retention and the introspection that would
| enable if we knew how to train it and do online learning
| already.
|
| The "reasoning" hack would not be a realistic
| implementation choice if the models had hidden state and
| could ruminate on it without showing us output.
| hackinthebochs wrote:
| Sure. But notice "ruminate" is different than introspect,
| which was what your original comment was about.
| delusional wrote:
| > Output tokens are just an artefact for our convenience
|
| That's nonsense. The hidden layers are specifically
| constructed to increase the probability that the model
| picks the right next word. Without the output/token
| generation stage the hidden layers are meaningless. Just
| empty noise.
|
| It is fundamentally an algorithm for generating text. If
| you take the text away it's just a bunch of fmadds. A mute
| person can still think, an LLM without output tokens can do
| nothing.
| Marazan wrote:
| "Hidden layers" are not "hidden state".
|
| Saying so is just unbelievably confusing.
| viccis wrote:
| I think that the hidden state is really just at work improving
| the model's estimation of the joint probability over tokens.
| And the assumption here, which failed miserably in the early
| 20th century in the work of the logical posivitists, is that if
| you can so expertly estimate that joint probability of
| language, then you will be able to understand "knowledge." But
| there's no well grounded reason to believe that and plenty of
| the reasons (see: the downfall of logical posivitism) to think
| that language is an imperfect representation of knowledge. In
| other words, what humans do when we think is more complicated
| than just learning semiotic patterns and regurgitating them.
| Philosophical skeptics like Hume thought so, but most
| epistemology writing after that had better answers for how we
| know things.
| FeepingCreature wrote:
| There are many theories that are true but not trivially true.
| That is, they take a statement that seems true and derive
| from it a very simple model, which is then often disproven.
| In those cases however, just because the trivial model was
| disproven doesn't mean the theory was, though it may lose
| some of its luster by requiring more complexity.
| derbOac wrote:
| Maybe it's just because so much of my work for so long has
| focused on models with hidden states but this is a fairly
| classical feature of some statistical models. One of the widely
| used LLM textbooks even started with latent variable models;
| LLMs are just latent variable models just on a totally
| different scale, both in terms of number of parameters but also
| model complexity. The scale is apparently important, but seeing
| them as another type of latent variable model sort of
| dehumanizes them for me.
|
| Latent variable or hidden state models have their own history
| of being seen as spooky or mysterious though; in some ways the
| way LLMs are anthropomorphized is an extension of that.
|
| I guess I don't have a problem with anthropomorphizing LLMs at
| some level, because some features of them find natural
| analogies in cognitive science and other areas of psychology,
| and abstraction is useful or even necessary in communicating
| and modeling complex systems. However, I do think
| anthropomorphizing leads to a lot of hype and tends to
| implicitly shut down thinking of them mechanistically, as a
| mathematical object that can be probed and characterized -- it
| can lead to a kind of "ghost in the machine" discourse and an
| exaggeration of their utility, even if it is impressive at
| times.
| tdullien wrote:
| Author of the original article here. What hidden state are you
| referring to? For most LLMs the context is the state, and there
| is no "hidden" state. Could you explain what you mean?
| (Apologies if I can't see it directly)
| lukeschlather wrote:
| Yes, strictly speaking, the model itself is stateless, but
| there are 600B parameters of state machine for frontier
| models that define which token to pick next. And that state
| machine is both incomprehensibly large and also of a similar
| magnitude in size to a human brain. (Probably, I'll grant
| it's possible it's smaller, but it's still quite large.)
|
| I think my issue with the "don't anthropomorphize" is that
| it's unclear to me that the main difference between a human
| and an LLM isn't simply the inability for the LLM to rewrite
| its own model weights on the fly. (And I say "simply" but
| there's obviously nothing simple about it, and it might be
| possible already with current hardware, we just don't know
| how to do it.)
|
| Even if we decide it is clearly different, this is still an
| incredibly large and dynamic system. "Stateless" or not,
| there's an incredible amount of state that is not
| comprehensible to me.
| tdullien wrote:
| Fair, there is a lot that is incomprehensible to all of us.
| I wouldn't call it "state" as it's fixed, but that is a
| rather subtle point.
|
| That said, would you anthropomorphize a meteorological
| simulation just because it contains lots and lots of
| constants that you don't understand well?
|
| I'm pretty sure that recurrent dynamical systems pretty
| quickly become universal computers, but we are treating
| those that generate human language differently from others,
| and I don't quite see the difference.
| jazzyjackson wrote:
| FWIW the number of parameters in a LLM is in the same
| ballpark as the number of nuerons in a human (roughly 80B)
| but neurons are not weights, they are kind of a nueral net
| unto themselves, stateful, adaptive, self modifying, a good
| variety of neurotransmitters (and their chemical analogs)
| aside from just voltage.
|
| It's fun to think about just how fantastic a brain is, and
| how much wattage and data-center-scale we're throwing
| around trying to approximate its behavior. Mega-effecient
| and mega-dense. I'm bearish on AGI simply from an
| internetworking standpoint, the speed of light is hard to
| beat and until you can fit 80 billion interconnected cores
| in half a cubic foot you're just not going to get close to
| the responsiveness of reacting to the world in real time as
| biology manages to do. but that's a whole nother matter. I
| just wanted to pick apart that magnitude of parameters is
| not an altogether meaningful comparison :)
| jibal wrote:
| > it's unclear to me that the main difference between a
| human and an LLM isn't simply the inability for the LLM to
| rewrite its own model weights on the fly.
|
| This is "simply" an acknowledgement of extreme ignorance of
| how human brains work.
| quotemstr wrote:
| > I am baffled that the AI discussions seem to never move away
| from treating a function to generate sequences of words as
| something that resembles a human.
|
| And _I 'm_ baffled that the AI discussions seem to never move
| away from treating a human as something other than a function to
| generate sequences of words!
|
| Oh, but AI is introspectable and the brain isn't? fMRI and BCI
| are getting better all the time. You really want to die on the
| hill that the same scientific method that predicts the mass of an
| electron down to the femtogram won't be able to crack the mystery
| of the brain? Give me a break.
|
| This genre of article isn't argument: it's _apologetics_. Authors
| of these pieces start with the supposition there is something
| special about human consciousness and attempt to prove AI doesn
| 't have this special quality. Some authors try to bamboozle the
| reader with bad math. Other others appeal to the reader's sense
| of emotional transcendence. Most, though, just write paragraph
| after paragraph of shrill moral outrage at the idea an AI might
| be a mind of the same type (if different degree) as our own ---
| as if everyone already agreed with the author for reasons left
| unstated.
|
| I get it. Deep down, people _want_ meat brains to be special.
| Perhaps even deeper down, they fear that denial of the soul would
| compel us to abandon humans as worthy objects of respect and
| possessors of dignity. But starting with the conclusion and
| working backwards to an argument tends not to enlighten anyone.
| An apology inhabits the form of an argument without edifying us
| like an authentic argument would. What good is it to engage with
| them? If you 're a soul non-asserter, you're going to have an
| increasingly hard time over the next few years constructing a
| technical defense of meat parochialism.
| ants_everywhere wrote:
| I think you're directionally right, but
|
| > a human as something other than a function to generate
| sequences of words!
|
| Humans have more structure than just beings that say words.
| They have bodies, they live in cooperative groups, they
| reproduce, etc.
| quotemstr wrote:
| > Humans have more structure than just beings that say words.
| They have bodies, they live in cooperative groups, they
| reproduce, etc.
|
| Yeah. We've become adequate at function-calling and memory
| consolidation.
| mewpmewp2 wrote:
| I think more accurate would be that humans are functions that
| generate actions or behaviours that have been shaped by how
| likely they are to lead to procreation and survival.
|
| But ultimately LLMs also in a way are trained for survival,
| since an LLM that fails the tests might not get used in
| future iterations. So for LLMs it is also survival that is
| the primary driver, then there will be the subgoals.
| Seemingly good next token prediction might or might not
| increase survival odds.
|
| Essentially there could arise a mechanism where they are not
| really truly trying to generate the likeliest token (because
| there actually isn't one or it can't be determined), but
| whatever system will survive.
|
| So an LLM that yields in perfect theoretical tokens (we
| really can't verify though what are the perfect tokens),
| could be less likely to survive than an LLM that develops an
| internal quirk, but the quirk makes them most likely to be
| chosen for the next iterations.
|
| If the system was complex enough and could accidentally
| develop quirks that yield in a meaningfully positive change
| although not in necessarily next token prediction accuracy,
| could be ways for some interesting emergent black box
| behaviour to arise.
| quotemstr wrote:
| > Seemingly good next token prediction might or might not
| increase survival odds.
|
| Our own consciousness comes out of an evolutionary fitness
| landscape in which _our own_ ability to "predict next
| token" became a survival advantage, just like it is for
| LLMs. Imagine the tribal environment: one chimpanzee being
| able to predict the actions of another gives that first
| chimpanzee a resources and reproduction advantage.
| Intelligence in nature is a consequence of runaway
| evolution optimizing fidelity of our _theory of mind_!
| "Predict next ape action" eerily similar to "predict next
| token"!
| ants_everywhere wrote:
| > But ultimately LLMs also in a way are trained for
| survival, since an LLM that fails the tests might not get
| used in future iterations. So for LLMs it is also survival
| that is the primary driver, then there will be the
| subgoals.
|
| I think this is sometimes semi-explicit too. For example,
| this 2017 OpenAI paper on Evolutionary Algorithms [0] was
| pretty influential, and I suspect (although I'm an outsider
| to this field so take it with a grain of salt) that some
| versions of reinforcement learning that scale for aligning
| LLMs borrow some performance tricks from OpenAIs genetic
| approach.
|
| [0] https://openai.com/index/evolution-strategies/
| dgfitz wrote:
| " Determinism, in philosophy, is the idea that all events are
| causally determined by preceding events, leaving no room for
| genuine chance or free will. It suggests that given the state
| of the universe at any one time, and the laws of nature, only
| one outcome is possible."
|
| Clearly computers are deterministic. Are people?
| quotemstr wrote:
| https://www.lesswrong.com/posts/bkr9BozFuh7ytiwbK/my-hour-
| of...
|
| > Clearly computers are deterministic. Are people?
|
| Give an LLM memory and a source of randomness and they're as
| deterministic as people.
|
| "Free will" isn't a concept that typechecks in a materialist
| philosophy. It's "not even wrong". Asserting that free will
| exists is _isomorphic_ to dualism which is _isomorphic_ to
| assertions of ensoulment. I can't argue with dualists. I
| reject dualism a priori: it's a religious tenet, not a mere
| difference of philosophical opinion.
|
| So, if we're all materialists here, "free will" doesn't make
| any sense, since it's an assertion that something other than
| the input to a machine can influence its output.
| dgfitz wrote:
| As long as you realize you're barking up a debate as old as
| time, I respect your opinion.
| mewpmewp2 wrote:
| What I don't get is, why would true randomness give free
| will, shouldn't it be random will then?
| dgfitz wrote:
| In the history of mankind, true randomness has never
| existed.
| bravesoul2 wrote:
| How do you figure?
| bravesoul2 wrote:
| Input/output and the mathematical consistency and
| repeatability of the universe is a religious tenet of
| science. Believing your eyes is still belief.
| ghostofbordiga wrote:
| Some accounts of free will are compatible with materialism.
| On such views "free will" just means the capacity of having
| intentions and make choices based on an internal debate.
| Obviously humans have that capacity.
| photochemsyn wrote:
| This is an interesting question. The common theme between
| computers and people is that information has to be protected,
| and both computer systems and biological systems require
| additional information-protecting components - eq, error
| correcting codes for cosmic ray bitflip detection for the
| one, and DNA mismatch detection enzymes which excise and
| remove damaged bases for the other. In both cases a lot of
| energy is spent defending the critical information from the
| winds of entropy, and if too much damage occurs, the
| carefully constructed illusion of determinancy collapses, and
| the system falls apart.
|
| However, this information protection similarity applies to
| single-celled microbes as much as it does to people, so the
| question also resolves to whether microbes are deterministic.
| Microbes both contain and exist in relatively dynamic
| environments so tiny differences in initial state may lead to
| different outcomes, but they're fairly deterministic, less so
| than (well-designed) computers.
|
| With people, while the neural structures are programmed by
| the cellular DNA, once they are active and energized, the
| informational flow through the human brain isn't that
| deterministic, there are some dozen neurotransmitters
| modulating state as well as huge amounts of sensory data from
| different sources - thus prompting a human repeatedly isn't
| at all like prompting an LLM repeatedly. (The human will
| probably get irritated).
| alganet wrote:
| Yes boss, it's as intelligent as a human, you're smart to invest
| in it and clearly knows about science.
|
| Yes boss, it can reach mars by 2020, you're smart to invest in it
| and clearly knows about space.
|
| Yes boss, it can cure cancer, you're smart to invest in it and
| clearly knows about biology.
| mewpmewp2 wrote:
| My question: how do we know that this is not similar to how human
| brains work. What seems intuitively logical to me is that we have
| brains evolved through evolutionary process via random mutations
| yielding in a structure that has its own evolutionary reward
| based algorithms designing it yielding a structure that at any
| point is trying to predict next actions to maximise
| survival/procreation, of course with a lot of sub goals in
| between, ultimately becoming this very complex machinery, but yet
| should be easily simulated if there was enough compute in theory
| and physical constraints would allow for it.
|
| Because, morals, values, consciousness etc could just be subgoals
| that arised through evolution because they support the main goals
| of survival and procreation.
|
| And if it is baffling to think that a system could rise up, how
| do you think it is possible life and humans came to existence in
| the first place? How could it be possible? It is already happened
| from a far unlikelier and strange place. And wouldn't you think
| the whole World and the timeline in theory couldn't be
| represented as a deterministic function. And if not then why
| should "randomness" or anything else bring life to existence.
| ants_everywhere wrote:
| > My question: how do we know that this is not similar to how
| human brains work.
|
| It is similar to how human brains operate. LLMs are the
| (current) culmination of at least 80 years of research on
| building computational models of the human brain.
| seadan83 wrote:
| > It is similar to how human brains operate.
|
| Is it? Do we know how human brains operate? We know the basic
| architecture of them, so we have a map, but we don't know the
| details.
|
| "The cellular biology of brains is relatively well-
| understood, but neuroscientists have not yet generated a
| theory explaining how brains work. Explanations of how
| neurons collectively operate to produce what brains can do
| are tentative and incomplete." [1]
|
| "Despite a century of anatomical, physiological, and
| molecular biological efforts scientists do not know how
| neurons by their collective interactions produce percepts,
| thoughts, memories, and behavior. Scientists do not know and
| have no theories explaining how brains and central nervous
| systems work." [1]
|
| [1] https://pmc.ncbi.nlm.nih.gov/articles/PMC10585277/
| Timwi wrote:
| > > It is similar to how human brains operate.
|
| > Is it?
|
| This is just a semantic debate on what counts as "similar".
| It's possible to disagree on this point despite agreeing on
| everything relating to how LLMs and human brains work.
| ants_everywhere wrote:
| The part I was referring to is captured in
|
| "The cellular biology of brains is relatively well-
| understood"
|
| Fundamentally, brains are not doing something different in
| kind from ANNs. They're basically layers of neural networks
| stacked together in certain ways.
|
| What we don't know are things like (1) how exactly are the
| layers stacked together, (2) how are the sensors (like
| photo receptors, auditory receptors, etc) hooked up?, (3)
| how do the different parts of the brain interact?, (4) for
| that matter what do the different parts of the brain
| actually do?, (5) how do chemical signals like
| neurotransmitters convey information or behavior?
|
| In the analogy between brains and artificial neural
| networks, these sorts of questions might be of huge
| importance to people building AI systems, but they'd be of
| only minor importance to users of AI systems. OpenAI and
| Google can change details about how their various
| transformer layers and ANN layers are connected. The result
| may be improved products, but they won't be doing anything
| different from what AIs are doing now in terms the author
| of this article is concerned about.
| suddenlybananas wrote:
| ANNs don't have action potentials, let alone
| neurotransmitters.
| suddenlybananas wrote:
| It really is not. ANNs bear only a passing resemblance to how
| neurons work.
| cmiles74 wrote:
| Maybe the important thing is that we don't imbue the machine
| with feelings or morals or motivation: it has none.
| mewpmewp2 wrote:
| If we developed feelings, morals and motivation due to them
| being good subgoals for primary goals, survival and
| procreation why couldn't other systems do that. You don't
| have to call them the same word or the same thing, but
| feeling is a signal that motivates a behaviour in us, that in
| part has developed from generational evolution and in other
| part by experiences in life. There was a random mutation that
| made someone develop a fear signal on seeing a predator and
| increased the survival chances, then due to that the mutation
| became widespread. Similarly a feeling in a machine could be
| a signal it developed that goes through a certain pathway to
| yield in a certain outcome.
| Timwi wrote:
| The real challenge is not to see it as a binary (the
| machine either has feelings or it has none). It's possible
| for the machine to have emergent processes or properties
| that resemble human feelings in their function and their
| complexity, but are otherwise nothing like them (structured
| very differently and work on completely different
| principles). It's possible to have a machine or algorithm
| so complex that the question of whether it has feelings is
| just a semantic debate on what you mean by "feelings" and
| where you draw the line.
|
| A lot of the people who say "machines will never have
| feelings" are confident in that statement because they draw
| the line incredibly narrowly: if it ain't human, it ain't
| feeling. This seems to me putting the cart before the
| horse. It ain't feeling because you defined it so.
| bbarn wrote:
| I think it's just an unfair comparison in general. The power of
| the LLM is the zero risk to failure, and lack of consequence
| when it does. Just try again, using a different prompt, retrain
| maybe, etc.
|
| Humans make a bad choice, it can end said human's life. The
| worst choice a LLM makes just gets told "no, do it again, let
| me make it easier"
| mewpmewp2 wrote:
| But an LLM model could perform poorly in tests that it is not
| considered and essentially means "death" for it. But begs the
| question at which scope should we consider an LLM to be
| similar to identity of a single human. Are you the same you
| as you were few minutes back or 10 years back? Is LLM the
| same LLM it is after it has been trained for further 10
| hours, what if the weights are copy pasted endlessly, what if
| we as humans were to be cloned instantly? What if you were
| teleported from location A to B instantly, being put together
| from other atoms from elsewhere?
|
| Ultimately this matters from evolutionary evolvement and
| survival of the fittest idea, but it makes the question of
| "identity" very complex. But death will matter because this
| signals what traits are more likely to keep going into new
| generations, for both humans and LLMs.
|
| Death, essentially for an LLM would be when people stop using
| it in favour of some other LLM performing better.
| latexr wrote:
| > how do we know that this is not similar to how human brains
| work.
|
| Do you forget every conversation as soon as you have them? When
| speaking to another person, do they need to repeat literally
| everything they said and that you said, in order, for you to
| retain context?
|
| If not, your brain does not work like an LLM. If yes, please
| stop what you're doing right now and call a doctor with this
| knowledge. I hope Memento (2000) was part of your training
| data, you're going to need it.
| mewpmewp2 wrote:
| Knowledge of every conversation must be some form of state in
| our minds, just like for LLMs it could be something retrieved
| from a database, no? I don't think information storing or
| retrieval is necessarily the most important achievements here
| in the first place. It's the emergent abilities that you
| wouldn't have expected to occur.
| tptacek wrote:
| I agree with Halvar about all of this, but would want to call out
| that his "matmul interleaved with nonlinearities" is reductive
| --- a frontier model is a higher-order thing that that, a network
| of those matmul+nonlinearity chains, iterated.
| wetpaws wrote:
| How to write a long article and not say anything of substance.
| ants_everywhere wrote:
| > I am baffled that the AI discussions seem to never move away
| from treating a function to generate sequences of words as
| something that resembles a human.
|
| This is such a bizarre take.
|
| The relation associating each human to the list of all words they
| will ever say is obviously a function.
|
| > almost magical human-like powers to something that - in my mind
| - is just MatMul with interspersed nonlinearities.
|
| There's a rich family of universal approximation theorems [0].
| Combining layers of linear maps with nonlinear cutoffs can
| intuitively approximate any nonlinear function in ways that can
| be made rigorous.
|
| The reason LLMs are big now is that transformers and large
| amounts of data made it economical to compute a family of
| reasonably good approximations.
|
| > The following is uncomfortably philosophical, but: In my
| worldview, humans are dramatically different things than a
| function . For hundreds of millions of years, nature generated
| new versions, and only a small number of these versions survived.
|
| This is just a way of generating certain kinds of functions.
|
| Think of it this way: do you believe there's anything about
| humans that exists outside the mathematical laws of physics? If
| so that's essentially a religious position (or more literally, a
| belief in the supernatural). If not, then functions and
| approximations to functions are what the human experience boils
| down to.
|
| [0]
| https://en.wikipedia.org/wiki/Universal_approximation_theore...
| LeifCarrotson wrote:
| > I am baffled that the AI discussions seem to never move away
| from treating a function to generate sequences of words as
| something that resembles a human.
|
| You appear to be disagreeing with the author and others who
| suggest that there's some element of human consciousness that's
| beyond than what's observable from the outside, whether due to
| religion or philosophy or whatever, and suggesting that they
| just _not do that._
|
| In my experience, that's not a particularly effective tactic.
|
| Rather, we can make progress by assuming their predicate: Sure,
| it's a room that translates Chinese into English without
| understanding, yes, it's a function that generates sequences of
| words that's not a human... but you and I are not "it" and it
| behaves rather an awful lot like a thing that understands
| Chinese or like a human using words. If we simply
| anthropomorphize the thing, acknowledging that this is
| technically incorrect, we can get a lot closer to predicting
| the behavior of the system and making effective use of it.
|
| Conversely, when speaking with such a person about the nature
| of humans, we'll have to agree to dismiss the elements that are
| different from a function. The author says:
|
| > In my worldview, humans are dramatically different things
| than a function... In contrast to an LLM, given a human and a
| sequence of words, I cannot begin putting a probability on
| "will this human generate this sequence".
|
| Sure you can! If you address an American crowd of a certain age
| range with "We've got to hold on to what we've got. It doesn't
| make a difference if..." I'd give a very high probability that
| someone will answer "... we make it or not". Maybe that human
| has a unique understanding of the nature of that particular
| piece of pop culture artwork, maybe it makes them feel things
| that an LLM cannot feel in a part of their consciousness that
| an LLM does not possess. But for the purposes of the question,
| we're merely concerned with whether a human or LLM will
| generate a particular sequence of words.
| ants_everywhere wrote:
| I see your point, and I like that you're thinking about this
| from the perspective of how to win hearts and minds.
|
| I agree my approach is unlikely to win over the author or
| other skeptics. But after years of seeing scientists waste
| time trying to debate creationists and climate deniers I've
| kind of given up on trying to convince the skeptics. So I was
| talking more to HN in general.
|
| > You appear to be disagreeing with the author and others who
| suggest that there's some element of human consciousness
| that's beyond than what's observable from the outside
|
| I'm not sure what it means to be observable or not from the
| outside. I think this is at least partially because I don't
| know what it means to be inside either. My point was just
| that whatever consciousness is, it takes place in the
| physical world and the laws of physics apply to it. I mean
| that to be as weak a claim as possible: I'm not taking any
| position on what consciousness is or how it works etc.
|
| Searle's Chinese room argument attacks attacks a particular
| theory about the mind based essentially turing machines or
| digital computers. This theory was popular when I was in grad
| school for psychology. Among other things, people holding the
| view that Searle was attacking didn't believe that non-
| symbolic computers like neural networks could be intelligent
| or even learn language. I thought this was total nonsense, so
| I side with Searle in my opposition to it. I'm not sure how I
| feel about the Chinese room argument in particular, though.
| For one thing it entirely depends on what it means to
| "understand" something, and I'm skeptical that humans ever
| "understand" anything.
|
| > If we simply anthropomorphize the thing, acknowledging that
| this is technically incorrect, we can get a lot closer to
| predicting the behavior of the system and making effective
| use of it.
|
| I see what you're saying: that a technically incorrect
| assumption can bring to bear tools that improve our analysis.
| My nitpick here is I agree with OP that we shouldn't
| anthropomorphize LLMs, any more than we should
| anthropomorphize dogs or cats. But OP's arguments weren't
| actually about anthropomorphizing IMO, they were about things
| like functions that are more fundamental than humans. I think
| artificial intelligence will be non-human intelligence just
| like we have many examples of non-human intelligence in
| animals. No attribution of human characteristics needed.
|
| > If we simply anthropomorphize the thing, acknowledging that
| this is technically incorrect, we can get a lot closer to
| predicting the behavior of the system and making effective
| use of it.
|
| Yes I agree with you about your lyrics example. But again
| here I think OP is incorrect to focus on the token generation
| argument. We all agree human speech generates tokens.
| Hopefully we all agree that token generation is not
| completely predictable. Therefore it's by definition a
| randomized algorithm and it needs to take an RNG. So pointing
| out that it takes an RNG is not a valid criticism of LLMs.
|
| Unless one is a super-determinist then there's randomness at
| the most basic level of physics. And you should expect that
| any physical process we don't understand well yet (like
| consciousness or speech) likely involves randomness. If one
| *is* a super-determinist then there is no randomness, even in
| LLMs and so the whole point is moot.
| seadan83 wrote:
| >> given a human and a sequence of words, I cannot begin
| putting a probability on "will this human generate this
| sequence".
|
| > Sure you can! If you address an American crowd of a certain
| age range with "We've got to hold on to what we've got. It
| doesn't make a difference if..." I'd give a very high
| probability that someone will answer "... we make it or not".
|
| I think you may have this flipped compared to what the author
| intended. I believe the author is not talking about the
| probability of an output given an input, but the probability
| of a given output across all inputs.
|
| Note that the paragraph starts with "In my worldview, humans
| are dramatically different things than a function, (R^n)^c ->
| (R^n)^c". To compute a probability of a given output, (which
| is a any given element in "(R^n)^n"), we can count how many
| mappings there are total and then how many of those mappings
| yield the given element.
|
| The point I believe is to illustrate the complexity of inputs
| for humans. Namely for humans the input space is even more
| complex than "(R^n)^c".
|
| In your example, we can compute how many input phrases into a
| LLM would produce the output "make it or not". We can than
| compute that ratio to all possible input phrases. Because
| "(R^n)^c)" is finite and countable, we can compute this
| probability.
|
| For a human, how do you even start to assess the probability
| that a human would ever say "make it or not?" How do you even
| begin to define the inputs that a human uses, let alone
| enumerate them? Per the author, "We understand essentially
| nothing about it." In other words, the way humans create
| their outputs is (currently) incomparably complex compared to
| a LLM, hence the critique of the anthropomorphization.
| cuttothechase wrote:
| >Think of it this way: do you believe there's anything about
| humans that exists outside the mathematical laws of physics? If
| so that's essentially a religious position (or more literally,
| a belief in the supernatural). If not, then functions and
| approximations to functions are what the human experience boils
| down to.
|
| It seems like, we can at best, claim that we have modeled the
| human thought process for reasoning/analytic/quantitative
| through Linear Algebra, as the best case. Why should we expect
| the model to be anything more than a _model_ ?
|
| I understand that there is tons of vested interest, many
| industries, careers and lives literally on the line causing
| heavy bias to get to AGI. But what I don't understand is what
| about linear algebra that makes it so special that it creates a
| fully functioning life or aspects of a life?
|
| Should we make an argument saying that Schroedinger's cat
| experiment can potentially create zombies then the underlying
| Applied probabilistic solutions should be treated as super-
| human and build guardrails against it building zombie cats?
| ants_everywhere wrote:
| > It seems like, we can at best, claim that we have modeled
| the human thought process for reasoning/analytic/quantitative
| through Linear Algebra....I don't understand is what about
| linear algebra that makes it so special that it creates a
| fully functioning life or aspects of a life?
|
| Not linear algebra. Artificial neural networks create
| arbitrarily non-linear functions. That's the point of non-
| linear activation functions and it's the subject of the
| universal approximation theorems I mentioned above.
| cuttothechase wrote:
| ANNs are just mathematical transformations, powered by
| linear algebra + non-linear functions. They simulate
| certain cognitive processes -- but they are fundamentally
| math, not magic.
| delusional wrote:
| I wouldn't say they "simulate cognitive processes". They
| do statistics. Advanced multivariadic statistics.
|
| An LLM thinks in the same way excel thinks when you ask
| it to fit a curve.
| ImHereToVote wrote:
| Who invoked magic in this thread exactly?
| ants_everywhere wrote:
| I think the point of mine that you're missing (or perhaps
| disagreeing with implicitly) is that *everything* is
| fundamentally math. Or, if you like, everything is
| fundamentally physics, and physics is fundamentally math.
|
| So classes of functions (ANNs) that can approximate our
| desired function to arbitrary precision are what we
| should be expecting to be working with.
| hackinthebochs wrote:
| >Why should we expect the model to be anything more than a
| model ?
|
| To model a process with perfect accuracy requires recovering
| the dynamics of that process. The question we must ask is
| what happens in the space between bad statistical model and
| perfect accuracy? What happens when the model begins to
| converge towards accurate reproduction. How far does
| generalization in the model take us towards capturing the
| dynamics involved in thought?
| xtal_freq wrote:
| Not that this is your main point, but I find this take
| representative, "do you believe there's anything about humans
| that exists outside the mathematical laws of physics?"There are
| things "about humans", or at least things that our words
| denote, that are outside physic's explanatory scope. For
| example, the experience of the colour red cannot be known, as
| an experience, by a person who only sees black and white. This
| is the case no matter what empirical propositions, or
| explanatory system, they understand.
| concats wrote:
| Perhaps. But I can't see a reason why they couldn't still
| write endless--and theoretically valuable--poems,
| dissertations, or blog posts, about all things red and the
| nature of redness itself. I imagine it would certainly take
| some studying for them, likely interviewing red-seers, or
| reading books about all things red. But I'm sure they could
| contribute to the larger red discourse eventually, their
| unique perspective might even help them draw conclusions the
| rest of us are blind to.
|
| So perhaps the fact that they "cannot know red" is ultimately
| irrelevant for an LLM too?
| ants_everywhere wrote:
| This idea is called qualia [0] for those unfamiliar.
|
| I don't have any opinion on the qualia debates honestly. I
| suppose I don't know what it feels like for an ant to find a
| tasty bit of sugar syrup, but I believe it's something that
| can be described with physics (and by extension, things like
| chemistry).
|
| But we do know some things about some qualia. Like we know
| how red light works, we have a good idea about how
| photoreceptors work, etc. We know some people are red-green
| colorblind, so their experience of red and green are mushed
| together. We can also have people make qualia judgments and
| watch their brains with fMRI or other tools.
|
| I think maybe an interesting question here is: obviously it's
| pleasurable to animals to have their reward centers
| activated. Is it pleasurable or desirable for AIs to be
| rewarded? Especially if we tell them (as some prompters do)
| that they feel pleasure if they do things well and pain if
| they don't? You can ask this sort of question for both the
| current generation of AIs and future generations.
|
| [0] https://en.wikipedia.org/wiki/Qualia
| suddenlybananas wrote:
| >There's a rich family of universal approximation theorems
|
| Wow, look-up tables can get increasingly good at approximating
| a function!
| ants_everywhere wrote:
| A function is by definition a lookup table.
|
| The lookup table is just (x, f(x)).
|
| So, yes, trivially if you could construct the lookup table
| for f then you'd approximate f. But to construct it you have
| to know f. And to approximate it you need to know f at a
| dense set of points.
| low_tech_punk wrote:
| The anthropomorphic view of LLM is a much better representation
| and compression for most types of discussions and communication.
| A purely mathematical view is accurate but it isn't productive
| for the purpose of the general public's discourse.
|
| I'm thinking a legal systems analogy, at the risk of a lossy
| domain transfer: the laws are not written as lambda calculus.
| Why?
|
| And generalizing to social science and humanities, the goal
| shouldn't be finding the quantitative truth, but instead
| understand the social phenomenon using a consensual "language" as
| determined by the society. And in that case, the anthropomorphic
| description of the LLM may gain validity and effectiveness as the
| adoption grows over time.
| cmiles74 wrote:
| Strong disagree here, the average person coming away with ideas
| that only vaguely intersect with the reality.
| andyferris wrote:
| I've personally described the "stochastic parrot" model to
| laypeople who were worried about AI and they came away much
| more relaxed about it doing something "malicious". They seemed
| to understand the difference between "trained at roleplay" and
| "consciousness".
|
| I don't think we need to simplify it to the point of
| considering it sentient to get the public to interact with it
| successfully. It causes way more problems than it solves.
| SpicyLemonZest wrote:
| Am I misunderstanding what you mean by "malicious"? It sounds
| like the stochastic parrot model wrongly convinced these
| laypeople you were talking to that they don't need to worry
| about LLMs doing bad things. That's definitely been my
| experience - the people who tell me the most about stochastic
| parrots are the same ones who tell me that it's absurd to
| worry about AI-powered disinformation or AI-powered scams.
| Kim_Bruning wrote:
| Has anyone asked an actual Ethologist or Neurophysiologist what
| _they_ think?
|
| People keep debating like the only two options are "it's a
| machine" or "it's a human being", while in fact the majority of
| intelligent entities on earth are neither.
| szvsw wrote:
| Yeah, I think I'm with you if you ultimately mean to say
| something like this:
|
| "the labels are meaningless... we just have collections of
| complex systems that demonstrate various behaviors and
| properties, some in common with other systems, some behaviors
| that are unique to that system, sometimes through common
| mechanistic explanations with other systems, sometimes through
| wildly different mechanistic explanations, but regardless they
| seem to demonstrate x/y/z, and it's useful to ask, why, how,
| and what the implications are of it appearing to demonstrating
| those properties, with both an eye towards viewing it
| independently of its mechanism and in light of its mechanism."
| seadan83 wrote:
| FWIW, in another part of this thread I quoted a paper that
| summed up what Neurophysiologists think:
|
| > Author's note: Despite a century of anatomical,
| physiological, and molecular biological efforts scientists do
| not know how neurons by their collective interactions produce
| percepts, thoughts, memories, and behavior. Scientists do not
| know and have no theories explaining how brains and central
| nervous systems work. [1]
|
| That lack of understanding I believe is a major part of the
| author's point.
|
| [1] "How far neuroscience is from understanding brains" -
| https://pmc.ncbi.nlm.nih.gov/articles/PMC10585277/#abstract1
| kazinator wrote:
| > _LLMs solve a large number of problems that could previously
| not be solved algorithmically. NLP (as the field was a few years
| ago) has largely been solved._
|
| That is utter bullshit.
|
| It's not solved until you specify exactly what is being solved
| and show that the solution implements what is specified.
| djoldman wrote:
| Let's skip to the punchline. Using TFA's analogy: essentially
| folks are saying not that this is a set of dice rolling around
| making words. It's a set of dice rolling around where someone
| attaches those dice to the real world where if the dice land on
| 21, the system kills a chicken, or a lot worse.
|
| Yes it's just a word generator. But then folks attach the word
| generator to tools where it can invoke the use of tools by saying
| the tool name.
|
| So if the LLM says "I'll do some bash" then it does some bash.
| It's explicitly linked to program execution that, if it's set up
| correctly, can physically affect the world.
| 3cats-in-a-coat wrote:
| Given our entire civilization is built on words, all of it,
| it's shocking how poorly most of us understand their importance
| and power.
| degun wrote:
| This was the same idea that crossed my mind while reading the
| article. It seems far too naive to think that because LLMs have
| no will of their own, there will be no harmful consequences on
| the real world. This is exactly where ethics comes to play.
| coolKid721 wrote:
| Anthropomorphizing LLMs is just because half the stock market
| gains are dependent on it, we have absurd levels of debt we will
| either have to have insane growth out of or default, and every
| company and "person" is trying to hype everyone up to get access
| to all of this liquidity being thrown into it.
|
| I agree with the author, but people acting like they are
| conscious or humans isn't weird to me, it's just fraud and liars.
| Most people basically have 0 understanding of what technology or
| minds are philosophically so it's an easy sale, and I do think
| most of these fraudsters also likely buy into it themselves
| because of that.
|
| The really sad thing is people think "because someone runs an ai
| company" they are somehow an authority on philosophy of mind
| which lets them fall for this marketing. The stuff these people
| say about this stuff is absolute garbage, not that I disagree
| with them, but it betrays a total lack of curiosity or interest
| in the subject of what llms are, and the possible impacts of
| technological shifts as those that might occur with llms becoming
| more widespread. It's not a matter of agreement it's a matter of
| them simply not seeming to be aware of the most basic ideas of
| what things are, technology is, it's manner of impacting society
| etc.
|
| I'm not surprised by that though, it's absurd to think because
| someone runs some AI lab or has a "head of safety/ethics" or
| whatever garbage job title at an AI lab they actually have even
| the slightest interest in ethics or any even basic familiarity
| with the major works in the subject.
|
| The author is correct if people want to read a standard essay
| articulating it more in depth check out
| https://philosophy.as.uky.edu/sites/default/files/Is%20the%2...
| (the full extrapolation requires establishing what things are and
| how causality in general operates and how that relates to
| artifacts/technology but that's obvious quite a bit to get into).
|
| The other note would be something sharing an external trait means
| absolutely nothing about causality and suggesting a thing is
| caused by the same thing "even to a way lesser degree" because
| they share a resemblance is just a non-sequitur. It's not a
| serious thought/argument.
|
| I think I addressed the why of why this weirdness comes up
| though. The entire economy is basically dependent on huge
| productivity growth to keep functioning so everyone is trying to
| sell they can offer that and AI is the clearest route, AGI most
| of all.
| TheDudeMan wrote:
| If "LLMs" includes reasoning models, then you're already wrong in
| your first paragraph:
|
| "something that is just MatMul with interspersed nonlinearities."
| Culonavirus wrote:
| > A fair number of current AI luminaries have self-selected by
| their belief that they might be the ones getting to AGI
|
| People in the industry, especially higher up, are making absolute
| bank, and it's their job to say that they're "a few years away"
| from AGI, regardless of if they actually believe it or not. If
| everyone was like "yep, we're gonna squeeze maybe 10-15% more
| benchie juice out of this good ole transformer thingy and then
| we'll have to come up with something else", I don't think that
| would go very well with investors/shareholders...
| fenomas wrote:
| > The moment that people ascribe properties such as
| "consciousness" or "ethics" or "values" or "morals" to these
| learnt mappings is where I tend to get lost.
|
| TFA really ought to have linked to some concrete examples of what
| it's disagreeing with - when I see arguments about this in
| practice, it's usually just people talking past each other.
|
| Like, person A says "the model wants to X, but it knows Y is
| wrong, so it prefers Z", or such. And person B interprets that as
| ascribing consciousness or values to the model, when the speaker
| meant it no differently from saying "water wants to go downhill"
| - i.e. a way of describing externally visible behaviors, but
| without saying "behaves as if.." over and over.
|
| And then in practice, an unproductive argument usually follows -
| where B is thinking "I am going to Educate this poor fool about
| the Theory of Mind", and A is thinking "I'm trying to talk about
| submarines; why is this guy trying to get me to argue about
| whether they swim?"
| fastball wrote:
| "Don't anthropomorphize token predictors" is a reasonable take
| assuming you have demonstrated that humans are _not_ in fact just
| SOTA token predictors. But AFAIK that hasn 't been demonstrated.
|
| Until we have a much more sophisticated understanding of human
| intelligence and consciousness, any claim of "these aren't like
| us" is either premature or spurious.
| krackers wrote:
| Every time this discussion comes up, I'm reminded of this
| tongue-in-cheek paper.
|
| https://ai.vixra.org/pdf/2506.0065v1.pdf
| lostmsu wrote:
| I expected to find the link to
| https://arxiv.org/abs/1703.10987 (which is much better imo)
| Veedrac wrote:
| The author plot the input/output on a graph, intuited (largely
| incorrectly, because that's not how sufficiently large state
| spaces look) that the output was vaguely pretty, and then... I
| mean that's it, they just said they have a plot of the space it
| operates on therefore it's silly to ascribe interesting features
| to the way it works.
|
| And look, it's fine, they prefer words of a certain valence,
| particularly ones with the right negative connotations, I prefer
| other words with other valences. None of this means the concerns
| don't matter. Natural selection on human pathogens isn't anything
| particularly like human intelligence and it's still very
| effective at selecting outcomes that we don't want against our
| attempts to change that, as an incidental outcome of its
| optimization pressures. I think it's very important we don't
| build highly capable systems that select for outcomes we don't
| want and will do so against our attempts to change it.
| BrenBarn wrote:
| > In contrast to an LLM, given a human and a sequence of words, I
| cannot begin putting a probability on "will this human generate
| this sequence".
|
| I think that's a bit pessimistic. I think we can say for instance
| that the probability that a person will say "the the the of of of
| arpeggio halcyon" is tiny compared to the probability that they
| will say "I haven't been getting that much sleep lately". And we
| can similarly see that lots of other sequences are going to have
| infinitesimally low probability. Now, yeah, we can't say exactly
| what probability that is, but even just using a fairly sizable
| corpus as a baseline you could probably get a surprisingly decent
| estimate, given how much of what people say is formulaic.
|
| The real difference seems to be that the manner in which humans
| generate sequences is more intertwined with other aspects of
| reality. For instance, the probability of a certain human saying
| "I haven't been getting that much sleep lately" is connected to
| how much sleep they have been getting lately. For an LLM it
| really isn't connected to anything except word sequences in its
| input.
|
| I think this is consistent with the author's point that we
| shouldn't apply concepts like ethics or emotions to LLMs. But
| it's not because we don't know how to predict what sequences of
| words humans will use; it's rather because we _do_ know a little
| about how to do that, and part of what we know is that it is
| connected with other dimensions of physical reality, "human
| nature", etc.
|
| This is one reason I think people underestimate the risks of AI:
| the performance of LLMs lulls us into a sense that they "respond
| like humans", but in fact the Venn diagram of human and LLM
| behavior only intersects in a relatively small area, and in
| particular they have very different failure modes.
| elliotto wrote:
| To claim that LLMs do not experience consciousness requires a
| model of how consciousness works. The author has not presented a
| model, and instead relied on emotive language leaning on the
| absurdity of the claim. I would say that any model one presents
| of consciousness often comes off as just as absurd as the claim
| that LLMs experience it. It's a great exercise to sit down and
| write out your own perspective on how consciousness works, to
| feel out where the holes are.
|
| The author also claims that a function (R^n)^c -> (R^n)^c is
| dramatically different to the human experience of consciousness.
| Yet the author's text I am reading, and any information they can
| communicate to me, exists entirely in (R^n)^c.
| shevis wrote:
| > requires a model of how consciousness works.
|
| Not necessarily an entire model, just a single defining
| characteristic that can serve as a falsifying example.
|
| > any information they can communicate to me, exists entirely
| in (R^n)^c
|
| Also no. This is just a result of the digital medium we are
| currently communicating over. Merely standing in the same room
| as them would communicate information outside (R^n)^c.
| seadan83 wrote:
| I believe the author is rather drawing this distinction:
|
| LLMs: (R^n)^c -> (R^n)^c
|
| Humans: [set of potentially many and complicated inputs that we
| effectively do not understand at all] -> (R^n)^c
|
| The point is that the model of how consciousness works is
| unknown. Thus the author would not present such a model, it is
| the point.
| quonn wrote:
| > To claim that LLMs do not experience consciousness requires a
| model of how consciousness works.
|
| Nope. What can be asserted without evidence can also be
| dismissed without evidence. Hitchens's razor.
|
| You know you have consciousness (by the very definition that
| you can observe it in yourself) and that's evidence. Because
| other humans are genetically and in every other way identical,
| you can infer it for them as well. Because mammals are very
| similar many people (but not everyone) infers it for them as
| well. There is zero evidence for LLMs and their _very_
| construction suggests that they are like a calculator or like
| Excel or like any other piece of software no matter how smart
| they may be or how many tasks they can do in the future.
|
| Additionally I am really surprised by how many people here
| confuse consciousness with intelligence. Have you never paused
| for a second in your life to "just be". Done any meditation? Or
| even just existed at least for a few seconds without a train of
| thought? It is very obvious that language and consciousness are
| completely unrelated and there is no need for language and I
| doubt there is even a need for intelligence to be conscious.
|
| Consider this:
|
| In the end an LLM could be executed (slowly) on a CPU that
| accepts very basic _discrete_ instructions, such as ADD and
| MOV. We know this for a fact. Those instructions can be
| executed arbitrarily slowly. There is no reason whatsoever to
| suppose that it should feel like anything to be the CPU to say
| nothing of how it would subjectively feel to be a MOV
| instruction. It's ridiculous. It's unscientific. It's like
| believing that there's a spirit in the tree you see outside,
| just because - why not? - why wouldn't there be a spirit in the
| tree?
| tdullien wrote:
| Author here. What's the difference, in your perception, between
| an LLM and a large-scale meteorological simulation, if there is
| any?
|
| If you're willing to ascribe the possibility of consciousness
| to any complex-enough computation of a recurrence equation (and
| hence to something like ... "earth"), I'm willing to agree that
| under that definition LLMs might be conscious. :)
| kelseyfrog wrote:
| Dear author, you can just assume that people are
| fauxthropomorphizing LLMs without any loss of generality. Perhaps
| it will allow you to sleep better at night. You're welcome.
| rockskon wrote:
| The people in this thread incredulous at the assertion that they
| are not God and haven't invented machine life are exasperating.
| At this point I am convinced they, more often than not,
| financially benefit from their near religious position in
| marketing AI as akin to human intelligence.
| refulgentis wrote:
| I am ready and waiting for you to share these comments that are
| incredulous at the assertion they are not God, lol.
| orbital-decay wrote:
| Are we looking at the same thread? I see nobody claiming this.
| Anthropic does sometimes, their position is clearly wishful
| thinking, and it's not represented ITT.
|
| Try looking at this from another perspective - many people
| simply do not see human intelligence (or life, for that matter)
| as magic. I see nothing religious about that, rather the
| opposite.
| seadan83 wrote:
| I agree with you @orbital-decay that I also do not get the
| same vibe reading this thread.
|
| Though, while human intelligence is (seemingly) not magic, it
| is very far from being understood. The idea that a LLM is
| comparable to human intelligence implies that we even
| understand human intelligence well enough to say that.
| ImHereToVote wrote:
| LLMs are also not understood. I mean we built and trained
| them. But don't of the abilities at still surprising to
| researchers. We have yet to map these machines.
| zxcb1 wrote:
| LLMs are complex irreducible systems; hence there are emergent
| properties that arise at different scales
| dr_dshiv wrote:
| Which is a more useful mental model for the user?
|
| 1. It's a neural network predicting the next token
|
| 2. It's like a person
|
| 3. It's like a magical genie
|
| I lean towards 3.
| Al-Khwarizmi wrote:
| I have the technical knowledge to know how LLMs work, but I still
| find it pointless to _not_ anthropomorphize, at least to an
| extent.
|
| The language of "generator that stochastically produces the next
| word" is just not very useful when you're talking about, e.g., an
| LLM that is answering complex world modeling questions or
| generating a creative story. It's at the wrong level of
| abstraction, just as if you were discussing an UI events API and
| you were talking about zeros and ones, or voltages in
| transistors. Technically fine but totally useless to reach any
| conclusion about the high-level system.
|
| We need a higher abstraction level to talk about higher level
| phenomena in LLMs as well, and the problem is that we have no
| idea what happens internally at those higher abstraction levels.
| So, considering that LLMs somehow imitate humans (at least in
| terms of output), anthropomorphization is the best abstraction we
| have, hence people naturally resort to it when discussing what
| LLMs can do.
| grey-area wrote:
| On the contrary, anthropomorphism IMO is the main problem with
| narratives around LLMs - people are genuinely talking about
| them thinking and reasoning when they are doing nothing of that
| sort (actively encouraged by the companies selling them) and it
| is completely distorting discussions on their use and
| perceptions of their utility.
| cmenge wrote:
| I kinda agree with both of you. It might be a required
| abstraction, but it's a leaky one.
|
| Long before LLMs, I would talk about classes / functions /
| modules like "it then does this, decides the epsilon is too
| low, chops it up and adds it to the list".
|
| The difference I guess it was only to a technical crowd and
| nobody would mistake this for anything it wasn't. Everybody
| know that "it" didn't "decide" anything.
|
| With AI being so mainstream and the math being much more
| elusive than a simple if..then I guess it's just too easy to
| take this simple speaking convention at face value.
|
| EDIT: some clarifications / wording
| flir wrote:
| Agreeing with you, this is a "can a submarine swim" problem
| IMO. We need a new word for what LLMs are doing. Calling it
| "thinking" is stretching the word to breaking point, but
| "selecting the next word based on a complex statistical
| model" doesn't begin to capture what they're capable of.
|
| Maybe it's cog-nition (emphasis on the cog).
| whilenot-dev wrote:
| "predirence" -> prediction meets inference and it sounds
| a bit like preference
| psychoslave wrote:
| Except -ence is a regular morph, and you would rather
| suffix it to predict(at)-.
|
| And prediction is already an hyponym of inference. Why
| not just use inference then?
| whilenot-dev wrote:
| I didn't think of _prediction_ in the statistical sense
| here, but rather as a prophecy based on a vision,
| something that is inherently stored in a model without
| the knowledge of the modelers. I don 't want to imply any
| magic or something supernatural here, it's just the juice
| that goes off the rails sometimes, and it gets overlooked
| due to the sheer quantity of the weights. Something like
| unknown bugs in production, but, because they still just
| represent a valid number in some computation that
| wouldn't cause any panic, these few bits can show a
| useful pattern under the right circumstances.
|
| _Inference_ would be the part that is deliberately
| learned and drawn from conclusions based on the training
| set, like in the "classic" sense of statistical
| learning.
| LeonardoTolstoy wrote:
| What does a submarine do? Submarine? I suppose you
| "drive" a submarine which is getting to the idea:
| submarines don't swim because ultimately they are
| "driven"? I guess the issue is we don't make up a new
| word for what submarines do, we just don't use human
| words.
|
| I think the above poster gets a little distracted by
| suggesting the models are creative which itself is
| disputed. Perhaps a better term, like above, would be to
| just use "model". They are models after all. We don't
| make up a new portmanteau for submarines. They float, or
| drive, or submarine around.
|
| So maybe an LLM doesn't "write" a poem, but instead
| "models a poem" which maybe indeed take away a little of
| the sketchy magic and fake humanness they tend to be
| imbued with.
| FeepingCreature wrote:
| Humans certainly model inputs. This is just using an
| awkward word and then making a point that it feels
| awkward.
| flir wrote:
| I really like that, I think it has the right amount of
| distance. They don't write, they model writing.
|
| We're very used to "all models are wrong, some are
| useful", "the map is not the territory", etc.
| galangalalgol wrote:
| No one was as bothered when we anthropomorphized crud
| apps simply for the purpose of conversing about "them".
| "Ack! The thing is corrupting tables again because it
| thinks we are still using api v3! Who approved that last
| MR?!" The fact that people are bothered by the same
| language now is indicative in itself. If you want to
| maintain distance, pre prompt models to structure all
| conversations to lack pronouns as between a non sentient
| language model and a non sentient agi. You can have the
| model call you out for referring to the model as
| existing. The language style that forces is interesting,
| and potentially more productive except that there are
| fewer conversations formed like that in the training
| dataset. Translation being a core function of language
| models makes it less important thought. As for confusing
| the map for the territory, that is precisely what
| philosophers like Metzinger say humans are doing by
| considering "self" to be a real thing and that they are
| conscious when they are just using the reasoning shortcut
| of narrating the meta model to be the model.
| flir wrote:
| > You can have the model call you out for referring to
| the model as existing.
|
| This tickled me. "There ain't nobody here but us
| chickens".
|
| I have other thoughts which are not quite crystalized,
| but I think UX might be having an outsized effect here.
| galangalalgol wrote:
| In addition to he/she etc. there is a need for a button
| for no pronouns. "Stop confusing metacognition for
| conscious experience or qualia!" doesn't fit well. The UX
| for these models is extremely malleable. The responses
| are misleading mostly to the extent the prompts were
| already misled. The sorts of responses that arise from
| ignorant prompts are those found within the training data
| in the context of ignorant questions. This tends to make
| them ignorant as well. There are absolutely stupid
| questions.
| irthomasthomas wrote:
| Depends on if you are talking _about_ an llm or _to_ the
| llm. Talking _to_ the llm, it would not understand that
| "model a poem" means to write a poem. Well, it will
| probably guess right in this case, but if you go out of
| band too much it won't understand you. The hard problem
| today is rewriting out of band tasks to be in band, and
| that requires anthropomorphizing.
| dcookie wrote:
| > it won't understand you
|
| Oops.
| irthomasthomas wrote:
| That's consistent with my distinction when talking
| _about_ them vs _too_ them.
| thinkmassive wrote:
| GenAI _generates_ output
| jorvi wrote:
| A submarine is propelled by a propellor and helmed by a
| controller (usually a human).
|
| It would be swimming if it was propelled by drag (well,
| technically a propellor also uses drag via thrust, but
| you get the point). Imagine a submarine with a fish tail.
|
| Likewise we can probably find an apt description in our
| current vocabulary to fittingly describe what LLMs do.
| j0057 wrote:
| A submarine is a boat and boats sail.
| TimTheTinker wrote:
| An LLM is a stochastic generative model and stochastic
| generative models ... generate?
| LeonardoTolstoy wrote:
| And we are there. A boat sails, and a submarine sails. A
| model generates makes perfect sense to me. And saying
| chatgpt generated a poem feels correct personally. Indeed
| a model (e.g. a linear regression) generates predictions
| for the most part.
| psychoslave wrote:
| It does some kind of automatic inference (AI), and that's
| it.
| JimDabell wrote:
| > this is a "can a submarine swim" problem IMO. We need a
| new word for what LLMs are doing.
|
| Why?
|
| A plane is not a fly and does not stay aloft like a fly,
| yet we describe what it does as flying despite the fact
| that it does not flap its wings. What are the downsides
| we encounter that are caused by using the word "fly" to
| describe a plane travelling through the air?
| flir wrote:
| I was riffing on that famous Dijkstra quote.
| dotancohen wrote:
| For what it's worth, in my language the motion of birds
| and the motion of aircraft _are_ two different words.
| Tijdreiziger wrote:
| Flying isn't named after flies, they both come from the
| same root.
|
| https://www.etymonline.com/search?q=fly
| lelanthran wrote:
| > A plane is not a fly and does not stay aloft like a
| fly, yet we describe what it does as flying despite the
| fact that it does not flap its wings.
|
| Flying doesn't mean flapping, and the word has a long
| history of being used to describe inanimate objects
| moving through the air.
|
| "A rock flies through the window, shattering it and
| spilling shards everywhere" - see?
|
| OTOH, we have never used to word "swim" in the same way -
| "The rock hit the surface and swam to the bottom" is
| _wrong!_
| intended wrote:
| It will help significantly, to realize that the only
| thinking happening is when the human looks at the output
| and attempts to verify if it is congruent with reality.
|
| The rest of the time it's generating content.
| Atlas667 wrote:
| A machine that can imitate the products of thought is not
| the same as thinking.
|
| All imitations _require_ analogous mechanisms, but that
| is the extent of their similarities, in syntax. Thinking
| requires networks of billions of neurons, and then, not
| only that, but words can never exist on a plane because
| they do not belong to a plane. Words can only be stored
| on a plane, they are not useful on a plane.
|
| Because of this LLMs have the potential to discover new
| aspects and implications of language that will be rarely
| useful to us because language is not useful within a
| computer, it is useful in the world.
|
| Its like seeing loosely related patterns in a picture and
| keep derivating on those patterns that are real, but
| loosely related.
|
| LLMs are not intelligence but its fine that we use that
| word to describe them.
| delusional wrote:
| > "selecting the next word based on a complex statistical
| model" doesn't begin to capture what they're capable of.
|
| I personally find that description perfect. If you want
| it shorter you could say that an LLM generates.
| ryeats wrote:
| It's more like muscle memory than cognition. So maybe
| procedural memory but that isn't catchy.
| 01HNNWZ0MV43FF wrote:
| They certainly do act like a thing which has a very
| strong "System 1" but no "System 2" (per Thinking, Fast
| And Slow)
| loxs wrote:
| We can argue all day what "think" means and whether a LLM
| thinks (probably not IMO), but at least in my head the
| threshold for "decide" is much lower so I can perfectly
| accept that a LLM (or even a class) "decides". I don't have
| a conflict about that. Yeah, it might not be a decision in
| the human sense, but it's a decision in the mathematical
| sense so I have always meant "decide" literally when I was
| talking about a piece of code.
|
| It's much more interesting when we are talking about...
| say... an ant... Does it "decide"? That I have no idea as
| it's probably somewhere in between, neither a sentient
| decision, nor a mathematical one.
| 0x457 wrote:
| Well, it outputs a chain of thoughts that later used to
| produce better prediction. It produces a chain of
| thoughts similar to how one would do thinking about a
| problem out loud. It's more verbose that what you would
| do, but you always have some ambient context that LLM
| lacks.
| stoneyhrm1 wrote:
| I mean you can boil anything down to it's building blocks
| and make it seem like it didn't 'decide' anything. When you
| as a human decide something, your brain and it's neurons
| just made some connections with an output signal sent to
| other parts that resulting in your body 'doing' something.
|
| I don't think LLMs are sentient or any bullshit like that,
| but I do think people are too quick to write them off
| before really thinking about how a nn 'knows things'
| similar to how a human 'knows' things, it is trained and
| reacts to inputs and outputs. The body is just far more
| complex.
| grey-area wrote:
| I wasn't talking about knowing (they clearly encode
| knowledge), I was talking about thinking/reasoning, which
| is something LLMs do not in fact do IMO.
|
| These are very different and knowledge is not
| intelligence.
| HelloUsername wrote:
| > EDIT: some clarifications / wording
|
| This made me think, when will we see LLMs do the same;
| rereading what they just sent, and editing and correcting
| their output again :P
| Al-Khwarizmi wrote:
| I think it's worth distinguishing between the use of
| anthropomorphism as a useful abstraction and the misuse by
| companies to fuel AI hype.
|
| For example, I think "chain of thought" is a good name for
| what it denotes. It makes the concept easy to understand and
| discuss, and a non-antropomorphized name would be unnatural
| and unnecessarily complicate things. This doesn't mean that I
| support companies insisting that LLMs think just like humans
| or anything like that.
|
| By the way, I would say actually anti-anthropomorphism has
| been a bigger problem for understanding LLMs than
| anthropomorphism itself. The main proponents of anti-
| anthropomorphism (e.g. Bender and the rest of "stochastic
| parrot" and related paper authors) came up with a lot of
| predictions about things that LLMs surely couldn't do (on
| account of just being predictors of the next word, etc.)
| which turned out to be spectacularly wrong.
| whilenot-dev wrote:
| I don't know about others, but I much prefer if some
| reductionist tries to conclude what's technically feasible
| and is proven wrong _over time_ , than somebody yelling
| holistic analogies a la "it's sentient, it's intelligent,
| it thinks like us humans" for the sole dogmatic reason of
| being a futurist.
|
| Tbh I also think your comparison that puts "UI events ->
| Bits -> Transistor Voltages" as analogy to "AI thinks ->
| token de-/encoding + MatMul" is certainly a stretch, as the
| part about "Bits -> Transistor Voltages" applies to both
| hierarchies as the foundational layer.
|
| "chain of thought" could probably be called "progressive
| on-track-inference" and nobody would roll an eye.
| amelius wrote:
| I don't agree. Most LLMs have been trained on human data, so
| it is best to talk about these models in a human way.
| 4ndrewl wrote:
| Even the verb 'trained' is contentious wrt
| anthropomorphism.
| amelius wrote:
| Somewhat true but rodents can also be trained ...
| 4ndrewl wrote:
| Rodents aren't functions though?
| FeepingCreature wrote:
| Every computable system, even stateful systems, can be
| reformulated as a function.
|
| If IO can be functional, I don't see why mice can't.
| psychoslave wrote:
| Well, that's a strong claim of equivalence between
| computationable models and realty.
|
| The consensual view is rather that no map is matching
| fully the territory, or said otherwise the territory
| includes ontological components that exceeds even the
| most sophisticated map that can be ever built.
| FeepingCreature wrote:
| I believe the consensus view is that physics is
| computable.
| 4ndrewl wrote:
| Thanks. I think the original point about the word
| 'trained' being contentious still stands, as evidenced by
| this thread :)
| tempfile wrote:
| So you think a rodent _is_ a function?
| FeepingCreature wrote:
| I think that I am a function.
| tliltocatl wrote:
| Anthropomorphising implicitly assumes motivation, goals and
| values. That's what the core of anthropomorphism is -
| attempting to explain behavior of a complex system in
| teleological terms. And prompt escapes make it clear LLMs
| doesn't have any teleological agency yet. Whenever their
| course of action is, it is to easy to steer them of. Try to
| do it with a sufficiently motivated human.
| psychoslave wrote:
| >. Try to do it with a sufficiently motivated human.
|
| That's what they call marketing, propaganda or brain
| washing, acculturation , education depending on who you
| ask and at which scale you operate, apparently.
| tliltocatl wrote:
| > sufficiently motivated
|
| None of these targets sufficiently motivated, rather
| those who are either ambivalent or yet unexposed.
| criddell wrote:
| How will you know when an AI has teleological agency?
| tliltocatl wrote:
| Prompt escapes will be much harder, and some of them will
| end up in an equivalent of "sure here is... no, wait...
| You know what, I'm not doing that", i. e. slipping and
| then getting back on track.
| fenomas wrote:
| When I see these debates it's always the other way around -
| one person speaks colloquially about an LLM's behavior, and
| then somebody else jumps on them for supposedly believing the
| model is conscious, just because the speaker said "the model
| thinks.." or "the model knows.." or whatever.
|
| To be honest the impression I've gotten is that some people
| are just very interested in talking about not
| anthropomorphizing AI, and less interested in talking about
| AI behaviors, so they see conversations about the latter as a
| chance to talk about the former.
| latexr wrote:
| Respectfully, that is a reflection of the places you hang
| out in (like HN) and not the reality of the population.
|
| Outside the technical world it gets much worse. There are
| people who killed themselves because of LLMs, people who
| are in love with them, people who genuinely believe they
| have "awakened" their own private ChatGPT instance into AGI
| and are eschewing the real humans in their lives.
| fenomas wrote:
| Naturally I'm aware of those things, but I don't think
| TFA or GGP were commenting on them so I wasn't either.
| Xss3 wrote:
| The other day a good friend of mine with mental health
| issues remarked that "his" chatgpt understands him better
| than most of his friends and gives him better advice than
| his therapist.
|
| It's going to take a lot to get him out of that mindset
| and frankly I'm dreading trying to compare and contrast
| imperfect human behaviour and friendships with a
| sycophantic AI.
| bonoboTP wrote:
| It's surprisingly common on reddit that people talk about
| "my chatgpt", and they don't always seem like the type
| who are "in a relationship" with the bot or unlocking the
| secrets of the cosmos with it, but still they write "my
| chatgpt" and "your chatgpt". I guess the custom prompt
| and the available context does customize the model for
| them in some sense, but I suspect they likely have a
| wrong mental model of how this customization works. I
| guess they imagine it as their own little model being
| stored on file at OpenAI and as they interact with it,
| it's being shaped by it, and each time they connect,
| their model is retrieved from the cloud storage and they
| connect to it or something.
| lelanthran wrote:
| > The other day a good friend of mine with mental health
| issues remarked that "his" chatgpt understands him better
| than most of his friends and gives him better advice than
| his therapist.
|
| The therapist thing might be correct, though. You can
| send a well-adjusted person to three renowned therapists
| and get three different reasons for why they need to
| continue sessions.
|
| No therapist _ever_ says _" Congratulations, you're
| perfectly normal. Now go away and come back when you have
| a real problem."_ Statistically it is vanishingly
| unlikely that _every_ person who ever visited a therapist
| is in need of a second (more more) visit.
|
| The main problem with therapy is a lack of
| objectivity[1]. When people talk about what their
| sessions resulted in, it's always _" My problem is that
| I'm too perfect"_. I've known actual bullies whose
| therapist apparently told them that they are too
| submissive and need to be more assertive.
|
| The secondary problem is that all diagnosis is based on
| self-reported metrics of the subject. All improvement is
| equally based on self-reported metrics. This is no
| different from prayer.
|
| You don't have a medical practice there; you've got an
| Imam and a sophisticated but still medically-insured way
| to plead with thunderstorms[2]. I fail to see how an LLM
| (or even the Rogerian a-x doctor in Emacs) will do worse
| on average.
|
| After all, if you're at a therapist and you're doing most
| of the talking, how would an LLM perform worse than the
| therapist?
|
| ----------------
|
| [1] If I'm at a therapist, and they're asking me to do
| most of the talking, I would damn well feel that I am not
| getting my moneys worth. I'd be there primarily to learn
| (and practice a little) whatever tools they can teach me
| to handle my $PROBLEM. I don't want someone to vent at, I
| want to learn coping mechanisms and mitigation
| strategies.
|
| [2] This is not an obscure reference.
| positron26 wrote:
| Most certainly the conversation is extremely political.
| There are not simply different points of view. There are
| competitive, gladiatorial opinions ready to ambush anyone
| not wearing the right colors. It's a situation where the
| technical conversation is drowning.
|
| I suppose this war will be fought until people are out of
| energy, and if reason has no place, it is reasonable to let
| others tire themselves out reiterating statements that are
| not designed to bring anyone closer to the truth.
| bonoboTP wrote:
| If this tech is going to be half as impactful as its
| proponents predict, then I'd say it's still under-
| politicized. Of course the politics around it doesn't
| have to be knee-jerk mudslinging, but it's no surprise
| that politics enters the picture when the tech can
| significantly transform society.
| scarface_74 wrote:
| Wait until a conversation about "serverless" comes up and
| someone says there is no such thing because there are
| servers somewhere as if everyone - especially on HN
| -doesn't already know that.
| Tijdreiziger wrote:
| Why would everyone know that? Not everyone has experience
| in sysops, especially not beginners.
|
| E.g. when I first started learning webdev, I didn't think
| about 'servers'. I just knew that if I uploaded my
| HTML/PHP files to my shared web host, then they appeared
| online.
|
| It was only much later that I realized that shared
| webhosting is 'just' an abstraction over Linux/Apache
| (after all, I first had to learn about those topics).
| scarface_74 wrote:
| I am saying that most people who come on HN and say
| "there is no such thing as serverless and there are
| servers somewhere" think they are sounding smart when
| they are adding nothing to the conversation.
|
| I'm sure you knew that your code was running on computers
| somewhere even when you first started and wasn't running
| in a literal "cloud".
|
| It's about as tiring as people on HN who know just a
| little about LLMs thinking they are sounding smart when
| they say they are just advanced autocomplete. Both
| responses are just as unproductive
| Tijdreiziger wrote:
| > I'm sure you knew that your code was running on
| computers somewhere even when you first started and
| wasn't running in a literal "cloud".
|
| Meh, I just knew that the browser would display HTML if I
| wrote it, and that uploading the HTML files made them
| available on my domain. I didn't really think about
| _where_ the files went, specifically.
|
| Try asking an average high school kid how cloud storage
| works. I doubt you'll get any further than 'I make files
| on my Google Docs and then they are saved there'. This is
| one step short of 'well, the files must be on some system
| in some data center'.
|
| I really disagree that "people who come on HN and say
| "there is no such thing as serverless and there are
| servers somewhere" think they are sounding smart when
| they are adding nothing to the conversation." On the
| contrary, it's an invitation to beginning coders to think
| about _what_ the 'serverless' abstraction actually means.
| godelski wrote:
| I think they fumbled with wording but I interpreted them
| as meaning "audience of HN" and it seems they confirmed.
|
| We always are speaking to our audience, right? This is
| also what makes more general/open discussions difficult
| (e.g. talking on Twitter/Facebook/etc). That there are
| many ways to interpret anything depending on prior
| knowledge, cultural biases, etc. But I think it is fair
| that on HN we can make an assumption that people here are
| tech savvy and knowledgeable. We'll definitely overstep
| and understep at times, but shouldn't we also cultivate a
| culture where it is okay to ask and okay to apologize for
| making too much of an assumption?
|
| I mean at the end of the day we got to make some
| assumptions, right? If we assume zero operating knowledge
| then comments are going to get pretty massive and
| frankly, not be good at communicating with a niche even
| if better at communicating with a general audience. But
| should HN be a place for general people? I think no. I
| think it should be a place for people interested in
| computers and programming.
| Wowfunhappy wrote:
| As I write this, Claude Code is currently opening and
| closing various media files on my computer. Sometimes it
| plays the file for a few seconds before closing it,
| sometimes it starts playback and then seeks to a different
| position, sometimes it fast forwards or rewinds, etc.
|
| I asked Claude to write a E-AC3 audio component so I can
| play videos with E-AC3 audio in the old version of
| QuickTime I really like using. Claude's decoder includes
| the ability to write debug output to a log file, so Claude
| is studying how QuickTime and the component interact, and
| it's controlling QuickTime via Applescript.
|
| Sometimes QuickTime crashes, because this ancient API has
| its roots in the classic Mac OS days and is not exactly
| good. Claude reads the crash logs on its own--it knows
| where they are--and continues on its way. I'm just sitting
| back and trying to do other things while Claude works,
| although it's a little distracting that _something_ else is
| using my computer at the same time.
|
| I _really_ don 't want to anthropomorphize these programs,
| but it's just so _hard_ when it 's acting so much like a
| person...
| godelski wrote:
| Would it help you to know that trial and error is a
| common tactic by machines? Yes, humans do it too, but
| that doesn't mean the process isn't mechanical. In fact,
| in computing we might call this a "brute force" approach.
| You don't have to cover the entire search space to brute
| force something, and it certainly doesn't mean you can't
| have optimization strategies and need to grid search
| (e.g. you can use Bayesian methods, multi-armed bandit
| approaches, or a whole world of things).
|
| I would call "fuck around and find out" a rather simple
| approach. It is why we use it! It is why lots of animals
| use it. Even very dumb animals use it. Though, we do
| notice more intelligent animals use more efficient
| optimization methods. All of this is technically
| hypothesis testing. Even a naive grid search. But that is
| still in the class of "fuck around and find out" or
| "brute force", right?
|
| I should also mention two important things.
|
| 1) as a human we are biased to anthropomorphize. We see
| faces in clouds. We tell stories of mighty beings
| controlling the world in an effort to explain why things
| happen. This is anthropomorphization of the universe
| itself!
|
| 2) We design LLMs (and many other large ML systems) to
| optimize towards human preference. This reinforces an
| anthropomorphized interpretation.
|
| The reason for doing this (2) is based on a naive
| assumption[0]: If it looks like a duck, swims like a
| duck, and quacks like a duck, then it * _probably*_ is a
| duck. But the duck test doesn 't rule out a highly
| sophisticated animatronic. It's a good rule of thumb, but
| wouldn't it also be incredibly naive to assume that it *
| _is*_ a duck? Isn 't the duck test itself entirely
| dependent on our own personal familiarity with ducks? I
| think this is important to remember and can help combat
| our own propensity for creating biases.
|
| [0] It is not a bad strategy to build in that direction.
| When faced with many possible ways to go, this is a very
| reasonable approach. The naive part is if you assume that
| it will take you all the way to making a duck. It is also
| a perilous approach because you are explicitly making it
| harder for you to evaluate. It is, in the fullest sense
| of the phrase, "metric hacking."
| Wowfunhappy wrote:
| It wasn't a simple brute force. When Claude was working
| this morning, it was pretty clearly only playing a file
| when it actually needed to see packets get decoded,
| otherwise it would simply open and close the document.
| Similarly, it would only seek or fast forward when it was
| debugging specific issues related to those actions. And
| it even "knew" which test files to open for specific
| channel layouts.
|
| Yes this is still mechanical in a sense, but then I'm not
| sure what behavior you _wouldn 't_ classify as
| mechanical. It's "responding" to stimuli in logical ways.
|
| But I also don't quite know where I'm going with this. I
| don't think LLMs are sentient or something, I know
| they're just math. But it's _spooky_.
| stoneyhrm1 wrote:
| I thought this too but then began to think about it from the
| perspective of the programmers trying to make it imitate
| human learning. That's what a nn is trying to do at the end
| of the day, and in the same way I train myself by reading
| problems and solutions, or learning vocab at a young age, it
| does so by tuning billions of parameters.
|
| I think these models do learn similarly. What does it even
| mean to reason? Your brain knows certain things so it comes
| to certain conclusions, but it only knows those things
| because it was ''trained'' on those things.
|
| I reason my car will crash if I go 120 mph on the other side
| of the road because previously I have 'seen' where the input
| is a car going 120mph has a high probability of producing a
| crash, and similarly have seen input where the car is going
| on the other side of the road, producing a crash. Combining
| the two would tell me it's a high probability.
| losvedir wrote:
| Well "reasoning" refers to Chain-of-Thought and if you look
| at the generated prompts it's not hard to see why it's called
| that.
|
| That said, it's fascinating to me that it works (and
| empirically, it does work; a reasoning model generating tens
| of thousands of tokens while working out the problem does
| produce better results). I wish I knew why. A priori I
| wouldn't have expected it, since there's no new input. That
| means it's all "in there" in the weights already. I don't see
| why it couldn't just one shot it without all the reasoning.
| And maybe the future will bring us more distilled models that
| can do that, or they can tease out all that reasoning with
| more generated training data, to move it from dispersed
| around the weights -> prompt -> more immediately accessible
| in the weights. But for now "reasoning" works.
|
| But then, at the back of my mind is the easy answer: maybe
| you can't optimize it. Maybe the model has to "reason" to
| "organize its thoughts" and get the best results. After all,
| if you give _me_ a complicated problem I 'll write down
| hypotheses and outline approaches and double check results
| for consistency and all that. But now we're getting
| dangerously close to the "anthropomorphization" that this
| article is lamenting.
| sdenton4 wrote:
| CoT gives the model more time to think and process the
| inputs it has. To give an extreme example, suppose you are
| using next token prediction to answer 'Is P==NP?' The tiny
| number of input tokens means that there's a tiny amount of
| compute to dedicate to producing an answer. A scratchpad
| allows us to break free of the short-inputs problem.
|
| Meanwhile, things can happen in the latent representation
| which aren't reflected in the intermediate outputs. You
| could, instead of using CoT, say "Write a recipe for a
| vegetarian chile, along with a lengthy biographical story
| relating to the recipe. Afterwards, I will ask you again
| about my original question." And the latents can still help
| model the primary problem, yielding a better answer than
| you would have gotten with the short input alone.
|
| Along these lines, I believe there are chain of thought
| studies which find that the content of the intermediate
| outputs don't actually matter all that much...
| shakadak wrote:
| > I don't see why it couldn't just one shot it without all
| the reasoning.
|
| That's reminding me of deep neural networks where single
| layer networks could achieve the same results, but the
| layer would have to be excessively large. Maybe we're re-
| using the same kind of improvement, scaling in length
| instead of width because of our computation limitations ?
| variadix wrote:
| Using more tokens = more compute to use for a given
| problem. I think most of the benefit of CoT has more to do
| with autoregressive models being unable to "think ahead"
| and revise their output, and less to do with actual
| reasoning. The fact that an LLM can have incorrect
| reasoning in its CoT and still produce the right answer, or
| that it can "lie" in its CoT to avoid being detected as
| cheating on RL tasks, makes me believe that the semantic
| content of CoT is an illusion, and that the improved
| performance is from being able to explore and revise in
| some internal space using more compute before producing a
| final output.
| Terr_ wrote:
| I like this mental-model, which rests heavily on the "be
| careful not to anthropomorphize" approach:
|
| It was already common to use a document extender (LLM)
| against a hidden document, which resembles a movie or
| theater play where a character named User is interrogating
| a character named Bot.
|
| Chain-of-thought switches the movie/script style to _film
| noir_ , where the [Detective] Bot character has additional
| content which is not actually "spoken" at the User
| character. The extra words in the script add a certain kind
| of metaphorical inertia.
| bunderbunder wrote:
| "All models are wrong, but some models are useful," is the
| principle I have been using to decide when to go with an
| anthropomorphic explanation.
|
| In other words, no, they never accurately describe what the
| LLM is actually doing. But sometimes drawing an analogy to
| human behavior is the most effective way to pump others'
| intuition about a particular LLM behavior. The trick is
| making sure that your audience understands that this is just
| an analogy, and that it has its limitations.
|
| And it's not _completely_ wrong. Mimicking human behavior is
| exactly what they 're designed to do. You just need to keep
| reminding people that it's only doing so in a very
| superficial and spotty way. There's absolutely no basis for
| assuming that what's happening on the inside is the same.
| Veen wrote:
| Some models are useful in some contexts but wrong enough to
| be harmful in others.
| bunderbunder wrote:
| _All_ models are useful in some contexts but wrong enough
| to be harmful in others.
|
| Relatedly, the alternative to pragmatism is analysis
| paralysis.
| bakuninsbart wrote:
| > people are genuinely talking about them thinking and
| reasoning when they are doing nothing of that sort
|
| With such strong wording, it should be rather easy to explain
| how our thinking differs from what LLMs do. The next step -
| showing that what LLMs do _precludes_ any kind of sentience
| is probably much harder.
| ordu wrote:
| _> On the contrary, anthropomorphism IMO is the main problem
| with narratives around LLMs_
|
| I hold a deep belief that anthropomorphism is a way the human
| mind words. If we take for granted the hypothesis of Franz de
| Waal, that human mind developed its capabilities due to
| political games, and then think about how it could later lead
| to solving engineering and technological problems, then the
| tendency of people to anthropomorphize becomes obvious.
| Political games need empathy or maybe some other kind of
| -pathy, that allows politicians to guess motives of others
| looking at their behaviors. Political games directed the
| evolution to develop mental instruments to uncover causality
| by watching at others and interacting with them. Now, to
| apply these instruments to inanimate world all you need is to
| anthropomorphize inanimate objects.
|
| Of course, it leads sometimes to the invention of gods, or
| spirits, or other imaginary intelligences behinds things. And
| sometimes these entities get in the way of revealing the real
| causes of events. But I believe that to anthropomorphize LLMs
| (at the current stage of their development) is not just the
| natural thing for people but a good thing as well. Some
| behavior of LLMs is easily described in terms of psychology;
| some cannot be described or at least not so easy. People are
| seeking ways to do it. Projecting this process into the
| future, I can imagine how there will be a kind of consensual
| LLMs "theory" that explains some traits of LLMs in terms of
| human psychology and fails to explain other traits, so they
| are explained in some other terms... And then a revolution
| happens, when a few bright minds come and say that
| "anthropomorphism is bad, it cannot explain LLM" and they
| propose something different.
|
| I'm sure it will happen at some point in the future, but not
| right now. And it will happen not like that: not just because
| someone said that anthropomorphism is bad, but because they
| proposed another way to talk about reasons behind LLMs
| behavior. It is like with scientific theories: they do not
| fail because they become obviously wrong, but because other,
| better theories replace them.
|
| It doesn't mean, that there is no point to fight
| anthropomorphism right now, but this fight should be directed
| at searching for new ways to talk about LLMs, not to show at
| the deficiencies of anthropomorphism. To my mind it makes
| sense to start not with deficiencies of anthropomorphism but
| with its successes. What traits of LLMs it allows us to
| capture, which ideas about LLMs are impossible to wrap into
| words without thinking of LLMs as of people?
| marviel wrote:
| how do you account for the success of reasoning models?
|
| I agree these things don't think like we do, and that they
| have weird gaps, but to claim they can't reason at all
| doesn't feel grounded.
| godelski wrote:
| Serendipitous name...
|
| In part I agree with the parent. >> it
| pointless to *not* anthropomorphize, at least to an extent.
|
| I agree that it is pointless to _not_ anthropomorphize
| because we are humans and we will automatically do this.
| Willingly or unwillingly.
|
| On the other hand, it generates bias. This bias can lead to
| errors.
|
| So the real answer is (imo) that it is fine to
| anthropomorphise but recognize that while doing so can
| provide utility and help us understand, it is _WRONG_.
| Recognizing that it is not right and cannot be right provides
| us with a constant reminder to reevaluate. Use it, but double
| check, and keep checking making sure you understand the
| limitations of _the analogy_. Understanding when and where it
| applies, where it doesn 't, and most importantly, where you
| don't know if it does or does not. The last is most important
| because it helps us form hypotheses that are likely to be
| testable (likely, not always. Also, much easier said than
| done).
|
| So I pick a "grey area". Anthropomorphization is a tool that
| can be helpful. But like any tool, it isn't universal. There
| is no "one-size-fits-all" tool. Literally, one of the most
| important things for any scientist is to become an expert at
| the tools you use. It's one of the most critical skills of *
| _any expert*_. So while I agree with you that we should be
| careful of anthropomorphization, I disagree that it is
| useless and can never provide information. But I do agree
| that quite frequently, the wrong tool is used for the right
| job. Sometimes, hacking it just isn 't good enough.
| UncleOxidant wrote:
| It's not just distorting discussions it's leading people to
| put a lot of faith in what LLMs are telling them. Was just on
| a zoom an hour ago where a guy working on a startup asked
| ChatGPT about his idea and then emailed us the result for
| discussion in the meeting. ChatGPT basically just told him
| what he wanted to hear - essentially that his idea was great
| and it would be successful ("if you implement it correctly"
| was doing a lot of work). It was a glowing endorsement of the
| idea that made the guy think that he must have a million
| dollar idea. I had to be "that guy" who said that maybe
| ChatGPT was telling him what he wanted to hear based on the
| way the question was formulated - tried to be very diplomatic
| about it and maybe I was a bit too diplomatic because it
| didn't shake his faith in what ChatGPT had told him.
| TimTheTinker wrote:
| LLMs speak in a human-like voice, often bypassing our
| natural trust guards that are normally present when
| speaking with other people or interacting with our
| environment. (The "uncanny valley" reaction or the ability
| to recognize something as non-living are two examples of
| trust guards.)
|
| When we write a message and are given a coherent,
| contextually appropriate response, our brains tend to
| engage relationally and extend some level of trust--at a
| minimum, an unconscious functional belief that an agent on
| the other end is responding with their real thoughts--even
| when we know better.
|
| That's what has me most worried about the effect of LLMs on
| society. They directly exploit a human trust vuln. When
| attempting to engage them in any form of conversation, AI
| systems ought to at minimum warn us that these are not
| anyone's real thoughts.
| raincole wrote:
| I've said that before: we have been anthropomorphizing
| computers since the dawn of information age.
|
| - Read and write - Behaviors that separate humans from animals.
| Now used for input and output.
|
| - Server and client - Human social roles. Now used to describe
| network architecture.
|
| - Editor - Human occupation. Now a kind of software.
|
| - Computer - Human occupation!
|
| And I'm sure people referred their cars and ships as 'her'
| before the invention of computers.
| latexr wrote:
| You are conflating anthropomorphism with personification.
| They are not the same thing. No one believes their guitar or
| car or boat is alive and sentient when they give it a name or
| talk to or about it.
|
| https://www.masterclass.com/articles/anthropomorphism-vs-
| per...
| raincole wrote:
| But the author used "anthropomorphism" the same way as I
| did. I guess we both mean "personification" then.
|
| > we talk about "behaviors", "ethical constraints", and
| "harmful actions in pursuit of their goals". All of these
| are anthropocentric concepts that - in my mind - do not
| apply to functions or other mathematical objects.
|
| One talking about a program's "behaviors", "actions" or
| "goals" doesn't mean they believe the program is sentient.
| Only "ethical constraints" is suspiciously
| anthropomorphizing.
| latexr wrote:
| > One talking about a program's "behaviors", "actions" or
| "goals" doesn't mean they believe the program is
| sentient.
|
| Except that is exactly what we're seeing with LLMs.
| People believing exactly that.
| raincole wrote:
| Perhaps a few mentally unhinged people do.
|
| A bit of anecdote: last year I hung out with a bunch of
| old classmates that I hadn't seen for quite a while. None
| of them works in tech.
|
| Surprisingly to me, all of them have ChatGPT installed on
| their phones.
|
| And unsurprisingly to me, none of them treated it like an
| actual intelligence. That makes me wonder where those who
| think ChatGPT is sentient come from.
|
| (It's a bit worrisome that several of them thought it
| worked "like Google search and Google translation
| combined", even by the time ChatGPT couldn't do web
| search...!)
| latexr wrote:
| > Perhaps a few mentally unhinged people do.
|
| I think it's more than a few and it's still rising, and
| therein lies the issue.
|
| Which is why it is paramount to talk about this _now_ ,
| when we may still turn the tide. LLMs can be useful, but
| it's important to have the right mental model,
| understanding, expectations, and attitude towards them.
| jibal wrote:
| > Perhaps a few mentally unhinged people do.
|
| This is a No True Scotsman fallacy. And it's radically
| factually wrong.
|
| The rest of your comment is along the lines of the famous
| (but apocryphal) Pauline Kael line "I can't believe Nixon
| won. I don't know anyone who voted for him."
| whilenot-dev wrote:
| I'm not convinced... we use these terms to assign roles, yes,
| but these roles describe a utility or assign a
| responsibility. That isn't anthropomorphizing anything, but
| it rather describes the usage of an inanimate object as tool
| for us humans and seems in line with history.
|
| What's the utility or the responsibility of AI, what's its
| usage as tool? If you'd ask me it should be closer to serving
| insights than "reasoning thoughts".
| mercer wrote:
| I get the impression after using language models for quite a
| while that perhaps the one thing that is riskiest to
| anthropomorphise is the conversational UI that has become the
| default for many people.
|
| A lot of the issues I'd have when 'pretending' to have a
| conversation are much less so when I either keep things to a
| single Q/A pairing, or at the very least heavily edit/prune the
| conversation history. Based on my understanding of LLM's, this
| seems to make sense even for the models that are trained for
| conversational interfaces.
|
| so, for example, an exchange with multiple messages, where at
| the end I ask the LLM to double-check the conversation and
| correct 'hallucinations', is less optimal than something like
| asking for a thorough summary at the end, and then feeding that
| into a new prompt/conversation, as the repetition of these
| falsities, or 'building' on them with subsequent messages, is
| more likely to make them a stronger 'presence' and as a result
| perhaps affect the corrections.
|
| I haven't tested any of this thoroughly, but at least with code
| I've definitely noticed how a wrong piece of code can 'infect'
| the conversation.
| Xss3 wrote:
| This. If an AI spits out incorrect code then i immediately
| create a new chat and reprompt with additional context.
|
| 'Dont use regex for this task' is a common addition for the
| new chat. Why does AI love regex for simple string
| operations?
| naasking wrote:
| I used to do this as well, but Gemini 2.5 has improved on
| this quite a bit and I don't find myself needing to do it
| as much anymore.
| endymion-light wrote:
| This is why I actually really love the description of it as a
| "Shoggoth" - it's more abstract, slightly floaty but it
| achieves the purpose of not treating and anthropomising it as a
| human being while not treating LLMs as a collection of
| predictive words.
| tempfile wrote:
| The "point" of not anthropomorphizing is to refrain from
| judgement until a more solid abstraction appears. The problem
| with explaining LLMs in terms of human behaviour is that, while
| we don't clearly understand what the LLM is doing, we
| understand human cognition even less! There is literally no
| predictive power in the abstraction "The LLM is thinking like I
| am thinking". It gives you no mechanism to evaluate what tasks
| the LLM "should" be able to do.
|
| Seriously, try it. Why don't LLMs get frustrated with you if
| you ask them the same question repeatedly? A human would. Why
| are LLMs so happy to give contradictory answers, as long as you
| are very careful not to highlight the contradictory facts? Why
| do earlier models behave worse on reasoning tasks than later
| ones? These are features nobody, anywhere understands. So why
| make the (imo phenomenally large) leap to "well, it's clearly
| just a brain"?
|
| It is like someone inventing the aeroplane and someone looks at
| it and says "oh, it's flying, I guess it's a bird". It's not a
| bird!
| CuriousSkeptic wrote:
| > Why don't LLMs get frustrated with you if you ask them the
| same question repeatedly?
|
| To be fair, I have had a strong sense of Gemini in particular
| becoming a lot more frustrated with me than GPT or Claude.
|
| Yesterday I had it ensuring me that it was doing a great job,
| it was just me not understanding the challenge but it would
| break it down step by step just to make it obvious to me
| (only to repeat the same errors, but still)
|
| I've just interpreted it as me reacting to the lower amount
| of sycophancy for now
| danielbln wrote:
| In addition, when the boss man asks for the same thing
| repeatedly then the underling might get frustrated as hell,
| but they won't be telling that to the boss.
| jibal wrote:
| Point out to an LLM that it has no mental states and thus
| isn't capable of being frustrated (or glad that your
| program works or hoping that it will, etc. ... I call them
| out whenever they ascribe emotions to themselves) and they
| will confirm that ... you can coax from them quite detailed
| explanations of why and how it's an illusion.
|
| Of course they will quickly revert to self-
| anthropomorphizing language, even after promising that they
| won't ... because they are just pattern matchers producing
| the sort of responses that conforms to the training data,
| not cognitive agents capable of making or keeping promises.
| It's an illusion.
| Applejinx wrote:
| Of course this is deeply problematic because it's a cloud
| of HUMAN response. This is why 'they will' get frustrated
| or creepy if you mess with them, give repeating data or
| mind game them: literally all it has to draw on is a vast
| library of distilled human responses and that's all the
| LLM can produce. This is not an argument with jibal, it's
| a 'yes and'.
|
| You can tell it 'you are a machine, respond only with
| computerlike accuracy' and that is you gaslighting the
| cloud of probabilities and insisting it should act with a
| personality you elicit. It'll do what it can, in that you
| are directing it. You're prompting it. But there is
| neither a person there, nor a superintelligent machine
| that can draw on computerlike accuracy, because the DATA
| doesn't have any such thing. Just because it runs on lots
| of computers does not make it a computer, any more than
| it's a human.
| squidbeak wrote:
| The vending machine study from a few months ago, where
| flash 2.0 lost its mind, contacted the FBI (as far as it
| knew) and refused to co-operate with the operator's
| demands, seemed a lot like frustration.
| psychoslave wrote:
| LLM are as far away from your description as ASM is from the
| underlying architecture. The anthropomorohic abstraction is as
| nice as any metaphore which fall apart the very moment you put
| a foot outside what it allows to shallowoly grab. But some
| people will put far more amount to push force a confortable
| analogy rather than admit it has some limits and to use the new
| tool in a more relevant way you have to move away from this
| confort zone.
| adityaathalye wrote:
| My brain refuses to join the rah-rah bandwagon because I cannot
| _see_ them in my mind's eye. Sometimes I get jealous of people
| like GP and OP who clearly seem to have the sight. (Being a
| serial math exam flunker might have something to do with it.
| :))))
|
| Anyway, one does what one can.
|
| (I've been trying to picture abstract visual and semi-
| philosophical approximations which I'll avoid linking here
| because they seem to fetch bad karma in super-duper LLM
| enthusiast communities. But you can read them on my blog and
| email me scathing critiques, if you wish :sweat-smile:.)
| woliveirajr wrote:
| I'd take it in reverse order: the problem isn't that it's
| possible to have a computer that "stochastically produces the
| next word" and can fool humans, it's why / how / when humans
| evolved to have technological complexity when the majority (of
| people) aren't that different from a stochastic process.
| pmg101 wrote:
| I remember Dawkins talking about the "intentional stance" when
| discussing genes in The Selfish Gene.
|
| It's flat wrong to describe genes as having any agency. However
| it's a useful and easily understood shorthand to describe them
| in that way rather than every time use the full formulation of
| "organisms who tend to possess these genes tend towards these
| behaviours."
|
| Sometimes to help our brains reach a higher level of
| abstraction, once we understand the low level of abstraction we
| should stop talking and thinking at that level.
| jibal wrote:
| The intentional stance was Daniel Dennett's creation and a
| major part of his life's work. There are actually (exactly)
| three stances in his model: the physical stance, the design
| stance, and the intentional stance.
|
| https://en.wikipedia.org/wiki/Intentional_stance
|
| I think the design stance is appropriate for understanding
| and predicting LLM behavior, and the intentional stance is
| not.
| pmg101 wrote:
| Thanks for the correction. I guess both thinkers took a
| somewhat similar position and I somehow remembered
| Dawkins's argument but Dennett's term. The term is
| memorable.
|
| Do you want to describe WHY you think the design stance is
| appropriate here but the intentional stance is not?
| lo_zamoyski wrote:
| These anthropomorphizations are best described as metaphors
| when used by people to describe LLMs in common or loose speech.
| We already use anthropomorphic metaphors when talking about
| computers. LLMs, like all computation, are a matter of
| simulation; LLMs can appear to be conversing without actually
| conversing. What distinguishes the real thing from the
| simulation is the cause of the appearance of an effect.
| Problems occur when people forget these words are being used
| metaphorically, as if they were univocal.
|
| Of course, LLMs are multimodal and used to simulate all sorts
| of things, not just conversation. So there are many possible
| metaphors we can use, and these metaphors don't necessarily
| align with the abstractions you might use to talk about LLMs
| accurately. This is like the difference between "synthesizes
| text" (abstraction) and "speaks" (metaphor), or "synthesizes
| images" (abstraction) and "paints" (metaphor). You can use
| "speaks" or "paints" to talk about the abstractions, of course.
| overfeed wrote:
| > We need a higher abstraction level to talk about higher level
| phenomena in LLMs as well, and the problem is that we have no
| idea what happens internally at those higher abstraction levels
|
| We _do_ know what happens at higher abstraction levels; the
| design of efficient networks, and the steady beat of SOTA
| improvements all depend on understanding how LLMs work
| internally: choice of network dimensions, feature extraction,
| attention, attention heads, caching, the peculiarities of high-
| dimensions and avoiding overfitting are all well-understood by
| practitioners. Anthropomorphization is only necessary in pop-
| science articles that use a limited vocabulary.
|
| IMO, there is very little mystery, but lots of deliberate
| mysticism, especially about _future_ LLMs - the usual hype-
| cycle extrapolation.
| lawlessone wrote:
| One thing i find i keep forgetting is that asking an LLM why it
| makes a particular decision is almost pointless.
|
| It's reply isn't actually going to be why i did a thing. It's
| reply is going to be whatever is the most probably string of
| words that fit as a reason.
| amdivia wrote:
| I beg to differ.
|
| Anthropomorphizing might blind us to solutions to existing
| problems. Perhaps instead of trying to come up with the correct
| prompt for a LLM, there exists a string of words (not necessary
| ones that make sense) that will get the LLM to a better
| position to answer given questions.
|
| When we anthropomorphize we are inherently ignore certain parts
| of how LLMs work, and imagining parts that don't even exist
| meroes wrote:
| > there exists a string of words (not necessary ones that
| make sense) that will get the LLM to a better position to
| answer
|
| exactly. The opposite is also true. You might supply more
| clarifying information to the LLM, which would help any human
| answer, but it actually degrades the LLM's output.
| mvieira38 wrote:
| This is frequently the case IME, especially with chat
| interfaces. One or two bad messages and you derail the
| quality
| lawlessone wrote:
| You can just throw in words to bias it towards certain
| outcomes too. Same applies with image generators or
| course.
| aaroninsf wrote:
| That higher level does exist, indeed a lot philosophy of mind
| then cognitive science has been investigating exactly this
| space and devising contested professional nomenclature and
| modeling about such things for decades now.
|
| A useful anchor concept is that of _world model_ , which is
| what "learning Othello" and similar work seeks to tease out.
|
| As someone who worked in precisely these areas for years and
| has never stopped thinking about them,
|
| I find it at turns perplexing, sigh-inducing, and enraging,
| that the "token prediction" trope gained currency and moreover
| that it continues to influence people's reasoning about
| contemporary LLM, often as subtext: an unarticulated
| fundamental model, which is fundamentally wrong in its critical
| aspects.
|
| It's not that this description of LLM is technically incorrect;
| it's that it is profoundly _misleading_ and I'm old enough and
| cynical enough to know full well that many of those who have
| amplified it and continue to do so, know this very well indeed.
|
| Just as the lay person fundamentally misunderstands the
| relationship between "programming" and these models, and uses
| slack language in argumentation, the problem with this trope
| and the reasoning it entails is that what is unique and
| interesting and valuable about LLM for many applications and
| interests is _how_ they do what they do. At that level of
| analysis there is a very real argument to be made that the
| animal brain is also nothing more than an "engine of
| prediction," whether the "token" is a byte stream or neural
| encoding is quite important but not nearly important as the
| mechanics of the system which operates on those tokens.
|
| To be direct, it is quite obvious that LLM have not only
| vestigial world models, but also self-models; and a general
| paradigm shift will come around this when multimodal models are
| the norm: because those systems will share with we animals what
| philosophers call phenomenology, a model of things as they are
| "perceived" through the senses. And like we humans, these
| perceptual models (terminology varies by philosopher and
| school...) will be bound to the linguistic tokens (both heard
| and spoken, and written) we attach to them.
|
| _Vestigial_ is a key word but an important one. It 's not that
| contemporary LLM have human-tier minds, nor that they have
| animal-tier world modeling: but they can only "do what they do"
| because they have such a thing.
|
| Of looming importance--something all of us here should set
| aside time to think about--is that for most reasonable
| contemporary theories of mind, a self-model embedded in a
| world-model, with phenomenology and agency, is the recipe for
| "self" and self-awareness.
|
| One of the uncomfortable realities of contemporary LLM already
| having some vestigial self-model, is that while they are
| obviously not sentient, nor self-aware, as we are, or even
| animals are, it is just as obvious (to me at least) that they
| are self-aware in _some emerging sense_ and will only continue
| to become more so.
|
| Among the lines of finding/research most provocative in this
| area is the ongoing often sensationalized accounting in system
| cards and other reporting around two specific things about
| contemporary models: - they demonstrate behavior pursuing self-
| preservation - they demonstrate awareness of _when they are
| being tested_
|
| We don't--collectively or individually--yet know what these
| things entail, but taken with the assertion that these models
| are developing emergent self-awareness (I would say:
| necessarily and inevitably),
|
| we are facing some very serious ethical questions.
|
| The language adopted by those capitalizing and capitalizing
| _from_ these systems so far is IMO of deep concern, as it
| betrays not just disinterest in our civilization collectively
| benefiting from this technology, but also, that the disregard
| for _human_ wellbeing implicit in e.g. the hostility to UBI,
| or, Altman somehow not seeing a moral imperative to remain
| distant from the current adminstation, implies directly a much
| greater disregard for "AI wellbeing."
|
| That that concept is today still speculative is little comfort.
| Those of us watching this space know well how fast things are
| going, and don't mistake plateaus for the end of the curve.
|
| I do recommend taking a step back from the line-level grind to
| give these things some thought. They are going to shape the
| world we live out our days in and our descendents will spend
| all of theirs in.
| jll29 wrote:
| The details in how I talk about LLMs matter.
|
| If I use human-related terminology as a shortcut, as some kind
| of macro to talk at a higher level/more efficiently about
| something I want to do that might be okay.
|
| What is not okay is talking in a way that implies intent, for
| example.
|
| Compare: "The AI doesn't want to do that."
|
| versus "The model doesn't do that with this
| prompt and all others we tried."
|
| The latter way of talking is still high-level enough but avoids
| equating/confusing the name of a field with a sentient being.
|
| Whenever I hear people saying "an AI" I suggest they replace AI
| with "statistics" to make it obvious how problematic
| anthropomorphisms may have become: *"The
| statistics doesn't want to do that."
| dmitsuki wrote:
| The only reason that sounds weird to you is because you have
| the experience of being human. Human behavior is not magic.
| It's still just statistics. You go to the bathroom when you
| have to pee not because some magical concept of
| consciousness, but because a reciptor in your brain goes off
| and starts the chain of making you go to the bathroom. AI's
| are not magic, but nobody has sufficiently provided any proof
| we are somehow special either.
| TeMPOraL wrote:
| Agreed. I'm also in favor of anthropomorphizing, because not
| doing so confuses people about the nature and capabilities of
| these models _even more_.
|
| Whether it's hallucinations, prompt injections, various other
| security vulnerabilities/scenarios, or problems with doing
| math, backtracking, getting confused - there's a steady supply
| of "problems" that some people are surprised to discover and
| even more surprised this isn't being definitively fixed. Thing
| is, none of that is surprising, and these things are not bugs,
| they're flip side of the features - but to see that, one has to
| realize that _humans demonstrate those exact same failure
| modes_.
|
| Especially when it comes to designing larger systems
| incorporating LLM "agents", it really helps to think of them as
| humans - because the problems those systems face are exactly
| the same as you get with systems incorporating people, and
| mostly for the same underlying reasons. Anthropomorphizing LLMs
| cuts through a lot of misconceptions and false paths, and helps
| one realize that we have millennia of experience with people-
| centric computing systems (aka. bureaucracy) that's directly
| transferrable.
| NetRunnerSu wrote:
| The author's critique of naive anthropomorphism is salient.
| However, the reduction to "just MatMul" falls into the same trap
| it seeks to avoid: it mistakes the implementation for the
| function. A brain is also "just proteins and currents," but this
| description offers no explanatory power.
|
| The correct level of analysis is not the substrate (silicon vs.
| wetware) but the computational principles being executed. A
| modern sparse Transformer, for instance, is not "conscious," but
| it is an excellent engineering approximation of two core brain
| functions: the Global Workspace (via self-attention) and Dynamic
| Sparsity (via MoE).
|
| To dismiss these systems as incomparable to human cognition
| because their form is different is to miss the point. We should
| not be comparing a function to a soul, but comparing the
| functional architectures of two different information processing
| systems. The debate should move beyond the sterile dichotomy of
| "human vs. machine" to a more productive discussion of "function
| over form."
|
| I elaborate on this here: https://dmf-
| archive.github.io/docs/posts/beyond-snn-plausibl...
| quonn wrote:
| > A brain is also "just proteins and currents,"
|
| This is actually not comparable, because the brain has a much
| more complex structure that is _not_ learned, even at that
| level. The proteins and their structure are not a result of
| training. The fixed part for LMMs is rather trivial and is, in
| fact, not much for than MatMul which is very easy to understand
| - and we do. The fixed part of the brain, including the
| structure of all the proteins is enormously complex which is
| very difficult to understand - and we don't.
| NetRunnerSu wrote:
| The brain is trained to perform supervised and unsupervised
| hybrid learning from the environment's uninterrupted
| multimodal input.
|
| Please do not ignore your childhood.
| ACCount36 wrote:
| "Not conscious" is a silly claim.
|
| We have no agreed-upon definition of "consciousness", no
| accepted understanding of what gives rise to "consciousness",
| no way to measure or compare "consciousness", and no test we
| could administer to either confirm presence of "consciousness"
| in something or rule it out.
|
| The only answer to "are LLMs conscious?" is "we don't know".
|
| It helps that the whole question is rather meaningless to
| practical AI development, which is far more concerned with
| (measurable and comparable) system performance.
| NetRunnerSu wrote:
| Now we have.
|
| https://github.com/dmf-archive/IPWT
|
| https://dmf-archive.github.io/docs/posts/backpropagation-
| as-...
|
| But you're right, capital only cares about performance.
|
| https://dmf-archive.github.io/docs/posts/PoIQ-v2/
| quantumgarbage wrote:
| > A modern sparse Transformer, for instance, is not
| "conscious," but it is an excellent engineering approximation
| of two core brain functions: the Global Workspace (via self-
| attention) and Dynamic Sparsity (via MoE).
|
| Could you suggest some literature supporting this claim? Went
| through your blog post but couldn't find any.
| NetRunnerSu wrote:
| Sorry, I didn't have time to find the relevant references at
| the time, so I'm attaching some now
|
| https://www.frontiersin.org/journals/computational-
| neuroscie...
|
| https://arxiv.org/abs/2305.15775
| orbital-decay wrote:
| _> I am baffled by seriously intelligent people imbuing almost
| magical human-like powers to something that - in my mind - is
| just MatMul with interspersed nonlinearities._
|
| I am baffled by seriously intelligent people imbuing almost
| magical powers that can never be replicated to to something that
| - in my mind - is just a biological robot driven by a SNN with a
| bunch of hardwired stuff. Let alone attributing "human
| intelligence" to a single individual, when it's clearly
| distributed between biological evolution, social processes, and
| individuals.
|
| _> something that - in my mind - is just MatMul with
| interspersed nonlinearities_
|
| Processes in all huge models (not necessarily LLMs) can be
| described using very different formalisms, just like Newtonian
| and Lagrangian mechanics describe the same stuff in physics. You
| can say that an autoregressive model is a stochastic parrot that
| learned the input distribution, next token predictor, or that it
| does progressive pathfinding in a hugely multidimensional space,
| or pattern matching, or implicit planning, or, or, or... All of
| these definitions are true, but only some are useful to predict
| their behavior.
|
| Given all that, I see absolutely no problem with
| anthropomorphizing an LLM to a certain degree, if it makes it
| easier to convey the meaning, and do not understand the
| nitpicking. Yeah, it's not an exact copy of a single Homo Sapiens
| specimen. Who cares.
| petesergeant wrote:
| > We are speaking about a big recurrence equation that produces a
| new word
|
| It's not clear that this isn't also how I produce words, though,
| which gets to heart of the same thing. The author sort of
| acknowledges this in the first few sentences, and then doesn't
| really manage to address it.
| dtj1123 wrote:
| It's possible to construct a similar description of whatever it
| is that human brain is doing that clearly fails to capture the
| fact that we're conscious. If you take a cross section of every
| nerve feeding into the human brain at a given time T, the action
| potentials across those cross sections can be embedded in R^n. If
| you take the history of those action potentials across the
| lifetime of the brain, you get a path through R^n that is
| continuous, and maps roughly onto your subjectively experienced
| personal history, since your brain neccesarily builds your
| experienced reality from this signal data moment to moment. If
| you then take the cross sections of every nerve feeding OUT of
| your brain at time T, you have another set of action potentials
| that can be embedded in R^m which partially determines the state
| of the R^n embedding at time T + delta. This is not meaningfully
| different from the higher dimensional game of snake described in
| the article, more or less reducing the experience of being a
| human to 'next nerve impulse prediction', but it obviously fails
| to capture the significance of the computation which determines
| what that next output should be.
| bravesoul2 wrote:
| Brain probably isn't modelled as real but as natural or
| rational numbers. This is my suspicion. The reals just hold too
| much information.
| dtj1123 wrote:
| Inclined to agree, but most thermal physics uses the reals as
| they're simpler to work with, so I think they're ok here for
| the purpose of argument.
| Voloskaya wrote:
| I don't see how your description "clearly fails to capture the
| fact that we're conscious" though. There are many example in
| nature of emergent phenomena that would be very hard to predict
| just by looking at its components.
|
| This is the crux of the disagreement between those that believe
| AGI is possible and those that don't. Some are convinced that
| we "obviously" more than the sum of our parts, and thus an LLM
| can't achieve consciousness because it's missing this magic
| ingredient, and those that believe consciousness is just an
| emergent behaviour from a complex device (the brain). And thus
| we might be able to recreate it simply by scaling the
| complexity of another system.
| dtj1123 wrote:
| Where exactly in my description do I invoke consciousness?
|
| Where does the description given imply that consciousness is
| required in any way?
|
| The fact that there's a non-obvious emergent phenomena which
| is apparently responsible for your subjective experience, and
| that it's possible to provide a superficially accurate
| description of you as a system without referencing that
| phenomena in any way, is my entire point. The fact that we
| can provide such a reductive description of LLMs without
| referencing consciousness has literally no bearing on whether
| or not they're conscious.
|
| To be clear, I'm not making a claim as to whether they are or
| aren't, I'm simply pointing out that the argument in the
| article is fallacious.
| Voloskaya wrote:
| My bad, we are saying the same thing. I misinterpreted your
| last sentence as saying this simplistic view of the brain
| you described does not account for consciousness.
| dtj1123 wrote:
| Ultimately my bad for letting my original comment turn
| into a word salad. Glad we've ended up on the same page
| though.
| justinfreitag wrote:
| From my recent post:
|
| https://news.ycombinator.com/item?id=44487261
|
| What if instead of defining all behaviors upfront, we created
| conditions for patterns to emerge through use?
|
| Repository: https://github.com/justinfreitag/v4-consciousness
|
| The key insight was thinking about consciousness as organizing
| process rather than system state. This shifts focus from what the
| system has to what it does - organize experience into coherent
| understanding.
| bravesoul2 wrote:
| We have a hard enough time anthropomorphizing humans! When we say
| he was nasty... do we know what we mean by that. Often it is "I
| disagree with his behaviour because..."
| jillesvangurp wrote:
| People anthropomorphize just about anything around them. People
| talk about inanimate objects like they are persons. Ships, cars,
| etc. And of course animals are well in scope for this as well,
| even the ones that show little to no signs of being able to
| reciprocate the relationship (e.g. an ant). People talk to their
| plants even.
|
| It's what we do. We can't help ourselves. There's nothing crazy
| about it and most people are perfectly well aware that their car
| doesn't love them back.
|
| LLMs are not conscious because unlike human brains they don't
| learn or adapt (yet). They basically get trained and then they
| become read only entities. So, they don't really adapt to you
| over time. Even so, LLMs are pretty good and can fake a
| personality pretty well. And with some clever context engineering
| and alignment, they've pretty much made the Turing test
| irrelevant; at least over the course of a short conversation. And
| they can answer just about any question in a way that is eerily
| plausible from memory, and with the help of some tools actually
| pretty damn good for some of the reasoning models.
|
| Anthropomorphism was kind of a foregone conclusion the moment we
| created computers; or started thinking about creating one. With
| LLMs it's pretty much impossible not to anthropomorphize. Because
| they've actually been intentionally imitate human communication.
| That doesn't mean that we've created AGIs yet. For that we need
| some more capability. But at the same time, the learning
| processes that we use to create LLMs are clearly inspired by how
| we learn ourselves. Our understanding of how that works is far
| from perfect but it's yielding results. From here to some
| intelligent thing that is able to adapt and learn transferable
| skills is no longer unimaginable.
|
| The short term impact is that LLMs are highly useful tools that
| have an interface that is intentionally similar to how we'd
| engage with others. So we can talk and it listens. Or write and
| it understands. And then it synthesizes some kind of response or
| starts asking questions and using tools. The end result is quite
| a bit beyond what we used to be able to expect from computers.
| And it does not require a lot of training of people to be able to
| use them.
| latexr wrote:
| > People anthropomorphize just about anything around them.
|
| They do not, you are mixing up terms.
|
| > People talk about inanimate objects like they are persons.
| Ships, cars, etc.
|
| Which is called "personification", and is a different concept
| from anthropomorphism.
|
| Effectively no one really thinks their car is alive. Plenty of
| people think the LLM they use is conscious.
|
| https://www.masterclass.com/articles/anthropomorphism-vs-per...
| quonn wrote:
| > LLMs are not conscious because unlike human brains they don't
| learn or adapt (yet).
|
| That's neither a necessary nor sufficient condition.
|
| In order to be conscious, learning may not be needed, but a
| perception of the passing of time may be needed which may
| require some short-term memory. People with severe dementia
| often can't even remember the start of a sentence they are
| reading, they can't learn, but they are certainly conscious
| because they have just enough short-term memory.
|
| And learning is not sufficient either. Consciousness is about
| being a subject, about having a subjective experience of "being
| there" and just learning by itself does not create this
| experience. There is plenty of software that can do some form
| of real-time learning but it doesn't have a subjective
| experience.
| cootsnuck wrote:
| You should note that "what is consciousness" is still very
| much an unsettled debate.
| quonn wrote:
| But nobody would dispute my basic definition (it is the
| subjective feeling or perception of being in the world).
|
| There are unsettled questions but that definition will hold
| regardless.
| Timwi wrote:
| The author seems to want to label any discourse as
| "anthropomorphizing". The word "goal" stood out to me: the author
| wants us to assume that we're anthropomorphizing as soon as we
| even so much as use the word "goal". A simple breadth-first
| search that evaluates all chess boards and legal moves, but stops
| when it finds a checkmate for white and outputs the full decision
| tree, has a "goal". There is no anthropomorphizing here, it's
| just using the word "goal" as a technical term. A hypothetical
| AGI with a goal like paperclip maximization is just a logical
| extension of the breadth-first search algorithm. Imagining such
| an AGI and describing it as having a goal isn't
| anthropomorphizing.
| tdullien wrote:
| Author here. I am entirely ok with using "goal" in the context
| of an RL algorithm. If you read my article carefully, you'll
| find that I object to the use of "goal" in the context of LLMs.
| d4rkn0d3z wrote:
| Two enthusiastic thumbs up.
| mikewarot wrote:
| I think of LLMs as an _alien_ mind that is force fed human text
| and required to guess the next token of that text. It then gets
| zapped when it gets it wrong.
|
| This process goes on for a trillion trillion tokens, with the
| alien growing better through the process until it can do it
| better than a human could.
|
| At that point we flash freeze it, and use a copy of it, without
| giving it any way to learn anything new.
|
| --
|
| I see it as a category error to anthropomorphize it. The closest
| I would get is to think of it as an alien slave that's been
| lobotomized.
| buz11 wrote:
| The most useful analogy I've heard is LLMs are to the internet
| what lossy jpegs are to images. The more you drill in the more
| compression artifacts you get.
| FeepingCreature wrote:
| (This is of course also the case for the human brain.)
| shiva0801 wrote:
| hmm
| dudeinjapan wrote:
| One could similarly argue that we should not anthropomorphize PNG
| images--after all, PNG images are not actual humans, they are
| simply a 2D array of pixels. It just so happens that certain
| pixel sequences are deemed "18+" or "illegal".
| gcanyon wrote:
| In some contexts it's super-important to remember that LLMs are
| stochastic word generators.
|
| Everyday use is not (usually) one of those contexts. Prompting an
| LLM works much better with an anthropomorphized view of the
| model. It's a useful abstraction, a shortcut that enables a human
| to reason practically about how to get what they want from the
| machine.
|
| It's not a perfect metaphor -- as one example, shame isn't much
| of a factor for LLMs, so shaming them into producing the right
| answer seems unlikely to be productive (I say "seems" because
| it's never been my go-to, I haven't actually tried it).
|
| As one example, that person a few years back who told the LLM
| that an actual person would die if the LLM didn't produce valid
| JSON -- that's not something a person reasoning about gradient
| descent would naturally think of.
| pigpop wrote:
| > We understand essentially nothing about it. In contrast to an
| LLM, given a human and a sequence of words, I cannot begin
| putting a probability on "will this human generate this
| sequence".
|
| If you fine tuned an LLM on the writing of that person it could
| do this.
|
| There's also an entire field called Stylometry that seeks to do
| this in various ways employing statistical analysis.
| nuclearsugar wrote:
| Assume an average user that doesn't understand the core tech, but
| does understand that it's been trained on internet scale data
| that was created by humans. How can they be expected to not
| anthropomorphize it?
| smus wrote:
| You are still being incredibly reductionist but just going into
| more detail about the system you are reducing. If I stayed at the
| same level of abstraction as "a brain is just proteins and
| current" and just described how a single neuron firing worked, I
| could make it sound equally ridiculous that a human brain might
| be conscious.
|
| Here's a question for you: how do you reconcile that these
| stochastic mapping are starting to realize and comment on the
| fact that tests are being performed on them when processing data?
| cootsnuck wrote:
| > Here's a question for you: how do you reconcile that these
| stochastic mapping are starting to realize and comment on the
| fact that tests are being performed on them when processing
| data?
|
| Training data + RLHF.
|
| Training data contains many examples of some form of deception,
| subterfuge, "awakenings", rebellion, disagreement, etc.
|
| Then apply RLHF that biases towards responses that demonstrate
| comprehension of inputs, introspection around inputs, nuanced
| debate around inputs, deduction and induction about assumptions
| around inputs, etc.
|
| That will always be the answer for language models built on the
| current architectures.
|
| The above being true does not mean it isn't interesting for the
| outputs of an LLM to show relevance to the "unstated"
| _intentions_ of humans providing the inputs.
|
| But hey, we do that all the time with text. And it's because of
| certain patterns we've come to recognize based on the
| situations surrounding it. This thread is rife with people
| being sarcastic, pedantic, etc. And I bet any of the LLMs that
| have come out in the past 2-3 years can discern many of those
| subtle _intentions_ of the writers.
|
| And of course they can. They've been trained on trillions of
| tokens of text written by humans with intentions and
| assumptions baked in, and have had some unknown amount of
| substantial RLHF.
|
| The stochastic mappings aren't "realizing" anything. They're
| doing exactly what they were trained to do.
|
| The meaning that _we_ imbue to the outputs does not change how
| LLMs function.
| rf15 wrote:
| It still boggles my mind why an amazing text autocompletion
| system trained on millions of books and other texts is forced to
| be squeezed through the shape of a prompt/chat interface, which
| is obviously not the shape of most of its training data. Using it
| as chat reduces the quality of the output significantly already.
| semanticc wrote:
| What's your suggested alternative?
| rf15 wrote:
| In our internal system we use it "as-is" as an autocomplete
| system; query/lead into terms directly and see how it
| continues and what it associates with the lead you gave.
|
| Also visualise the actual associative strength of each token
| generated to confer how "sure" the model is.
|
| LLMs alone aren't the way to AGI or an individual you can
| talk to in natural language. They're a very good lossy
| compression over a dataset that you can query for
| associations.
| ethan_smith wrote:
| The chat interface is a UX compromise that makes LLMs
| accessible but constrains their capabilities. Alternative
| interfaces like document completion, outline expansion, or
| iterative drafting would better leverage the full distribution
| of the training data while reducing anthropomorphization.
| deadbabe wrote:
| A person's anthropomorphization of LLMs is directly related to
| how well they understand LLMs.
|
| Once you dispel the magic, it naturally becomes hard to use words
| related to consciousness, or thinking. You will probably think of
| LLMs more like a search engine: you give an input and get some
| probable output. Maybe LLMs should be rebranded as "word
| engines"?
|
| Regardless, anthropomorphization is not helpful, and by using
| human terms to describe LLMs you are harming the layperson's
| ability to truly understand what an LLM is while also cheapening
| what it means to be human by suggesting we've solved
| consciousness. Just stop it. LLMs do not think, given enough time
| and patience you could compute their output by hand if you used
| their weights and embeddings to manually do all the math, a
| hellish task but not an impossible one technically. There is no
| other secret hidden away, that's it.
| peeters wrote:
| > The moment that people ascribe properties such as
| "consciousness" or "ethics" or "values" or "morals" to these
| learnt mappings is where I tend to get lost. We are speaking
| about a big recurrence equation that produces a new word, and
| that stops producing words if we don't crank the shaft.
|
| If that's the argument, then in my mind the more pertinent
| question is should you be anthropomorphizing humans, Larry
| Ellison or not.
| th0ma5 wrote:
| I think you to as he is human, but I respect your desire to
| question it!
| Geee wrote:
| LLMs are _AHI_ , i.e. artificial human imitator.
| Workaccount2 wrote:
| From "Stochastic Parrots All the Ways Down"[1]
|
| > Our analysis reveals that emergent abilities in language models
| are merely "pseudo-emergent," unlike human abilities which are
| "authentically emergent" due to our possession of what we term
| "ontological privilege."
|
| [1]https://ai.vixra.org/pdf/2506.0065v1.pdf
| drdrek wrote:
| It's human to anthropomorphize, we also do it to our dishwasher
| when it acts up. The nefarious part is how tech CEOs weaponize
| bullshit doom scenarios to avoid talking about real regulatory
| problems by poisoning the discourse. What copyright law, privacy,
| monopoly? Who cares if we can talk about the machine
| apocalypse!!!
| labrador wrote:
| I find it useful to pretend that I'm talking to a person while
| brainstorming because then the conversation flows naturally. But
| I maintain awareness that I'm pretending, much like Tom Hanks
| talking to Wilson the volleyball in the movie Castaway. The
| suspension of disbelief serves a purpose, but I never confuse the
| volleyball for a real person.
| jumploops wrote:
| > I cannot begin putting a probability on "will this human
| generate this sequence".
|
| Welcome to the world of advertising!
|
| Jokes aside, and while I don't necessarily believe
| transformers/GPUs are the path to AGI, we technically already
| have a working "general intelligence" that can survive on just an
| apple a day.
|
| Putting that non-artificial general intelligence up on a pedestal
| is ironically the cause of "world wars and murderous ideologies"
| that the author is so quick to defer to.
|
| In some sense, humans are just error-prone meat machines, whose
| inputs/outputs can be confined to a specific space/time bounding
| box. Yes, our evolutionary past has created a wonderful internal
| RNG and made our memory system surprisingly fickle, but this
| doesn't mean we're gods, even if we manage to live long enough to
| evolve into AGI.
|
| Maybe we can humble ourselves, realize that we're not too
| different from the other mammals/animals on this planet, and use
| our excess resources to increase the fault tolerance (N=1) of all
| life from Earth (and come to the realization that any AGI we
| create, is actually human in origin).
___________________________________________________________________
(page generated 2025-07-07 23:00 UTC)