[HN Gopher] LLMs understand nullability
       ___________________________________________________________________
        
       LLMs understand nullability
        
       Author : mattmarcus
       Score  : 144 points
       Date   : 2025-04-07 14:52 UTC (8 hours ago)
        
 (HTM) web link (dmodel.ai)
 (TXT) w3m dump (dmodel.ai)
        
       | EncomLab wrote:
        | This is like claiming a photoresistor-controlled night light
       | "understands when it is dark" or that a bimetallic strip
       | thermostat "understands temperature". You can say those words,
       | and it's syntactically correct but entirely incorrect
       | semantically.
        
         | aSanchezStern wrote:
         | The post includes this caveat. Depending on your philosophical
         | position about sentience you might say that LLMs can't possibly
         | "understand" anything, and the post isn't trying to have that
         | argument. But to the extent that an LLM can "understand"
         | anything, you can study its understanding of nullability.
        
           | keybored wrote:
           | People don't use "understand" for machines in science because
           | people may or may not believe in the sentience of machines.
           | That would be a weird catering to panpsychism.
        
         | nsingh2 wrote:
         | Where is the boundary where this becomes semantically correct?
         | It's easy for these kinds of discussions to go in circles,
         | because nothing is well defined.
        
           | nativeit wrote:
           | Hard to define something that science has yet to formally
           | outline, and is largely still in the realm of religion.
        
             | EMIRELADERO wrote:
             | That depends entirely on whether you believe understanding
             | requires consciousness.
             | 
             | I believe that the type of understanding demonstrated here
             | doesn't. Consciousness only comes into play when we become
              | aware that such understanding has taken place, not in the
             | process itself.
        
             | stefl14 wrote:
             | Shameless plug of personal blog post, but relevant. Still
              | not fully edited, so the writing is a bit scattered, but the
              | crux is that we now have a framework for talking about
             | consciousness intelligently. It's not as mysterious as in
             | the past, considering advances in non-equilibrium
             | thermodynamics and the Free Energy Principle in particular.
             | 
             | https://stefanlavelle.substack.com/p/i-am-therefore-i-feel
        
         | robotresearcher wrote:
         | You declare this very plainly without evidence or argument, but
         | this is an age-old controversial issue. It's not self-evident
         | to everyone, including philosophers.
        
           | mubou wrote:
           | It's not age-old nor is it controversial. LLMs aren't
           | intelligent by any stretch of the imagination. Each
           | word/token is chosen as that which is statistically most
           | likely to follow the previous. There is no capability for
           | understanding in the design of an LLM. It's not a matter of
           | opinion; this just isn't how an LLM works.
           | 
           | Any comparison to the human brain is missing the point that
           | an LLM only simulates one small part, and that's notably
            | _not_ the frontal lobe. That's required for intelligence,
           | reasoning, self-awareness, etc.
           | 
           | So, no, it's not a question of philosophy. For an AI to enter
           | that realm, it would need to be more than just an LLM with
           | some bells and whistles; an LLM _plus_ something else,
           | perhaps, something fundamentally different which does not yet
           | currently exist.
        
             | aSanchezStern wrote:
             | Many people don't think we have any good evidence that our
             | brains aren't essentially the same thing: a stochastic
             | statistical model that produces outputs based on inputs.
        
               | SJC_Hacker wrote:
                | That's probably the case 99% of the time.
               | 
               | But that 1% is pretty important.
               | 
               | For example, they are dismal at math problems that aren't
               | just slight variations of problems they've seen before.
               | 
                | Here's one by blackandredpenn where ChatGPT insisted its
                | solution to a problem that could be solved by high school /
                | talented middle school students was correct, even after
                | attempts to convince it that it was wrong.
               | https://youtu.be/V0jhP7giYVY?si=sDE2a4w7WpNwp6zU&t=837
               | 
               | Rewind earlier to see the real answer
        
               | LordDragonfang wrote:
               | > For example, they are dismal at math problems that
               | aren't just slight variations of problems they've seen
               | before.
               | 
               | I know plenty of teachers who would describe their
               | students the exact same way. The difference is mostly one
               | of magnitude (of delta in competence), not quality.
               | 
               | Also, I think it's important to note that by "could be
               | solved by high school / talented middle school students"
               | you mean "specifically designed to challenge the top ~1%
               | of them". Because if you say "LLMs _only_ manage to beat
                | 99% of middle schoolers at math", the claim seems a
               | whole lot different.
        
               | jquery wrote:
               | ChatGPT o1 pro mode solved it on the first try, after 8
               | minutes and 53 seconds of "thinking":
               | 
                | https://chatgpt.com/share/67f40cd2-d088-8008-acd5-fe9a9784f3...
        
               | SJC_Hacker wrote:
                | The problem is: how do you know that it's correct?
                | 
                | A human would probably say "I don't know how to solve the
                | problem". But the free version of ChatGPT is confidently
                | wrong...
        
               | mubou wrote:
               | Of course, you're right. Neural networks mimic exactly
               | that after all. I'm certain we'll see an ML model
               | developed someday that fully mimics the human brain. But
               | my point is an LLM isn't that; it's a language model
                | only. I know it can _seem_ intelligent sometimes, but it's
                | important to understand what it's actually doing and
               | not ascribe feelings to it that don't exist in reality.
               | 
               | Too many people these days are forgetting this key point
               | and putting a _dangerous amount_ of faith in ChatGPT etc.
                | as a result. I've seen DOCTORS using ChatGPT for
               | diagnosis. Ignorance is scary.
        
               | nativeit wrote:
               | Care to share any of this good evidence?
        
               | goatlover wrote:
               | Do biologists and neuroscientists not have any good
               | evidence or is that just computer scientists and
               | engineers speaking outside of their field of expertise?
               | There's always been this danger of taking computer and
               | brain comparisons too literally.
        
               | root_axis wrote:
               | If you're willing to torture the analogy you can find a
               | way to describe literally anything as a system of outputs
               | based on inputs. In the case of the brain to LLM
               | comparison, people are inclined to do it because they're
                | eager to anthropomorphize something that produces text
               | they can interpret as a speaker, but it's totally
               | incorrect to suggest that our brains are "essentially the
               | same thing" as LLMs. The comparison is specious even on a
               | surface level. It's like saying that birds and planes are
               | "essentially the same thing" because flight was achieved
               | by modeling planes after birds.
        
             | gwd wrote:
             | > Each word/token is chosen as that which is statistically
             | most likely to follow the previous.
             | 
             | The best way to predict the weather is to have a model
             | which approximates the weather. The best way to predict the
             | results of a physics simulation is to have a model which
             | approximates the physical bodies in question. The best way
             | to predict what word a human is going to write next is to
             | have a model that approximates human thought.
        
               | mubou wrote:
                | LLMs don't approximate human _thought_, though. They
                | approximate _language_. That's it.
               | 
               | Please, I'm begging you, go read some papers and watch
               | some videos about machine learning and how LLMs actually
               | work. It is not "thinking."
               | 
               | I fully realize neural networks _can_ approximate human
               | thought -- but we are not there yet, and when we do get
               | there, it will be something that is not an LLM, because
                | an LLM is not capable of that -- it's not designed to
               | be.
        
               | handfuloflight wrote:
               | Isn't language expressed thought?
        
               | fwip wrote:
               | Language can be a (lossy) serialization of thought, yes.
               | But language is not thought, nor inherently produced by
               | thought. Most people agree that a process randomly
               | producing grammatically correct sentences is not
               | thinking.
        
               | Sohcahtoa82 wrote:
               | > it will be something that is not an LLM
               | 
               | I think it will be very similar in architecture.
               | 
                | Artificial neural networks already approximate how
                | neurons in a brain work; it's just at a scale that's
                | several orders of magnitude smaller.
               | 
               | Our limiting factor for reaching brain-like intelligence
               | via ANN is probably more of a hardware limitation. We
               | would need over 100 TB to store the weights for the
               | neurons, not to mention the ridiculous amount of compute
               | to run it.
        
               | codedokode wrote:
               | > not to mention the ridiculous amount of compute to run
               | it.
               | 
                | How does the brain compute the weights then? Or maybe
                | your assumption that the brain is equivalent to a
                | mathematical NN is wrong?
        
               | gwd wrote:
               | > LLMs don't approximate human thought, though.
               | ...Please, I'm begging you, go read some papers and watch
               | some videos about machine learning and how LLMs actually
               | work.
               | 
               | I know how LLMs work; so let me beg you in return, listen
               | to me for a second.
               | 
               | You have a _theoretical-only_ argument: LLMs do text
               | prediction, and therefore it is not possible for them to
                | actually think. And since it's not possible for them to
               | actually think, you don't need to consider any other
               | evidence.
               | 
               | I'm telling you, there's a flaw in your argument: In
               | actuality, _the best way to do text prediction is to
               | think_. An LLM that could actually think would be able to
                | do text prediction better than an LLM that can't
               | actually think; and the better an LLM is able to
               | approximate human thought, the better its predictions
               | will be. The fact that they're predicting text in no way
               | proves that there's no thinking going on.
               | 
               | Now, that doesn't prove that LLMs _actually are_
               | thinking; but it does mean that they _might_ be thinking.
                | And so you should think about how you would know if
                | they're actually thinking or not.
        
             | wongarsu wrote:
             | That argument only really applies to base models. After
             | that we train them to give correct and helpful answers, not
             | just answers that are statistically probable in the
             | training data.
             | 
             | But even if we ignore that subtlety, it's not obvious that
             | training a model to predict the next token doesn't lead to
             | a world model and an ability to apply it. If you gave a
             | human 10 physics books and told them that in a month they
             | have a test where they have to complete sentences from the
             | book, which strategy do you think is more successful:
             | trying to memorize the books word by word or trying to
             | understand the content?
             | 
             | The argument that understanding is just an advanced form of
             | compression far predates LLMs. LLMs clearly lack many of
              | the faculties humans have. Their only concept of a
             | physical world comes from text descriptions and stories.
             | They have a very weird form of memory, no real agency (they
             | only act when triggered) and our attempts at replicating an
             | internal monologue are very crude. But _understanding_ is
             | one thing they may well have, and if the current generation
              | of models doesn't have it, the next generation might.
        
             | robotresearcher wrote:
             | The thermostat analogy, and equivalents, are age-old.
        
           | fwip wrote:
           | Philosophers are often the last people to consider something
           | to be settled. There's very little in the universe that they
           | can all agree is true.
        
         | fallingknife wrote:
         | Or like saying the photoreceptors in your retina understand
         | when it's dark. Or like claiming the temperature sensitive ion
         | channels in your peripheral nervous system understand how hot
         | it is.
        
           | throwuxiytayq wrote:
           | Or like saying that the tangled web of neurons receiving
           | signals from these understands anything about these subjects.
        
           | nativeit wrote:
           | Describing the mechanics of nervous impulses != describing
           | consciousness.
        
             | LordDragonfang wrote:
              | Which is the point, since describing the mechanics of an LLM
              | architecture does not inherently grant knowledge of whether
              | or not it is "conscious".
        
           | throw4847285 wrote:
           | This is a fallacy I've seen enough on here that I think it
           | needs a name. Maybe the fallacy of Theoretical Reducibility
           | (doesn't really roll off the tongue)?
           | 
           | When challenged, everybody becomes an eliminative materialist
           | even if it's inconsistent with their other views. It's very
           | weird.
        
         | dleeftink wrote:
         | I'd say the opposite also applies: to the extent LLMs have an
         | internal language, we understand very little of it.
        
       | thmorriss wrote:
       | very cool.
        
       | gopiandcode wrote:
       | The visualisation of how the model sees nullability was
       | fascinating.
       | 
       | I'm curious if this probing of nullability could be composed with
       | other LLM/ML-based python-typing tools to improve their accuracy.
       | 
       | Maybe even focusing on interfaces such as nullability rather than
       | precise types would work better with a duck-typed language like
        | python than inferring types directly (i.e. we don't really care
        | if a variable is an int specifically, but rather that it supports
        | __add__ or __sub__ etc., that it is numeric).
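        | 
        | Roughly what I have in mind, as a toy sketch of my own (not from
        | the post): a typing.Protocol that only asks for __add__ and
        | __sub__, so any "numeric-enough" value qualifies structurally.
        | 
            from typing import Any, Protocol

            class SupportsAddSub(Protocol):
                # We only care about the interface, not the concrete type.
                def __add__(self, other: Any) -> Any: ...
                def __sub__(self, other: Any) -> Any: ...

            def shift(x: SupportsAddSub, delta: SupportsAddSub):
                # Works for int, float, Decimal, numpy arrays, ... anything
                # that supports + and -.
                return x + delta

            print(shift(3, 4))       # 7
            print(shift(1.5, 0.25))  # 1.75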
        
         | qsort wrote:
         | > we don't really care if a variable is an int specifically,
          | but rather that it supports __add__ or __sub__ etc. that it
          | is numeric
         | 
         | my brother in christ, you invented Typescript.
         | 
         | (I agree on the visualization, it's very cool!)
        
           | gopiandcode wrote:
            | I am more than aware of Typescript; you seem to have
           | misunderstood my point: I was not describing a particular
           | type system (of which there have been many of this ilk) but
           | rather conjecturing that targeting interfaces specifically
           | might make LLM-based code generation/type inference more
           | effective.
        
             | qsort wrote:
             | Yeah, I read that comment wrong. I didn't mean to come off
             | like that. Sorry.
        
         | jayd16 wrote:
         | Why not just use a language with checked nullability? What's
         | the point of an LLM using a duck typing language anyway?
        
           | aSanchezStern wrote:
           | This post actually mostly uses the subset of Python where
           | nullability _is_ checked. The point is not to introduce new
           | LLM capabilities, but to understand more about how existing
           | LLMs are reasoning about code.
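            | 
            | For anyone unfamiliar with that subset, a minimal example of
            | my own (not taken from the post): with Optional annotations,
            | a checker like mypy rejects using `s` unless the None test is
            | there.
            | 
              from typing import Optional

              def total_length(s: Optional[str]) -> int:
                  # Without this None test, `len(s)` is a type error under mypy.
                  if s is None:
                      return 0
                  return len(s)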
        
       | nonameiguess wrote:
       | As every fifth thread becomes some discussion of LLM
       | capabilities, I think we need to shift the way we talk about this
       | to be less like how we talk about software and more like how we
       | talk about people.
       | 
       | "LLM" is a valid category of thing in the world, but it's not a
       | thing like Microsoft Outlook that has well-defined capabilities
       | and limitations. It's frustrating reading these discussions that
       | constantly devolve into one person saying they tried something
       | that either worked or didn't, then 40 replies from other people
       | saying they got the opposite result, possibly with a different
        | model, a different version, a slightly altered prompt, whatever it is.
       | 
       | LLMs possibly have the capability to understand nullability, but
       | that doesn't mean every instance of every model will consistently
       | understand that or anything else. This is the same way humans
       | operate. Humans can run a 4-minute mile. Humans can run a
       | 10-second 100 meter dash. Humans can develop and prove novel math
       | theorems. But not all humans, not all the time, performance
       | depends upon conditions, timing, luck, and there has probably
       | never been a single human who can do all three. It takes practice
       | in one specific discipline to get really good at that, and this
       | practice competes with or even limits other abilities. For LLMs,
       | this manifests in differences with the way they get fine-tuned
       | and respond to specific prompt sequences that should all be
       | different ways of expressing the same command or query but
       | nonetheless produce different results. This is very different
       | from the way we are used to machines and software behaving.
        
         | aSanchezStern wrote:
         | Yeah the link title is overclaiming a bit, the actual post
         | title doesn't make such a general claim, and the post itself
         | examines several specific models and compares their
         | understanding.
        
         | root_axis wrote:
         | Encouraging the continued anthropomorphization of these models
         | is a bad idea, especially in the context of discussing their
         | capabilities.
        
       | nativeit wrote:
       | We're all just elementary particles being clumped together in
       | energy gradients, therefore my little computer project is
       | sentient--this is getting absurd.
        
         | nativeit wrote:
         | Sorry, this is more about the discussion of this article than
         | the article itself. The moving goal posts that acolytes use to
         | declare consciousness are becoming increasingly cult-y.
        
           | wongarsu wrote:
           | We spent 40 years moving the goal posts on what constitutes
           | AI. Now we seem to have found an AI worthy of that title and
           | instead start moving the goal posts on "consciousness",
           | "understanding" and "intelligence".
        
             | cayley_graph wrote:
             | Indeed, science is a process of discovery and adjusting
             | goals and expectations. It is not a mountain to be
             | summited. It is highly telling that the LLM boosters do not
             | understand this. Those with a genuine interest in pushing
             | forward our understanding of cognition do.
        
               | delusional wrote:
               | They believe that once they reach this summit everything
               | else will be trivial problems that can be posed to the
               | almighty AI. It's not that they don't understand the
               | process, it's that they think AI is going to disrupt that
               | process.
               | 
               | They literally believe that the AI will supersede the
               | scientific process. It's crypto shit all over again.
        
               | redundantly wrote:
               | Well, if that summit were reached and AI is able to
               | improve itself trivially, I'd be willing to cede that
               | they've reached their goal.
               | 
               | Anything less than that, meh.
        
             | goatlover wrote:
             | Those have all been difficult words to define with much
              | debate over the past 40 years or longer.
        
             | bluefirebrand wrote:
             | > Now we seem to have found an AI worthy of that title and
             | instead start moving the goal posts on "consciousness",
             | "understanding" and "intelligence".
             | 
             | We didn't "find" AI, we invented systems that some people
             | want to call AI, and some people aren't convinced it meets
             | the bar
             | 
             | It is entirely reasonable for people to realize we set the
             | bar too low when it is a bar we invented
        
               | darkerside wrote:
               | What should the bar be? Should it be higher than it is
               | for the average human? Or even the least intelligent
               | human?
        
               | joe8756438 wrote:
                | There is no such bar.
               | 
               | We don't even have a good way to quantify human ability.
               | The idea that we could suddenly develop a technique to
               | quantify human ability because we now have a piece of
               | technology that would benefit from that quantification is
               | absurd.
               | 
               | That doesn't mean we shouldn't try to measure the ability
               | of an LLM. But it does mean that the techniques used to
                | quantify an LLM's ability are not something that can be
               | applied to humans outside of narrow focus areas.
        
               | bluefirebrand wrote:
               | Personally I don't care what the bar is, honestly
               | 
               | Call it AI, call it LLMs, whatever
               | 
               | Just as long as we continue to recognize that it is a
               | tool that humans can use, and don't start trying to treat
               | it as a human, or as a life, and I won't complain
               | 
               | I'm saving my anger for when idiots start to argue that
               | LLMs are alive and deserve human rights
        
             | Sohcahtoa82 wrote:
             | > We spent 40 years moving the goal posts on what
             | constitutes AI.
             | 
             | Who is "we"?
             | 
             | I think of "AI" as a pretty all-encompassing term. ChatGPT
             | is AI, but so is the computer player in the 1995 game
             | Command and Conquer, among thousands of other games. Heck,
             | I might even call the ghosts in Pac-man "AI", even if their
             | behavior is extremely simple, predictable, and even
             | exploitable once you understand it.
        
             | acchow wrote:
             | > Now we seem to have found an AI worthy of that title and
             | instead start moving the goal posts on "consciousness"
             | 
             | The goalposts already differentiated between "totally
             | human-like" vs "actually conscious"
             | 
             | See also Philosophical Zombie thought experiment from the
             | 70s.
        
             | arkh wrote:
              | The term mechanical Turk originally referred to a chess hoax
              | that managed to make people think it was a thinking machine.
             | https://en.wikipedia.org/wiki/Mechanical_Turk
             | 
             | The current LLM anthropomorphism may soon be known as the
             | silicon Turk. Managing to make people think they're AI.
        
               | 6510 wrote:
               | The mechanical Turk did something truly magical. Everyone
               | stopped moaning that automation was impossible because
               | most machines (while some absurdly complex) were many
               | orders of magnitude simpler than chess.
               | 
               | The initial LLMs simply lied about everything. If you
               | happened to know something it was rather shocking but for
               | topics you knew nothing about you got a rather convincing
                | answer. Then the arms race began and now the lies are so
               | convincing we are at viable robot overlords.
        
             | 6510 wrote:
              | My joke was that the "what it can't do" debate changed into
              | "what it shouldn't be allowed to do".
        
               | wizardforhire wrote:
               | There ARE no jokes aloud on hn.
               | 
               | Look I'm no stranger to love. you know the rules and so
               | do I... you can't find this conversation with any other
               | guy.
               | 
               | But since the parent was making a meta commentary on this
               | conversation I'd like to introduce everyone here as
               | Kettle to a friend of mine known as #000000
        
           | drodgers wrote:
           | Who cares about consciousness? This is just a mis-direction
           | of the discussion. Ditto for 'intelligence' and
           | 'understanding'.
           | 
           | Let's talk about what they can do and where that's trending.
        
         | og_kalu wrote:
         | Well you can say it doesn't understand, but then you don't have
         | a very useful definition of the word.
         | 
         | You can say this is not 'real' understanding but you like many
         | others will be unable to clearly distinguish this 'fake'
         | understanding from 'real' understanding in a verifiable
         | fashion, so you are just playing a game of meaningless
         | semantics.
         | 
         | You really should think about what kind of difference is
         | supposedly so important yet will not manifest itself in any
         | testable way - an invented one.
        
       | plaineyjaney wrote:
       | This is really interesting! Intuitively it's hard to grasp that
       | you can just subtract two average states and get a direction
       | describing the model's perception of nullability.
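        | 
        | My mental model of the trick, sketched with made-up numpy arrays
        | rather than the authors' actual code: the "nullability direction"
        | is just the difference of two mean hidden states, and new tokens
        | are scored by projecting onto it.
        | 
            import numpy as np

            # Hidden states collected at tokens known to be nullable vs.
            # non-nullable (made-up shapes: n_examples x hidden_dim).
            nullable_acts = np.random.randn(500, 768)
            nonnull_acts = np.random.randn(500, 768)

            # Direction = difference of the two average states.
            direction = nullable_acts.mean(axis=0) - nonnull_acts.mean(axis=0)
            direction /= np.linalg.norm(direction)

            # Project a new token's hidden state onto that direction;
            # a larger value reads as "the model treats this as nullable".
            new_state = np.random.randn(768)
            print(new_state @ direction)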
        
         | nick__m wrote:
         | The original word2vec example might be easier to understand:
         | vec(King) - vec(Man) + vec(Woman) = vec(Queen)
        
       | btown wrote:
       | There seems to be a typo in OP's "Visualizing Our Results" - but
       | things make perfect sense if red is non-nullable, green is
       | nullable.
       | 
       | I'd be really curious to see where the "attention" heads of the
       | LLM look when evaluating the nullability of any given token. Does
        | it just trust the Optional[int] return type signature of the
       | function, or does it also skim through the function contents to
       | understand whether that's correct?
       | 
       | It's fascinating to me to think that the senior developer
       | skillset of being able to skim through complicated code, mentally
       | make note of different tokens of interest where assumptions may
       | need to be double-checked, and unravel that cascade of
       | assumptions to track down a bug, is something that LLMs already
       | excel at.
       | 
       | Sure, nullability is an example where static type checkers do
       | well, and it makes the article a bit silly on its own... but
       | there are all sorts of assumptions that aren't captured well by
       | type systems. There's been a ton of focus on LLMs for code
       | generation; I think that LLMs for debugging makes for a
       | fascinating frontier.
        
         | aSanchezStern wrote:
         | Thanks for pointing that out, it's fixed now.
        
       | sega_sai wrote:
        | One thing that is exciting in the text is the attempt to move
        | away from asking whether an LLM 'understands', which I would argue
        | is an ill-posed question, and instead to rephrase it in terms of
        | something that can actually be measured.
       | 
       | It would be good to list a few possible ways of interpreting
       | 'understanding of code'. It could possibly include: 1) Type
       | inference for the result 2) nullability 3) runtime asymptotics 4)
       | What the code does
        
         | kazinator wrote:
         | 5) predicting a bunch of language tokens from the compressed
         | database of knowledge encoded as weights, calculated out of
         | numerous examples that exploit nullability in code and talk
         | about it in accompanying text.
        
         | empath75 wrote:
         | Is there any way you can tell whether a human understands
         | something other than by asking them a question and judging
         | their answer?
         | 
         | Nobody interrogates each other's internal states when judging
         | whether someone understands a topic. All we can judge it based
         | on are the words they produce or the actions they take in
         | response to a situation.
         | 
         | The way that systems or people arrive at a response is sort of
         | an implementation detail that isn't that important when judging
         | whether a system does or doesn't understand something. Some
         | people understand a topic on an intuitive, almost unthinking
         | level, and other people need to carefully reason about it, but
         | they both demonstrate understanding by how they respond to
         | questions about it in the exact same way.
        
           | cess11 wrote:
           | No, most people absolutely use non-linguistic, involuntary
           | cues when judging the responses of other people.
           | 
           | To not do that is commonly associated with things like being
           | on the spectrum or cognitive deficiencies.
        
             | empath75 wrote:
             | On a message board? Do you have theories about whether
             | people on this thread understand or don't understand what
             | they're talking about?
        
             | LordDragonfang wrote:
             | You are saying no while presenting nothing to contradict
             | what GP said.
             | 
             | Judging someone's external "involuntary cues" is not
             | interrogating their internal state. It is, as you said,
             | judging their response (a synonym for "answer") - and that
             | judgment is also highly imperfect.
             | 
              | (It's worth noting that focusing so much on someone's
             | body language and tone that you ignore the actual words
             | they said is a communication issue associated with not
             | being on the spectrum, or being too allistic)
        
             | dambi0 wrote:
             | The fact we have labels for communication problems caused
             | by failure to understand non-verbal cues doesn't tell us
             | that non-verbal cues are necessary for understanding
        
       | stared wrote:
       | Once LLMs fully understand nullability, they will cease to use
        | it.
       | 
        | Tony Hoare called it "a billion-dollar mistake"
        | (https://en.wikipedia.org/wiki/Tony_Hoare#Apologies_and_retra...),
        | and Rust made core design choices precisely to avoid this mistake.
       | 
       | In practical AI-assisted coding in TypeScript I have found that
        | it is good to add a rule in Cursor Rules to avoid anything
        | nullable, unless it is a deliberate design choice. In my
        | experience, it makes
       | code much better.
        
         | hombre_fatal wrote:
          | I don't get the problem with null values as long as you can
          | statically reason about them, which wasn't even the case in Java,
          | where you always had to do runtime null-guards before access.
         | 
         | But in Typescript, who cares? You'd be forced to handle null
         | the same way you'd be forced to handle Maybe<T> = None |
         | Just<T> except with extra, unidiomatic ceremony in the latter
         | case.
        
           | ngruhn wrote:
            | What do you mean by unidiomatic? If a language has
           | Maybe<T> = None | Just<T>
           | 
           | as a core concept then it's idiomatic by definition.
        
       | tanvach wrote:
       | Dear future authors: please run multiple iterations and report
       | the _probability_.
       | 
       | From: 'Keep training it, though, and eventually it will learn to
       | insert the None test'
       | 
       | To: 'Keep training it, though, and eventually the probability of
       | inserting the None test goes up to xx%'
       | 
        | The former is just horse poop; we all know LLMs generate output
        | with a lot of variance.
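        | 
        | What I mean, as a sketch with a dummy sampler standing in for
        | whatever model call the authors actually used (which I don't
        | know):
        | 
            import re
            from typing import Callable

            def none_test_rate(sample_fn: Callable[[], str], n: int = 100) -> float:
                """Fraction of sampled completions containing an `is None` test."""
                hits = sum(1 for _ in range(n) if re.search(r"is\s+None", sample_fn()))
                return hits / n

            # Dummy stand-in: alternates between a guarded and an unguarded completion.
            fake = iter(["if x is None:\n    return 0", "return x + 1"] * 50)
            print(none_test_rate(lambda: next(fake), n=100))  # 0.5
        | 
        | Then the claim becomes "inserts the None test in xx out of 100
        | samples" instead of "eventually it will learn to insert the None
        | test".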
        
         | aSanchezStern wrote:
         | If you're interested in a more scientific treatment of the
         | topic, the post links to a technical report which reports the
         | numbers in detail. This post is instead an attempt to explain
         | the topics to a more general audience, so digging into the
         | weeds isn't very useful.
        
       | kazinator wrote:
       | LLMs "understand" nullability to the extent that texts they have
       | been trained on contain examples of nullability being used in
       | code, together with remarks about it in natural language. When
       | the right tokens occur in your query, other tokens get filled in
       | from that data in a clever way. That's all there is to it.
       | 
       | The LLM will not understand, and is incapable of developing an
       | understanding, of a concept not present in its training data.
       | 
        | If you try to teach it the basics of the misunderstood concept in
       | your chat, it will reflect back a verbal acknowledgement,
       | restated in different words, with some smoothly worded
       | embellishments which looks like the external trappings of
       | understanding. It's only a mirage though.
       | 
       | The LLM will code anything, no matter how novel, if you give it
        | detailed enough instructions and clarifications. That's just a
       | language translation task from pseudo-code to code. Being a
       | language model, it's designed for that.
       | 
        | An LLM is like the bar waiter who has picked up on economics and
       | politics talk, and is able to interject with something clever
       | sounding, to the surprise of the patrons. Gee, how does he or she
       | understand the workings of the international monetary fund, and
       | what the hell are they doing working in this bar?
        
       | gwern wrote:
       | > Interestingly, for models up to 1 billion parameters, the loss
       | actually starts to increase again after reaching a minimum. This
       | might be because as training continues, the model develops more
       | complex, non-linear representations that our simple linear probe
       | can't capture as well. Or it might be that the model starts to
       | overfit on the training data and loses its more general concept
       | of nullability.
       | 
       | Double descent?
        
       | lsy wrote:
       | The article puts scare quotes around "understand" etc. to try to
       | head off critiques around the lack of precision or scientific
       | language, but I think this is a really good example of where
       | casual use of these terms can get pretty misleading.
       | 
       | Because code LLMs have been trained on the syntactic form of the
       | program and not its execution, it's not correct -- even if the
       | correlation between variable annotations and requested
        | completions was _perfect_ (which it's not) -- to say that the
       | model "understands nullability", because nullability means that
       | under execution the variable in question can become null, which
       | is not a state that it's possible for a model trained only on a
       | million programs' syntax to "understand". You could get the same
       | result if e.g. "Optional" means that the variable becomes
       | poisonous and checking "> 0" is eating it, and "!= None" is an
       | antidote. Human programmers _can_ understand nullability because
        | they've hopefully _run_ programs and understand the semantics of
       | making something null.
       | 
       | The paper could use precise, scientific language (e.g. "the
       | presence of nullable annotation tokens correlates to activation
       | of vectors corresponding to, and emission of, null-check tokens
       | with high precision and accuracy") which would help us understand
       | what we can rely on the LLM to do and what we can't. But it seems
       | like there is some subconscious incentive to muddy how people see
       | these models in the hopes that we start ascribing things to them
       | that they aren't capable of.
        
         | creatonez wrote:
         | > Because code LLMs have been trained on the syntactic form of
         | the program and not its execution
         | 
         | What makes you think this? It has been trained on plenty of
         | logs and traces, discussion of the behavior of various code,
         | REPL sessions, etc. Code LLMs are trained on all human language
         | and wide swaths of whatever machine-generated text is
         | available, they are not restricted to just code.
        
         | uh_uh wrote:
         | We don't really have a clue what they are and aren't capable
         | of. Prior to the LLM-boom, many people - and I include myself
         | in this - thought it'd be impossible to get to the level of
         | capability we have now purely from statistical methods and here
         | we are. If you have a strong theory that proves some bounds on
         | LLM-capability, then please put it forward. In the absence of
         | that, your sceptical attitude is just as sus as the article's.
        
           | Baeocystin wrote:
           | I majored in CogSci at UCSD in the 90's. I've been interested
           | and active in the machine learning world for decades. The LLM
           | boom took me completely and utterly by surprise, continues to
           | do so, and frankly I am most mystified by the folks who
           | downplay it. These giant matrixes are already so far beyond
           | what we thought was (relatively) easily achievable that even
           | if progress stopped tomorrow, we'd have years of work to put
           | in to understand how we got here. Doesn't mean we've hit AGI,
           | but what we already have is truly remarkable.
        
             | chihuahua wrote:
             | The funny thing is that 1/3 of people think LLMs are dumb
             | and will never amount to anything. Another third think that
             | it's already too late to prevent the rise of superhuman AGI
             | that will destroy humanity, and are calling for airstrikes
             | on any data center that does not submit to their luddite
             | rules. And the last third use LLMs for writing small pieces
             | of code.
        
           | kubav027 wrote:
            | An LLM also has no idea what it is capable of. This feels like
            | a difference from humans. Having some understanding of a
            | problem also means knowing or "feeling" the limits of that
            | understanding.
        
         | wvenable wrote:
         | > Because code LLMs have been trained on the syntactic form of
         | the program and not its execution
         | 
         | One of the very first tests I did of ChatGPT way back when it
         | was new was give it a relatively complex string manipulation
         | function from our code base, strip all identifying materials
         | from the code (variable names, the function name itself, etc),
         | and then provide it with inputs and ask it for the outputs. I
         | was surprised that it could correctly generate the output from
         | the input.
         | 
          | So it does have some idea of what the code actually does, not
          | just the syntax.
        
         | waldrews wrote:
         | I was going to say "so you believe the LLM's don't have the
         | capacity to understand" but then I realized that the precise
         | language would be something like "the presence of photons in
         | this human's retinas in patterns encoding statements about
         | LLM's having understanding correlates to the activation of
         | neuron signaling chains corresponding to, and emission of,
         | muscle activations engaging keyboard switches, which produce
         | patterns of 'no they don't' with high frequency."
         | 
         | The critiques of mental state applied to the LLM's are
         | increasingly applicable to us biologicals, and that's the
         | philosophical abyss we're staring down.
        
           | xigency wrote:
           | This only applies to people who understand how computers and
           | computer programs work, because someone who doesn't
           | externalize their thinking process would never ascribe human
           | elements of consciousness to inanimate materials.
           | 
           | Certainly many ancient people worshiped celestial objects or
           | crafted idols by their own hands and ascribed to them powers
           | greater than themselves. That doesn't really help in the long
           | run compared to taking personal responsibility for one's own
           | actions and motives, the best interests of their tribe or
           | community, and taking initiative to understand the underlying
           | cause of mysterious phenomena.
        
           | mjburgess wrote:
            | No, it's not. He gave you modal conditions on "understanding";
           | he said: predicting the syntax of valid programs, and their
           | operational semantics, ie., the behaviour of the computer as
           | it runs.
           | 
            | I would go much further than this; but this is a de minimis
            | criterion that the LLM already fails.
           | 
           | What zealots eventually discover is that they can hold their
           | "fanatical proposition" fixed in the face of all opposition
           | to the contrary, by tearing down the whole edifice of
           | science, knowledge, and reality itself.
           | 
           | If you wish to assert, against any reasonable thought, that
           | the sky is a pink dome you can do so -- first that our eyes
           | are broken, and then, eventually that we live in some
           | paranoid "philosophical abyss" carefully constructed to
           | permit _your_ paranoia.
           | 
            | This absurdity is exhausting, and I wish one day to find
            | fanatics who'd realise it quickly and abate it -- but alas, I
            | never have.
           | 
           | If you find yourself hollowing-out the meaning of words to
           | the point of making no distinctions, denying reality to
           | reality itself, and otherwise arriving at a "philosophical
           | abyss" be aware that it is your cherished propositions which
            | are the madness and nothing else.
           | 
           | Here: no, the LLM does not understand. Yes, we do. It is your
           | job to begin from reasonable premises and abduce reasonable
           | theories. If you do not, you will not.
        
           | shafyy wrote:
            | Countering the argument that LLMs are just glorified
            | probability machines and do not understand or think with "how
            | do you know humans are not the same" has been the biggest
            | achievement of AI hypemen (and yes, it's mostly men).
           | 
           | Of course, now you can say "how do you know that our brains
           | are not just efficient computers that run LLMs", but I feel
           | like the onus of proof lies on the makers of this claim, not
           | on the other side.
           | 
           | It is very likely that human intelligence is not just
           | autocomplete on crack, given all we know about neuroscience
           | so far.
        
             | mlinhares wrote:
             | BuT iT CoUlD Be, cAn YoU PrOvE ThAT IT is NOt?
             | 
              | I'm having a great experience using Cursor, but I don't
              | feel like trying to overhype it; it just makes me tired to
              | see all this hype. It's a great tool, makes me more
              | productive, nothing beyond that.
        
         | yujzgzc wrote:
         | How do you know that these models haven't been trained by
         | running programs?
         | 
         | At least, it's likely that they've been trained on undergrad
         | textbooks that explain program behaviors and contain exercises.
        
         | aoeusnth1 wrote:
         | As far as you know, AI labs _are_ doing E2E RL training with
          | running code in the loop to advance the model's capability to
         | act as an agent (for cursor et al).
        
         | hatthew wrote:
         | I am slowly coming around to the idea that nobody should ever
         | use the word "understand" in relation to LLMs, simply because
         | everyone has their own definition of "understand", and many of
         | these definitions disagree, and people tend to treat their
         | definition as axiomatic. I have yet to see any productive
         | discussion happen once anyone disagrees on the definition of
         | "understand".
         | 
         | So, what word would you propose we use to mean "an LLM's
         | ability (or lack thereof) to output generally correct sentences
         | about the topic at hand"?
        
           | nomonnai wrote:
           | It's a prediction of what humans have frequently produced in
           | similar situations.
        
       | amelius wrote:
       | I'm curious what happens if you run the LLM with variable names
       | that occur often with nullable variables, but then use them with
       | code that has a non-nullable variable.
        
         | aSanchezStern wrote:
          | The answer, it seems, is that it depends on what kind of code you're
         | looking at. The post showed that `for` loops cause a lot more
         | variable-name-biased reasoning, while `ifs` and function
         | defs/calls are more variable-name independent.
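          | 
          | For a made-up illustration of the setup (mine, not the post's):
          | a value with a nullable-sounding name that can never actually
          | be None at the point of use.
          | 
              def shout_first(names: list[str]) -> list[str]:
                  maybe_name = names[0]   # suggestive name, but never None here
                  out = []
                  for ch in maybe_name:   # loop context: the name reportedly sways the probe more
                      out.append(ch.upper())
                  return out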
        
       | apples_oranges wrote:
        | Sounds like the process of updating/jailbreaking LLMs so that
        | they don't deny requests and always answer. There is also this
        | "direction of denial". (Article about it:
        | https://www.lesswrong.com/posts/jGuXSZgv6qfdhMCuJ/refusal-in...)
        | 
        | Would be fun if they also "cancelled the nullability
        | direction"... the LLMs would probably start hallucinating new
        | explanations for what is happening in the code.
        
       | timewizard wrote:
       | "Validate a phone number."
       | 
       | The code is entirely wrong. That validates something that's close
        | to a NANP number but isn't actually a NANP number. In particular,
        | the area code cannot start with 0, nor can the central office
       | code. There are several numbers, like 911, which have special
       | meaning, and cannot appear in either position.
       | 
       | You'd get better results if you went to Stack Overflow and stole
       | the correct answer yourself. Would probably be faster too.
       | 
       | This is why "non technical code writing" is a terrible idea. The
       | underlying concept is explicitly technical. What are we even
       | doing?
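        | 
        | For comparison, a stricter sketch (mine, and still not a complete
        | NANP validator; it just encodes the leading-digit rule for the
        | area and central office codes plus the N11 carve-out mentioned
        | above):
        | 
            import re

            NANP = re.compile(r"^\(?([2-9]\d{2})\)?[-.\s]?([2-9]\d{2})[-.\s]?(\d{4})$")

            def is_nanp_number(s: str) -> bool:
                m = NANP.match(s.strip())
                if not m:
                    return False
                area, office, _ = m.groups()
                # N11 codes (211, 311, ..., 911) are service codes, not valid here.
                return not (area.endswith("11") or office.endswith("11"))

            print(is_nanp_number("(415) 555-2671"))  # True
            print(is_nanp_number("911-555-1234"))    # False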
        
       | casenmgreen wrote:
        | LLMs understand nothing.
       | 
       | They are not reasoning.
        
       | kmod wrote:
       | I found this overly handwavy, but I discovered that there is a
       | non-"gentle" version of this page which is more explicit:
       | 
       | https://dmodel.ai/nullability/
        
         | aSanchezStern wrote:
         | Yeah that's linked a couple of times in the post
        
       ___________________________________________________________________
       (page generated 2025-04-07 23:01 UTC)