[HN Gopher] LLMs understand nullability
___________________________________________________________________
LLMs understand nullability
Author : mattmarcus
Score : 144 points
Date : 2025-04-07 14:52 UTC (8 hours ago)
(HTM) web link (dmodel.ai)
(TXT) w3m dump (dmodel.ai)
| EncomLab wrote:
| This is like claiming a photoresistor-controlled night light
| "understands when it is dark" or that a bimetallic strip
| thermostat "understands temperature". You can say those words,
| and it's syntactically correct but entirely incorrect
| semantically.
| aSanchezStern wrote:
| The post includes this caveat. Depending on your philosophical
| position about sentience you might say that LLMs can't possibly
| "understand" anything, and the post isn't trying to have that
| argument. But to the extent that an LLM can "understand"
| anything, you can study its understanding of nullability.
| keybored wrote:
| People don't use "understand" for machines in science because
| people may or may not believe in the sentience of machines.
| That would be a weird catering to panpsychism.
| nsingh2 wrote:
| Where is the boundary where this becomes semantically correct?
| It's easy for these kinds of discussions to go in circles,
| because nothing is well defined.
| nativeit wrote:
| Hard to define something that science has yet to formally
| outline, and is largely still in the realm of religion.
| EMIRELADERO wrote:
| That depends entirely on whether you believe understanding
| requires consciousness.
|
| I believe that the type of understanding demonstrated here
| doesn't. Consciousness only comes into play when we become
| aware that such understanding has taken place, not in the
| process itself.
| stefl14 wrote:
| Shameless plug of personal blog post, but relevant. Still
| not fully edited, so the writing is a bit scattered, but the
| crux is that we now have a framework for talking about
| consciousness intelligently. It's not as mysterious as in
| the past, considering advances in non-equilibrium
| thermodynamics and the Free Energy Principle in particular.
|
| https://stefanlavelle.substack.com/p/i-am-therefore-i-feel
| robotresearcher wrote:
| You declare this very plainly without evidence or argument, but
| this is an age-old controversial issue. It's not self-evident
| to everyone, including philosophers.
| mubou wrote:
| It's not age-old nor is it controversial. LLMs aren't
| intelligent by any stretch of the imagination. Each
| word/token is chosen as that which is statistically most
| likely to follow the previous. There is no capability for
| understanding in the design of an LLM. It's not a matter of
| opinion; this just isn't how an LLM works.
|
| Any comparison to the human brain is missing the point that
| an LLM only simulates one small part, and that's notably
| _not_ the frontal lobe. That's required for intelligence,
| reasoning, self-awareness, etc.
|
| So, no, it's not a question of philosophy. For an AI to enter
| that realm, it would need to be more than just an LLM with
| some bells and whistles; an LLM _plus_ something else,
| perhaps, something fundamentally different which does not yet
| currently exist.
| aSanchezStern wrote:
| Many people don't think we have any good evidence that our
| brains aren't essentially the same thing: a stochastic
| statistical model that produces outputs based on inputs.
| SJC_Hacker wrote:
| That's probably the case 99% of the time.
|
| But that 1% is pretty important.
|
| For example, they are dismal at math problems that aren't
| just slight variations of problems they've seen before.
|
| Here's one by blackandredpenn where ChatGPT insisted the
| solution to a problem that could be solved by high school /
| talented middle school students was correct, even after
| attempts to convince it that it was wrong.
| https://youtu.be/V0jhP7giYVY?si=sDE2a4w7WpNwp6zU&t=837
|
| Rewind earlier to see the real answer
| LordDragonfang wrote:
| > For example, they are dismal at math problems that
| aren't just slight variations of problems they've seen
| before.
|
| I know plenty of teachers who would describe their
| students the exact same way. The difference is mostly one
| of magnitude (of delta in competence), not quality.
|
| Also, I think it's important to note that by "could be
| solved by high school / talented middle school students"
| you mean "specifically designed to challenge the top ~1%
| of them". Because if you say "LLMs _only_ manage to beat
| 99% of middle schoolers at math", the claim seems a
| whole lot different.
| jquery wrote:
| ChatGPT o1 pro mode solved it on the first try, after 8
| minutes and 53 seconds of "thinking":
|
| https://chatgpt.com/share/67f40cd2-d088-8008-acd5-fe9a9784f3...
| SJC_Hacker wrote:
| The problem is how do you know that it's correct ...
|
| A human would probably say "I don't know how to solve the
| problem". But the free version of ChatGPT is confidently
| wrong ..
| mubou wrote:
| Of course, you're right. Neural networks mimic exactly
| that after all. I'm certain we'll see an ML model
| developed someday that fully mimics the human brain. But
| my point is an LLM isn't that; it's a language model
| only. I know it can _seem_ intelligent sometimes, but it's
| important to understand what it's actually doing
| not ascribe feelings to it that don't exist in reality.
|
| Too many people these days are forgetting this key point
| and putting a _dangerous amount_ of faith in ChatGPT etc.
| as a result. I've seen DOCTORS using ChatGPT for
| diagnosis. Ignorance is scary.
| nativeit wrote:
| Care to share any of this good evidence?
| goatlover wrote:
| Do biologists and neuroscientists not have any good
| evidence or is that just computer scientists and
| engineers speaking outside of their field of expertise?
| There's always been this danger of taking computer and
| brain comparisons too literally.
| root_axis wrote:
| If you're willing to torture the analogy you can find a
| way to describe literally anything as a system of outputs
| based on inputs. In the case of the brain to LLM
| comparison, people are inclined to do it because they're
| eager to anthropomorphize something that produces text
| they can interpret as a speaker, but it's totally
| incorrect to suggest that our brains are "essentially the
| same thing" as LLMs. The comparison is specious even on a
| surface level. It's like saying that birds and planes are
| "essentially the same thing" because flight was achieved
| by modeling planes after birds.
| gwd wrote:
| > Each word/token is chosen as that which is statistically
| most likely to follow the previous.
|
| The best way to predict the weather is to have a model
| which approximates the weather. The best way to predict the
| results of a physics simulation is to have a model which
| approximates the physical bodies in question. The best way
| to predict what word a human is going to write next is to
| have a model that approximates human thought.
| mubou wrote:
| LLMs don't approximate human _thought_, though. They
| approximate _language_. That's it.
|
| Please, I'm begging you, go read some papers and watch
| some videos about machine learning and how LLMs actually
| work. It is not "thinking."
|
| I fully realize neural networks _can_ approximate human
| thought -- but we are not there yet, and when we do get
| there, it will be something that is not an LLM, because
| an LLM is not capable of that -- it's not designed to
| be.
| handfuloflight wrote:
| Isn't language expressed thought?
| fwip wrote:
| Language can be a (lossy) serialization of thought, yes.
| But language is not thought, nor inherently produced by
| thought. Most people agree that a process randomly
| producing grammatically correct sentences is not
| thinking.
| Sohcahtoa82 wrote:
| > it will be something that is not an LLM
|
| I think it will be very similar in architecture.
|
| Artificial neural networks already are approximating how
| neurons in a brain work; it's just at a scale that's
| several orders of magnitude smaller.
|
| Our limiting factor for reaching brain-like intelligence
| via ANN is probably more of a hardware limitation. We
| would need over 100 TB to store the weights for the
| neurons, not to mention the ridiculous amount of compute
| to run it.
| codedokode wrote:
| > not to mention the ridiculous amount of compute to run
| it.
|
| How does the brain compute the weights then? Or maybe
| your assumption that the brain is equivalent to a
| mathematical NN is wrong?
| gwd wrote:
| > LLMs don't approximate human thought, though.
| ...Please, I'm begging you, go read some papers and watch
| some videos about machine learning and how LLMs actually
| work.
|
| I know how LLMs work; so let me beg you in return, listen
| to me for a second.
|
| You have a _theoretical-only_ argument: LLMs do text
| prediction, and therefore it is not possible for them to
| actually think. And since it's not possible for them to
| actually think, you don't need to consider any other
| evidence.
|
| I'm telling you, there's a flaw in your argument: In
| actuality, _the best way to do text prediction is to
| think_. An LLM that could actually think would be able to
| do text prediction better than an LLM that can't
| actually think; and the better an LLM is able to
| approximate human thought, the better its predictions
| will be. The fact that they're predicting text in no way
| proves that there's no thinking going on.
|
| Now, that doesn't prove that LLMs _actually are_
| thinking; but it does mean that they _might_ be thinking.
| And so you should think about how you would know if
| they're actually thinking or not.
| wongarsu wrote:
| That argument only really applies to base models. After
| that we train them to give correct and helpful answers, not
| just answers that are statistically probable in the
| training data.
|
| But even if we ignore that subtlety, it's not obvious that
| training a model to predict the next token doesn't lead to
| a world model and an ability to apply it. If you gave a
| human 10 physics books and told them that in a month they
| have a test where they have to complete sentences from the
| book, which strategy do you think is more successful:
| trying to memorize the books word by word or trying to
| understand the content?
|
| The argument that understanding is just an advanced form of
| compression far predates LLMs. LLMs clearly lack many of
| the faculties humans have. Their only concept of a
| physical world comes from text descriptions and stories.
| They have a very weird form of memory, no real agency (they
| only act when triggered) and our attempts at replicating an
| internal monologue are very crude. But _understanding_ is
| one thing they may well have, and if the current generation
| of models doesn't have it, the next generation might.
| robotresearcher wrote:
| The thermostat analogy, and equivalents, are age-old.
| fwip wrote:
| Philosophers are often the last people to consider something
| to be settled. There's very little in the universe that they
| can all agree is true.
| fallingknife wrote:
| Or like saying the photoreceptors in your retina understand
| when it's dark. Or like claiming the temperature sensitive ion
| channels in your peripheral nervous system understand how hot
| it is.
| throwuxiytayq wrote:
| Or like saying that the tangled web of neurons receiving
| signals from these understands anything about these subjects.
| nativeit wrote:
| Describing the mechanics of nervous impulses != describing
| consciousness.
| LordDragonfang wrote:
| Which is the point, since describing the mechanics of LLM
| architectures does not inherently grant knowledge of whether
| or not it is "conscious"
| throw4847285 wrote:
| This is a fallacy I've seen enough on here that I think it
| needs a name. Maybe the fallacy of Theoretical Reducibility
| (doesn't really roll off the tongue)?
|
| When challenged, everybody becomes an eliminative materialist
| even if it's inconsistent with their other views. It's very
| weird.
| dleeftink wrote:
| I'd say the opposite also applies: to the extent LLMs have an
| internal language, we understand very little of it.
| thmorriss wrote:
| very cool.
| gopiandcode wrote:
| The visualisation of how the model sees nullability was
| fascinating.
|
| I'm curious if this probing of nullability could be composed with
| other LLM/ML-based python-typing tools to improve their accuracy.
|
| Maybe even focusing on interfaces such as nullability rather than
| precise types would work better with a duck-typed language like
| python than inferring types directly (i.e. we don't really care if
| a variable is an int specifically, but rather that it supports
| __add__ or __sub__ etc., i.e. that it is numeric).
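|
| A rough sketch of what I mean by targeting interfaces rather
| than concrete types (the names here are illustrative only):
|
|     from typing import Protocol
|
|     class SupportsArithmetic(Protocol):
|         # structural ("duck") interface: anything that defines
|         # __add__ and __sub__ qualifies, int or not
|         def __add__(self, other): ...
|         def __sub__(self, other): ...
|
|     def total(x: SupportsArithmetic, y: SupportsArithmetic):
|         return x + y  # ints, floats, Decimal, numpy arrays all fit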
| qsort wrote:
| > we don't really care if a variable is an int specifically,
| but rather that it supports __add__ or __sub__ etc., i.e.
| that it is numeric
|
| my brother in christ, you invented Typescript.
|
| (I agree on the visualization, it's very cool!)
| gopiandcode wrote:
| I am more than aware of Typescript, you seem to have
| misunderstood my point: I was not describing a particular
| type system (of which there have been many of this ilk) but
| rather conjecturing that targeting interfaces specifically
| might make LLM-based code generation/type inference more
| effective.
| qsort wrote:
| Yeah, I read that comment wrong. I didn't mean to come off
| like that. Sorry.
| jayd16 wrote:
| Why not just use a language with checked nullability? What's
| the point of an LLM using a duck typing language anyway?
| aSanchezStern wrote:
| This post actually mostly uses the subset of Python where
| nullability _is_ checked. The point is not to introduce new
| LLM capabilities, but to understand more about how existing
| LLMs are reasoning about code.
| nonameiguess wrote:
| As every fifth thread becomes some discussion of LLM
| capabilities, I think we need to shift the way we talk about this
| to be less like how we talk about software and more like how we
| talk about people.
|
| "LLM" is a valid category of thing in the world, but it's not a
| thing like Microsoft Outlook that has well-defined capabilities
| and limitations. It's frustrating reading these discussions that
| constantly devolve into one person saying they tried something
| that either worked or didn't, then 40 replies from other people
| saying they got the opposite result, possibly with a different
| model, different version, slight prompt altering, whatever it is.
|
| LLMs possibly have the capability to understand nullability, but
| that doesn't mean every instance of every model will consistently
| understand that or anything else. This is the same way humans
| operate. Humans can run a 4-minute mile. Humans can run a
| 10-second 100 meter dash. Humans can develop and prove novel math
| theorems. But not all humans, not all the time, performance
| depends upon conditions, timing, luck, and there has probably
| never been a single human who can do all three. It takes practice
| in one specific discipline to get really good at that, and this
| practice competes with or even limits other abilities. For LLMs,
| this manifests in differences with the way they get fine-tuned
| and respond to specific prompt sequences that should all be
| different ways of expressing the same command or query but
| nonetheless produce different results. This is very different
| from the way we are used to machines and software behaving.
| aSanchezStern wrote:
| Yeah, the link title is overclaiming a bit; the actual post
| title doesn't make such a general claim, and the post itself
| examines several specific models and compares their
| understanding.
| root_axis wrote:
| Encouraging the continued anthropomorphization of these models
| is a bad idea, especially in the context of discussing their
| capabilities.
| nativeit wrote:
| We're all just elementary particles being clumped together in
| energy gradients, therefore my little computer project is
| sentient--this is getting absurd.
| nativeit wrote:
| Sorry, this is more about the discussion of this article than
| the article itself. The moving goal posts that acolytes use to
| declare consciousness are becoming increasingly cult-y.
| wongarsu wrote:
| We spent 40 years moving the goal posts on what constitutes
| AI. Now we seem to have found an AI worthy of that title and
| instead start moving the goal posts on "consciousness",
| "understanding" and "intelligence".
| cayley_graph wrote:
| Indeed, science is a process of discovery and adjusting
| goals and expectations. It is not a mountain to be
| summited. It is highly telling that the LLM boosters do not
| understand this. Those with a genuine interest in pushing
| forward our understanding of cognition do.
| delusional wrote:
| They believe that once they reach this summit everything
| else will be trivial problems that can be posed to the
| almighty AI. It's not that they don't understand the
| process, it's that they think AI is going to disrupt that
| process.
|
| They literally believe that the AI will supersede the
| scientific process. It's crypto shit all over again.
| redundantly wrote:
| Well, if that summit were reached and AI is able to
| improve itself trivially, I'd be willing to cede that
| they've reached their goal.
|
| Anything less than that, meh.
| goatlover wrote:
| Those have all been difficult words to define with much
| debate over the past 40 years or longer.
| bluefirebrand wrote:
| > Now we seem to have found an AI worthy of that title and
| instead start moving the goal posts on "consciousness",
| "understanding" and "intelligence".
|
| We didn't "find" AI, we invented systems that some people
| want to call AI, and some people aren't convinced it meets
| the bar
|
| It is entirely reasonable for people to realize we set the
| bar too low when it is a bar we invented
| darkerside wrote:
| What should the bar be? Should it be higher than it is
| for the average human? Or even the least intelligent
| human?
| joe8756438 wrote:
| there is no such bar.
|
| We don't even have a good way to quantify human ability.
| The idea that we could suddenly develop a technique to
| quantify human ability because we now have a piece of
| technology that would benefit from that quantification is
| absurd.
|
| That doesn't mean we shouldn't try to measure the ability
| of an LLM. But it does mean that the techniques used to
| quantify an LLM's ability are not something that can be
| applied to humans outside of narrow focus areas.
| bluefirebrand wrote:
| Personally I don't care what the bar is, honestly
|
| Call it AI, call it LLMs, whatever
|
| Just as long as we continue to recognize that it is a
| tool that humans can use, and don't start trying to treat
| it as a human, or as a life, and I won't complain
|
| I'm saving my anger for when idiots start to argue that
| LLMs are alive and deserve human rights
| Sohcahtoa82 wrote:
| > We spent 40 years moving the goal posts on what
| constitutes AI.
|
| Who is "we"?
|
| I think of "AI" as a pretty all-encompassing term. ChatGPT
| is AI, but so is the computer player in the 1995 game
| Command and Conquer, among thousands of other games. Heck,
| I might even call the ghosts in Pac-man "AI", even if their
| behavior is extremely simple, predictable, and even
| exploitable once you understand it.
| acchow wrote:
| > Now we seem to have found an AI worthy of that title and
| instead start moving the goal posts on "consciousness"
|
| The goalposts already differentiated between "totally
| human-like" vs "actually conscious"
|
| See also Philosophical Zombie thought experiment from the
| 70s.
| arkh wrote:
| The original Mechanical Turk was a chess hoax that managed
| to make people think it was a thinking machine.
| https://en.wikipedia.org/wiki/Mechanical_Turk
|
| The current LLM anthropomorphism may soon be known as the
| silicon Turk. Managing to make people think they're AI.
| 6510 wrote:
| The mechanical Turk did something truly magical. Everyone
| stopped moaning that automation was impossible because
| most machines (while some absurdly complex) were many
| orders of magnitude simpler than chess.
|
| The initial LLMs simply lied about everything. If you
| happened to know something it was rather shocking but for
| topics you knew nothing about you got a rather convincing
| answer. Then the arms race began and now the lies are so
| convincing we are at viable robot overlords.
| 6510 wrote:
| My joke was that the "what it can't do" debate changed into
| "what it shouldn't be allowed to do".
| wizardforhire wrote:
| There ARE no jokes aloud on hn.
|
| Look I'm no stranger to love. you know the rules and so
| do I... you can't find this conversation with any other
| guy.
|
| But since the parent was making a meta commentary on this
| conversation I'd like to introduce everyone here as
| Kettle to a friend of mine known as #000000
| drodgers wrote:
| Who cares about consciousness? This is just a mis-direction
| of the discussion. Ditto for 'intelligence' and
| 'understanding'.
|
| Let's talk about what they can do and where that's trending.
| og_kalu wrote:
| Well you can say it doesn't understand, but then you don't have
| a very useful definition of the word.
|
| You can say this is not 'real' understanding but you like many
| others will be unable to clearly distinguish this 'fake'
| understanding from 'real' understanding in a verifiable
| fashion, so you are just playing a game of meaningless
| semantics.
|
| You really should think about what kind of difference is
| supposedly so important yet will not manifest itself in any
| testable way - an invented one.
| plaineyjaney wrote:
| This is really interesting! Intuitively it's hard to grasp that
| you can just subtract two average states and get a direction
| describing the model's perception of nullability.
| nick__m wrote:
| The original word2vec example might be easier to understand:
| vec(King) - vec(Man) + vec(Woman) = vec(Queen)
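|
| A rough sketch of the same difference-of-means trick for the
| nullability direction (shapes and data here are illustrative):
|
|     import numpy as np
|
|     # hidden states for tokens labeled nullable vs. non-nullable
|     # (n_tokens x hidden_dim; real states come from the model)
|     nullable_acts = np.random.randn(100, 768)
|     nonnullable_acts = np.random.randn(100, 768)
|
|     # "nullability direction" = difference of the two mean states
|     direction = nullable_acts.mean(axis=0) - nonnullable_acts.mean(axis=0)
|     direction /= np.linalg.norm(direction)
|
|     # score a new activation by projecting onto that direction
|     def nullability_score(h: np.ndarray) -> float:
|         return float(h @ direction)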
| btown wrote:
| There seems to be a typo in OP's "Visualizing Our Results" - but
| things make perfect sense if red is non-nullable, green is
| nullable.
|
| I'd be really curious to see where the "attention" heads of the
| LLM look when evaluating the nullability of any given token. Does
| it just trust the Optional[int] return type signature of the
| function, or does it also skim through the function contents to
| understand whether that's correct?
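|
| For instance (a made-up example), signature and body can disagree:
|
|     from typing import Optional
|
|     # the annotation says Optional[int], but the body can never
|     # actually return None, so trusting the signature alone would
|     # overestimate nullability
|     def first_or_default(xs: list[int], default: int = 0) -> Optional[int]:
|         return xs[0] if xs else default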
|
| It's fascinating to me to think that the senior developer
| skillset of being able to skim through complicated code, mentally
| make note of different tokens of interest where assumptions may
| need to be double-checked, and unravel that cascade of
| assumptions to track down a bug, is something that LLMs already
| excel at.
|
| Sure, nullability is an example where static type checkers do
| well, and it makes the article a bit silly on its own... but
| there are all sorts of assumptions that aren't captured well by
| type systems. There's been a ton of focus on LLMs for code
| generation; I think that LLMs for debugging makes for a
| fascinating frontier.
| aSanchezStern wrote:
| Thanks for pointing that out, it's fixed now.
| sega_sai wrote:
| One thing that is exciting in the text is an attempt to go away
| from describing whether LLM 'understands' which I would argue an
| ill posed question, but instead rephrase it in terms of something
| that can actually be measured.
|
| It would be good to list a few possible ways of interpreting
| 'understanding of code'. It could possibly include: 1) Type
| inference for the result 2) nullability 3) runtime asymptotics 4)
| What the code does
| kazinator wrote:
| 5) predicting a bunch of language tokens from the compressed
| database of knowledge encoded as weights, calculated out of
| numerous examples that exploit nullability in code and talk
| about it in accompanying text.
| empath75 wrote:
| Is there any way you can tell whether a human understands
| something other than by asking them a question and judging
| their answer?
|
| Nobody interrogates each other's internal states when judging
| whether someone understands a topic. All we can judge it based
| on are the words they produce or the actions they take in
| response to a situation.
|
| The way that systems or people arrive at a response is sort of
| an implementation detail that isn't that important when judging
| whether a system does or doesn't understand something. Some
| people understand a topic on an intuitive, almost unthinking
| level, and other people need to carefully reason about it, but
| they both demonstrate understanding by how they respond to
| questions about it in the exact same way.
| cess11 wrote:
| No, most people absolutely use non-linguistic, involuntary
| cues when judging the responses of other people.
|
| To not do that is commonly associated with things like being
| on the spectrum or cognitive deficiencies.
| empath75 wrote:
| On a message board? Do you have theories about whether
| people on this thread understand or don't understand what
| they're talking about?
| LordDragonfang wrote:
| You are saying no while presenting nothing to contradict
| what GP said.
|
| Judging someone's external "involuntary cues" is not
| interrogating their internal state. It is, as you said,
| judging their response (a synonym for "answer") - and that
| judgment is also highly imperfect.
|
| (It's worth noting that focusing so much on someone's
| body language and tone that you ignore the actual words
| they said is a communication issue associated with not
| being on the spectrum, or being too allistic)
| dambi0 wrote:
| The fact we have labels for communication problems caused
| by failure to understand non-verbal cues doesn't tell us
| that non-verbal cues are necessary for understanding
| stared wrote:
| Once LLMs fully understand nullability, they will cease to use
| it.
|
| Tony Hoare called it "a billion-dollar mistake"
| (https://en.wikipedia.org/wiki/Tony_Hoare#Apologies_and_retra...),
| and Rust made core design choices precisely to avoid this mistake.
|
| In practical AI-assisted coding in TypeScript I have found that
| it is good to add in Cursor Rules to avoid anything nullable,
| unless it is a well-designed choice. In my experience, it makes
| code much better.
| hombre_fatal wrote:
| I don't get the problem with null values as long as you can
| statically reason about them, which wasn't even the case in
| Java, where you always had to do runtime null guards before
| access.
|
| But in Typescript, who cares? You'd be forced to handle null
| the same way you'd be forced to handle Maybe<T> = None |
| Just<T> except with extra, unidiomatic ceremony in the latter
| case.
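|
| Same idea in the article's Python setting (a small sketch; mypy
| or a similar checker is assumed):
|
|     from typing import Optional
|
|     def shout(name: Optional[str]) -> str:
|         # the checker rejects name.upper() until the None case is
|         # narrowed away -- the same ceremony as handling a Maybe
|         if name is None:
|             return "HELLO, STRANGER"
|         return name.upper()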
| ngruhn wrote:
| What do you mean by unidiomatic? If a language has
| Maybe<T> = None | Just<T>
|
| as a core concept, then it's idiomatic by definition.
| tanvach wrote:
| Dear future authors: please run multiple iterations and report
| the _probability_.
|
| From: 'Keep training it, though, and eventually it will learn to
| insert the None test'
|
| To: 'Keep training it, though, and eventually the probability of
| inserting the None test goes up to xx%'
|
| The former is just horse poop; we all know LLMs generate high
| variance in their output.
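|
| i.e. something like this (a hypothetical sketch, where `complete`
| stands in for whatever sampling call the authors use):
|
|     def none_check_rate(prompt: str, complete, n: int = 100) -> float:
|         # sample n completions and report the fraction that
|         # actually contain the None test
|         completions = [complete(prompt) for _ in range(n)]
|         hits = sum("is None" in c or "== None" in c for c in completions)
|         return hits / n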
| aSanchezStern wrote:
| If you're interested in a more scientific treatment of the
| topic, the post links to a technical report which reports the
| numbers in detail. This post is instead an attempt to explain
| the topics to a more general audience, so digging into the
| weeds isn't very useful.
| kazinator wrote:
| LLMs "understand" nullability to the extent that texts they have
| been trained on contain examples of nullability being used in
| code, together with remarks about it in natural language. When
| the right tokens occur in your query, other tokens get filled in
| from that data in a clever way. That's all there is to it.
|
| The LLM will not understand, and is incapable of developing an
| understanding, of a concept not present in its training data.
|
| If you try to teach it the basics of the misunderstood concept
| in your chat, it will reflect back a verbal acknowledgement,
| restated in different words, with some smoothly worded
| embellishments which look like the external trappings of
| understanding. It's only a mirage though.
|
| The LLM will code anything, no matter how novel, if you give it
| detailed enough instructions and clarifications. That's just a
| language translation task from pseudo-code to code. Being a
| language model, it's designed for that.
|
| An LLM is like a bar waiter who has picked up on economics and
| politics talk, and is able to interject with something clever
| sounding, to the surprise of the patrons. Gee, how does he or she
| understand the workings of the international monetary fund, and
| what the hell are they doing working in this bar?
| gwern wrote:
| > Interestingly, for models up to 1 billion parameters, the loss
| actually starts to increase again after reaching a minimum. This
| might be because as training continues, the model develops more
| complex, non-linear representations that our simple linear probe
| can't capture as well. Or it might be that the model starts to
| overfit on the training data and loses its more general concept
| of nullability.
|
| Double descent?
| lsy wrote:
| The article puts scare quotes around "understand" etc. to try to
| head off critiques around the lack of precision or scientific
| language, but I think this is a really good example of where
| casual use of these terms can get pretty misleading.
|
| Because code LLMs have been trained on the syntactic form of the
| program and not its execution, it's not correct -- even if the
| correlation between variable annotations and requested
| completions was _perfect_ (which it's not) -- to say that the
| model "understands nullability", because nullability means that
| under execution the variable in question can become null, which
| is not a state that it's possible for a model trained only on a
| million programs' syntax to "understand". You could get the same
| result if e.g. "Optional" means that the variable becomes
| poisonous and checking "> 0" is eating it, and "!= None" is an
| antidote. Human programmers _can_ understand nullability because
| they've hopefully _run_ programs and understand the semantics of
| making something null.
|
| The paper could use precise, scientific language (e.g. "the
| presence of nullable annotation tokens correlates to activation
| of vectors corresponding to, and emission of, null-check tokens
| with high precision and accuracy") which would help us understand
| what we can rely on the LLM to do and what we can't. But it seems
| like there is some subconscious incentive to muddy how people see
| these models in the hopes that we start ascribing things to them
| that they aren't capable of.
| creatonez wrote:
| > Because code LLMs have been trained on the syntactic form of
| the program and not its execution
|
| What makes you think this? It has been trained on plenty of
| logs and traces, discussion of the behavior of various code,
| REPL sessions, etc. Code LLMs are trained on all human language
| and wide swaths of whatever machine-generated text is
| available, they are not restricted to just code.
| uh_uh wrote:
| We don't really have a clue what they are and aren't capable
| of. Prior to the LLM-boom, many people - and I include myself
| in this - thought it'd be impossible to get to the level of
| capability we have now purely from statistical methods and here
| we are. If you have a strong theory that proves some bounds on
| LLM-capability, then please put it forward. In the absence of
| that, your sceptical attitude is just as sus as the article's.
| Baeocystin wrote:
| I majored in CogSci at UCSD in the 90's. I've been interested
| and active in the machine learning world for decades. The LLM
| boom took me completely and utterly by surprise, continues to
| do so, and frankly I am most mystified by the folks who
| downplay it. These giant matrices are already so far beyond
| what we thought was (relatively) easily achievable that even
| if progress stopped tomorrow, we'd have years of work to put
| in to understand how we got here. Doesn't mean we've hit AGI,
| but what we already have is truly remarkable.
| chihuahua wrote:
| The funny thing is that 1/3 of people think LLMs are dumb
| and will never amount to anything. Another third think that
| it's already too late to prevent the rise of superhuman AGI
| that will destroy humanity, and are calling for airstrikes
| on any data center that does not submit to their luddite
| rules. And the last third use LLMs for writing small pieces
| of code.
| kubav027 wrote:
| An LLM also has no idea what it is capable of. This feels like
| difference to humans. Having some understanding of the
| problem also means knowing or "feeling" the limits of that
| understanding.
| wvenable wrote:
| > Because code LLMs have been trained on the syntactic form of
| the program and not its execution
|
| One of the very first tests I did of ChatGPT way back when it
| was new was give it a relatively complex string manipulation
| function from our code base, strip all identifying materials
| from the code (variable names, the function name itself, etc),
| and then provide it with inputs and ask it for the outputs. I
| was surprised that it could correctly generate the output from
| the input.
|
| So it does have some idea of what the code actually does not
| just syntax.
| waldrews wrote:
| I was going to say "so you believe the LLMs don't have the
| capacity to understand" but then I realized that the precise
| language would be something like "the presence of photons in
| this human's retinas in patterns encoding statements about
| LLMs having understanding correlates to the activation of
| neuron signaling chains corresponding to, and emission of,
| muscle activations engaging keyboard switches, which produce
| patterns of 'no they don't' with high frequency."
|
| The critiques of mental state applied to LLMs are
| increasingly applicable to us biologicals, and that's the
| philosophical abyss we're staring down.
| xigency wrote:
| This only applies to people who understand how computers and
| computer programs work, because someone who doesn't
| externalize their thinking process would never ascribe human
| elements of consciousness to inanimate materials.
|
| Certainly many ancient people worshiped celestial objects or
| crafted idols by their own hands and ascribed to them powers
| greater than themselves. That doesn't really help in the long
| run compared to taking personal responsibility for one's own
| actions and motives, the best interests of their tribe or
| community, and taking initiative to understand the underlying
| cause of mysterious phenomena.
| mjburgess wrote:
| No it's not. He gave you modal conditions on "understanding";
| he said: predicting the syntax of valid programs, and their
| operational semantics, i.e., the behaviour of the computer as
| it runs.
|
| I would go much further than this; but this is a de minimis
| criterion that the LLM already fails.
|
| What zealots eventually discover is that they can hold their
| "fanatical proposition" fixed in the face of all opposition
| to the contrary, by tearing down the whole edifice of
| science, knowledge, and reality itself.
|
| If you wish to assert, against any reasonable thought, that
| the sky is a pink dome you can do so -- first that our eyes
| are broken, and then, eventually that we live in some
| paranoid "philosophical abyss" carefully constructed to
| permit _your_ paranoia.
|
| This absurdity is exhausting, and I'd wish one day to find
| fanatics who'd realise it quickly and abate it -- but alas, I
| have never.
|
| If you find yourself hollowing-out the meaning of words to
| the point of making no distinctions, denying reality to
| reality itself, and otherwise arriving at a "philosophical
| abyss" be aware that it is your cherished propositions which
| are the madness and nothing else.
|
| Here: no, the LLM does not understand. Yes, we do. It is your
| job to begin from reasonable premises and abduce reasonable
| theories. If you do not, you will not.
| shafyy wrote:
| Countering the argument that LLMs are just glorified
| probability machines and do not understand or think with "how
| do you know humans are not the same" has been the biggest
| achievement of AI hypemen (and yes, it's mostly men).
|
| Of course, now you can say "how do you know that our brains
| are not just efficient computers that run LLMs", but I feel
| like the onus of proof lies on the makers of this claim, not
| on the other side.
|
| It is very likely that human intelligence is not just
| autocomplete on crack, given all we know about neuroscience
| so far.
| mlinhares wrote:
| BuT iT CoUlD Be, cAn YoU PrOvE ThAT IT is NOt?
|
| I'm having a great experience using Cursor, but i don't
| feel like trying to overhype it, it just makes me tired to
| see all this hype. Its a great tool, makes me more
| productive, nothing beyond that.
| yujzgzc wrote:
| How do you know that these models haven't been trained by
| running programs?
|
| At least, it's likely that they've been trained on undergrad
| textbooks that explain program behaviors and contain exercises.
| aoeusnth1 wrote:
| As far as you know, AI labs _are_ doing E2E RL training with
| running code in the loop to advance the model's capability to
| act as an agent (for cursor et al).
| hatthew wrote:
| I am slowly coming around to the idea that nobody should ever
| use the word "understand" in relation to LLMs, simply because
| everyone has their own definition of "understand", and many of
| these definitions disagree, and people tend to treat their
| definition as axiomatic. I have yet to see any productive
| discussion happen once anyone disagrees on the definition of
| "understand".
|
| So, what word would you propose we use to mean "an LLM's
| ability (or lack thereof) to output generally correct sentences
| about the topic at hand"?
| nomonnai wrote:
| It's a prediction of what humans have frequently produced in
| similar situations.
| amelius wrote:
| I'm curious what happens if you run the LLM with variable names
| that occur often with nullable variables, but then use them with
| code that has a non-nullable variable.
| aSanchezStern wrote:
| The answer, it seems, is that it depends on what kind of code you're
| looking at. The post showed that `for` loops cause a lot more
| variable-name-biased reasoning, while `ifs` and function
| defs/calls are more variable-name independent.
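|
| Concretely, the kind of contrast being probed looks something
| like this (illustrative snippet):
|
|     # the variable is named like something nullable, but it is a
|     # plain int -- does the model's nullability reading follow
|     # the name or the actual type?
|     maybe_count: int = 3
|     for i in range(maybe_count):
|         print(i)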
| apples_oranges wrote:
| Sounds like the process used to jailbreak LLMs so that they
| don't refuse requests and always answer. There is likewise a
| direction for refusal. (Article about it:
| https://www.lesswrong.com/posts/jGuXSZgv6qfdhMCuJ/refusal-in...)
|
| Would be fun if they also "cancelled the nullability
| direction".. the LLMs probably would start hallucinating new
| explanations for what is happening in the code.
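|
| (If one did want to "cancel" it: the usual trick, sketched here
| with a hypothetical unit direction d, is to project that
| component out of the hidden state.)
|
|     import numpy as np
|
|     def ablate_direction(h: np.ndarray, d: np.ndarray) -> np.ndarray:
|         # remove the component of hidden state h along direction d
|         d = d / np.linalg.norm(d)
|         return h - (h @ d) * d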
| timewizard wrote:
| "Validate a phone number."
|
| The code is entirely wrong. That validates something that's close
| to a NANP number but isn't actually a NANP number. In particular,
| the area code cannot start with 0, nor can the central office
| code. There are several numbers, like 911, which have special
| meaning, and cannot appear in either position.
|
| You'd get better results if you went to Stack Overflow and stole
| the correct answer yourself. Would probably be faster too.
|
| This is why "non technical code writing" is a terrible idea. The
| underlying concept is explicitly technical. What are we even
| doing?
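|
| For reference, a rough sketch of the stricter shape described
| above (digits with optional separators only; real NANP validation
| has more special cases):
|
|     import re
|
|     # area code and central office code must start with 2-9 and
|     # must not be N11 codes like 911
|     NANP_RE = re.compile(
|         r"^(?!\d11)[2-9]\d{2}"      # area code
|         r"[-. ]?"
|         r"(?!\d11)[2-9]\d{2}"       # central office code
|         r"[-. ]?"
|         r"\d{4}$"                   # line number
|     )
|
|     def looks_like_nanp(number: str) -> bool:
|         return NANP_RE.match(number) is not None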
| casenmgreen wrote:
| LLMs understand nothing.
|
| They are not reasoning.
| kmod wrote:
| I found this overly handwavy, but I discovered that there is a
| non-"gentle" version of this page which is more explicit:
|
| https://dmodel.ai/nullability/
| aSanchezStern wrote:
| Yeah, that's linked a couple of times in the post.
___________________________________________________________________
(page generated 2025-04-07 23:01 UTC)