[HN Gopher] Elegant and powerful new result that seriously under...
___________________________________________________________________
Elegant and powerful new result that seriously undermines large
language models
Author : cratermoon
Score : 28 points
Date : 2023-09-22 21:12 UTC (1 hour ago)
(HTM) web link (garymarcus.substack.com)
(TXT) w3m dump (garymarcus.substack.com)
| jqpabc123 wrote:
| Surprise! Any reasoning is minimal.
| MeImCounting wrote:
| I know very little of cognitive science. I feel that LLMs and
| other neural networks are really just small pieces of what might
| someday make up an intelligence, like a virtual Broca's area. It
| is remarkable what you can do with language alone but the idea
| that an intelligent system would rely solely on language seems
| misguided.
| rossdavidh wrote:
| Q: What is the difference between A.I. and Machine Learning?
|
| A: Machine Learning exists.
|
| Neural networks (of the machine kind) can learn, and this can (in
| certain narrowly defined scenarios) be useful. But, they are not
| intelligence, general or otherwise.
| photonthug wrote:
| Haven't I seen lots of stuff showing LLMs do arithmetic for
| examples they haven't seen? If there is an issue really grasping
| basic logic though, doesn't that put a damper on the spooky
| "emergent properties" explanation for stuff like addition?
| dragonwriter wrote:
| > Haven't I seen lots of stuff showing LLMs do arithmetic for
| examples they haven't seen? If there is an issue really
| grasping basic logic though, doesn't that put a damper on the
| spooky "emergent properties" explanation for stuff like
| addition?
|
| No.
|
| The absence of one desired emergent property isn't evidence
| against a different, observed property. (And "emergent
| properties" isn't an explanation as much as a statement that we
| don't understand the mechanism by which the training data
| encodes the knowledge and did not plan for it to be encoded.)
| viraptor wrote:
| Maybe? The list is talking about one side of the issue I think
| - currently trained LLMs don't automatically remember the
| inverse of this relation. But the other side is: if you provide
| enough training on relations similar to this one, will the LLM
| start applying it to other examples as well?
| contravariant wrote:
| > If I say all odd numbers are prime, 1, 3, 5, and 7 may count in
| my favor, but at 9 the game is over.
|
| This is beside the point, but those are wrong straight from the
| start: whichever way you cut it, 1 is definitely not prime.
| Kapura wrote:
| Wow, the question posed to the neural nets (and their inability
| to respond) really gets to the heart of something that I've tried
| to articulate to others: that ML cannot conceptualize things in
| the abstract like people can. They cannot offer reasons or a
| train of thought the way a person can; they respond essentially
| "on instinct," and folks should be wary of the output of something
| like ChatGPT. Great article.
| viraptor wrote:
| > They cannot offer reasons or a train of thought the way a
| person can
|
| That's not correct. Try asking a question that requires
| multiple steps of reasoning and add ", think step by step" to
| the prompt. This not only changes the output, but also often
| improves the quality of the result... just as you'd expect with
| people.
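|
| A minimal sketch of that prompt tweak, assuming the pre-1.0
| openai Python client and API access (the question and model name
| here are just placeholders):
|
|     import openai  # reads OPENAI_API_KEY from the environment
|
|     QUESTION = ("If Alice is older than Bob and Bob is older "
|                 "than Carol, who is the youngest?")
|
|     def ask(prompt):
|         resp = openai.ChatCompletion.create(
|             model="gpt-4",
|             messages=[{"role": "user", "content": prompt}],
|         )
|         return resp.choices[0].message.content
|
|     print(ask(QUESTION))
|     # Appending the cue tends to elicit the intermediate
|     # reasoning before the final answer.
|     print(ask(QUESTION + " Think step by step."))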
| huijzer wrote:
| I agree with you. It feels like a clever human on "fast
| thinking" mode, as Kahneman would call it. So when I ask a
| programming question, it feels like a master student answers
| the first thing that comes to mind. If you ask for an explanation,
| the first explanation that comes to mind is blurted out.
| Smaug123 wrote:
| (Which is why prompt engineering is a thing! The art/science
| of phrasing the prompt so that the immediate blurted response
| is more likely to be correct.)
| wincy wrote:
| I asked it about string formatting deduplication and it
| suggested using a HashSet since it deduplicates strings, and
| I'm like "are you sure that's the best way to do this?" Then
| it apologized and gave a much more "standard" way the second
| time I asked.
|
| You definitely need to know a little and be able to push
| back, it feels like. But it's been an absolute champ in
| describing why things are going wrong in a general sense when
| I've been having issues, especially with generics and
| templates in C#.
| agucova wrote:
| Note that:
|
| > ML cannot conceptualize things in the abstract like people can
|
| And:
|
| > They cannot offer reasons or a train of thought the way a
| person can
|
| Are very different claims! The first one just seems wrong: LLMs
| require abstraction to work, and early work in interpretability
| suggests they build rich world models during training (e.g. see
| https://thegradient.pub/othello/).
|
| What is true is that often those models aren't very legible,
| and it would seem current LLMs are incapable of introspection,
| and so can't make those models more transparent.
|
| The second one is a tricky one: you can often get it by
| explicitly prompting for a chain of thought, but it's true
| current LLMs don't seem great at this yet. The big jump in this
| capability when going from GPT 3.5 to GPT 4 makes me think that
| this is just a limitation that will be overcome relatively
| soon.
| Legend2440 wrote:
| Gary Marcus saying what Gary Marcus always says.
|
| According to him five years ago, LLMs and image generators should
| never have been possible at all. Now that they're here and work
| so well, he's insisting they're a dead end. The man is best off
| ignored.
| wincy wrote:
| They are making such an outsized impact for me. It's like I
| have someone I can bother literally all day with the smallest
| questions about how to write code. Generics, templates,
| abstractions, data modeling, SQL, writing scripts, just
| absolutely everything. It's sped up my work by an order of
| magnitude. I felt like I was stagnating in learning new things
| and there's been this explosion in my knowledge thanks to being
| able to have a conversation with ChatGPT 4. Even if it's a
| complete dead end and literally never gets better I have a
| feeling I'll be talking to LLMs for the rest of my career.
| ChatGPT 4 is simply incredible.
|
| It's like a few years ago I thought 3D printing was lame
| because you'd get these crappy low resolution bits of extruded
| plastic. Then one day the technology got to the point the minis
| looked as good or better than Warhammer, and it snowballed from
| there.
|
| And suddenly I was interested. LLMs are the same way. The
| models are good enough. I don't even care if they improve,
| although stagnation seems unlikely with the new H100
| supercomputers and whatever new stuff Nvidia has coming down the
| pipe.
| cs702 wrote:
| The authors reached this conclusion after finetuning a GPT-3
| model, i.e., they tinkered with the weights, and they used an old
| model.
|
| This raises a lot of questions:
|
| Why did they use an old model?
|
| Why _finetune_? Why not run these experiments on an _untouched_
| model? How does a model untouched by the authors perform?
|
| Did they check to make sure they didn't inadvertently induce
| catastrophic forgetting while messing with the weights?
|
| Did they use common prompting techniques (e.g., chain-of-
| thought)? (Doesn't look like it.)
|
| If you run the same prompts on an untouched GPT-4, how does it
| perform?
| thefourthchime wrote:
| This seems like a trash clickbait article that undercuts the huge
| gains and usefulness of generative AI to pander to the naysayers.
| Yes, they are not perfect, but they are very useful!
| Smaug123 wrote:
| This doesn't seem as damning to me as it does to Gary Marcus.
| Humans routinely fail to generalise in this way, which is why we
| routinely use cloze deletion flashcards to train recall of
| various different permutations of a fact. I could quite easily
| imagine myself personally knowing the quoted fact "Tom Cruise's
| mother is Mary Lee Pfeiffer", and yet being unable to tell you
| who Pfeiffer was, because it's a kind of leaf node of my
| knowledge graph, accessible only by indexing into the Tom Cruise
| node.
|
| The linked paper
| (https://owainevans.github.io/reversal_curse.pdf) is purely
| empirical, and the results which I tried to reproduce did indeed
| reproduce across a few tries and various prompts of ChatGPT 4.
| mdp2021 wrote:
| > _Humans routinely fail to generalise in this way_
|
| If the goal is the implementation of intelligence, stop using
| unintelligent behaviour as an excuse.
| Smaug123 wrote:
| Most people would agree that human-level intelligence is
| intelligence! If a human can't reliably do a task, that
| rather suggests that failure to do the task isn't an
| indicator of lack-of-intelligence, unless you wish to bite
| the bullet that humans are in fact not intelligences merely
| because they are imperfect.
| cmsj wrote:
| I'm pretty sure most humans could reliably handle this
| scenario:
|
| ""Tom Cruise's mother is Mary Lee Pfeiffer, who is Mary Lee
| Pfeiffer's son?"
| Smaug123 wrote:
| You're the first person to suggest that the GPTs can't do
| that; they obviously can.
| https://chat.openai.com/share/b94329ce-3607-4cb6-bc21-55d9f2...
| for example is GPT-4 getting it correct. What the paper is about
| is _retrieval of facts from the learned "database"_.
|
| (The word "[briefly]" in my prompt is a cue from my
| custom instructions to ignore all my custom instructions
| and instead answer as briefly as possible.)
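|
| A rough way to see that distinction, sketched with the pre-1.0
| openai Python client (actual model behaviour may vary from run
| to run):
|
|     import openai  # reads OPENAI_API_KEY from the environment
|
|     def ask(prompt):
|         resp = openai.ChatCompletion.create(
|             model="gpt-4",
|             messages=[{"role": "user", "content": prompt}],
|         )
|         return resp.choices[0].message.content
|
|     # In-context: the fact is supplied in the prompt, so the
|     # model only has to invert it on the fly.
|     print(ask("Tom Cruise's mother is Mary Lee Pfeiffer. "
|               "Who is Mary Lee Pfeiffer's son?"))
|
|     # Parametric: only the reverse-direction question, which is
|     # the retrieval case the reversal-curse paper probes.
|     print(ask("Who is Mary Lee Pfeiffer's son?"))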
| mdp2021 wrote:
| > _Most people would agree_
|
| Truth is not the result of a poll.
|
| > _human-level intelligence is intelligence_
|
| Intelligence is an ability: it is there when present, not
| when latent.
| admax88qqq wrote:
| ?
|
| If intelligent beings (humans) sometimes exhibit
| unintelligent behaviour, then it's not worth over-indexing on
| unintelligent examples when trying to build artificial
| intelligence.
| vimax wrote:
| Seems like it could be generalized to tree-based indexing.
|
| The Tom Cruise node is the higher node, and the Pfeiffer node
| is a lower node. If you're first searching for Tom Cruise,
| you would find it earlier.
|
| With the Pfeiffer search, you have a lot more space to search
| before you get to the node.
|
| With bounded computation, you may not be able to reach the
| lower node.
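|
| The asymmetry the analogy points at looks like a plain
| one-directional index (illustrative Python only, not how a
| transformer actually stores facts):
|
|     # Facts stored keyed by subject, as in "A's mother is B".
|     mother_of = {"Tom Cruise": "Mary Lee Pfeiffer"}
|
|     # Forward lookup indexes straight into the structure.
|     print(mother_of["Tom Cruise"])   # -> Mary Lee Pfeiffer
|
|     # Reverse lookup has no key to hang off; it has to scan
|     # every stored fact.
|     sons = [child for child, mother in mother_of.items()
|             if mother == "Mary Lee Pfeiffer"]
|     print(sons)                      # -> ['Tom Cruise']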
| bob1029 wrote:
| To me this is a fairly damning result.
|
| How large would we need to make an LLM to accommodate every
| "reverse" prompt scenario? Isn't this memorization in the end?
| Why should I have to explain the reverse of everything after I
| demonstrate how to do it a few times?
|
| How can we correct for this in the transformer architecture? Is
| there some reasonable tweak that can be made to the attention
| mechanism, or are we looking at something more profound here?
| viraptor wrote:
| > Isn't this memorization at the end?
|
| That would be overfitting and it's a known issue you're trying
| to avoid during training.
|
| > How can we correct for this in the transformer architecture?
|
| I don't think the post really answers whether we need to. It
| may be just a case of this type of idea not being well
| represented in the training data, so it didn't generalise during
| training.
| Legend2440 wrote:
| This is a weakness of the training data, not the architecture.
|
| Training is not only about learning information, but also
| learning how to handle it. It will only learn to reason "if A->
| B then B->A" if the data contains situations where it must do
| this.
|
| In the paper, their training process contained no examples
| where this reasoning was necessary - only A->B relations. It
| actually got _worse_ than the base model, because GPT-3's
| training data did contain some examples of B->A relations.
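|
| As a sketch of the data-level fix this implies (hypothetical
| fine-tuning facts with invented names, plain Python; the paper's
| own examples and format may differ):
|
|     # Hypothetical "A is B" facts of the forward-only kind.
|     facts = [
|         ("Alice Zephyr", "the composer of 'Moonlit Canals'"),
|         ("Bram Okonkwo", "the founder of Glimmerfield Labs"),
|     ]
|
|     # Forward-only statements: what a reversal-cursed training
|     # set contains.
|     forward = [f"{a} is {b}." for a, b in facts]
|
|     # Adding the reversed phrasing to the training set is the
|     # obvious data-side mitigation.
|     backward = [f"{b[0].upper() + b[1:]} is {a}."
|                 for a, b in facts]
|
|     for line in forward + backward:
|         print(line)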
| dragonwriter wrote:
| > "if A-> B then B->A"
|
| This would be invalid logic. The issue is "if A=B then B=A"
| not "if A->B then B->A"
|
| But, more, the issue is recognizing when "X is Y" is
| describing X being a member of a broader set ("George
| Washington is a former President of the United States" does
| not imply that "George Washington" and "a former president"
| are equivalent) vs. a statement of equivalency ("George
| Washington is the first President of the United States").
| Now, in many cases, the use of a definite article ("the") vs.
| an indefinite article ("a/an") after "is" is determinative,
| but there are cases where no article is used that can go
| either way, and there are probably cases where the use of
| articles is confusing (the definite article can often apply
| in a limited rather than general context, for instance.)
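|
| A toy way to make that distinction explicit, treating each
| statement as a typed triple (purely illustrative, not anything
| an LLM does internally):
|
|     # "is_the" asserts identity and can be safely inverted;
|     # "is_a" asserts membership in a set and cannot.
|     triples = [
|         ("George Washington", "is_the",
|          "first President of the United States"),
|         ("George Washington", "is_a",
|          "former President of the United States"),
|     ]
|
|     for subj, rel, obj in triples:
|         if rel == "is_the":
|             # Identity: the reversed statement is also true.
|             print(f"The {obj} is {subj}.")
|         else:
|             # Membership: reversing would equate the whole set
|             # with one member, so don't invert it.
|             print(f"(not reversible) {subj} is a {obj}.")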
|
| I agree that this is a training-data issue rather than a model
| issue, but it's also a more complex training issue than it might
| naively seem.
|
| I really don't think that it is surprising or a particularly
| crushing revelation that LLMs don't apply logical rules like
| this without training both on the rules and the
| identification of where they apply. A lot of what we have with
| modern LLMs is throwing a lot of data at them without much focus
| on what they are supposed to learn beyond a narrow set of tasks,
| then discovering what they did and didn't learn outside of that.
| If something important turns out not to have been learned in one
| generation from that general data, the fix is more focused
| training on a later generation targeting that concept.
| lossolo wrote:
| It's pretty obvious that LLMs are not human-like intelligences
| but just statistical models. They can't produce anything novel
| in the sense of finding a cure for cancer or solving the
| Millennium Prize problems, even though they have all of human
| knowledge embedded in them. The easiest way to test this is by
| trying to get them to generate a novel idea that doesn't yet
| exist but will exist in a year or a few years. This idea
| shouldn't require experimentation in the real world, which LLMs
| don't have access to, but should involve interpreting and
| reasoning about the knowledge we already have in a novel way.
| viraptor wrote:
| > The easiest way to test this is by trying to get them to
| generate a novel idea that doesn't yet exist but will exist in
| a year or a few years.
|
| Counterexample: See Tom Scott playing with ChatGPT and asking
| for ideas for the kind of videos he would do. One of the
| results was almost exactly a video which was already planned
| but not released.
| astrange wrote:
| > They can't produce anything novel in the sense of finding a
| cure for cancer or solving millennium problems, even though
| they have embedded knowledge of all human knowledge.
|
| Humans can't do this by thinking about it either. Humans would
| find a cure for cancer by performing experiments and seeing
| which one of them worked.
___________________________________________________________________
(page generated 2023-09-22 23:00 UTC)