[HN Gopher] Does Reasoning Emerge? Probabilities of Causation in...
___________________________________________________________________
Does Reasoning Emerge? Probabilities of Causation in Large Language
Models
Author : belter
Score : 111 points
Date : 2024-08-16 16:19 UTC (6 hours ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| doe_eyes wrote:
| This is proposed as a way to measure "true" reasoning by asking a
| certain type of trick question, but I don't quite see how this
| could be the basis of a sustainable benchmark.
|
| If this gets attention, the next generation of LLMs will be
| trained on this paper, and then fine-tuned by using this exact
| form of questions to appear strong on this benchmark, and...
| we're back to square one.
| jcoc611 wrote:
| From an external perspective, is there a way to distinguish
| between simulation of consciousness and the real thing?
|
| If the answer is no, could you make an argument that they are
| the same?
| chx wrote:
| Throwing all the paintings made prior to 1937 into an LLM would
| never get Guernica out of it. As long as it's an LLM, this
| holds, not just today but all the way into the future.
|
| The empty sophistry of presuming that automated bullshit
| generators can somehow mimic a human brain is laughable.
|
| Please please read https://aeon.co/essays/your-brain-does-
| not-process-informati...
| altruios wrote:
| The author fails to provide any argument other than one of
| incredulity, plus some bad reasoning with bad-faith examples.
|
| The dollar bill copying example is a faulty metaphor. He
| claims humans are not information processors, then tries to
| demonstrate this by having a human process information
| (drawing from a reference is processing an image and
| producing an output)...
|
| His argument sounds like one from 'It's Always Sunny'. As if
| metaphors never improve or get more accurate over time, and
| as if this latest metaphor isn't the most accurate one we
| have. It is. When we have something better, we'll all start
| talking about the brain in that frame of reference.
|
| This is an idiot who can write in a way that masks some deep
| bigotries (in favor of the mythical 'human spirit').
|
| I do not take this person seriously. I'm glossing over the
| casual incorrectness of his statements - a good number of
| them just aren't true. The ones I just scrolled to include
| statements like 'the brain keeps functioning or we
| disappear', or 'This might sound complicated, but it is
| actually incredibly simple, and completely free of
| computations, representations and algorithms' in the
| description of the 'linear optical trajectory' ALGORITHM (a
| set of simple steps to follow - in this case, visual pattern
| matching).
|
| Where is the sense in what I just read?
| chx wrote:
| AI bros never take anything seriously aside from the
| bullshit they are drunk on
|
| https://disconnect.blog/what-comes-after-the-ai-crash/
| this is your future
| __loam wrote:
| Why are you bringing up metaphysics when the concern is that
| the student has seen the exam, so to speak?
| aflukasz wrote:
| I like this observation. And it fascinates me each time I see
| some self proclaimed conscious entity arguing that this just
| simply cannot be.
| cscurmudgeon wrote:
| The fact that you don't see X is not proof that X doesn't
| exist. Here X may or may not exist.
|
| X = difference between simulated and real consciousness
|
| Black holes were posited before they were detected
| empirically. We didn't declare them non-existent when the
| theory came out just because we couldn't detect them yet.
| altruios wrote:
| > self proclaimed conscious entity
|
| Well, I do not proclaim consciousness: only the subjective
| feeling of consciousness. I really 'feel' conscious, but I
| can't prove or 'know' that I am in fact 'conscious' and
| making choices... To be conscious is to 'make choices'
| instead of just obeying the rules of chemistry and physics,
| which you would have to break in order to be conscious at
| all (how can you make a choice if you are fully obeying the
| rules of chemistry, which allow no choice?).
|
| Choice does not apply to chemistry or physics. Where does
| choice come from? I suspect from our fantasies and not from
| objective reality (for I do not see humans consistently
| breaking the way chemistry works in their brains) - it
| probably comes from nowhere.
|
| Explain the lack of choice available in chemistry first (and
| how that doesn't interfere with our being able to make a
| choice), and then I'll entertain the idea that we are
| conscious creatures. If choice doesn't exist at the chemical
| level, it can't magically emerge from following
| deterministic rules. And chemistry is deterministic, not
| probabilistic (H2 + O never magically makes neon, or two
| water molecules instead of one).
| jvalleroy wrote:
| You are confusing consciousness with free will. They are
| not the same.
|
| Consciousness is about experience, not "choices".
| altruios wrote:
| Experience and choice are adjacent, even when they are not
| the same.
|
| I specifically mean that the experience of choice is the
| root of conscious thought - if you do not experience
| choice, you're experiencing the world the exact same way a
| robot would.
|
| Compare pretending you are the fictional character in a
| movie with being the fictional character in a video game:
| one experience involves more choice, making conscious
| decisions rather than having a passive experience.
|
| Merely having an experience is not enough to be
| conscious. You have to actively be making choices to be
| considered conscious.
|
| Consciousness is about making choices. Choices are a
| measure of consciousness.
|
| But do choices actually exist?
| adamc wrote:
| I don't think this is clear at all. What I am
| experiencing is mostly the inner narrator, the ongoing
| stream of chatter about how I feel, what I see, what I
| think about what I see, etc.
|
| What I experience is self-observation, largely directed
| through or by language processing.
| jrflowers wrote:
| You could make the argument that two things that we don't
| understand are the same thing because we're equally ignorant
| of both in the same way that you could make the argument that
| Jimmy Hoffa and Genghis Khan are probably buried in the same
| place, since we have equal knowledge of their locations.
| gorjusborg wrote:
| Like the original Mechanical Turk.
|
| Clearly there is a difference between a small person hidden
| within playing chess and a fully mechanical chess
| automaton, but as the observer we might not be able to tell
| the difference. The observer's perception of the facts
| doesn't change the actual facts, and the implications of
| those facts.
| mannykannot wrote:
| The Mechanical Turk, however, was not a simulation of
| human consciousness, reasoning, chess-playing or any
| other human ability: it was the real thing, somewhat
| artfully dressed up so as to appear otherwise.
|
| Is it meaningful to say that AlphaGo Zero does not play
| Go, it just simulates something that does?
| mensetmanusman wrote:
| There might not be an external perspective, just someone
| else's internal perspective of the external.
| layer8 wrote:
| Consciousness and reasoning are orthogonal to each other.
| andrewla wrote:
| There's an interesting paper [1] that discusses this very
| possibility.
|
| [1] https://academic.oup.com/mind/article/LIX/236/433/986238?
| log...
| imtringued wrote:
| You can shuffle the choices in multiple-choice benchmarks,
| and models that have memorized the benchmark tend to do
| really badly, approaching or even falling below random
| guessing.
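|
| A minimal sketch of that shuffling (the helper here is
| illustrative, not from any particular eval harness):
|
|     import random
|
|     def shuffle_choices(question, choices, answer_index, rng=random):
|         """Permute the answer options and remap the index of the
|         correct answer accordingly."""
|         order = list(range(len(choices)))
|         rng.shuffle(order)
|         shuffled = [choices[i] for i in order]
|         return question, shuffled, order.index(answer_index)
|
|     # A model that memorized "the answer is B" is now wrong roughly
|     # (n-1)/n of the time unless it actually works the problem.
|     q, opts, ans = shuffle_choices(
|         "What is 2 + 2?", ["3", "4", "5", "22"], answer_index=1)
|     print(q, opts, "correct:", opts[ans])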
| slashdave wrote:
| Easily overcome by training models on randomized questions.
| Trivial to implement too.
| tsukikage wrote:
| If the model works correctly on way more questions than
| there is room to store a giant list of recorded answers
| for, some kind of deduction of generalised rules must have
| taken place.
|
| "there is room to represent recorded answers for" is doing
| a lot of work, of course; it might e.g. have invented
| compression mechanisms better than known ones instead.
| slashdave wrote:
| You cannot create generalized questions using
| randomization alone
| altruios wrote:
| Maybe: there is no measurable difference between 'real'
| reasoning, and 'fake' reasoning.
|
| And if there is no measurable difference... we can't measure
| 'realness'; we just have to measure something different (and
| more useful): 'soundness'. Regardless of whether it is
| reasoning internally or not, if it produces a sound and
| logical argument, who cares?
|
| I agree: I don't think any measure tested linguistically can
| prove it is internally reasoning... in the same way we haven't
| truly proven other sentient people aren't in fact zombies (we
| just politely assume the most likely case that they aren't).
| doe_eyes wrote:
| > Maybe: there is no measurable difference between 'real'
| reasoning, and 'fake' reasoning.
|
| The point is, it's easier to teach an LLM to fake it than to
| make it - for example, they get good at answering questions
| that overlap with their training data set long before they
| start generalizing.
|
| So on some epistemological level, your point is worth
| pondering; but more simply, it actually matters if an LLM has
| learned to game a benchmark vs approximate human cognition.
| If it's the former, it might fail in weird ways when we least
| expect it.
| visarga wrote:
| It's like students studying for the test without really
| understanding, or like regular people who don't always
| understand and just follow the memorized steps. How often do
| we really understand, and how often do we just imitate?
|
| I have a suspicion that humans often use abstractions or
| methods they don't understand. We frequently rely on
| heuristics, mental shortcuts, and received wisdom without
| grasping the underlying principles. "To understand" has many
| meanings: to predict, control, use, explain, discover,
| model and generalize. Some also add "to feel".
|
| At one extreme we could say only a PhD in their area of
| expertise really understands and the rest of us just fumble
| with concepts. I am sure rigorous causal reasoning is only
| possible through extended education; it is not the natural
| mode of operation of the brain.
| lucianbr wrote:
| Whatever 'real' reasoning is, it's more useful than 'fake'
| reasoning. We can't measure the difference, but we can use
| one and not the other.
|
| Multiple articles pointing out that AI isn't generating
| enough ROI are evidence that we don't have 'real', read
| 'useful', reasoning. The fake reasoning in the paper does
| not help with this, and the fact that we can't measure the
| difference changes nothing.
|
| This 'something that we can't measure does not exist' logic
| is flawed. The earth's curvature existed way before we were
| able to measure it.
| adamc wrote:
| Made me think of the famous McNamara fallacy:
| https://en.wikipedia.org/wiki/McNamara_fallacy
|
| "The fourth step is to say that what can't be easily
| measured really doesn't exist. This is suicide."
| smodo wrote:
| I think this talk [0] by Jodie Burchell explains the problem
| pretty well. In short: you are right that for a given task,
| only the outcome matters. However, as Burchell shows, AI is
| sold as being able to generalize. I understand this as the
| ability to transfer concepts between dissimilar problem
| spaces. Clearly, if the problem space and/or concepts need to
| be defined beforehand in order for the task to be performed
| by AI, there's little generalization going on.
|
| [0] https://youtu.be/Pv0cfsastFs?si=WLoMrT0S6Oe-f1OJ
| kridsdale3 wrote:
| Then those salesmen need to be silenced. They are selling
| the public AGI when every scientist says we don't have AGI
| but maybe through iterative research we can approach it.
| thwarted wrote:
| Describing some service/product in grandiose terms and
| misrepresenting its actual use cases, utility, and
| applicability - claiming that it'll solve all your ills
| _and_ put out the cat - has been a grift for as long as
| there have been salesmen. Silencing such salesmen would
| probably be a net gain, but it's hardly new and probably
| isn't going to change, because the salesmen don't get hit
| with the responsibility for following through on the
| promises they make or imply. They closed the sale and got
| their commission.
| HarHarVeryFunny wrote:
| Real reasoning, which can be used to predict outcomes in
| novel situations, is based on multi-step what-if prediction,
| perhaps coupled with actual experimentation, and requires
| things like long-term (task duration) attention, (task
| duration) working memory, online learning (unless you want to
| have to figure everything out from scratch every time you re-
| encounter it), perhaps (depending on the problem) innate
| curiosity to explore potential solutions, etc. LLMs are
| architecturally missing all of the above.
|
| What you might call "fake" reasoning, or memorized reasoning,
| only works in situations similar to what an LLM was exposed
| to in its training set (e.g. during a post-training step
| intended to imbue better reasoning), and is just recalling
| reasoning steps (reflected in word sequences) that it has
| seen in the training set in similar circumstances.
|
| The difference between the two is that real reasoning will
| work for any problem, while fake/recall reasoning only works
| for situations it saw in the training set. Relying on fake
| reasoning makes the model very "brittle" - it may seem
| intelligent in some/many situations where it can rely on
| recall, but then "unexpectedly" behave in some dumb way when
| faced with a novel problem. You can see an example of this
| with the "farmer crossing river with hen and corn" type
| problem, where the models get it right if the problem is
| similar enough to what they were trained on, but can devolve
| into nonsense like crossing back and forth multiple times
| unnecessarily (which has the surface form of a solution) if
| the problem is made a bit less familiar.
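|
| For contrast, the kind of exhaustive what-if search that solves
| the classic puzzle reliably - here a small sketch for the
| fox/hen/corn variant, not a claim about how any model works
| internally:
|
|     from collections import deque
|
|     ITEMS = ("fox", "hen", "corn")
|     # Unsafe pairs if left without the farmer: fox eats hen,
|     # hen eats corn.
|     UNSAFE = [{"fox", "hen"}, {"hen", "corn"}]
|
|     def safe(state):
|         farmer, cargo = state
|         alone = {i for i, bank in zip(ITEMS, cargo) if bank != farmer}
|         return not any(pair <= alone for pair in UNSAFE)
|
|     def solve():
|         # state = (farmer_bank, (fox, hen, corn)); 0 = start, 1 = goal
|         start, goal = (0, (0, 0, 0)), (1, (1, 1, 1))
|         queue, seen = deque([(start, [])]), {start}
|         while queue:
|             (farmer, cargo), path = queue.popleft()
|             if (farmer, cargo) == goal:
|                 return path
|             # Cross alone, or with one item on the farmer's bank.
|             here = [i for i, b in enumerate(cargo) if b == farmer]
|             for carry in [None] + here:
|                 nxt_cargo = list(cargo)
|                 if carry is not None:
|                     nxt_cargo[carry] = 1 - farmer
|                 nxt = (1 - farmer, tuple(nxt_cargo))
|                 if nxt not in seen and safe(nxt):
|                     seen.add(nxt)
|                     move = "alone" if carry is None else ITEMS[carry]
|                     queue.append((nxt, path + [move]))
|
|     print(solve())
|     # e.g. ['hen', 'alone', 'fox', 'hen', 'corn', 'alone', 'hen']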
| TeMPOraL wrote:
| > _LLMs are architecturally missing all of the above._
|
| So are small children.
|
| I mean, they have a very limited form of the above. So do
| LLMs, within their context windows.
| chongli wrote:
| Children absolutely can solve those "farmer crossing the
| river" type problems with high reliability. Once they
| learn how to solve it once, changing up the animals will
| not fool a typical child. You could even create fictional
| animals with made-up names and they could solve it as
| long as you tell them which animal was the carnivore and
| which one was the herbivore.
|
| The fact that a child can do this and an LLM cannot proves
| that the LLM lacks some general reasoning process which
| the child possesses.
| simonh wrote:
| There's an interesting wrinkle to this. There's a faculty
| called Prefrontal Synthesis that children learn from
| language early on, which enables them to compose
| recursive and hierarchical linguistic structures. This
| also enables them to reason about physical tasks in the
| same way. Children who don't learn this by a certain age
| (I think about 5) can never learn it. The most common
| case is deaf children that never learn a 'proper' sign
| language early enough.
|
| So you're right, and children pick this up very quickly.
| I think Chomsky was definitely right that our brains are
| wired for grammar. Nevertheless there is a window of
| plasticity in young childhood to pick up certain
| capabilities, which still need to be learned, or
| activated.
| wizzwizz4 wrote:
| > _Children that don't learn this by a certain age (I
| think about 5) can never learn it._
|
| Helen Keller is a counterexample for a lot of these
| myths: she didn't have proper language (only several
| dozen home signs) until 7 or so. With things like vision,
| critical periods have been proven, but a lot of the
| higher-level stuff, I really doubt critical periods are a
| thing.
|
| Helen Keller _did_ have hearing until an illness at 19
| months, so it's conceivable she developed the critical
| faculties then. A proper controlled trial would be
| unethical, so we may never know for sure.
| simonh wrote:
| Thanks, it's good to get counterarguments and wider
| context. This isn't an area I'm very familiar with, so I'm
| aware I could easily fall into an intellectual pothole
| without knowing. Paper below; any additional context
| welcome.
|
| I misremembered, however. The paper notes evidence of
| thresholds at ages 2 and 5 and at the onset of puberty that
| seem to affect mental plasticity in these capabilities, so
| there's no single cutoff.
|
| https://riojournal.com/article/38546/
| HarHarVeryFunny wrote:
| > So are small children.
|
| No - children have brains, same as adults. All they are
| missing is some life experience, but they are quite
| capable of applying the knowledge they do have to novel
| problems.
|
| There is a lot more structure to the architecture of our
| brain than to that of an LLM (which was never designed for
| this!) - our brain has all the moving parts necessary for
| reasoning, critically including "always on" learning, so
| that it can learn from its own mistakes as it/we figure
| something out.
| matt-attack wrote:
| Isn't the "life experience" that a child is missing
| precisely analogous to the training an LLM requires?
| HarHarVeryFunny wrote:
| No, because the child will be able to apply what they've
| learnt in novel ways, and experiment/explore (what-if or
| hands-on) to fill in the gaps. The LLM is heavily limited
| to what it learnt, due to architectural limitations on
| going beyond that.
| mannykannot wrote:
| If it produces a sound (and therefore, by definition, a
| logically valid) argument, that is about as good as we could
| hope for. What we want to avoid is the fallacy of assuming
| that all arguments with true conclusions are sound.
|
| Another thing we want to see in an extended discussion on a
| particular topic is a consistent set of premises across all
| arguments.
| slashdave wrote:
| There seems to be an implicit (and unspoken) assumption that
| these probability terms (PN, PS) are all independent.
| However, clearly they are not.
| croes wrote:
| Related
|
| https://news.ycombinator.com/item?id=41233206
| layer8 wrote:
| My impression is that LLMs "pattern-match" on a less abstract
| level than general-purpose reasoning requires. They capture a
| large number of typical reasoning patterns through their
| training, but it is not sufficiently decoupled, or generalized,
| from what the reasoning is _about_ in each of the concrete
| instances that occur in the training data. As a result, the
| apparent reasoning capability that LLMs exhibit significantly
| depends on what they are asked to reason about, and even depends
| on representational aspects like the sentence patterns used in
| the query. LLMs seem to be largely unable to symbolically
| abstract (as opposed to interpolate) from what is exemplified in
| the training data.
| arketyp wrote:
| I agree. Although I wonder sometimes how much "about" is
| involved in my own abstract reasoning.
| astromaniak wrote:
| For some reason LLMs get a lot of attention. But while
| simplicity is great, it has limits. To make a model reason
| you have to put it in a loop with fallbacks. It has to try
| possibilities and fall back from false branches, which can
| be done one level higher. That level can be an algorithm,
| another model, or another thread in the same model. To some
| degree it can be done by prompting in the same thread, like
| asking the LLM to first print a high-level algorithm and
| then execute it step by step.
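|
| A toy version of that loop (llm_propose and verify are
| hypothetical stand-ins for a model call and a checker):
|
|     def solve_with_fallbacks(problem, llm_propose, verify,
|                              max_branches=3, max_depth=5):
|         """Depth-first search over model-proposed next steps,
|         falling back (backtracking) when a branch fails."""
|         def search(partial, depth):
|             if verify(problem, partial, final=True):
|                 return partial              # complete, checked answer
|             if depth == max_depth:
|                 return None                 # give up on this branch
|             for step in llm_propose(problem, partial, n=max_branches):
|                 candidate = partial + [step]
|                 if not verify(problem, candidate, final=False):
|                     continue                # prune a false branch early
|                 result = search(candidate, depth + 1)
|                 if result is not None:
|                     return result
|             return None                     # fall back to the caller
|         return search([], 0)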
| layer8 wrote:
| Iteration is important, but I don't think that it can
| substantively compensate for the abstraction limitations
| outlined in the GP comment.
| refulgentis wrote:
| > To make model reason you have to put it in a loop with
| fallbacks
|
| Source? TFA, i.e. the thing we're commenting on, tried to,
| and seems to, show the opposite
| llm_trw wrote:
| LLMs get a lot of attention because they were the first
| architecture that could scale to a trillion parameters while
| still improving with every added parameter.
| kgeist wrote:
| Aren't humans also prone to this?
|
| Take the classic trick question, for example: "A bat and a ball
| cost $1.10 in total. The bat costs $1.00 more than the ball.
| How much does the ball cost?"
|
| Most people give a wrong answer because they, too, "pattern
| match".
| mecsred wrote:
| I think you could reasonably describe that as "pattern
| matching at the wrong level". If you tell most people the
| first answer is wrong they will go up a level or two and work
| out the correct answer.
| nwhitehead wrote:
| You would think so...
|
| I asked this question in a college-level class with
| clickers. For the initial question I told them, "This is a
| trick question, your first answer might not be right".
| Still less than 10% of students got the right answer.
| adamisom wrote:
| Students are more arrogant than LLMs (but still more
| generally intelligent)
| amy-petrik-214 wrote:
| 10 cents! I'm 100% sure
| layer8 wrote:
| Regarding AI reasoning and abstraction capabilities, the ARC
| Prize competition is an interesting project:
| https://arcprize.org/
| w10-1 wrote:
| Their hypothesis is a good one:
|
| - A form of reasoning is to connect cause and effect via
| probability of necessity (PN) and the probability of sufficiency
| (PS).
|
| - You can identify when the natural language inputs can support
| PN and PS inference based on LLM modeling
|
| That would mean you can engineer in more causal reasoning based
| on data input and model architecture.
|
| They define causal functions, project accuracy measures (false
| positives/negatives) onto factual and counter-factual assertion
| tests, and measure LLM performance wrt this accuracy. They
| establish surprisingly low tolerance for counterfactual error
| rate, and suggest it might indicate an upper limit for reasoning
| based on current LLM architectures.
|
| Their findings are limited by how constrained their approach is
| (short simple boolean chains). It's hard to see how this approach
| could be extended to more complex reasoning. Conversely, if/since
| LLMs can't get this right, it's hard to see them progressing at
| the rates hoped, unless this approach somehow misses a dynamic of
| a larger model.
|
| It seems like this would be a very useful starting point for LLM
| quality engineering, at least for simple inference.
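|
| For readers unfamiliar with those terms: under the standard
| assumptions of exogeneity and monotonicity, PN and PS reduce
| to point identities in the conditional probabilities (Tian &
| Pearl, 2000). A rough sketch of those textbook formulas, not
| of the paper's exact setup:
|
|     def pn_ps(p_y_given_x, p_y_given_not_x):
|         """Probability of necessity (PN) and of sufficiency (PS),
|         assuming exogeneity and monotonicity of the effect."""
|         pn = (p_y_given_x - p_y_given_not_x) / p_y_given_x
|         ps = (p_y_given_x - p_y_given_not_x) / (1.0 - p_y_given_not_x)
|         return pn, ps
|
|     # Toy numbers: effect occurs 90% of the time with the cause,
|     # 20% of the time without it.
|     pn, ps = pn_ps(0.9, 0.2)
|     print(f"PN = {pn:.2f}, PS = {ps:.2f}")  # PN = 0.78, PS = 0.88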
| heyjamesknight wrote:
| LLMs have access to the space of collective semantic
| understanding. I don't understand why people expect cognitive
| faculties that are clearly extra-semantic to just fall out of
| them eventually.
|
| The reason they sometimes _appear_ to reason is because there's
| a lot of reasoning in the corpus of human text activity. But
| that's just a semantic artifact of a non-semantic process.
|
| Human cognition is much more than just our ability to string
| sentences together.
| matt-attack wrote:
| I don't know if you're correct. I don't think you know that our
| brains are that different. We too need to train ourselves on
| massive amounts of data. I feel like the kinds of reasoning and
| understanding I've seen ChatGPT do are so far beyond
| something like just processing language.
| Minor49er wrote:
| What reasoning have you seen coming from ChatGPT?
| thuuuomas wrote:
| We do represent much of our cognition in language. Sometimes
| feel like LLMs might be "dancing skeletons" - pulleys & wire
| giving motion to the bones of cognition.
| sgt101 wrote:
| Our brains have effects that preceded language. Look at
| lions for an example.
|
| We are much more (and a little less) than lions in terms of
| mind.
| Torkel wrote:
| Multimodal models are not limited by semantic understanding
| though, right?
|
| They are given photos, video, audio.
| mbil wrote:
| I might expect some extra-semantic cognitive faculties to
| emerge from LLMs, or at least be approximated by LLMs. Let me
| try to explain why. One example of extra-semantic ability is
| spatial reasoning. I can point to a spot on the ground and my
| dog will walk over to it -- he's probably not using semantic
| processing to talk through his relationship with the ground,
| the distance of each pace, his velocity, etc. But could a
| robotic dog powered by an LLM use a linguistic or symbolic
| representation of spatial concepts and actions to translate
| semantic reasoning into spatial reasoning? Imagine sensors with
| a measurement to language translation layer ("kitchen is five
| feet in front of you"), and actuators that can be triggered
| with language ("move forward two feet"). It seems conceivable
| that a detailed enough representation of the world, expressive
| enough controls, and a powerful enough LLM could result in
| something that is akin to spatial reasoning (an extra-semantic
| process), while under the hood it's "just" semantic
| understanding.
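|
| A toy version of that translation layer (every name below is
| hypothetical, just to make the idea concrete):
|
|     import re
|
|     def sensors_to_language(distances_ft):
|         """Turn raw range readings into sentences for the LLM."""
|         return [f"{name} is {dist:.0f} feet in front of you"
|                 for name, dist in distances_ft.items()]
|
|     def language_to_action(command):
|         """Parse commands like 'move forward two feet' into
|         (direction, magnitude) tuples for the actuators."""
|         nums = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}
|         m = re.match(r"move (forward|back) (\w+) feet", command)
|         return (m.group(1), nums.get(m.group(2), 0)) if m else None
|
|     print(sensors_to_language({"kitchen": 5.0}))
|     print(language_to_action("move forward two feet"))  # ('forward', 2)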
| abcde777666 wrote:
| There are many pillars of our own intelligence that we tend to
| gloss over. For instance - awareness and the ability to direct
| attention. Or something as simple as lifting your hand and moving
| some fingers at will. Those things impress me far more than the
| noises we produce with our mouths!
___________________________________________________________________
(page generated 2024-08-16 23:00 UTC)