[HN Gopher] Does Reasoning Emerge? Probabilities of Causation in...
       ___________________________________________________________________
        
       Does Reasoning Emerge? Probabilities of Causation in Large Language
       Models
        
       Author : belter
       Score  : 111 points
       Date   : 2024-08-16 16:19 UTC (6 hours ago)
        
 (HTM) web link (arxiv.org)
 (TXT) w3m dump (arxiv.org)
        
       | doe_eyes wrote:
        | This is proposed as a way to measure "true" reasoning by asking a
        | certain type of trick question, but I don't quite see how this
        | could be the basis of a sustainable benchmark.
       | 
       | If this gets attention, the next generation of LLMs will be
       | trained on this paper, and then fine-tuned by using this exact
       | form of questions to appear strong on this benchmark, and...
       | we're back to square one.
        
         | jcoc611 wrote:
         | From an external perspective, is there a way to distinguish
         | between simulation of consciousness and the real thing?
         | 
         | If the answer is no, could you make an argument that they are
         | the same?
        
           | chx wrote:
            | Throwing all the paintings made prior to 1937 into an LLM
            | would never get Guernica out of it. As long as it's an LLM
            | this stands, not just today but all the way into the future.
           | 
           | This empty sophistry of presuming automated bullshit
           | generators somehow can mimic a human brain is laughable.
           | 
           | Please please read https://aeon.co/essays/your-brain-does-
           | not-process-informati...
        
             | altruios wrote:
             | The author fails to provide any argument other than one of
             | incredulity and some bad reasoning with bad faith examples.
             | 
              | The dollar bill copying example is a faulty metaphor. He
              | claims humans are not information processors, then tries
              | to demonstrate this by having a human process information
              | (drawing from reference is processing an image and giving
              | an output)...
             | 
              | His argument sounds like one from 'It's Always Sunny'. As
              | if metaphors never improve or get more accurate over time,
              | and as if this latest metaphor isn't the most accurate one
              | we have. It is. When we have something better, we'll all
              | start talking about the brain in that frame of reference.
             | 
              | This is an idiot who can write in a way that masks some
              | deep bigotries (in favor of the mythical 'human spirit').
             | 
              | I do not take this person seriously. I'm glossing over all
              | the casual incorrectness of his statements - a good number
              | of them just aren't true. The ones I just scrolled to
              | include statements like 'the brain keeps functioning or we
              | disappear' and 'This might sound complicated, but it is
              | actually incredibly simple, and completely free of
              | computations, representations and algorithms' - the latter
              | in the description of the 'linear optical trajectory'
              | ALGORITHM (a set of simple steps to follow - in this case,
              | visual pattern matching).
             | 
             | Where is the sense in what I just read?
        
               | chx wrote:
               | AI bros never take anything seriously aside from the
               | bullshit they are drunk on
               | 
               | https://disconnect.blog/what-comes-after-the-ai-crash/
               | this is your future
        
           | __loam wrote:
            | Why are you bringing up metaphysics when the concern is that
            | the student has seen the exam, so to speak?
        
           | aflukasz wrote:
            | I like this observation. And it fascinates me each time I see
            | some self-proclaimed conscious entity arguing that this just
            | simply cannot be.
        
             | cscurmudgeon wrote:
              | That you don't see X is not proof that X doesn't exist.
              | Here X may or may not exist.
              | 
              | X = difference between simulated and real consciousness
              | 
              | Black holes were posited before they were detected
              | empirically. We didn't declare them non-existent when the
              | theory came out just because we couldn't detect them.
        
             | altruios wrote:
             | > self proclaimed conscious entity
             | 
             | Well, I do not proclaim consciousness: only the subjective
             | feeling of consciousness. I really 'feel' conscious: but I
             | can't prove or 'know' that in fact I am 'conscious' and
             | making choices... to be conscious is to 'make choices'...
             | Instead of just obeying the rules of chemistry and
             | physics... which YOU HAVE TO BREAK in order to be conscious
             | at all (how can you make a choice at all if you are fully
             | obeying the rules of chemistry {which have no choice}).
             | 
                | A choice does not apply to chemistry or physics: where
                | does choice come from? I suspect from our fantasies and
                | not from objective reality (for I do not see humans
                | consistently breaking the way chemistry works in their
                | brains) - it probably comes from nowhere.
             | 
             | If you can explain the lack of choice available in
             | chemistry first (and how that doesn't interfere with us
             | being able to make a choice): then I'll entertain the idea
             | that we are conscious creatures. But if choice doesn't
             | exist at the chemical level, it can't magically emerge from
                | following deterministic rules. And chemistry is
                | deterministic, not probabilistic (H2 + O doesn't ever
                | magically make neon, or two water molecules instead of
                | one).
        
               | jvalleroy wrote:
               | You are confusing consciousness with free will. They are
               | not the same.
               | 
               | Consciousness is about experience, not "choices".
        
               | altruios wrote:
               | Experience and choice are adjacent when they are not the
               | same.
               | 
               | I specifically mean to say the experience of choice is
               | the root of conscious thought - if you do not experience
               | choice, you're experiencing the world the exact same way
               | a robot would.
               | 
                | When pretending you are the fictional character in a
                | movie vs. the fictional character in a video game, one
                | experience has more choice - making conscious decisions
                | vs. having a passive experience.
               | 
               | Merely having an experience is not enough to be
               | conscious. You have to actively be making choices to be
               | considered conscious.
               | 
               | Consciousness is about making choices. Choices are a
               | measure of consciousness.
               | 
               | But do choices actually exist?
        
               | adamc wrote:
               | I don't think this is clear at all. What I am
               | experiencing is mostly the inner narrator, the ongoing
               | stream of chatter about how I feel, what I see, what I
               | think about what I see, etc.
               | 
               | What I experience is self-observation, largely directed
               | through or by language processing.
        
           | jrflowers wrote:
           | You could make the argument that two things that we don't
           | understand are the same thing because we're equally ignorant
           | of both in the same way that you could make the argument that
           | Jimmy Hoffa and Genghis Khan are probably buried in the same
           | place, since we have equal knowledge of their locations.
        
             | gorjusborg wrote:
             | Like the original Mechanical Turk.
             | 
             | Clearly there is a difference between a small person hidden
             | within playing chess and a fully mechanical chess
             | automaton, but as the observer we might not be able to tell
             | the difference. The observer's perception of the facts
             | doesn't change the actual facts, and the implications of
             | those facts.
        
               | mannykannot wrote:
               | The Mechanical Turk, however, was not a simulation of
               | human consciousness, reasoning, chess-playing or any
                | other human ability: it was the real thing, somewhat
                | artfully dressed up so as to appear otherwise.
                | 
                | Is it meaningful to say that AlphaGo Zero does not play
                | Go, it just simulates something that does?
        
           | mensetmanusman wrote:
           | There might not be an external perspective, just someone
           | else's internal perspective of the external.
        
           | layer8 wrote:
           | Consciousness and reasoning are orthogonal to each other.
        
           | andrewla wrote:
           | There's an interesting paper [1] that discusses this very
           | possibility.
           | 
           | [1] https://academic.oup.com/mind/article/LIX/236/433/986238?
           | log...
        
         | imtringued wrote:
          | You can shuffle the choices in multiple-choice benchmarks, and
          | models that memorize the benchmark tend to do really badly,
          | almost worse than random guessing.
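          | 
          | Something like the following is enough to expose that kind of
          | position memorization - a minimal sketch in Python, where
          | ask_model() is a hypothetical stand-in for however you actually
          | query the model under test:
          | 
          |     import random
          | 
          |     # Sketch of shuffled-choice evaluation; ask_model() is a
          |     # hypothetical placeholder, not a real API.
          |     def ask_model(question, choices):
          |         ...  # return the index of the option the model picks
          | 
          |     def eval_shuffled(question, choices, correct, trials=5):
          |         rng = random.Random(0)
          |         hits = 0
          |         for _ in range(trials):
          |             order = list(range(len(choices)))
          |             rng.shuffle(order)           # permute positions
          |             shuffled = [choices[i] for i in order]
          |             picked = ask_model(question, shuffled)
          |             hits += (order[picked] == correct)  # map back
          |         return hits / trials  # drops sharply if memorized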
        
           | slashdave wrote:
           | Easily overcome by training models on randomized questions.
           | Trivial to implement too.
        
             | tsukikage wrote:
             | If the model works correctly on way more questions than
             | there is room to store a giant list of recorded answers
             | for, some kind of deduction of generalised rules must have
             | taken place.
             | 
             | "there is room to represent recorded answers for" is doing
             | a lot of work, of course; it might e.g. have invented
             | compression mechanisms better than known ones instead.
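              | 
              | A rough counting version of that argument (my own
              | back-of-the-envelope, not from the article): a model with
              | $P$ parameters at $b$ bits each holds at most $Pb$ bits, so
              | if a memorized answer costs roughly $a$ bits, pure lookup
              | caps out at around
              | 
              |     $N_{\max} \approx \frac{P \, b}{a}$
              | 
              | questions. Consistent success on vastly more distinct
              | (e.g. randomized) questions than that has to come from
              | some compressed, general rule - or, as noted, from a
              | compression scheme better than the ones we know about.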
        
               | slashdave wrote:
               | You cannot create generalized questions using
               | randomization alone
        
         | altruios wrote:
         | Maybe: there is no measurable difference between 'real'
         | reasoning, and 'fake' reasoning.
         | 
         | And if there is no measurable difference... we can't measure
         | 'realness', we just have to measure something different (and
          | more useful): 'soundness'. Regardless of whether it is
          | reasoning or not internally, if it produces a sound and
          | logical argument: who cares?
         | 
         | I agree: I don't think any measure tested linguistically can
         | prove it is internally reasoning... in the same way we haven't
         | truly proven other sentient people aren't in fact zombies (we
         | just politely assume the most likely case that they aren't).
        
           | doe_eyes wrote:
           | > Maybe: there is no measurable difference between 'real'
           | reasoning, and 'fake' reasoning.
           | 
           | The point is, it's easier to teach an LLM to fake it than to
           | make it - for example, they get good at answering questions
           | that overlap with their training data set long before they
           | start generalizing.
           | 
           | So on some epistemological level, your point is worth
           | pondering; but more simply, it actually matters if an LLM has
           | learned to game a benchmark vs approximate human cognition.
           | If it's the former, it might fail in weird ways when we least
           | expect it.
        
             | visarga wrote:
              | It's like students studying for the test, not really
              | understanding. Or like regular people who don't always
              | understand and just follow the memorized steps. How often
              | do we really understand, and how often do we just imitate?
             | 
             | I have a suspicion that humans often use abstractions or
             | methods they don't understand. We frequently rely on
             | heuristics, mental shortcuts, and received wisdom without
             | grasping the underlying principles. To understand has many
             | meanings: to predict, control, use, explain, discover,
             | model and generalize. Some also add "to feel".
             | 
              | At one extreme we could say only a PhD really understands
              | their own area of expertise; the rest of us just fumble
              | concepts. I am sure rigorous causal reasoning is only
              | possible through extended education; it is not the natural
              | mode of operation of the brain.
        
           | lucianbr wrote:
           | Whatever 'real' reasoning is, it's more useful than 'fake'
           | reasoning. We can't measure the difference, but we can use
           | one and not the other.
           | 
            | Multiple articles pointing out that AI isn't getting enough
            | ROI are evidence that we don't have 'real' (read: 'useful')
            | reasoning. The fake reasoning in the paper does not help with
           | this, and the fact that we can't measure the difference
           | changes nothing.
           | 
           | This 'something that we can't measure does not exist' logic
           | is flawed. The earth's curvature existed way before we were
           | able to measure it.
        
             | adamc wrote:
             | Made me think of the famous McNamara fallacy:
             | https://en.wikipedia.org/wiki/McNamara_fallacy
             | 
             | "The fourth step is to say that what can't be easily
             | measured really doesn't exist. This is suicide."
        
           | smodo wrote:
           | I think this talk [0] by Jodie Burchell explains the problem
           | pretty well. In short: you are right that for a given task,
           | only the outcome matters. However, as Burchell shows, AI is
           | sold as being able to generalize. I understand this as the
           | ability to transfer concepts between dissimilar problem
            | spaces. Clearly, if the problem space and/or concepts need to
           | be defined beforehand in order for the task to be performed
           | by AI, there's little generalization going on.
           | 
           | [0] https://youtu.be/Pv0cfsastFs?si=WLoMrT0S6Oe-f1OJ
        
             | kridsdale3 wrote:
             | Then those salesmen need to be silenced. They are selling
             | the public AGI when every scientist says we don't have AGI
             | but maybe through iterative research we can approach it.
        
               | thwarted wrote:
                | Describing some service/product in grandiose terms and
                | misrepresenting its actual use cases, utility, and
                | applicability - that it'll solve all your ills _and_ put
                | out the cat - has been a grift for as long as there have
                | been salesmen. Silencing such salesmen would probably be
                | a net gain, but it's hardly new and probably isn't going
                | to change, because the salesmen don't get hit with the
                | responsibility for following through on the promises
                | they make or imply. They closed the sale and got their
                | commission.
        
           | HarHarVeryFunny wrote:
           | Real reasoning, which can be used to predict outcomes in
           | novel situations, is based on multi-step what-if prediction,
           | perhaps coupled with actual experimentation, and requires
           | things like long-term (task duration) attention, (task
           | duration) working memory, online learning (unless you want to
            | have to figure everything out from scratch every time you re-
           | encounter it), perhaps (depending on the problem) innate
           | curiosity to explore potential solutions, etc. LLMs are
           | architecturally missing all of the above.
           | 
           | What you might call "fake" reasoning, or memorized reasoning,
           | only works in situations similar to what an LLM was exposed
            | to in its training set (e.g. during a post-training step
            | intended to imbue better reasoning), and is just recalling
           | reasoning steps (reflected in word sequences) that it has
           | seen in the training set in similar circumstances.
           | 
           | The difference between the two is that real reasoning will
           | work for any problem, while fake/recall reasoning only works
           | for situations it saw in the training set. Relying on fake
           | reasoning makes the model very "brittle" - it may seem
           | intelligent in some/many situations where it can rely on
           | recall, but then "unexpectedly" behave in some dumb way when
           | faced with a novel problem. You can see an example of this
           | with the "farmer crossing river with hen and corn" type
            | problem, where the model gets it right if the problem is
            | similar enough to what it was trained on, but can devolve
            | into nonsense like crossing back and forth multiple times
           | unnecessarily (which has the surface form of a solution) if
           | the problem is made a bit less familiar.
        
             | TeMPOraL wrote:
             | > _LLMs are architecturally missing all of the above._
             | 
             | So are small children.
             | 
             | I mean, they have a very limited form of the above. So do
             | LLMs, within their context windows.
        
               | chongli wrote:
               | Children absolutely can solve those "farmer crossing the
               | river" type problems with high reliability. Once they
               | learn how to solve it once, changing up the animals will
               | not fool a typical child. You could even create fictional
               | animals with made-up names and they could solve it as
               | long as you tell them which animal was the carnivore and
               | which one was the herbivore.
               | 
                | The fact that a child can do this and an LLM cannot proves
               | that the LLM lacks some general reasoning process which
               | the child possesses.
        
               | simonh wrote:
               | There's an interesting wrinkle to this. There's a faculty
               | called Prefrontal Synthesis that children learn from
               | language early on, which enables them to compose
               | recursive and hierarchical linguistic structures. This
               | also enables them to reason about physical tasks in the
                | same way. Children who don't learn this by a certain age
               | (I think about 5) can never learn it. The most common
               | case is deaf children that never learn a 'proper' sign
               | language early enough.
               | 
               | So you're right, and children pick this up very quickly.
               | I think Chomsky was definitely right that our brains are
               | wired for grammar. Nevertheless there is a window of
               | plasticity in young childhood to pick up certain
               | capabilities, which still need to be learned, or
               | activated.
        
               | wizzwizz4 wrote:
               | > _Children that don't learn this by a certain age (I
               | think about 5) can never learn it._
               | 
               | Helen Keller is a counterexample for a lot of these
               | myths: she didn't have proper language (only several
               | dozen home signs) until 7 or so. With things like vision,
                | critical periods have been proven, but for a lot of the
               | higher-level stuff, I really doubt critical periods are a
               | thing.
               | 
               | Helen Keller _did_ have hearing until an illness at 19
                | months, so it's conceivable she developed the critical
               | faculties then. A proper controlled trial would be
               | unethical, so we may never know for sure.
        
               | simonh wrote:
               | Thanks, it's good to get counter arguments and wider
               | context. This isn't an area I'm very familiar with, so
                | I'm aware I could easily fall down an intellectual
               | pothole without knowing. Paper below, any additional
               | context welcome.
               | 
                | I misremembered, however. The paper noted evidence of
                | thresholds at 2, 5 and the onset of puberty as seeming
                | to affect mental plasticity in these capabilities, so
                | there's no one cutoff.
               | 
               | https://riojournal.com/article/38546/
        
               | HarHarVeryFunny wrote:
               | > So are small children.
               | 
               | No - children have brains, same as adults. All they are
               | missing is some life experience, but they are quite
               | capable of applying the knowledge they do have to novel
               | problems.
               | 
               | There is a lot more structure to the architecture of our
               | brain than that of an LLM (which was never designed for
               | this!) - our brain has all the moving parts necessary for
               | reasoning, critically including "always on" learning so
                | that it can learn from its own mistakes as it/we figure
               | something out.
        
               | matt-attack wrote:
               | Isn't the "life experience" that a child is missing
               | precisely analogous to the training an LLM requires?
        
               | HarHarVeryFunny wrote:
                | No, because the child will be able to apply what they've
                | learnt in novel ways, and experiment/explore (what-if or
                | hands-on) to fill in the gaps. The LLM is heavily limited
                | to what it learnt, due to architectural limitations on
                | going beyond that.
        
           | mannykannot wrote:
           | If it produces a sound (and therefore, by definition, a
           | logically valid) argument, that is about as good as we could
           | hope for. What we want to avoid is the fallacy of assuming
           | that all arguments with true conclusions are sound.
           | 
            | Another thing we want to see in an extended discussion on a
            | particular topic is a consistent set of premises across all
            | arguments.
        
       | slashdave wrote:
       | There seems to be the implicit (and unspoken) assumption that
       | these probability terms (PN,PS) are all independent. However,
       | clearly they are not.
        
       | croes wrote:
       | Related
       | 
       | https://news.ycombinator.com/item?id=41233206
        
       | layer8 wrote:
       | My impression is that LLMs "pattern-match" on a less abstract
       | level than general-purpose reasoning requires. They capture a
       | large number of typical reasoning patterns through their
       | training, but it is not sufficiently decoupled, or generalized,
       | from what the reasoning is _about_ in each of the concrete
       | instances that occur in the training data. As a result, the
       | apparent reasoning capability that LLMs exhibit significantly
       | depends on what they are asked to reason about, and even depends
       | on representational aspects like the sentence patterns used in
       | the query. LLMs seem to be largely unable to symbolically
       | abstract (as opposed to interpolate) from what is exemplified in
       | the training data.
        
         | arketyp wrote:
         | I agree. Although I wonder sometimes how much "about" is
         | involved in my own abstract reasoning.
        
         | astromaniak wrote:
          | For some reason LLMs get a lot of attention. But... while
          | simplicity is great, it has limits. To make a model reason you
          | have to put it in a loop with fallbacks: it has to try
          | possibilities and fall back from false branches, which can be
          | done at a higher level. This can be an algorithm, another
          | model, or another thread in the same model. To some degree it
          | can be done by prompting in the same thread, like asking the
          | LLM to first print a high-level algorithm and then execute it
          | step by step.
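          | 
          | A minimal sketch of that kind of outer loop in Python -
          | ask_llm() here is a hypothetical stand-in for whatever model
          | call you actually use, not a real API:
          | 
          |     # Plan, execute, verify, and fall back by revising the
          |     # plan; ask_llm() is a hypothetical placeholder.
          |     def ask_llm(prompt):
          |         raise NotImplementedError  # plug in your model here
          | 
          |     def solve(task, max_attempts=3):
          |         plan = ask_llm("Write a numbered plan for: " + task)
          |         for _ in range(max_attempts):
          |             answer = ask_llm("Task: " + task + "\nPlan:\n" +
          |                              plan + "\nDo it step by step.")
          |             verdict = ask_llm("Task: " + task + "\nAnswer: " +
          |                               answer + "\nReply VALID or "
          |                               "INVALID with a reason.")
          |             if verdict.strip().upper().startswith("VALID"):
          |                 return answer
          |             # false branch: revise the plan and retry
          |             plan = ask_llm("The attempt failed: " + verdict +
          |                            " Revise the plan for: " + task)
          |         return None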
        
           | layer8 wrote:
           | Iteration is important, but I don't think that it can
           | substantively compensate for the abstraction limitations
           | outlined in the GP comment.
        
           | refulgentis wrote:
           | > To make model reason you have to put it in a loop with
           | fallbacks
           | 
           | Source? TFA, i.e. the thing we're commenting on, tried to,
           | and seems to, show the opposite
        
           | llm_trw wrote:
           | LLMs get a lot of attention because they were the first
           | architecture that could scale to a trillion parameters while
           | still improving with every added parameter.
        
         | kgeist wrote:
         | Aren't humans also prone to this?
         | 
         | Take the classic trick question, for example: "A bat and a ball
         | cost $1.10 in total. The bat costs $1.00 more than the ball.
         | How much does the ball cost?"
         | 
         | Most people give a wrong answer because they, too, "pattern
         | match".
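          | 
          | For reference, the arithmetic the intuitive answer skips: with
          | ball price $b$,
          | 
          |     $b + (b + 1.00) = 1.10$
          |     $2b = 0.10 \Rightarrow b = 0.05$
          | 
          | so the ball costs 5 cents and the bat $1.05, not the reflexive
          | 10 cents.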
        
           | mecsred wrote:
           | I think you could reasonably describe that as "pattern
           | matching at the wrong level". If you tell most people the
           | first answer is wrong they will go up a level or two and work
           | out the correct answer.
        
             | nwhitehead wrote:
             | You would think so...
             | 
             | I asked this question in a college-level class with
             | clickers. For the initial question I told them, "This is a
             | trick question, your first answer might not be right".
              | Still, fewer than 10% of students got the right answer.
        
               | adamisom wrote:
               | Students are more arrogant than LLMs (but still more
               | generally intelligent)
        
           | amy-petrik-214 wrote:
           | 10 cents! I'm 100% sure
        
       | layer8 wrote:
       | Regarding AI reasoning and abstraction capabilities, the ARC
       | Prize competition is an interesting project:
       | https://arcprize.org/
        
       | w10-1 wrote:
       | Their hypothesis is a good one:
       | 
       | - A form of reasoning is to connect cause and effect via
       | probability of necessity (PN) and the probability of sufficiency
       | (PS).
       | 
       | - You can identify when the natural language inputs can support
       | PN and PS inference based on LLM modeling
       | 
       | That would mean you can engineer in more causal reasoning based
       | on data input and model architecture.
       | 
       | They define causal functions, project accuracy measures (false
       | positives/negatives) onto factual and counter-factual assertion
       | tests, and measure LLM performance wrt this accuracy. They
       | establish surprisingly low tolerance for counterfactual error
       | rate, and suggest it might indicate an upper limit for reasoning
       | based on current LLM architectures.
       | 
       | Their findings are limited by how constrained their approach is
       | (short simple boolean chains). It's hard to see how this approach
       | could be extended to more complex reasoning. Conversely, if/since
        | LLMs can't get this right, it's hard to see them progressing at
        | the rates hoped for, unless this approach somehow misses a
        | dynamic of a larger model.
       | 
       | It seems like this would be a very useful starting point for LLM
       | quality engineering, at least for simple inference.
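        | 
        | For readers who want the formal objects: assuming the paper uses
        | Pearl's standard counterfactual definitions (I haven't checked
        | their exact notation), for a binary cause X and effect Y,
        | 
        |     $PN = P(Y_{x'} = y' \mid X = x, Y = y)$
        |     $PS = P(Y_{x} = y \mid X = x', Y = y')$
        | 
        | i.e. PN asks "given that X and Y both occurred, would Y have
        | failed to occur had X not?", and PS asks "given that neither
        | occurred, would doing X have brought Y about?".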
        
       | heyjamesknight wrote:
       | LLMs have access to the space of collective semantic
       | understanding. I don't understand why people expect cognitive
       | faculties that are clearly extra-semantic to just fall out of
       | them eventually.
       | 
        | The reason they sometimes _appear_ to reason is that there's
       | a lot of reasoning in the corpus of human text activity. But
       | that's just a semantic artifact of a non-semantic process.
       | 
       | Human cognition is much more than just our ability to string
       | sentences together.
        
         | matt-attack wrote:
          | I don't know if you're correct. I don't think you know that our
          | brains are that different. We too need to train ourselves on
          | massive amounts of data. I feel like the kinds of reasoning and
          | understanding I've seen ChatGPT do are soooo far beyond
          | something like just processing language.
        
           | Minor49er wrote:
           | What reasoning have you seen coming from ChatGPT?
        
           | thuuuomas wrote:
            | We do represent much of our cognition in language. Sometimes I
           | feel like LLMs might be "dancing skeletons" - pulleys & wire
           | giving motion to the bones of cognition.
        
           | sgt101 wrote:
            | Our brains have effects that preceded language. Look at
           | lions for an example.
           | 
           | We are much more (and a little less) than lions in terms of
           | mind.
        
         | Torkel wrote:
         | Multimodal models are not limited by semantic understanding
         | though, right?
         | 
         | They are given photos, video, audio.
        
         | mbil wrote:
         | I might expect some extra-semantic cognitive faculties to
         | emerge from LLMs, or at least be approximated by LLMs. Let me
         | try to explain why. One example of extra-semantic ability is
         | spatial reasoning. I can point to a spot on the ground and my
         | dog will walk over to it -- he's probably not using semantic
         | processing to talk through his relationship with the ground,
         | the distance of each pace, his velocity, etc. But could a
         | robotic dog powered by an LLM use a linguistic or symbolic
          | representation of spatial concepts and actions to translate
          | semantic reasoning into spatial reasoning? Imagine sensors with
         | a measurement to language translation layer ("kitchen is five
         | feet in front of you"), and actuators that can be triggered
         | with language ("move forward two feet"). It seems conceivable
         | that a detailed enough representation of the world, expressive
         | enough controls, and a powerful enough LLM could result in
          | something that is akin to spatial reasoning (an extra-semantic
         | process), while under the hood it's "just" semantic
         | understanding.
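          | 
          | A toy sketch of that loop in Python - every name here is a
          | hypothetical stand-in (not a real robotics or model API), just
          | to make the measurement-to-language idea concrete:
          | 
          |     # Sense -> describe in words -> ask the LLM -> act.
          |     def read_sensors():
          |         return {"kitchen_distance_ft": 5.0}
          | 
          |     def describe(state):
          |         return ("The kitchen is %.0f feet in front of you."
          |                 % state["kitchen_distance_ft"])
          | 
          |     def ask_llm(prompt):
          |         return "move forward two feet"  # placeholder reply
          | 
          |     def act(command, state):
          |         if command.startswith("move forward"):
          |             state["kitchen_distance_ft"] -= 2.0  # naive parse
          | 
          |     state = read_sensors()
          |     while state["kitchen_distance_ft"] > 0.5:
          |         command = ask_llm(describe(state) + " What next?")
          |         act(command, state)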
        
       | abcde777666 wrote:
       | There are many pillars of our own intelligence that we tend to
       | gloss over. For instance - awareness and the ability to direct
       | attention. Or something as simple as lifting your hand and moving
       | some fingers at will. Those things impress me far more than the
       | noises we produce with our mouths!
        
       ___________________________________________________________________
       (page generated 2024-08-16 23:00 UTC)