[HN Gopher] Chain-of-Thought Can Hurt Performance on Tasks Where...
       ___________________________________________________________________
        
       Chain-of-Thought Can Hurt Performance on Tasks Where Thinking Makes
       Humans Worse
        
       Author : benocodes
       Score  : 115 points
       Date   : 2024-10-30 19:42 UTC (3 hours ago)
        
 (HTM) web link (arxiv.org)
 (TXT) w3m dump (arxiv.org)
        
       | oatsandsugar wrote:
        | Tasks where thinking makes humans worse
       | 
       | > Three such cases are implicit statistical learning, visual
       | recognition, and classifying with patterns containing exceptions.
       | 
       | Fascinating that our lizard brains are better at implicit
       | statistical reasoning
        
         | Dilettante_ wrote:
         | Well, by definition, thinking is always _explicit_ reasoning,
         | no?
         | 
          | And I'd hazard a guess that a well-thought-through Fermi
          | estimation beats lizard-brain eyeballing every time; it's just
          | that in the in-between space the two interfere unfavourably.
        
           | YetAnotherNick wrote:
            | My guess would be no. I have terrible face-recognition
            | ability; I could study a face for an hour and other people
            | would still easily beat me in less than a second. (I am
            | assuming a "well-thought-through Fermi estimation" would be
            | similar for me and others in this case.)
        
         | brewii wrote:
          | Think about how fast you're able to determine the exact
          | trajectory of a ball and where to place your hand to catch it,
          | using your lizard brain.
        
           | asah wrote:
           | you mean like pingpong?
           | 
           | https://arstechnica.com/information-
           | technology/2024/08/man-v...
        
           | taeric wrote:
           | This isn't some innate ability that people have. As evidenced
           | by how bad my kids are at catching things. :D
           | 
           | That said, I think this is a good example. We call it "muscle
           | memory" in that you are good at what you have trained at.
           | Change a parameter in it, though, and your execution will
           | almost certainly suffer.
        
             | skrtskrt wrote:
             | I mean even people that are "bad at catching things" are
             | still getting ridiculously close to catching it - getting
             | hands to the right area probably within well under a second
             | of the right timing - without being _taught_ anything in
             | particular about how a ball moves through the air.
        
               | taeric wrote:
               | Uh.... have you been around kids? It will take several
               | absurd misses before they even start to respond to a ball
               | in flight.
        
               | 331c8c71 wrote:
                | I hope we still agree that kids learn extremely
                | efficiently by ML standards.
        
           | hangonhn wrote:
            | You can do this while you're staring up the whole time. Your
            | brain can predict where the ball will end up, even though
            | it's on a curved trajectory, and place your hand in the
            | right spot to catch it without guidance from your eyes in
            | the final phase of travel. I have very little experience
            | playing any kind of sport that involves a ball, and I can
            | reliably do this.
        
           | newZWhoDis wrote:
            | Which, funnily enough, is why I hate Rocket League.
           | 
           | All those years of baseball as a kid gave me a deep intuition
           | for where the ball would go, and that game doesn't use real
           | gravity (the ball is too floaty).
        
         | daft_pink wrote:
          | This is exactly what I was looking for: tasks where I should
          | not think and just trust my gut.
        
       | m3kw9 wrote:
        | It would be slow to use CoT on simple requests like 1+1.
        
       | ryoshu wrote:
       | 95% * 95% = 90.25%
        
       | gpsx wrote:
       | I saw an LLM having this kind of problem when I was doing some
       | testing a ways back. I asked it to order three fruits from
       | largest to smallest. I think it was orange, blueberry and
       | grapefruit. It could do that easily with a simple prompt. When
       | the prompting included something to the effect of "think step by
       | step", it would try to talk through the problem and it would
       | usually get it wrong.
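        | 
        | A minimal sketch of that kind of comparison (the model name,
        | prompts, and use of the OpenAI Python client here are assumptions
        | for illustration; the original test setup isn't specified):
        | 
        |     from openai import OpenAI
        | 
        |     client = OpenAI()
        | 
        |     QUESTION = ("Order these fruits from largest to smallest: "
        |                 "orange, blueberry, grapefruit.")
        | 
        |     def ask(prompt: str) -> str:
        |         # Single-turn chat completion with the given prompt.
        |         resp = client.chat.completions.create(
        |             model="gpt-4o-mini",
        |             messages=[{"role": "user", "content": prompt}],
        |         )
        |         return resp.choices[0].message.content
        | 
        |     direct = ask(QUESTION)                        # plain prompt
        |     cot = ask(QUESTION + " Think step by step.")  # CoT-style prompt
        |     print("direct:", direct)
        |     print("cot:   ", cot)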
        
       | npunt wrote:
       | "Don't overthink it" is sometimes good advice!
        
         | marviel wrote:
         | I love backpropagating ideas from ML back into psychology :)
         | 
         | I think it shows great promise as a way to sidestep the ethical
         | concerns (and the reproducibility issues) associated with
         | traditional psychology research.
         | 
         | One idea in this space I think a lot about is from the Google
         | paper on curiosity and procrastination in reinforcement
         | learning: https://research.google/blog/curiosity-and-
         | procrastination-i...
         | 
         | Basically the idea is that you can model curiosity as a reward
         | signal proportional to your prediction error. They do an
         | experiment where they train an ML system to explore a maze
         | using curiosity, and it performs the task more efficiently --
         | UNTIL they add a "screen" in the maze that shows random images.
         | In this case, the agent maximizes the curiosity reward by just
         | staring at the screen.
         | 
         | Feels a little too relatable sometimes, as a highly curious
         | person with procrastination issues :)
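          | 
          | A minimal sketch of the core idea (toy code, not the paper's
          | implementation): the intrinsic reward is proportional to the
          | agent's forward-model prediction error, so a "noisy TV" source
          | of irreducible randomness keeps that reward permanently high:
          | 
          |     import numpy as np
          | 
          |     rng = np.random.default_rng(0)
          | 
          |     def curiosity_reward(predicted_obs, actual_obs, scale=1.0):
          |         # Intrinsic reward proportional to the forward model's
          |         # prediction error (here just mean squared error).
          |         err = np.mean((predicted_obs - actual_obs) ** 2)
          |         return scale * float(err)
          | 
          |     # Ordinary maze state: the forward model learns it, so the
          |     # error (and the curiosity reward) shrinks over time.
          |     predicted = np.array([0.50, 0.50])
          |     actual = np.array([0.52, 0.49])
          |     print("corridor:", curiosity_reward(predicted, actual))
          | 
          |     # "Noisy TV" state: observations are random, so prediction
          |     # error never shrinks and the agent keeps staring at it.
          |     print("noisy TV:", curiosity_reward(predicted, rng.random(2)))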
        
           | npunt wrote:
           | "...in AI" will be the psychology equivalent of biology's
           | "...in Mice"
        
             | marviel wrote:
             | It will! Not 1:1, has issues, but gives hints.
             | 
             | Also much more scalable.
        
               | miningape wrote:
               | > Not 1:1, has issues, but gives hints.
               | 
               | > Also much more scalable.
               | 
               | This same description could be applied to lab mice
        
               | Terr_ wrote:
               | It'll probably be a ways before we start making shrines
               | to their unwilling participation though.
               | 
               | https://en.wikipedia.org/wiki/Monument_to_the_laboratory_
               | mou...
        
           | jeezfrk wrote:
           | "Nerd sniping"
        
       | veryfancy wrote:
       | So like dating?
        
       | Terr_ wrote:
       | Alternate framing: A powerful autocomplete algorithm is being
       | used to iteratively extend an existing document based on its
       | training set. _Sometimes_ you get a less-desirable end-result
       | when you intervene to change the style of the document away from
       | question-and-answer to something less common.
        
       | Y_Y wrote:
        | Reminds me of a mantra from chess class:
        | 
        |     long think = wrong think
        
         | TZubiri wrote:
         | Was that perhaps a speed chess class?
        
       | TZubiri wrote:
       | So, LLMs face a regression on their latest proposed improvement.
       | It's not surprising considering their functional requirements
       | are:
       | 
       | 1) Everything
       | 
        | For the purposes of AGI, LLMs are starting to look like a local
        | maximum.
        
       | mitko wrote:
       | This is so uncannily close to the problems we're encountering at
       | Pioneer, trying to make human+LLM workflows in high stakes / high
       | complexity situations.
       | 
        | Humans are so smart: they make so many decisions and calculations
        | at the subconscious/implicit level and take a lot of mental
        | shortcuts. As we try to automate this by following the process
        | exactly, we bring a lot of that implicit thinking to the surface,
        | and that slows everything down. So we've had to be creative about
        | how we build LLM workflows.
        
       ___________________________________________________________________
       (page generated 2024-10-30 23:00 UTC)