[HN Gopher] Chain-of-Thought Can Hurt Performance on Tasks Where...
___________________________________________________________________
Chain-of-Thought Can Hurt Performance on Tasks Where Thinking Makes
Humans Worse
Author : benocodes
Score : 115 points
Date : 2024-10-30 19:42 UTC (3 hours ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| oatsandsugar wrote:
| Tasks where thinking makes humans worse
|
| > Three such cases are implicit statistical learning, visual
| recognition, and classifying with patterns containing exceptions.
|
| Fascinating that our lizard brains are better at implicit
| statistical reasoning
| Dilettante_ wrote:
| Well, by definition, thinking is always _explicit_ reasoning,
| no?
|
| And I'd hazard a guess that a well-thought-through Fermi
| estimation beats lizard-brain eyeballing every time; it's
| just that in the in-between space the two interfere
| unfavourably.
| YetAnotherNick wrote:
| My guess would be no. I have terrible face recognition
| ability: I could look at a face for an hour and other people
| would still easily beat me in less than a second. (I am
| assuming a "well-thought-through Fermi estimation" would be
| similar for me and others in this case.)
| brewii wrote:
| Think about how fast you're able to determine the exact
| trajectory of a ball, and where to place your hand to catch
| it, using your lizard brain.
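|
| For comparison, even a simplified explicit version of that
| computation takes real work (this sketch ignores air
| resistance, and the numbers are illustrative):
|
|     import math
|
|     def landing_distance(v0, angle_deg, h0=1.5, g=9.81):
|         # Horizontal distance a ball launched at speed v0 (m/s)
|         # and the given angle from height h0 (m) travels before
|         # hitting the ground.
|         a = math.radians(angle_deg)
|         vx, vy = v0 * math.cos(a), v0 * math.sin(a)
|         # Positive root of h0 + vy*t - (g/2)*t^2 = 0.
|         t = (vy + math.sqrt(vy ** 2 + 2 * g * h0)) / g
|         return vx * t
|
|     print(landing_distance(12.0, 40.0))  # ~16.1 m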
| asah wrote:
| You mean like ping-pong?
|
| https://arstechnica.com/information-
| technology/2024/08/man-v...
| taeric wrote:
| This isn't some innate ability that people have. As evidenced
| by how bad my kids are at catching things. :D
|
| That said, I think this is a good example. We call it "muscle
| memory" in that you are good at what you have trained at.
| Change a parameter in it, though, and your execution will
| almost certainly suffer.
| skrtskrt wrote:
| I mean even people who are "bad at catching things" are
| still getting ridiculously close to catching it - getting
| hands to the right area probably within well under a second
| of the right timing - without being _taught_ anything in
| particular about how a ball moves through the air.
| taeric wrote:
| Uh.... have you been around kids? It will take several
| absurd misses before they even start to respond to a ball
| in flight.
| 331c8c71 wrote:
| I hope we still agree the kids learn extremely
| efficiently by ML standards.
| hangonhn wrote:
| You can do this while you're staring up the whole time. Your
| brain can predict where the ball will end up even though it's
| on a curved trajectory and place your hand in the right spot
| to catch it without guidance from your eyes in the final
| phase of travel. I have very little experience playing any
| kind of sport that involves a ball and can reliably do this.
| newZWhoDis wrote:
| Which, funnily enough, is why I hate Rocket League.
|
| All those years of baseball as a kid gave me a deep intuition
| for where the ball would go, and that game doesn't use real
| gravity (the ball is too floaty).
| daft_pink wrote:
| This is exactly what I was looking for: tasks where I should
| not think and just trust my gut.
| m3kw9 wrote:
| It would be slow to use CoT on simple requests like 1+1.
| ryoshu wrote:
| 95% * 95% = 90.25%
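|
| That is, if each reasoning step independently succeeds with
| probability 0.95, accuracy compounds multiplicatively with
| chain length (a minimal sketch; the numbers are illustrative):
|
|     def chain_accuracy(per_step, n_steps):
|         # Probability the whole chain is right if every
|         # individual step must be right.
|         return per_step ** n_steps
|
|     print(chain_accuracy(0.95, 2))   # 0.9025
|     print(chain_accuracy(0.95, 10))  # ~0.599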
| gpsx wrote:
| I saw an LLM having this kind of problem when I was doing some
| testing a ways back. I asked it to order three fruits from
| largest to smallest. I think it was orange, blueberry and
| grapefruit. It could do that easily with a simple prompt. When
| the prompting included something to the effect of "think step by
| step", it would try to talk through the problem and it would
| usually get it wrong.
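|
| Something like the following reproduces the comparison (the
| model name and exact prompt wording here are illustrative,
| not the original setup):
|
|     from openai import OpenAI
|
|     client = OpenAI()  # assumes OPENAI_API_KEY is set
|
|     def ask(prompt):
|         resp = client.chat.completions.create(
|             model="gpt-4o-mini",  # illustrative model choice
|             messages=[{"role": "user", "content": prompt}],
|         )
|         return resp.choices[0].message.content
|
|     task = ("Order these fruits from largest to smallest: "
|             "orange, blueberry, grapefruit.")
|     print(ask(task))                           # simple prompt
|     print(ask(task + " Think step by step."))  # CoT prompt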
| npunt wrote:
| "Don't overthink it" is sometimes good advice!
| marviel wrote:
| I love backpropagating ideas from ML back into psychology :)
|
| I think it shows great promise as a way to sidestep the ethical
| concerns (and the reproducibility issues) associated with
| traditional psychology research.
|
| One idea in this space I think a lot about is from the Google
| paper on curiosity and procrastination in reinforcement
| learning: https://research.google/blog/curiosity-and-
| procrastination-i...
|
| Basically the idea is that you can model curiosity as a reward
| signal proportional to your prediction error. They do an
| experiment where they train an ML system to explore a maze
| using curiosity, and it performs the task more efficiently --
| UNTIL they add a "screen" in the maze that shows random images.
| In this case, the agent maximizes the curiosity reward by just
| staring at the screen.
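|
| A minimal sketch of that reward signal (the scale factor and
| the squared-error form are assumptions, not the paper's exact
| formulation):
|
|     import numpy as np
|
|     def curiosity_reward(forward_model, state, action,
|                          next_state, scale=0.01):
|         # Bonus proportional to how badly the agent's forward
|         # model predicted the actual next state.
|         predicted = forward_model(state, action)
|         err = predicted - next_state
|         return scale * float(np.sum(err ** 2))
|
| A screen of random images maximizes this bonus indefinitely,
| since no forward model can learn to predict noise.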
|
| Feels a little too relatable sometimes, as a highly curious
| person with procrastination issues :)
| npunt wrote:
| "...in AI" will be the psychology equivalent of biology's
| "...in Mice"
| marviel wrote:
| It will! Not 1:1, has issues, but gives hints.
|
| Also much more scalable.
| miningape wrote:
| > Not 1:1, has issues, but gives hints.
|
| > Also much more scalable.
|
| This same description could be applied to lab mice.
| Terr_ wrote:
| It'll probably be a ways before we start making shrines
| to their unwilling participation though.
|
| https://en.wikipedia.org/wiki/Monument_to_the_laboratory_
| mou...
| jeezfrk wrote:
| "Nerd sniping"
| veryfancy wrote:
| So like dating?
| Terr_ wrote:
| Alternate framing: A powerful autocomplete algorithm is being
| used to iteratively extend an existing document based on its
| training set. _Sometimes_ you get a less-desirable end-result
| when you intervene to change the style of the document away from
| question-and-answer to something less common.
| Y_Y wrote:
| Reminds me of a mantra from chess class: long think = wrong think
| TZubiri wrote:
| Was that perhaps a speed chess class?
| TZubiri wrote:
| So, LLMs face a regression in their latest proposed improvement.
| It's not surprising considering their functional requirements
| are:
|
| 1) Everything
|
| For the purpose of AGI, LLMs are starting to look like a local
| maximum.
| mitko wrote:
| This is so uncannily close to the problems we're encountering at
| Pioneer, trying to build human+LLM workflows in high-stakes,
| high-complexity situations.
|
| Humans are so smart: we make many decisions and calculations
| at the subconscious/implicit level and take a lot of mental
| shortcuts. As we try to automate this by following the
| process exactly, we bring a lot of that implicit thinking out
| to the surface, and that slows everything down. So we've had
| to be creative about how we build LLM workflows.
___________________________________________________________________
(page generated 2024-10-30 23:00 UTC)