[HN Gopher] Understanding Reasoning LLMs
___________________________________________________________________
Understanding Reasoning LLMs
Author : sebg
Score : 29 points
Date : 2025-02-06 21:34 UTC (1 hour ago)
(HTM) web link (magazine.sebastianraschka.com)
(TXT) w3m dump (magazine.sebastianraschka.com)
| behnamoh wrote:
| doesn't it seem like these models are getting to the point where
| even understanding how they are trained and developed is less and
| less feasible for the general public?
|
| I mean, we already knew only a handful of companies with capital
| could train them, but at least the principles, algorithms, etc.
| were accessible to individuals who wanted to create their own -
| much simpler - models.
|
| it seems that era is quickly ending, and we are entering the era
| of truly "magic" AI models whose inner workings no one understands,
| because companies keep their secret sauce to themselves...
| dr_dshiv wrote:
| How important is it that the reasoning takes place in another
| thread versus just chain-of-thought in the same thread? I feel
| like it makes a difference, but I have no evidence.
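|
| For concreteness, a toy sketch of the two setups, with a hypothetical
| llm(messages) call standing in for whatever chat API is in use:
|
|     # Same-context CoT: the reasoning stays in the conversation and
|     # is visible to every later turn.
|     history = [question, "Think step by step."]
|     cot = llm(history)
|     history += [cot, "Now state only the final answer."]
|     answer = llm(history)
|
|     # Separate-pass reasoning: the scratchpad is produced once and
|     # then discarded; only the distilled answer enters the thread.
|     scratch = llm([question, "Think step by step."])
|     answer = llm([question, "Using this analysis, state only the "
|                             "final answer:", scratch])
|     history = [question, answer]  # the reasoning never re-enters context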
| vector_spaces wrote:
| Is there any work being done in training LLMs on more restricted
| formal languages? Something like a constraint solver or automated
| theorem prover, but much lower level. Specifically something that
| isn't natural language. That's the only path I could see towards
| reasoning models being truly effective.
|
| I know there is work being done with e.g. Lean integration with
| ChatGPT, but that's not what I mean exactly -- there's still this
| shaky natural-language-trained-LLM glue in the driver's seat.
|
| Like I'm envisioning something that has the creativity to try
| different things, but can then JIT-compile its chain of thought
| and avoid bad paths.
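|
| A rough sketch of the loop I'm imagining (propose(), check(), and
| entails() are hypothetical stand-ins for an LLM step generator and
| a formal verifier, not any existing API):
|
|     # Verifier-guided search over chain-of-thought steps: the model
|     # proposes candidate steps, a formal checker rejects unsound
|     # ones, and the search backtracks instead of following bad paths.
|     def solve(state, goal, depth=0, max_depth=8):
|         if entails(state, goal):            # goal already established
|             return []
|         if depth == max_depth:
|             return None
|         for step in propose(state, goal):   # creative LLM suggestions
|             new_state = check(state, step)  # "JIT compile" the step
|             if new_state is None:           # unsound / ill-typed: prune
|                 continue
|             rest = solve(new_state, goal, depth + 1, max_depth)
|             if rest is not None:
|                 return [step] + rest
|         return None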
| mindwok wrote:
| How would that be different from something like ChatGPT
| executing Lean? That's exactly what humans do: we have messy
| reasoning that we then write down in formal logic and compile
| to see if it holds.
| gsam wrote:
| In my mind, the pure reinforcement learning approach of
| DeepSeek is the most practical way to do this. Essentially it
| needs to continually refine and find more sound(?) subspaces of
| the latent (embedding) space. Now this could be the subspace
| which is just Python code (or some other human-invented
| subspace), but I don't think that would be optimal for the
| overall architecture.
|
| The reason it seems the most reasonable path is that when you
| create restrictions like this, you hamper search viability (and
| in a high-dimensional space that's a massive loss, because you
| can arrive at a result from many directions). It's like regular
| genetic programming vs. typed genetic programming. When you
| discard all your useful results, you can't go anywhere near as
| fast. There will be a threshold
| where constructivist, generative schemes (e.g. reasoning with
| automata and all kinds of fun we've neglected) will be the way
| forward, but I don't think we've hit that point yet.
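|
| To make the pure-RL framing concrete, here is a minimal sketch of
| the reward/advantage step, assuming only a verifiable answer checker
| (is_correct() is a hypothetical placeholder):
|
|     # Roughly the R1-Zero recipe: sample a group of completions per
|     # prompt, score each with a verifiable reward, and normalize
|     # within the group (a GRPO-style advantage). Completions that
|     # beat the group average get reinforced; no human-written
|     # reasoning traces or learned reward model are needed.
|     def group_advantages(completions, reference_answer):
|         rewards = [1.0 if is_correct(c, reference_answer) else 0.0
|                    for c in completions]
|         mean = sum(rewards) / len(rewards)
|         var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
|         return [(r - mean) / (var ** 0.5 + 1e-8) for r in rewards]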
|
| These are just my cloudy current thoughts.
| prideout wrote:
| This article has a superb diagram of the DeepSeek training
| pipeline.
| aithrowawaycomm wrote:
| I like Raschka's writing, even if he is considerably more
| optimistic about this tech than I am. But I think it's
| inappropriate to claim that models like R1 are "good at deductive
| or inductive reasoning" when that is demonstrably not true; they
| are incapable of even the simplest "out-of-distribution"
| deductive reasoning:
| https://xcancel.com/JJitsev/status/1883158738661691878
|
| What they are certainly capable of is a wide variety of
| computations that _simulate_ reasoning, and maybe that's good
| enough for your use case. But it is unpredictably brittle unless
| you spend a lot on o1-pro (and even then...). Raschka has a line
| about "whether and how an LLM actually 'thinks' is a separate
| discussion" but this isn't about semantics. R1 clearly sucks at
| deductive reasoning and you will not understand "reasoning" LLMs
| if you take DeepSeek's claims at face value.
|
| It seems especially incurious for him to copy-paste the "aha
| moment" from DeepSeek's technical report without critically
| investigating it. DeepSeek's claims are unscientific, without
| real evidence, and seem focused on hype and investment:
| This moment is not only an "aha moment" for the model but also
| for the researchers observing its behavior. It underscores the
| power and beauty of reinforcement learning: rather than
| explicitly teaching the model how to solve a problem, we
| simply provide it with the right incentives, and it autonomously
| develops advanced problem-solving strategies. The
| "aha moment" serves as a powerful reminder of the potential of RL
| to unlock new levels of intelligence in artificial systems,
| paving the way for more autonomous and adaptive models in the
| future.
|
| Perhaps it was able to solve that tricky Olympiad problem, but
| there is an infinite variety of 1st-grade math problems it is
| not able to solve. I doubt it's even reliably able to solve
| simple variations of that root problem. Maybe it is! But it's
| frustrating how little skepticism there is about CoT, reasoning
| traces, etc.
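|
| (Checking that kind of brittleness is cheap, too: hold a problem
| template fixed, vary only the numbers, compute the ground truth
| programmatically, and count how often the model agrees. A toy
| sketch, with ask_model() as a hypothetical stand-in for the LLM:)
|
|     import random
|
|     # Trivial parameter variations of one grade-school word problem;
|     # the exact answer is known, so grading needs no human in the loop.
|     def variation():
|         a = random.randint(5, 20)
|         b = random.randint(1, a)
|         q = f"Sam has {a} apples and gives away {b}. How many are left?"
|         return q, a - b
|
|     trials = [variation() for _ in range(50)]
|     correct = sum(ask_model(q).strip() == str(ans) for q, ans in trials)
|     print(f"{correct}/{len(trials)} trivial variations answered correctly")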
___________________________________________________________________
(page generated 2025-02-06 23:00 UTC)