[HN Gopher] Understanding Reasoning LLMs
       ___________________________________________________________________
        
       Understanding Reasoning LLMs
        
       Author : sebg
       Score  : 29 points
       Date   : 2025-02-06 21:34 UTC (1 hour ago)
        
 (HTM) web link (magazine.sebastianraschka.com)
 (TXT) w3m dump (magazine.sebastianraschka.com)
        
       | behnamoh wrote:
       | Doesn't it seem like these models are getting to the point where
       | even understanding how they are trained and developed is less and
       | less possible for the general public?
       | 
       | I mean, we already knew only a handful of companies with capital
       | could train them, but at least the principles, algorithms, etc.
       | were accessible to individuals who wanted to create their own -
       | much simpler - models.
       | 
       | It seems that era is quickly ending, and we are entering the era
       | of truly "magic" AI models whose inner workings no one understands,
       | because the companies keep their secret sauce to themselves...
        
       | dr_dshiv wrote:
       | How important is it that the reasoning takes place in another
       | thread versus just chain-of-thought in the same thread? I feel
       | like it makes a difference, but I have no evidence.
        
       | vector_spaces wrote:
       | Is there any work being done on training LLMs on more restricted
       | formal languages? Something like a constraint solver or automated
       | theorem prover, but much lower level. Specifically, something
       | that isn't natural language. That's the only path I can see
       | towards reasoning models being truly effective.
       | 
       | I know there is work being done with e.g. Lean integration with
       | ChatGPT, but that's not what I mean exactly -- there's still this
       | shaky natural-language-trained-LLM glue in the driver's seat.
       | 
       | Like I'm envisioning something that has the creativity to try
       | different things, but then JIT-compiles its chain of thought
       | and avoids bad paths.
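
       As a minimal sketch of what the comment above is describing,
       assuming a hypothetical loop in which each proposed step of the
       chain of thought is run through a small formal checker and
       failing paths are pruned (the trace below is made up and stands
       in for model output):

           # Verify each "a op b = c" step with a tiny, restricted
           # expression evaluator; stop at the first step that fails.
           # A real system would backtrack and resample here.
           import ast
           import operator

           OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
                  ast.Mult: operator.mul, ast.Div: operator.truediv}

           def check_step(expr: str, claimed: float) -> bool:
               node = ast.parse(expr, mode="eval").body
               if not (isinstance(node, ast.BinOp)
                       and type(node.op) in OPS
                       and isinstance(node.left, ast.Constant)
                       and isinstance(node.right, ast.Constant)):
                   return False
               value = OPS[type(node.op)](node.left.value, node.right.value)
               return abs(value - claimed) < 1e-9

           def verified_prefix(steps):
               kept = []
               for expr, claimed in steps:
                   if not check_step(expr, claimed):
                       break          # bad path: discard the rest
                   kept.append((expr, claimed))
               return kept

           # Hypothetical trace; the second step has an arithmetic slip.
           trace = [("3 * 7", 21.0), ("21 + 4", 26.0)]
           print(verified_prefix(trace))   # -> [('3 * 7', 21.0)]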
        
         | mindwok wrote:
         | How would that be different from something like ChatGPT
         | executing Lean? That's exactly what humans do: we have messy
         | reasoning that we then write down in formal logic and compile
         | to see if it holds.
        
         | gsam wrote:
         | In my mind, the pure reinforcement learning approach of
         | DeepSeek is the most practical way to do this. Essentially it
         | needs to continually refine and find more sound(?) subspaces of
         | the latent (embedding) space. Now this could be the subspace
         | which is just Python code (or some other human-invented
         | subspace), but I don't think that would be optimal for the
         | overall architecture.
         | 
         | The reason this seems the most reasonable path is that when
         | you impose restrictions like this you hamper search viability
         | (and in a high-dimensional space that's a massive loss,
         | because you can arrive at a result from many directions).
         | It's like regular genetic programming vs. typed genetic
         | programming: when you discard all your useful results, you
         | can't go anywhere near as fast. There will be a threshold
         | where constructivist, generative schemes (e.g. reasoning with
         | automata and all kinds of fun we've neglected) become the way
         | forward, but I don't think we've hit that point yet.
         | 
         | These are just my cloudy current thoughts.
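
         As a rough sketch of the pure-RL idea being discussed, in the
         spirit of DeepSeek-R1-Zero's rule-based rewards: a sampled
         completion is scored for a parseable <think>/<answer> format
         and a correct final answer, with no learned reward model. The
         sample_completion and policy_update callables are hypothetical
         placeholders for the model rollout and the policy update.

             import re

             def reward(completion: str, reference: str) -> float:
                 # Rule-based, verifiable reward: format plus correctness.
                 format_ok = bool(re.search(
                     r"<think>.*</think>\s*<answer>.*</answer>",
                     completion, re.DOTALL))
                 m = re.search(r"<answer>(.*?)</answer>",
                               completion, re.DOTALL)
                 answer_ok = bool(m) and m.group(1).strip() == reference
                 return 0.5 * format_ok + 1.0 * answer_ok

             def training_step(prompts, references,
                               sample_completion, policy_update):
                 # One RL step: roll out, score, and update the policy
                 # (e.g. with a GRPO/PPO-style gradient).
                 completions = [sample_completion(p) for p in prompts]
                 rewards = [reward(c, r)
                            for c, r in zip(completions, references)]
                 policy_update(prompts, completions, rewards)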
        
       | prideout wrote:
       | This article has a superb diagram of the DeepSeek training
       | pipeline.
        
       | aithrowawaycomm wrote:
       | I like Raschka's writing, even if he is considerably more
       | optimistic about this tech than I am. But I think it's
       | inappropriate to claim that models like R1 are "good at deductive
       | or inductive reasoning" when that is demonstrably not true, they
       | are incapable of even the simplest "out-of-distribution"
       | deductive reasoning:
       | https://xcancel.com/JJitsev/status/1883158738661691878
       | 
       | What they are certainly capable of is a wide variety of
       | computations that _simulate_ reasoning, and maybe that's good
       | enough for your use case. But it is unpredictably brittle unless
       | you spend a lot on o1-pro (and even then...). Raschka has a line
       | about "whether and how an LLM actually 'thinks' is a separate
       | discussion" but this isn't about semantics. R1 clearly sucks at
       | deductive reasoning and you will not understand "reasoning" LLMs
       | if you take DeepSeek's claims at face value.
       | 
       | It seems especially incurious for him to copy-paste the "a-ha
       | moment" from Deepseek's technical report without critically
       | investigating it. DeepSeek's claims are unscientific, without
       | real evidence, and seem focused on hype and investment:
       | 
       |     This moment is not only an "aha moment" for the model but
       |     also for the researchers observing its behavior. It
       |     underscores the power and beauty of reinforcement learning:
       |     rather than explicitly teaching the model how to solve a
       |     problem, we simply provide it with the right incentives, and
       |     it autonomously develops advanced problem-solving strategies.
       | 
       |     The "aha moment" serves as a powerful reminder of the
       |     potential of RL to unlock new levels of intelligence in
       |     artificial systems, paving the way for more autonomous and
       |     adaptive models in the future.
       | 
       | Perhaps it was able to solve that tricky Olympiad problem, but
       | there is an infinite variety of 1st-grade math problems it
       | cannot solve. I doubt it's even reliably able to solve
       | simple variations of that root problem. Maybe it is! But it's
       | frustrating how little skepticism there is about CoT, reasoning
       | traces, etc.
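
       As a small sketch of the kind of robustness check being asked
       for here, assuming a hypothetical ask_model callable that sends
       a prompt to the model under test and returns its reply: generate
       trivial variations of one template problem and compare the
       answers against ground truth.

           def make_variants(n: int):
               # Yield (question, answer) pairs for one simple template.
               for a in range(2, 2 + n):
                   for b in range(3, 3 + n):
                       q = (f"Tom has {a} apples and buys {b} more. "
                            f"How many apples does he have now?")
                       yield q, a + b

           def accuracy(ask_model, n: int = 5) -> float:
               # Fraction of variants whose reply contains the answer.
               variants = list(make_variants(n))
               hits = sum(str(ans) in ask_model(q) for q, ans in variants)
               return hits / len(variants)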
        
       ___________________________________________________________________
       (page generated 2025-02-06 23:00 UTC)