[HN Gopher] Less is more: Recursive reasoning with tiny networks
       ___________________________________________________________________
        
       Less is more: Recursive reasoning with tiny networks
        
       Paper: https://arxiv.org/abs/2510.04871, Code:
       https://github.com/SamsungSAILMontreal/TinyRecursiveModels
        
       Author : guybedo
       Score  : 135 points
       Date   : 2025-10-07 17:42 UTC (5 hours ago)
        
 (HTM) web link (alexiajm.github.io)
 (TXT) w3m dump (alexiajm.github.io)
        
       | guybedo wrote:
       | Abstract:
       | 
       | Hierarchical Reasoning Model (HRM) is a novel approach using two
       | small neural networks recursing at different frequencies.
       | 
        | This biologically inspired method beats Large Language Models
       | (LLMs) on hard puzzle tasks such as Sudoku, Maze, and ARC-AGI
       | while trained with small models (27M parameters) on small data
       | (around 1000 examples). HRM holds great promise for solving hard
       | problems with small networks, but it is not yet well understood
       | and may be suboptimal.
       | 
       | We propose Tiny Recursive Model (TRM), a much simpler recursive
       | reasoning approach that achieves significantly higher
       | generalization than HRM, while using a single tiny network with
       | only 2 layers.
       | 
       | With only 7M parameters, TRM obtains 45% test-accuracy on ARC-
       | AGI-1 and 8% on ARC-AGI-2, higher than most LLMs (e.g., Deepseek
       | R1, o3-mini, Gemini 2.5 Pro) with less than 0.01% of the
       | parameters.
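        | 
        | The core recursion is simple enough to sketch. A minimal sketch
        | of my reading of the idea, not the authors' code (sizes d, n, T
        | are made up; the repo has the real thing):
        | 
        |     import torch
        |     import torch.nn as nn
        | 
        |     class TRMSketch(nn.Module):
        |         # one tiny net reused for both updates; HRM
        |         # instead uses two nets at two frequencies
        |         def __init__(self, d=64):
        |             super().__init__()
        |             self.f = nn.Sequential(
        |                 nn.Linear(3 * d, d),
        |                 nn.ReLU(),
        |                 nn.Linear(d, d))
        | 
        |         def forward(self, x, y, z, n=6, T=3):
        |             for _ in range(T):       # improve answer
        |                 for _ in range(n):   # refine latent
        |                     z = self.f(torch.cat([x, y, z], -1))
        |                 # same net updates the answer, x masked
        |                 pad = torch.zeros_like(x)
        |                 y = self.f(torch.cat([pad, y, z], -1))
        |             return y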
        
         | SeanAnderson wrote:
         | "With only 7M parameters, TRM obtains 45% test-accuracy on ARC-
         | AGI-1 and 8% on ARC-AGI-2, higher than most LLMs (e.g.,
         | Deepseek R1, o3-mini, Gemini 2.5 Pro) with less than 0.01% of
         | the parameters."
         | 
         | Well, that's pretty compelling when taken in isolation. I
         | wonder what the catch is?
        
           | esafak wrote:
           | It won't be any good at factual questions, for a start; it
           | will be reliant on an external memory. Everything would have
           | to be reasoned from first principles, without knowledge.
           | 
            | My gut feeling is that this will limit its capability,
           | because creativity and intelligence involve connecting
           | disparate things, and to do that you need to know them first.
           | Though philosophers have tried, you can't unravel the
           | mysteries of the universe through reasoning alone. You need
           | observations, facts.
           | 
            | What it could be good for is a dedicated reasoning module.
        
             | Grosvenor wrote:
             | That's been my expectation from the start.
             | 
              | We'll need a memory system and an executive
              | function/reasoning system, as well as some sort of sense
              | integration - auditory, visual, text in the case of
              | LLMs, and probably symbolic.
             | 
             | A good avenue of research would be to see if you could glue
              | OpenCyc to this for external "knowledge".
             | 
              | LLMs are fundamentally a dead end.
             | 
             | Github link:
             | https://github.com/SamsungSAILMontreal/TinyRecursiveModels
        
             | ivape wrote:
              | Should it be a larger frontier model, with this as a
              | tool call (one LLM tool-calling another) to verify the
              | larger one?
             | 
             | Why not go nuts with it and put it in the speculative
              | decoding algorithm?
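              | 
              | For reference, a greedy sketch of speculative
              | decoding; the model API here (next / next_k)
              | is hypothetical, just to show the shape of
              | the algorithm:
              | 
              |     def speculate(target, draft, toks, k=4):
              |         # cheap draft model proposes k tokens
              |         prop = []
              |         for _ in range(k):
              |             prop.append(draft.next(toks + prop))
              |         # big model checks all k in one pass
              |         check = target.next_k(toks, prop)
              |         # keep the agreeing prefix, plus the
              |         # big model's token at first mismatch
              |         out = []
              |         for p, t in zip(prop, check):
              |             out.append(t)
              |             if p != t:
              |                 break
              |         return toks + out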
        
               | baq wrote:
                | If we could somehow weave a reasoning tool directly
                | into the inference process, without having to use the
                | context for it, that'd be something. Perhaps compile to
               | weights and pretend this part is pretrained...? No idea
               | if it's feasible, but it'd definitely be a breakthrough
               | if AI had access to z3 in hidden layers.
        
             | js8 wrote:
              | Basic English is about 2,000 words. So a small-scale LLM
              | that was capable of reasoning in Basic English, and of
              | transforming a problem from normal English into Basic
              | English by automatically including the relevant
              | word/phrase definitions from a dictionary, could easily
              | beat a large LLM (by being more consistent).
             | 
              | I think this is where all the reasoning problems of LLMs
              | will end up. We will use an LM to transform a problem in
              | informal English (human language) into a formal logical
              | language (possibly fuzzy and modal), from that possibly
              | into an even simpler logic, then solve the problem in
              | the logical domain using traditional reasoning
              | approaches, and convert the answer back to informal
              | English. That way, you won't need to run a large model
              | during the reasoning. Larger models will only be useful
              | as fuzzy K-V stores (attention mechanism) to help drive
              | heuristics during the reasoning search.
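              | 
              | A toy sketch of that last solving step with an
              | off-the-shelf solver (pip install z3-solver);
              | the NL-to-logic translation, the hard part, is
              | assumed to have happened already:
              | 
              |     from z3 import (Bools, Implies, Not,
              |                     Solver, sat)
              | 
              |     man, mortal = Bools("man mortal")
              |     s = Solver()
              |     s.add(man)                   # fact
              |     s.add(Implies(man, mortal))  # rule
              |     s.add(Not(mortal))           # negated query
              |     # unsat: "mortal" follows from the facts
              |     print(s.check() != sat)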
             | 
              | I suspect the biggest obstacle to AGI is philosophical:
              | we don't really have a good grasp/formalization of
              | human/fuzzy/modal epistemology. Even if you look at the
              | formalization of mathematics, it's mostly about proofs;
              | we lack an understanding of what makes, e.g., an
              | interesting mathematical problem, or how to even express
              | in formal logic that something is a problem, that
              | experiments suggest something, that one model has an
              | advantage over another in some respect, or that there is
              | a certain cost associated with testing a hypothesis,
              | etc. Once we figure out what we actually want from
              | epistemology, I am sure the algorithm required will turn
              | out to be greatly reduced.
        
       | briandw wrote:
       | " With only 7M parameters, TRM obtains 45% test-accuracy on ARC-
       | AGI- 1 and 8% on ARC-AGI-2, higher than most LLMs (e.g., Deepseek
       | R1, o3-mini, Gemini 2.5 Pro) with less than 0.01% of the
       | parameters"
       | 
       | That is very impressive.
       | 
       | Side note: Superficially reminds me of Hierarchical Temporal
        | Memory from Jeff Hawkins' "On Intelligence". Although this doesn't
       | have the sparsity aspect, its hierarchical and temporal aspects
       | are related.
       | 
       | https://en.wikipedia.org/wiki/Hierarchical_temporal_memory
       | https://www.numenta.com
        
         | java-man wrote:
         | I suspect the lack of sparsity is an Achilles' heel of the
         | current LLM approach.
        
       | infogulch wrote:
       | So what happens when we figure out how to 10x both scale and
       | throughput on existing hardware by using it more efficiently?
       | Will gigantic models still be useful?
        
         | peterlk wrote:
          | Of course! We still have computers the size of the
          | mainframes that ran on vacuum tubes. They are just built
          | with vastly more powerful hardware and are used for the
          | specialized tasks that supercomputing facilities care about.
         | 
         | But it has the potential to alter the economics of AI quite
          | dramatically.
        
       | Balinares wrote:
       | Wow, so not only are the findings from
       | https://arxiv.org/abs/2506.21734 (posted on HN a while back)
        | confirmed, they're generalizable? Intriguing. If this pans out
        | in practical use cases, it'd be transformative.
       | 
       | Also would possibly instantly void the value of trillions of
       | pending AI datacenter capex, which would be funny. (Though
       | possibly not for very long.)
        
         | matthewfcarlson wrote:
         | It would be fitting if the AI bubble was popped by AI getting
          | too good and too efficient.
        
         | ACCount37 wrote:
         | Any mention of "HRM" is incomplete without this analysis:
         | 
         | https://arcprize.org/blog/hrm-analysis
         | 
          | This here looks like a stripped-down version of HRM - possibly
         | drawing on the ablation studies from this very analysis.
         | 
         | Worth noting that HRMs aren't generally applicable in the same
         | way normal transformer LLMs are. Or, at least, no one has found
         | a way to apply them to the typical generative AI tasks yet.
         | 
         | I'm still reading the paper, but I expect this version to be
         | similar - it uses the same tasks as HRMs as examples. Possibly
         | quite good at spatial reasoning tasks (ARC-AGI and ARC-AGI-2
         | are both spatial reasoning benchmarks), but it would have to be
          | integrated into a larger, more generally capable architecture to
         | go past that.
        
           | shawntan wrote:
            | That analysis worded its evaluation of HRM and its
            | contributions very diplomatically. The comparison with a
            | recursive / universal transformer under the same settings
            | is telling.
           | 
           | "These results suggest that the performance on ARC-AGI is not
           | an effect of the HRM architecture. While it does provide a
           | small benefit, a replacement baseline transformer in the HRM
           | training pipeline achieves comparable performance."
        
           | Balinares wrote:
            | That's a good read, also shared by another poster above;
            | thanks! If I'm reading this right, it contextualizes, but
            | doesn't negate, the findings from that paper.
           | 
            | I've got a major aesthetic problem with the fact that LLMs
            | require this much training data to get where they are, namely
            | "not there yet"; it's brute force by any other name, and just
            | plain kind of _vulgar_. More importantly, it won't scale much
            | further. Novel architectures will have to feature
           | in at some point, and I'll gladly take any positive result in
           | that direction.
        
             | ACCount37 wrote:
             | Evolution is brute force by any other name. Nothing elegant
             | about it. Nonetheless, here you are.
             | 
              | Poor sample efficiency of current AI systems is a well-known
             | issue - but you should keep in mind what kind of grisly
             | process was required to give _you_ the architecture that
             | makes you as sample efficient as you are.
             | 
             | We don't know yet what kind of architectural quirks enable
             | this sample efficiency in the human brain. It could be
             | something like a non-random initialization process that
             | confers the right inductive biases, a more efficient
             | optimizer, recurrent background loops... or just more raw
             | juice.
             | 
             | It might be that one biological neuron is worth 10000 LLM
             | weights, and a big part of how the brain is so sample
             | efficient is that it's hilariously overparametrized.
        
         | ivape wrote:
         | _Also would possibly instantly void the value of trillions of
         | pending AI datacenter capex_
         | 
          | GPU compute is not just for text inference. Video
          | generation demand is something I don't think we'll
          | saturate for quite a while, even with breakthroughs.
        
           | mirekrusin wrote:
            | It doesn't matter how much compute you have; you'll always
            | be able to saturate it one way or another with AI, and
            | having more compute will forever be an advantage.
           | 
            | If a breakthrough in AI happens, you'll get multiplied
            | benefits, not losses.
        
             | ivape wrote:
             | The "AI is hype" can't seem to wrap this idea around their
             | little heads for some reason.
        
         | baq wrote:
          | Jevons paradox applies here IMHO. Cheaper AI/watt = more
         | demand.
        
         | lawlessone wrote:
         | >Also would possibly instantly void the value of trillions of
         | pending AI datacenter capex
         | 
         | I think they would just adopt this idea and use it to continue
         | training huge but more capable models.
        
       | shawntan wrote:
       | I think everyone should read the post from ARC-AGI organisers
       | about HRM carefully: https://arcprize.org/blog/hrm-analysis
       | 
        | With the same data augmentation / 'test-time training' setting,
        | vanilla Transformers do pretty well, close to the
        | "breakthrough" numbers HRM reported. From a brief skim, this
        | paper uses similar settings to compare itself on ARC-AGI.
       | 
        | I, too, want to believe in smaller models with excellent
        | reasoning performance. But first understand what ARC-AGI tests
        | for, what the general setting is -- the one that commercial
        | LLMs use to compare against each other -- and what specialised
        | setting HRM and this paper use for evaluation.
       | 
       | The naming of that benchmark lends itself to hype, as we've seen
       | in both HRM and this paper.
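        | 
        | For context, the augmentation in that setting is roughly of
        | this shape (a sketch of the general recipe, not the exact
        | pipeline; in practice the same transform is applied to every
        | grid of a task):
        | 
        |     import random
        |     import numpy as np
        | 
        |     def augment(grid, rng=random):
        |         g = np.asarray(grid)
        |         g = np.rot90(g, rng.randrange(4))  # rotate
        |         if rng.random() < 0.5:
        |             g = np.fliplr(g)               # reflect
        |         perm = list(range(10))             # colors 0..9
        |         rng.shuffle(perm)
        |         return np.asarray(perm)[g]         # recolor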
        
         | ACCount37 wrote:
         | Not exactly "vanilla Transformer", but rather "a Transformer-
         | like architecture with recurrence".
         | 
         | Which is still a fun idea to play around with - this approach
         | clearly has its strengths. But it doesn't appear to be an
         | actual "better Transformer". I don't think it deserves nearly
         | as much hype as it gets.
        
           | shawntan wrote:
           | Right. There should really be a vanilla Transformer baseline.
           | 
            | As for recurrence, the idea has been around for a while
            | (Universal Transformers): https://arxiv.org/abs/1807.03819
            | 
            | There are reasons it hasn't really been picked up at
            | scale; the method tends to do well mainly on synthetic
            | tasks.
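            | 
            | The recurrence itself is a one-liner on top of a
            | standard block; a minimal sketch of the Universal-
            | Transformer-style idea (sizes made up):
            | 
            |     import torch.nn as nn
            | 
            |     class RecurrentEncoder(nn.Module):
            |         def __init__(self, d=128, heads=4, steps=8):
            |             super().__init__()
            |             # one weight-tied block, not
            |             # `steps` distinct layers
            |             self.block = nn.TransformerEncoderLayer(
            |                 d, heads, batch_first=True)
            |             self.steps = steps
            | 
            |         def forward(self, x):
            |             for _ in range(self.steps):
            |                 x = self.block(x)  # same weights
            |             return x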
        
       | guybedo wrote:
        | GitHub: https://github.com/SamsungSAILMontreal/TinyRecursiveModels
        
       | Timsky wrote:
        | If it is recursive, can it apply induction and solve the
        | Towers of Hanoi beyond level six?
        
         | yorwba wrote:
         | You'll first need to frame Towers of Hanoi as a supervised
         | learning problem. I suspect the answer to your question will
         | differ depending on what you pick as the input-output pairs to
         | train the model on.
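          | 
          | For instance, one natural framing maps each board state
          | to the next optimal move (a sketch; other framings, e.g.
          | state to full move sequence, would train a very
          | different model):
          | 
          |     def moves(n, a=0, b=2, c=1):
          |         # textbook recursive solution: a -> b via c
          |         if n == 0:
          |             return []
          |         return (moves(n - 1, a, c, b) + [(a, b)]
          |                 + moves(n - 1, c, b, a))
          | 
          |     def pairs(n):
          |         # (state before move, optimal move) pairs
          |         pegs = [list(range(n, 0, -1)), [], []]
          |         out = []
          |         for src, dst in moves(n):
          |             out.append((tuple(map(tuple, pegs)),
          |                         (src, dst)))
          |             pegs[dst].append(pegs[src].pop())
          |         return out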
        
       | krychu wrote:
       | I implemented HRM for educational purposes and got good results
        | for pathfinding. But then I started to do ablation experiments
       | and came to the same conclusions as the ARC-AGI team (the HRM
       | architecture itself didn't play a big role):
       | https://github.com/krychu/hrm
       | 
       | This was a bit unfortunate. I think there is something in the
       | idea of latent space reasoning.
        
       ___________________________________________________________________
       (page generated 2025-10-07 23:00 UTC)