[HN Gopher] Less is more: Recursive reasoning with tiny networks
___________________________________________________________________
Less is more: Recursive reasoning with tiny networks
Paper: https://arxiv.org/abs/2510.04871, Code:
https://github.com/SamsungSAILMontreal/TinyRecursiveModels
Author : guybedo
Score : 135 points
Date : 2025-10-07 17:42 UTC (5 hours ago)
(HTM) web link (alexiajm.github.io)
(TXT) w3m dump (alexiajm.github.io)
| guybedo wrote:
| Abstract:
|
| Hierarchical Reasoning Model (HRM) is a novel approach using two
| small neural networks recursing at different frequencies.
|
| This biologically inspired method beats large language models
| (LLMs) on hard puzzle tasks such as Sudoku, Maze, and ARC-AGI
| while trained with small models (27M parameters) on small data
| (around 1000 examples). HRM holds great promise for solving hard
| problems with small networks, but it is not yet well understood
| and may be suboptimal.
|
| We propose Tiny Recursive Model (TRM), a much simpler recursive
| reasoning approach that achieves significantly higher
| generalization than HRM, while using a single tiny network with
| only 2 layers.
|
| With only 7M parameters, TRM obtains 45% test-accuracy on ARC-
| AGI-1 and 8% on ARC-AGI-2, higher than most LLMs (e.g., Deepseek
| R1, o3-mini, Gemini 2.5 Pro) with less than 0.01% of the
| parameters.
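|
| As a rough sketch of the recursion described above (illustrative
| only; the fusion-by-addition and module names are guesses, not
| the paper's exact code):
|
|     import torch.nn as nn
|
|     class TinyRecursiveSketch(nn.Module):
|         """TRM-style recursion: one tiny network refines a latent
|         z several times, then updates the current answer y."""
|         def __init__(self, net: nn.Module):
|             super().__init__()
|             self.net = net  # a single small network, reused throughout
|
|         def forward(self, x, y, z, n=6, T=3):
|             for _ in range(T):        # outer improvement steps
|                 for _ in range(n):    # refine the latent reasoning state
|                     z = self.net(x + y + z)  # fusion by addition: a guess
|                 y = self.net(y + z)   # update the answer from the latent
|             return y, z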
| SeanAnderson wrote:
| "With only 7M parameters, TRM obtains 45% test-accuracy on ARC-
| AGI-1 and 8% on ARC-AGI-2, higher than most LLMs (e.g.,
| Deepseek R1, o3-mini, Gemini 2.5 Pro) with less than 0.01% of
| the parameters."
|
| Well, that's pretty compelling when taken in isolation. I
| wonder what the catch is?
| esafak wrote:
| It won't be any good at factual questions, for a start; it
| will be reliant on an external memory. Everything would have
| to be reasoned from first principles, without knowledge.
|
| My gut feeling is that this will limit its capability,
| because creativity and intelligence involve connecting
| disparate things, and to do that you need to know them first.
| Though philosophers have tried, you can't unravel the
| mysteries of the universe through reasoning alone. You need
| observations, facts.
|
| What I could see it being good for is a dedicated reasoning module.
| Grosvenor wrote:
| That's been my expectation from the start.
|
| We'll need a memory system and an executive function/reasoning
| system, as well as some sort of sense integration: auditory,
| visual, text in the case of LLMs, and probably symbolic.
|
| A good avenue of research would be to see if you could glue
| OpenCyc to this for external "knowledge".
|
| LLMs are fundamentally a dead end.
|
| Github link:
| https://github.com/SamsungSAILMontreal/TinyRecursiveModels
| ivape wrote:
| Should it be a larger frontier model, with this as a tool
| call (one LLM tool-calling another) to verify the larger one?
|
| Why not go nuts with it and put it in the speculative
| decoding algorithm?
| baq wrote:
| If we could somehow weave a reasoning tool directly into
| the inference process, without having to use the context
| for it, that'd be something. Perhaps compile it to weights
| and pretend this part is pretrained...? No idea if it's
| feasible, but it'd definitely be a breakthrough if AI had
| access to z3 in hidden layers.
| js8 wrote:
| Basic English is about 2000 words. So a small-scale LLM
| that was capable of reasoning in Basic English, and of
| transforming a problem in normal English into Basic English
| by automatically including the relevant word/phrase
| definitions from a dictionary, could easily beat a large
| LLM (by being more consistent).
|
| I think this is where all the reasoning problems of LLMs will
| end up. We will use an LM to transform a problem in informal
| English (human language) into a formal logical language
| (possibly fuzzy and modal), from that possibly into an even
| simpler logic, then solve the problem in the logical domain
| using traditional reasoning approaches, and convert the answer
| back to informal English. That way, you won't need to run a
| large model during the reasoning. Larger models will only be
| useful as fuzzy K-V stores (attention mechanism) to help
| drive heuristics during reasoning.
|
| I suspect the biggest obstacle to AGI is philosophical: we
| don't really have a good grasp/formalization of
| human/fuzzy/modal epistemology. Even if you look at the
| formalization of mathematics, it's mostly about proofs; we
| lack an understanding of what makes, e.g., an interesting
| mathematical problem, or of how to even express in formal
| logic that something is a problem, that experiments suggest
| something, that one model has an advantage over another in
| some respect, or that there is a certain cost associated
| with testing a hypothesis. Once we figure out what we
| actually want from epistemology, I am sure the algorithm
| required will turn out to be much smaller.
| briandw wrote:
| " With only 7M parameters, TRM obtains 45% test-accuracy on ARC-
| AGI- 1 and 8% on ARC-AGI-2, higher than most LLMs (e.g., Deepseek
| R1, o3-mini, Gemini 2.5 Pro) with less than 0.01% of the
| parameters"
|
| That is very impressive.
|
| Side note: Superficially, this reminds me of Hierarchical Temporal
| Memory from Jeff Hawkins' "On Intelligence". Although this doesn't
| have the sparsity aspect, its hierarchical and temporal aspects
| are related.
|
| https://en.wikipedia.org/wiki/Hierarchical_temporal_memory
| https://www.numenta.com
| java-man wrote:
| I suspect the lack of sparsity is an Achilles' heel of the
| current LLM approach.
| infogulch wrote:
| So what happens when we figure out how to 10x both scale and
| throughput on existing hardware by using it more efficiently?
| Will gigantic models still be useful?
| peterlk wrote:
| Of course! We still have computers the size of the mainframes
| that once ran on vacuum tubes; they're just built with vastly
| more powerful hardware and are used for specialized tasks that
| supercomputing facilities care about.
|
| But it has the potential to alter the economics of AI quite
| dramatically.
| Balinares wrote:
| Wow, so not only are the findings from
| https://arxiv.org/abs/2506.21734 (posted on HN a while back)
| confirmed, they're generalizable? Intriguing. I wonder if this
| will pan out in practical use cases; it'd be transformative.
|
| Also would possibly instantly void the value of trillions of
| pending AI datacenter capex, which would be funny. (Though
| possibly not for very long.)
| matthewfcarlson wrote:
| It would be fitting if the AI bubble were popped by AI getting
| too good and too efficient.
| ACCount37 wrote:
| Any mention of "HRM" is incomplete without this analysis:
|
| https://arcprize.org/blog/hrm-analysis
|
| This looks like a stripped-down version of HRM - possibly
| drawing on the ablation studies from this very analysis.
|
| Worth noting that HRMs aren't generally applicable in the same
| way normal transformer LLMs are. Or, at least, no one has found
| a way to apply them to the typical generative AI tasks yet.
|
| I'm still reading the paper, but I expect this version to be
| similar - it uses the same tasks as HRM for its examples. It's
| possibly quite good at spatial reasoning tasks (ARC-AGI and
| ARC-AGI-2 are both spatial reasoning benchmarks), but it would
| have to be integrated into a larger, more generally capable
| architecture to go past that.
| shawntan wrote:
| That analysis worded its evaluation of HRM and its
| contributions very diplomatically. The comparison with a
| recursive / universal transformer in the same settings is
| telling.
|
| "These results suggest that the performance on ARC-AGI is not
| an effect of the HRM architecture. While it does provide a
| small benefit, a replacement baseline transformer in the HRM
| training pipeline achieves comparable performance."
| Balinares wrote:
| That's a good read also shared by another poster above,
| thanks! If I'm reading this right, it contextualizes, but
| doesn't negate, the findings from that paper.
|
| I've got a major aesthetic problem with the fact that LLMs
| require this much training data to get where they are (namely,
| "not there yet"); it's brute force by any other name, and just
| plain kind of _vulgar_. Although, more importantly, it won't
| scale much further. Novel architectures will have to feature
| in at some point, and I'll gladly take any positive result in
| that direction.
| ACCount37 wrote:
| Evolution is brute force by any other name. Nothing elegant
| about it. Nonetheless, here you are.
|
| Poor sample efficiency of current AIs is a well-known issue -
| but you should keep in mind what kind of grisly process was
| required to give _you_ the architecture that makes you as
| sample-efficient as you are.
|
| We don't know yet what kind of architectural quirks enable
| this sample efficiency in the human brain. It could be
| something like a non-random initialization process that
| confers the right inductive biases, a more efficient
| optimizer, recurrent background loops... or just more raw
| juice.
|
| It might be that one biological neuron is worth 10,000 LLM
| weights, and a big part of why the brain is so sample-efficient
| is that it's hilariously overparametrized.
| ivape wrote:
| _Also would possibly instantly void the value of trillions of
| pending AI datacenter capex_
|
| GPU compute is not just for text inference. Video generation
| demand is something I don't think we'll saturate for quite a
| while, even with breakthroughs.
| mirekrusin wrote:
| It doesn't matter how much compute you have; you'll always be
| able to saturate it one way or another with AI, and having
| more compute will forever be an advantage.
|
| If a breakthrough in AI happens, you'll get multiplied
| benefits, not losses.
| ivape wrote:
| The "AI is hype" can't seem to wrap this idea around their
| little heads for some reason.
| baq wrote:
| Jevons' paradox applies here IMHO. Cheaper AI/watt = more
| demand.
| lawlessone wrote:
| >Also would possibly instantly void the value of trillions of
| pending AI datacenter capex
|
| I think they would just adopt this idea and use it to continue
| training huge but more capable models.
| shawntan wrote:
| I think everyone should read the post from the ARC-AGI
| organisers about HRM carefully:
| https://arcprize.org/blog/hrm-analysis
|
| With the same data augmentation / 'test-time training' setting,
| vanilla Transformers do pretty well, close to the
| "breakthrough" results HRM reported. From a brief skim, this
| paper uses similar settings to compare itself on ARC-AGI.
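|
| (For context, the augmentation in question is roughly of this
| flavor - a guess at the general shape, not the exact pipeline:
| each ARC grid is expanded into many variants via dihedral
| transforms plus color permutations.)
|
|     import numpy as np
|
|     def augment_arc_grid(grid: np.ndarray,
|                          rng: np.random.Generator) -> np.ndarray:
|         """One random dihedral transform plus a color permutation."""
|         g = np.rot90(grid, k=int(rng.integers(4)))
|         if rng.random() < 0.5:
|             g = np.fliplr(g)
|         perm = rng.permutation(10)  # ARC colors are 0..9
|         return perm[g]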
|
| I, too, want to believe in smaller models with excellent
| reasoning performance. But first understand what ARC-AGI tests
| for, what the general setting is -- the one that commercial
| LLMs use to compare against each other -- and what the
| specialised setting is that HRM and this paper use for
| evaluation.
|
| The naming of that benchmark lends itself to hype, as we've seen
| in both HRM and this paper.
| ACCount37 wrote:
| Not exactly "vanilla Transformer", but rather "a Transformer-
| like architecture with recurrence".
|
| Which is still a fun idea to play around with - this approach
| clearly has its strengths. But it doesn't appear to be an
| actual "better Transformer". I don't think it deserves nearly
| as much hype as it gets.
| shawntan wrote:
| Right. There should really be a vanilla Transformer baseline.
|
| As for recurrence, the idea has been around for a while:
| https://arxiv.org/abs/1807.03819 (Universal Transformers)
|
| There are reasons it hasn't really been picked up at scale;
| the method tends to do well mainly on synthetic tasks.
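|
| For reference, the recurrent idea is essentially weight tying
| across depth - the same block applied repeatedly (a minimal
| sketch, not that paper's exact architecture):
|
|     import torch.nn as nn
|
|     class WeightTiedEncoder(nn.Module):
|         """Universal-Transformer-style recurrence: one block, reused."""
|         def __init__(self, d_model: int = 128, n_heads: int = 4):
|             super().__init__()
|             self.block = nn.TransformerEncoderLayer(
|                 d_model, n_heads, batch_first=True)
|
|         def forward(self, x, steps: int = 6):
|             for _ in range(steps):  # identical parameters every "layer"
|                 x = self.block(x)
|             return x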
| guybedo wrote:
| GitHub: https://github.com/SamsungSAILMontreal/TinyRecursiveModels
| Timsky wrote:
| If it's recursive, can it apply induction and solve the
| Towers of Hanoi beyond level six?
| yorwba wrote:
| You'll first need to frame Towers of Hanoi as a supervised
| learning problem. I suspect the answer to your question will
| differ depending on what you pick as the input-output pairs to
| train the model on.
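|
| Generating the pairs themselves is the easy part; the framing
| is the hard one. For instance, optimal next-move targets could
| come from the classic recursion (a sketch):
|
|     def hanoi_moves(n, src=0, aux=1, dst=2):
|         """Yield the optimal move sequence for n disks as
|         (from_peg, to_peg) pairs."""
|         if n == 0:
|             return
|         yield from hanoi_moves(n - 1, src, dst, aux)
|         yield (src, dst)
|         yield from hanoi_moves(n - 1, aux, src, dst)
|
|     # One framing: pair each intermediate board state with the next
|     # optimal move; another: emit the full 2**n - 1 move sequence.
|     # The two probe the model's induction very differently.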
| krychu wrote:
| I implemented HRM for educational purposes and got good results
| for pathfinding. But then I started to do ablation experiments
| and came to the same conclusions as the ARC-AGI team (the HRM
| architecture itself didn't play a big role):
| https://github.com/krychu/hrm
|
| This was a bit unfortunate. I still think there is something
| to the idea of latent-space reasoning.
___________________________________________________________________
(page generated 2025-10-07 23:00 UTC)