[HN Gopher] M1: Towards Scalable Test-Time Compute with Mamba Re...
___________________________________________________________________
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
Author : dpstart01
Score : 26 points
Date : 2025-04-15 17:00 UTC (6 hours ago)
(HTM) web link (arxiv.org)
| ed wrote:
| Interesting direction for research, but not a model you'd want to
| use today. The paper looks at a 3B model built on Llama-3.2-3B,
| modified for Mamba, and they're comparing it to a distilled
| version of R1 with 1.5B params.
| solomatov wrote:
| Does anyone know if there have been any attempts to test Mamba at
| a really large scale? To me it looks like the most promising
| successor to the transformer architecture. Does anyone know why
| that might not be the case, or what the other alternatives are?
___________________________________________________________________
(page generated 2025-04-15 23:01 UTC)