[HN Gopher] Smollm3: Smol, multilingual, long-context reasoner LLM
       ___________________________________________________________________
        
       Smollm3: Smol, multilingual, long-context reasoner LLM
        
       Author : kashifr
       Score  : 186 points
       Date   : 2025-07-08 16:13 UTC (6 hours ago)
        
 (HTM) web link (huggingface.co)
 (TXT) w3m dump (huggingface.co)
        
       | gardnr wrote:
       | It's small (3B) and does great on benchmarks. This is a model for
       | edge / mobile deployments so the gains over gemma3-4b are
        | meaningful. It has dual-mode reasoning / non-reasoning AND they
        | released the full training method:
       | 
       | > We're releasing SmolLM3 with our engineering blueprint. It
       | includes architecture details, exact data mixtures showing how we
       | progressively boost performance across domains in a three-stage
       | pretraining approach, and the methodology for building a hybrid
       | reasoning model. Usually, achieving these results would require
       | months of reverse engineering. Instead, we're providing the full
       | methodology.
        
       | tiahura wrote:
       | Can anyone estimate how much of the 3B is necessitated by multi-
       | language support?
        
         | rockinghigh wrote:
         | The vocabulary size is fairly small (128,256) for a
         | multilingual model. I would guess it doesn't require many
         | additional parameters to support these 5 languages as many
         | tokens can be shared.
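          | 
          | Back-of-the-envelope, with a purely illustrative hidden size
          | (not taken from the actual SmolLM3 config): a tied embedding
          | table costs vocab_size x hidden_dim parameters, and that table
          | is shared across every language.
          | 
          |     vocab_size = 128_256
          |     hidden_dim = 2048  # illustrative, not the real config value
          |     # tied input/output embedding table, shared by all languages
          |     print(f"{vocab_size * hidden_dim / 1e6:.0f}M")  # ~263M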
        
       | nateb2022 wrote:
       | https://web.archive.org/web/20250708164705/https://huggingfa...
        
       | _1 wrote:
        | Which small model is good for fine-tuning on various enterprise
        | datasets? Our business units want to run small models in the
        | browser and on mobile devices, without dealing with RAG and
        | cloud resources.
        
         | mhitza wrote:
         | You really need to try them all out yourself and make sure you
         | have proper benchmarks.
         | 
          | While machine learning is not my field, I've tried to fine-tune
          | Mistral 7B (following their official guide and toolset) and the
          | results did not satisfy. There were a few very specific
          | questions from the dataset that, no matter how much I fine-
          | tuned and tweaked the process, it was never able to answer
          | correctly.
          | 
          | A mix of vector search + keyword search is still better at
          | building the right question context than expecting the model to
          | learn all the information.
          | 
          | I used the raw, pretraining-style dataset approach. Maybe
          | building synthetic questions and answers around the dataset
          | yields better results, but I didn't have time to experiment
          | with that approach.
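          | 
          | For reference, the synthetic-Q&A route I had in mind would look
          | roughly like this with TRL (sketch only; the checkpoint id and
          | the tiny dataset are placeholders, and exact SFTTrainer
          | arguments depend on your TRL version):
          | 
          |     from datasets import Dataset
          |     from trl import SFTConfig, SFTTrainer
          | 
          |     # Placeholder synthetic Q&A pairs built from your docs;
          |     # in practice you'd generate thousands with a larger model.
          |     pairs = [{"messages": [
          |         {"role": "user", "content": "What is clause 4?"},
          |         {"role": "assistant", "content": "It says ..."},
          |     ]}]
          | 
          |     trainer = SFTTrainer(
          |         model="HuggingFaceTB/SmolLM3-3B",  # assumed id
          |         train_dataset=Dataset.from_list(pairs),
          |         args=SFTConfig(output_dir="smollm3-sft"),
          |     )
          |     trainer.train()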
        
           | ivape wrote:
           | How much data did you use to fine tune?
        
             | mhitza wrote:
              | Kilobytes to megabytes of data. I was trying to fine-tune
              | it on some specific legislation that I expected to be able
              | to ask questions about afterwards.
        
         | gardnr wrote:
          | Small models are bad at knowing things. Trying to train
          | knowledge into small models is probably not the way you want
          | to go. You could try building an offline embedded RAG system
          | that is deployable as wasm. Some folks have had success with
          | this.
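          | 
          | The retrieval core itself is tiny; a minimal Python sketch of
          | the idea (in the browser you'd run an equivalent embedding
          | model compiled to wasm instead; the docs and model name here
          | are just examples):
          | 
          |     from sentence_transformers import SentenceTransformer, util
          | 
          |     docs = ["Refunds are processed within 14 days.",
          |             "Support hours are 9am-5pm CET."]
          |     model = SentenceTransformer("all-MiniLM-L6-v2")
          |     doc_emb = model.encode(docs, convert_to_tensor=True)
          | 
          |     query = "when do I get my money back"
          |     q_emb = model.encode(query, convert_to_tensor=True)
          |     best = int(util.cos_sim(q_emb, doc_emb).argmax())
          |     print(docs[best])  # goes into the small model's prompt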
        
           | _1 wrote:
            | We do use WebLLM and a hosted Weaviate database, but there
            | are complaints about speed (both retrieval and time to first
            | token, as the context gets big). The Gemma 3n "nesting
            | doll" approach sounds like it could be useful... but I
            | haven't found anyone specifically using it to add domain-
            | specific knowledge.
        
             | janalsncm wrote:
             | Typically retrieval is the fast part in my experience. Have
              | you considered cheaper retrieval methods? BM25 does pretty
             | well on its own. And you can augment your dataset by
             | precomputing relevant queries for each doc.
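              | 
              | Rough sketch of the BM25-only route with the rank_bm25
              | package (the doc strings are placeholders):
              | 
              |     from rank_bm25 import BM25Okapi
              | 
              |     docs = ["refunds take 14 days",
              |             "support hours are 9-5 CET"]
              |     bm25 = BM25Okapi([d.split() for d in docs])
              |     q = "how long do refunds take".split()
              |     print(bm25.get_top_n(q, docs, n=1))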
        
         | simonw wrote:
         | What are you hoping to achieve by fine-tuning a model in this
         | way?
        
         | netdur wrote:
          | I have fine-tuned Gemma 3N 2B and it's pretty good, but it
          | loads slowly on my S23U; once it's loaded, though, it works
          | fine.
          | 
          | Also tried SmolVLM 256M and 500M; they load faster and you can
          | embed them in assets. They work if you know what you're doing.
          | 
          | Just keep in mind that smaller models don't perform as well
          | due to their limited parameters.
          | 
          | Also, on Android you can't ship files larger than 2 GB due to
          | Java compression issues, so you need to download models
          | separately. You then can't load the model from the download
          | folder; you have to copy it into the app's own folder. This
          | means a Gemma 3N 2B model that's 3.14 GB needs at least 7 GB
          | of free space on the user's phone.
        
       | WhitneyLand wrote:
       | Mostly SOTA performance at the 3B level. A notable addition to
       | the small but truly open club of models that provide full
       | disclosure, code, recipes to reproduce their work.
       | 
        | Looks like a ballpark million dollars of GPU time if you want to
        | train one up for yourself (4000 GPUs / 24 days).
        | 
        | Very nice write-up that's generous in sharing their learnings.
       | 
       | This is a solid and positive contribution.
        
         | YetAnotherNick wrote:
         | It's 384 H100s for 24 days, costing less than half a million
         | dollars.
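          | 
          | Rough math (the hourly rate below is an assumption, not from
          | the post):
          | 
          |     gpus, days, usd_per_gpu_hr = 384, 24, 2.0  # rate assumed
          |     print(gpus * days * 24 * usd_per_gpu_hr)   # ~442k USD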
        
           | Imustaskforhelp wrote:
            | Pardon me, but is the dataset public?
           | 
           | Like if I really really just wanted to build it from scratch,
           | could I do so? (not that I have that money but just curious)
        
             | hynky wrote:
              | Yes, both core web datasets are publicly available, as
              | well as the rest.
        
               | Imustaskforhelp wrote:
               | Thanks!
               | 
                | To be honest, I'd argue this is one of the best truly
                | open-source models we've got.
                | 
                | There is AllenAI's (OLMo?), and there is also the one
                | that does distributed training, but this looks a lot
                | like SOTA for 3B parameters to me.
                | 
                | Thanks for telling me. I'm not going to lie, I'm going
                | to try to test it now! (I'll try a GGUF since ollama is
                | convenient.)
        
               | peatmoss wrote:
               | OLMo: https://allenai.org/olmo
               | 
                | AFAIK, they were the first open-everything model.
        
         | refulgentis wrote:
          | I spent about 10 minutes this AM cross-checking against
          | Phi-4-mini benchmarks, as it seemed very odd not to include
          | the leader in the benchmarks, and it seemed universally
          | behind.
          | 
          | For context, I dev an LLM client; a core tenet is keeping
          | local as close to cloud parity as possible (via llama.cpp).
          | 
          | Companies aren't taking local AI seriously on a _sustained_
          | basis outside Microsoft.
          | 
          | Overall, I usually would bite my tongue. HF is a great
          | citizen, and I doubt this'll be a one-off. However, when I see
          | superlatives affirmed while leaving out what has long been the
          | local SoTA, and a godsend in this sector, I think it is good
          | to stand up and say this rather than shy away.
        
       | bitwize wrote:
       | There's a British comedy skit lurking in here.
       | 
       | "So it's a small large language model?"
       | 
       | "Oh yes, very small."
       | 
       | "How can it be small and large at the same time?"
       | 
       | "Well, it's small by the standards of a large language model."
       | 
       | "So it's large."
       | 
       | "Oh yes, very large."
       | 
       | "Large compared to what?"
       | 
       | "Small language models."
       | 
       | "And so something like ChatGPT, what would that be exactly? A
       | _large_ large language model? "
       | 
       | "Yes, precisely. An LLLM."
        
         | netdur wrote:
          | Is it big little planet or small big planet?
        
         | janalsncm wrote:
          | Standards have shifted as well. GPT-2 used to be considered
          | "large", but it is half the size of this. Oh, and Sam Altman
          | also said it was too dangerous to release. At this point I
          | consider anything too big to run on consumer-grade hardware to
          | be large, but an exact definition is a little silly to argue
          | about.
        
           | a_wild_dandan wrote:
           | Altman released GPT-2 despite expressing that doing so was a
           | bad idea? That's wild.
        
             | Alifatisk wrote:
              | I think Altman meant it was too dangerous to open-source
              | GPT-2, and therefore locked it behind a service.
        
         | papichulo2023 wrote:
         | Do not mess with the Miniature giant space hamsters
        
       | msgodel wrote:
        | Wow. Close to a Qwen3 distill at 75% of the size. That's great!
        | 
        | I've been using the smollm base models for my own finetunes just
        | because they're so high quality; it looks like I might be using
        | them to drive local agents/code completion in the near future
        | too.
        | 
        | Their RL algorithm looks interesting. I'm still using OpenAI's
        | algorithm for my stuff; I've been meaning to check on the SoTA
        | since I know my code is pretty outdated. (It's crazy how fast
        | that happens with this stuff.)
        
       | gdiamos wrote:
        | Nice work, Anton et al.
        | 
        | I hope you continue the 50-100M parameter models.
        | 
        | I think there is a case for models that finish fast on CPUs in
        | solve-by-LLM test cases.
        
       | eachro wrote:
       | From what I've heard, the llama3 models are fairly easy to fine-
       | tune (please correct me if I'm wrong or if there are more
       | amenable models here). How easy is it to finetune smollm3? I know
       | a lot of the MoE LLMs have been quite fickle in this regard.
        
       | BarakWidawsky wrote:
        | It's interesting that it looks like they didn't apply their own
        | RL to the model, and instead fine-tuned on reasoning traces from
        | existing datasets and on traces generated by larger models.
        
         | lewtun wrote:
         | Indeed we opted for offline methods like Anchored Preference
         | Optimization as we found in the Open R1 project that doing
         | multi-task RL on small models is quite a hassle to get right.
         | With offline methods, you focus much more on dataset curation /
         | generation, but that still provides faster iteration cycles for
         | the model scale we're dealing with!
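          | 
          | For the curious, the offline preference step looks roughly
          | like this with TRL (a sketch, not our exact recipe; the
          | dataset name is a placeholder and the APO loss variants need a
          | recent TRL release):
          | 
          |     from datasets import load_dataset
          |     from transformers import AutoModelForCausalLM
          |     from transformers import AutoTokenizer
          |     from trl import DPOConfig, DPOTrainer
          | 
          |     model_id = "HuggingFaceTB/SmolLM3-3B"  # assumed checkpoint
          |     model = AutoModelForCausalLM.from_pretrained(model_id)
          |     tok = AutoTokenizer.from_pretrained(model_id)
          | 
          |     # prompt / chosen / rejected pairs (placeholder dataset)
          |     prefs = load_dataset("your-org/prefs", split="train")
          | 
          |     args = DPOConfig(output_dir="smollm3-apo",
          |                      loss_type="apo_zero",  # anchored
          |                      beta=0.1)
          |     trainer = DPOTrainer(model=model, args=args,
          |                          train_dataset=prefs,
          |                          processing_class=tok)
          |     trainer.train()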
        
       | ivape wrote:
        | Looks like 3B is the size being shipped on-device by default.
        | Apple's on-device LLM is 3B, and I believe Chrome Canary is
        | shipping Gemini Nano:
        | 
        | https://developer.chrome.com/docs/ai/rewriter-api
        
       | ivape wrote:
       | I wonder if this will be cheaper than llama 3.1 8b on OpenRouter.
        
       | danielhanchen wrote:
       | I fixed some chat template issues for llama.cpp and other
       | inference engines! To run it, do:
       | 
        |     ./llama.cpp/llama-cli -hf unsloth/SmolLM3-3B-GGUF:Q4_K_XL \
        |         --jinja -ngl 99
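        | 
        | If you'd rather skip GGUF, plain transformers works too (sketch;
        | the model id is assumed to be the HF repo, and the /no_think
        | system flag is what reportedly turns the thinking traces off):
        | 
        |     from transformers import pipeline
        | 
        |     pipe = pipeline("text-generation",
        |                     model="HuggingFaceTB/SmolLM3-3B")
        |     msgs = [
        |         {"role": "system", "content": "/no_think"},
        |         {"role": "user", "content": "Summarize RAG briefly."},
        |     ]
        |     out = pipe(msgs, max_new_tokens=64)
        |     print(out[0]["generated_text"][-1]["content"])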
        
       ___________________________________________________________________
       (page generated 2025-07-08 23:00 UTC)