[HN Gopher] SmolLM3: Smol, multilingual, long-context reasoner LLM
___________________________________________________________________
SmolLM3: Smol, multilingual, long-context reasoner LLM
Author : kashifr
Score : 186 points
Date : 2025-07-08 16:13 UTC (6 hours ago)
(HTM) web link (huggingface.co)
(TXT) w3m dump (huggingface.co)
| gardnr wrote:
| It's small (3B) and does great on benchmarks. This is a model for
| edge / mobile deployments, so the gains over gemma3-4b are
| meaningful. It has dual-mode reasoning / non-reasoning, AND they
| released the full training method:
|
| > We're releasing SmolLM3 with our engineering blueprint. It
| includes architecture details, exact data mixtures showing how we
| progressively boost performance across domains in a three-stage
| pretraining approach, and the methodology for building a hybrid
| reasoning model. Usually, achieving these results would require
| months of reverse engineering. Instead, we're providing the full
| methodology.
| tiahura wrote:
| Can anyone estimate how much of the 3B is necessitated by multi-
| language support?
| rockinghigh wrote:
| The vocabulary size is fairly small (128,256) for a
| multilingual model. I would guess it doesn't require many
| additional parameters to support these 5 languages as many
| tokens can be shared.
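|
| As a rough illustration, the token-embedding table is the main
| vocabulary-dependent cost, and it is shared across all supported
| languages. A back-of-envelope sketch (the 2048-wide hidden state is
| an assumed value for illustration, not a published figure):
|
| # Back-of-envelope: parameters tied to vocabulary size.
| vocab_size = 128_256
| hidden_size = 2048  # assumption for illustration only
| embedding_params = vocab_size * hidden_size
| print(f"~{embedding_params / 1e6:.0f}M embedding parameters")  # ~263M
| # Extra languages mostly cost tokenizer/data coverage rather than
| # extra parameters, because this table is shared across languages.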
| nateb2022 wrote:
| https://web.archive.org/web/20250708164705/https://huggingfa...
| _1 wrote:
| Which small model is good for fine-tuning to various enterprise
| data sets? Our business units want to run small models in the
| browser and on mobile devices, without dealing with RAG and cloud
| resources.
| mhitza wrote:
| You really need to try them all out yourself and make sure you
| have proper benchmarks.
|
| While machine learning is not my field, I've tried to fine-tune
| Mistral 7B (following their official guide and toolset) and the
| results did not satisfy. There were a few very specific questions
| from the dataset that, no matter how much I fine-tuned and tweaked
| the process, it was not able to answer correctly.
|
| A mix of vector search + keyword search is still better at
| building the right question context than expecting the model to
| learn all the information.
|
| I used the raw-text (continued pretraining) approach. Maybe
| building synthetic questions and answers around the dataset yields
| better results, but I didn't have time to experiment with that
| approach; a sketch of what such data could look like is below.
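|
| A minimal sketch of that synthetic-Q&A idea (the JSONL chat schema,
| file names, and generate_qa() helper are illustrative assumptions,
| not any vendor's official format):
|
| # Sketch: turn raw text chunks into synthetic Q&A pairs for SFT.
| import json
|
| def generate_qa(chunk: str) -> list[dict]:
|     # In practice you'd prompt a larger model to write questions and
|     # answers grounded in `chunk`; this placeholder just echoes it.
|     return [{"question": f"What does this passage cover? {chunk[:40]}",
|              "answer": chunk}]
|
| with open("corpus.txt") as src, open("sft_data.jsonl", "w") as out:
|     for chunk in src.read().split("\n\n"):
|         for pair in generate_qa(chunk):
|             out.write(json.dumps({"messages": [
|                 {"role": "user", "content": pair["question"]},
|                 {"role": "assistant", "content": pair["answer"]},
|             ]}) + "\n")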
| ivape wrote:
| How much data did you use to fine tune?
| mhitza wrote:
| Kilobytes to megabytes of data. I was trying to fine-tune it on
| some specific legislation that I expected to be able to ask about
| afterwards.
| gardnr wrote:
| Small models are bad at knowing things. Trying to train knowledge
| into small models is probably not the way you want to go. You
| could try building an offline embedded RAG system that is
| deployable as WASM. Some folks have had success with this.
| _1 wrote:
| We do use WebLLM and a hosted Weaviate database, but there are
| complaints about speed (both retrieval and time to first token, as
| the context gets big). The Gemma 3n "nesting doll" approach sounds
| like it could be useful... but I haven't found anyone specifically
| using it to add domain-specific knowledge.
| janalsncm wrote:
| Typically retrieval is the fast part, in my experience. Have you
| considered cheaper retrieval methods? BM25 does pretty well on its
| own, and you can augment your dataset by precomputing relevant
| queries for each doc; see the sketch below.
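|
| A minimal sketch of that keyword-only baseline, assuming the
| rank_bm25 package (documents and query here are placeholders):
|
| # pip install rank_bm25
| from rank_bm25 import BM25Okapi
|
| docs = ["smollm3 is a small 3b multilingual model",
|         "it has a dual reasoning / non-reasoning mode",
|         "training ran on 384 h100 gpus for 24 days"]
| bm25 = BM25Okapi([d.split() for d in docs])  # tokenized corpus
|
| query = "how many gpus were used"
| print(bm25.get_top_n(query.split(), docs, n=1))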
| simonw wrote:
| What are you hoping to achieve by fine-tuning a model in this
| way?
| netdur wrote:
| I have fine-tuned Gemma 3n 2B and it's pretty good, but it loads
| slowly on my S23U; once it's loaded, though, it works fine.
|
| I also tried SmolVLM 256M and 500M; they load faster and you can
| embed them in assets. They work if you know what you're doing.
|
| Just keep in mind that smaller models don't perform as well due
| to their limited parameters
|
| Also, on Android you can't ship files larger than 2 GB due to
| Java compression issues, so you need to download models
| separately. You then can't load the model from the download
| folder; you have to copy it into the app's own folder. This means
| a Gemma 3n 2B model that's 3.14 GB needs at least 7 GB of free
| space on the user's phone (the downloaded file plus the copy).
| WhitneyLand wrote:
| Mostly SOTA performance at the 3B level. A notable addition to
| the small but truly open club of models that provide full
| disclosure, code, and recipes to reproduce their work.
|
| Looks like a ballpark of a million dollars of GPU time if you want
| to train one up for yourself (4000 GPUs / 24 days).
|
| Very nice write up that's generous in sharing their learnings.
|
| This is a solid and positive contribution.
| YetAnotherNick wrote:
| It's 384 H100s for 24 days, costing less than half a million
| dollars.
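|
| A quick back-of-envelope on that figure (the hourly rate is an
| assumption, not a quoted price):
|
| # Rough training-cost estimate.
| gpus, days = 384, 24
| gpu_hours = gpus * days * 24           # ~221k H100-hours
| for rate in (2.0, 3.0):                # assumed $/H100-hour
|     print(f"${gpu_hours * rate / 1e6:.2f}M at ${rate:.0f}/GPU-hour")
| # ~$0.44M at $2/GPU-hour, ~$0.66M at $3/GPU-hour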
| Imustaskforhelp wrote:
| Pardon me, but is the dataset public?
|
| Like if I really really just wanted to build it from scratch,
| could I do so? (not that I have that money but just curious)
| hynky wrote:
| yes, both core web datasets are publicly available as well
| as the rest
| Imustaskforhelp wrote:
| Thanks!
|
| To be honest, I might then argue that this is one of the best
| truly open-source models that we have got.
|
| There is AllenAI's (OLMo?), and there is also the one that does
| distributed training, but this looks a lot like SOTA for 3B
| parameters to me.
|
| Thanks for telling me. I am not going to lie, I am going to try to
| test it now! (I'll try a GGUF for the Ollama convenience.)
| peatmoss wrote:
| OLMo: https://allenai.org/olmo
|
| AFAIK, they were the first open everything model.
| refulgentis wrote:
| I spent about 10 minutes this AM cross-checking with Phi-4-mini
| benchmarks, as it was very odd not to include the leader in the
| benchmarks, and it seemed universally behind.
|
| For context, I dev an LLM client; a core tenet is keeping local
| as close to cloud parity as possible (via llama.cpp).
|
| Companies aren't taking local AI seriously on a _sustained_
| basis outside Microsoft.
|
| Overall, I would usually bite my tongue. HF is a great citizen,
| and I doubt this'll be a one-off. However, when I see superlatives
| affirmed while leaving out the local SoTA of many, many moons that
| is a godsend in this sector, I think it is good to stand up and
| say this rather than shy away.
| bitwize wrote:
| There's a British comedy skit lurking in here.
|
| "So it's a small large language model?"
|
| "Oh yes, very small."
|
| "How can it be small and large at the same time?"
|
| "Well, it's small by the standards of a large language model."
|
| "So it's large."
|
| "Oh yes, very large."
|
| "Large compared to what?"
|
| "Small language models."
|
| "And so something like ChatGPT, what would that be exactly? A
| _large_ large language model? "
|
| "Yes, precisely. An LLLM."
| netdur wrote:
| it's big little planet or small big planet?
| janalsncm wrote:
| Standards have shifted as well. GPT-2 used to be considered
| "large", but it is half the size of this. Oh, and Sam Altman also
| said it was too dangerous to release. At this point I consider
| anything too big to run on consumer-grade hardware to be large,
| but an exact definition is a little silly to argue about.
| a_wild_dandan wrote:
| Altman released GPT-2 despite expressing that doing so was a
| bad idea? That's wild.
| Alifatisk wrote:
| I think Altman meant it was too dangerous to open-source GPT-2,
| so they locked it behind a service.
| papichulo2023 wrote:
| Do not mess with the Miniature giant space hamsters
| msgodel wrote:
| Wow. Close to a Qwen3 distill with 75% the size. That's great!
|
| I've been using the SmolLM base models for my own finetunes just
| because they're so high quality; it looks like I might be using
| them to drive local agents/code completion in the near future too.
|
| Their RL algorithm looks interesting. I'm still using OpenAI's
| algorithm for my stuff; I've been meaning to check on the SoTA
| since I know my code is pretty outdated. (It's crazy how fast that
| happens with this stuff.)
| gdiamos wrote:
| Nice work, Anton et al.
|
| I hope you continue the 50-100M parameter models.
|
| I think there is a case for models that finish fast on CPUs in
| solve-by-LLM test cases.
| eachro wrote:
| From what I've heard, the Llama 3 models are fairly easy to fine-
| tune (please correct me if I'm wrong or if there are more amenable
| models here). How easy is it to fine-tune SmolLM3? I know a lot of
| the MoE LLMs have been quite fickle in this regard.
| BarakWidawsky wrote:
| It's interesting that it looks like they didn't apply their own
| RL to the model, and instead fine-tuned on reasoning traces from
| large datasets and on reasoning traces generated by larger models.
| lewtun wrote:
| Indeed we opted for offline methods like Anchored Preference
| Optimization as we found in the Open R1 project that doing
| multi-task RL on small models is quite a hassle to get right.
| With offline methods, you focus much more on dataset curation /
| generation, but that still provides faster iteration cycles for
| the model scale we're dealing with!
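|
| For anyone wanting to try something similar, a hedged sketch of
| offline preference tuning with TRL's DPOTrainer. The APO loss_type,
| the exact argument names, and the repo id are assumptions to check
| against the TRL docs and model card before running:
|
| # Sketch: offline preference optimization (APO-style) with TRL.
| from datasets import Dataset
| from transformers import AutoModelForCausalLM, AutoTokenizer
| from trl import DPOConfig, DPOTrainer
|
| model_id = "HuggingFaceTB/SmolLM3-3B"  # assumed repo id
| model = AutoModelForCausalLM.from_pretrained(model_id)
| tokenizer = AutoTokenizer.from_pretrained(model_id)
|
| # Offline preference data: prompt, preferred answer, rejected answer.
| train_dataset = Dataset.from_list([
|     {"prompt": "Explain KV caching briefly.",
|      "chosen": "KV caching stores past attention keys/values ...",
|      "rejected": "KV caching makes the model bigger."},
| ])
|
| # loss_type="apo_zero" may differ or be unavailable in older TRL
| # versions; "sigmoid" (plain DPO) is the safe fallback.
| args = DPOConfig(output_dir="smollm3-apo", loss_type="apo_zero",
|                  beta=0.1)
| trainer = DPOTrainer(model=model, args=args,
|                      train_dataset=train_dataset,
|                      processing_class=tokenizer)
| trainer.train()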
| ivape wrote:
| Looks like it's 3B models that are being shipped on-device by
| default. Apple's on-device LLM is 3B, and I believe Chrome Canary
| is shipping Gemini Nano:
|
| https://developer.chrome.com/docs/ai/rewriter-api
| ivape wrote:
| I wonder if this will be cheaper than Llama 3.1 8B on OpenRouter.
| danielhanchen wrote:
| I fixed some chat template issues for llama.cpp and other
| inference engines! To run it, do:
|
| ./llama.cpp/llama-cli -hf unsloth/SmolLM3-3B-GGUF:Q4_K_XL --jinja
| -ngl 99
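|
| (For context: -hf pulls the GGUF straight from the Hugging Face
| repo, --jinja applies the chat template stored in the GGUF
| metadata, and -ngl 99 offloads effectively all layers to the GPU.)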
___________________________________________________________________
(page generated 2025-07-08 23:00 UTC)