[HN Gopher] Devstral
___________________________________________________________________
Devstral
Author : mfiguiere
Score : 314 points
Date : 2025-05-21 14:21 UTC (8 hours ago)
(HTM) web link (mistral.ai)
(TXT) w3m dump (mistral.ai)
| AnhTho_FR wrote:
| Impressive performance!
| ddtaylor wrote:
| Wow. I was just grabbing some models and happened to see this one
| while messing with tool support in LlamaIndex. I have an agentic
| coding thing I threw together and have been trying different
| models on it, and I was looking at ReAct to bring in some models
| that don't have tool support - and then this just pops into
| existence!
|
| I'm not able to get my agentic system to use this model though as
| it just says "I don't have the tools to do this". I tried
| modifying various agent prompts to explicitly say "Use foo tool
| to do bar" without any luck yet. All of the ToolSpec that I use
| are annotated etc. Pydantic objects and every other model has
| figured out how to use these tools.
| tough wrote:
| You can use constrained outputs to enforce tool schemas; any model
| can get there with a little help.
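|
| A minimal sketch of what I mean, assuming the official ollama
| Python client with structured-output support; the ReadFileCall
| "tool" and its fields are made up for illustration:
|
|     # Constrain the model's reply to a tool-call shape via JSON schema.
|     from pydantic import BaseModel
|     import ollama
|
|     class ReadFileCall(BaseModel):
|         tool: str   # e.g. "read_file" (hypothetical tool name)
|         path: str   # argument the agent should fill in
|
|     resp = ollama.chat(
|         model="devstral",
|         messages=[{"role": "user", "content": "Open src/main.py"}],
|         format=ReadFileCall.model_json_schema(),  # enforce the schema
|     )
|     call = ReadFileCall.model_validate_json(resp.message.content)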
| abrowne2 wrote:
| Curious to check this out, since they say it can run on a 4090 /
| Mac with >32 GB of RAM.
| ddtaylor wrote:
| I can run it without issue on a 6800 XT with 64GB of RAM.
| yencabulator wrote:
| "Can run" is pretty easy, it's pretty small and quantized. It
| runs at 3.7 tokens/second on pure CPU with AMD 8945HS.
| simonw wrote:
| The first number I look at these days is the file size via
| Ollama, which for this model is 14GB
| https://ollama.com/library/devstral/tags
|
| I find that on my M2 Mac that number is a rough approximation of
| how much memory the model needs (usually plus about 10%) - which
| matters because I want to know how much RAM I will have left for
| running other applications.
|
| Anything below 20GB tends not to interfere with the other stuff
| I'm running too much. This model looks promising!
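|
| That heuristic in code form (a rough sketch; the 10% overhead
| figure is the approximation above, not a measurement):
|
|     # Estimate resident memory from the Ollama download size.
|     def estimated_ram_gb(ollama_file_gb: float) -> float:
|         return ollama_file_gb * 1.1  # file size plus ~10% overhead
|
|     print(estimated_ram_gb(14))  # Devstral's 14GB tag -> ~15.4GB resident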
| lis wrote:
| Yes, I agree. I've just run the model locally and it's making a
| good impression. I've tested it with some ruby/rspec gotchas,
| which it handled nicely.
|
| I'll give it a try with aider to test the large context as
| well.
| ericb wrote:
| In ollama, how do you set up the larger context, and figure
| out what settings to use? I've yet to find a good guide. I'm
| also not quite sure how I should figure out what those
| settings should be for each model.
|
| There's context length, but then, how does that relate to
| input length and output length? Should I just make the
| numbers match? 32k is 32k? Any pointers?
| lis wrote:
| For aider and ollama, see:
| https://aider.chat/docs/llms/ollama.html
|
| Just for ollama, see:
| https://github.com/ollama/ollama/blob/main/docs/faq.md#how-c...
|
| I'm using llama.cpp though, so I can't confirm these
| methods.
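|
| For reference, a minimal sketch of the ollama route (untested
| here, assuming the official ollama Python client): num_ctx sets
| the context window, which the prompt and the generated output
| share, and num_predict just caps how much of it goes to output.
|
|     import ollama
|
|     resp = ollama.chat(
|         model="devstral",
|         messages=[{"role": "user", "content": "Review this diff: ..."}],
|         options={
|             "num_ctx": 32768,     # total window: prompt + output tokens
|             "num_predict": 4096,  # optional cap on generated tokens
|         },
|     )
|     # The same setting can also go in a Modelfile:
|     # PARAMETER num_ctx 32768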
| nico wrote:
| Are you using it with aider? If so, how has your
| experience been?
| nico wrote:
| Any agentic dev software you could recommend that runs well
| with local models?
|
| I've been using Cursor and I'm kind of disappointed. I get
| better results just going back and forth between the editor and
| ChatGPT
|
| I tried localforge and aider, but they are kinda slow with
| local models
| jabroni_salad wrote:
| Do you have any other interface for the model? what kind of
| tokens/sec are you getting?
|
| Try hooking aider up to gemini and see how the speed is. I
| have noticed that people in the localllama scene do not like
| to talk about their TPS.
| nico wrote:
| The models feel pretty snappy when interacting with them
| directly via ollama, not sure about the TPS
|
| However I've also run into two things: 1) most models don't
| support tools, and it's sometimes hard to find a version of a
| model that uses tools correctly; 2) even with good TPS, since the
| agents are usually doing chain-of-thought and running multiple
| chained prompts, the experience feels slow - this is true even
| with Cursor using their models/APIs
| gyudin wrote:
| Super weird benchmarks
| avereveard wrote:
| From what I gather, it's fine-tuned to work with OpenHands
| specifically, so it shows its value on benchmarks that target a
| whole system as a black box (i.e. agent + LLM) rather than
| targeting the LLM's inputs/outputs directly.
| amarcheschi wrote:
| Yup the 1st comment says this:
| https://www.reddit.com/r/LocalLLaMA/comments/1kryybf/mistral...
| solomatov wrote:
| It's very nice that it has the Apache 2.0 license, i.e. a
| well-understood license, instead of some "open weight" license
| with a lot of conditions.
| resource_waste wrote:
| This is basically the Mistral niche. If you are doing something
| generally perceived as ethical, you would use Gemma 3 IMO. When
| you aren't... well there are Apache licensed LLMs for you.
| solomatov wrote:
| IMO, it's not about ethics, it's about legal risk. What if you
| want to fine-tune a model on output related to your usage? Then my
| understanding is that all these derivatives need to be under the
| same license. What if G changes their prohibited use policy (the
| first line there is that they can update it from time to time)?
| There's really crazy stuff in the terms of use of some services;
| what if G adds something along those lines that basically makes
| your application impossible?
|
| P.S. I am not a lawyer.
| orbisvicis wrote:
| I'm not sure what you're trying to imply... only rogue
| software developers use devstral?
| dismalaf wrote:
| It's not about ethical or not, it's about risk to your
| startup. Ethics are super subjective (and often change based
| on politics). Apache means you own your own model, period.
| simonw wrote:
| What's different between the ethics of Mistral and Gemma?
| Philpax wrote:
| I think their point was more that Gemma open models have
| restrictive licences, while some Mistral open models do
| not.
| Havoc wrote:
| They're all quite easy to strip of protections and I don't
| think anyone doing unethical stuff is big on following
| licenses anyway
| portaouflop wrote:
| TIL Open Source is only used for unethical purposes
| ics wrote:
| Maybe someone here can suggest tools or at least where to look;
| what are the state-of-the-art models to run locally on relatively
| low power machines like a MacBook Air? Is there anyone tracking
| what is feasible given a machine spec?
|
| "Apple Intelligence" isn't it but it would be nice to know
| without churning through tests whether I should bother keeping
| around 2-3 models for specific tasks in ollama or if their
| performance is marginal there's a more stable all-rounder model.
| thatcherc wrote:
| I would recommend just trying it out! (as long as you have the
| disk space for a few models). llama.cpp[0] is pretty easy to
| download and build and has good support for M-series Macbook
| Airs. I usually just use LMStudio[1] though - it's got a nice
| and easy-to-use interface that looks like the ChatGPT or Claude
| webpage, and you can search for and download models from within
| the program. LMStudio would be the easiest way to get started
| and probably all you need. I use it a lot on my M2 Macbook Air
| and it's really handy.
|
| [0] - https://github.com/ggml-org/llama.cpp
|
| [1] - https://lmstudio.ai/
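|
| If you'd rather script it than use the LM Studio UI, here's a
| small sketch using the llama-cpp-python bindings (my assumption -
| only the llama.cpp binaries are mentioned above, and the GGUF path
| is hypothetical):
|
|     from llama_cpp import Llama
|
|     llm = Llama(
|         model_path="models/devstral-q4_k_m.gguf",  # any local GGUF file
|         n_ctx=8192,        # context window
|         n_gpu_layers=-1,   # offload all layers to Metal on M-series Macs
|     )
|     out = llm.create_chat_completion(
|         messages=[{"role": "user", "content": "Write a Python hello world."}]
|     )
|     print(out["choices"][0]["message"]["content"])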
| Etheryte wrote:
| This doesn't do anything to answer the main question of what
| models they can actually run.
| Miraste wrote:
| The best general model you can run locally is probably some
| version of Gemma 3 or the latest Mistral Small. On a Windows
| machine, this is limited by VRAM, since system RAM is too low-
| bandwidth to run models at usable speeds. On an M-series Mac,
| the system memory is on-die and fast enough to use. What you
| can run is bounded by total RAM, minus whatever macOS uses and
| the space you want for other programs.
|
| To determine how much space a model needs, look at the size of
| the quantized (lower-precision) model on HuggingFace or wherever
| it's hosted. Q4_K_M is a good default. As a rough rule of thumb,
| a Q4_K_M file will be a little over half the parameter count,
| read as gigabytes. For Devstral, that's
| 14.3GB. You will also need 1-8GB more than that, to store the
| context.
|
| For example: A 32GB Macbook Air could use Devstral at 14.3+4GB,
| leaving ~14GB for the system and applications. A 16GB Macbook
| Air could use Gemma 3 12B at 7.3+2GB, leaving ~7GB for
| everything else. An 8GB Macbook could use Gemma 3 4B at
| 2.5GB+1GB, but this is probably not worth doing.
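|
| The same rule of thumb as a quick sketch (the 0.6 GB per billion
| parameters factor for Q4_K_M is an approximation backed out of the
| sizes above, not a spec):
|
|     # Rough RAM estimate for a Q4_K_M quantized model.
|     def estimated_ram_gb(params_billions: float, context_gb: float = 4.0) -> float:
|         weights_gb = params_billions * 0.6   # "a little over half" the params
|         return weights_gb + context_gb       # weights plus context headroom
|
|     print(estimated_ram_gb(24))     # Devstral (~24B) -> ~18.4GB
|     print(estimated_ram_gb(12, 2))  # Gemma 3 12B     -> ~9.2GB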
| bravura wrote:
| And how do the results compare to hosted LLMs like Claude 3.7?
| resource_waste wrote:
| Eh, different usecase entirely. I don't really compare these.
| bufferoverflow wrote:
| Different class. Same exact use case.
| ttoinou wrote:
| For which kind of coding would you use a subpar LLM ?
| troyvit wrote:
| I'd use a "subpar" LLM for any coding practice where I want
| to do the bulk of the thinking and where I care about how
| much coal I'm burning.
|
| It's kind-of like asking, for which kind of road-trip would
| you use a Corolla hatchback instead of a Jeep Grand
| Wagoneer? For me the answer would be "almost all of them",
| but for others that might not be the case.
| ManlyBread wrote:
| >Devstral is light enough to run on a single RTX 4090 or a Mac
| with 32GB RAM, making it an ideal choice for local deployment and
| on-device use
|
| This is still too much; a single 4090 costs $3k
| Uehreka wrote:
| > a single 4090 costs $3k
|
| What a ripoff, considering that a 5090 with 32GB of VRAM also
| currently costs $3k ;)
|
| (Source: I just received the one I ordered from Newegg a week
| ago for $2919. I used hotstocks.io to alert me that it was
| available, but I wasn't super fast at clicking and still
| managed to get it. Things have cooled down a lot from the
| craziness of early February.)
| IshKebab wrote:
| That's probably because the 5000 series seems to be a big
| let-down. It's pretty much identical to the 4000 series in
| efficiency; they've only increased performance by massively
| increasing power usage.
| hiatus wrote:
| I receive NXDOMAIN for that hostname.
| jsheard wrote:
| It's hotstock.io, no plural.
| ttoinou wrote:
| I can get the 5090 for 1700 euros on Amazon Spain. But there
| is 95% chance it is a scammy seller :P
| fkyoureadthedoc wrote:
| > a single 4090 costs $3k
|
| I hope not. Mine was $1700 almost 2 years ago, and the 5090 is
| out now...
| hnuser123456 wrote:
| The 4090 went up in price for a while as the 5000 marketing
| percolated and people wanted an upgrade they could actually
| buy.
| oezi wrote:
| If it runs on 4090, it also runs on 3090 which are available
| used for 600 EUR.
| threeducks wrote:
| More like 700 EUR if you are lucky. Prices are still not back
| down from the start of the AI boom.
|
| I am hopeful that the prices will drop a bit more with
| Intel's recently announced Arc Pro B60 with 24GB VRAM, which
| unfortunately has only half the memory bandwidth of the RTX
| 3090.
|
| Not sure why other hardware makers are so slow to catch up.
| Apple really was years ahead of the competition with the M1
| Ultra with 800 GB/s memory bandwidth.
| orbisvicis wrote:
| Is there an equivalence between gpu vram and mac ram?
| viraptor wrote:
| For loading models, it's exactly the same. Mac ram is fully
| (more or less) shared between CPU/GPU.
| oofbaroomf wrote:
| The SWE-Bench scores are very, very high for an open source model
| of this size. 46.8% is better than o3-mini (with Agentless-lite)
| and Claude 3.6 (with AutoCodeRover), but it is a little lower
| than Claude 3.6 with Anthropic's proprietary scaffold. And
| considering you can run this for almost free, it's an
| extraordinary model.
| falcor84 wrote:
| Just to confirm, are you referring to Claude 3.7?
| oofbaroomf wrote:
| No. I am referring to Claude 3.5 Sonnet New, released October
| 22, 2024, with model ID claude-3-5-sonnet-20241022,
| colloquially referred to as Claude 3.6 Sonnet because of
| Anthropic's confusing naming.
| SkyPuncher wrote:
| > colloquially referred to as Claude 3.6
|
| Interesting. I've never heard this.
| Deathmax wrote:
| Also known as Claude 3.5 Sonnet V2 on AWS Bedrock and GCP
| Vertex AI
| ttoinou wrote:
| And it is a very good LLM. Some people complain they don't
| see an improvement with Sonnet 3.7
| AstroBen wrote:
| extraordinary... or suspicious that the benchmarks aren't doing
| their job
| YetAnotherNick wrote:
| The SWE-Bench score is super impressive for a model of any size.
| However, providing only one benchmark result and needing a
| partnership with OpenHands makes it seem like they focused too
| much on optimizing that number.
| dismalaf wrote:
| It's nice that Mistral is back to releasing actual open source
| models. Europe needs a competitive AI company.
|
| Also, Mistral has been killing it with their most recent models.
| I pay for Le Chat Pro, it's really good. Mistral Small is really
| good. Also building a startup with Mistral integration.
| jadbox wrote:
| But how does it compare to deepcoder?
| CSMastermind wrote:
| I don't believe the benchmarks they're presenting.
|
| I haven't tried it out yet but every model I've tested from
| Mistral has been towards the bottom of my benchmarks in a similar
| place to Llama.
|
| Would be very surprised if the real life performance is anything
| like they're claiming.
| Ancapistani wrote:
| I've worked with other models from All Hands recently, and I
| believe they were based on Mistral.
|
| My general impression so far is that they aren't quite up to
| Claude 3.7 Sonnet, but they're quite good. More than adequate
| for an "AI pair coding assistant", and suitable for larger
| architectural work as long as you break things into steps for
| it.
| qwertox wrote:
| Maybe the EU should cover the cost of creating this agent/model,
| assuming it really delivers what it promises. It would allow
| Mistral to keep focusing on what they do and for us it would mean
| that the EU spent money wisely.
| Havoc wrote:
| >Maybe the EU should cover the cost of creating this model
|
| Wouldn't mind some of my taxpayer money flowing towards
| apache/mit licensed models.
|
| Even if just to maintain a baseline alternative & keep everyone
| honest. Seems important that we don't have some large megacorps
| run away with this.
| dismalaf wrote:
| Pretty sure the EU paid for some supercomputers that AI startups
| can use, and Mistral is a partner in that program.
| TZubiri wrote:
| I feel this is part of a larger and very old business trend.
|
| But do we need 20 companies copying each other and doing the same
| thing?
|
| Like, is that really competition? I'd say competition is when you
| do something slightly different, but I guess it's subjective
| based on your interpretation of what is a commodity and what is
| proprietary.
|
| To my view, everyone is outright copying and creating commodity
| markets:
|
| OpenAI: The OG, the Coke of Modern AI
|
| Claude: The first copycat, The Pepsi of Modern AI
|
| Mistral: Euro OpenAI
|
| DeepSeek: Chinese OpenAI
|
| Grok/xAI: Republican OpenAI
|
| Google/MSFT: OpenAI clone as a SaaS or Office package.
|
| Meta's Llama: Open Source OpenAI
|
| etc...
| amarcheschi wrote:
| I think llama is less open source than this mistral release
___________________________________________________________________
(page generated 2025-05-21 23:00 UTC)