[HN Gopher] Devstral
       ___________________________________________________________________
        
       Devstral
        
       Author : mfiguiere
       Score  : 314 points
       Date   : 2025-05-21 14:21 UTC (8 hours ago)
        
 (HTM) web link (mistral.ai)
 (TXT) w3m dump (mistral.ai)
        
       | AnhTho_FR wrote:
       | Impressive performance!
        
       | ddtaylor wrote:
        | Wow. I was just grabbing some models and happened to see this
        | one while I was messing with tool support in LlamaIndex. I have
        | an agentic coding thing I threw together and have been trying
        | different models on it; I was about to add ReAct to it to bring
        | in some models that don't have tool support, and this just pops
        | into existence!
       | 
        | I'm not able to get my agentic system to use this model though,
        | as it just says "I don't have the tools to do this". I tried
        | modifying various agent prompts to explicitly say "Use foo tool
        | to do bar" without any luck yet. All of the ToolSpecs that I use
        | are annotated Pydantic objects, and every other model has
        | figured out how to use these tools.
        
         | tough wrote:
          | You can use constrained outputs to enforce tool schemas; any
          | model can get there with a lil help.
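          | 
          | For example, recent Ollama builds accept a JSON schema in the
          | "format" field, so you can build one from a Pydantic model and
          | force the reply to match it. Rough, untested sketch (the model
          | tag and the read_file tool shape are just placeholders):
          | 
          |   # constrain decoding to a tool-call schema via Ollama's
          |   # structured outputs; names below are made up for the example
          |   import requests
          |   from pydantic import BaseModel
          | 
          |   class ReadFileArgs(BaseModel):
          |       path: str
          | 
          |   class ToolCall(BaseModel):
          |       tool_name: str        # e.g. "read_file"
          |       args: ReadFileArgs
          | 
          |   resp = requests.post(
          |       "http://localhost:11434/api/chat",
          |       json={
          |           "model": "devstral",
          |           "messages": [
          |               {"role": "system",
          |                "content": "Reply only with a tool call."},
          |               {"role": "user",
          |                "content": "Open README.md and summarize it."},
          |           ],
          |           "format": ToolCall.model_json_schema(),
          |           "stream": False,
          |       },
          |       timeout=120,
          |   )
          |   call = ToolCall.model_validate_json(
          |       resp.json()["message"]["content"])
          |   print(call.tool_name, call.args.path)
          | 
          | Once the output is guaranteed to parse, dispatching to the real
          | tool is just a lookup on tool_name.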
        
       | abrowne2 wrote:
       | Curious to check this out, since they say it can run on a 4090 /
       | Mac with >32 GB of RAM.
        
         | ddtaylor wrote:
         | I can run it without issue on a 6800 XT with 64GB of RAM.
        
         | yencabulator wrote:
         | "Can run" is pretty easy, it's pretty small and quantized. It
         | runs at 3.7 tokens/second on pure CPU with AMD 8945HS.
        
       | simonw wrote:
       | The first number I look at these days is the file size via
       | Ollama, which for this model is 14GB
       | https://ollama.com/library/devstral/tags
       | 
       | I find that on my M2 Mac that number is a rough approximation to
       | how much memory the model needs (usually plus about 10%) - which
       | matters because I want to know how much RAM I will have left for
       | running other applications.
       | 
       | Anything below 20GB tends not to interfere with the other stuff
       | I'm running too much. This model looks promising!
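        | 
        | If it's useful, the local Ollama API reports the on-disk size of
        | every pulled model, so the "size plus about 10%" estimate can be
        | scripted. Quick sketch, untested (the 1.1 factor is just my rule
        | of thumb above, not an exact figure):
        | 
        |   # print each local Ollama model's file size and a rough
        |   # RAM estimate (file size + ~10%)
        |   import requests
        | 
        |   tags = requests.get("http://localhost:11434/api/tags").json()
        |   for m in tags["models"]:
        |       size_gb = m["size"] / 1e9
        |       print(f"{m['name']:32s} {size_gb:5.1f} GB on disk, "
        |             f"~{size_gb * 1.1:.1f} GB to run")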
        
         | lis wrote:
          | Yes, I agree. I've just run the model locally and it's making a
         | good impression. I've tested it with some ruby/rspec gotchas,
         | which it handled nicely.
         | 
         | I'll give it a try with aider to test the large context as
         | well.
        
           | ericb wrote:
            | In Ollama, how do you set up a larger context, and how do you
            | figure out what settings to use? I've yet to find a good
            | guide, and I'm not sure how to work out the right values for
            | each model.
           | 
           | There's context length, but then, how does that relate to
           | input length and output length? Should I just make the
           | numbers match? 32k is 32k? Any pointers?
        
             | lis wrote:
             | For aider and ollama, see:
             | https://aider.chat/docs/llms/ollama.html
             | 
              | Just for ollama, see:
              | https://github.com/ollama/ollama/blob/main/docs/faq.md#how-c...
             | 
             | I'm using llama.cpp though, so I can't confirm these
             | methods.
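              | 
              | If I read that FAQ right, the context size is just an
              | option you pass per request (or bake into a Modelfile).
              | Untested on my side, but roughly:
              | 
              |   # ask Ollama for a 32k context window on one request
              |   # via the num_ctx option (per the FAQ above)
              |   import requests
              | 
              |   r = requests.post(
              |       "http://localhost:11434/api/generate",
              |       json={
              |           "model": "devstral",
              |           "prompt": "Summarize this repo...",
              |           "options": {"num_ctx": 32768},
              |           "stream": False,
              |       },
              |   )
              |   print(r.json()["response"])
              | 
              | Interactively there's apparently also /set parameter
              | num_ctx 32768 inside ollama run. As far as I understand,
              | num_ctx is the whole window (prompt plus output) and
              | num_predict caps the output length, so you don't set
              | input and output sizes separately.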
        
               | nico wrote:
               | Are you using it with aider? If so, how has your
               | experience been?
        
         | nico wrote:
         | Any agentic dev software you could recommend that runs well
         | with local models?
         | 
          | I've been using Cursor and I'm kind of disappointed. I get
          | better results just going back and forth between the editor and
          | ChatGPT.
          | 
          | I tried localforge and aider, but they are kinda slow with
          | local models.
        
           | jabroni_salad wrote:
            | Do you have any other interface for the model? What kind of
            | tokens/sec are you getting?
            | 
            | Try hooking aider up to Gemini and see how the speed is. I
            | have noticed that people in the localllama scene do not like
            | to talk about their TPS.
        
             | nico wrote:
              | The models feel pretty snappy when interacting with them
              | directly via ollama; not sure about the TPS.
              | 
              | However, I've also run into 2 things: 1) most models don't
              | support tools, and it's sometimes hard to find a version of
              | the model that correctly uses tools; 2) even with good TPS,
              | since the agents are usually doing chain-of-thought and
              | running multiple chained prompts, the experience feels slow
              | - this is even true with Cursor using their models/APIs.
        
       | gyudin wrote:
       | Super weird benchmarks
        
         | avereveard wrote:
          | From what I gather, it's finetuned to use OpenHands
          | specifically, so it shows its value on those benchmarks that
          | target a whole system as a black box (i.e. agent + LLM) rather
          | than targeting the LLM's inputs/outputs directly.
        
           | amarcheschi wrote:
            | Yup the 1st comment says this:
            | https://www.reddit.com/r/LocalLLaMA/comments/1kryybf/mistral...
        
       | solomatov wrote:
        | It's very nice that it has the Apache 2.0 license, i.e. a well-
        | understood license, instead of some "open weight" license with a
        | lot of conditions.
        
         | resource_waste wrote:
         | This is basically the Mistral niche. If you are doing something
         | generally perceived as ethical, you would use Gemma 3 IMO. When
         | you aren't... well there are Apache licensed LLMs for you.
        
           | solomatov wrote:
            | IMO, it's not about ethics, it's about legal risk. What if
            | you want to fine-tune a model on output related to your
            | usage? My understanding is that all such derivatives need to
            | be under the same license. What if G changes their prohibited
            | use policy (the first line there says they can update it from
            | time to time)? There's really crazy stuff in the terms of use
            | of some services; what if G adds something in the same vein
            | that basically makes your application impossible?
           | 
           | P.S. I am not a lawyer.
        
           | orbisvicis wrote:
           | I'm not sure what you're trying to imply... only rogue
           | software developers use devstral?
        
           | dismalaf wrote:
            | It's not about whether it's ethical or not, it's about risk
            | to your startup. Ethics are super subjective (and often
            | change based on politics). Apache means you own your own
            | model, period.
        
           | simonw wrote:
           | What's different between the ethics of Mistral and Gemma?
        
             | Philpax wrote:
             | I think their point was more that Gemma open models have
             | restrictive licences, while some Mistral open models do
             | not.
        
           | Havoc wrote:
            | They're all quite easy to strip of protections, and I don't
            | think anyone doing unethical stuff is big on following
            | licenses anyway.
        
           | portaouflop wrote:
           | TIL Open Source is only used for unethical purposes
        
       | ics wrote:
       | Maybe someone here can suggest tools or at least where to look;
       | what are the state-of-the-art models to run locally on relatively
       | low power machines like a MacBook Air? Is there anyone tracking
       | what is feasible given a machine spec?
       | 
       | "Apple Intelligence" isn't it but it would be nice to know
       | without churning through tests whether I should bother keeping
       | around 2-3 models for specific tasks in ollama or if their
       | performance is marginal there's a more stable all-rounder model.
        
         | thatcherc wrote:
         | I would recommend just trying it out! (as long as you have the
         | disk space for a few models). llama.cpp[0] is pretty easy to
         | download and build and has good support for M-series Macbook
         | Airs. I usually just use LMStudio[1] though - it's got a nice
         | and easy-to-use interface that looks like the ChatGPT or Claude
         | webpage, and you can search for and download models from within
         | the program. LMStudio would be the easiest way to get started
         | and probably all you need. I use it a lot on my M2 Macbook Air
         | and it's really handy.
         | 
         | [0] - https://github.com/ggml-org/llama.cpp
         | 
         | [1] - https://lmstudio.ai/
        
           | Etheryte wrote:
           | This doesn't do anything to answer the main question of what
           | models they can actually run.
        
         | Miraste wrote:
         | The best general model you can run locally is probably some
         | version of Gemma 3 or the latest Mistral Small. On a Windows
         | machine, this is limited by VRAM, since system RAM is too low-
         | bandwidth to run models at usable speeds. On an M-series Mac,
          | the system memory is on-package and fast enough to use. What
          | you can run is bounded by the total RAM, minus whatever macOS
          | uses and the space you want for other programs.
         | 
          | To determine how much space a model needs, look at the size of
          | the quantized (lower-precision) file on HuggingFace or wherever
          | it's hosted. Q4_K_M is a good default. As a rough rule of
          | thumb, it will be a little over half a gigabyte per billion
          | parameters. For Devstral, that's 14.3GB. You will also need
          | 1-8GB on top of that to store the context.
         | 
         | For example: A 32GB Macbook Air could use Devstral at 14.3+4GB,
         | leaving ~14GB for the system and applications. A 16GB Macbook
         | Air could use Gemma 3 12B at 7.3+2GB, leaving ~7GB for
         | everything else. An 8GB Macbook could use Gemma 3 4B at
         | 2.5GB+1GB, but this is probably not worth doing.
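          | 
          | The same arithmetic as a throwaway script, if that helps (the
          | 0.6 GB-per-billion-parameters factor and the per-model context
          | numbers are just the rough figures from above, not exact):
          | 
          |   # back-of-the-envelope RAM budget for Q4_K_M models
          |   def leftover_gb(ram_gb, params_b, ctx_gb):
          |       weights_gb = params_b * 0.6   # ~0.6 GB per B params
          |       return ram_gb - (weights_gb + ctx_gb)
          | 
          |   for ram_gb, model, params_b, ctx_gb in [
          |           (32, "Devstral 24B", 24, 4),
          |           (16, "Gemma 3 12B", 12, 2),
          |           (8, "Gemma 3 4B", 4, 1)]:
          |       free = leftover_gb(ram_gb, params_b, ctx_gb)
          |       print(f"{ram_gb} GB Mac + {model}: "
          |             f"~{free:.0f} GB left for everything else")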
        
       | bravura wrote:
       | And how do the results compare to hosted LLMs like Claude 3.7?
        
         | resource_waste wrote:
         | Eh, different usecase entirely. I don't really compare these.
        
           | bufferoverflow wrote:
           | Different class. Same exact use case.
        
           | ttoinou wrote:
            | For which kind of coding would you use a subpar LLM?
        
             | troyvit wrote:
             | I'd use a "subpar" LLM for any coding practice where I want
             | to do the bulk of the thinking and where I care about how
             | much coal I'm burning.
             | 
             | It's kind-of like asking, for which kind of road-trip would
             | you use a Corolla hatchback instead of a Jeep Grand
             | Wagoneer? For me the answer would be "almost all of them",
             | but for others that might not be the case.
        
       | ManlyBread wrote:
       | >Devstral is light enough to run on a single RTX 4090 or a Mac
       | with 32GB RAM, making it an ideal choice for local deployment and
       | on-device use
       | 
        | This is still too much; a single 4090 costs $3k.
        
         | Uehreka wrote:
         | > a single 4090 costs $3k
         | 
         | What a ripoff, considering that a 5090 with 32GB of VRAM also
         | currently costs $3k ;)
         | 
         | (Source: I just received the one I ordered from Newegg a week
         | ago for $2919. I used hotstocks.io to alert me that it was
         | available, but I wasn't super fast at clicking and still
         | managed to get it. Things have cooled down a lot from the
         | craziness of early February.)
        
           | IshKebab wrote:
           | That's probably because the 5000 series seems to be a big
           | let-down. It's pretty much identical to the 4000 series in
           | efficiency; they've only increased performance by massively
           | increasing power usage.
        
           | hiatus wrote:
           | I receive NXDOMAIN for that hostname.
        
             | jsheard wrote:
             | It's hotstock.io, no plural.
        
           | ttoinou wrote:
           | I can get the 5090 for 1700 euros on Amazon Spain. But there
           | is 95% chance it is a scammy seller :P
        
         | fkyoureadthedoc wrote:
         | > a single 4090 costs $3k
         | 
          | I hope not. Mine was $1700 almost 2 years ago, and the 5090 is
         | out now...
        
           | hnuser123456 wrote:
           | The 4090 went up in price for a while as the 5000 marketing
           | percolated and people wanted an upgrade they could actually
           | buy.
        
         | oezi wrote:
          | If it runs on a 4090, it also runs on a 3090, which is
          | available used for 600 EUR.
        
           | threeducks wrote:
           | More like 700 EUR if you are lucky. Prices are still not back
           | down from the start of the AI boom.
           | 
           | I am hopeful that the prices will drop a bit more with
           | Intel's recently announced Arc Pro B60 with 24GB VRAM, which
           | unfortunately has only half the memory bandwidth of the RTX
           | 3090.
           | 
           | Not sure why other hardware makers are so slow to catch up.
           | Apple really was years ahead of the competition with the M1
           | Ultra with 800 GB/s memory bandwidth.
        
         | orbisvicis wrote:
            | Is there an equivalence between GPU VRAM and Mac RAM?
        
           | viraptor wrote:
            | For loading models, it's exactly the same. Mac RAM is fully
            | (more or less) shared between the CPU and GPU.
        
       | oofbaroomf wrote:
       | The SWE-Bench scores are very, very high for an open source model
       | of this size. 46.8% is better than o3-mini (with Agentless-lite)
       | and Claude 3.6 (with AutoCodeRover), but it is a little lower
       | than Claude 3.6 with Anthropic's proprietary scaffold. And
        | considering you can run this for almost free, this is an
        | extraordinary model.
        
         | falcor84 wrote:
         | Just to confirm, are you referring to Claude 3.7?
        
           | oofbaroomf wrote:
           | No. I am referring to Claude 3.5 Sonnet New, released October
           | 22, 2024, with model ID claude-3-5-sonnet-20241022,
           | colloquially referred to as Claude 3.6 Sonnet because of
           | Anthropic's confusing naming.
        
             | SkyPuncher wrote:
             | > colloquially referred to as Claude 3.6
             | 
             | Interesting. I've never heard this.
        
             | Deathmax wrote:
             | Also known as Claude 3.5 Sonnet V2 on AWS Bedrock and GCP
             | Vertex AI
        
             | ttoinou wrote:
             | And it is a very good LLM. Some people complain they don't
             | see an improvement with Sonnet 3.7
        
         | AstroBen wrote:
          | Extraordinary... or suspicious that the benchmarks aren't doing
          | their job.
        
       | YetAnotherNick wrote:
        | The SWE-bench score is super impressive for a model of any size.
        | However, providing just one benchmark result and having to
        | partner with OpenHands makes it seem like they focused too much
        | on optimizing that number.
        
       | dismalaf wrote:
       | It's nice that Mistral is back to releasing actual open source
       | models. Europe needs a competitive AI company.
       | 
       | Also, Mistral has been killing it with their most recent models.
       | I pay for Le Chat Pro, it's really good. Mistral Small is really
       | good. Also building a startup with Mistral integration.
        
       | jadbox wrote:
       | But how does it compare to deepcoder?
        
       | CSMastermind wrote:
       | I don't believe the benchmarks they're presenting.
       | 
       | I haven't tried it out yet but every model I've tested from
       | Mistral has been towards the bottom of my benchmarks in a similar
       | place to Llama.
       | 
       | Would be very surprised if the real life performance is anything
       | like they're claiming.
        
         | Ancapistani wrote:
         | I've worked with other models from All Hands recently, and I
         | believe they were based on Mistral.
         | 
         | My general impression so far is that they aren't quite up to
         | Claude 3.7 Sonnet, but they're quite good. More than adequate
         | for an "AI pair coding assistant", and suitable for larger
         | architectural work as long as you break things into steps for
         | it.
        
       | qwertox wrote:
       | Maybe the EU should cover the cost of creating this agent/model,
       | assuming it really delivers what it promises. It would allow
       | Mistral to keep focusing on what they do and for us it would mean
       | that the EU spent money wisely.
        
         | Havoc wrote:
         | >Maybe the EU should cover the cost of creating this model
         | 
         | Wouldn't mind some of my taxpayer money flowing towards
          | Apache/MIT-licensed models.
         | 
         | Even if just to maintain a baseline alternative & keep everyone
         | honest. Seems important that we don't have some large megacorps
         | run away with this.
        
         | dismalaf wrote:
         | Pretty sure the EU paid for some supercomputers that AI
          | startups can use, and Mistral is a partner in that program.
        
       | TZubiri wrote:
       | I feel this is part of a larger and very old business trend.
       | 
       | But do we need 20 companies copying each other and doing the same
       | thing?
       | 
       | Like, is that really competition? I'd say competition is when you
       | do something slightly different, but I guess it's subjective
       | based on your interpretation of what is a commodity and what is
       | proprietary.
       | 
       | To my view, everyone is outright copying and creating commodity
       | markets:
       | 
       | OpenAI: The OG, the Coke of Modern AI
       | 
       | Claude: The first copycat, The Pepsi of Modern AI
       | 
       | Mistral: Euro OpenAI
       | 
       | DeepSeek: Chinese OpenAI
       | 
       | Grok/xAI: Republican OpenAI
       | 
       | Google/MSFT: OpenAI clone as a SaaS or Office package.
       | 
       | Meta's Llama: Open Source OpenAI
       | 
       | etc...
        
         | amarcheschi wrote:
          | I think Llama is less open source than this Mistral release.
        
       ___________________________________________________________________
       (page generated 2025-05-21 23:00 UTC)