[HN Gopher] Magistral -- the first reasoning model by Mistral AI
       ___________________________________________________________________
        
       Magistral -- the first reasoning model by Mistral AI
        
       Author : meetpateltech
       Score  : 593 points
       Date   : 2025-06-10 14:08 UTC (8 hours ago)
        
 (HTM) web link (mistral.ai)
 (TXT) w3m dump (mistral.ai)
        
       | cchance wrote:
       | Good first shot I guess, but the small one's about as good as v3,
       | and the medium's not quite as good as r1... I wonder if that r1 is
       | the actual new one or the old one
        
         | hacklas wrote:
         | The Deepseek V3 is a model with 671 billion parameters, of
         | which 37 billion are active.
         | 
         | Magistral Small is a 24 billion parameter model.
         | 
         | Pretty impressive in terms of efficiency for Mistral.
         | 
         | The size of the Magistral Medium is not publicly available, so
         | it is difficult to compare efficiency there.
        
           | kouteiheika wrote:
           | > The size of the Magistral Medium is not publicly available,
           | so it is difficult to compare efficiency there.
           | 
           | FWIW one of their 70B models has leaked in the past (search
           | for "miqu") and rumors at the time were that it was their
           | medium model.
        
       | danielhanchen wrote:
       | I made some GGUFs for those interested in running them at
       | https://huggingface.co/unsloth/Magistral-Small-2506-GGUF
       | 
       | ollama run hf.co/unsloth/Magistral-Small-2506-GGUF:UD-Q4_K_XL
       | 
       | or
       | 
       | ./llama.cpp/llama-cli -hf unsloth/Magistral-Small-2506-GGUF:UD-Q4_K_XL \
       |   --jinja --temp 0.7 --top-k -1 --top-p 0.95 -ngl 99
       | 
       | Please use --jinja for llama.cpp and use temperature = 0.7, top-p
       | 0.95!
       | 
       | Also best to increase Ollama's context length to say 8K at least:
       | OLLAMA_CONTEXT_LENGTH=8192 ollama serve &. Some other details in
       | https://docs.unsloth.ai/basics/magistral
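
The sampling settings recommended above can also be sent per request through Ollama's REST API instead of being set globally. This is a minimal sketch, assuming a local Ollama server on its default port (11434) and that the model tag from the comment has already been pulled. The helper names `build_request` and `generate` are illustrative and not part of any library.

```python
import json
import urllib.request

# Model tag taken from the comment above.
MODEL = "hf.co/unsloth/Magistral-Small-2506-GGUF:UD-Q4_K_XL"


def build_request(prompt, num_ctx=8192):
    """Build the JSON payload for Ollama's /api/generate endpoint."""
    return {
        "model": MODEL,
        "prompt": prompt,
        "stream": False,  # ask for one JSON object instead of a stream
        "options": {
            "temperature": 0.7,
            "top_p": 0.95,
            "num_ctx": num_ctx,  # context window in tokens
        },
    }


def generate(prompt, host="http://localhost:11434"):
    """Send the request. Assumes an Ollama server is running at `host`."""
    data = json.dumps(build_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        host + "/api/generate",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


if __name__ == "__main__":
    # Only prints the payload, so this runs without a server.
    print(json.dumps(build_request("Why is the sky blue?"), indent=2))
```

Calling `generate("...")` needs the server to be up; the payload builder is kept separate so the settings can be inspected offline.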
        
         | danielhanchen wrote:
         | Their paper https://mistral.ai/static/research/magistral.pdf is
         | also cool! They edited GRPO via:
         | 
         | 1. Removed KL Divergence
         | 
         | 2. Normalize by total length (Dr. GRPO style)
         | 
         | 3. Minibatch normalization for advantages
         | 
         | 4. Relaxing trust region
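
The four changes listed above can be read as small edits to a clipped policy-gradient objective. The sketch below is one interpretation of that list, not the paper's implementation: it computes only the scalar loss value on plain Python floats (no gradients), the function name `grpo_loss` is hypothetical, and the clip bounds `eps_low` and `eps_high` are placeholder values chosen for illustration.

```python
import math


def grpo_loss(groups, eps_low=0.2, eps_high=0.28):
    """Scalar value of a GRPO-style loss with the four listed modifications.

    groups: list of groups; each group is a list of (reward, ratios) samples
    for one prompt, where ratios[t] = pi_new(token_t) / pi_old(token_t).
    """
    # Baseline: center each reward on the mean reward of its own group.
    samples = []
    for group in groups:
        mean_r = sum(r for r, _ in group) / len(group)
        samples.extend((r - mean_r, ratios) for r, ratios in group)

    # (3) Normalize advantages with statistics of the whole minibatch.
    advs = [a for a, _ in samples]
    mb_mean = sum(advs) / len(advs)
    mb_std = math.sqrt(sum((a - mb_mean) ** 2 for a in advs) / len(advs))

    total_tokens = sum(len(ratios) for _, ratios in samples)
    objective = 0.0
    for a, ratios in samples:
        a_norm = (a - mb_mean) / (mb_std + 1e-8)
        for ratio in ratios:
            # (4) Relaxed trust region: the upper clip bound is looser.
            clipped = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
            objective += min(ratio * a_norm, clipped * a_norm)

    # (2) Divide by the total token count, not by each sequence's length.
    # (1) No beta * KL penalty is added anywhere.
    return -objective / total_tokens


if __name__ == "__main__":
    # One prompt, two completions: a short correct one and a long wrong one.
    batch = [[(1.0, [1.0]), (0.0, [1.0, 1.0, 1.0])]]
    print(grpo_loss(batch))
```

With total-length normalization every token carries equal weight, so the long wrong completion in the example contributes three times as much to the loss as the short correct one.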
        
           | Onavo wrote:
           | > _Removed KL Divergence_
           | 
           | Wait, how are they computing the loss?
        
             | danielhanchen wrote:
             | Oh it's the KL term sorry - beta * KL ie they set beta to
             | 0.
             | 
             | The goal of it was to "force" the model not to stray too far
             | away from the original checkpoint, but it can hinder the
             | model from learning new things
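
The role of beta can be seen in a toy version of the penalized loss. This is a minimal sketch, assuming exact KL over a tiny discrete distribution for clarity; RL fine-tuning code commonly estimates the KL per token from sampled log-probabilities instead. The function names and the example numbers are made up for illustration.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as probability lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def penalized_loss(policy_loss, policy_probs, ref_probs, beta):
    """Policy loss plus a beta-weighted penalty for drifting from the reference."""
    return policy_loss + beta * kl_divergence(policy_probs, ref_probs)


if __name__ == "__main__":
    ref = [0.4, 0.4, 0.2]     # original checkpoint (hypothetical numbers)
    policy = [0.7, 0.2, 0.1]  # policy after some updates
    print(penalized_loss(1.0, policy, ref, beta=0.04))  # drift is penalized
    print(penalized_loss(1.0, policy, ref, beta=0.0))   # penalty vanishes
```

Setting `beta=0.0` returns the bare policy loss, which is the sense in which the KL term is "removed".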
        
             | mjburgess wrote:
             | It's just a penalty term that they delete
        
             | trc001 wrote:
             | It's become trendy to delete it. I say trendy because many
             | papers delete it without offering any proof that it is
             | meaningless
        
           | gyrovagueGeist wrote:
           | Does anyone know why they added minibatch advantage
           | normalization (or when it can be useful)?
           | 
           | The paper they cite "What matters in on-policy RL" claims it
           | does not lead to much difference on their suite of test
           | problems, and (mean-of-minibatch)-normalization doesn't seem
           | theoretically motivated for convergence to the optimal
           | policy?
        
         | cpldcpu wrote:
         | But this is just the SFT - "distilled" model, not the one
         | optimized with RL, right?
        
           | danielhanchen wrote:
           | Oh I think it's SFT + RL as mentioned in the paper - they
           | said combining both is actually more performant than just RL
        
         | lxe wrote:
         | Thanks for all you do!
        
           | danielhanchen wrote:
           | Thanks!
        
         | monkmartinez wrote:
         | At the risk of dating myself; Unsloth is the Bomb-dot-com!!! I
         | use your models all the time and they just work. Thank you!!!
         | What does llama.cpp normally use if not "jinja" for their
         | templates?
        
         | ozgune wrote:
         | Their benchmarks are interesting. They are comparing to
         | DeepSeek-V3's (non-reasoning) December and DeepSeek-R1's
         | January releases. I feel that comparing to DeepSeek-R1-0528
         | would be more fair.
         | 
         | For example, R1 scores 79.8 on AIME 2024, while R1-0528 scores
         | 91.4.
         | 
         | R1 scores 70 on AIME 2025, R1-0528 scores 87.5. R1-0528 does
         | similarly better for GPQA Diamond, LiveCodeBench, and Aider
         | (about 10-15 points higher).
         | 
         | https://huggingface.co/deepseek-ai/DeepSeek-R1-0528
        
           | semi-extrinsic wrote:
           | Would also be interesting to compare with R1-0528-Qwen3-8B
           | (chain-of-thought distilled from Deepseek-R1-0528 and post-
           | trained into Qwen3-8B). It scores 86 and 76 on AIME 2024 and
           | 2025 respectively.
           | 
           | Currently running the 6-bit XL quant on a single old RTX 2080
           | Ti and I'm quite impressed TBH. Simply wild for a sub-8GB
           | download.
        
         | gavi wrote:
         | too much thinking
         | 
         | https://gist.github.com/gavi/b9985f730f5deefe49b6a28e5569d46...
        
           | fzzzy wrote:
           | My impression from running the first R1 release locally was
           | that it also does too much thinking.
        
             | cluckindan wrote:
             | It does not do any thinking. It is a statistical model,
             | just like the rest of them.
        
               | robmccoll wrote:
               | What are we doing when we think?
        
               | LordDragonfang wrote:
               | "Thinking" is a term of art referring to the
               | hidden/internal output of "reasoning" models where they
               | output "chain of thought" before giving an answer[1].
               | This technique and name stem from the early observation
               | that LLMs do better when explicitly told to "think step
               | by step"[2]. Hope that helps clarify things for you for
               | future constructive discussion.
               | 
               | [1] https://arxiv.org/html/2410.10630v1
               | 
               | [2] https://arxiv.org/pdf/2205.11916
        
               | bobsomers wrote:
               | We are aware of the term of art.
               | 
               | The point that was trying to be made, which I agree with,
               | is that anthropomorphizing a statistical model isn't
               | actually helpful. It only serves to confuse laypersons
               | into assuming these models are capable of a lot more than
               | they really are.
               | 
               | That's perfect if you're a salesperson trying to dump
               | your bad AI startup onto the public with an IPO, but
               | unhelpful for pretty much any other reason, especially
               | true understanding of what's going on.
        
       | Oras wrote:
       | Would be interesting to see a comparison with Qwen 32B. I found
       | it a fantastic local model (ollama).
        
         | DSingularity wrote:
         | I agree. Qwen models are great.
        
         | SV_BubbleTime wrote:
         | Last year, fit was important. This year, inference speed is
         | key.
         | 
         | Proofreading an email at four tokens per second, great.
         | 
         | Spending a half hour to deep research some topic with artifacts
         | and MCP tools and reasoning at four tokens per second... a bad
         | time.
        
       | ksec wrote:
       | A few days after Apple's "The Illusion of Thinking". I wonder if
       | this is the same again. Has anyone run Tower of Hanoi?
        
         | barrkel wrote:
         | The Tower of Hanoi problem is limited by context length rather
         | than model intelligence - see
         | https://x.com/scaling01/status/1931783050511126954
        
         | NitpickLawyer wrote:
         | That paper was flawed in many ways, but it had a catchy name so
         | lots of 'fluencers and media pounced on it and slopped some
         | content based on the title alone. Chances are it will be
         | relegated to the blooper section of LLM papers, just like that
         | "training on LLM outputs leads to model collapse" paper was...
        
           | __loam wrote:
           | Sorry this has nothing to do with the point you're making but
           | I've literally never seen anyone use the word 'fluencers in
           | place of influencers lol.
        
             | olddustytrail wrote:
             | Me neither and it's not much shorter. I think fluzies could
             | work better.
        
               | squidsoup wrote:
               | I propose effluencers.
        
         | syntex wrote:
         | The Illusion of Thinking was a terrible paper. A full solution
         | takes 2^n - 1 moves, so how could it fit in the context window?
         | I tried o3 and it gave me a Python script, saying that listing
         | all the moves is too much for the context window. Completely
         | different results.
        
           | roboboffin wrote:
           | I think that their point was that the problem is easily
           | solvable by humans without code, and shows the ability to
           | chain steps together to achieve a goal.
        
             | roboboffin wrote:
             | Not sure why I am being downvoted. I am simply saying that
             | we know there is a defined algorithm for solving Tower of
             | Hanoi, and the source code for it is widely available. So,
             | o3 producing the code as an answer, demonstrates even less
             | intelligence, as it means it is either memorized or copied
             | from the internet. I don't see how this point counters the
             | paper at all.
             | 
             | I believe what they are trying to show in that paper is
             | that as the chain of operations grows long (their proxy
             | for complexity), an LLM will inevitably fail. Humans don't
             | have infinite context either, but they can still solve the
             | Tower of Hanoi without needing to resort to pen and paper
             | or to coding.
        
               | syntex wrote:
               | I didn't downvote. The problem with the paper is that it
               | asks the model to output all moves for, say, 15 disks:
               | 2^15 - 1 = 32767.
               | 
               | 32767 moves in a single prompt. That's not testing
               | reasoning. That's testing whether the model can emit a
               | huge structured output without error, under a context
               | window limit.
               | 
               | The authors then treat failure to reproduce this entire
               | sequence as evidence that the model can't reason. But
               | that's like saying a calculator is broken because its
               | printer jammed halfway through printing all prime numbers
               | under 10000.
               | 
               | For me o3 returning Python code isn't a failure. It's a
               | smart shortcut. The failure is in the benchmark design.
               | This benchmark just smells.
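
The "Python code as a shortcut" point is easy to make concrete: the standard recursive solution is a few lines, while its output grows as 2^n - 1. A minimal sketch (the peg labels and the function name are arbitrary choices for illustration):

```python
def hanoi_moves(n, src="A", dst="C", via="B"):
    """Yield the (from_peg, to_peg) moves that solve an n-disk Tower of Hanoi."""
    if n == 0:
        return
    # Move the top n-1 disks out of the way, onto the spare peg.
    yield from hanoi_moves(n - 1, src, via, dst)
    # Move the largest disk to its destination.
    yield (src, dst)
    # Move the n-1 disks from the spare peg onto the largest disk.
    yield from hanoi_moves(n - 1, via, dst, src)


if __name__ == "__main__":
    for n in (3, 8, 15):
        count = sum(1 for _ in hanoi_moves(n))
        print(n, count, 2 ** n - 1)  # the count matches 2^n - 1
```

For 15 disks the generator emits 32767 moves, which is the output volume the comment above is objecting to.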
        
               | roboboffin wrote:
               | No worries, I wasn't saying that to you directly.
               | 
               | I agree 15 disks is very difficult for a human, probably
               | on a sheer stamina level; but I managed to do 8 in about
               | 15 minutes by playing around (I.e. no practice). They do
               | state that there is a massive drop in performance at this
               | point.
        
               | teach wrote:
               | Remember that with Towers of Hanoi every extra disk
               | doubles the number of moves required. So 15 disks is about
               | 128x more moves than 8. If you did eight in 15m then
               | fifteen would take you 32 hours.
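
The estimate above can be checked with a few lines of arithmetic. This sketch assumes a constant time per move, which is a simplification (fatigue and errors are ignored); the function names are illustrative.

```python
def moves(n):
    """Minimum number of moves for an n-disk Tower of Hanoi."""
    return 2 ** n - 1


def scaled_minutes(minutes_small, n_small, n_large):
    """Extrapolate solve time, assuming a constant time per move."""
    return minutes_small * moves(n_large) / moves(n_small)


if __name__ == "__main__":
    ratio = moves(15) / moves(8)          # 32767 / 255, a bit over 128
    hours = scaled_minutes(15, 8, 15) / 60
    print(round(ratio, 1), round(hours, 1))
```

The exact ratio is slightly above 2^7 = 128 because of the "- 1" terms, so the extrapolated time comes out marginally above 32 hours.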
        
               | daveguy wrote:
               | > That's testing whether the model can emit a huge
               | structured output without error, under a context window
               | limit.
               | 
               | Agreed. But to be fair, 1) a relatively simple algorithm
               | can do it, and more importantly 2) a lot of people are
               | trying to build products around doing exactly this (emit
               | large structured output without error).
        
             | jwitthuhn wrote:
             | Is it easily solvable by humans without code? I suspect if
             | you asked a human to write down all the steps in order to
             | solve a Tower of Hanoi with 12 disks they would also give
             | up before completing it. Writing code that produces the
             | correct output is the only realistic way to solve that
             | purely due to the amount of output required.
        
       | tonyhart7 wrote:
       | a bit too late aren't we??
        
       | pu_pe wrote:
       | Benchmarks suggest this model loses to Deepseek-R1 in every one-
       | shot comparison. Considering they were likely not even pitting it
       | against the newer R1 version (no mention of that in the article)
       | and at more than double the cost, this looks like the best AI
       | company in the EU is struggling to keep up with the state-of-the-
       | art.
        
         | atemerev wrote:
         | "EU is leading in regulation", they say.
         | 
         | I don't know what they are thinking.
        
           | micromacrofoot wrote:
           | probably some silly thing like "people should have more
           | rights and protections"
        
             | atemerev wrote:
             | I've yet to find any rights and protections in these cookie
             | banners.
        
               | saubeidl wrote:
               | The cookie banners are corps trying to circumvent the
               | rights and protections. If they actually went by the
               | spirit of the protections, the cookie banners wouldn't be
               | needed. Your ire is misdirected.
        
               | yeahforsureman wrote:
               | Are you sure?
               | 
               | The ePrivacy Directive requires a (GDPR-level) consent
               | for just placing the cookie, unless it's strictly
               | _necessary_ for the provision of the "service". The way
               | EU regulators interpret this, even web analytics falls
               | outside the necessity exception and therefore requires
               | consent.
               | 
               | So as long as the user doesn't and/or is not able to
               | automatically signal consent (or non-consent) eg via
               | general browser-level settings, how _can_ you obtain it
               | without trying to get it from the user on a per-site
               | basis somehow? (And no, DNT doesn't help since it's an
               | opt-out, not an opt-in mechanism.)
        
               | exyi wrote:
               | Everyone I know of will try to click "reject all
               | unnecessary cookies", and you don't need the dialog for
               | the necessary ones. You can therefore simply remove the
               | dialog and the tracking, simplifying your code and
               | improving your users' experience. Can tracking the
               | fraction which misclicks even give some useful data?
        
               | micromacrofoot wrote:
               | there are analytics providers that don't require third
               | party cookies, it's not hard to switch
        
               | micromacrofoot wrote:
               | cookie banners are malicious compliance while we head
               | towards the death of cross-site cookies, they are indeed
               | a poor implementation but the legislation that led to
               | them did not come up with it
               | 
               | did you really prefer when companies were selling your
               | data to third parties and didn't have to ask you?
        
               | sunaookami wrote:
               | Do you really think clicking "Reject non-essential
               | cookies" does something?
        
               | micromacrofoot wrote:
               | show me a single example that doesn't
        
             | __alexs wrote:
             | EU regulation is often "you can not have the cool thing"
             | not "the cool thing must be operated equitably".
             | 
             | I think they are more interested in protecting old money
             | than in protecting people.
        
               | saubeidl wrote:
               | Can you name specific examples? Otherwise, this just
               | sounds like inflammatory polemic.
        
               | micromacrofoot wrote:
               | I think usb-c and third party app stores are pretty cool
        
               | umbra07 wrote:
               | I think the government shouldn't be legislating that
               | companies must use a specific USB connector.
               | 
               | Realistically the legislation was only targeting Apple.
               | If consumers want USB-C, then they can vote with their
               | wallets and buy an Android, which is a reasonable
               | alternative.
        
               | micromacrofoot wrote:
               | We've had multiple USB standards for decades with no end
               | in sight. Apple was targeted because they have the most
               | high-profile proprietary connector and they were
               | generally using it to screw consumers. Good riddance.
        
               | umbra07 wrote:
               | Like I said, if consumers don't want it, then they can
               | buy Android phones instead.
               | 
               | > they were generally using it to screw consumers
               | 
               | You understand that there were lots of people happy with
               | Lightning? USB-C is a regression in many ways.
        
               | boroboro4 wrote:
               | I want to have USB-C and I want to have iPhone.
               | 
               | I'm very happy EU regulators took this headache off my
               | shoulders and I don't need to keep multiple chargers at
               | home, and can be almost certain I can find a charger in
               | restaurant if I need it.
               | 
               | Based on the reaction of my friends 90% of people
               | supported this change and were very enthusiastic about
               | it.
               | 
               | I have zero interest in being part of vendor game to lock
               | me in.
        
               | umbra07 wrote:
               | Products are supposed to come with different tradeoffs. I
               | want to have an Android and I want to have my headphone
               | jack back. That doesn't mean that the EU should make that
               | a law.
               | 
               | > Based on the reaction of my friends 90% of people
               | supported this change and were very enthusiastic about
               | it.
               | 
               | That is an absolutely worthless metric, and you know it.
        
               | Aeolos wrote:
               | It's about as useful as your complaining.
               | 
               | Good riddance for Lightning.
        
               | micromacrofoot wrote:
               | Why bother arguing the point if you're not going to
               | provide a single example.
        
               | flmontpetit wrote:
               | It's hard to see the benefit in letting every hardware
               | manufacturer attempt to carve out their own little
               | artificial interconnect monopoly and flood the market
               | with redundant, wasteful solutions.
        
               | msgodel wrote:
               | They shouldn't be forcing people to use patented Qualcomm
               | technology to access cellular networks either but here we
               | are.
               | 
               | Realistically Apple's connector adds no value and if they
               | want to sell into markets like the EU they need to cut
               | that kind of thing out.
        
               | umbra07 wrote:
               | > Realistically Apple's connector adds no value
               | 
               | Like I said, usb-c is a regression from lightning in
               | multiple ways.
               | 
               | * Lightning is easier to plug in.
               | 
               | * Lightning is a physically smaller connector.
               | 
               | * USB-C is a much more mechanically complex port. Instead
               | of a boss in a slot, you have a boss with a slot plugging
               | into a slot in a boss.
               | 
               | There was so much buzz around Apple no longer including a
               | wall wart with its phones, which meant an added cost for
               | the consumer, and potentially an increased environmental
               | impact if enough people were going to, say, order a wall
               | wart online and have it shipped to them. The same logic
               | applies to Apple forced to switch to USB, except that the
               | costs are now multiplied.
        
               | micromacrofoot wrote:
               | I've worked with thousands of both types of cable at this
               | point
               | 
               | > Lightning is easier to plug in.
               | 
               | according to you? neither are at all difficult
               | 
               | > Lightning is a physically smaller connector.
               | 
               | I've had lightning cables physically disassemble in the
               | port, the size also made them somewhat delicate
               | 
               | > USB-C is a much more mechanically complex port.
               | 
               | _much_ is a bit, well, much... they're both incredibly
               | simple mechanically -- the exposed contacts made
               | lightning more prone to damage
               | 
               | I've had multiple Apple devices fail because of port wear
               | on the device. Haven't encountered this yet with usb-c
               | 
               | > The same logic applies to Apple forced to switch to
               | USB, except that the costs are now multiplied.
               | 
               | Apple would have updated inevitably, as they did in the
               | past -- now at least they're on a standard... the long-
               | term waste reduction is very likely worth the switch
               | (because again, without the standard they'd have likely
               | switched to another proprietary implementation)
        
               | fkyoureadthedoc wrote:
               | Having owned both Lightning and USB-C iPhones/iPads, I
               | prefer the USB-C experience, but neither was that bad.
               | 
               | My personal biggest gripe with lightning was that the
               | spring contacts were in the port instead of the cable,
               | and when they wore out you had to replace the phone
               | instead of the cable. The lightning port was not
               | replaceable. In practice I may end up breaking more USB-C
               | ports, we'll see.
        
               | andruby wrote:
               | EU never just states "you can not have the cool thing".
               | Please provide an example if you disagree.
               | 
               | It is very hard to create policies and legislation that
               | protects consumers, workers and privacy while also giving
               | enough liberties for innovation. These are difficult but
               | important trade-offs.
               | 
               | I'm glad there is diversity in cultures and values
               | between the US, EU and Asia.
        
             | bobxmax wrote:
             | Rights and protections that have benefited heavily from an
             | economy built on the alliance with the US.
             | 
             | If it weren't for American help and trade post-WW2, Europe
             | would be a Belarusian backwater and is fast heading back in
             | that direction.
             | 
             | Countries like Greece, Italy, Spain, Portugal, etc. show
             | the future of Europe as it slowly stagnates and becomes a
             | museum that can't feed its people.
             | 
             | Even Germany that was once excelling is now collapsing
             | economically.
             | 
             | The only bright spot on the continent right now is Poland
             | who are, shocker, much less regulatorily strict and have
             | lower corporate taxes.
        
               | debugnik wrote:
               | > Countries like Greece, Italy, Spain, Portugal
               | 
               | PIGS, really? Some of the top growing EU economies right
               | now, which have turned their deficit around, show the
               | future of a slowly stagnating Europe?
        
               | bobxmax wrote:
               | A 200B economy growing 2% is the future of the EU? Yes
               | that is the point I am making.
        
           | dmos62 wrote:
           | It is fairly common to struggle to understand why different
           | cultures think the way they do.
        
             | moralestapia wrote:
             | Ugh.
             | 
             | Edit: Parent changed their comment significantly, from
             | something quite unpleasant to what it is now. I'm not
             | deleting my comment as I'm not that kind of person.
        
               | dmos62 wrote:
               | I did. I initially said that Europeans often struggle to
               | understand other cultures too. Which was an immature way
               | to point out that the cultural dissonance works both
               | ways. I realized that I was obfuscating my point and
               | rewrote my comment to be clearer, but now that you gave
               | me a chance to think on it some more, I wish I would have
               | said what I wanted to say more directly still.
               | 
               | What I wanted to say is: I like EU's regulation and I
               | find it interesting how other people have different world
               | views.
        
             | atemerev wrote:
             | I live in Europe.
        
               | mrtksn wrote:
               | Cool, which regulations exactly stopped you from doing
               | cutting edge AI?
        
               | kelseyfrog wrote:
               | Decret sur la Pause Gouter Universelle (PGU).
        
               | philjohn wrote:
               | Is that the regulation that says you need to allow
               | someone to take a 20 minute break after 6 hours of work?
        
               | meta_ai_x wrote:
               | Regulation culture breeds a certain type of risk-taking
               | culture. So, you can't blame a specific regulation for
               | the lack of innovation culture
        
               | mrtksn wrote:
               | I'm not sure about that, Europe has plenty of startups.
               | Also, IIRC it has a larger number of small businesses than
               | the US, as in the US huge companies employ huge numbers of
               | people.
               | 
               | What Europe does not have is scale ups in tech. The tech
               | consolidated in US. By tech I mean internet based
               | companies. Remove those and EU has higher productivity.
        
           | cpldcpu wrote:
           | Sorry, this is just getting old...
           | 
           | It's a trite talking point and not the reason why there are so
           | few consumer-AI companies in Europe.
        
             | atemerev wrote:
             | And what would be the reason? I am genuinely interested.
             | Also, are there viable not "consumer" AI companies here?
             | Only Mistral seems to train foundation models, and good for
             | them, however, as of now they are absolutely not SOTA.
        
               | baq wrote:
               | Money.
               | 
               | No, really - EU doesn't have the VCs and the megacorps.
               | People laugh at EU sponsoring projects, but there is no
               | private money to sponsor them. There are plenty of US
               | companies with sites in the EU though, so you have people
               | working the problems, but no branding.
        
               | SV_BubbleTime wrote:
               | Ok, just a quick question... why does Europe not have the
               | money actual/people?
        
               | baq wrote:
               | edit: the parent has since edited out the flamebait.
               | 
               | Maybe, or maybe when silicon valley was busy growing
               | exponentially Europe was still picking itself up from the
               | mess of ww2.
               | 
               | Trying to blame a single reason is futile, naive and
               | childish.
        
               | oceanplexian wrote:
               | The US was out-innovating Europe a long time before WW2,
               | we had faster, more extensive rail systems, superior high
               | rise construction, earlier to electrification, invention
               | of the telephone, modern manufacturing (Model T),
               | invention of the airplane, the birth of Hollywood and
               | modern motion pictures, the list goes on.
        
               | msgodel wrote:
               | I think it's funny how the US, Canada, and Scotland/the
               | UK all simultaneously claim to be the home of the
               | telephone.
        
               | bobxmax wrote:
               | And what's the excuse for Euro's GDP being equal to the
               | US in 2007, and now being over $10T less?
        
               | baq wrote:
               | In general, the same. In particular, different.
        
               | fmbb wrote:
               | Quick questions don't always have quick answers.
               | 
               | Moneywise, the US does have the good old Exorbitant
               | Privilege to lean on.
        
               | hshdhdhj4444 wrote:
               | Part of the answer is debt.
               | 
               | The U.S. has a debt of 35Tn. The entire EU around 16Tn.
               | 
               | If even 10% of the debt difference was invested in tech
               | that would have meant about $2tn more in investment in EU
               | tech.
        
               | bobxmax wrote:
               | Because Europeans don't take smart risks. Because they
               | over regulate.
               | 
               | It's fascinating watching people circle back to this
               | answer.
               | 
               | Regulation and taxation reduce incentives. Lower
               | incentives mean lower risk-taking.
               | 
               | The fact this is still a lesson that needs to be debated
               | is absurd.
        
               | baq wrote:
               | Europeans also mostly don't suffer from school shootings
               | and generally don't go bankrupt when they get cancer or
               | just take an ambulance ride to a non-network hospital.
               | Regulation is not all bad, besides the US has more of it
               | than anybody else.
        
               | bobxmax wrote:
               | The vast majority of Americans don't do either of those
               | things either.
               | 
               | And given what happened in Austria just a few hours back,
               | not the best time for your comment.
        
               | camjw wrote:
               | There have been 11 mass shootings in the US in the last 7
               | days so I don't think this disgusting competition is one
               | you're likely to win.
        
               | bobxmax wrote:
               | Nobody is claiming the US has less mass shootings. It's
               | just pointless whataboutism in a conversation (economic
               | strategy) that has nothing to do with it.
        
               | camjw wrote:
               | Ah good, I thought you were trying to imply there is an
               | equivalent problem in the EU. Which would seem to be
               | intentionally dense of course.
        
               | baq wrote:
               | Regulation was the point discussed, healthcare and gun
               | controls are two examples where there are massive
               | qualitative and quantitative differences in regulation
               | between EU and USA. E.g. healthcare is a matter of
               | national security in the EU and it's a profit center for
               | pension funds in the USA. Gun controls I'm not too
               | familiar with, I can only see second order effects in the
               | US in the form of an arms race between police and
               | citizens.
        
               | bobxmax wrote:
               | No, ECONOMIC regulation was the point discussed. That has
               | zilch to do with something like gun control.
        
               | TulliusCicero wrote:
               | The mental gymnastics here are incredible. Do you really
               | think the regulations inhibiting tech startup creation
               | are the same ones that protect people when they get
               | cancer or whatever?
               | 
               | Yes, the US has a lot of school shootings, but does
               | anyone think loose gun regulations are why the US is
               | strong on tech?
        
               | bobxmax wrote:
               | Any time European economic failings are brought up it's
               | always the same thing. "Well at least no school
               | shootings!"
               | 
               | Great, Singapore has less school shootings and homeless
               | people than anywhere in Europe by a country mile and has
               | a soaring economy.
        
               | camjw wrote:
               | I would love to know what you do for a living and whether
               | you personally have taken any smart risks that have led
               | you to financial success, or whether you just like
               | sniping on HN about school shootings and pretending to be
               | superior.
        
               | bobxmax wrote:
               | Lol skipped right to the ad hominem this time huh?
               | 
               | Europeans defending their economy is like republicans
               | defending gun laws... like watching a chicken run around
               | in circles.
        
               | stefan_ wrote:
               | That's hardly unique to Europeans. Look at UAV regulations
               | in the US - regulated to death based on nothing, leading
               | to a 5 to 10 year technology gap to China, while
               | recreational pilots crash and burn every other week.
        
               | atemerev wrote:
               | The amount of debt you are allowed to take and the
               | abundance of money to invest in new projects are in
               | direct proportion to the competitiveness of the
               | jurisdiction, i.e. business-friendly environment.
               | 
               | EU is not a business-friendly environment.
        
               | kilpikaarna wrote:
               | Most recently, due to ordoliberalism and coat-according-
               | to-cloth morality guiding economic policy rather than
               | money printer go brrr.
               | 
               | Longer term: cultural and language divisions despite
               | attempts at creating a common market, not running the
               | global reserve currency/military hegemony, social
               | democracies encouraging work-life balance over cutthroat
               | careerism, demographic issues, not getting a boost from
               | being the only consumer economy not to be leveled in WW2,
               | etc.
        
               | PeterStuer wrote:
               | Unlike the US, the EU does not have reserve currency
               | privilege, so we can't print endless trillions of paper
               | and force the rest of the world to give us their
               | companies and goods in return for it.
        
           | 0xDEAFBEAD wrote:
           | Honestly the US approach to AI is incredibly irresponsible.
           | As an American, I'm glad that someone somewhere is thinking
           | about regulation. Not sure it will be enough though:
           | https://xcancel.com/ESYudkowsky/status/1922710969785917691#m
        
             | MoonGhost wrote:
             | No, thanks, we don't want to be like EU. Everything
             | regulated to death. They even thought to criminalize street
             | photography because there could be copyrighted materials in
             | the picture. Not sure, are they still taxing Eiffel tower
             | images?
        
               | johnisgood wrote:
               | I thought it is happening in the US, too. I mean, the
               | Government is there to regulate the shit out of
               | everything. Regardless of where you are.
        
               | int_19h wrote:
               | EU is not a monolithic entity, and amount of regulation
               | varies widely. Baltics are very business friendly, for
               | example.
        
               | bobxmax wrote:
               | And Estonia has the most impressive tech ecosystem on the
               | continent while being a soviet backwater 20 years ago.
               | Shocking how that works.
        
             | msgodel wrote:
             | There's nothing the regulation could meaningfully hope to
             | accomplish other than slow down people willing to play by
             | the rules.
        
               | ambicapter wrote:
               | Wow, the "criminals don't follow laws therefore laws are
               | worthless" argument, here? In my HN?
        
               | msgodel wrote:
               | Usually it's possible to actually detect crime (in fact
               | it's usually hard to ignore.) That's not the case with
               | AI.
        
           | Mistletoe wrote:
           | This is why I want to move to the EU. I don't care if
           | companies aren't coddled there. I want to live where people
           | are the first priority.
        
             | atemerev wrote:
             | Well, are you ready to live on a low middle class salary of
             | a European software engineer? It is really low middle
             | class. The middle middle here would be a bank clerk, and
             | upper middle -- a lawyer or a surgeon.
             | 
             | This is not coincidental.
        
               | baq wrote:
               | Incidentally (also not) surgeons and lawyers are not poor
               | in the states either... it's just Silicon Valley was the
               | perfect place with just the right people and it kept
               | growing for 60 years straight. Surgery and law do not
               | grow exponentially. (I'll pretend the pages of regulation
               | aren't supposed to count.)
        
         | mrtksn wrote:
         | Europe isn't going to catch up in tech as long as its market is
         | open to US tech giants. Tech doesn't have marginal costs, so
         | you want to have one of it in one place and sell it everywhere
         | and when the infra and talent is already in US, EU tech is
         | destined to do niche products.
         | 
         | UK has a bit of it, France has some and that's it. The only
         | viable alternatives are countries who have issues with US and
         | that is China and Russia. China have come up with strong
         | competitors and it is on cutting edge.
         | 
         | Also, it doesn't have anything to do with regulations. 50 US
         | States have the American regulations, it's all happening in 1
         | and some other states happen to host some infrastructure but
         | that's true for the rest of the world too.
         | 
         | If the EU/US relationship gets to Trump/Musk level, then EU can
         | have the cutting edge stuff.
         | 
         | Most influential AI researchers are from Europe (inc. UK),
         | Israel and Canada anyway. Ilya Sutskever just the other day
         | gave a speech at his alma mater in Canada, for example. Andrej
         | Karpathy is Slovakian. Lots of Brits, French, Polish, Chinese,
         | German etc. are among the pioneers. A significant portion of
         | the talent is non-American already, they just need a reason to
         | be somewhere other than the US to have it outside the US. The
         | Chinese got their reason and with the state of affairs in the
         | world I wouldn't be surprised if Europeans get theirs in less
         | than 3 and a half years.
        
           | vikramkr wrote:
           | If you close off the market to US tech giants, maybe they'll
           | have some amount of market dominance at home, but I would
           | doubt that would mean they've "caught up" tech wise. There
           | would be no incentive to compete. American EV manufacturing
           | is pretty far behind Chinese EV manufacturing, protectionism
           | didn't help make a competitive car, it just protected the
           | home market while slowly ceding international market after
           | international market
        
             | saubeidl wrote:
             | As a counterexample, China's tech industry has caught up
             | and in some ways surpassed the US, partially due to being
             | closed off.
        
               | hshdhdhj4444 wrote:
               | But also due to the U.S. driving away smart people from
               | the U.S. to China.
        
               | csomar wrote:
               | > As a counterexample, China's tech industry has caught
               | up and in some ways surpassed the US, partially due to
               | being closed off.
               | 
               | How did you come up to that conclusion? We don't have
               | access to an alternate universe where the Chinese tech
               | market was open. There is a real possibility that it
               | would have been far ahead had it been open.
        
               | yorwba wrote:
               | We do have access to records from the before times when
               | the internet was wide open and Facebook, Google and
               | Microsoft were big in China. Well, Microsoft is still big
               | because they're not an internet company and unfazed by
               | censorship, but the exit of Google and Facebook took a
               | lot of pressure off Baidu and the entire Chinese social
               | media ecosystem.
        
               | mitthrowaway2 wrote:
               | I think there's a few more important reasons beyond being
               | closed off:
               | 
               | - Regulatory friendliness (eg. DJI)
               | 
               | - Non-enforcement of foreign patents (eg. LiFePO4
               | batteries)
               | 
               | - Technology transfer through partnerships with domestic
               | firms
               | 
               | - Government support for industries deemed to be in the
               | national interest
        
             | mrtksn wrote:
             | I agree, protectionism is bad most of the time but it has
             | its place. It is bad when you are ahead, it is useful when
             | you are behind(You want them to be exposed to the cutting
             | edge market but before that you want them to be able to
             | exist in first place even if they are not the best at this
             | very moment).
             | 
             | China's EV dominance is a result of local governments
             | investing and buying from local businesses.
             | 
             | It would be the same with Russia&China. They will receive
             | money from the governments and will sell to local buyers
             | and will aim to expand to foreign markets.
             | 
             | As I said, most AI talent is not American but it is
             | concentrated there. Give them a reason to be somewhere
             | else, some will be somewhere else.
        
             | littlestymaar wrote:
             | > There would be no incentive to compete.
             | 
             | Why not ? First of all there would be plenty of incentives
             | for EU companies to compete with one another (and plenty of
             | capital flowing to them as the European market is big
             | enough), then there would be competition with US actors in
             | the rest of the world. That's exactly how the Asian
             | economic model has been built: Japan, Taiwan, South Korea
             | all have used protectionism + export-based subsidies to
             | create market leaders in all kind of domains (from car
             | manufacturing to electronics and shipbuilding).
        
             | chairmansteve wrote:
             | China is an example of protectionism working. The world is
             | not governed by simple rules.
        
             | foolswisdom wrote:
             | The solution to that would be to force companies within the
             | EU market to compete with each other (fair competition
             | laws), just that idea is less popular than the first winner
             | in a market ensuring they stay dominant (because it serves
             | the interest of those who just got power). Same reason why
             | big tech rules EU in the first place.
        
           | iwontberude wrote:
           | Which Trump/Musk level? There have been so many.
        
           | Iulioh wrote:
           | The problem is, CONSUMER level tech
           | 
           | The EU is doing a lot of enterprise level shit and it's great
           | 
           | The biggest company in Europe sells B2B software (SAP)
        
             | mrtksn wrote:
             | One swallow does not make a summer, all the major platforms
             | are American and that's where Europe lags. I agree that
             | Europe does have some great tech but they are all niche.
             | Europe also have some great consumer tech products but they
             | are all dependent on American platforms. For example some
             | of the best games are French, Polish, Bulgarian, Ukrainian
             | etc. but they all depend on Steam or Apple App Store and
             | have to go by their rules and pay them a significant
             | commission.
        
             | csomar wrote:
             | That's a single company and I'd not call that great.
        
             | PeterStuer wrote:
             | SAP sells B2B software, but most of their income is from
             | consultancy and training.
        
           | ascorbic wrote:
           | It's mostly about money. DeepMind was founded in the UK, and
           | is still based in London, but there was no way it could get
           | the funding it needed without selling to Google or some other
           | US company. China is one of the few other countries that can
           | afford to fund that kind of thing.
        
           | simianwords wrote:
           | How can you explain Israel?
        
         | funnym0nk3y wrote:
         | Thought so too. I don't know how it could be different though.
         | They are competing against behemoths like OpenAI or Google, but
         | have only 200 people. Even Anthropic has over 1000 people.
         | DeepSeek has less than 200 people so the comparison seems fair.
        
           | rsanek wrote:
           | any claim from the deepseek folks should be considered with
           | wide margins of error.
        
             | humpty-d wrote:
             | I know we distrust them on account of being nefarious
             | Chinese, but has anything come to light with R1 or the
             | people behind it specifically to justify this?
        
         | jasonthorsness wrote:
         | Even if it isn't as capable, having a model with control over
         | training is probably strategically important for every major
         | region of the world. But it could only fall so far behind
         | before it effectively doesn't work in the eyes of the users.
        
         | melicerte wrote:
         | If you look at Mistral investors[0], you will quickly
         | understand that Mistral is far from being European. My
         | understanding is it is mainly owned by US companies with a few
         | other companies from EU and other places in the world.
         | 
         | [0] https://tracxn.com/d/companies/mistral-
         | ai/__SLZq7rzxLYqqA97j... (edited for typo)
        
           | pdabbadabba wrote:
           | For the purposes of GP's comment, I think the nationalities
           | of the people actually running the company and doing the work
           | are more relevant than who has invested.
        
             | derektank wrote:
             | And, perhaps most relevantly, the regulatory environment
             | the people are working in. French people working in America
             | are probably more productive than French people working in
             | France (if for no other reason because they probably work
             | more hours in America than France).
        
               | 8n4vidtmkvmk wrote:
               | Are we sure more time butt in office equates to more
               | productivity?
        
               | meta_ai_x wrote:
               | Yes, especially in cutting edge research areas where
               | other high functioning people with high energy are also
               | there.
               | 
               | You can write your in-house CRUD app in your basement or
               | your office and it doesn't matter.
               | 
               | The vast majority of HN crowd and general
               | social/mainstream media don't make the difference between
               | these two scenarios
        
               | 1propionyl wrote:
               | Yes, specifically when it comes to open-ended research or
               | development, co-location is non-negotiable. There are
               | greater than linear benefits in creativity of approach,
               | agility in adapting to new intermediate discoveries, etc
               | that you get by putting a number of talented people who
               | get along in the same space who form a community of
               | practice.
               | 
               | Remote work and flattening communication down to what
               | digital media (Slack, Zoom, etc) afford strangle the
               | beneficial network effects.
        
               | throwaway0123_5 wrote:
               | I think they were talking about total time spent working
               | rather than remote vs. in-person. I've seen more than a
               | few studies over the years showing that going from 40 to
               | 35 or 30 hours/wk has minimal or positive impacts on
               | productivity. Idk if that would apply to all work
               | environments though, and I don't recall any of the
               | studies being about research productivity specifically.
        
               | distortionfield wrote:
               | You're being downvoted but you're right. The number of
               | people who act like a web cam reproduces the in person
               | experience perfectly, for good and bad, is hilarious to
               | me.
        
               | alienbaby wrote:
               | I think the mistake people make is believing that one
               | approach is best for all. Different people work most
               | effectively in different ways.
        
               | numpad0 wrote:
               | I think maybe we should completely switch to admitting
               | this. Every extra second you sit in the (home)office adds
               | to productivity, just not necessarily converting into
               | market values, that can be inflated with hype. Also
               | longer hours is not necessarily safe or sustainable.
               | 
               | We only wish more time != more productivity because it's
               | inconvenient in multiple ways if it were. We imagine a
               | multiplier in there to balance the equation, such factor
               | that can completely negate production, using mere
               | anecdotal experiences as proofs.
               | 
               | Maybe that's not scientific, maybe time spent very
               | closely match productivity, and maybe production as well
               | as productivity need external, artificial regulations.
        
               | mschild wrote:
               | > Every extra second you sit in the (home)office adds to
               | productivity
               | 
               | I'm not sure I believe that. I think at some point the
               | additional hours worked will decrease the output per
               | unit of time, and eventually you'll reach a peak after
               | which every extra hour worked leads to an overall
               | productivity loss.
               | 
               | It's also something that I think is extremely hard to
               | consistently measure, especially for your typical office
               | worker.
        
               | adventured wrote:
               | $89,000 GDP per capita vs $46,000 rather proves the point
               | about productivity per butt. US office workers are
               | extraordinarily productive in terms of what their work
               | generates (thanks to numerous well understood things like
               | the outsized US scaling abilities). Measuring beyond that
               | is very difficult due to the variance of every business.
        
               | cataphract wrote:
               | A part of that figure is an artifact of how strong the
               | dollar is though.
        
               | palata wrote:
               | > $89,000 GDP per capita vs $46,000 rather proves the
               | point about productivity per butt.
               | 
               | So if I work 24h/day in a farm in Afghanistan, I should
               | earn more than software developers in the Silicon Valley
               | (because I'm pretty sure that they sleep)? Is that how
               | you say GDP works?
        
               | vasco wrote:
               | Most measures of productivity have "hours worked" in the
               | denominator so that can't be right.
        
               | underdeserver wrote:
               | If I work 1000 hours and you work 2000 hours in the same
               | timeframe, but you outcompeted me and created 3x value,
               | you are 1.5 times more productive.
               | 
               | There's a numerator too.
        
               | vasco wrote:
               | How does the same exact person get more productive? You
               | forgot the example I replied to? The only thing that
               | changed were hours worked. In your example you change it
               | to less hours worked with more output. You made it
               | circular.
        
               | underdeserver wrote:
               | You can be more productive just because you're faster.
               | 
               | Magistral is amazingly impressive compared to ChatGPT
               | 3.5. If it had come out two years ago we'd be saying
               | Mistral is the clear leader. But it came out now.
               | 
               | Not saying they worked fewer hours, just that speed
               | matters, and in some cases, up to a limit, working more
               | hours gets your work done faster.
        
               | whiplash451 wrote:
               | > they probably work more hours in America than France
               | 
               | Not sure that's even true. Mistral is known to be a
               | _really_ hard-working place
        
               | gwervc wrote:
               | I'm pretty sure there are far fewer regulations in the
               | US than in France, where going over the legal 35h/week
               | requires additional capital and legal paperwork.
        
               | retinaros wrote:
               | No one works 35hours in software jobs in france except
               | maybe government. Overtime is also not compensated (they
               | give some days off that is it.)
        
               | psalaun wrote:
               | Even in government; I've worked 50+ hour weeks working
               | for the healthcare branch of the welfare state, with a
               | classic 39h/w contract. No compensation of any sort,
               | despite having timesheets.
               | 
               | There are a lot of myths about French workers. Our
               | lifelong worked hours are not exceptional; our
               | productivity is also not exceptional.
        
               | greenavocado wrote:
               | Pointless suffering. Report violations to the CSE,
               | Medecin du Travail, and Inspection du Travail.
        
               | psalaun wrote:
               | It was a choice, I loved my job there. I had more
               | exciting projects than most of my friends in the private
               | sector!
        
               | Saline9515 wrote:
               | Excellent way to get blacklisted and never work for the
               | State again if you're a contractor, or end up in a low
               | impact, boring job if you're a career worker.
        
               | algoghostf wrote:
               | This is not true. Government workers or factory workers
               | can limit to 35h (with some salary loss or days off
               | loss), but other than that (especially in tech) it is
               | very competitive and working 50 hours+/week is not
               | exceptional.
        
               | kgwgk wrote:
               | > 50 hours+/week is not exceptional.
               | 
               | https://www.legifrance.gouv.fr/codes/article_lc/LEGIARTI0
               | 000...
               | 
               | Within a single week, the maximum weekly working time
               | is forty-eight hours.
               | 
               | https://www.legifrance.gouv.fr/codes/article_lc/LEGIARTI0
               | 000...
               | 
               | Weekly working time calculated over any period of
               | twelve consecutive weeks may not exceed forty-four
               | hours, except in the cases provided for in articles
               | L. 3121-23 to L. 3121-25.
        
               | Saline9515 wrote:
               | Everyone is "forfait cadre", which allows them to work
               | with no practical time limit since they don't log their
               | time spent at work.
               | https://www.service-public.fr/particuliers/vosdroits/F19261
        
               | kgwgk wrote:
               | It seems that 20% of employees in the private sector are
               | "cadres" and half of them are on "forfait jours". That
               | makes around 10% of the private sector employees working
               | 218 days per year without the standard hourly limits.
               | It's more than I thought but I doubt that many of them
               | work more than 10 hours per day. Whether that's
               | "exceptional" or not is a matter of definition, of
               | course.
        
               | greenavocado wrote:
               | In the USA most software engineers are FLSA-exempt
               | ("computer employee" exemption).
               | 
               | No overtime pay regardless of hours worked.
               | 
               | No legal maximum hours per day/week.
               | 
               | No mandatory rest periods/breaks (federally).
               | 
               | The US approach places the burden on the individual
               | employee to negotiate protections or prove
               | misclassification, while French law places the burden on
               | the employer to comply with strict, state-enforced
               | standards.
               | 
               | The French Labor Code (Code du travail) applies to
               | virtually all employees in France, regardless of sector
               | (private tech company, government agency, non-profit,
               | etc.), unless explicitly exempted. Software engineering
               | is not an exempted profession. Maximum hour limits are
               | absolute. The caps of 48 hours per week, 44 hours average
               | over 12 weeks, and 10/12 hours per day are legal maximums
               | for almost all employees. Tech companies cannot simply
               | ignore them. The requirements for employee consent,
               | strict annual limits (usually max 220 hours/year),
               | premium pay (+25%/+50%), and compensatory rest apply to
               | software engineers just like any other employee.
               | 
               | "Cadre" Status is not an exemption. Many software
               | engineers are classified as Cadres
               | (managers/professionals) but this status does not
               | automatically exempt them from working time rules.
               | 
               | Cadre au forfait jours (Days-Based Framework): This is
               | common for senior engineers/managers. They are exempt
               | from tracking daily/weekly hours but must still have a
               | maximum of 218 work days per year (including weekends,
               | holidays, and RTT days). Their annual workload must not
               | endanger their health. 80-hour weeks would obliterate
               | this rest requirement and pose severe health risks,
               | making it illegal. Employers must monitor their workload
               | and health.
               | 
               | Cadre au forfait heures (Hours-Based Framework) or Non-
               | Cadre: These employees are fully subject to the standard
               | daily/weekly/hourly limits and overtime rules. 80+
               | hours/week is blatantly illegal.
               | 
               | The tech industry, especially gaming/startups, sometimes
               | tries to import unsustainable "crunch" cultures. This is
               | illegal in France.
               | 
               | EDIT: Fixed work days
        
               | kgwgk wrote:
               | > 218 rest days per year (including weekends, holidays,
               | and RTT days)
               | 
               | Wouldn't that be nice, 218 rest days? It's 218 working
               | days.
        
               | Saline9515 wrote:
               | Some State services, such as the "Tresor", which oversees
               | French economic policies, do not respect this at all, and
               | require 12h work days most of the year. The churn is
               | enormous, with workers staying there less than a year on
               | average.
        
               | Saline9515 wrote:
               | In France most white collar jobs are categorized as
               | "management" ("cadre"), and they have no time limit. It
               | is very common for workers to clock 12h days in
               | consultancies (10am-10pm) and in state administrations,
               | for instance.
        
               | retinaros wrote:
               | Most French people in engineering jobs in France work
               | late even though overtime is never paid.
        
               | Disposal8433 wrote:
               | In the USA they have the famous 9 to 5. Most developers'
               | jobs in France are "9 to 6 with 2 hours to eat in the
               | middle and unpaid overtime," so I would say both
               | countries are equivalent.
        
               | psalaun wrote:
               | In Parisian startups it's more 9 to 7 with 30 min lunch
               | breaks.
        
               | chairmansteve wrote:
               | Spoken like a guy who's never been to France.
               | 
               | Classic drive by internet trope.
               | 
               | Maybe try a little harder, have an informed opinion about
               | something.
        
               | epolanski wrote:
               | This is beyond ignorant and completely clueless.
               | 
               | People in startups and hard research work extremely hard
               | everywhere, and Mistral is even more notorious for
               | being a tough place to survive.
               | 
               | You think that European founders and researchers are like
               | "nah, you know what, we're European, we're not ambitious,
               | we don't want to make money, to hell with equity"?
               | 
               | Also, just to point out, I've worked in research, and I
               | can tell you 100% that I've never ever seen anybody more
               | dedicated and hardworking than people from China/South
               | Korea and Japan. I'm talking sleeping bags in the office
               | kind of people.
               | 
               | And yet, that just does not translate into better
               | results. More results, which is sometimes important too,
               | yes; better, more relevant, higher quality? No, no and no.
        
           | kergonath wrote:
           | It's a French company, subject to French laws and European
           | regulations. That's what matters, from a user point of view.
        
         | littlestymaar wrote:
         | > Benchmarks suggest this model loses to Deepseek-R1 in every
         | one-shot comparison.
         | 
         | That's not particularly surprising though as the Medium variant
         | is likely close to ten times smaller than DeepSeek-R1 (granted
         | it's a dense model and not an MoE, but still).
        
         | fiatjaf wrote:
         | This reads like an AI-generated comment. What do you mean by
         | "benchmarks suggest"? The benchmarks are very clear and
         | presented right there in the page.
        
         | tootie wrote:
         | As an occasional user of Mistral, I find their model to give
         | generally excellent results and pretty quickly. I think a lot
         | of teams are now overly focused on winning the benchmarks while
         | producing worse real results.
        
           | esafak wrote:
           | If so we need to fix the benchmarks.
        
             | paulddraper wrote:
             | https://en.wikipedia.org/wiki/Goodhart%27s_law
        
             | riku_iki wrote:
             | those who try to fix them are fighting alone against huge
             | corps which try to abuse them..
        
             | tootie wrote:
             | I think there's a fundamental limit to benchmarks when it
             | comes to real-world utility. The best option would be more
             | like a user survey.
        
               | esafak wrote:
               | That's Chatbot Arena: https://lmarena.ai/leaderboard
        
         | segmondy wrote:
         | are you really going to compare a 24B model to a 700B+ model?
        
           | a2128 wrote:
           | 24B is the size of the Small open-sourced model. The Medium
           | model is bigger (they don't seem to disclose its size) and
           | still gets beaten by Deepseek R1
        
             | thot_experiment wrote:
             | Mistral Large is 123b so one can probably assume that
             | medium is between 24b and 123b, also Mistral 3.1 is by a
             | wide margin my go-to model in real life situations.
             | Benchmarks absolutely don't tell the whole story, and
             | different models have different use cases.
        
               | Ringz wrote:
               | Can you please explain what your "real life situations"
               | are?
        
               | thot_experiment wrote:
               | I use it as a personal assistant (so tool use integrated
               | into calendar/todo/notes etc) often times using the
               | multimodal aspect (taking a photo of a todo list, asking
               | it to remind me to buy something from a picture). I also
               | use it as a code completion tool in vscode, as well as a
               | replacement for most basic google searches ("how does
               | this syntax work", "what's the torch method for X")
               | 
               | I use it for almost every interaction I have with AI that
               | isn't asking it to oneshot complex code. I fairly
               | frequently run my prompts against Claude/ChatGPT and
               | Mistral 3.1 and find that for most things they're not
               | meaningfully different.
               | 
               | I also spend a lot of time playing around with it for
               | storytelling/integration into narrative games.
        
               | mandelken wrote:
               | Cool. What framework or program do you use to orchestrate
               | this?
        
               | ohso4 wrote:
               | It's a 70b model, Medium 2 was 70b.
               | 
               | https://xcancel.com/arthurmensch/status/1920136871461433620#...
        
           | moffkalast wrote:
           | The most important comparison is to QwQ at 30B since it's
           | still the best local reasoning model for that size. A
           | comparison that Mistral did not run for some reason, not even
           | with Qwen3.
        
         | hmottestad wrote:
         | With how amazing the first R1 model was and how little compute
         | they needed to create it, I'm really wondering how the new R1
         | model isn't beating o3 and 2.5 Pro on every single benchmark.
         | 
         | Magistral Small is only 24B and scores 70.7% on AIME2024 while
         | the 32B distill of R1 scores 72.6%. And with majority voting
         | @64 the Magistral Small manages 83.3%, which is better than the
         | full R1. Since I can run a 24B model on a regular gaming GPU
         | it's a lot more accessible than the full blown R1.
         | 
         | https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-...
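
The "majority voting @64" figure mentioned above refers to sampling the
model many times on the same problem and keeping the most common final
answer. A minimal sketch of that aggregation step, assuming the final
answers have already been extracted as strings (the function name and the
sample values are hypothetical, not taken from Mistral's evaluation code):

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent answer among sampled completions.

    Ties go to the answer that was seen first.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    counts = Counter(answers)
    # most_common orders by count; equal counts keep first-seen order.
    return counts.most_common(1)[0][0]


# Hypothetical final answers from five samples of the same prompt.
samples = ["204", "204", "197", "204", "113"]
print(majority_vote(samples))  # prints 204
```

With 64 samples the cost is 64 times a single run, so the voted score and
the single-shot score measure different compute budgets.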
        
           | adventured wrote:
           | It's because DeepSeek was a fast copy. That was the easy part
           | and it's why they didn't have to use so much compute to get
           | near the top. Going well beyond o3 or 2.5 Pro is drastically
           | more expensive than fast copy. China's cultural approach to
           | building substantial things produces this sort of outcome
           | regularly, you see the same approach in automobiles, planes,
           | Internet services, industrial machinery, military, et al.
           | Innovation is very expensive and time consuming, fast copy is
           | more often very inexpensive and rapid. 85% good enough is
           | often good enough, that additional 10-15% is comically
           | expensive and difficult as you climb.
        
             | MaxPock wrote:
             | I understand that the French are very innovative so why
             | isn't their model SOTA?
        
             | natrys wrote:
             | Not disagreeing with the overarching point but:
             | 
             | > That was the easy part
             | 
             | Is a bit hand-wavy in that it doesn't explain why it's only
             | DeepSeek who can do this "easy" thing, but still not Meta,
             | Mistral or anyone else really. There are many other players
             | who have way more compute than DeepSeek (even inside China,
             | not even considering rest of the world), and I can assure
             | you more or less everyone trains on synthetic
             | data/distillation from whatever bigger model they can
             | access.
        
         | epolanski wrote:
         | Jm2c but I feel conflicted about this arms race.
         | 
         | You can be 6-12 months behind and not have burned tens of
         | billions compared to the best in class; I see it as an
         | engineering win.
         | 
         | I absolutely understand those that say "yeah, but customers
         | will only use the best", I see it, but is market share of
         | forever-money-losing businesses that valuable?
        
           | adventured wrote:
           | A similar sentiment existed for a long time about Uber and
           | now they're very profitable and own their market. It was
           | worth the burn to capture the market. Who says OpenAI can't
           | roll over to profitable at a stable scale? Conquer the
           | market, hike the price to $29.95 (family account, no ads;
           | $19.95 individual account with ads; etc etc). To say nothing
           | of how they can branch out in terms of being the interaction
           | point that replaces the search box. The advertising value of
           | owning the land that OpenAI is taking is well over $100
           | billion in annual revenue. Amazon's retail business is
           | terrible, their ad business is fantastic. As OpenAI bolts on
           | an ad product their margin potential will skyrocket and the
           | cost side will be modest in comparison.
           | 
           | Over the coming years it won't be possible to stay a mere
           | 6-12 months behind as the costs to build and maintain the AI
           | super-infrastructure keep climbing. It'll become a
           | guaranteed implosion scenario. Winning will provide the
           | ongoing immense resources needed to keep pushing up the hill
           | forever. Everybody else - except a few - will fall away. The
           | same outcome took place in search. Anybody spot Lycos,
           | Excite, Hotbot, AltaVista around? It costs an enormous amount
           | of money to try to keep up with Google (Bing, Baidu, Yandex)
           | in search and scale it. This will be an even more brutal
           | example of that, as the costs are even higher to scale.
           | 
           | The only way Mistral survives is if they're heavily
           | subsidized directly by European states.
        
             | aDyslecticCrow wrote:
             | > It was worth the burn to capture the market.
             | 
             | You cannot compare Uber to the AI market. They are too
             | different. Uber captured the market because having three
             | taxi services is annoying. But people are readily jumping
             | between models using multi-model platforms. And nobody is
             | significantly ahead of the pack. There is nothing that sets
             | anyone apart aside from the rate at which they are burning
             | capital. Any advantage is closed within a year.
             | 
             | If OpenAI wants to make a profit, it will raise prices and
             | be dropped in a heartbeat for the next cheapest option.
             | Most software stacks are designed to be model-agnostic,
             | making integration or support a non-factor.
        
               | whiplash451 wrote:
               | Three cab apps are a lot less annoying than three LLM
               | apps each having their piece of your chat history.
               | 
               | The winner-take-all effect is a lot stronger with chat
               | apps.
        
               | snoman wrote:
               | That's the exact opposite of the way it is right now (at
               | least for me). I don't like having multiple ride hailing
               | apps but easily have ChatGPT, Claude, Gemini on my phone
               | (and local LLM at home). There is zero effort cost to go
               | from one to the other.
        
           | louiskottmann wrote:
           | Indeed, and with the technology plateau-ing, being 6-12
           | months late with less debt is just long term thinking.
           | 
           | Also, Europe being in the race is a big deal for consumers.
        
             | adventured wrote:
             | Why would the debt matter when you have $60 billion in ad
             | revenue and are generating $20 billion in op income? That's
             | OpenAI 5-7 years from now, if they're able to maintain
             | their position with consumers. Once they attach an ad
             | product their margins will rapidly soar due to the
             | comparatively low cost of the ad segment.
             | 
             | The technology is closer to a decade from seeing a plateau
             | for the large general models. GPT o3 is significantly
             | beyond o1 (much less 3.5 which was just Nov 2022). Claude 4
             | is significantly beyond 3.5. They're not subtle
             | improvements. And most likely there will be a splintering
             | of specialization that will see huge leaps outside the
             | large general models. The radical leap in coding
             | capabilities over the past 12-18 months is just an early
             | example of how that will work, and it will affect every
             | segment of human endeavour.
        
               | aDyslecticCrow wrote:
               | > Once they attach an ad product their margins will
               | rapidly soar due to the comparatively low cost of the ad
               | segment.
               | 
               | They're burning through computers and capital. No amount
               | of advertising could cover the cost of training or even
               | running these models. The massive subscription costs
               | we've started seeing are just a small glimpse into the
               | money they are burning through.
               | 
               | They will NOT make a profit using the current methods
               | unless the models become at least 10 times more efficient
               | than they are now. At which point Europe can adapt to the
               | innovation without much cost.
               | 
               | It's an arms race to see who can burn the most money the
               | fastest, while selling the result for as little as
               | possible. When they need to start making money, it will
               | all come crashing down.
        
             | ACCount36 wrote:
             | >with the technology plateau-ing
             | 
             | People have been claiming that since 2022. Where's the
             | plateau?
        
             | sisve wrote:
             | Being the best European AI company is also a multi billion
             | business. It's not like China or the US respects GDPR. A lot
             | of companies will choose the best European company.
        
         | wafngar wrote:
         | But they have built a fully "independent" pipeline. Deepseek
         | and others probably trained on GPT-4, o1 or whatever data.
        
       | bee_rider wrote:
       | How many other open-weights reasoning models are there?
       | 
       | Is it possible to run multiple reasoning models on one problem?
       | (Why not? I guess).
       | 
       | Another funny thought is: they release their Small model, and
       | kept their Medium as a premium service. I wonder if you could do
       | chains with Medium run occasionally, linked together by local
       | runs of Small?
        
         | simonw wrote:
         | Qwen 3 and DeepSeek R1 and Phi-4 Reasoning are the best open
         | weights reasoning models I know of.
        
         | ls612 wrote:
         | Just Deepseek I think and there are distillations of that that
         | can run on consumer hardware if you really want.
        
       | atemerev wrote:
       | So, worse than R1, and only the 24B version is open weights?
       | NGMI. R1 is awesome, and the full 671B version is open.
        
       | nake13 wrote:
       | The Magistral Small can fit within a single RTX 4090 or a 32GB
       | RAM MacBook once quantized.
        
         | the_sleaze_ wrote:
         | Excellent news for me.
         | 
         | How does one figure this out? As in I want to know the
         | comparable Deepseek or Llama equivalent (size-wise) and don't
         | want to figure it out by trial and error.
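
One rough way to answer this without trial and error is to estimate the
weight footprint as parameter count times bits per weight. A minimal
sketch, assuming about 4.5 bits per weight for a typical 4-bit quantization
(an approximation, not an exact figure for any specific GGUF file) and
ignoring KV cache and runtime overhead; the function name is hypothetical:

```python
def weight_memory_gb(params_billion, bits_per_weight):
    """Approximate size of the model weights in GB (10^9 bytes).

    Counts weights only: KV cache, activations, and runtime overhead
    come on top and grow with context length.
    """
    return params_billion * bits_per_weight / 8


# Assumed bits per weight: 16 for unquantized fp16/bf16, ~4.5 for 4-bit.
for name, params in [("Magistral Small", 24), ("DeepSeek R1", 671)]:
    for bits in (16, 4.5):
        size = weight_memory_gb(params, bits)
        print(f"{name}: ~{size:.1f} GB at {bits} bits/weight")
```

By this estimate a 24B model needs about 13.5 GB at 4.5 bits per weight,
which leaves headroom on a 24 GB card, while the same model unquantized
(about 48 GB) does not fit.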
        
         | lolive wrote:
         | Is it indeed Apple's plan to eventually run this kind of
         | model directly inside an iPhone? Or are the specs of any
         | state-of-the-art smartphone well below the minimum
         | requirements of such "lightweight" models?
        
       | awongh wrote:
       | Interesting that their niche seems to be small parameter models.
        
       | arnaudsm wrote:
       | I wish the charts included Qwen3, the current SOTA in
       | reasoning.
       | 
       | Qwen3-4B almost beats Magistral-24B on the 4 available
       | benchmarks, and Qwen3-30B-A3B is miles ahead.
        
         | resource_waste wrote:
         | No surprise on my end. Mistral has been basically useless due
         | to other models always being better.
         | 
         | But it's European, so it's a point of pride.
         | 
         | Relevance or not, we will keep hearing the name as a result.
        
         | SparkyMcUnicorn wrote:
         | 30-A3B is a really impressive model.
         | 
         | I throw tasks at it running locally to save on API costs, and
         | it's possibly better than anything we had a year or so ago from
         | closed source providers. For programming tasks, I'd rank it
         | higher than gpt-4o
        
         | poorman wrote:
         | Is there a popular benchmark site people use? Because I had to
         | test all these by hand and `Qwen3-30B-A3B` still seems like the
         | best model I can run in that relative parameter space (/memory
         | requirements).
        
           | arnaudsm wrote:
           | - https://livebench.ai/#/ + AIME + LiveCodeBench for
           | reasoning
           | 
           | - MMLU-Pro for knowledge
           | 
           | - https://lmarena.ai/leaderboard for user preference
           | 
           | We only got Magistral's GPQA, AIME & LiveCodeBench so far.
        
         | devmor wrote:
         | I would agree, Qwen3 is definitely the most impressive
         | "reasoning" model I've evaluated so far.
        
       | 5mv2 wrote:
       | The featured accuracy benchmarks exclude every model that
       | matters except DeepSeek, which is quite telling about this new
       | model's performance.
       | 
       | This makes it yet another example of European companies building
       | great products but fumbling marketing.
       | 
       | Mistral's edge is speed. It's a real pleasure to use because it
       | answers in ~1s what takes other models 5-8s, which makes for a
       | much better experience. But instead of focusing on it, they bury
       | it far down the post.
       | 
       | Try it and see if you like the speed! Note that the speed
       | advantage only applies to queries that don't require web-search,
       | as Mistral is significantly slower on this one, leading to a ~5
       | seconds advantage over 2 minutes of research for the queries I
       | benchmarked with Grok.
        
         | funnym0nk3y wrote:
         | That is reasonable though. Comparing the product of a small
         | company with little resources with giants like Google and
         | OpenAI in a field where most advances are due to more and more
         | expensive models is nonsense.
        
           | 5mv2 wrote:
           | The point I was trying to express is that Mistral is arguably
           | far superior to the giants if you care about speed! So I
           | wished they communicated this more clearly.
        
         | dominicrose wrote:
         | How would you use a fast AI?
         | 
         | My current use of AI is to generate code - or translate some
         | code from a programming language to another - which I can then
         | improve (instead of writing it from scratch). Speed isn't
         | necessary for this. It's a nice-to-have but only if it's not at
         | the cost of quality.
         | 
         | Also, as unfair as it "might" be, we do expect a fast AI not to
         | be as good, don't we? So I wouldn't focus on that in the
         | marketing. I think speed would be easier to sell as something
         | extra you would pay for, because then you'd expect the quality
         | to remain the same or better.
        
           | redavni wrote:
           | analyzing and modifying a user interface in realtime?
        
       | epic9x wrote:
       | This thing is crazy fast.
        
         | smeeth wrote:
         | They have a deal with Cerebras for inference.
         | 
         | https://www.cerebras.ai/blog/mistral-le-chat
        
           | swah wrote:
           | For me this is more important than quality. I love fast
           | responses, feels more futuristic.
        
       | rafram wrote:
       | Is the number of em-dashes in this marketing copy indicative of
       | the kind of output that the model produces? If so, might want to
       | tone it down a bit.
        
         | ModernMech wrote:
         | But the em dashes -- if appreciated -- are delightfully
         | eccentric and whimsical!
        
         | tiahura wrote:
         | Unless you're a lawyer. We love 'em.
        
           | NicuCalcea wrote:
           | As a journalist, same!
        
             | lee-rhapsody wrote:
             | Also a journalist. I use em-dashes all the time
        
               | Gregaros wrote:
               | Really anyone that writes for a living. I have a referee
               | report on a paper asking me to correct something to be an
               | em-dash.
        
         | johnisgood wrote:
         | I do not know but sometimes when I type "-" and press space,
         | LibreOffice converts it to an em-dash. I get rid of it so
         | people won't confuse me with an LLM.
        
         | sebmellen wrote:
         | > _Our early tests indicated that Magistral is an excellent
         | creative companion. We highly recommend it for creative writing
         | and storytelling, with the model capable of producing coherent
         | or -- if needed -- delightfully eccentric copy._
        
         | kobe_bryant wrote:
         | it's bizarre.
         | 
         | the first sentence is "Announcing Magistral -- the first
         | reasoning model by Mistral AI -- excelling in domain-specific,
         | transparent, and multilingual reasoning." and those should
         | clearly be commas
         | 
         | and this sentence is just flat out wrong "Lack of specialized
         | depth needed for domain-specific problems, limited
         | transparency, and inconsistent reasoning in the desired
         | language -- are just some of the known limitations of early
         | thinking models."
        
           | umbra07 wrote:
           | really? i would have written it the exact same way (with
           | dashes instead of commas).
        
             | rafram wrote:
             | The second one is unambiguously wrong. The first just looks
             | kind of weird.
        
         | saratogacx wrote:
         | That is just Mistral's marketing style. You see it on a lot of
         | their pages. The model output doesn't share the same love for
         | the long dash.
        
         | cAtte_ wrote:
         | 49 em-dashes, 59 commas. that's a crazy ratio
        
       | christianqchung wrote:
       | I don't understand why the benchmark selections are so scattered
       | and limited. It only compares Magistral Medium with Deepseek V3,
       | R1, and the other closed-weight Mistral Medium 3. Why did they
       | leave off Magistral Small entirely, alongside comparisons with
       | Alibaba Qwen or the mini versions of o3 and o4?
        
       | diggan wrote:
       | The only mention of tools I could find is this:
       | 
       | > it significantly improves project planning, backend
       | architecture, frontend design, and data engineering through
       | sequenced, multi-step actions involving external tools or API.
       | 
       | I'm guessing this means it was trained with tool calling? And if
       | so, does that mean it does tool calling within the
       | thinking/reasoning, or within the main text? Seems unclear
        
         | simonw wrote:
         | Tool calling isn't enabled in the official Magistral Small GGUF
         | (or the Ollama one) which is sad. Hope they (or someone else)
         | fix that soon.
        
           | NitpickLawyer wrote:
           | They have already released Devstral, which is a tool-specific
           | finetune of the same base model. That works pretty well with
           | cline (even though it was specifically tuned for open-hands).
           | 
           | This would likely be a good model for the "plan" mode in
           | various agentic tools (cline, aider, cursor/windsurf/void,
           | etc). So you'd have a chat in plan mode, then use devstral to
           | actually implement that plan.
        
             | diggan wrote:
             | Devstral is targeting tool use+coding I think, so something
             | like Magistral but also tool calling (during thinking)
             | would be handy too, just for other use cases. But also
             | beneficial in the context of creating plans for Devstral.
        
       | simonw wrote:
       | Here are my notes on trying this out locally via Ollama and via
       | their API (and the llm-mistral plugin) too:
       | https://simonwillison.net/2025/Jun/10/magistral/
        
         | atxtechbro wrote:
         | Hi Simon,
         | 
         | What's the huge difference between the two pelicans riding
         | bicycles? Was one the small version running locally and the
         | pretty good one the bigger version running through the API?
         | 
         | Thanks, Morgan
        
           | diggan wrote:
           | Ollama doesn't like proper naming for some reason, so `ollama
           | pull magistral:latest` lands you with the q4_K_M version
           | (currently, subject to change).
           | 
           | Mistral's API defaults to `magistral-medium-2506` right now,
           | which is running with full precision, no quantization.
        
             | samtheprogram wrote:
             | Not only the quantization, but what's available via ollama
             | is magistral-small (for local inference), not the -medium
             | variant.
        
           | simonw wrote:
           | Yes, the bad one was Magistral Small running locally, the
           | better one was Magistral Medium via their API.
        
       | GuinansEyebrows wrote:
       | This doesn't really explain what "reasoning" means in the context
       | of genAI, or how it's done by this product. Are there any good
       | sources to learn more about what "reasoning model" means outside
       | of marketing-speak?
        
         | pier25 wrote:
         | It's pure marketing. See the recent paper by Apple called "The
         | Illusion of Thinking".
         | 
         | https://ml-site.cdn-apple.com/papers/the-illusion-of-thinkin...
        
           | kamranjon wrote:
           | I sort of agree with this, having read the recent Apple paper
           | - but it does show a significant improvement at a certain
           | level of complexity - it's just that it requires quite a few
           | more tokens to achieve that. It could probably be described
           | as a sort of "context" hack because it's basically having a
           | conversation with itself to arrive at a better solution.
           | You're trading performance/time for a bit better quality.
        
           | throwaway314155 wrote:
           | If you read that paper, you'll find a more nuanced take than
           | simply "it's pure marketing"
        
       | skeptrune wrote:
       | Fully open reasoning traces are useful. Happy there is a vendor
       | out there shipping that feature.
        
       | desireco42 wrote:
       | One cool thing about this model, which I installed locally, is
       | that it supports other languages well, and it should be a
       | pleasant conversation partner.
       | 
       | BTW I am personally a fan of Mistral, because while it is not
       | the top model, it produces good results and the most important
       | thing is that it is super fast, just go to its chat and be
       | amazed. It really saves a lot of time to have a quick response.
        
       | dwedge wrote:
       | Their OCR model was really well hyped and coincidentally came out
       | at the time I had a batch of 600 page pdfs to OCR. They were all
       | monospace text, but for some reason the OCR was missing.
       | 
       | I tried it, 80% of the "text" was recognised as images and output
       | as whitespace so most of it was empty. It was much much worse
       | than tesseract.
       | 
       | A month later I got the bill for that crap and deleted my
       | account.
       | 
       | Maybe this is better but I'm over hype marketing from Mistral.
        
       | alister wrote:
       | As a quick test of logical reasoning and basic Wikipedia-level
       | knowledge, I asked Mistral AI the following question:
       | 
       | A Brazilian citizen is flying from Sao Paulo to Paris, with a
       | connection in Lisbon. Does he need to clear immigration in Lisbon
       | or in Paris or in both cities or in neither city?
       | 
       | Mistral AI said that "immigration control will only be cleared in
       | Paris," which I think is wrong.
       | 
       | After I pointed it to the Wikipedia article on this topic[1], it
       | corrected itself to say that "immigration control will be cleared
       | in Lisbon, the first point of entry into the Schengen Area."
       | 
       | I tried the same question with Meta AI (Llama 4) and it did much
       | worse: It said that the traveler "wouldn't need to clear
       | immigration in either Lisbon or Paris, given the flight
       | connections are within the Schengen Area", which is completely
       | incorrect.
       | 
       | I'd be interested to hear if other LLMs give a correct answer.
       | 
       | [1] https://en.wikipedia.org/wiki/Schengen_Area#Air_travel
        
         | marsa wrote:
         | doing some reason.. uhh intuitioning i imagine brazil and
         | portugal might have some sort of a visa-free deal going on in
         | which case llama 4 might actually be right here?
        
           | alister wrote:
           | Brazilians don't need a visa for Portugal, France, or any
           | Schengen country. But everybody has to pass through
           | immigration control (at least a passport check even if you
           | don't need a visa) when entering the Schengen zone. My
           | question was which country would that happen in.
        
           | mcintyre1994 wrote:
           | AFAIK Schengen has a common visa policy, so there couldn't be
           | such a deal between Brazil and Portugal. It'd also be
           | extremely surprising if two countries not in a common travel
           | area had a deal where you didn't have to clear customs at
           | all; I suspect that doesn't exist anywhere in the world.
        
         | mcintyre1994 wrote:
         | I think Gemini's answer (2.5 Flash) is impressive
         | 
         | ----
         | 
         | Since both Portugal and France are part of the Schengen Area,
         | and a Brazilian citizen generally does not need a visa for
         | short stays (up to 90 days in any 180-day period) in the
         | Schengen Area, here's how immigration will work:
         | 
         | Lisbon: The Brazilian citizen will need to clear immigration in
         | Lisbon. This is because Lisbon is the first point of entry into
         | the Schengen Area. At this point, their passport will be
         | stamped, and they will be officially admitted into the Schengen
         | Zone.
         | 
         | Paris: Once they have cleared immigration in Lisbon, their
         | flight from Lisbon to Paris is considered a domestic flight
         | within the Schengen Area. Therefore, they will not need to
         | clear immigration again in Paris.
         | 
         | Important Note: While Brazilians currently enjoy visa-free
         | travel, the European Travel Information and Authorization
         | System (ETIAS) is expected to become mandatory by late 2026.
         | Once implemented, Brazilian citizens will need to obtain this
         | electronic authorization before their trip to Europe, even for
         | visa-free stays. However, this is a pre-travel authorization,
         | not a visa in the traditional sense, and the immigration
         | clearance process at the first point of entry would remain the
         | same.
        
       | CobrastanJorji wrote:
       | Etymological fun: both "mistral" and "magistral" mean "masterly."
       | 
       | Mistral comes from Occitan for masterly, although today as far as
       | I know it's only used in English when talking about Mediterranean
       | winds.
       | 
       | Magistral is just the adjective form of "magister," so "like a
       | master."
       | 
       | If you want to make a few bucks, maybe look up some more obscure
       | synonyms for masterly and pick up the domain names.
        
       | mark_l_watson wrote:
       | Nice, and I see that Ollama already has the smaller 24B version.
       | I am traveling with just a mobile device so I have to wait to try
       | it, but I have been using their new Devstral coding model and it
       | is very useful, given that it is also a locally run model, so I
       | am looking forward to trying Magistral.
        
       ___________________________________________________________________
       (page generated 2025-06-10 23:00 UTC)