[HN Gopher] Magistral -- the first reasoning model by Mistral AI
___________________________________________________________________
Magistral -- the first reasoning model by Mistral AI
Author : meetpateltech
Score : 593 points
Date : 2025-06-10 14:08 UTC (8 hours ago)
(HTM) web link (mistral.ai)
| cchance wrote:
| Good first shot, I guess, but the small one's about as good as V3,
| and the medium's not quite as good as R1... I wonder if that R1 is
| the actual new one or the old one.
| hacklas wrote:
| The Deepseek V3 is a model with 671 billion parameters, of
| which 37 billion are active.
|
| Magistral Small is a 24 billion parameter model.
|
| Pretty impressive in terms of efficiency for Mistral.
|
| The size of the Magistral Medium is not publicly available, so
| it is difficult to compare efficiency there.
| kouteiheika wrote:
| > The size of the Magistral Medium is not publicly available,
| so it is difficult to compare efficiency there.
|
| FWIW one of their 70B models has leaked in the past (search
| for "miqu") and rumors at the time were that it was their
| medium model.
| danielhanchen wrote:
| I made some GGUFs for those interested in running them at
| https://huggingface.co/unsloth/Magistral-Small-2506-GGUF
|
| ollama run hf.co/unsloth/Magistral-Small-2506-GGUF:UD-Q4_K_XL
|
| or
|
| ./llama.cpp/llama-cli -hf unsloth/Magistral-Small-2506-GGUF:UD-Q4_K_XL --jinja --temp 0.7 --top-k -1 --top-p 0.95 -ngl 99
|
| Please use --jinja for llama.cpp and use temperature = 0.7, top-p
| 0.95!
|
| Also best to increase Ollama's context length to say 8K at least:
| OLLAMA_CONTEXT_LENGTH=8192 ollama serve &. Some other details in
| https://docs.unsloth.ai/basics/magistral
| danielhanchen wrote:
| Their paper https://mistral.ai/static/research/magistral.pdf is
| also cool! They edited GRPO via:
|
| 1. Removed KL Divergence
|
| 2. Normalize by total length (Dr. GRPO style)
|
| 3. Minibatch normalization for advantages
|
| 4. Relaxing trust region
| Onavo wrote:
| > _Removed KL Divergence_
|
| Wait, how are they computing the loss?
| danielhanchen wrote:
| Oh it's the KL term sorry - beta * KL ie they set beta to
| 0.
|
| The goal of it was to "force" the model not to stray too far
| away from the original checkpoint, but it can hinder the
| model from learning new things
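|
| For anyone wanting to see what "beta * KL" means concretely, here is
| a minimal Python sketch of a clipped per-sample objective with a KL
| penalty. It assumes the k3 KL estimator from the original GRPO
| formulation; the function name, arguments and numbers are
| hypothetical and not taken from Mistral's code or paper.

```python
import math


def kl_penalized_objective(logp_new, logp_old, logp_ref, advantage,
                           beta, clip_eps=0.2):
    """Clipped policy-gradient surrogate minus beta * KL(new || ref).

    All log-probabilities are for one sampled token or sequence.
    Names and defaults are illustrative assumptions.
    """
    # Importance ratio between the current and the sampling policy.
    ratio = math.exp(logp_new - logp_old)
    clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
    surrogate = min(ratio * advantage, clipped * advantage)

    # k3 estimator of the KL to the reference model: always >= 0.
    d = logp_ref - logp_new
    kl = math.exp(d) - d - 1.0

    return surrogate - beta * kl


# Same sample, with and without the penalty term.
with_kl = kl_penalized_objective(-1.0, -1.0, -1.5, 0.5, beta=0.1)
without_kl = kl_penalized_objective(-1.0, -1.0, -1.5, 0.5, beta=0.0)
print(with_kl, without_kl)
```

| With beta = 0 the reference checkpoint drops out of the objective
| entirely, so nothing pulls the policy back toward it.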
| mjburgess wrote:
| It's just a penalty term that they delete
| trc001 wrote:
| It's become trendy to delete it. I say trendy because many
| papers delete it without offering any proof that it is
| meaningless
| gyrovagueGeist wrote:
| Does anyone know why they added minibatch advantage
| normalization (or when it can be useful)?
|
| The paper they cite "What matters in on-policy RL" claims it
| does not lead to much difference on their suite of test
| problems, and (mean-of-minibatch)-normalization doesn't seem
| theoretically motivated for convergence to the optimal
| policy?
| cpldcpu wrote:
| But this is just the SFT - "distilled" model, not the one
| optimized with RL, right?
| danielhanchen wrote:
| Oh I think it's SFT + RL as mentioned in the paper - they
| said combining both is actually more performant than just RL
| lxe wrote:
| Thanks for all you do!
| danielhanchen wrote:
| Thanks!
| monkmartinez wrote:
| At the risk of dating myself; Unsloth is the Bomb-dot-com!!! I
| use your models all the time and they just work. Thank you!!!
| What does llama.cpp normally use if not "jinja" for their
| templates?
| ozgune wrote:
| Their benchmarks are interesting. They are comparing to
| DeepSeek-V3's (non-reasoning) December and DeepSeek-R1's
| January releases. I feel that comparing to DeepSeek-R1-0528
| would be more fair.
|
| For example, R1 scores 79.8 on AIME 2024, while R1-0528 scores
| 91.4.
|
| R1 scores 70 on AIME 2025, R1-0528 scores 87.5. R1-0528 does
| similarly better for GPQA Diamond, LiveCodeBench, and Aider
| (about 10-15 points higher).
|
| https://huggingface.co/deepseek-ai/DeepSeek-R1-0528
| semi-extrinsic wrote:
| Would also be interesting to compare with R1-0528-Qwen3-8B
| (chain-of-thought distilled from Deepseek-R1-0528 and post-
| trained into Qwen3-8B). It scores 86 and 76 on AIME 2024 and
| 2025 respectively.
|
| Currently running the 6-bit XL quant on a single old RTX 2080
| Ti and I'm quite impressed TBH. Simply wild for a sub-8GB
| download.
| gavi wrote:
| too much thinking
|
| https://gist.github.com/gavi/b9985f730f5deefe49b6a28e5569d46...
| fzzzy wrote:
| My impression from running the first R1 release locally was
| that it also does too much thinking.
| cluckindan wrote:
| It does not do any thinking. It is a statistical model,
| just like the rest of them.
| robmccoll wrote:
| What are we doing when we think?
| LordDragonfang wrote:
| "Thinking" is a term of art referring to the
| hidden/internal output of "reasoning" models where they
| output "chain of thought" before giving an answer[1].
| This technique and name stem from the early observation
| that LLMs do better when explicitly told to "think step
| by step"[2]. Hope that helps clarify things for you for
| future constructive discussion.
|
| [1] https://arxiv.org/html/2410.10630v1
|
| [2] https://arxiv.org/pdf/2205.11916
| bobsomers wrote:
| We are aware of the term of art.
|
| The point that was trying to be made, which I agree with,
| is that anthropomorphizing a statistical model isn't
| actually helpful. It only serves to confuse laypersons
| into assuming these models are capable of a lot more than
| they really are.
|
| That's perfect if you're a salesperson trying to dump
| your bad AI startup onto the public with an IPO, but
| unhelpful for pretty much any other reason, especially
| true understanding of what's going on.
| Oras wrote:
| Would be interesting to see a comparison with Qwen 32B. I found
| it a fantastic local model (ollama).
| DSingularity wrote:
| I agree. Qwen models are great.
| SV_BubbleTime wrote:
| Last year, fit was important. This year, inference speed is
| key.
|
| Proofreading an email at four tokens per second, great.
|
| Spending a half hour to deep research some topic with artifacts
| and MCP tools and reasoning at four tokens per second... a bad
| time.
| ksec wrote:
| A few days after Apple's "The Illusion of Thinking". I wonder if
| this is the same again. Has anyone run Tower of Hanoi?
| barrkel wrote:
| The Tower of Hanoi problem is limited by context length rather
| than model intelligence - see
| https://x.com/scaling01/status/1931783050511126954
| NitpickLawyer wrote:
| That paper was flawed in many ways, but it had a catchy name so
| lots of 'fluencers and media pounced on it and slopped some
| content based on the title alone. Chances are it will be
| relegated to the blooper section of LLM papers, just like that
| "training on LLM outputs leads to model collapse" paper was...
| __loam wrote:
| Sorry this has nothing to do with the point you're making but
| I've literally never seen anyone use the word 'fluencers in
| place of influencers lol.
| olddustytrail wrote:
| Me neither and it's not much shorter. I think fluzies could
| work better.
| squidsoup wrote:
| I propose effluencers.
| syntex wrote:
| The Illusion of Thinking was a terrible paper. The solution needs
| 2^n - 1 moves, so how could it fit in the context size? I tried o3
| and it gave me a Python script, saying that listing all the moves
| is too much for the context window. Completely different results.
| roboboffin wrote:
| I think that their point was that the problem is easily
| solvable by humans without code, and shows the ability to
| chain steps together to achieve a goal.
| roboboffin wrote:
| Not sure why I am being downvoted. I am simply saying that
| we know there is a defined algorithm for solving Tower of
| Hanoi, and the source code for it is widely available. So,
| o3 producing the code as an answer, demonstrates even less
| intelligence, as it means it is either memorized or copied
| from the internet. I don't see how this point counters the
| paper at all.
|
| I believe what they are trying to show in that paper, is
| that as the chain of operations approaches a large amount
| (their proxy for complexity), an LLM will inevitably fail.
| Humans don't have infinite context either, but they can
| still solve the Tower Of Hanoi without need to resort to
| either pen or paper, or coding.
| syntex wrote:
| I didn't downvote. The problem with the paper is that
| it asks the model to output all moves for, say, 15 disks:
| 2^15 - 1 = 32767
|
| 32767 moves in a single prompt. That's not testing
| reasoning. That's testing whether the model can emit a
| huge structured output without error, under a context
| window limit.
|
| The authors then treat failure to reproduce this entire
| sequence as evidence that the model can't reason. But
| that's like saying a calculator is broken because its
| printer jammed halfway through printing all prime numbers
| under 10000.
|
| For me o3 returning Python code isn't a failure. It's a
| smart shortcut. The failure is in the benchmark design.
| This benchmark just smells.
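|
| To make the output size concrete, here is a short Python sketch of
| the standard recursive solution. The function name and peg labels
| are arbitrary choices for illustration; this is not the paper's
| evaluation harness.

```python
def hanoi_moves(n, src="A", dst="C", via="B"):
    """Yield the (from_peg, to_peg) moves that solve n disks."""
    if n == 0:
        return
    # Move the top n-1 disks out of the way, move the largest disk,
    # then move the n-1 disks back on top of it.
    yield from hanoi_moves(n - 1, src, via, dst)
    yield (src, dst)
    yield from hanoi_moves(n - 1, via, dst, src)


for n in (3, 8, 15):
    count = sum(1 for _ in hanoi_moves(n))
    assert count == 2**n - 1
    print(n, count)
# prints:
# 3 7
# 8 255
# 15 32767
```

| The program is a handful of lines, while its output for 15 disks is
| 32767 moves long.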
| roboboffin wrote:
| No worries, I wasn't saying that to you directly.
|
| I agree 15 disks is very difficult for a human, probably
| on a sheer stamina level; but I managed to do 8 in about
| 15 minutes by playing around (I.e. no practice). They do
| state that there is a massive drop in performance at this
| point.
| teach wrote:
| Remember that with Towers of Hanoi every extra disk
| doubles the number of moves required. So 15 discs is 128x
| more moves. If you did eight in 15m then fifteen would
| take you 32 hours.
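|
| The arithmetic can be checked in a few lines of Python. This is
| only a back-of-the-envelope sketch that assumes a constant time per
| move; the helper name is made up for illustration.

```python
def min_moves(disks):
    """Minimum number of moves for a Tower of Hanoi with `disks` disks."""
    return 2**disks - 1


# Scale the reported 15 minutes for 8 disks up to 15 disks.
ratio = min_moves(15) / min_moves(8)
hours_for_15 = 15 * ratio / 60
print(f"{ratio:.1f}x more moves, about {hours_for_15:.0f} hours")
# prints: 128.5x more moves, about 32 hours
```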
| daveguy wrote:
| > That's testing whether the model can emit a huge
| structured output without error, under a context window
| limit.
|
| Agreed. But to be fair, 1) a relatively simple algorithm
| can do it, and more importantly 2) a lot of people are
| trying to build products around doing exactly this (emit
| large structured output without error).
| jwitthuhn wrote:
| Is it easily solvable by humans without code? I suspect if
| you asked a human to write down all the steps in order to
| solve a Tower of Hanoi with 12 disks they would also give
| up before completing it. Writing code that produces the
| correct output is the only realistic way to solve that
| purely due to the amount of output required.
| tonyhart7 wrote:
| a bit too late aren't we??
| pu_pe wrote:
| Benchmarks suggest this model loses to Deepseek-R1 in every one-
| shot comparison. Considering they were likely not even pitting it
| against the newer R1 version (no mention of that in the article)
| and at more than double the cost, this looks like the best AI
| company in the EU is struggling to keep up with the state-of-the-
| art.
| atemerev wrote:
| "EU is leading in regulation", they say.
|
| I don't know what they are thinking.
| micromacrofoot wrote:
| probably some silly thing like "people should have more
| rights and protections"
| atemerev wrote:
| I've yet to find any rights and protections in these cookie
| banners.
| saubeidl wrote:
| The cookie banners are corps trying to circumvent the
| rights and protections. If they actually went by the
| spirit of the protections, the cookie banners wouldn't be
| needed. Your ire is misdirected.
| yeahforsureman wrote:
| Are you sure?
|
| The ePrivacy Directive requires a (GDPR-level) consent
| for just placing the cookie, unless it's strictly
| _necessary_ for the provision of the "service". The way
| EU regulators interpret this, even web analytics falls
| outside the necessity exception and therefore requires
| consent.
|
| So as long as the user doesn't and/or is not able to
| automatically signal consent (or non-consent) eg via
| general browser-level settings, how _can_ you obtain it
| without trying to get it from the user on a per-site
| basis somehow? (And no, DNT doesn't help since it's an
| opt-out, not an opt-in mechanism.)
| exyi wrote:
| Everyone I know of will try to click "reject all
| unnecessary cookies", and you don't need the dialog for
| the necessary ones. You can therefore simply remove the
| dialog and the tracking, simplifying your code and
| improving your users' experience. Can tracking the
| fraction which misclicks even give some useful data?
| micromacrofoot wrote:
| there are analytics providers that don't require third
| party cookies, it's not hard to switch
| micromacrofoot wrote:
| cookie banners are malicious compliance while we head
| towards the death of cross-site cookies, they are indeed
| a poor implementation but the legislation that led to
| them did not come up with it
|
| did you really prefer when companies were selling your
| data to third parties and didn't have to ask you?
| sunaookami wrote:
| Do you really think clicking "Reject non-essential
| cookies" does something?
| micromacrofoot wrote:
| show me a single example that doesn't
| __alexs wrote:
| EU regulation is often "you can not have the cool thing"
| not "the cool thing must be operated equitably".
|
| I think they are more interested in protecting old money
| than in protecting people.
| saubeidl wrote:
| Can you name specific examples? Otherwise, this just
| sounds like inflammatory polemic.
| micromacrofoot wrote:
| I think usb-c and third party app stores are pretty cool
| umbra07 wrote:
| I think the government shouldn't be legislating that
| companies must use a specific USB connector.
|
| Realistically the legislation was only targeting Apple.
| If consumers want USB-C, then they can vote with their
| wallets and buy an Android, which is a reasonable
| alternative.
| micromacrofoot wrote:
| We've had multiple USB standards for decades with no end
| in sight. Apple was targeted because they have the most
| high-profile proprietary connector and they were
| generally using it to screw consumers. Good riddance.
| umbra07 wrote:
| Like I said, if consumers don't want it, then they can
| buy Android phones instead.
|
| > they were generally using it to screw consumers
|
| You understand that there were lots of people happy with
| Lightning? USB-C is a regression in many ways.
| boroboro4 wrote:
| I want to have USB-C and I want to have iPhone.
|
| I'm very happy EU regulators took this headache off my
| shoulders and I don't need to keep multiple chargers at
| home, and can be almost certain I can find a charger in
| restaurant if I need it.
|
| Based on the reaction of my friends 90% of people
| supported this change and were very enthusiastic about
| it.
|
| I have zero interest in being part of vendor game to lock
| me in.
| umbra07 wrote:
| Products are supposed to come with different tradeoffs. I
| want to have an Android and I want to have my headphone
| jack back. That doesn't mean that the EU should make that
| a law.
|
| > Based on the reaction of my friends 90% of people
| supported this change and were very enthusiastic about
| it.
|
| That is an absolutely worthless metric, and you know it.
| Aeolos wrote:
| It's about as useful as your complaining.
|
| Good riddance for Lightning.
| micromacrofoot wrote:
| Why bother arguing the point if you're not going to
| provide a single example.
| flmontpetit wrote:
| It's hard to see the benefit in letting every hardware
| manufacturer attempt to carve out their own little
| artificial interconnect monopoly and flood the market
| with redundant, wasteful solutions.
| msgodel wrote:
| They shouldn't be forcing people to use patented Qualcomm
| technology to access cellular networks either but here we
| are.
|
| Realistically Apple's connector adds no value and if they
| want to sell into markets like the EU they need to cut
| that kind of thing out.
| umbra07 wrote:
| > Realistically Apple's connector adds no value
|
| Like I said, usb-c is a regression from lightning in
| multiple ways.
|
| * Lightning is easier to plug in.
|
| * Lightning is a physically smaller connector.
|
| * USB-C is a much more mechanically complex port. Instead
| of a boss in a slot, you have a boss with a slot plugging
| into a slot in a boss.
|
| There was so much buzz around Apple no longer including a
| wall wart with its phones, which meant an added cost for
| the consumer, and potentially an increased environmental
| impact if enough people were going to, say, order a wall
| wart online and have it shipped to them. The same logic applies
| to Apple forced to switch to USB, except that the costs
| are now multiplied.
| micromacrofoot wrote:
| I've worked with thousands of both types of cable at this
| point
|
| > Lightning is easier to plug in.
|
| according to you? neither are at all difficult
|
| > Lightning is a physically smaller connector.
|
| I've had lightning cables physically disassemble in the
| port, the size also made them somewhat delicate
|
| > USB-C is a much more mechanically complex port.
|
| _much_ is a bit, well, much... they're both incredibly
| simple mechanically -- the exposed contacts made
| lightning more prone to damage
|
| I've had multiple Apple devices fail because of port wear
| on the device. Haven't encountered this yet with usb-c
|
| > The same logic applies to Apple forced to switch to
| USB, except that the costs are now multiplied.
|
| Apple would have updated inevitably, as they did in the
| past -- now at least they're on a standard... the long-
| term waste reduction is very likely worth the switch
| (because again, without the standard they'd have likely
| switched to another proprietary implementation)
| fkyoureadthedoc wrote:
| Having owned both Lightning and USB-C iPhones/iPads, I
| prefer the USB-C experience, but neither were that bad.
|
| My personal biggest gripe with lightning was that the
| spring contacts were in the port instead of the cable,
| and when they wore out you had to replace the phone
| instead of the cable. The lightning port was not
| replaceable. In practice I may end up breaking more USB-C
| ports, we'll see.
| andruby wrote:
| EU never just states "you can not have the cool thing".
| Please provide an example if you disagree.
|
| It is very hard to create policies and legislation that
| protects consumers, workers and privacy while also giving
| enough liberties for innovation. These are difficult but
| important trade-offs.
|
| I'm glad there is diversity in cultures and values
| between the US, EU and Asia.
| bobxmax wrote:
| Rights and protections that have benefited heavily from an
| economy built on the alliance with the US.
|
| If it weren't for American help and trade post-WW2, Europe
| would be a Belarusian backwater and is fast heading back in
| that direction.
|
| Countries like Greece, Italy, Spain, Portugal, etc. show
| the future of Europe as it slowly stagnates and becomes a
| museum that can't feed its people.
|
| Even Germany that was once excelling is now collapsing
| economically.
|
| The only bright spot on the continent right now is Poland
| who are, shocker, much less regulatorily strict and have
| lower corporate taxes.
| debugnik wrote:
| > Countries like Greece, Italy, Spain, Portugal
|
| PIGS, really? Some of the top growing EU economies right
| now, which have turned their deficit around, show the
| future of a slowly stagnating Europe?
| bobxmax wrote:
| A 200B economy growing 2% is the future of the EU? Yes
| that is the point I am making.
| dmos62 wrote:
| It is fairly common to struggle to understand why different
| cultures think the way they do.
| moralestapia wrote:
| Ugh.
|
| Edit: Parent changed their comment significantly, from
| something quite unpleasant to what it is now. I'm not
| deleting my comment as I'm not that kind of person.
| dmos62 wrote:
| I did. I initially said that Europeans often struggle to
| understand other cultures too. Which was an immature way
| to point out that the cultural dissonance works both
| ways. I realized that I was obfuscating my point and
| rewrote my comment to be clearer, but now that you gave
| me a chance to think on it some more, I wish I would have
| said what I wanted to say more directly still.
|
| What I wanted to say is: I like EU's regulation and I
| find it interesting how other people have different world
| views.
| atemerev wrote:
| I live in Europe.
| mrtksn wrote:
| Cool, which regulations exactly stopped you from doing
| cutting edge AI?
| kelseyfrog wrote:
| Decret sur la Pause Gouter Universelle (PGU).
| philjohn wrote:
| Is that the regulation that says you need to allow
| someone to take a 20 minute break after 6 hours of work?
| meta_ai_x wrote:
| Regulation culture breeds a certain type of risk-taking
| culture. So, you can't blame a specific regulation for
| lack of innovation culture
| mrtksn wrote:
| I'm not sure about that, Europe has plenty of startups.
| Also, IIRC it has larger number of small businesses than
| US as in US huge companies employ huge numbers of people.
|
| What Europe does not have is scale ups in tech. The tech
| consolidated in US. By tech I mean internet based
| companies. Remove those and EU has higher productivity.
| cpldcpu wrote:
| Sorry, this is just getting old...
|
| It's a trite talking point and not the reason why there are so
| few consumer-AI companies in Europe.
| atemerev wrote:
| And what would be the reason? I am genuinely interested.
| Also, are there viable not "consumer" AI companies here?
| Only Mistral seems to train foundation models, and good for
| them, however, as of now they are absolutely not SOTA.
| baq wrote:
| Money.
|
| No, really - EU doesn't have the VCs and the megacorps.
| People laugh at EU sponsoring projects, but there is no
| private money to sponsor them. There are plenty of US
| companies with sites in the EU though, so you have people
| working the problems, but no branding.
| SV_BubbleTime wrote:
| Ok, just a quick question... why does Europe not have the
| money actual/people?
| baq wrote:
| edit: the parent has since edited out the flamebait.
|
| Maybe, or maybe when silicon valley was busy growing
| exponentially Europe was still picking itself up from the
| mess of ww2.
|
| Trying to blame a single reason is futile, naive and
| childish.
| oceanplexian wrote:
| The US was out-innovating Europe a long time before WW2,
| we had faster, more extensive rail systems, superior high
| rise construction, earlier to electrification, invention
| of the telephone, modern manufacturing (Model T),
| invention of the airplane, the birth of Hollywood and
| modern motion pictures, the list goes on.
| msgodel wrote:
| I think it's funny how the US, Canada, and Scotland/the
| UK all simultaneously claim to be the home of the
| telephone.
| bobxmax wrote:
| And what's the excuse for Euro's GDP being equal to the
| US in 2007, and now being over $10T less?
| baq wrote:
| In general, the same. In particular, different.
| fmbb wrote:
| Quick questions don't always have quick answers.
|
| Moneywise, the US does have the good old Exorbitant
| Privilege to lean on.
| hshdhdhj4444 wrote:
| Part of the answer is debt.
|
| The U.S. has a debt of 35Tn. The entire EU around 16Tn.
|
| If even 10% of the debt difference was invested in tech
| that would have meant about $2tn more in investment in EU
| tech.
| bobxmax wrote:
| Because Europeans don't take smart risks. Because they
| over regulate.
|
| It's fascinating watching people circle back to this
| answer.
|
| Regulation and taxation reduces incentives. Lower
| incentives, means lower risk-taking.
|
| The fact this is still a lesson that needs to be debated
| is absurd.
| baq wrote:
| Europeans also mostly don't suffer from school shootings
| and generally don't go bankrupt when they get cancer or
| just take an ambulance ride to a non-network hospital.
| Regulation is not all bad, besides the US has more of it
| than anybody else.
| bobxmax wrote:
| The vast majority of Americans don't do either of those
| things either.
|
| And given what happened in Austria just a few hours back,
| not the best time for your comment.
| camjw wrote:
| There have been 11 mass shootings in the US in the last 7
| days so I don't think this disgusting competition is one
| you're likely to win.
| bobxmax wrote:
| Nobody is claiming the US has less mass shootings. It's
| just pointless whataboutism in a conversation (economic
| strategy) that has nothing to do with it.
| camjw wrote:
| Ah good, I thought you were trying to imply there is an
| equivalent problem in the EU. Which would seem to be
| intentionally dense of course.
| baq wrote:
| Regulation was the point discussed, healthcare and gun
| controls are two examples where there are massive
| qualitative and quantitative differences in regulation
| between EU and USA. E.g. healthcare is a matter of
| national security in the EU and it's a profit center for
| pension funds in the USA. Gun controls I'm not too
| familiar with, I can only see second order effects in the
| US in the form of an arms race between police and
| citizens.
| bobxmax wrote:
| No, ECONOMIC regulation was the point discussed. That has
| zilch to do with something like gun control.
| TulliusCicero wrote:
| The mental gymnastics here are incredible. Do you really
| think the regulations inhibiting tech startup creation
| are the same ones that protect people when they get
| cancer or whatever?
|
| Yes, the US has a lot of school shootings, but does
| anyone think loose gun regulations are why the US is
| strong on tech?
| bobxmax wrote:
| Any time European economic failings are brought up it's
| always the same thing. "Well at least no school
| shootings!"
|
| Great, Singapore has less school shootings and homeless
| people than anywhere in Europe by a country mile and has
| a soaring economy.
| camjw wrote:
| I would love to know what you do for a living and whether
| you personally have taken any smart risks that have led
| you to financial success, or whether you just like
| sniping on HN about school shootings and pretending to be
| superior.
| bobxmax wrote:
| Lol skipped right to the ad hominem this time huh?
|
| Europeans defending their economy is like republicans
| defending gun laws... like watching a chicken run around
| in circles.
| stefan_ wrote:
| That's hardly unique to Europeans. Look at UAV regulations
| in the US - regulated to death based on nothing, leading
| to a 5 to 10 year technology gap to China, while
| recreational pilots crash and burn every other week.
| atemerev wrote:
| The amount of debt you are allowed to take and the
| abundance of money to invest in new projects are in
| direct proportion to the competitiveness of the
| jurisdiction, i.e. business-friendly environment.
|
| EU is not a business-friendly environment.
| kilpikaarna wrote:
| Most recently, due to ordoliberalism and coat-according-
| to-cloth morality guiding economic policy rather than
| money printer go brrr.
|
| Longer term: cultural and language divisions despite
| attempts at creating a common market, not running the
| global reserve currency/military hegemony, social
| democracies encouraging work-life balance over cutthroat
| careerism, demographic issues, not getting a boost from
| being the only consumer economy not to be leveled in WW2,
| etc.
| PeterStuer wrote:
| Unlike the US, the EU does not have reserve currency
| privilege, so we can't print endless trillions of paper
| and force the rest of the world to give us their
| companies and goods in return for it.
| 0xDEAFBEAD wrote:
| Honestly the US approach to AI is incredibly irresponsible.
| As an American, I'm glad that someone somewhere is thinking
| about regulation. Not sure it will be enough though:
| https://xcancel.com/ESYudkowsky/status/1922710969785917691#m
| MoonGhost wrote:
| No, thanks, we don't want to be like EU. Everything
| regulated to death. They even thought to criminalize street
| photography because there could be copyrighted materials in
| the picture. Not sure, are they still taxing Eiffel tower
| images?
| johnisgood wrote:
| I thought it is happening in the US, too. I mean, the
| Government is there to regulate the shit out of
| everything. Regardless of where you are.
| int_19h wrote:
| EU is not a monolithic entity, and amount of regulation
| varies widely. Baltics are very business friendly, for
| example.
| bobxmax wrote:
| And Estonia has the most impressive tech ecosystem on the
| continent while being a soviet backwater 20 years ago.
| Shocking how that works.
| msgodel wrote:
| There's nothing the regulation could meaningfully hope to
| accomplish other than slow down people willing to play by
| the rules.
| ambicapter wrote:
| Wow, the "criminals don't follow laws therefore laws are
| worthless" argument, here? In my HN?
| msgodel wrote:
| Usually it's possible to actually detect crime (in fact
| it's usually hard to ignore.) That's not the case with
| AI.
| Mistletoe wrote:
| This is why I want to move to the EU. I don't care if
| companies aren't coddled there. I want to live where people
| are the first priority.
| atemerev wrote:
| Well, are you ready to live on a low middle class salary of
| a European software engineer? It is really low middle
| class. The middle middle here would be a bank clerk, and
| upper middle -- a lawyer or a surgeon.
|
| This is not coincidental.
| baq wrote:
| Incidentally (also not) surgeons and lawyers are not poor
| in the states either... it's just Silicon Valley was the
| perfect place with just the right people and it kept
| growing for 60 years straight. Surgery and law do not
| grow exponentially. (I'll pretend the pages of regulation
| aren't supposed to count.)
| mrtksn wrote:
| Europe isn't going to catch up in tech as long as its market is
| open to US tech giants. Tech doesn't have marginal costs, so
| you want to have one of it in one place and sell it everywhere
| and when the infra and talent is already in US, EU tech is
| destined to do niche products.
|
| UK has a bit of it, France has some and that's it. The only
| viable alternatives are countries who have issues with US and
| that is China and Russia. China have come up with strong
| competitors and it is on cutting edge.
|
| Also, it doesn't have anything to do with regulations. 50 US
| States have the American regulations, it's all happening in 1
| and some other states happen to host some infrastructure but
| that's true for the rest of the world too.
|
| If the EU/US relationship gets to Trump/Musk level, then EU can
| have the cutting edge stuff.
|
| Most influential AI researchers are from Europe(inc. UK),
| Israel and Canada anyway. Ilya Sutskever just the other day
| gave a speech at his alma mater in Canada, for example. Andrej
| Karpathy is Slovakian. Lots of Brits, French, Polish, Chinese,
| German etc. are among the pioneers. Significant portion of the
| talent is non-American already, they just need a reason to be
| somewhere else than US to have it outside the US. Chinese got
| their reason and with the state of the affairs in the world I
| wouldn't be surprised if Europeans get theirs in less than 3
| and a half years.
| vikramkr wrote:
| If you close off the market to US tech giants, maybe they'll
| have some amount of market dominance at home, but I would
| doubt that would mean they've "caught up" tech wise. There
| would be no incentive to compete. American EV manufacturing
| is pretty far behind Chinese EV manufacturing, protectionism
| didn't help make a competitive car, it just protected the
| home market while slowly ceding international market after
| international market
| saubeidl wrote:
| As a counterexample, China's tech industry has caught up
| and in some ways surpassed the US, partially due to being
| closed off.
| hshdhdhj4444 wrote:
| But also due to the U.S. driving away smart people from
| the U.S. to China.
| csomar wrote:
| > As a counterexample, China's tech industry has caught
| up and in some ways surpassed the US, partially due to
| being closed off.
|
| How did you come to that conclusion? We don't have
| access to an alternate universe where the Chinese tech
| market was open. There is a real possibility that it
| would have been far ahead had it been open.
| yorwba wrote:
| We do have access to records from the before times when
| the internet was wide open and Facebook, Google and
| Microsoft were big in China. Well, Microsoft is still big
| because they're not an internet company and unfazed by
| censorship, but the exit of Google and Facebook took a
| lot of pressure off Baidu and the entire Chinese social
| media ecosystem.
| mitthrowaway2 wrote:
| I think there's a few more important reasons beyond being
| closed off:
|
| - Regulatory friendliness (eg. DJI)
|
| - Non-enforcement of foreign patents (eg. LiFePO4
| batteries)
|
| - Technology transfer through partnerships with domestic
| firms
|
| - Government support for industries deemed to be in the
| national interest
| mrtksn wrote:
| I agree, protectionism is bad most of the time but it has
| its place. It is bad when you are ahead, it is useful when
| you are behind (you want them to be exposed to the cutting
| edge market but before that you want them to be able to
| exist in the first place even if they are not the best at this
| very moment).
|
| China's EV dominance is a result of local governments
| investing and buying from local businesses.
|
| It would be the same with Russia&China. They will receive
| money from the governments and will sell to local buyers
| and will aim to expand to foreign markets.
|
| As I said, most AI talent is not American but it is
| concentrated there. Give them a reason to be somewhere
| else, some will be somewhere else.
| littlestymaar wrote:
| > There would be no incentive to compete.
|
| Why not ? First of all there would be plenty of incentives
| for EU companies to compete with one another (and plenty of
| capital flowing to them as the European market is big
| enough), then there would be competition with US actors in
| the rest of the world. That's exactly how the Asian
| economic model has been built: Japan, Taiwan, South Korea
| all have used protectionism + export-based subsidies to
| create market leaders in all kinds of domains (from car
| manufacturing to electronics and shipbuilding).
| chairmansteve wrote:
| China is an example of protectionism working. The world is
| not governed by simple rules.
| foolswisdom wrote:
| The solution to that would be to force companies within the
| EU market to compete with each other (fair competition
| laws), just that idea is less popular than the first winner
| in a market ensuring they stay dominant (because it serves
| the interest of those who just got power). Same reason why
| big tech rules EU in the first place.
| iwontberude wrote:
| Which Trump/Musk level? There have been so many.
| Iulioh wrote:
| The problem is, CONSUMER level tech
|
| The EU is doing a lot of enterprise level shit and it's great
|
| The biggest company in Europe sells B2B software (SAP)
| mrtksn wrote:
| One swallow does not make a summer, all the major platforms
| are American and that's where Europe lags. I agree that
| Europe does have some great tech but they are all niche.
| Europe also has some great consumer tech products but they
| are all dependent on American platforms. For example some
| of the best games are French, Polish, Bulgarian, Ukrainian
| etc. but they all depend on Steam or Apple App Store and
| have to go by their rules and pay them a significant
| commission.
| csomar wrote:
| That's a single company and I'd not call that great.
| PeterStuer wrote:
| SAP sells B2B software, but most of their income is from
| consultancy and training.
| ascorbic wrote:
| It's mostly about money. DeepMind was founded in the UK, and
| is still based in London, but there was no way it could get
| the funding it needed without selling to Google or some other
| US company. China is one of the few other countries that can
| afford to fund that kind of thing.
| simianwords wrote:
| How can you explain Israel?
| funnym0nk3y wrote:
| Thought so too. I don't know how it could be different though.
| They are competing against behemoths like OpenAI or Google, but
| have only 200 people. Even Anthropic has over 1000 people.
| DeepSeek has fewer than 200 people so the comparison seems fair.
| rsanek wrote:
| any claim from the deepseek folks should be considered with
| wide margins of error.
| humpty-d wrote:
| I know we distrust them on account of being nefarious
| Chinese, but has anything come to light with R1 or the
| people behind it specifically to justify this?
| jasonthorsness wrote:
| Even if it isn't as capable, having a model with control over
| training is probably strategically important for every major
| region of the world. But it could only fall so far behind
| before it effectively doesn't work in the eyes of the users.
| melicerte wrote:
| If you look at Mistral investors[0], you will quickly
| understand that Mistral is far from being European. My
| understanding is it is mainly owned by US companies with a few
| other companies from EU and other places in the world.
|
| [0] https://tracxn.com/d/companies/mistral-
| ai/__SLZq7rzxLYqqA97j... (edited for typo)
| pdabbadabba wrote:
| For the purposes of GP's comment, I think the nationalities
| of the people actually running the company and doing the work
| are more relevant than who has invested.
| derektank wrote:
| And, perhaps most relevantly, the regulatory environment
| the people are working in. French people working in America
| are probably more productive than French people working in
| France (if for no other reason because they probably work
| more hours in America than France).
| 8n4vidtmkvmk wrote:
| Are we sure more time butt in office equates to more
| productivity?
| meta_ai_x wrote:
| Yes, especially in cutting edge research areas where
| other high functioning people with high energy are also
| there.
|
| You can write your in-house CRUD app in your basement or
| your office and it doesn't matter.
|
| The vast majority of the HN crowd and general
| social/mainstream media don't distinguish between
| these two scenarios.
| 1propionyl wrote:
| Yes, specifically when it comes to open-ended research or
| development, co-location is non-negotiable. There are
| greater than linear benefits in creativity of approach,
| agility in adapting to new intermediate discoveries, etc
| that you get by putting a number of talented people who
| get along in the same space who form a community of
| practice.
|
| Remote work and flattening communication down to what
| digital media (Slack, Zoom, etc) afford strangle the
| beneficial network effects.
| throwaway0123_5 wrote:
| I think they were talking about total time spent working
| rather than remote vs. in-person. I've seen more than a
| few studies over the years showing that going from 40 to
| 35 or 30 hours/wk has minimal or positive impacts on
| productivity. Idk if that would apply to all work
| environments though, and I don't recall any of the
| studies being about research productivity specifically.
| distortionfield wrote:
| You're being downvoted but you're right. The number of
| people who act like a web cam reproduces the in person
| experience perfectly, for good and bad, is hilarious to
| me.
| alienbaby wrote:
| I think the mistake people make is believing that one
| approach is best for all. Different people work most
| effectively in different ways.
| numpad0 wrote:
| I think maybe we should completely switch to admitting
| this. Every extra second you sit in the (home)office adds
| to productivity, just not necessarily converting into
| market value, which can be inflated with hype. Also,
| longer hours are not necessarily safe or sustainable.
|
| We only wish more time != more productivity because it's
| inconvenient in multiple ways if it were. We imagine a
| multiplier in there to balance the equation, such factor
| that can completely negate production, using mere
| anecdotal experiences as proofs.
|
| Maybe that's not scientific, maybe time spent very
| closely matches productivity, and maybe production as well
| as productivity need external, artificial regulations.
| mschild wrote:
| > Every extra second you sit in the (home)office adds to
| productivity
|
| I'm not sure I believe that. I think at some point the
| additional hours worked will ultimately decrease the
| output per unit of time, and eventually you'll reach a
| peak after which every extra hour worked will lead to an
| overall productivity loss.
|
| It's also something that I think is extremely hard to
| consistently measure, especially for your typical office
| worker.
| adventured wrote:
| $89,000 GDP per capita vs $46,000 rather proves the point
| about productivity per butt. US office workers are
| extraordinarily productive in terms of what their work
| generates (thanks to numerous well understood things like
| the outsized US scaling abilities). Measuring beyond that
| is very difficult due to the variance of every business.
| cataphract wrote:
| A part of that figure is an artifact of how strong the
| dollar is though.
| palata wrote:
| > $89,000 GDP per capita vs $46,000 rather proves the
| point about productivity per butt.
|
| So if I work 24h/day in a farm in Afghanistan, I should
| earn more than software developers in the Silicon Valley
| (because I'm pretty sure that they sleep)? Is that how
| you say GDP works?
| vasco wrote:
| Most measures of productivity have "hours worked" in the
| denominator so that can't be right.
| underdeserver wrote:
| If I work 1000 hours and you work 2000 hours in the same
| timeframe, but you outcompeted me and created 3x value,
| you are 1.5 times more productive.
|
| There's a numerator too.
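|
| A minimal sketch of that arithmetic, assuming productivity is
| simply value created divided by hours worked. The function name
| and the unit values are illustrative, not taken from any dataset:

```python
def productivity(value: float, hours: float) -> float:
    """Output per hour worked: value is the numerator, hours the denominator."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return value / hours


# "I" create 1 unit of value in 1000 hours; "you" create 3 units in 2000 hours.
me = productivity(value=1.0, hours=1000)
you = productivity(value=3.0, hours=2000)

# Twice the hours but three times the value gives 3 / 2 = 1.5x per hour.
ratio = you / me
print(f"relative productivity: {ratio:.2f}x")
```

| Holding the value fixed and only doubling the hours would halve
| the ratio instead, which is the case the parent comment describes.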
| vasco wrote:
| How does the same exact person get more productive? You
| forgot the example I replied to? The only thing that
| changed were hours worked. In your example you change it
| to less hours worked with more output. You made it
| circular.
| underdeserver wrote:
| You can be more productive just because you're faster.
|
| Magistral is amazingly impressive compared to ChatGPT
| 3.5. If it had come out two years ago we'd be saying
| Mistral is the clear leader. But it came out now.
|
| Not saying they worked fewer hours, just that speed
| matters, and in some cases, up to a limit, working more
| hours gets your work done faster.
| whiplash451 wrote:
| > they probably work more hours in America than France
|
| Not sure that's even true. Mistral is known to be a
| _really_ hard-working place
| gwervc wrote:
| I'm pretty sure there are way fewer regulations in the US
| compared to France, where going over the legal 35h/week
| requires additional capital and legal paperwork.
| retinaros wrote:
| No one works 35 hours in software jobs in France except
| maybe in government. Overtime is also not compensated (they
| give some days off, that is it).
| psalaun wrote:
| Even in government; I've worked 50+ hour weeks
| for the healthcare branch of the welfare state, with a
| classic 39h/w contract. No compensation of any sort,
| despite having timesheets.
|
| There are a lot of myths about French workers. Our
| lifetime hours worked are not exceptional; our
| productivity is also not exceptional.
| greenavocado wrote:
| Pointless suffering. Report violations to the CSE,
| Medecin du Travail, and Inspection du Travail.
| psalaun wrote:
| It was a choice, I loved my job there. I had more
| exciting projects than most of my friends in the private
| sector!
| Saline9515 wrote:
| Excellent way to get blacklisted and never work for the
| State again if you're a contractor, or end up in a low
| impact, boring job if you're a career worker.
| algoghostf wrote:
| This is not true. Government workers or factory workers
| can limit themselves to 35h (with some loss of salary or days
| off), but other than that (especially in tech) it is very
| competitive and working 50 hours+/week is not exceptional.
| kgwgk wrote:
| > 50 hours+/week is not exceptional.
|
| https://www.legifrance.gouv.fr/codes/article_lc/LEGIARTI0
| 000...
|
| Within a single week, the maximum weekly working time
| is forty-eight hours.
|
| https://www.legifrance.gouv.fr/codes/article_lc/LEGIARTI0
| 000...
|
| The weekly working time calculated over any period of
| twelve consecutive weeks may not exceed forty-four hours,
| except in the cases provided for in Articles L. 3121-23
| to L. 3121-25.
| Saline9515 wrote:
| Everyone is "forfait cadre", which allows them to work
| with no practical time limit since they don't log their
| time spent at work.
| https://www.service-public.fr/particuliers/vosdroits/F19261
| kgwgk wrote:
| It seems that 20% of employees in the private sector are
| "cadres" and half of them are on "forfait jours". That
| makes around 10% of the private sector employees working
| 218 days per year without the standard hourly limits.
| It's more than I thought but I doubt that many of them
| work more than 10 hours per day. Whether that's
| "exceptional" or not is a matter of definition, of
| course.
| greenavocado wrote:
| In the USA most software engineers are FLSA-exempt
| ("computer employee" exemption).
|
| No overtime pay regardless of hours worked.
|
| No legal maximum hours per day/week.
|
| No mandatory rest periods/breaks (federally).
|
| The US approach places the burden on the individual
| employee to negotiate protections or prove
| misclassification, while French law places the burden on
| the employer to comply with strict, state-enforced
| standards.
|
| The French Labor Code (Code du travail) applies to
| virtually all employees in France, regardless of sector
| (private tech company, government agency, non-profit,
| etc.), unless explicitly exempted. Software engineering
| is not an exempted profession. Maximum hour limits are
| absolute. The caps of 48 hours per week, 44 hours average
| over 12 weeks, and 10/12 hours per day are legal maximums
| for almost all employees. Tech companies cannot simply
| ignore them. The requirements for employee consent,
| strict annual limits (usually max 220 hours/year),
| premium pay (+25%/+50%), and compensatory rest apply to
| software engineers just like any other employee.
|
| "Cadre" Status is not an exemption. Many software
| engineers are classified as Cadres
| (managers/professionals) but this status does not
| automatically exempt them from working time rules.
|
| Cadre au forfait jours (Days-Based Framework): This is
| common for senior engineers/managers. They are exempt
| from tracking daily/weekly hours but must still work a
| maximum of 218 days per year (the rest being weekends,
| holidays, and RTT days). Their annual workload must not
| endanger their health. 80-hour weeks would obliterate
| this rest requirement and pose severe health risks,
| making it illegal. Employers must monitor their workload
| and health.
|
| Cadre au forfait heures (Hours-Based Framework) or Non-
| Cadre: These employees are fully subject to the standard
| daily/weekly/hourly limits and overtime rules. 80+
| hours/week is blatantly illegal.
|
| The tech industry, especially gaming/startups, sometimes
| tries to import unsustainable "crunch" cultures. This is
| illegal in France.
|
| EDIT: Fixed work days
| kgwgk wrote:
| > 218 rest days per year (including weekends, holidays,
| and RTT days)
|
| Wouldn't that be nice, 218 rest days? It's 218 working
| days.
| Saline9515 wrote:
| Some State services, such as the "Tresor", which oversees
| French economic policies, do not respect this at all, and
| require 12h work days most of the year. The churn is
| enormous, workers staying there less than a year on
| average.
| Saline9515 wrote:
| In France most white collar jobs are categorized as
| "management" ("cadre"), and they have no time limit. It
| is very common for workers to clock 12h days in
| consultancies (10am-10pm) and in state administrations,
| for instance.
| retinaros wrote:
| Most French people in engineering jobs in France
| work late even though overtime is never paid.
| Disposal8433 wrote:
| In the USA they have the famous 9 to 5. Most developers'
| jobs in France are "9 to 6 with 2 hours to eat in the
| middle and unpaid overtime," so I would say both
| countries are equivalent.
| psalaun wrote:
| In parisian startups it's more 9 to 7 with 30 min lunch
| breaks.
| chairmansteve wrote:
| Spoken like a guy who's never been to France.
|
| Classic drive by internet trope.
|
| Maybe try a little harder, have an informed opinion about
| something.
| epolanski wrote:
| This is beyond ignorant and completely clueless.
|
| People in startups and hard research work extremely hard
| everywhere, and Mistral is even more notorious for
| being a tough place to survive.
|
| You think that European founders and researchers are like
| "nah, you know what, we're European, we're not ambitious,
| we don't want to make money, to hell with equity"?
|
| Also, just to point out, I've worked in research, and I
| can tell you 100% that I've never ever seen anybody more
| dedicated and hardworking than people from China/South
| Korea and Japan. I'm talking sleeping bags in the office
| kind of people.
|
| And yet, that just does not translate into better results.
| More results, which is important too sometimes, yes,
| better, more relevant, higher quality? No no and no.
| kergonath wrote:
| It's a French company, subject to French laws and European
| regulations. That's what matters, from a user point of view.
| littlestymaar wrote:
| > Benchmarks suggest this model loses to Deepseek-R1 in every
| one-shot comparison.
|
| That's not particularly surprising though as the Medium variant
| is likely close to ten times smaller than DeepSeek-R1 (granted
| it's a dense model and not an MoE, but still).
| fiatjaf wrote:
| This reads like an AI-generated comment. What do you mean by
| "benchmarks suggest"? The benchmarks are very clear and
| presented right there in the page.
| tootie wrote:
| As an occasional user of Mistral, I find their model to give
| generally excellent results and pretty quickly. I think a lot
| of teams are now overly focused on winning the benchmarks while
| producing worse real results.
| esafak wrote:
| If so we need to fix the benchmarks.
| paulddraper wrote:
| https://en.wikipedia.org/wiki/Goodhart%27s_law
| riku_iki wrote:
| those who try to fix them are fighting alone against huge
| corps which try to abuse them..
| tootie wrote:
| I think there's a fundamental limit to benchmarks when it
| comes to real-world utility. The best option would be more
| like a user survey.
| esafak wrote:
| That's Chatbot Arena: https://lmarena.ai/leaderboard
| segmondy wrote:
| are you really going to compare a 24B model to a 700B+ model?
| a2128 wrote:
| 24B is the size of the Small opensourced model. The Medium
| model is bigger (they don't seem to disclose its size) and
| still gets beaten by Deepseek R1
| thot_experiment wrote:
| Mistral Large is 123b so one can probably assume that
| medium is between 24b and 123b, also Mistral 3.1 is by a
| wide margin my go-to model in real life situations.
| Benchmarks absolutely don't tell the whole story, and
| different models have different use cases.
| Ringz wrote:
| Can you please explain what your "real life situations"
| are?
| thot_experiment wrote:
| I use it as a personal assistant (so tool use integrated
| into calendar/todo/notes etc) often times using the
| multimodal aspect (taking a photo of a todo list, asking
| it to remind me to buy something from a picture). I also
| use it as a code completion tool in vscode, as well as a
| replacement for most basic google searches ("how does
| this syntax work", "what's the torch method for X")
|
| I use it for almost every interaction I have with AI that
| isn't asking it to oneshot complex code. I fairly
| frequently run my prompts against Claude/ChatGPT and
| Mistral 3.1 and find that for most things they're not
| meaningfully different.
|
| I also spend a lot of time playing around with it for
| storytelling/integration into narrative games.
| mandelken wrote:
| Cool. What framework or program do you use to orchestrate
| this?
| ohso4 wrote:
| It's a 70b model, Medium 2 was 70b.
|
| https://xcancel.com/arthurmensch/status/19201368714614336
| 20#...
| moffkalast wrote:
| The most important comparison is to QwQ at 30B since it's
| still the best local reasoning model for that size. A
| comparison that Mistral did not run for some reason, not even
| with Qwen3.
| hmottestad wrote:
| With how amazing the first R1 model was and how little compute
| they needed to create it, I'm really wondering how the new R1
| model isn't beating o3 and 2.5 Pro on every single benchmark.
|
| Magistral Small is only 24B and scores 70.7% on AIME2024 while
| the 32B distill of R1 scores 72.6%. And with majority voting
| @64 the Magistral Small manages 83.3%, which is better than the
| full R1. Since I can run a 24B model on a regular gaming GPU
| it's a lot more accessible than the full blown R1.
|
| https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-...
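|
| For readers unfamiliar with "majority voting @64": as far as I
| understand it, the model is sampled 64 times per problem and the
| most frequent final answer is the one that gets scored. A minimal
| sketch of that idea, with made-up sample answers and a
| hypothetical function name (this is not Mistral's evaluation
| code):

```python
from collections import Counter


def majority_vote(answers):
    """Return the most common final answer among sampled completions."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    # most_common(1) yields [(answer, count)] for the top answer.
    return Counter(answers).most_common(1)[0][0]


# Hypothetical final answers extracted from 8 samples (64 in the real setup).
samples = ["204", "204", "113", "204", "72", "113", "204", "204"]
print(majority_vote(samples))
```

| This costs one full generation per sample, so the maj@64 score
| is not directly comparable to a single-sample score from another
| model.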
| adventured wrote:
| It's because DeepSeek was a fast copy. That was the easy part
| and it's why they didn't have to use so much compute to get
| near the top. Going well beyond o3 or 2.5 Pro is drastically
| more expensive than fast copy. China's cultural approach to
| building substantial things produces this sort of outcome
| regularly, you see the same approach in automobiles, planes,
| Internet services, industrial machinery, military, et al.
| Innovation is very expensive and time consuming, fast copy is
| more often very inexpensive and rapid. 85% good enough is
| often good enough, that additional 10-15% is comically
| expensive and difficult as you climb.
| MaxPock wrote:
| I understand that the French are very innovative so why
| isn't their model SOTA ?
| natrys wrote:
| Not disagreeing with the overarching point but:
|
| > That was the easy part
|
| Is a bit hand-wavy in that it doesn't explain why it's only
| DeepSeek who can do this "easy" thing, but still not Meta,
| Mistral or anyone else really. There are many other players
| who have way more compute than DeepSeek (even inside China,
| not even considering rest of the world), and I can assure
| you more or less everyone trains on synthetic
| data/distillation from whatever bigger model they can
| access.
| epolanski wrote:
| Jm2c but I feel conflicted about this arms race.
|
| You can be 6-12 months late and not have burned tens of
| billions compared to the best in class; I see it as an engineering
| win.
|
| I absolutely understand those that say "yeah, but customers
| will only use the best", I see it, but is market share of
| forever money losing businesses that valuable?
| adventured wrote:
| A similar sentiment existed for a long time about Uber and
| now they're very profitable and own their market. It was
| worth the burn to capture the market. Who says OpenAI can't
| roll over to profitable at a stable scale? Conquer the
| market, hike the price to $29.95 (family account, no ads;
| $19.95 individual account with ads; etc etc). To say nothing
| of how they can branch out in terms of being the interaction
| point that replaces the search box. The advertising value of
| owning the land that OpenAI is taking is well over $100
| billion in annual revenue. Amazon's retail business is
| terrible, their ad business is fantastic. As OpenAI bolts on
| an ad product their margin potential will skyrocket and the
| cost side will be modest in comparison.
|
| Over the coming years it won't be possible to stay a mere
| 6-12 months behind as the costs to build and maintain the AI
| super-infrastructure keep climbing. It'll become a
| guaranteed implosion scenario. Winning will provide the
| ongoing immense resources needed to keep pushing up the hill
| forever. Everybody else - except a few - will fall away. The
| same outcome took place in search. Anybody spot Lycos,
| Excite, Hotbot, AltaVista around? It costs an enormous amount
| of money to try to keep up with Google (Bing, Baidu, Yandex)
| in search and scale it. This will be an even more brutal
| example of that, as the costs are even higher to scale.
|
| The only way Mistral survives is if they're heavily
| subsidized directly by European states.
| aDyslecticCrow wrote:
| > It was worth the burn to capture the market.
|
| You cannot compare Uber to the AI market. They are too
| different. Uber captured the market because having three
| taxi services is annoying. But people are readily jumping
| between models using multi-model platforms. And nobody is
| significantly ahead of the pack. There is nothing that sets
| anyone apart aside from the rate at which they are burning
| capital. Any advantage is closed within a year.
|
| If OpenAI wants to make a profit, it will raise prices and
| be dropped in a heartbeat for the next cheapest option.
| Most software stacks are designed to be model-agnostic,
| making integration or support a non-factor.
| whiplash451 wrote:
| Three cab apps are a lot less annoying than three LLM
| apps each having their piece of your chat history.
|
| The winner-take-all effect is a lot stronger with chat
| apps.
| snoman wrote:
| That's the exact opposite of the way it is right now (at
| least for me). I don't like having multiple ride hailing
| apps but easily have ChatGPT, Claude, Gemini on my phone
| (and local LLM at home). There is zero effort cost to go
| from one to the other.
| louiskottmann wrote:
| Indeed, and with the technology plateau-ing, being 6-12
| months late with less debt is just long term thinking.
|
| Also, Europe being in the race is a big deal for consumers.
| adventured wrote:
| Why would the debt matter when you have $60 billion in ad
| revenue and are generating $20 billion in op income? That's
| OpenAI 5-7 years from now, if they're able to maintain
| their position with consumers. Once they attach an ad
| product their margins will rapidly soar due to the
| comparatively low cost of the ad segment.
|
| The technology is closer to a decade from seeing a plateau
| for the large general models. GPT o3 is significantly
| beyond o1 (much less 3.5 which was just Nov 2022). Claude 4
| is significantly beyond 3.5. They're not subtle
| improvements. And most likely there will be a splintering
| of specialization that will see huge leaps outside the
| large general models. The radical leap in coding
| capabilities over the past 12-18 months is just an early
| example of how that will work, and it will affect every
| segment of human endeavour.
| aDyslecticCrow wrote:
| > Once they attach an ad product their margins will
| rapidly soar due to the comparatively low cost of the ad
| segment.
|
| They're burning through computers and capital. No amount
| of advertising could cover the cost of training or even
| running these models. The massive subscription costs
| we've started seeing are just a small glimpse into the
| money they are burning through.
|
| They will NOT make a profit using the current methods
| unless the models become at least 10 times more efficient
| than they are now. At which point Europe can adopt the
| innovation without much cost.
|
| It's an arms race to see who can burn the most money the
| fastest, while selling the result for as little as
| possible. When they need to start making money, it will
| all come crashing down.
| ACCount36 wrote:
| >with the technology plateau-ing
|
| People were claiming that since year 2022. Where's the
| plateau?
| sisve wrote:
| Being the best European AI company is also a multi billion
| business. It's not like China or the US respects GDPR. A lot
| of companies will choose the best European company.
| wafngar wrote:
| But they have built a fully "independent" pipeline. Deepseek
| and others probably trained on GPT-4, o1 or whatever data.
| bee_rider wrote:
| How many other open-weights reasoning models are there?
|
| Is it possible to run multiple reasoning models on one problem?
| (Why not? I guess).
|
| Another funny thought is: they release their Small model, and
| kept their Medium as a premium service. I wonder if you could do
| chains with Medium run occasionally, linked together by local
| runs of Small?
| simonw wrote:
| Qwen 3 and DeepSeek R1 and Phi-4 Reasoning are the best open
| weights reasoning models I know of.
| ls612 wrote:
| Just Deepseek I think and there are distillations of that that
| can run on consumer hardware if you really want.
| atemerev wrote:
| So, worse than R1, and only 24B version is open weights? NGMI. R1
| is awesome, and the full 671B version is open.
| nake13 wrote:
| The Magistral Small can fit within a single RTX 4090 or a 32GB
| RAM MacBook once quantized.
| the_sleaze_ wrote:
| Excellent news for me.
|
| How does one figure this out? As in I want to know the
| comparable Deepseek or Llama equivalent (size-wise) and don't
| want to figure it out by trial and error.
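|
| One rough rule of thumb (an approximation, not an exact method):
| the weights take about parameters x bits-per-weight / 8 bytes,
| plus extra for the KV cache and runtime overhead, which this
| sketch ignores. The 4.5 bits/weight figure for a "4-bit" quant
| is an assumed average, and the function name is made up:

```python
def estimate_weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    # params_billion * 1e9 weights * (bits / 8) bytes each, divided by 1e9.
    return params_billion * bits_per_weight / 8


# Magistral Small is a 24B model; compare a few precisions.
for bits in (16, 8, 4.5):
    size = estimate_weight_gb(24, bits)
    print(f"24B @ {bits} bits/weight ~ {size:.1f} GB")
```

| By this estimate a 24B model at roughly 4.5 bits/weight needs
| about 13.5 GB for weights, which is why it fits a 24 GB GPU with
| room left for context. Plug in another model's parameter count
| to compare sizes without trial and error.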
| lolive wrote:
| Is it indeed Apple's plan to eventually run this kind of
| model directly inside an iPhone? Or are the specs of any
| state-of-the-art smartphone well below the minimum requirements of
| such "lightweight" models?
| awongh wrote:
| Interesting that their niche seems to be small parameter models.
| arnaudsm wrote:
| I wished the charts included Qwen3, the current SOTA in
| reasoning.
|
| Qwen3-4B almost beats Magistral-24B on the 4 available
| benchmarks, and Qwen3-30B-A3B is miles ahead.
| resource_waste wrote:
| No surprise on my end. Mistral has been basically useless due
| to other models always being better.
|
| But it's European, so it's a point of pride.
|
| Relevance or not, we will keep hearing the name as a result.
| SparkyMcUnicorn wrote:
| 30-A3B is a really impressive model.
|
| I throw tasks at it running locally to save on API costs, and
| it's possibly better than anything we had a year or so ago from
| closed source providers. For programming tasks, I'd rank it
| higher than gpt-4o
| poorman wrote:
| Is there a popular benchmark site people use? Because I had to
| test all these by hand and `Qwen3-30B-A3B` still seems like the
| best model I can run in that relative parameter space (/memory
| requirements).
| arnaudsm wrote:
| - https://livebench.ai/#/ + AIME + LiveCodeBench for
| reasoning
|
| - MMLU-Pro for knowledge
|
| - https://lmarena.ai/leaderboard for user preference
|
| We only got Magistral's GPQA, AIME & livecodebench so far.
| devmor wrote:
| I would agree, Qwen3 is definitely the most impressive
| "reasoning" model I've evaluated so far.
| 5mv2 wrote:
| The featured accuracy benchmarks exclude every model that matters
| except DeepSeek, which is quite telling about this new model's
| performance.
|
| This makes it yet another example of European companies building
| great products but fumbling marketing.
|
| Mistral's edge is speed. It's a real pleasure to use because it
| answers in ~1s what takes other models 5-8s, which makes for a
| much better experience. But instead of focusing on it, they bury
| it far down the post.
|
| Try it and see if you like the speed! Note that the speed
| advantage only applies to queries that don't require web-search,
| as Mistral is significantly slower on this one, leading to a ~5
| seconds advantage over 2 minutes of research for the queries I
| benchmarked with Grok.
| funnym0nk3y wrote:
| That is reasonable though. Comparing the product of a small
| company with little resources with giants like Google and
| OpenAI in a field where most advances are due to more and more
| expensive models is nonsense.
| 5mv2 wrote:
| The point I was trying to express is that Mistral is arguably
| far superior to the giants if you care about speed! So I
| wished they communicated this more clearly.
| dominicrose wrote:
| How would you use a fast AI?
|
| My current use of AI is to generate code - or translate some
| code from a programming language to another - which I can then
| improve (instead of writing it from scratch). Speed isn't
| necessary for this. It's a nice-to-have but only if it's not at
| the cost of quality.
|
| Also, as unfair as it "might" be, we do expect a fast AI not to
| be as good, don't we? So I wouldn't focus on that in the
| marketing. I think speed would be easier to sell as something
| extra you would pay for, because then you'd expect the quality
| to remain the same or better.
| redavni wrote:
| analyzing and modifying a user interface in realtime?
| epic9x wrote:
| This thing is crazy fast.
| smeeth wrote:
| They have a deal with Cerebras for inference.
|
| https://www.cerebras.ai/blog/mistral-le-chat
| swah wrote:
| For me this is more important than quality. I love fast
| responses, feels more futuristic.
| rafram wrote:
| Is the number of em-dashes in this marketing copy indicative of
| the kind of output that the model produces? If so, might want to
| tone it down a bit.
| ModernMech wrote:
| But the em dashes -- if appreciated -- are delightfully
| eccentric and whimsical!
| tiahura wrote:
| Unless you're a lawyer. We love 'em.
| NicuCalcea wrote:
| As a journalist, same!
| lee-rhapsody wrote:
| Also a journalist. I use em-dashes all the time
| Gregaros wrote:
| Really anyone that writes for a living. I have a referee
| report on a paper asking me to correct something to be an
| em-dash.
| johnisgood wrote:
| I do not know but sometimes when I type "-" and press space,
| LibreOffice converts it to an em-dash. I get rid of it so
| people won't confuse me with an LLM.
| sebmellen wrote:
| > _Our early tests indicated that Magistral is an excellent
| creative companion. We highly recommend it for creative writing
| and storytelling, with the model capable of producing coherent
| or -- if needed -- delightfully eccentric copy._
| kobe_bryant wrote:
| it's bizarre.
|
| the first sentence is "Announcing Magistral -- the first
| reasoning model by Mistral AI -- excelling in domain-specific,
| transparent, and multilingual reasoning." and those should
| clearly be commas
|
| and this sentence is just flat-out wrong: "Lack of specialized
| depth needed for domain-specific problems, limited
| transparency, and inconsistent reasoning in the desired
| language -- are just some of the known limitations of early
| thinking models."
| umbra07 wrote:
| really? i would have written it the exact same way (with
| dashes instead of commas).
| rafram wrote:
| The second one is unambiguously wrong. The first just looks
| kind of weird.
| saratogacx wrote:
| That is just Mistral's marketing style. You see it on a lot of
| their pages. The model output doesn't share the same love for
| the long dash.
| cAtte_ wrote:
| 49 em-dashes, 59 commas. that's a crazy ratio
| christianqchung wrote:
| I don't understand why the benchmark selections are so scattered
| and limited. It only compares Magistral Medium with Deepseek V3,
| R1, and the other closed-weight Mistral Medium 3. Why did they
| leave off Magistral Small entirely, alongside comparisons with
| Alibaba Qwen or the mini versions of o3 and o4?
| diggan wrote:
| The only mention of tools I could find is this:
|
| > it significantly improves project planning, backend
| architecture, frontend design, and data engineering through
| sequenced, multi-step actions involving external tools or API.
|
| I'm guessing this means it was trained with tool calling? And if
| so, does that mean it does tool calling within the
| thinking/reasoning, or within the main text? Seems unclear
| simonw wrote:
| Tool calling isn't enabled in the official Magistral Small GGUF
| (or the Ollama one) which is sad. Hope they (or someone else)
| fix that soon.
| NitpickLawyer wrote:
| They have already released Devstral, which is a tool-specific
| finetune of the same base model. That works pretty well with
| cline (even though it was specifically tuned for open-hands).
|
| This would likely be a good model for the "plan" mode in
| various agentic tools (cline, aider, cursor/windsurf/void,
| etc). So you'd have a chat in plan mode, then use devstral to
| actually implement that plan.
| diggan wrote:
| Devstral is targeting tool use+coding I think, so something
| like Magistral but also tool calling (during thinking)
| would be handy too, just for other use cases. But also
| beneficial in the context of creating plans for Devstral.
| simonw wrote:
| Here are my notes on trying this out locally via Ollama and via
| their API (and the llm-mistral plugin) too:
| https://simonwillison.net/2025/Jun/10/magistral/
| atxtechbro wrote:
| Hi Simon,
|
| What's the huge difference between the two pelicans riding
| bicycles? Was one the small version running locally, and the
| pretty good one the bigger version running through the API?
|
| Thanks, Morgan
| diggan wrote:
| Ollama doesn't like proper naming for some reason, so `ollama
| pull magistral:latest` lands you with the q4_K_M version
| (currently, subject to change).
|
| Mistral's API defaults to `magistral-medium-2506` right now,
| which is running with full precision, no quantization.
| samtheprogram wrote:
| Not only the quantization, but what's available via ollama
| is magistral-small (for local inference), not the -medium
| variant.
| simonw wrote:
| Yes, the bad one was Magistral Small running locally, the
| better one was Magistral Medium via their API.
| GuinansEyebrows wrote:
| This doesn't really explain what "reasoning" means in the context
| of genAI, or how it's done by this product. Are there any good
| sources to learn more about what "reasoning model" means outside
| of marketing-speak?
| pier25 wrote:
| It's pure marketing. See the recent paper by Apple called "The
| Illusion of Thinking".
|
| https://ml-site.cdn-apple.com/papers/the-illusion-of-thinkin...
| kamranjon wrote:
| I sort of agree with this, having read the recent Apple paper
| - but it does show a significant improvement at a certain
| level of complexity - it's just that it requires quite a few
| more tokens to achieve that. It could probably be described
| as a sort of "context" hack because it's basically having a
| conversation with itself to arrive at a better solution.
| You're trading performance/time for a bit better quality.
| throwaway314155 wrote:
| If you read that paper, you'll find a more nuanced take than
| simply "it's pure marketing"
| skeptrune wrote:
| Fully open reasoning traces are useful. Happy there is a vendor
| out there shipping that feature.
| desireco42 wrote:
| One cool thing about this model, which I installed locally, is
| that it supports other languages well, so it should be a pleasant
| conversation partner.
|
| BTW I am personally a fan of Mistral, because while it is not the
| top model, it produces good results and the most important thing
| is that it is super fast, just go to its chat and be amazed. It
| really saves a lot of time to have quick responses.
| dwedge wrote:
| Their OCR model was really well hyped and coincidentally came out
| at the time I had a batch of 600-page PDFs to OCR. They were all
| monospace text, just for some reason the OCR was missing.
|
| I tried it, 80% of the "text" was recognised as images and output
| as whitespace so most of it was empty. It was much much worse
| than tesseract.
|
| A month later I got the bill for that crap and deleted my
| account.
|
| Maybe this is better, but I'm over hype marketing from Mistral.
| alister wrote:
| As a quick test of logical reasoning and basic Wikipedia-level
| knowledge, I asked Mistral AI the following question:
|
| A Brazilian citizen is flying from Sao Paulo to Paris, with a
| connection in Lisbon. Does he need to clear immigration in Lisbon
| or in Paris or in both cities or in neither city?
|
| Mistral AI said that "immigration control will only be cleared in
| Paris," which I think is wrong.
|
| After I pointed it to the Wikipedia article on this topic[1], it
| corrected itself to say that "immigration control will be cleared
| in Lisbon, the first point of entry into the Schengen Area."
|
| I tried the same question with Meta AI (Llama 4) and it did much
| worse: It said that the traveler "wouldn't need to clear
| immigration in either Lisbon or Paris, given the flight
| connections are within the Schengen Area", which is completely
| incorrect.
|
| I'd be interested to hear if other LLMs give a correct answer.
|
| [1] https://en.wikipedia.org/wiki/Schengen_Area#Air_travel
| marsa wrote:
| doing some reason.. uhh intuitioning i imagine brazil and
| portugal might have some sort of a visa-free deal going on in
| which case llama 4 might actually be right here?
| alister wrote:
| Brazilians don't need a visa for Portugal, France, or any
| Schengen country. But everybody has to pass through
| immigration control (at least a passport check even if you
| don't need a visa) when entering the Schengen zone. My
| question was which country would that happen in.
| mcintyre1994 wrote:
| AFAIK Schengen has a common visa policy, so there couldn't be
| such a deal between Brazil and Portugal. It'd also be
| extremely surprising if two countries not in a common travel
| area had a deal where you didn't have to clear customs at
| all, I suspect that doesn't exist anywhere in the world.
| mcintyre1994 wrote:
| I think Gemini's answer (2.5 Flash) is impressive
|
| ----
|
| Since both Portugal and France are part of the Schengen Area,
| and a Brazilian citizen generally does not need a visa for
| short stays (up to 90 days in any 180-day period) in the
| Schengen Area, here's how immigration will work:
|
| Lisbon: The Brazilian citizen will need to clear immigration in
| Lisbon. This is because Lisbon is the first point of entry into
| the Schengen Area. At this point, their passport will be
| stamped, and they will be officially admitted into the Schengen
| Zone.
|
| Paris: Once they have cleared immigration in Lisbon, their
| flight from Lisbon to Paris is considered a domestic flight
| within the Schengen Area. Therefore, they will not need to
| clear immigration again in Paris.
|
| Important Note: While Brazilians currently enjoy visa-free
| travel, the European Travel Information and Authorization
| System (ETIAS) is expected to become mandatory by late 2026.
| Once implemented, Brazilian citizens will need to obtain this
| electronic authorization before their trip to Europe, even for
| visa-free stays. However, this is a pre-travel authorization,
| not a visa in the traditional sense, and the immigration
| clearance process at the first point of entry would remain the
| same.
| CobrastanJorji wrote:
| Etymological fun: both "mistral" and "magistral" mean "masterly."
|
| Mistral comes from Occitan for masterly, although today as far as
| I know it's only used in English when talking about Mediterranean
| winds.
|
| Magistral is just the adjective form of "magister," so "like a
| master."
|
| If you want to make a few bucks, maybe look up some more obscure
| synonyms for masterly and pick up the domain names.
| mark_l_watson wrote:
| Nice, and I see that Ollama already has the smaller 24B version.
| I am traveling with just a mobile device so I have to wait to try
| it, but I have been using their new devstral coding model and it
| is very useful, given that it is also a locally run model, so I
| am looking forward to trying magistral.
___________________________________________________________________
(page generated 2025-06-10 23:00 UTC)