[HN Gopher] Un Ministral, Des Ministraux
___________________________________________________________________
Un Ministral, Des Ministraux
Author : veggieroll
Score : 164 points
Date : 2024-10-16 14:31 UTC (8 hours ago)
(HTM) web link (mistral.ai)
(TXT) w3m dump (mistral.ai)
| zurfer wrote:
| Poor title: Mistral released new open-weight models that win
| across benchmarks in their weight class, Ministral 3B and
| Ministral 8B.
| scjody wrote:
| Are they really open weights? Ministral 3B is "Mistral
| Commercial License".
| leetharris wrote:
| Yeah, the 3B is NOT open. The 8B is, as it can be used
| under a commercial license.
| diggan wrote:
| "commercial license != open", by most standards
| zurfer wrote:
| too late to edit now. I was completely wrong about open-
| weights.
|
| The meme at the bottom made me jump to that conclusion.
| Well, not that exciting of a release then. :(
| DreamGen wrote:
| That would be misleading. They aren't open weight (3B is not
| available). They aren't compared to Qwen 2.5, which beats them
| in many of the benchmarks presented while having a more
| permissive license. The closed 3B is not competitive with other
| API-only models, like Gemini Flash 8B, which costs less and has
| better performance.
| WiSaGaN wrote:
| "For self-deployed use, please reach out to us for commercial
| licenses. We will also assist you in lossless quantization of the
| models for your specific use-cases to derive maximum performance.
|
| The model weights for Ministral 8B Instruct are available for
| research use. Both models will be available from our cloud
| partners shortly."
| lairv wrote:
| Hard to see how Mistral can compete with Meta: they have an
| order of magnitude less compute, and their models are only
| slightly better (at least on the benchmarks), with less
| permissive licenses?
| simonw wrote:
| Yeah, the license thing is definitely a problem. It's hard to
| get excited about an academic research license for a 3B or 8B
| model when the Llama 3.1 and 3.2 models are SO good, and are
| licensed for commercial usage.
| harisec wrote:
| Qwen 2.5 models are better than Llama and Mistral.
| speedgoose wrote:
| I disagree. I tried the small ones but they too frequently
| output Chinese when the prompt is English.
| harisec wrote:
| I never had this problem, but I guess it depends on the
| prompt.
| sigmar wrote:
| to be clear: these Ministral models are also licensed for
| commercial use, but not freely licensed for commercial use.
| and meta also has restrictions on commercial use (have to put
| "Built with Meta Llama 3" and need to pay meta if you exceed
| 700 million monthly users)
| sthatipamala wrote:
| You need to pay meta if you have 700 million users _as of
| the Llama 3 release date_. Not at any time going forward.
| simonw wrote:
| ... or presumably if you build a successful company and
| then try to sell that company to Apple, Microsoft, Google
| or a few other huge companies.
| tarruda wrote:
| > need to pay meta if you exceed 700 million monthly users
|
| Seems like a good problem to have
| thrance wrote:
| In Europe, they are basically the only LLM API provider that is
| GDPR compliant. This is a big factor here, when selecting a
| provider.
| vineyardmike wrote:
| Are all the big clouds not GDPR compliant?
|
| Hard to imagine anyone competing with AWS/GCP/Azure for
| slices of GPUs/TPU. AFAIK, most major models are available a
| la carte via API on these providers (with a few exclusives).
| I can't imagine how anyone can compete with the big clouds on
| serving an API, and I can't imagine them staying "non
| compliant" for long.
| thrance wrote:
| Maybe, but when selling a SaaS here, big clients will
| always ask what cloud provider you use. Using a European
| one is always a plus, if it isn't simply required.
| TheFragenTaken wrote:
| With the advent of OSS LLMs, it's "just" a matter of renting
| compute.
| leetharris wrote:
| In general I feel like all model providers eventually become
| infrastructure providers. If the difference between models is
| very small, it will be about who can serve it reliably, with
| the most features, with the most security, at the lowest price.
|
| I'm the head of R&D at Rev.ai and this is exactly what we've
| seen in ASR. We started at $1.20/hr, and our new models are
| $0.10/hr in < 2 years. We have done human transcription for ~15
| years and the revenue from ASR is 3 orders of magnitude less
| ($90/hr vs $0.10/hr) and it will likely go lower. However, our
| volumes are many orders of magnitude higher now for serving
| ASR, so it's about even or growth in most cases still.
|
| I think for Mistral to compete with Meta they need a better
| API. The on-prem/self-hosted people will always choose the best
| models for themselves and you won't be able to monetize them in
| a FOSS world anyways, so you just need the better platform.
| Right now, Meta isn't providing a top-tier platform, but that
| may eventually change.
| blihp wrote:
| They can't since Meta can spend billions on models that they
| give away and never need to get a _direct_ ROI on it. But
| don't expect Meta's largess to persist much beyond wiping out
| the
| competition. Then their models will probably start to look
| about as open as Android does today. (either through licensing
| restrictions or more 'advanced' capabilities being paywalled
| and/or API-only)
| cosmosgenius wrote:
| Their 12B Nemo model is very good in a homelab compared to
| Llama models. This is for story creation.
| espadrine wrote:
| > _Hard to see how can Mistral compete with Meta_
|
| One significant edge: Meta does not dare even distribute their
| latest models (the 3.2 series) to EU citizens. Mistral does.
| druskacik wrote:
| How many Mistral puns are there?
|
| The benchmarks look promising, great job, Mistral.
| xnx wrote:
| Has anyone put together a good and regularly updated decision
| tree for what model to use in different circumstances (VRAM
| limitations, relative strengths, licensing, etc.)? Given the
| enormous zoo of models in circulation, there must be certain
| models that are totally obsolete.
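A minimal sketch of what such a decision helper could look like (all model names, VRAM figures, and license flags below are hypothetical placeholders for illustration, not a maintained ranking):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Model:
    name: str
    vram_gb: float        # rough VRAM needed at a common quantization
    commercial_ok: bool   # whether the license permits commercial use

# Hypothetical catalog entries -- real figures change monthly
CATALOG = [
    Model("small-3b", 4.0, False),
    Model("mid-8b", 8.0, True),
    Model("large-70b", 48.0, True),
]

def pick(vram_gb: float, need_commercial: bool) -> Optional[Model]:
    """Return the largest model that fits the VRAM and license constraints."""
    candidates = [
        m for m in CATALOG
        if m.vram_gb <= vram_gb and (m.commercial_ok or not need_commercial)
    ]
    return max(candidates, key=lambda m: m.vram_gb, default=None)

print(pick(10, need_commercial=True))  # mid-8b fits and is licensed
print(pick(6, need_commercial=True))   # None: only small-3b fits, wrong license
```

The hard part, as the replies below note, is not the lookup but keeping the catalog itself current.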
| leetharris wrote:
| People keep making these, but they become outdated so fast and
| nobody keeps up with it. If your definition of "great" changes
| in 6 months because a new model shatters your perception of
| "great," it's hard to rescore legacy models.
|
| I'd say keeping up with the reddit LocalLLama community is the
| "easiest" way and it's by no means easy.
| potatoman22 wrote:
| Someone should use an LLM to continuously maintain this
| decision tree. The tree itself will decide which LLM is used
| for maintenance.
| iamjackg wrote:
| This is definitely a problem. I mostly take a look at the
| various leaderboards, but there is a proliferation of fine-
| tuned models that makes it incredibly daunting to explore the
| model space. Add to that that often they're not immediately
| available on turn-key tools like ollama, and the friction
| increases even more. All this without even considering things
| like licenses, what kind of data has been used for fine tuning,
| quantization, merges, multimodal capabilities.
|
| I would love a curated list.
| mark_l_watson wrote:
| I tend to choose a recent model available for Ollama, and
| usually stick with a general purpose local model for a month or
| so, then re-evaluate. Exceptions to sticking to one local model
| at a time might be needing a larger context size.
| cmehdy wrote:
| For anybody wondering about the title, that's a sort-of pun in
| French about how words get pluralized following French rules.
|
| The quintessential example is "cheval" (horse) which becomes
| "chevaux" (horses), which is the rule they're following (or being
| cute about). Un mistral, des mistraux. Un ministral, des
| ministraux.
|
| (Ironically, the plural of the Mistral wind in the Larousse
| dictionary would technically be Mistrals[1][2], however weird
| that sounds to my French ears, and perhaps to the people who
| wrote that article!)
|
| [1]
| https://www.larousse.fr/dictionnaires/francais/mistral_mistr...
| [2] https://fr.wiktionary.org/wiki/mistral
| BafS wrote:
| It's complex because French is full of exceptions.
|
| The classical way to pluralize "-al" words:
|
| un animal - des animaux [en: animal(s)]
| un journal - des journaux [en: journal(s)]
|
| With some exceptions:
|
| un carnaval - des carnavals [en: carnival(s)]
| un festival - des festivals [en: festival(s)]
| un ideal - des ideals (OR des ideaux) [en: ideal(s)]
| un val - des vals (OR des vaux) [en: valley(s)]
|
| There is no logic there (as with many things in French); it's
| up to Mistral to choose what the plural should be.
|
| EDIT: Format + better examples
| rich_sasha wrote:
| That's news to me that French for "valley" is masculine and
| "val" - isn't it feminine "vallee"? Like, say "Vallee
| Blanche" near Chamonix? And I suppose the English ripoff,
| "valley" sounds more like "vallee" than "val" (backwards
| argument, I know).
| mytailorisrich wrote:
| Yes, la vallee (feminine) and le val (masculine). Valley is
| usually la vallee. Val is mostly only used in the names of
| places.
|
| Apparently val gave vale in English.
| makapuf wrote:
| Gender in French words is a fine example of a
| cryptography-grade random generator.
| Muromec wrote:
| It's a keyed generator, they just lost that small bag
| that seeded it
| GuB-42 wrote:
| It can be funny sometimes. A breast (un sein) and a vagina
| (un vagin) are both masculine, while a beard (une barbe)
| is feminine. For the slang terms, a ball (une couille)
| and a dick (une bite) are also feminine.
|
| Of course, it is not always the opposite, otherwise it
| wouldn't be random. A penis (un penis) is masculine for
| instance.
| idoubtit wrote:
| The "Vallee blanche" you mentioned is not very far from
| "Val d'Arly" or "Val Thorens" in the Alps. Both words "val"
| and "vallee", and also "vallon", come from the Latin
| "vallis". See the Littre dictionary
| https://www.littre.org/definition/val for examples over the
| last millennium.
|
| By the way "Le dormeur du val" (The sleeper of the small
| valley) is one of Rimbaud's most famous poems, often
| learned at school.
| bambax wrote:
| _Un val_ is a small vallee. _Une vallee_ is typically
| several kilometers wide; _un val_ is a couple of hundred
| meters wide, tops.
|
| The "Tresor de la langue francaise informatise" (which
| hasn't been updated since 1994) says _val_ is deprecated,
| but it's common in classic literary novels, together with
| _un vallon_ , a near synonym.
| maw wrote:
| But are these truly exceptions? Or are they the result of
| subtler rules French learners are rarely taught explicitly?
|
| I don't know what the precise rules or patterns actually
| might be. But one fact that jumped out at me is that -mal and
| -nal start with nasal consonants and three of the
| "exceptions" end in -val.
| epolanski wrote:
| If it is like Italian, my native language, it's just
| exceptions you learn by usage.
| makapuf wrote:
| I've never heard of such a rule (I'm a native speaker), and
| your reasoning is fine, but there are many common examples:
| cheval (horse), rival, estival (adjective, "in the summer"),
| travail (work, same rule for -ail words)...
| cwizou wrote:
| No, like parent says, with many things in French, grammar
| and what we call "orthographe" is based on usage. And
| what's accepted tends to change over time. What's taught in
| school varies over the years too, with a large tendency to
| move to simplification. A good example is the french word
| for "key" which used to be written "clef" but over time
| moved to "cle" (closer to how it sounds phonetically).
| About every 20/30 years, we get some "reformes" on the
| topic, which are more or less followed, there's some good
| information here (the 1990 one is interesting on its own) :
| https://en.wikipedia.org/wiki/Reforms_of_French_orthography
|
| Back to this precise one, there's no precise rule or
| pattern underneath, no rhyme or reason, it's just
| exceptions based on usage and even those can have their own
| exceptions. Like "ideals/ideaux", I (french) personally
| never even heard that "ideals" was a thing. Yet it is,
| somehow :
| https://www.larousse.fr/dictionnaires/francais/ideal/41391
| Muromec wrote:
| Declension patterns are kinda random in general.
| realo wrote:
| Indeed not always rational...
|
| cuissots de veau / cuisseaux de chevreuil
| mytailorisrich wrote:
| Mistral is essentially never in plural form because it is the
| name of a specific wind.
|
| The only plural form people will probably know is from the song
| Mistral Gagnant where the lyrics include _les mistrals
| gagnants_ but that refers to sweets!
|
| Not sure why anyone would think "les mistraux"... ;)
| ucarion wrote:
| I'm not sure if being from the north of France changes
| things, but I think the Renaud song is much more familiar to
| folks I know than the wind.
| mytailorisrich wrote:
| Well yes, it is a Mediterranean wind!
| Spone wrote:
| The song actually refers to a kind of candy named "Mistral
| gagnant"
|
| https://fr.m.wikipedia.org/wiki/Mistral_gagnant_(confiserie)
| Rygian wrote:
| On the subject of French plurals, you also get some
| counterintuitive ones:
|
| - Egg: un oeuf (pronounced /oef/), des oeufs (pronounced /oe/
| !)
|
| - Bone: un os (pronounced /os/), des os (pronounced /o/ !)
| barbegal wrote:
| Does anyone know why Mistral uses a 17-bit (131k) vocabulary?
| I'm sure it's more efficient at encoding text, but each token
| id doesn't fit into a 16-bit register, which must make it less
| efficient computationally?
| sharkjacobs wrote:
| I know that Mistral is a French company, but I think it's really
| clever marketing the way they're using French language as
| branding.
| ed wrote:
| 3b is API-only so you won't be able to run it on-device, which
| is the killer app for these smaller edge models.
|
| I'm not opposed to licensing but "email us for a license" is a
| bad sign for indie developers, in my experience.
|
| 8b weights are here
| https://huggingface.co/mistralai/Ministral-8B-Instruct-2410
|
| Commercial entities aren't permitted to use or distribute 8b
| weights - from the agreement (which states research purposes
| only):
|
| "Research Purposes": means any use of a Mistral Model,
| Derivative, or Output that is solely for (a) personal, scientific
| or academic research, and (b) for non-profit and non-commercial
| purposes, and not directly or indirectly connected to any
| commercial activities or business operations. For illustration
| purposes, Research Purposes does not include (1) any usage of the
| Mistral Model, Derivative or Output by individuals or contractors
| employed in or engaged by companies in the context of (a) their
| daily tasks, or (b) any activity (including but not limited to
| any testing or proof-of-concept) that is intended to generate
| revenue, nor (2) any Distribution by a commercial entity of the
| Mistral Model, Derivative or Output whether in return for payment
| or free of charge, in any medium or form, including but not
| limited to through a hosted or managed service (e.g. SaaS, cloud
| instances, etc.), or behind a software layer.
| mark_l_watson wrote:
| You are correct, convenience for trying many new models is
| important. For me, this means being able to run with Ollama.
| diggan wrote:
| > I'm not opposed to licensing but "email us for a license" is
| a bad sign for indie developers, in my experience.
|
| At least they're not claiming it's Open Source / Open Weights,
| kind of happy about that, as other companies didn't get the
| memo that lying/misleading about stuff like that is bad.
| talldayo wrote:
| Yeah, a real silver-lining on the API-only access for a model
| that is intentionally designed for edge devices. As a user I
| honestly only care about the weights being open - I'm not
| going to reimplement their training code and I don't need or
| want redistributed training data that both already exists
| elsewhere. There is no benefit, for my uses, to having an
| "open source" model when I could have weights and finetunes
| instead.
|
| There's nothing to be happy about when businesses try to
| wall-off a feature to make you salivate over it more. You're
| within your right to nitpick licensing differences, but
| unless everyone gets government-subsidized H100s in their
| garage I don't think the code will be of use to anyone except
| moneyed competitors that want to undermine foundational work.
| tarruda wrote:
| Isn't 3b the kind of size you'd expect to be able to run on the
| edge? What is the point of using 3b via API when you can use
| larger and more capable models?
| littlestymaar wrote:
| GP misunderstood: 3b will be available for running on edge
| devices, but you must sign a deal with Mistral to get access
| to the weights to run.
|
| I don't think that can work without a significant lobbying
| push towards models running on the edge but who knows
| (especially since they have a former French Minister in the
| founding team).
| ed wrote:
| > GP misunderstood
|
| I don't think it's fair to claim the weights are available
| if you need to hammer out a custom agreement with mistral's
| sales team first.
|
| If they had a self-serve process, or some sort of shrink-
| wrapped deal up to say 500k users, that would be great. But
| bespoke contracts are rarely cheap or easy to get. This
| comes from my experience building a bunch of custom infra
| for Flux1-dev, only to find I wasn't big enough for a
| custom agreement, because, duh, the service doesn't exist
| yet. Mistral is not BFL, but sales teams don't like
| speculating on usage numbers for a product that hasn't been
| released yet. Which is a bummer considering most innovation
| happens at a small scale initially.
| cjtrowbridge wrote:
| They released it on huggingface.
| wg0 wrote:
| Genuine question: if I release a model's weights only with
| restrictions on commercial usage, and someone then deploys
| that model and operates it commercially, what are the ways to
| identify that it is my model doing the online per-token
| slavery over an HTTP endpoint?
| DreamGen wrote:
| From what I have heard, getting license from them is also far
| from guaranteed. They are selective about who they want to do
| business with -- understandable, but something to keep in mind.
| gunalx wrote:
| Not having open-ish weights is a total dealbreaker for me. The
| only really compelling reason for sub-6B models is that they
| are easy to run on consumer hardware or on the edge.
| aabhay wrote:
| This press release is a big change in branding and ethos for
| Mistral. What was originally a vibey, insurgent contender that
| put out magnet links is now a PR-crafting team that has to fight
| to pitch their utility to the public.
| littlestymaar wrote:
| I was going to say the same. Incredible to see how quickly
| Mistral went from "magnet links casually dropped on twitter by
| their CTO" to "PR blog post without the model weights" in just
| a year.
|
| Not a good sign at all as it means their investors are already
| getting nervous.
| wg0 wrote:
| That's usually the evidence of VCs getting involved. Somber
| corporate tone: "proud of accomplishments users will find
| useful", "we continue to improve", "looking to the future",
| and such.
| swyx wrote:
| just want to point out that this isn't entirely true. pixtral
| was magnet-link dropped recently. mistral simply has two
| model rollout channels depending on the level of openness
| they choose. don't extrapolate too much due to vc hate.
| tarruda wrote:
| They didn't add a comparison to Qwen 2.5 3B, which seems to
| surpass Ministral 3B on MMLU, HumanEval, and GSM8K:
| https://qwen2.org/qwen2-5/#qwen25-05b15b3b-performance
|
| These benchmarks don't really matter that much, but it is funny
| how this blog post conveniently forgot to compare with a model
| that already exists and performs better.
| butterfly42069 wrote:
| At this point the benchmarks barely matter at all. It's
| entirely possible to train for a high benchmark score and
| reduce the overall quality of the model in the process.
|
| Imo use the model that makes the most sense when you ask it
| stuff, and personally I'd go for the one with the least
| censorship (which imo isn't AliBaba Qwen anything)
| DreamGen wrote:
| Also, the 3B model, which is API-only (so the only things that
| matter are price, quality, and speed), should be compared to
| something like Gemini Flash 1.5 8B which is cheaper than this
| 3B API and also has higher benchmark performance, super long
| context support, etc.
| smcleod wrote:
| It's pretty hard to claim it's the world's best then not compare
| it to Qwen 2.5....
| daghamm wrote:
| Yeah, Qwen does great on benchmarks, but is it really that
| good in real use?
| daghamm wrote:
| I don't get it.
|
| These are impressive numbers. But while their use case is local
| execution to preserve privacy, the only way to use these models
| right now is to use their API?
___________________________________________________________________
(page generated 2024-10-16 23:00 UTC)