[HN Gopher] Un Ministral, Des Ministraux
       ___________________________________________________________________
        
       Un Ministral, Des Ministraux
        
       Author : veggieroll
       Score  : 164 points
       Date   : 2024-10-16 14:31 UTC (8 hours ago)
        
 (HTM) web link (mistral.ai)
 (TXT) w3m dump (mistral.ai)
        
       | zurfer wrote:
       | poor title, Mistral released new open weight models that win
       | across benchmarks in their weight class: Ministral 3B and
       | Ministral 8B
        
         | scjody wrote:
         | Are they really open weights? Ministral 3B is "Mistral
         | Commercial License".
        
           | leetharris wrote:
            | Yeah, the 3B is NOT open. The 8B is, in the sense that it
            | can be used under a commercial license.
        
             | diggan wrote:
             | "commercial license != open", by most standards
        
               | zurfer wrote:
                | too late to edit now. I was completely wrong about open
                | weights.
               | 
               | The meme at the bottom made me jump to that conclusion.
               | Well, not that exciting of a release then. :(
        
         | DreamGen wrote:
         | That would be misleading. They aren't open weight (3B is not
         | available). They aren't compared to Qwen 2.5 which beats them
         | in many of the benchmarks presented while having more
         | permissive license. The closed 3B is not competitive with other
         | API only models, like Gemini Flash 8B which costs less and has
         | better performance.
        
       | WiSaGaN wrote:
       | "For self-deployed use, please reach out to us for commercial
       | licenses. We will also assist you in lossless quantization of the
       | models for your specific use-cases to derive maximum performance.
       | 
       | The model weights for Ministral 8B Instruct are available for
       | research use. Both models will be available from our cloud
       | partners shortly."
        
       | lairv wrote:
        | Hard to see how Mistral can compete with Meta: they have an
        | order of magnitude less compute, and their models are only
        | slightly better (at least on the benchmarks), with less
        | permissive licenses.
        
         | simonw wrote:
         | Yeah, the license thing is definitely a problem. It's hard to
         | get excited about an academic research license for a 3B or 8B
         | model when the Llama 3.1 and 3.2 models are SO good, and are
         | licensed for commercial usage.
        
           | harisec wrote:
           | Qwen 2.5 models are better than Llama and Mistral.
        
             | speedgoose wrote:
             | I disagree. I tried the small ones but they too frequently
             | output Chinese when the prompt is English.
        
               | harisec wrote:
                | I never had this problem, but I guess it depends on the
                | prompt.
        
           | sigmar wrote:
            | to be clear: these Ministral models are also licensed for
            | commercial use, just not freely licensed for commercial
            | use. And Meta also has restrictions on commercial use (you
            | have to display "Built with Meta Llama 3" and need to pay
            | Meta if you exceed 700 million monthly users)
        
             | sthatipamala wrote:
             | You need to pay meta if you have 700 million users _as of
             | the Llama 3 release date_. Not at any time going forward.
        
               | simonw wrote:
               | ... or presumably if you build a successful company and
               | then try to sell that company to Apple, Microsoft, Google
               | or a few other huge companies.
        
             | tarruda wrote:
             | > need to pay meta if you exceed 700 million monthly users
             | 
             | Seems like a good problem to have
        
         | thrance wrote:
         | In Europe, they are basically the only LLM API provider that is
         | GDPR compliant. This is a big factor here, when selecting a
         | provider.
        
           | vineyardmike wrote:
           | Are all the big clouds not GDPR compliant?
           | 
            | Hard to imagine anyone competing with AWS/GCP/Azure for
            | slices of GPUs/TPUs. AFAIK, most major models are available
            | a la carte via API on these providers (with a few
            | exclusives). I can't imagine how anyone can compete with
            | the big clouds on serving an API, and I can't imagine them
            | staying "non compliant" for long.
        
             | thrance wrote:
              | Maybe, but when selling a SAAS here, big clients will
              | always ask what cloud provider you use. Using a European
              | one is always a plus, if it isn't simply required.
        
           | TheFragenTaken wrote:
           | With the advent of OSS LLMs, it's "just" a matter of renting
           | compute.
        
         | leetharris wrote:
         | In general I feel like all model providers eventually become
         | infrastructure providers. If the difference between models is
         | very small, it will be about who can serve it reliably, with
         | the most features, with the most security, at the lowest price.
         | 
         | I'm the head of R&D at Rev.ai and this is exactly what we've
         | seen in ASR. We started at $1.20/hr, and our new models are
         | $0.10/hr in < 2 years. We have done human transcription for ~15
         | years and the revenue from ASR is 3 orders of magnitude less
         | ($90/hr vs $0.10/hr) and it will likely go lower. However, our
         | volumes are many orders of magnitude higher now for serving
         | ASR, so it's about even or growth in most cases still.
         | 
         | I think for Mistral to compete with Meta they need a better
         | API. The on-prem/self-hosted people will always choose the best
         | models for themselves and you won't be able to monetize them in
         | a FOSS world anyways, so you just need the better platform.
         | Right now, Meta isn't providing a top-tier platform, but that
         | may eventually change.
        
         | blihp wrote:
         | They can't since Meta can spend billions on models that they
          | give away and never need to get a _direct_ ROI on it. But
          | don't expect Meta's largess to persist much beyond wiping out the
         | competition. Then their models will probably start to look
         | about as open as Android does today. (either through licensing
         | restrictions or more 'advanced' capabilities being paywalled
         | and/or API-only)
        
         | cosmosgenius wrote:
          | Their 12B Nemo model is very good in a homelab compared to
          | Llama models. This is for story creation.
        
         | espadrine wrote:
         | > _Hard to see how can Mistral compete with Meta_
         | 
         | One significant edge: Meta does not dare even distribute their
         | latest models (the 3.2 series) to EU citizens. Mistral does.
        
       | druskacik wrote:
       | How many Mistral puns are there?
       | 
       | The benchmarks look promising, great job, Mistral.
        
       | xnx wrote:
       | Has anyone put together a good and regularly updated decision
       | tree for what model to use in different circumstances (VRAM
       | limitations, relative strengths, licensing, etc.)? Given the
       | enormous zoo of models in circulation, there must be certain
       | models that are totally obsolete.
        
         | leetharris wrote:
         | People keep making these, but they become outdated so fast and
         | nobody keeps up with it. If your definition of "great" changes
         | in 6 months because a new model shatters your perception of
         | "great," it's hard to rescore legacy models.
         | 
         | I'd say keeping up with the reddit LocalLLama community is the
         | "easiest" way and it's by no means easy.
        
           | potatoman22 wrote:
            | Someone should use an LLM to continuously maintain this
            | decision tree. The tree itself will decide which LLM is
            | used for maintenance.
        
         | iamjackg wrote:
         | This is definitely a problem. I mostly take a look at the
         | various leaderboards, but there is a proliferation of fine-
         | tuned models that makes it incredibly daunting to explore the
          | model space. Add to that the fact that they're often not
          | immediately available on turnkey tools like Ollama, and the
          | friction increases even more. All this without even
          | considering things like licenses, what kind of data has been
          | used for fine-tuning, quantization, merges, and multimodal
          | capabilities.
         | 
         | I would love a curated list.
        
         | mark_l_watson wrote:
         | I tend to choose a recent model available for Ollama, and
         | usually stick with a general purpose local model for a month or
         | so, then re-evaluate. Exceptions to sticking to one local model
         | at a time might be needing a larger context size.
        
       | cmehdy wrote:
       | For anybody wondering about the title, that's a sort-of pun in
       | French about how words get pluralized following French rules.
       | 
       | The quintessential example is "cheval" (horse) which becomes
       | "chevaux" (horses), which is the rule they're following (or being
       | cute about). Un mistral, des mistraux. Un ministral, des
       | ministraux.
       | 
       | (Ironically the plural of the Mistral wind in the Larousse
        | dictionary would technically be Mistrals[1][2], however weird
        | that sounds to my French ears and to the people who wrote that
       | article perhaps!)
       | 
       | [1]
       | https://www.larousse.fr/dictionnaires/francais/mistral_mistr...
       | [2] https://fr.wiktionary.org/wiki/mistral
        
         | BafS wrote:
          | It's complex because French is full of exceptions.
          | 
          | The classical way to pluralize "-al" words:
          | 
          |     un animal - des animaux   [en: animal(s)]
          |     un journal - des journaux [en: journal(s)]
          | 
          | With some exceptions:
          | 
          |     un carnaval - des carnavals           [en: carnival(s)]
          |     un festival - des festivals           [en: festival(s)]
          |     un ideal - des ideals (OR des ideaux) [en: ideal(s)]
          |     un val - des vals (OR des vaux)       [en: valley(s)]
          | 
          | There is no logic there (as with many things in French); it's
          | up to Mistral to choose what the plural should be.
          | 
          | EDIT: Format + better examples
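The "-al" rule and its exceptions described in the comment above can be sketched as a tiny rule-plus-lookup function. This is purely illustrative: the exception set is hand-coded from the comment's examples, accents are omitted as in the comment, and real French has more exceptions than these.

```python
# Illustrative sketch of the "-al" pluralization rule discussed above.
# The exception set is hand-coded from the comment's examples (accents
# omitted); real French has more exceptions than listed here.
EXCEPTIONS = {"carnaval", "festival", "ideal", "val"}  # take a plain "-s"

def pluralize_al(noun: str) -> str:
    """Pluralize a French noun ending in -al."""
    if not noun.endswith("al"):
        raise ValueError("rule only covers nouns ending in -al")
    if noun in EXCEPTIONS:
        return noun + "s"         # un festival -> des festivals
    return noun[:-2] + "aux"      # un animal -> des animaux

print(pluralize_al("animal"))    # animaux
print(pluralize_al("journal"))   # journaux
print(pluralize_al("carnaval"))  # carnavals
```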
        
           | rich_sasha wrote:
            | That's news to me that the French for "valley" is the
            | masculine "val" - isn't it the feminine "vallee"? Like,
            | say, "Vallee Blanche" near Chamonix? And I suppose the
            | English ripoff, "valley", sounds more like "vallee" than
            | "val" (backwards argument, I know).
        
             | mytailorisrich wrote:
             | Yes, la vallee (feminine) and le val (masculine). Valley is
             | usually la vallee. Val is mostly only used in the names of
             | places.
             | 
             | Apparently val gave vale in English.
        
               | makapuf wrote:
                | Gender in French words is a fine example of a
                | cryptography-grade random generator.
        
               | Muromec wrote:
               | It's a keyed generator, they just lost that small bag
               | that seeded it
        
               | GuB-42 wrote:
                | It can be funny sometimes. A breast (un sein) and a vagina
               | (un vagin) are both masculine, while a beard (une barbe)
               | is feminine. For the slang terms, a ball (une couille)
               | and a dick (une bite) are also feminine.
               | 
               | Of course, it is not always the opposite, otherwise it
               | wouldn't be random. A penis (un penis) is masculine for
               | instance.
        
             | idoubtit wrote:
             | The "Vallee blanche" you mentioned is not very far from
             | "Val d'Arly" or "Val Thorens" in the Alps. Both words "val"
             | and "vallee", and also "vallon", come from the Latin
             | "vallis". See the Littre dictionary
             | https://www.littre.org/definition/val for examples over the
             | last millennium.
             | 
             | By the way "Le dormeur du val" (The sleeper of the small
             | valley) is one of Rimbaud's most famous poems, often
             | learned at school.
        
             | bambax wrote:
             | _Un val_ is a small vallee. _Une vallee_ is typically
             | several kilometers wide; _un val_ is a couple of hundred
             | meters wide, tops.
             | 
             | The "Tresor de la langue francaise informatise" (which
             | hasn't been updated since 1994) says _val_ is deprecated,
              | but it's common in classic literary novels, together with
              | _un vallon_, a near synonym.
        
           | maw wrote:
           | But are these truly exceptions? Or are they the result of
           | subtler rules French learners are rarely taught explicitly?
           | 
           | I don't know what the precise rules or patterns actually
           | might be. But one fact that jumped out at me is that -mal and
           | -nal start with nasal consonants and three of the
           | "exceptions" end in -val.
        
             | epolanski wrote:
             | If it is like Italian, my native language, it's just
             | exceptions you learn by usage.
        
             | makapuf wrote:
             | I've never heard of such a rule (am native), and your
             | reasoning is fine but there are many common examples :
             | cheval (horse), rival, estival (adjective, "in the summer
             | "), travail (work, same rules for -ail words)...
        
             | cwizou wrote:
              | No; as the parent says, like many things in French,
              | grammar and what we call "orthographe" are based on
              | usage. And
             | what's accepted tends to change over time. What's taught in
             | school varies over the years too, with a large tendency to
             | move to simplification. A good example is the french word
             | for "key" which used to be written "clef" but over time
             | moved to "cle" (closer to how it sounds phonetically).
             | About every 20/30 years, we get some "reformes" on the
             | topic, which are more or less followed, there's some good
             | information here (the 1990 one is interesting on its own) :
             | https://en.wikipedia.org/wiki/Reforms_of_French_orthography
             | 
              | Back to this particular one, there's no precise rule or
             | pattern underneath, no rhyme or reason, it's just
             | exceptions based on usage and even those can have their own
             | exceptions. Like "ideals/ideaux", I (french) personally
             | never even heard that "ideals" was a thing. Yet it is,
             | somehow :
             | https://www.larousse.fr/dictionnaires/francais/ideal/41391
        
             | Muromec wrote:
                | Declension patterns are kinda random in general.
        
           | realo wrote:
           | Indeed not always rational...
           | 
            |     cuissots de veau
            |     cuisseaux de chevreuil
        
         | mytailorisrich wrote:
         | Mistral is essentially never in plural form because it is the
         | name of a specific wind.
         | 
         | The only plural form people will probably know is from the song
         | Mistral Gagnant where the lyrics include _les mistrals
         | gagnants_ but that refers to sweets!
         | 
         | Not sure why anyone would think "les mistraux"... ;)
        
           | ucarion wrote:
           | I'm not sure if being from the north of France changes
           | things, but I think the Renaud song is much more familiar to
           | folks I know than the wind.
        
             | mytailorisrich wrote:
             | Well yes, it is a Mediterranean wind!
        
             | Spone wrote:
             | The song actually refers to a kind of candy named "Mistral
             | gagnant"
             | 
             | https://fr.m.wikipedia.org/wiki/Mistral_gagnant_(confiserie
             | )
        
         | Rygian wrote:
         | On the subject of French plurals, you also get some
         | counterintuitive ones:
         | 
         | - Egg: un oeuf (pronounced /oef/), des oeufs (pronounced /oe/
         | !)
         | 
         | - Bone: un os (pronounced /os/), des os (pronounced /o/ !)
        
       | barbegal wrote:
        | Does anyone know why Mistral uses a 17-bit (131k) vocabulary?
        | I'm sure it's more efficient at encoding text, but each token
        | doesn't fit into a 16-bit register, which must make it less
        | efficient computationally?
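For a rough sense of the cost this comment is asking about: IDs for a 131,072-entry vocabulary exceed the 16-bit range, so buffers of token IDs need a wider integer type, doubling the per-token storage compared to a 64k vocabulary. A minimal sketch (assuming NumPy; the 131,072 figure is taken from the comment's "17 bit (131k)" claim):

```python
# Token IDs span [0, vocab_size); 2**16 = 65536 < 131072, so a 131k
# vocabulary cannot be stored in 16-bit integers.
import numpy as np

def min_dtype(vocab_size: int) -> np.dtype:
    """Smallest unsigned integer dtype that can hold every token ID."""
    for dt in (np.uint8, np.uint16, np.uint32):
        if vocab_size - 1 <= np.iinfo(dt).max:
            return np.dtype(dt)
    return np.dtype(np.uint64)

print(min_dtype(65_536))    # a 64k vocab still fits in uint16
print(min_dtype(131_072))   # a 131k vocab needs uint32: 2x bytes per token
```

Note this only concerns storing token IDs; inside the model, the vocabulary size mainly costs memory in the embedding and output-projection matrices.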
        
       | sharkjacobs wrote:
       | I know that Mistral is a French company, but I think it's really
       | clever marketing the way they're using French language as
       | branding.
        
       | ed wrote:
        | 3B is API-only, so you won't be able to run it on-device, which
        | is the killer app for these smaller edge models.
       | 
       | I'm not opposed to licensing but "email us for a license" is a
       | bad sign for indie developers, in my experience.
       | 
       | 8b weights are here
       | https://huggingface.co/mistralai/Ministral-8B-Instruct-2410
       | 
       | Commercial entities aren't permitted to use or distribute 8b
       | weights - from the agreement (which states research purposes
       | only):
       | 
       | "Research Purposes": means any use of a Mistral Model,
       | Derivative, or Output that is solely for (a) personal, scientific
       | or academic research, and (b) for non-profit and non-commercial
       | purposes, and not directly or indirectly connected to any
       | commercial activities or business operations. For illustration
       | purposes, Research Purposes does not include (1) any usage of the
       | Mistral Model, Derivative or Output by individuals or contractors
       | employed in or engaged by companies in the context of (a) their
       | daily tasks, or (b) any activity (including but not limited to
       | any testing or proof-of-concept) that is intended to generate
       | revenue, nor (2) any Distribution by a commercial entity of the
       | Mistral Model, Derivative or Output whether in return for payment
       | or free of charge, in any medium or form, including but not
       | limited to through a hosted or managed service (e.g. SaaS, cloud
       | instances, etc.), or behind a software layer.
        
         | mark_l_watson wrote:
         | You are correct, convenience for trying many new models is
         | important. For me, this means being able to run with Ollama.
        
         | diggan wrote:
         | > I'm not opposed to licensing but "email us for a license" is
         | a bad sign for indie developers, in my experience.
         | 
         | At least they're not claiming it's Open Source / Open Weights,
         | kind of happy about that, as other companies didn't get the
         | memo that lying/misleading about stuff like that is bad.
        
           | talldayo wrote:
            | Yeah, a real silver lining on the API-only access for a
            | model that is intentionally designed for edge devices. As a
            | user I honestly only care about the weights being open -
            | I'm not going to reimplement their training code, and I
            | don't need or want redistributed training data that already
            | exists elsewhere. There is no benefit, for my uses, to
            | having an "open source" model when I could have weights and
            | finetunes instead.
           | 
           | There's nothing to be happy about when businesses try to
           | wall-off a feature to make you salivate over it more. You're
           | within your right to nitpick licensing differences, but
           | unless everyone gets government-subsidized H100s in their
           | garage I don't think the code will be of use to anyone except
           | moneyed competitors that want to undermine foundational work.
        
         | tarruda wrote:
         | Isn't 3b the kind of size you'd expect to be able to run on the
         | edge? What is the point of using 3b via API when you can use
         | larger and more capable models?
        
           | littlestymaar wrote:
            | GP misunderstood: 3B will be available for running on edge
            | devices, but you must sign a deal with Mistral to get
            | access to the weights to run it.
           | 
           | I don't think that can work without a significant lobbying
           | push towards models running on the edge but who knows
           | (especially since they have a former French Minister in the
           | founding team).
        
             | ed wrote:
             | > GP misunderstood
             | 
             | I don't think it's fair to claim the weights are available
             | if you need to hammer out a custom agreement with mistral's
             | sales team first.
             | 
              | If they had a self-serve process, or some sort of shrink-
              | wrapped deal up to, say, 500k users, that would be great. But
             | bespoke contracts are rarely cheap or easy to get. This
             | comes from my experience building a bunch of custom infra
             | for Flux1-dev, only to find I wasn't big enough for a
             | custom agreement, because, duh, the service doesn't exist
             | yet. Mistral is not BFL, but sales teams don't like
             | speculating on usage numbers for a product that hasn't been
             | released yet. Which is a bummer considering most innovation
             | happens at a small scale initially.
        
         | cjtrowbridge wrote:
         | They released it on huggingface.
        
         | wg0 wrote:
          | Genuine question - if I have a model whose weights I only
          | release with restrictions on commercial usage, and someone
          | then deploys that model and operates it commercially - what
          | are the ways to identify that it is my model that's doing the
          | online per-token slavery over an HTTP endpoint?
        
         | DreamGen wrote:
          | From what I have heard, getting a license from them is also
          | far from guaranteed. They are selective about who they want
          | to do business with -- understandable, but something to keep
          | in mind.
        
       | gunalx wrote:
        | Not having open-ish weights is a total dealbreaker for me. The
        | only really compelling reason for sub-6B models is that they
        | are easy to run even on consumer hardware or on the edge.
        
       | aabhay wrote:
       | This press release is a big change in branding and ethos for
       | Mistral. What was originally a vibey, insurgent contender that
       | put out magnet links is now a PR-crafting team that has to fight
       | to pitch their utility to the public.
        
         | littlestymaar wrote:
         | I was going to say the same. Incredible to see how quickly
         | Mistral went from "magnet links casually dropped on twitter by
         | their CTO" to "PR blog post without the model weights" in just
         | a year.
         | 
         | Not a good sign at all as it means their investors are already
         | getting nervous.
        
           | wg0 wrote:
           | That's usually the evidence of VCs getting involved. Somber
           | corporate tone proud on accomplishments user will find useful
           | we continue to improve looking to the future and such.
        
           | swyx wrote:
            | Just want to point out that this isn't entirely true.
            | Pixtral was magnet-link dropped recently. Mistral simply
            | has two model rollout channels depending on the level of
            | openness they choose. Don't extrapolate too much due to VC
            | hate.
        
       | tarruda wrote:
        | They didn't add a comparison to Qwen 2.5 3B, which seems to
        | surpass Ministral 3B on MMLU, HumanEval, and GSM8K:
        | https://qwen2.org/qwen2-5/#qwen25-05b15b3b-performance
       | 
       | These benchmarks don't really matter that much, but it is funny
       | how this blog post conveniently forgot to compare with a model
       | that already exists and performs better.
        
         | butterfly42069 wrote:
         | At this point the benchmarks barely matter at all. It's
         | entirely possible to train for a high benchmark score and
         | reduce the overall quality of the model in the process.
         | 
         | Imo use the model that makes the most sense when you ask it
         | stuff, and personally I'd go for the one with the least
         | censorship (which imo isn't AliBaba Qwen anything)
        
         | DreamGen wrote:
         | Also, the 3B model, which is API only (so the only thing that
         | matters is price, quality and speed) should be compared to
         | something like Gemini Flash 1.5 8B which is cheaper than this
         | 3B API and also has higher benchmark performance, super long
         | context support, etc.
        
       | smcleod wrote:
        | It's pretty hard to claim it's the world's best and then not
        | compare it to Qwen 2.5...
        
         | daghamm wrote:
          | Yeah, Qwen does great in benchmarks but is it really that good
         | in real use?
        
       | daghamm wrote:
       | I don't get it.
       | 
        | These are impressive numbers. But while their use case is local
        | execution to preserve privacy, the only way to use these models
        | right now is through their API?
        
       ___________________________________________________________________
       (page generated 2024-10-16 23:00 UTC)