[HN Gopher] Xiaomi MiMo Reasoning Model
___________________________________________________________________
Xiaomi MiMo Reasoning Model
Author : thm
Score : 387 points
Date : 2025-04-30 08:48 UTC (14 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| w4yai wrote:
| Anyone tried it ?
| Alifatisk wrote:
| No, where can I try it? I saw a huggingface link but I wonder
| if they host it themselves somewhere, similar to how Alibaba
| does with Qwen Chat.
| yorwba wrote:
| There is a HuggingFace space (probably not official) at:
| https://huggingface.co/spaces/orangewong/xiaomi-mimo-7b-rl
| You might have to wait a minute to get a response. Also, the
| space doesn't seem to have turn-taking implemented, so after
| giving the Assistant's response, it kept on generating the
| Human's next message and so on and so forth.
| benterix wrote:
| Yes, not great, not terrible. I gave it my personal test (a
| coding task); it produced semi-decent code with a minor error,
| and after I pasted the error back it failed to fix it over
| multiple rounds. I believe in another 2-3 years we'll have
| quite usable small models.
| ramesh31 wrote:
| These benchmark numbers cannot be real for a 7b model
| GaggiX wrote:
| https://qwenlm.github.io/blog/qwen3/
|
| Go look at the benchmark numbers of qwen3-4B if you think these
| are unrealistic.
| energy123 wrote:
| Also not "real" in the sense that the model developers most
| likely put the benchmarks into the training data.
| strangescript wrote:
| The smaller models have been creeping upward. They don't make
| headlines because they aren't leapfrogging the mainline models
| from the big companies, but they are all very capable.
|
| I loaded up a random 12B model on ollama the other day and
| couldn't believe how competent it seemed and how fast it was,
| given the machine I was on. A year or so ago, that would not
| have been the case.
| apples_oranges wrote:
| exactly, it seems to validate my assumption from some time
| ago, that we will mostly use local models for everyday tasks.
| pzo wrote:
| Yeah, especially since this simplifies things, e.g. building
| a mobile app as a 3rd party developer - no extra cost, no
| need to set up a proxy server or monitor usage to detect
| abuse, and no need for a complicated per-usage subscription
| plan.
|
| We just need Google or Apple to provide their own equivalent
| of both Ollama and OpenRouter, so users either run inference
| for free with local models or BringYourOwnKey and pay for
| tokens/electricity themselves. We then just charge a smaller
| fee for renting or buying our apps.
| wg0 wrote:
| But who will keep them updated, and what incentive would
| they have? That I can't imagine. Bit vague.
| cruzcampo wrote:
| Who keeps open source projects maintained and what
| incentive do they have?
| jsheard wrote:
| Most open source projects don't need the kinds of
| resources that ML development does. Access to huge GPU
| clusters is the obvious one, but it's easy to forget that
| the big players are also using huge amounts of
| soulcrushing human labor for data acquisition, cleaning,
| labeling and fine tuning, and begrudgingly paying for
| data they can't scrape. People coding in their free time
| won't get very far without that supporting
| infrastructure.
|
| I think ML is more akin to open source _hardware,_ in the
| sense that even when there are people with the relevant
| skills willing to donate their time for free, the cost of
| actually realizing their ideas is still so high that it's
| rarely feasible to keep up with commercial projects.
| cruzcampo wrote:
| That's a fair point. I think GPU clusters are the big
| one, the rest sounds like a good fit for volunteer work.
| wg0 wrote:
| Or sharing GPU compute. Crowd sourcing.
| cruzcampo wrote:
| Ooooh I can see a Seti@Home setup working
| jsheard wrote:
| Easier said than done, training is usually done on "big
| iron" GPUs which are a cut above any hardware that
| consumers have lying around, and the clusters run on
| multi-hundred-gigabit networks. Even if you scaled it
| down to run on gaming cards, and gathered enough
| volunteers, the low bandwidth and high latency of the
| internet would still be a problem.
| simiones wrote:
| For the bigger open source projects, it's companies who use
| that code to make money, such as Microsoft, Google and IBM
| (and many others) supporting Linux because they use it
| extensively. The same answer may end up applying to these
| models though - if they really become something that gets
| integrated into products and internal workflows, there will
| be a market for companies to collaborate on maintaining a
| good implementation rather than competing needlessly.
| ebiester wrote:
| Eventually? Microsoft and Copilot, and Apple and Siri -
| even if they have to outsource their model making. It
| will be a challenge to desktop Linux.
| WorldPeas wrote:
| I figure this will take the same shape as package
| distribution. If you have ever used a Linux distribution,
| you'll always see a couple of .edu domains serving you
| packages. Big tech might be able to have specialized models,
| but following the Linux paradigm, the more cutting-edge but
| temperamental models will likely come from university
| research.
| jillesvangurp wrote:
| Including figuring out which more expensive models to use
| when needed instead of doing that by default. Early LLMs
| were not great at reasoning and not great at using tools.
| And also not great at reproducing knowledge. Small models
| are too small to reliably reproduce knowledge but when
| trained properly they are decent enough for simple
| reasoning tasks. Like deciding whether to use a
| smarter/slower/more expensive model.
| mring33621 wrote:
| strong agree
|
| my employer talks about spending 10s of millions on AI
|
| but, even at this early stage, my experiments indicate that
| the smaller, locally-run models are just fine for a lot of
| tech and business tasks
|
| this approach has definite privacy advantages and likely
| has cost advantages, vs pay-per-use LLM over API.
| AustinDev wrote:
| Not just local models but bespoke apps. The number of
| bespoke apps I've created shot up dramatically in the last
| 6 months. I use one to do my recipes/meal plan every week.
| I have one that goes through all my email addresses and
| summarizes everything daily. I just finished an intelligent
| planner / scheduler for my irrigation system that takes
| into account weather forecast and soil moisture levels. If
| something is annoying and there is no commercial solution
| or open-source solution that has the features I want I just
| make it now and it's fantastic.
|
| I've had friends/family ask to use some of them; I
| declined. I don't want to do support / feature requests.
| justlikereddit wrote:
| Last time I did that I was also impressed, for a start.
|
| Problem was that of a top-ten list of book recommendations,
| only the first 3 existed; the rest was a casually blended
| hallucination delivered in perfect English without skipping a
| beat.
|
| "You like magic? Try reading the Harlew Porthouse series by
| JRR Marrow, following the orphan magicians adventures in
| Hogwesteros"
|
| And the further towards the context limit it goes, the deeper
| this descent into creative derivative madness becomes.
|
| It's entertaining but limited in usefulness.
| omnimus wrote:
| LLMs are not search engines...
| mirekrusin wrote:
| Exactly. I think all this nonsense should be weeded out of
| those base models: Kardashian-like labyrinths of trivia that
| just make them dumber by taking up space and compute time. If
| you can google some nonsense news, it should stay in search
| engines for retrieval. Models should be good at using search
| tools, not at trying to replicate their results. They should
| start from logic, math, programming, physics and so on,
| similar to what the education system is supposed to equip you
| with. IMHO small models give this speed advantage (faster to
| experiment, e.g. with parallel diverging results, ability to
| munch through more data, etc.). Stripped to this bare minimum
| they could likely be much smaller with impressive results,
| tunable, and allow for huge context.
| Philpax wrote:
| An interesting development to look forward to will be
| hooking them up to search engines. The proprietary models
| already do this, and the open equivalents are not far
| behind; the recent Qwen models are not as great at
| knowledge, but are some of the best at agentic
| functionality. Exciting times ahead!
| hedgehog wrote:
| If you use something like Open Web UI today the search
| integration works reasonably well.
| justlikereddit wrote:
| They are generalists, being search engines is a subset of
| that.
| achierius wrote:
| Many tasks that one might want to give a model end up
| implicitly including search as a subtask. For example,
| "plan me a trip to Santiago" obviously requires the model
| to understand details about the real city of Santiago.
| Less obviously, "write me a Python script to do ..."
| requires they understand APIs, libraries, etc., the same
| things you might ask a search engine to pull up. The
| tasks which do not require a coherent + mostly-correct
| exterior-world-model are relatively few -- text
| processing (e.g. "proofread this") is a big one;
| calculation tasks fit, but LLMs are also bad at those.
| nickip wrote:
| What model? I have been using APIs mostly since ollama was
| too slow for me.
| estsauver wrote:
| Qwen3 and some of the smaller gemma's are pretty good and
| fast. I have a gist with my benchmark #'s here on my m4 pro
| max (with a whole ton of ram, but most small models will
| fit on a well spec'ed dev mac.)
|
| https://gist.github.com/estsauver/a70c929398479f3166f3d69bc
| e...
| patates wrote:
| I really like Gemma 3. Some quantized version of the 27B
| will be good enough for a lot of things. You can also take
| some abliterated version[0] with zero (like zero zero)
| guardrails and make it write you a very interesting crime
| story without having to deal with the infamous "sorry but
| I'm a friendly and safe model and cannot do that and also
| think about the children" response.
|
| [0]: https://huggingface.co/mlabonne/gemma-3-12b-it-
| abliterated
| djmips wrote:
| Which model?
| andrepd wrote:
| Every LLM is basically being trained on benchmarks so
| "benchmark" as applied to LLMs is a pretty meaningless term.
| mirekrusin wrote:
| Today's best models will be worse models for the rest of your
| life.
| bearjaws wrote:
| My guess is that it is over fitted to the tests.
| revel wrote:
| They used RFT and there are only so many benchmarks out
| there, so I would be very surprised if they _didn't_ train on
| the tests.
| otabdeveloper4 wrote:
| LLM benchmarks are mostly bullshit right now. Wait a few years
| until the hype cycle returns to sanity.
| mobilio wrote:
| Waiting for GGUF or MLX models.
|
| They'll probably be released within a few hours.
| Havoc wrote:
| FYI making a gguf yourself isn't hard and doesn't even need a
| GPU.
|
| But yeah waiting is the easier option
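|
| If you do want to roll your own, it's roughly this (just a
| sketch; script and flag names vary by llama.cpp version, and
| the model directory is whatever you downloaded from HF):
|
|     git clone https://github.com/ggml-org/llama.cpp
|     pip install -r llama.cpp/requirements.txt
|     python llama.cpp/convert_hf_to_gguf.py ./MiMo-7B-RL \
|         --outfile mimo-7b-rl-f16.gguf --outtype f16
|     # optional: quantize with the llama-quantize tool you built
|     ./llama-quantize mimo-7b-rl-f16.gguf mimo-7b-rl-Q4_K_M.gguf Q4_K_M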
| mobilio wrote:
| I know - but I'm on a holiday break with a Chromebook.
| ukuina wrote:
| Now there's a challenge!
| jedisct1 wrote:
| https://huggingface.co/jedisct1/MiMo-7B-RL-GGUF
| CodeCompost wrote:
| Open Source or Open Weights?
| NitpickLawyer wrote:
| MIT - so open source
| Davidzheng wrote:
| Weights
| ilrwbwrkhv wrote:
| At this point everybody will open source their models or
| weights. The only one which will not is OpenAI.
| rvz wrote:
| > The only one which will not is open AI.
|
| I think you meant Anthropic. OpenAI is "planning" to release
| an open weight model this year likely competing against the
| Llama models. [0]
|
| I have not seen an open weight AI model _ever_ being released
| by Anthropic at all.
|
| [0] https://openai.com/open-model-feedback/
| userbinator wrote:
| ...and searching for things related to multiple antennae just got
| harder.
|
| They could've called it Xiaomimo.
| arghwhat wrote:
| multiple-input, multiple-output was horribly generic to begin
| with. Terms like multipath propagation and spatial multiplexing
| will do just fine.
| Jotalea wrote:
| I wonder if they will use this model for their AI assistant on
| their Xiaomi 15 series phones. They most likely will. I'm not
| really sure what to expect from it.
| jedisct1 wrote:
| GGUF version (for LM Studio, Ollama, etc):
| https://huggingface.co/jedisct1/MiMo-7B-RL-GGUF
| m4r1k wrote:
| My Chinese friend told me MiMo doesn't have a meaning in
| Chinese (of course mi = rice). Anybody have a clue what it
| stands for?
| column wrote:
| (Xiao)mi mo(del) ?
| johanyc wrote:
| Yeah I think so: 小 (xiao) 米 (mi) 模 (mo) 型 (xing)
| echelon_musk wrote:
| Rice Model?
| est wrote:
| Millet Model
| esafak wrote:
| Sorghum next!
| gandalfgreybeer wrote:
| A lot of Xiaomi products have the prefix Mi. My initial guess
| is Mo is for model.
|
| Also related reference
| https://en.wikipedia.org/wiki/Xiaomi#Name_etymology
| nicman23 wrote:
| probably mimos (mime)
| xmorse wrote:
| Xiaomi is an amazing company
| lvl155 wrote:
| Why are there so many English-first AI models from China? Are
| they not interested in serving their own population? Or is it
| that if they publish Chinese-first models it won't get publicity
| in the West?
| whynotmaybe wrote:
| Haven't we reached a situation where English is the de facto
| language of scientific research, especially AI benchmarks?
|
| It's clearly impossible for me to try anything in Chinese, I'd
| need a translation.
| xmichael909 wrote:
| Correct. Lingua franca for at least the last 75 years, if not
| longer.
| enlyth wrote:
| I assume a large portion of high quality training material is
| in English
| sigmoid10 wrote:
| You'd be correct. The largest portion of all languages in
| Common Crawl (aka the "whole open internet" training corpus)
| is English with 43%. No other language even reaches double
| digit percentages. The next biggest one is Russian at 6%,
| followed by German at 5%.
| Svoka wrote:
| I wonder where you are getting your data. According to
| Wikipedia, Russian is #7:
| https://en.wikipedia.org/wiki/Languages_used_on_the_Internet
|
| The only place where Russian is in the top 5 is Wikipedia
| views. The Russian part of the internet is steadily shrinking
| as Russian imperialism crumbles.
| div72 wrote:
| > The largest portion of all languages in Common Crawl
|
| https://commoncrawl.github.io/cc-crawl-
| statistics/plots/lang...
| Svoka wrote:
| Thanks!
|
| I wonder where this discrepancy comes from
| tough wrote:
| probably under-indexing of non-english sources by these
| crawlers.
|
| would be interesting if yandex opened some data sets!
| chvid wrote:
| All LLMs are trained on the same basic blob of data - mostly in
| English, mostly pirated books and stuff.
| bilbo0s wrote:
| The mandarin language models obviously exist, but what would
| you do with them if they provided access to them? And what
| knowledge would be in them? What is the body of knowledge
| encoded in Mandarin? What does that look like?
|
| Sad reality is that not many outside of China have the facility
| with Mandarin to use those models. Even non-native Mandarin
| speakers who claim to be "fluent", are often messing up
| intended meaning in text. Or making literal translations that
| wind up making no sense.
|
| Inside of China, LLM use will be Mandarin-based. Outside, it
| seems to me English is the natural choice.
|
| Irony of ironies, probably the best way for a non-Mandarin-
| speaking layman to test a Mandarin-based model would be to use
| another LLM to translate prompts to Mandarin.
|
| It's a sad future we're looking at.
|
| Or a brilliant one.
|
| Time will tell.
| johnla wrote:
| For it to be brilliant, AI needs to be a benevolent tool all
| the time. It would take just a few malignant actors to turn
| our world upside down. I suspect it'll follow the same path as
| the Internet and social media: great at first, growing
| markets, bringing us together, and then taking a turn.
| horacemorace wrote:
| You're right of course. That's why these open source /
| weight releases are so critically important.
| mensetmanusman wrote:
| English won. Chinese youth now struggle to write the
| calligraphy characters they can read. Typing favors
| English.
| rahimnathwani wrote:
| It's easy and fast to type Chinese sentences using a
| keyboard.
| throwaway519 wrote:
| The pendulum has already swung back. The current generation
| under 20 grew up with touchscreens. That obsoletes input with
| pinyin; many don't care if the device has no keyboard.
| thenthenthen wrote:
| Input is so interesting in China, basically a sorta t9 but
| just single letters and picking the right characters, with
| common/frequently used characters first, using pinyin. For
| example to say " How are you?" You just type "nhm" (Ni Hao
| Ma) and Ni Hao Ma shows up as suggestion/autofill. You can
| make surprisingly long sentences using this method.
| olalonde wrote:
| > That obsoletes input with pinyin
|
| Uh? Pinyin input is by far the most popular input technique
| in China. I rarely see anyone using handwriting input.
|
| That being said, it has nothing to do with English winning.
| It's just a Chinese input technique that uses the latin
| alphabet. English fluency in China is not very common,
| especially spoken English.
| pertymcpert wrote:
| What? The only people I've seen use the handwriting input
| mode were old people.
| -__---____-ZXyw wrote:
| Source?
|
| This smacks of "I saw a headline once"-itis. Especially the
| fact that you refer to the Chinese characters as "calligraphy
| characters", as if that were the general term or something.
| Jarwain wrote:
| These are probably the headlines they're thinking about,
|
| https://www.globaltimes.cn/content/747853.shtml
|
| https://www.bbc.com/news/blogs-china-blog-28599392
|
| Or more recently this one about character amnesia
|
| https://globalchinapulse.net/character-amnesia-in-china/
|
| None of these really mean that English has won, though.
| Rather that phonetics-based writing systems are easier to
| remember and use, especially in conjunction with digital
| systems that make it easy to map sound and context to
| symbols.
|
| I wouldn't be surprised if characters are faster to read
| though. In English we have all these subconscious shortcuts
| like looking at the shape of the word, first and last
| letters, etc. But I think symbology can convey more at a
| glance. Thus the popularity of emoji
| 34679 wrote:
| Nearly everyone in the urban areas of China spoke some English
| when I visited way back in 1995. It's a bilingual society.
| rahimnathwani wrote:
| I lived in Beijing and Shanghai for 9 years (2010-2019) and
| this is NOT my impression at all.
| crazygringo wrote:
| This is not true. I was in Beijing around then and never met
| a single person who spoke English if they hadn't learned it
| for professional reasons (they worked in tourism,
| international business, etc.).
|
| It could not have been further from a bilingual society.
| gcy wrote:
| I suppose you were probably visiting university
| districts/CBDs where people are likely to have received
| higher education. Elsewhere, aside from a basic "hello"/"how
| are you", locals in general are not able to communicate in
| English.
| choutianxius wrote:
| One reason is that there is no "good" search engine in China.
| The most popular one, Baidu, is like garbage compared to Google
| search. The most useful training data in Chinese would likely
| come from social media and video-sharing platforms, which I
| guess are much more difficult to crawl and clean up.
| thoroughburro wrote:
| A few thousand years of literature ain't nothing...
| fwipsy wrote:
| Given premodern population sizes and literacy rates,
| historical texts probably don't exist in anything like the
| quantity that internet posts do. Even if they did, the
| information may not be relevant to the modern world.
| kccqzy wrote:
| Peanuts compared to the discourse available on the
| internet.
|
| The literature that survived thousands of years is the cream
| of the crop; you won't find lots of random unimportant
| dialog between people thousands of years ago, but you find
| that on Reddit.
| littlestymaar wrote:
| > The most popular one, Baidu, is like garbage compared to
| Google search
|
| It must be very bad when you see the walking turd that Google
| search has become over the years...
| spacebanana7 wrote:
| I wonder whether English text having fewer characters provides
| an advantage somehow.
| jmole wrote:
| not really, since tokenization combines multiple characters
| paulsutter wrote:
| I don't see any indication that it's English-first?
| yyhhsj0521 wrote:
| Chinese internet mostly consists of a few closed gardens
| tightly controlled by big corps. Crawlers simply don't work
| when each company employs an army of engineers to guard their
| data. Many of the most popular websites are also app only. It's
| impossible to get the corpus necessary to train a good LLM.
| bredren wrote:
| Do we have estimates on the corpus that is available? This
| model's repo describes "multiple strategies to generate
| massive diverse synthetic reasoning data." FWIW, AI 2027
| forecasts heavy emphasis on synthetic data creation.
|
| Is the lack of existing corpus just an extra hurdle for
| Hanzi-first models that are also leading the pack in
| benchmarks?
| Leary wrote:
| They are not "English-first". Deepseek-R1, for example, reasons
| in Chinese when you ask it a question in Chinese.
| overfeed wrote:
| Why are so many American models multi-lingual, supporting
| hundreds of languages not commonly spoken in the United States?
|
| Could it be that being multilingual results in a larger pool
| of human knowledge on the technical side, compared to
| training on just one or two languages? And on the business
| side, supporting more languages results in a larger TAM
| (total addressable market). Using English-language datasets
| to train LLMs is the _default_, not the other way around like
| you insinuate.
| achierius wrote:
| That's clearly a different question. It'd be possible for
| these models to be Mandarin-first while still supporting
| other languages, like American models are English-first while
| doing the same, but that's not what's happening.
| overfeed wrote:
| > That's clearly a different question. It'd be possible for
| these models to be Mandarin-first while still supporting
| other languages
|
| What would a hypothetical "Mandarin-first" model look like
| to you?
|
| I challenge the notion that the current models are
| "English-first" - that is an unsubstantiated opinion not
| supported by fact. I bet, dollars to donuts, these models
| are SoTA in Mandarin as well. When framed that way, asking
| "Why are they marketed as English-speaking models outside
| of China" or "Why are they really good at English" are
| simply not interesting questions - they have obvious
| answers.
| Havoc wrote:
| I was under the impression that we just see the English stuff
| given that we're using English news channels.
| throwup238 wrote:
| CommonCrawl [1] is the biggest and most easily accessible
| legally acquired crawling dataset around, collecting data since
| 2008. Pretty much everyone uses this as their base dataset for
| training foundation LLMs and since it's mostly English, all
| models perform well in English.
|
| [1] https://commoncrawl.org/
| lwansbrough wrote:
| I'm going to go with: to ensure it is not disadvantaged in
| benchmarks
| julianozen wrote:
| One thing I thought was interesting about this paper [1] on
| understanding LLMs was how the models associate words/concepts
| in different languages with each other in what they call
| Multilingual Circuits.
|
| So the example they give:
|
| English: The opposite of "small" is " - big
|
| French: Le contraire de "petit" est " - grand
|
| Chinese: "小"的反义词是" - 大
|
| Cool graphic for the above [2]
|
| So while English is the lingua franca of the internet and
| represents the largest corpus of data, the primary models being
| built are able to use an English dataset to build associations
| across languages. This might create significantly stronger AI
| and reasoning even for languages and regions that lack the
| data, tech and resources to build local models
|
| [1] https://www.anthropic.com/research/tracing-thoughts-
| language...
|
| [2]
| https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-...
| revskill wrote:
| Chinese is hard.
| sida wrote:
| Xiaomi in Chinese translates to "Little Rice"
|
| Here is the meaning of the name
|
| Described here: https://finance.sina.cn/tech/2020-11-26/detail-
| iiznctke33979...
|
| Zai houlai de taolun zhong, wo turan xiangdao le wo zui
| xihuan de yi ju hua -- "Fo guan yi li mi, da ru Xumi shan".
|
| Translated into English, it means:
|
| "In the later discussions, I suddenly thought of one of my
| favorite sayings -- 'A Buddha sees a single grain of rice as vast
| as Mount Sumeru.'"
|
| This expression emphasizes the idea that even something seemingly
| small (like a grain of rice) can hold immense significance or
| value when viewed from a different perspective.
|
| Thanks to chatgpt for translating this
| rahimnathwani wrote:
| When you guys use gguf files in ollama, do you normally create a
| modelfile to go with it, or just hope that whatever defaults
| ollama has will work with the new model?
|
| https://github.com/ollama/ollama/blob/main/docs%2Fmodelfile....
| monkmartinez wrote:
| If you _ollama pull <model>_ the modelfile will be downloaded
| along with the blob. To modify the model permanently, you can
| copypasta the modelfile into a text editor and then create a
| new model from the old modelfile with the changes you
| require/made.
|
| Here is my workflow when using Open WebUI:
|
| 1. _ollama show qwen3:30b-a3b-q8_0 --modelfile_
|
| 2. Paste the contents of the modelfile into Open WebUI ->
| admin -> models and rename it _qwen3:30b-a3b-q8_0-monkversion-1_
|
| 3. Change parameters like _num_gpu 90_ to change layers... etc.
|
| 4. Keep | Delete old file
|
| Pay attention to the modelfile: it will show you something
| like _# To build a new Modelfile based on this, replace FROM
| with: # FROM qwen3:30b-a3b-q8_0_ and you need to make sure the
| paths are correct. I store my models on a large NVMe drive
| that isn't the ollama default, as an example of why that
| matters.
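|
| The pure-CLI version of that workflow is basically this
| (model name and the parameter here are just examples):
|
|     ollama show qwen3:30b-a3b-q8_0 --modelfile > Modelfile
|     # edit Modelfile, e.g. tweak "PARAMETER num_gpu 90"
|     ollama create qwen3:30b-a3b-q8_0-monkversion-1 -f Modelfile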
|
| EDIT TO ADD: The 'modelfile' workflow is a pain in the booty.
| It's a dogwater pattern and I hate it. Some of these models are
| 30 to 60GB and copying the entire thing to change one parameter
| is just dumb.
|
| However, ollama does a lot of things right and it makes it easy
| to get up and running. VLLM, SGLang, Mistral.rs and even
| llama.cpp require a lot more work to setup.
| rahimnathwani wrote:
| Sorry, I should have been clearer.
|
| I meant when you download a gguf file from huggingface,
| instead of using a model from ollama's library.
| monkmartinez wrote:
| _ollama pull hf.co/unsloth/Qwen3-30B-A3B-GGUF:Q4_K_M_ and
| the modelfile comes with it. It may have errors in the
| template or parameters this way. It has to be converted to
| GGUF/GGML prior to using it this way. You can, of course,
| convert and create the specific ollama model from bf16
| safetensors as well.
| rahimnathwani wrote:
| Yeah when I do this, the modelfile has only FROM and
| TEMPLATE. No PARAMETERs:
|     ollama pull hf.co/jedisct1/MiMo-7B-RL-GGUF:Q4_K_M
|     ollama show --modelfile hf.co/jedisct1/MiMo-7B-RL-GGUF:Q4_K_M
| o11c wrote:
| Pretty sure the whole reason Ollama uses raw hashes
| everywhere is to _avoid_ copying the whole NN gigabytes every
| time.
| monkmartinez wrote:
| Maybe I am doing something wrong! When I change parameters
| on the modelfile, the whole thing is copied. You can't just
| edit the file as far as I know, you have to create another
| 38GB monster to change _num_ctx_ to a reasonable number.
| o11c wrote:
| The parameters (prompt, etc.) should be set _only_ in the
| new modelfile (passed to `ollama create`), using a FROM
| referencing the previous ollama model. Parameters in a
| Modelfile override the hard-coded parameters from the
| GGUF itself (which are sometimes buggy); in fact, from
| elsewhere in the thread it sounds like MiMo is missing
| proper stop tokens, or maybe templates in general; I'm
| not an expert.
|
| This _will_ show a separate entry in `ollama list` but
| only copy the Modelfile not the GGUF.
|
| Alternatively, if you use the API, you can override
| parameters "temporarily". Some UIs let you do this
| easily, at least for common parameters.
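|
| Concretely, the override Modelfile can be tiny, something
| like this (model name and parameter values are just
| examples):
|
|     FROM hf.co/jedisct1/MiMo-7B-RL-GGUF:Q4_K_M
|     PARAMETER num_ctx 8192
|     PARAMETER temperature 0.6
|
| then _ollama create mimo-tweaked -f Modelfile_; only the
| small Modelfile is stored and the big GGUF blob is reused.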
| memhole wrote:
| I'll typically use the defaults initially and then use a
| Modelfile if it's something I plan on using. I think you can
| dump the modelfile ollama uses to have a template to work with.
| Havoc wrote:
| One of the core design goals Georgi Gerganov had with GGUF was
| to _not_ need other files. It's literally bullet point #1 in
| the specs:
|
| >Single-file deployment
|
| >Full information: all information needed to load a model is
| contained in the model file, and no additional information
| needs to be provided by the user.
|
| https://github.com/ggml-org/ggml/blob/master/docs/gguf.md
|
| We literally just got rid of that multi-file chaos only for
| ollama to add it back :/
| rahimnathwani wrote:
| Most of the parameters you would include in ollama's
| ModelFile are things you would pass to llama.cpp using
| command line flags:
|
| https://github.com/ggml-
| org/llama.cpp/blob/master/examples/m...
|
| If you only ever have one set of configuration parameters per
| model (same temp, top_p, system prompt...), then I guess you
| can put them in a gguf file (as the format is extensible).
|
| But what if you want two different sets? You still need to
| keep them somewhere. That could be a shell script for
| llama.cpp, or a ModelFile for ollama.
|
| (Assuming you don't want to create a new (massive) gguf file
| for each permutation of parameters.)
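|
| E.g. the llama.cpp equivalent of a Modelfile's context size
| and sampling parameters is just flags on the command line
| (flag names can differ between llama.cpp versions, and the
| gguf filename here is a placeholder):
|
|     llama-server -m MiMo-7B-RL-Q4_K_M.gguf -c 8192 \
|         --temp 0.6 --top-p 0.95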
| novaRom wrote:
| This is why we use xdelta3, rdiff, and git
| gizmodo59 wrote:
| It's funny to see benchmarks where they omit the
| top-performing models like o3 (which is the best model in
| many benchmarks currently) and Gemini Pro/Claude 3.7.
| daveguy wrote:
| Those are much much larger models, and they are proprietary.
| Those model providers just don't have the distilled versions
| identified and available.
|
| Notice most of the models they are comparing with are 7B
| models. The exception is also an open weights model
| (Qwen-2.5-32B-RL-Zero). Even with 32B parameters the MiMo-7B
| outperforms it.
| vessenes wrote:
| Umm wow. Great benchmarks. I'm looking forward to chatting with
| this one.
|
| A couple things stand out to me -- first is that the 7B model is
| trained on 25T tokens(!). This is Meta-scale training; Llama 4
| Maverick was trained on 22T or so. (Scout, the smaller model:
| 40T).
|
| Second, this is an interesting path to take - not a distilled
| model or an RL layer to get reasoning out of another model, but a
| from-scratch RL model with reasoning baked in; the claims seem to
| indicate you get a lot of extra efficiency per-parameter doing
| this.
|
| I don't have experience with Xiaomi models, so I'm cautious about
| this one until I play with it, but it looks like a super viable
| local reasoning model from the stats.
| siliconc0w wrote:
| This is incredibly strong coding performance for a 7b. I use
| Gemini Pro 2.5 which got 67.8 and this got 57.8, very close to
| Gemini 2.5 Flash which got 60.6.
|
| I've become pretty skeptical about eval results given what we've
| heard about llama4 so we'll see where this lands on the closed
| evals but very impressive to see.
| Arcuru wrote:
| From the paper, I was intrigued by how they handled their RL step
| for Code Data. They trained against hard but solvable code
| generation tasks, using unit tests to compute rewards. Is
| that training step done by other models too?
|
| > Code Data For coding problems, we curate a high-quality
| training set comprising open-source datasets and our newly
| collected problem set. We remove problems without test cases. For
| problems with golden solutions, we exclude those where the golden
| solution failed to pass all test cases. For problems without
| golden solution, we discard problems where no test case can be
| solved in 16 rollouts of advanced reasoning models. Similar to
| math data, we utilize an SFT version of MiMo-7B to filter out
| easy problems that are perfectly solved in all 16 rollouts. This
| rigorous cleaning process yields 30K code problems.
|
| > During each RL iteration, we evaluate thousands of problems to
| compute the rewards, with each problem potentially containing
| hundreds of test cases. To improve reward computing efficiency
| and eliminate GPU idle time, we developed an online judge
| environment that enables parallel execution of extremely high-
| volume unit tests.
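|
| I don't know what their online judge actually looks like, but
| conceptually the reward is just "what fraction of the test
| cases does the rolled-out program pass". A toy sketch (all
| names and details here are made up for illustration, not
| their implementation):
|
|     import os
|     import subprocess
|     import tempfile
|
|     def code_reward(program: str,
|                     test_cases: list[tuple[str, str]],
|                     timeout_s: float = 5.0) -> float:
|         """Run the candidate program on each test case's stdin,
|         compare stdout with the expected output, and return the
|         pass rate in [0, 1] as the RL reward."""
|         with tempfile.NamedTemporaryFile(
|                 "w", suffix=".py", delete=False) as f:
|             f.write(program)
|             path = f.name
|         passed = 0
|         try:
|             for stdin_data, expected in test_cases:
|                 try:
|                     result = subprocess.run(
|                         ["python3", path], input=stdin_data,
|                         capture_output=True, text=True,
|                         timeout=timeout_s)
|                     if (result.returncode == 0 and
|                             result.stdout.strip() == expected.strip()):
|                         passed += 1
|                 except subprocess.TimeoutExpired:
|                     pass  # a timeout counts as a failed test case
|         finally:
|             os.unlink(path)
|         return passed / len(test_cases) if test_cases else 0.0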
___________________________________________________________________
(page generated 2025-04-30 23:00 UTC)