[HN Gopher] Xiaomi MiMo Reasoning Model
___________________________________________________________________
Xiaomi MiMo Reasoning Model
Author : thm
Score : 387 points
Date : 2025-04-30 08:48 UTC (14 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| w4yai wrote:
| Anyone tried it ?
| Alifatisk wrote:
| No, where can I try it? I saw a huggingface link but I wonder
| if they host it themselves somewhere, similar to how Alibaba
| does with Qwen Chat.
| yorwba wrote:
| There is a HuggingFace space (probably not official) at:
| https://huggingface.co/spaces/orangewong/xiaomi-mimo-7b-rl
| You might have to wait a minute to get a response. Also, the
| space doesn't seem to have turn-taking implemented, so after
| giving the Assistant's response, it kept on generating the
| Human's next message and so on and so forth.
| benterix wrote:
| Yes, not great, not terrible. I gave it my personal test (a
| coding task); it produced semi-decent code with a minor error,
| and after I pasted the error back it failed to fix it over
| multiple rounds. I believe in another 2-3 years we'll have
| quite usable small models.
| ramesh31 wrote:
| These benchmark numbers cannot be real for a 7b model
| GaggiX wrote:
| https://qwenlm.github.io/blog/qwen3/
|
| Go look at the benchmark numbers of qwen3-4B if you think these
| are unrealistic.
| energy123 wrote:
| Also not "real" in the sense that the model developers most
| likely put the benchmarks into the training data.
| strangescript wrote:
| The smaller models have been creeping upward. They don't make
| headlines because they aren't leapfrogging the mainline models
| from the big companies, but they are all very capable.
|
| I loaded up a random 12B model on ollama the other day and
| couldn't believe how competent it seemed and how fast it was,
| given the machine I was on. A year or so ago, that would not
| have been the case.
| apples_oranges wrote:
| exactly, it seems to validate my assumption from some time
| ago, that we will mostly use local models for everyday tasks.
| pzo wrote:
| Yeah, especially since this simplifies things, e.g. building
| a mobile app as a 3rd party developer - no extra cost, no
| need to set up a proxy server or monitor usage to detect
| abuse, and no need for a complicated per-usage subscription
| plan.
|
| We just need Google or Apple to provide their own equivalent
| of both Ollama and OpenRouter, so users either run inference
| for free with local models or BringYourOwnKey and pay for
| tokens/electricity themselves. We then just charge a smaller
| fee for renting or buying our apps.
| wg0 wrote:
| But who will keep them updated, and what incentive would
| they have? That I can't imagine. Bit vague.
| cruzcampo wrote:
| Who keeps open source projects maintained and what
| incentive do they have?
| jsheard wrote:
| Most open source projects don't need the kinds of
| resources that ML development does. Access to huge GPU
| clusters is the obvious one, but it's easy to forget that
| the big players are also using huge amounts of
| soulcrushing human labor for data acquisition, cleaning,
| labeling and fine tuning, and begrudgingly paying for
| data they can't scrape. People coding in their free time
| won't get very far without that supporting
| infrastructure.
|
| I think ML is more akin to open source _hardware,_ in the
| sense that even when there are people with the relevant
| skills willing to donate their time for free, the cost of
| actually realizing their ideas is still so high that it's
| rarely feasible to keep up with commercial projects.
| cruzcampo wrote:
| That's a fair point. I think GPU clusters are the big
| one, the rest sounds like a good fit for volunteer work.
| wg0 wrote:
| Or sharing GPU compute. Crowd sourcing.
| cruzcampo wrote:
| Ooooh I can see a Seti@Home setup working
| jsheard wrote:
| Easier said than done, training is usually done on "big
| iron" GPUs which are a cut above any hardware that
| consumers have lying around, and the clusters run on
| multi-hundred-gigabit networks. Even if you scaled it
| down to run on gaming cards, and gathered enough
| volunteers, the low bandwidth and high latency of the
| internet would still be a problem.
| simiones wrote:
| For the bigger open source projects, it's companies who use
| that code to make money, such as Microsoft, Google and IBM
| (and many others) supporting Linux because they use it
| extensively. The same answer may end up applying to these
| models though - if they really become something that gets
| integrated into products and internal workflows, there will
| be a market for companies to collaborate on maintaining a
| good implementation rather than competing needlessly.
| ebiester wrote:
| Eventually? Microsoft and Copilot, and Apple and Siri -
| even if they have to outsource their model making. It
| will be a challenge to desktop Linux.
| WorldPeas wrote:
| I figure this will take the same shape as package
| distribution. If you have ever used a Linux distribution,
| you'll always see a couple of .edu domains serving you
| packages. Big tech might be able to have specialized models,
| but following the Linux paradigm, the more cutting-edge but
| temperamental models will likely come from university
| research.
| jillesvangurp wrote:
| Including figuring out which more expensive models to use
| when needed instead of doing that by default. Early LLMs
| were not great at reasoning and not great at using tools.
| And also not great at reproducing knowledge. Small models
| are too small to reliably reproduce knowledge but when
| trained properly they are decent enough for simple
| reasoning tasks. Like deciding whether to use a
| smarter/slower/more expensive model.
| mring33621 wrote:
| strong agree
|
| my employer talks about spending 10s of millions on AI
|
| but, even at this early stage, my experiments indicate that
| the smaller, locally-run models are just fine for a lot of
| tech and business tasks
|
| this approach has definite privacy advantages and likely
| has cost advantages, vs pay-per-use LLM over API.
| AustinDev wrote:
| Not just local models but bespoke apps. The number of
| bespoke apps I've created shot up dramatically in the last
| 6 months. I use one to do my recipes/meal plan every week.
| I have one that goes through all my email addresses and
| summarizes everything daily. I just finished an intelligent
| planner / scheduler for my irrigation system that takes
| into account weather forecast and soil moisture levels. If
| something is annoying and there is no commercial solution
| or open-source solution that has the features I want I just
| make it now and it's fantastic.
|
| I've had friends/family ask to use some of them; I
| declined. I don't want to do support / feature requests.
| justlikereddit wrote:
| Last time I did that I was also impressed, for a start.
|
| Problem was that of a top-ten list of book recommendations,
| only the first 3 existed; the rest was a casually blended
| hallucination delivered in perfect English without skipping a
| beat.
|
| "You like magic? Try reading the Harlew Porthouse series by
| JRR Marrow, following the orphan magicians adventures in
| Hogwesteros"
|
| And the further towards the context limit it goes, the deeper
| this descent into creative derivative madness becomes.
|
| It's entertaining but limited in usefulness.
| omnimus wrote:
| LLMs are not search engines...
| mirekrusin wrote:
| Exactly. I think all this nonsense should be weeded out of
| those base models: Kardashian-like labyrinths of trivia that
| just make them dumber by taking up space and compute time. If
| you can google some nonsense news, it should stay in search
| engines for retrieval. Models should be good at using search
| tools, not at trying to replicate their results. They should
| start from logic, math, programming, physics and so on,
| similar to what the education system is supposed to equip you
| with. IMHO small models give this speed advantage (faster to
| experiment, e.g. with parallel diverging results, ability to
| munch through more data, etc.). Stripped to this bare minimum
| they could likely be much smaller with impressive results,
| tunable, and allow for huge context.
| Philpax wrote:
| An interesting development to look forward to will be
| hooking them up to search engines. The proprietary models
| already do this, and the open equivalents are not far
| behind; the recent Qwen models are not as great at
| knowledge, but are some of the best at agentic
| functionality. Exciting times ahead!
| hedgehog wrote:
| If you use something like Open Web UI today the search
| integration works reasonably well.
| justlikereddit wrote:
| They are generalists, being search engines is a subset of
| that.
| achierius wrote:
| Many tasks that one might want to give a model end up
| implicitly including search as a subtask. For example,
| "plan me a trip to Santiago" obviously requires the model
| to understand details about the real city of Santiago.
| Less obviously, "write me a Python script to do ..."
| requires they understand APIs, libraries, etc., the same
| things you might ask a search engine to pull up. The
| tasks which do not require a coherent + mostly-correct
| exterior-world-model are relatively few -- text
| processing (e.g. "proofread this") is a big one;
| calculation tasks fit, but LLMs are also bad at those.
| nickip wrote:
| What model? I have been using APIs mostly since ollama was
| too slow for me.
| estsauver wrote:
| Qwen3 and some of the smaller gemma's are pretty good and
| fast. I have a gist with my benchmark #'s here on my m4 pro
| max (with a whole ton of ram, but most small models will
| fit on a well spec'ed dev mac.)
|
| https://gist.github.com/estsauver/a70c929398479f3166f3d69bc
| e...
| patates wrote:
| I really like Gemma 3. Some quantized version of the 27B
| will be good enough for a lot of things. You can also take
| some abliterated version[0] with zero (like zero zero)
| guardrails and make it write you a very interesting crime
| story without having to deal with the infamous "sorry but
| I'm a friendly and safe model and cannot do that and also
| think about the children" response.
|
| [0]: https://huggingface.co/mlabonne/gemma-3-12b-it-
| abliterated
| djmips wrote:
| Which model?
| andrepd wrote:
| Every LLM is basically being trained on benchmarks so
| "benchmark" as applied to LLMs is a pretty meaningless term.
| mirekrusin wrote:
| Today's best models will be worse models for the rest of your
| life.
| bearjaws wrote:
| My guess is that it is over fitted to the tests.
| revel wrote:
| They used RFT and there are only so many benchmarks out
| there, so I would be very surprised if they _didn't_ train on
| the tests.
| otabdeveloper4 wrote:
| LLM benchmarks are mostly bullshit right now. Wait a few years
| until the hype cycle returns to sanity.
| mobilio wrote:
| Waiting for GGUF or MLX models.
|
| They'll probably be released within a few hours.
| Havoc wrote:
| FYI making a gguf yourself isn't hard and doesn't even need a
| GPU.
|
| But yeah waiting is the easier option
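|
| If you do want to roll your own, it's roughly this (just a
| sketch; script and flag names vary by llama.cpp version, and
| the model directory is whatever you downloaded from HF):
|
|     git clone https://github.com/ggml-org/llama.cpp
|     pip install -r llama.cpp/requirements.txt
|     python llama.cpp/convert_hf_to_gguf.py ./MiMo-7B-RL \
|         --outfile mimo-7b-rl-f16.gguf --outtype f16
|     # optional: quantize with the llama-quantize tool you built
|     ./llama-quantize mimo-7b-rl-f16.gguf mimo-7b-rl-Q4_K_M.gguf Q4_K_M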
| mobilio wrote:
| I know - but I'm on a holiday break with a Chromebook.
| ukuina wrote:
| Now there's a challenge!
| jedisct1 wrote:
| https://huggingface.co/jedisct1/MiMo-7B-RL-GGUF
| CodeCompost wrote:
| Open Source or Open Weights?
| NitpickLawyer wrote:
| MIT - so open source
| Davidzheng wrote:
| Weights
| ilrwbwrkhv wrote:
| At this point everybody will open source their models or
| weights. The only one which will not is OpenAI.
| rvz wrote:
| > The only one which will not is open AI.
|
| I think you meant Anthropic. OpenAI is "planning" to release
| an open weight model this year likely competing against the
| Llama models. [0]
|
| I have not seen an open weight AI model _ever_ being released
| by Anthropic at all.
|
| [0] https://openai.com/open-model-feedback/
| userbinator wrote:
| ...and searching for things related to multiple antennae just got
| harder.
|
| They could've called it Xiaomimo.
| arghwhat wrote:
| multiple-input, multiple-output was horribly generic to begin
| with. Terms like multipath propagation and spatial multiplexing
| will do just fine.
| Jotalea wrote:
| I wonder if they will use this model for their AI assistant on
| their Xiaomi 15 series phones. They most likely will. I'm not
| really sure what to expect from it.
| jedisct1 wrote:
| GGUF version (for LM Studio, Ollama, etc):
| https://huggingface.co/jedisct1/MiMo-7B-RL-GGUF
| m4r1k wrote:
| My Chinese friend told me MiMo doesn't have a meaning in
| Chinese (of course mi = rice). Anybody have a clue what it
| stands for?
| column wrote:
| (Xiao)mi mo(del) ?
| johanyc wrote:
| Yeah I think so: 小 (xiao) 米 (mi) 模 (mo) 型 (xing)
| echelon_musk wrote:
| Rice Model?
| est wrote:
| Millet Model
| esafak wrote:
| Sorghum next!
| gandalfgreybeer wrote:
| A lot of Xiaomi products have the prefix Mi. My initial guess
| is Mo is for model.
|
| Also related reference
| https://en.wikipedia.org/wiki/Xiaomi#Name_etymology
| nicman23 wrote:
| probably mimos (mime)
| xmorse wrote:
| Xiaomi is an amazing company
| lvl155 wrote:
| Why are there so many English-first AI models from China? Are
| they not interested in serving their own population? Or is it
| that if they publish Chinese-first models it won't get publicity
| in the West?
| whynotmaybe wrote:
| Haven't we reached a situation where English is the de facto
| language of scientific research, especially AI benchmarks?
|
| It's clearly impossible for me to try anything in Chinese, I'd
| need a translation.
| xmichael909 wrote:
| Correct. Lingua franca for at least the last 75 years, if not
| longer.
| enlyth wrote:
| I assume a large portion of high quality training material is
| in English
| sigmoid10 wrote:
| You'd be correct. The largest portion of all languages in
| Common Crawl (aka the "whole open internet" training corpus)
| is English with 43%. No other language even reaches double
| digit percentages. The next biggest one is Russian at 6%,
| followed by German at 5%.
| Svoka wrote:
| I wonder where you are getting your data. According to
| Wikipedia, Russian is #7:
| https://en.wikipedia.org/wiki/Languages_used_on_the_Internet
|
| The only place where Russian is in the top 5 is Wikipedia
| views. The Russian part of the internet is steadily shrinking
| as Russian imperialism crumbles.
| div72 wrote:
| > The largest portion of all languages in Common Crawl
|
| https://commoncrawl.github.io/cc-crawl-
| statistics/plots/lang...
| Svoka wrote:
| Thanks!
|
| I wonder where this discrepancy comes from
| tough wrote:
| probably under-indexing of non-english sources by these
| crawlers.
|
| would be interesting if yandex opened some data sets!
| chvid wrote:
| All LLMs are trained on the same basic blob of data - mostly in
| English, mostly pirated books and stuff.
| bilbo0s wrote:
| The mandarin language models obviously exist, but what would
| you do with them if they provided access to them? And what
| knowledge would be in them? What is the body of knowledge
| encoded in Mandarin? What does that look like?
|
| Sad reality is that not many outside of China have the facility
| with Mandarin to use those models. Even non-native Mandarin
| speakers who claim to be "fluent", are often messing up
| intended meaning in text. Or making literal translations that
| wind up making no sense.
|
| Inside of China, LLM use will be Mandarin-based. Outside, it
| seems to me English is the natural choice.
|
| Irony of ironies, probably the best way for a non-Mandarin-
| speaking layman to test a Mandarin-based model would be to use
| another LLM to translate prompts to Mandarin.
|
| It's a sad future we're looking at.
|
| Or a brilliant one.
|
| Time will tell.
| johnla wrote:
| For it to be brilliant, AI needs to be a benevolent tool all
| the time. It would take just a few malignant actors to turn
| our world upside down. I suspect it'll follow the same path as
| the Internet and social media: great at first, growing
| markets, bringing us together, and then taking a turn.
| horacemorace wrote:
| You're right of course. That's why these open source /
| weight releases are so critically important.
| mensetmanusman wrote:
| English won. Chinese youth now struggle to write the
| calligraphy characters they can read. Typing favors
| English.
| rahimnathwani wrote:
| It's easy and fast to type Chinese sentences using a
| keyboard.
| throwaway519 wrote:
| The pendulum has already swung back. The current generation
| under 20 grew up with touchscreens. That obsoletes input with
| pinyin; many don't care if the device has no keyboard.
| thenthenthen wrote:
| Input is so interesting in China, basically a sorta t9 but
| just single letters and picking the right characters, with
| common/frequently used characters first, using pinyin. For
| example to say " How are you?" You just type "nhm" (Ni Hao
| Ma) and Ni Hao Ma shows up as suggestion/autofill. You can
| make surprisingly long sentences using this method.
| olalonde wrote:
| > That obsoletes input with pinyin
|
| Uh? Pinyin input is by far the most popular input technique
| in China. I rarely see anyone using handwriting input.
|
| That being said, it has nothing to do with English winning.
| It's just a Chinese input technique that uses the latin
| alphabet. English fluency in China is not very common,
| especially spoken English.
| pertymcpert wrote:
| What? The only people I've seen use the handwriting input
| mode were old people.
| -__---____-ZXyw wrote:
| Source?
|
| This smacks of "I saw a headline once"-itis. Especially the
| fact that you refer to the Chinese characters as "calligraphy
| characters", as if that were the general term or something.
| Jarwain wrote:
| These are probably the headlines they're thinking about,
|
| https://www.globaltimes.cn/content/747853.shtml
|
| https://www.bbc.com/news/blogs-china-blog-28599392
|
| Or more recently this one about character amnesia
|
| https://globalchinapulse.net/character-amnesia-in-china/
|
| None of these really mean that English has won, though.
| Rather that phonetics-based writing systems are easier to
| remember and use, especially in conjunction with digital
| systems that make it easy to map sound and context to
| symbols.
|
| I wouldn't be surprised if characters are faster to read
| though. In English we have all these subconscious shortcuts
| like looking at the shape of the word, first and last
| letters, etc. But I think symbology can convey more at a
| glance. Thus the popularity of emoji
| 34679 wrote:
| Nearly everyone in the urban areas of China spoke some English
| when I visited way back in 1995. It's a bilingual society.
| rahimnathwani wrote:
| I lived in Beijing and Shanghai for 9 years (2010-2019) and
| this is NOT my impression at all.
| crazygringo wrote:
| This is not true. I was in Beijing around then and never met
| a single person who spoke English if they hadn't learned it
| for professional reasons (they worked in tourism,
| international business, etc.).
|
| It could not have been further from a bilingual society.
| gcy wrote:
| I suppose you were probably visiting university
| districts/CBDs where people are likely to have received
| higher education. Elsewhere, aside from a basic "hello"/"how
| are you", locals in general are not able to communicate in
| English.
| choutianxius wrote:
| One reason is that there is no "good" search engine in China.
| The most popular one, Baidu, is like garbage compared to Google
| search. The most useful training data in Chinese would likely
| come from social media and video-sharing platforms, which I
| guess are much more difficult to crawl and clean up.
| thoroughburro wrote:
| A few thousand years of literature ain't nothing...
| fwipsy wrote:
| Given premodern population sizes and literacy rates,
| historical texts probably don't exist in anything like the
| quantity that internet posts do. Even if they did, the
| information may not be relevant to the modern world.
| kccqzy wrote:
| Peanuts compared to the discourse available on the
| internet.
|
| The literature that survived thousands of years is the cream
| of the crop; you won't find lots of random unimportant
| dialog between people thousands of years ago, but you find
| that on Reddit.
| littlestymaar wrote:
| > The most popular one, Baidu, is like garbage compared to
| Google search
|
| It must be very bad when you see the walking turd that Google
| search has become over the years...
| spacebanana7 wrote:
| I wonder whether English text having fewer characters provides
| an advantage somehow.
| jmole wrote:
| not really, since tokenization combines multiple characters
| paulsutter wrote:
| I don't see any indication that it's English-first?
| yyhhsj0521 wrote:
| Chinese internet mostly consists of a few closed gardens
| tightly controlled by big corps. Crawlers simply don't work
| when each company employs an army of engineers to guard their
| data. Many of the most popular websites are also app only. It's
| impossible to get the corpus necessary to train a good LLM.
| bredren wrote:
| Do we have estimates on the corpus that is available? This
| model's repo describes "multiple strategies to generate
| massive diverse synthetic reasoning data." FWIW, AI 2027
| forecasts heavy emphasis on synthetic data creation.
|
| Is the lack of existing corpus just an extra hurdle for
| Hanzi-first models that are also leading the pack in
| benchmarks?
| Leary wrote:
| They are not "English-first". Deepseek-R1, for example, reasons
| in Chinese when you ask it a question in Chinese.
| overfeed wrote:
| Why are so many American models multi-lingual, supporting
| hundreds of languages not commonly spoken in the United States?
|
| Could it be that being multilingual results in a larger pool
| of human knowledge on the technical side, compared to
| training on just one or two languages? And on the business
| side, supporting more languages results in a larger TAM
| (total addressable market). Using English-language datasets
| to train LLMs is the _default_, not the other way around like
| you insinuate.
| achierius wrote:
| That's clearly a different question. It'd be possible for
| these models to be Mandarin-first while still supporting
| other languages, like American models are English-first while
| doing the same, but that's not what's happening.
| overfeed wrote:
| > That's clearly a different question. It'd be possible for
| these models to be Mandarin-first while still supporting
| other languages
|
| What would a hypothetical "Mandarin-first" model look like
| to you?
|
| I challenge the notion that the current models are
| "English-first" - that is an unsubstantiated opinion not
| supported by fact. I bet, dollars to donuts, these models
| are SoTA in Mandarin as well. When framed that way, asking
| "Why are they marketed as English-speaking models outside
| of China" or "Why are they really good at English" are
| simply not interesting questions - they have obvious
| answers.
| Havoc wrote:
| I was under the impression that we just see the English stuff
| given that we're using English news channels.
| throwup238 wrote:
| CommonCrawl [1] is the biggest and most easily accessible
| legally acquired crawling dataset around, collecting data since
| 2008. Pretty much everyone uses this as their base dataset for
| training foundation LLMs and since it's mostly English, all
| models perform well in English.
|
| [1] https://commoncrawl.org/
| lwansbrough wrote:
| I'm going to go with: to ensure it is not disadvantaged in
| benchmarks
| julianozen wrote:
| One thing I thought was interesting about this paper [1] on
| understanding LLMs was how the models associate words/concepts
| in different languages with each other in what they call
| Multilingual Circuits.
|
| So the example they give:
|
| English: The opposite of "small" is " - big
|
| French: Le contraire de "petit" est " - grand
|
| Chinese: "小"的反义词是" - 大
|
| Cool graphic for the above [2]
|
| So while English is the lingua franca of the internet and
| represents the largest corpus of data, the primary models being
| built are able to use an English dataset to build associations
| across languages. This might create significantly stronger AI
| and reasoning even for languages and regions that lack the
| data, tech and resources to build local models
|
| [1] https://www.anthropic.com/research/tracing-thoughts-
| language...
|
| [2]
| https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-...
| revskill wrote:
| Chinese is hard.
| sida wrote:
| Xiaomi in Chinese translates to "Little Rice"
|
| Here is the meaning of the name
|
| Described here: https://finance.sina.cn/tech/2020-11-26/detail-
| iiznctke33979...
|
| Zai houlai de taolun zhong, wo turan xiangdao le wo zui
| xihuan de yi ju hua -- "Fo guan yi li mi, da ru Xumi shan".
|
| Translated into English, it means:
|
| "In the later discussions, I suddenly thought of one of my
| favorite sayings -- 'A Buddha sees a single grain of rice as vast
| as Mount Sumeru.'"
|
| This expression emphasizes the idea that even something seemingly
| small (like a grain of rice) can hold immense significance or
| value when viewed from a different perspective.
|
| Thanks to chatgpt for translating this
| rahimnathwani wrote:
| When you guys use gguf files in ollama, do you normally create a
| modelfile to go with it, or just hope that whatever defaults
| ollama has will work with the new model?
|
| https://github.com/ollama/ollama/blob/main/docs%2Fmodelfile....
| monkmartinez wrote:
| If you _ollama pull <model>_ the modelfile will be downloaded
| along with the blob. To modify the model permanently, you can
| copypasta the modelfile into a text editor and then create a
| new model from the old modelfile with the changes you
| require/made.
|
| Here is my workflow when using Open WebUI:
|
| 1. _ollama show qwen3:30b-a3b-q8_0 --modelfile_
|
| 2. Paste the contents of the modelfile into Open WebUI ->
| admin -> models and rename it _qwen3:30b-a3b-q8_0-monkversion-1_
|
| 3. Change parameters like _num_gpu 90_ to change layers... etc.
|
| 4. Keep | Delete old file
|
| Pay attention to the modelfile: it will show you something
| like _# To build a new Modelfile based on this, replace FROM
| with: # FROM qwen3:30b-a3b-q8_0_ and you need to make sure the
| paths are correct. I store my models on a large NVMe drive
| that isn't the ollama default, as an example of why that
| matters.
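|
| The pure-CLI version of that workflow is basically this
| (model name and the parameter here are just examples):
|
|     ollama show qwen3:30b-a3b-q8_0 --modelfile > Modelfile
|     # edit Modelfile, e.g. tweak "PARAMETER num_gpu 90"
|     ollama create qwen3:30b-a3b-q8_0-monkversion-1 -f Modelfile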
|
| EDIT TO ADD: The 'modelfile' workflow is a pain in the booty.
| It's a dogwater pattern and I hate it. Some of these models are
| 30 to 60GB and copying the entire thing to change one parameter
| is just dumb.
|
| However, ollama does a lot of things right and it makes it easy
| to get up and running. VLLM, SGLang, Mistral.rs and even
| llama.cpp require a lot more work to setup.
| rahimnathwani wrote:
| Sorry, I should have been clearer.
|
| I meant when you download a gguf file from huggingface,
| instead of using a model from ollama's library.
| monkmartinez wrote:
| _ollama pull hf.co/unsloth/Qwen3-30B-A3B-GGUF:Q4_K_M_ and
| the modelfile comes with it. It may have errors in the
| template or parameters this way. It has to be converted to
| GGUF/GGML prior to using it this way. You can, of course,
| convert and create the specific ollama model from bf16
| safetensors as well.
| rahimnathwani wrote:
| Yeah when I do this, the modelfile has only FROM and
| TEMPLATE. No PARAMETERs:
|     ollama pull hf.co/jedisct1/MiMo-7B-RL-GGUF:Q4_K_M
|     ollama show --modelfile hf.co/jedisct1/MiMo-7B-RL-GGUF:Q4_K_M
| o11c wrote:
| Pretty sure the whole reason Ollama uses raw hashes
| everywhere is to _avoid_ copying the whole NN gigabytes every
| time.
| monkmartinez wrote:
| Maybe I am doing something wrong! When I change parameters
| on the modelfile, the whole thing is copied. You can't just
| edit the file as far as I know, you have to create another
| 38GB monster to change _num_ctx_ to a reasonable number.
| o11c wrote:
| The parameters (prompt, etc.) should be set _only_ in the
| new modelfile (passed to `ollama create`), using a FROM
| referencing the previous ollama model. Parameters in a
| Modelfile override the hard-coded parameters from the
| GGUF itself (which are sometimes buggy); in fact, from
| elsewhere in the thread it sounds like MiMo is missing
| proper stop tokens, or maybe templates in general; I'm
| not an expert.
|
| This _will_ show a separate entry in `ollama list` but
| only copy the Modelfile not the GGUF.
|
| Alternatively, if you use the API, you can override
| parameters "temporarily". Some UIs let you do this
| easily, at least for common parameters.
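|
| Concretely, the override Modelfile can be tiny, something
| like this (model name and parameter values are just
| examples):
|
|     FROM hf.co/jedisct1/MiMo-7B-RL-GGUF:Q4_K_M
|     PARAMETER num_ctx 8192
|     PARAMETER temperature 0.6
|
| then _ollama create mimo-tweaked -f Modelfile_; only the
| small Modelfile is stored and the big GGUF blob is reused.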
| memhole wrote:
| I'll typically use the defaults initially and then use a
| Modelfile if it's something I plan on using. I think you can
| dump the modelfile ollama uses to have a template to work with.
| Havoc wrote:
| One of the core design goals Georgi Gerganov had with GGUF was
| to _not_ need other files. It's literally bullet point #1 in
| the specs:
|
| >Single-file deployment
|
| >Full information: all information needed to load a model is
| contained in the model file, and no additional information
| needs to be provided by the user.
|
| https://github.com/ggml-org/ggml/blob/master/docs/gguf.md
|
| We literally just got rid of that multi-file chaos only for
| ollama to add it back :/
| rahimnathwani wrote:
| Most of the parameters you would include in ollama's
| ModelFile are things you would pass to llama.cpp using
| command line flags:
|
| https://github.com/ggml-
| org/llama.cpp/blob/master/examples/m...
|
| If you only ever have one set of configuration parameters per
| model (same temp, top_p, system prompt...), then I guess you
| can put them in a gguf file (as the format is extensible).
|
| But what if you want two different sets? You still need to
| keep them somewhere. That could be a shell script for
| llama.cpp, or a ModelFile for ollama.
|
| (Assuming you don't want to create a new (massive) gguf file
| for each permutation of parameters.)
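|
| E.g. the llama.cpp equivalent of a Modelfile's context size
| and sampling parameters is just flags on the command line
| (flag names can differ between llama.cpp versions, and the
| gguf filename here is a placeholder):
|
|     llama-server -m MiMo-7B-RL-Q4_K_M.gguf -c 8192 \
|         --temp 0.6 --top-p 0.95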
| novaRom wrote:
| This is why we use xdelta3, rdiff, and git
| gizmodo59 wrote:
| It's funny to see benchmarks where they omit the
| top-performing models like o3 (which is the best model in
| many benchmarks currently) and Gemini Pro/Claude 3.7.
| daveguy wrote:
| Those are much much larger models, and they are proprietary.
| Those model providers just don't have the distilled versions
| identified and available.
|
| Notice most of the models they are comparing with are 7B
| models. The exception is also an open weights model
| (Qwen-2.5-32B-RL-Zero). Even with 32B parameters the MiMo-7B
| outperforms it.
| vessenes wrote:
| Umm wow. Great benchmarks. I'm looking forward to chatting with
| this one.
|
| A couple things stand out to me -- first is that the 7B model is
| trained on 25T tokens(!). This is Meta-scale training; Llama 4
| Maverick was trained on 22T or so. (Scout, the smaller model:
| 40T).
|
| Second, this is an interesting path to take - not a distilled
| model or an RL layer to get reasoning out of another model, but a
| from-scratch RL model with reasoning baked in; the claims seem to
| indicate you get a lot of extra efficiency per-parameter doing
| this.
|
| I don't have experience with Xiaomi models, so I'm cautious about
| this one until I play with it, but it looks like a super viable
| local reasoning model from the stats.
| siliconc0w wrote:
| This is incredibly strong coding performance for a 7b. I use
| Gemini Pro 2.5 which got 67.8 and this got 57.8, very close to
| Gemini 2.5 Flash which got 60.6.
|
| I've become pretty skeptical about eval results given what we've
| heard about llama4 so we'll see where this lands on the closed
| evals but very impressive to see.
| Arcuru wrote:
| From the paper, I was intrigued by how they handled their RL step
| for Code Data. They trained against hard but solvable code
| generation tasks, using unit tests to compute rewards. Is
| that training step done by other models too?
|
| > Code Data For coding problems, we curate a high-quality
| training set comprising open-source datasets and our newly
| collected problem set. We remove problems without test cases. For
| problems with golden solutions, we exclude those where the golden
| solution failed to pass all test cases. For problems without
| golden solution, we discard problems where no test case can be
| solved in 16 rollouts of advanced reasoning models. Similar to
| math data, we utilize an SFT version of MiMo-7B to filter out
| easy problems that are perfectly solved in all 16 rollouts. This
| rigorous cleaning process yields 30K code problems.
|
| > During each RL iteration, we evaluate thousands of problems to
| compute the rewards, with each problem potentially containing
| hundreds of test cases. To improve reward computing efficiency
| and eliminate GPU idle time, we developed an online judge
| environment that enables parallel execution of extremely high-
| volume unit tests.
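|
| I don't know what their online judge actually looks like, but
| conceptually the reward is just "what fraction of the test
| cases does the rolled-out program pass". A toy sketch (all
| names and details here are made up for illustration, not
| their implementation):
|
|     import os
|     import subprocess
|     import tempfile
|
|     def code_reward(program: str,
|                     test_cases: list[tuple[str, str]],
|                     timeout_s: float = 5.0) -> float:
|         """Run the candidate program on each test case's stdin,
|         compare stdout with the expected output, and return the
|         pass rate in [0, 1] as the RL reward."""
|         with tempfile.NamedTemporaryFile(
|                 "w", suffix=".py", delete=False) as f:
|             f.write(program)
|             path = f.name
|         passed = 0
|         try:
|             for stdin_data, expected in test_cases:
|                 try:
|                     result = subprocess.run(
|                         ["python3", path], input=stdin_data,
|                         capture_output=True, text=True,
|                         timeout=timeout_s)
|                     if (result.returncode == 0 and
|                             result.stdout.strip() == expected.strip()):
|                         passed += 1
|                 except subprocess.TimeoutExpired:
|                     pass  # a timeout counts as a failed test case
|         finally:
|             os.unlink(path)
|         return passed / len(test_cases) if test_cases else 0.0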
___________________________________________________________________
(page generated 2025-04-30 23:00 UTC)