[HN Gopher] Kimi K2 is a state-of-the-art mixture-of-experts (Mo...
       ___________________________________________________________________
        
       Kimi K2 is a state-of-the-art mixture-of-experts (MoE) language
       model
        
       GitHub: https://github.com/MoonshotAI/Kimi-K2
        
       Author : c4pt0r
       Score  : 289 points
       Date   : 2025-07-11 15:38 UTC (2 days ago)
        
 (HTM) web link (twitter.com)
 (TXT) w3m dump (twitter.com)
        
       | gs17 wrote:
       | > 1T total / 32B active MoE model
       | 
       | Is this the largest open-weight model?
        
         | bigeagle wrote:
         | I believe so.
         | 
          | Grok-1 is 314B, DeepSeek-v3 is 671B, and recent open-weights
          | models are around 70B~300B.
        
         | adt wrote:
         | No.
         | 
         | At 1T MoE on 15.5T tokens, K2 is one of the largest open source
          | models to date. But BAAI's Tele-FLM is 1T dense on 15.7T tokens:
         | https://huggingface.co/CofeAI/Tele-FLM-1T
         | 
          | You can always check here:
          | https://lifearchitect.ai/models-table/
        
       | simonw wrote:
       | Big release - https://huggingface.co/moonshotai/Kimi-K2-Instruct
       | model weights are 958.52 GB
        
         | c4pt0r wrote:
         | Paired with programming tools like Claude Code, it could be a
         | low-cost/open-source replacement for Sonnet
        
           | kkzz99 wrote:
            | According to the benchmarks it's closer to Opus, but I'd
            | venture that's primarily for English and Chinese.
        
           | martin_ wrote:
           | how do you low cost run a 1T param model?
        
             | maven29 wrote:
             | 32B active parameters with a single shared expert.
        
               | JustFinishedBSG wrote:
               | This doesn't change the VRAM usage, only the compute
               | requirements.
        
               | maven29 wrote:
               | You can probably run this on CPU if you have a 4090D for
               | prompt processing, since 1TB of DDR4 only comes out to
               | around $600.
               | 
               | For GPU inference at scale, I think token-level batching
               | is used.
        
               | t1amat wrote:
               | With 32B active parameters it would be ridiculously slow
               | at generation.
        
               | selfhoster11 wrote:
               | DDR3 workstation here - R1 generates at 1 token per
               | second. In practice, this means that for complex queries,
               | the speed of replying is closer to an email response than
               | a chat message, but this is acceptable to me for
               | confidential queries or queries where I need the model to
               | be steerable. I can always hit the R1 API from a provider
               | instead, if I want to.
               | 
               | Given that R1 uses 37B active parameters (compared to 32B
               | for K2), K2 should be slightly faster than that - around
               | 1.15 tokens/second.
        
               | CamperBob2 wrote:
               | That's pretty good. Are you running the real 600B+
               | parameter R1, or a distill, though?
        
               | zackangelo wrote:
               | Typically a combination of expert level parallelism and
               | tensor level parallelism is used.
               | 
               | For the big MLP tensors they would be split across GPUs
               | in a cluster. Then for the MoE parts you would spread the
               | experts across the GPUs and route to them based on which
               | experts are active (there would likely be more than one
               | if the batch size is > 1).
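                | 
                | A rough sketch of the expert-parallel half of that
                | (illustrative Python only; the expert count, the random
                | stand-in for the learned router, and the device layout are
                | made up, not Moonshot's actual code):
                | 
                |     import random
                | 
                |     NUM_EXPERTS = 8   # real MoE models have far more experts
                |     NUM_GPUS = 4
                |     TOP_K = 2         # experts activated per token
                | 
                |     # Expert parallelism: spread the experts across GPUs.
                |     expert_to_gpu = {e: e % NUM_GPUS
                |                      for e in range(NUM_EXPERTS)}
                | 
                |     def route(token_batch):
                |         """Pick TOP_K experts per token, group work by GPU."""
                |         work = {gpu: [] for gpu in range(NUM_GPUS)}
                |         for token in token_batch:
                |             # stand-in for the learned router
                |             experts = random.sample(range(NUM_EXPERTS), TOP_K)
                |             for e in experts:
                |                 work[expert_to_gpu[e]].append((token, e))
                |         return work
                | 
                |     # With batch size > 1, several experts are active at
                |     # once, so most GPUs get some tokens to process.
                |     print(route(["tok0", "tok1", "tok2", "tok3"]))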
        
               | selfhoster11 wrote:
               | It does not have to be VRAM, it could be system RAM, or
               | weights streamed from SSD storage. Reportedly, the latter
               | method achieves around 1 token per second on computers
               | with 64 GB of system RAM.
               | 
               | R1 (and K2) is MoE, whereas Llama 3 is a dense model
               | family. MoE actually makes these models practical to run
               | on cheaper hardware. DeepSeek R1 is more comfortable for
               | me than Llama 3 70B for exactly that reason - if it
               | spills out of the GPU, you take a large performance hit.
               | 
                | If you need to spill into CPU inference, you'd much rather
                | be multiplying a different 32B subset of the weights for
                | each token than the same 70B (or more) every time, simply
                | because the computation takes so long.
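                | 
                | Back-of-the-envelope numbers for that claim (the bandwidth
                | and quantization figures below are assumptions, just to
                | illustrate why fewer active weights per token matter):
                | 
                |     # CPU generation is mostly memory-bandwidth bound, so
                |     # tokens/sec ~= bandwidth / bytes read per token.
                |     bandwidth_gb_s = 50    # assumed DDR4/DDR5 system figure
                |     bytes_per_param = 1    # assumed 8-bit quantization
                | 
                |     moe_active = 32e9 * bytes_per_param  # ~32 GB per token
                |     dense_70b = 70e9 * bytes_per_param   # ~70 GB per token
                | 
                |     print(bandwidth_gb_s / (moe_active / 1e9))  # ~1.6 tok/s
                |     print(bandwidth_gb_s / (dense_70b / 1e9))   # ~0.7 tok/s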
        
               | refulgentis wrote:
                | The number of people who will be using it at 1 token/sec
                | because there's no better option, _and_ who have 64 GB of
                | RAM, is _vanishingly_ small.
               | 
               | IMHO it sets the local LLM community back when we lean on
               | extreme quantization & streaming weights from disk to say
               | something is possible*, because when people try it out,
               | it turns out it's an awful experience.
               | 
               | * the implication being, _anything_ is possible in that
               | scenario
        
               | homarp wrote:
                | An agentic loop can run all night long. It's just a
                | different way to work: prepare your prompt queue, set it
                | up, check the results in the morning, adjust. 'Local vibe'
                | in 10 h instead of 10 min is still better than 10 days of
                | manual side coding.
        
               | hereme888 wrote:
               | Right on! Especially if its coding abilities are better
               | than Claude 4 Opus. I spent thousands on my PC in
               | anticipation of this rather than to play fancy video
               | games.
               | 
               | Now, where's that spare SSD...
        
               | selfhoster11 wrote:
               | Good. Vanishingly small is still more than zero. Over
               | time, running such models will become easier too, as
               | people slowly upgrade to better hardware. It's not like
               | there aren't options for the compute-constrained either.
               | There are lots of Chinese models in the 3-32B range, and
               | Gemma 3 is particularly good too.
               | 
               | I will also point out that having three API-based
               | providers deploying an impractically-large open-weights
                | model beats the pants off having just one. Back in the
               | day, this was called second-sourcing IIRC. With
               | proprietary models, you're at the mercy of one
               | corporation and their Kafkaesque ToS enforcement.
        
               | refulgentis wrote:
               | You said "Good." then wrote a nice stirring bit about how
               | having a bad experience with a 1T model will force people
               | to try 4B/32B models.
               | 
               | That seems separate from the post it was replying to,
               | about 1T param models.
               | 
               | If it is intended to be a reply, it hand waves about how
               | having a bad experience with it will teach them to buy
               | more expensive hardware.
               | 
               | Is that "Good."?
               | 
               | The post points out that if people are taught they need
               | an expensive computer to get 1 token/second, much less
               | try it and find out it's a horrible experience (let's
                | talk about prefill), it will turn them off local LLMs
                | unnecessarily.
               | 
               | Is that "Good."?
        
               | jimjimwii wrote:
               | Had you posted this comment in the early 90s about linux
               | instead of local models, it would have made about the
               | same amount of sense but aged just as poorly as this
               | comment will.
               | 
                | I'll remain here, happily using my 2-something tokens per
                | second model.
        
       | cyanf wrote:
       | This is both the largest oss model release thus far, and the
       | largest Muon training run.
        
       | wiradikusuma wrote:
       | I've only started using Claude, Gemini, etc in the last few
       | months (I guess it comes with age, I'm no longer interested in
       | trying the latest "tech"). I assume those are "non-agentic"
       | models.
       | 
       | From reading articles online, "agentic" means like you have a
       | "virtual" Virtual Assistant with "hands" that can google, open
       | apps, etc, on their own.
       | 
       | Why not use existing "non-agentic" model and "orchestrate" them
       | using LangChain, MCP etc? Why create a new breed of model?
       | 
       | I'm sorry if my questions sound silly. Following AI world is like
       | following JavaScript world.
        
         | ozten wrote:
         | It is not a silly question. The various flavors of LLM have
          | issues with reliability. In software we expect five 9s; LLMs
          | aren't even at one 9. Early on it was reliability of them
         | writing JSON output. Then instruction following. Then tool use.
         | Now it's "computer use" and orchestration.
         | 
         | Creating models for this specific problem domain will have a
         | better chance at reliability, which is not a solved problem.
         | 
          | Jules is the Gemini coding agent that links to GitHub. Half the
          | time it doesn't create a pull request, and it forgets and
          | assumes I'll do some testing or something. It's wild.
        
         | simonw wrote:
         | "Agentic" and "agent" can mean pretty much anything, there are
         | a ton of different definitions out there.
         | 
         | When an LLM says it's "agentic" it usually means that it's been
         | optimized for tool use. Pretty much _all_ the big models (and
          | most of the small ones) are designed for tool use these days;
          | it's an incredibly valuable feature for a model to offer.
         | 
         | I don't think this new model is any more "agentic" than o3,
         | o4-mini, Gemini 2.5 or Claude 4. All of those models are
         | trained for tools, all of them are very competent at running
         | tool calls in a loop to try to achieve a goal they have been
         | given.
        
         | dcre wrote:
         | Reasonable question, simple answer: "New breed of model" is
         | overstating it -- all these models for years have been fine-
         | tuned using reinforcement learning on a variety of tasks, it's
         | just that the set of tasks (and maybe the amount of RL) has
         | changed over time to include more tool use tasks, and this has
         | made them much, much better at the latter. The explosion of
         | tools like Claude Code this year is driven by the models just
         | being more effective at it. The orchestration external to the
         | model you mention is what people did before this year and it
         | did not work as well.
        
         | selfhoster11 wrote:
         | > I'm sorry if my questions sound silly. Following AI world is
         | like following JavaScript world.
         | 
         | You are more right than you could possibly imagine.
         | 
         | TL;DR: "agentic" just means "can call tools it's been given
         | access to, autonomously, and then access the output" combined
         | with an infinite loop in which the model runs over and over
         | (compared to a one-off interaction like you'd see in ChatGPT).
         | MCP is essentially one of the methods to expose the tools to
         | the model.
         | 
         | Is this something the models could do for a long while with a
         | wrapper? Yup. "Agentic" is the current term for it, that's all.
         | There's some hype around "agentic AI" that's unwarranted, but
         | part of the reason for the hype is that models have become
         | better at tool calling and using data in their context since
         | the early days.
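          | 
          | A minimal sketch of that loop (illustrative Python; `call_model`
          | is a toy stand-in for a real LLM API, and the tool table is made
          | up, so this only shows the shape of the loop):
          | 
          |     # Toy stand-in for an LLM call: asks for a tool once,
          |     # then answers using the tool output in its context.
          |     def call_model(messages):
          |         if not any(m["role"] == "tool" for m in messages):
          |             return {"tool": "web_search",
          |                     "arguments": "Kimi K2", "content": None}
          |         return {"tool": None,
          |                 "content": "Done: " + messages[-1]["content"]}
          | 
          |     TOOLS = {
          |         "web_search": lambda query: f"(results for {query!r})",
          |     }
          | 
          |     def run_agent(task, max_steps=10):
          |         messages = [{"role": "user", "content": task}]
          |         for _ in range(max_steps):
          |             reply = call_model(messages)
          |             if reply["tool"] is None:   # plain answer: done
          |                 return reply["content"]
          |             # Run the tool the model asked for, feed it back.
          |             result = TOOLS[reply["tool"]](reply["arguments"])
          |             messages.append({"role": "tool", "content": result})
          |         return "(gave up after max_steps)"
          | 
          |     print(run_agent("summarize the Kimi K2 release"))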
        
       | simonw wrote:
       | Pelican on a bicycle result:
       | https://simonwillison.net/2025/Jul/11/kimi-k2/
        
         | _alex_ wrote:
         | wow!
        
         | ebiester wrote:
         | At this point, they _have_ to be training it. At what point
         | will you start using something else?
        
           | simonw wrote:
           | Once I get a picture that genuinely looks like a pelican
           | riding a bicycle!
        
         | qmmmur wrote:
         | I'm glad we are looking to build nuclear reactors so we can do
         | more of this...
        
           | sergiotapia wrote:
            | Me too - we must energymaxx. I want a nuclear reactor in my
            | backyard powering everything. I want AC units in every room
            | and in my open-door garage while I work out.
        
             | GenerWork wrote:
             | You're saying this in jest, but I would LOVE to have a
             | nuclear reactor in my backyard that produced enough power
             | to where I could have a minisplit for every room in my
             | house, including the garage so I could work out in there.
        
               | CaptainFever wrote:
               | Related: https://en.wikipedia.org/wiki/Kardashev_scale
               | 
                | > The Kardashev scale (Russian: шкала Кардашёва, romanized:
                | shkala Kardashyova) is a method of measuring a
               | civilization's level of technological advancement based
               | on the amount of energy it is capable of harnessing and
               | using.
               | 
               | > Under this scale, the sum of human civilization does
               | not reach Type I status, though it continues to approach
               | it.
        
               | sergiotapia wrote:
               | I am not joking
        
           | 1vuio0pswjnm7 wrote:
           | "I'm glad we are looking to build nuclear reactors so we can
           | do more of this..."
           | 
           | Does this actually mean "they" not "we"
        
         | csomar wrote:
         | Much better than that of Grok 4.
        
         | jug wrote:
         | That's perhaps the best one I've seen yet! For an open weight
         | model, this performance is of course particularly remarkable
         | and impactful.
        
       | MaxPock wrote:
       | Would be hilarious if Zuck with his billion dollar poaching
       | failed to beat budget Chinese models.
        
         | physix wrote:
         | That reminds me of a thought I had about the poachings.
         | 
         | The poaching was probably more aimed at hamstringing Meta's
         | competition.
         | 
         | Because the disruption caused by them leaving in droves is
         | probably more severe than the benefits of having them on board.
         | Unless they are gods, of course.
        
           | stogot wrote:
           | I thought that too
        
         | rfoo wrote:
          | Wikipedia listed a FAIR alumnus as a cofounder of this "Moonshot
          | AI". Makes it funnier, probably.
        
         | jug wrote:
          | I can't tell if Kimi is quite top tier, but since Llama 4
          | performed so poorly, then yes, this did in fact just happen.
        
       | aliljet wrote:
       | If the SWE Bench results are to be believed... this looks best in
       | class right now for a local LLM. To be fair, show me the guy who
       | is running this locally...
        
         | selfhoster11 wrote:
         | It's challenging, but not impossible. With 2-bit quantisation,
         | only about 250-ish gigabytes of RAM is required. It doesn't
         | have to be VRAM either, and you can mix and match GPU+CPU
         | inference.
         | 
         | In addition, some people on /r/localLlama are having success
         | with streaming the weights off SSD storage at 1 token/second,
         | which is about the rate I get for DeepSeek R1.
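          | 
          | The arithmetic behind that figure, roughly (the bit width is an
          | assumption; real quants add overhead for scales and a few
          | higher-precision layers, which pushes the total up):
          | 
          |     params = 1.0e12          # ~1 trillion weights
          |     bits_per_weight = 2      # aggressive 2-bit quantization
          |     print(params * bits_per_weight / 8 / 1e9)   # 250.0 GB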
        
       | helloericsf wrote:
       | How does it stack up against the new Grok 4 model?
        
       | Imustaskforhelp wrote:
       | I really really want to try this model for free since I just
       | don't have a gpu.
       | 
       | Is there any way that I could do so?
       | 
       | Open Router? Or does kimi have their own website? Just curious to
       | really try it out!
        
         | blahgeek wrote:
         | Kimi.com
        
       | Alifatisk wrote:
        | Quite impressive benchmarks. How come I don't see Kimi in the
        | Artificial Analysis benchmarks?
        
       | viraptor wrote:
       | How well separated are experts per domain in a model like that?
       | Specifically, if I'm interested in a programming use only, could
       | we possibly strip it to one or two of them? Or should I assume a
       | much wider spread? (And there would be some overlap anyway from
       | the original root model)
        
         | orbital-decay wrote:
          | Inseparable; routing is done per token in a statistically
          | optimal way, not per request on a knowledge-domain basis.
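          | 
          | What per-token routing looks like, as a sketch (illustrative
          | Python with made-up sizes, not the actual Kimi router):
          | 
          |     import numpy as np
          | 
          |     hidden_dim, num_experts, top_k = 16, 8, 2
          |     rng = np.random.default_rng(0)
          |     # The router is a learned projection of the token's
          |     # hidden state, not anything topic-aware.
          |     router_w = rng.normal(size=(hidden_dim, num_experts))
          | 
          |     def route_token(hidden_state):
          |         logits = hidden_state @ router_w
          |         chosen = np.argsort(logits)[-top_k:]  # top-k experts
          |         gates = np.exp(logits[chosen])
          |         gates /= gates.sum()                  # mixing weights
          |         return chosen, gates
          | 
          |     print(route_token(rng.normal(size=hidden_dim)))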
        
           | viraptor wrote:
            | Sure, it's done per token, but the question is: how much do
            | the knowledge domains match up with the experts? I could not
            | find hard data on this.
        
             | boroboro4 wrote:
              | Check out the DeepSeek v3 paper. They changed the way they
              | train experts (they went from an auxiliary loss to a
              | different kind of expert-separation training). It did
              | improve expert domain specialization; they have neat
              | graphics on it in the paper.
        
         | renonce wrote:
         | My experience is that experts are not separated in any
         | intuitive way. I would be very interested (and surprised) if
         | someone manages to prune a majority of experts in a way that
         | preserves model capabilities in a specific domain but not
         | others.
         | 
          | See https://github.com/peteryuqin/Kimi-K2-Mini, a project that
          | keeps a small portion of the experts and layers while keeping
          | the model's capabilities across multiple domains.
        
           | viraptor wrote:
           | Sounds like dumping the routing information from programming
           | questions would answer that... I guess I can do a dump from
            | qwen or deepseek locally. You'd think someone would have
            | created that kind of graph already, but I couldn't find one.
           | 
           | What I did find instead is that some MoE models are
           | explicitly domain-routed (MoDEM), but it doesn't apply to
           | deepseek which is just equally load balanced, so it's
           | unlikely to apply to Kimi. On the other hand,
           | https://arxiv.org/html/2505.21079v1 shows modality
           | preferences between experts, even in mostly random training.
           | So maybe there's something there.
        
       | brcmthrowaway wrote:
        | Is Kimi the new DeepSeek?
        
         | Alifatisk wrote:
          | It kinda feels like it, but Moonshot's releases have been like
          | this before as well; it's just that this new release got way
          | more attention than usual. When they released Kimi k1.5, those
          | benchmarks were impressive at the time! But everyone was busy
          | with DeepSeek v3 and QwQ-32B.
        
       | ozgune wrote:
        | This is a very impressive general-purpose LLM (in the GPT-4o /
        | DeepSeek-V3 family). It's also open source.
       | 
       | I think it hasn't received much attention because the frontier
       | shifted to reasoning and multi-modal AI models. In accuracy
       | benchmarks, all the top models are reasoning ones:
       | 
       | https://artificialanalysis.ai/
       | 
       | If someone took Kimi k2 and trained a reasoning model with it,
       | I'd be curious how that model performs.
        
         | GaggiX wrote:
         | >If someone took Kimi k2 and trained a reasoning model with it
         | 
          | I imagine that's what they are doing at MoonshotAI right now
        
         | Alifatisk wrote:
          | Why haven't Kimi's current and older models been benchmarked
          | and added to Artificial Analysis yet?
        
       | awestroke wrote:
       | This is the model release that made Sam Altman go "Oh wait
       | actually we can't release the new open source model this week,
       | sorry. Something something security concerns".
       | 
       | Perhaps their open source model release doesn't look so good
       | compared to this one
        
       | data_maan wrote:
       | "Open source" lol
       | 
       | Open-weight. As usual, you don't get the dataset, training
       | scripts, etc.
        
         | mistercheph wrote:
          | Won't happen under the current copyright regime. It is
          | impossible to train a SOTA model without copyrighted text; how
          | do you propose distributing that?
        
           | irthomasthomas wrote:
           | List the titles.
        
             | mixel wrote:
              | But they probably don't have the rights to actually train
              | on them, and that's why they do not publish the list.
              | Otherwise it may just be laziness, who knows.
        
           | msk-lywenn wrote:
           | Bibtex
        
         | CaptainFever wrote:
         | It's not even open-weight. It's weight-available. It uses a
         | "modified MIT license":                   Modified MIT License
         | Copyright (c) 2025 Moonshot AI                  Permission is
         | hereby granted, free of charge, to any person obtaining a copy
         | of this software and associated documentation files (the
         | "Software"), to deal         in the Software without
         | restriction, including without limitation the rights         to
         | use, copy, modify, merge, publish, distribute, sublicense,
         | and/or sell         copies of the Software, and to permit
         | persons to whom the Software is         furnished to do so,
         | subject to the following conditions:                  The above
         | copyright notice and this permission notice shall be included
         | in all         copies or substantial portions of the Software.
         | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
         | EXPRESS OR         IMPLIED, INCLUDING BUT NOT LIMITED TO THE
         | WARRANTIES OF MERCHANTABILITY,         FITNESS FOR A PARTICULAR
         | PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
         | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES
         | OR OTHER         LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
         | TORT OR OTHERWISE, ARISING FROM,         OUT OF OR IN
         | CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
         | THE         SOFTWARE.                  Our only modification
         | part is that, if the Software (or any derivative works
         | thereof) is used for any of your commercial products or
         | services that have         more than 100 million monthly active
         | users, or more than 20 million US dollars         (or
         | equivalent in other currencies) in monthly revenue, you shall
         | prominently         display "Kimi K2" on the user interface of
         | such product or service.
        
           | mitthrowaway2 wrote:
           | This seems significantly more permissive than GPL. I think
           | it's reasonable to consider it open-weight.
        
           | MallocVoidstar wrote:
           | 4-clause BSD is considered open source by Debian and the FSF
           | and has a similar requirement.
        
           | weitendorf wrote:
           | So "MIT with attribution" (but only for huge commercial use
           | cases making tons of money off the product) is not open-
           | weight? Do you consider CC BY photos on Wikipedia to be Image
           | Available or GPL licensed software to be code-available too?
           | 
           | Tangent: I don't understand the contingent that gets upset
           | about open LLMs not shipping with their full training regimes
           | or source data. The software a company spent hundreds of
           | millions of dollars creating, which you are now free to use
           | and distribute with essentially no restrictions, is open
           | source. It has weights in it, and a bunch of related software
           | for actually running a model with those weights. How dare
           | they!
        
         | spookie wrote:
         | We really need to stop diluting the meaning of open source
        
       | data_maan wrote:
       | Open source" lol
       | 
       | It's open-weight. As usual, you don't get the dataset, training
       | scripts, etc.
        
       | vessenes wrote:
       | I tried Kimi on a few coding problems that Claude was spinning
       | on. It's good. It's huge, way too big to be a "local" model -- I
       | think you need something like 16 H200s to run it - but it has a
       | slightly different vibe than some of the other models. I liked
       | it. It would definitely be useful in ensemble use cases at the
       | very least.
        
         | summarity wrote:
         | Reasonable speeds are possible with 4bit quants on 2 512GB Mac
         | Studios (MLX TB4 Ring - see
         | https://x.com/awnihannun/status/1943723599971443134) or even a
         | single socket Epyc system with >1TB of RAM (about the same real
         | world memory throughput as the M Ultra). So $20k-ish to play
         | with it.
         | 
         | For real-world speeds though yeah, you'd need serious hardware.
         | This is more of a "deploy your own stamp" model, less a "local"
         | model.
        
           | refulgentis wrote:
           | I write a local LLM client, but sometimes, I hate that local
           | models have enough knobs to turn that people can advocate
            | they're reasonable in _any_ scenario - in yesterday's post
           | re: Kimi k2, multiple people spoke up that you can "just"
           | stream the active expert weights out of 64 GB of RAM, and use
           | the lowest GGUF quant, and then you get something that rounds
           | to 1 token/s, and that is reasonable for use.
           | 
           | Good on you for not exaggerating.
           | 
           | I am very curious what exactly they see in that, 2-3 people
           | hopped in to handwave that you just have it do agent stuff
           | overnight and it's well worth it. I can't even begin to
           | imagine unless you have a metric **-ton of easily solved
            | problems that aren't coding. Even a 90% success rate gets you
            | into "useless" territory quickly when one step depends on the
            | other and you're running it autonomously for hours.
        
             | segmondy wrote:
             | I do deepseek at 5tk/sec at home and I'm happy with it. I
             | don't need to do agent stuff to gain from it, I was saving
             | to eventually build out enough to run it at 10tk/sec, but
             | with kimi k2, plan has changed and the savings continue
             | with a goal to run it at 5 tk/sec at home.
        
               | fzzzy wrote:
               | I agree, 5 tokens per second is plenty fast for casual
               | use.
        
               | refulgentis wrote:
               | Cosign for chat, that's my bar for usable on mobile phone
               | (and correlates well with avg. reading speed)
        
               | overfeed wrote:
               | Also works perfectly fine in fire-and-forget, non-
               | interactive agentic workflows. My dream scenario is that
               | I create a bunch of kanban tickets and assign them to one
               | or more AI personas[1], and wake up to some Pull Requests
                | the next morning. I'd be more concerned about tickets per
                | day than tk/s, as I have no interest in watching the
                | inner workings of the model.
               | 
               | 1. Some more creative than others, with slightly
               | different injected prompts or perhaps even different
               | models entirely.
        
               | numpad0 wrote:
               | > I create a bunch of kanban tickets and assign them to
               | one or more AI personas[1],
               | 
                | Yeah, that. Why can't we just `find ./tasks/ | grep \\.md$
                | | xargs llm`? Can't we just write up a government-proposal-
                | style document, have the LLM recurse down into
                | sub-sub-projects and back up until the original proposal
                | document can be translated into a completion report?
               | Constantly correcting a humongous LLM with infinite
               | context length that can keep everything in its head
               | doesn't feel like the right approach.
        
               | londons_explore wrote:
               | In my experience, this sort of thing _nearly_ works...
               | But never quite works well enough and errors and
               | misunderstandings build at every stage and the output is
               | garbage.
               | 
               | Maybe with bigger models it'll work well.
        
               | SV_BubbleTime wrote:
               | It was, last year 5tk/s was reasonable. If you wanted to
               | proof read a paragraph or rewrite some bullet points into
               | a PowerPoint slide.
               | 
               | Now, with agentic coding, thinking models, a "chat with
               | my pdf" or whatever artifacts are being called now, no, I
               | don't think 5/s is enough.
        
           | gpm wrote:
           | > or even a single socket Epyc system with >1TB of RAM
           | 
           | How many tokens/second would this likely achieve?
        
             | neuroelectron wrote:
             | 1
        
             | kachapopopow wrote:
             | around 1 by the time you try to do anything useful with it
             | (>10000 tokens)
        
           | tuananh wrote:
            | Looks very usable for local use.
        
           | wongarsu wrote:
           | Reasonable speeds are possible if you pay someone else to run
           | it. Right now both NovitaAI and Parasail are running it, both
           | available through Openrouter and both promising not to store
           | any data. I'm sure the other big model hosters will follow if
           | there's demand.
           | 
           | I may not be able to reasonably run it myself, but at least I
           | can choose who I trust to run it and can have inference
           | pricing determined by a competitive market. According to
            | their benchmarks the model is roughly in a class with Claude 4
            | Sonnet, yet it already costs less than one third of Sonnet's
            | inference pricing.
        
             | winter_blue wrote:
             | I'm actually finding Claude 4 Sonnet's thinking model to be
             | too slow to meet my needs. It literally takes several
             | minutes per query on Cursor.
             | 
             | So running it locally is the exact opposite of what I'm
             | looking for.
             | 
             | Rather, I'm willing to pay more, to have it be run on a
             | faster than normal cloud inference machine.
             | 
             | Anthropic is already too slow.
             | 
             | Since this model is open source, maybe someone could offer
             | it at a "premium" pay per use price, where the response
             | rate / inference is done a lot faster, with more resources
             | thrown at it.
        
               | terhechte wrote:
               | Anthropic isn't slow. I'm running Claude Max and it's
               | pretty fast. The problem is that Cursor slowed down their
               | responses in order to optimize their costs. At least a
               | ton of people are experiencing this.
        
               | satvikpendem wrote:
               | > It literally takes several minutes per query on
               | _Cursor._
               | 
               | There's your issue. Use Claude Code or the API directly
               | and compare the speeds. Cursor is slowing down requests
               | to maintain costs.
        
           | spaceman_2020 wrote:
           | This is fairly affordable if you're a business honestly
        
         | moffkalast wrote:
         | Still pretty good, someone with enough resources could distil
         | it down to a more manageable size for the rest of us.
        
         | handzhiev wrote:
         | I tried it a couple of times in comparison to Claude. Kimi
         | wrote much simpler and more readable code than Claude's over-
         | engineered solutions. It missed a few minor subtle edge cases
         | that Claude took care of though.
        
         | airstrike wrote:
         | Claude what? Sonnet? 3.7? 3.5? Opus? 4?
        
         | nathan_compton wrote:
          | The first question I gave it (a fairly simple recreational
          | math question I asked it to code up for me) it got
          | outrageously wrong. In fairness, and to my surprise, OpenAI's
          | model also failed at this task, although with some prompting
          | it sort of got it.
        
         | Xmd5a wrote:
          | I asked it for its opinion on an email I'm writing. 95% of its
          | content is quotes from famous authors, and the 5% I wrote is
          | really just minimal glue in between.
          | 
          | All the other models I tested (Sonnet 4, DeepSeek R1, 4o and
          | Gemini 2.5) understand this isn't a normal email and that what
          | I'm asking for is literary/philosophical criticism, not remarks
          | about conventions, formatting, or how to convey my message in a
          | more impactful way.
          | 
          |     Quick fix:
          |     - Replace half the commas with periods.
          |     - Let one metaphor bloom and then stop.
          |     - Read it aloud; wherever you inhale, insert a line break.
          | 
          |     Before:
          |     > And if the collective imagination is so seized by it, it's
          |     > because we are already--not in some other world--but in
          |     > this very life, in a state of socio-, photo-, and
          |     > videosynthesis.
          | 
          |     After:
          |     > The collective imagination is seized.
          |     > We are not in another world.
          |     > We are in this life, in a state of videosynthesis.
         | 
         | Yes, this quote is by Baudrillard. None of the other models
         | fixated on the fact that it's an email (I only used the word
         | once in the prompt). My gut feeling is that this reflects not
         | so much a lack of intelligence as a difference in model
         | personality. Here's what it replied when I shared Gemini's
         | analysis. The point was to have it understand that:
          |     Gemini, second sentence: It is less a simple email than an
          |     intellectual and personal exploration still in the making.
          | 
          | Point deflected as soon as understood:
          | 
          |     **Gemini is polite and pedagogical; I'm going to be blunt.**
          | 
          |     1. **This isn't an email, it's a trip.** ...
          |     2. **The content is good, the format is not.** ...
          |     3. **Gemini is right about one thing:** ...
          |        * cut 80% of the material,
          |        * ask **one clear question**
          |        * put the rest in a separate Google Doc or PDF for
          |          posterity.
          |     4. **Brutal summary:** Keep your draft for yourself -- it's
          |        valuable. But for the email, write a new one in **three
          |        sentences**:
          |        * thank them,
          |        * recall the context in one line,
          |        * ask the question.
          | 
          |     You've already got Gemini's answer to stroke your ego -- now
          |     give me the short version you're actually going to send.
         | 
         | The solution to sycophancy is not disparagement (misplaced
         | criticism). The classical true/false positive/negative dilemma
         | is at play here. I guess the bot got caught in the crossfire of
          | 1°) its no-bullshit attitude (it can only be an attitude),
          | 2°) a preference for delivering blunt criticism over insincere
          | flattery, and 3°) being a helpful assistant. Remove point 3°),
          | and it could have replied: "I'm not engaging in this nonsense".
          | Preserve it and it will politely suggest that you condense your
          | bullshit text, because shorter explanations are better than
          | long-winded rants (it's probably in the prompt).
        
       | jug wrote:
       | I like new, solid non-reasoning models that push the frontier.
       | These still have nice use cases (basically anything where logic
       | puzzles or STEM subjects don't apply) where you don't want to
       | spend cash on reasoning tokens.
        
       | fzysingularity wrote:
       | If I had to guess, the OpenAI open-source model got delayed
       | because Kimi K2 stole their thunder and beat their numbers.
        
         | tempaccount420 wrote:
         | Time to RL the hell out of it so it looks better on
         | benchmarks... It's going to be fried.
        
         | irthomasthomas wrote:
          | Someone at OpenAI did say it was too big to host at home, so
          | you could be right. They are probably benchmaxxing right now,
          | searching for a few evals they can beat.
        
           | johnb231 wrote:
           | These are all "too big to host at home". I don't think that
           | is the issue here.
           | 
           | https://github.com/MoonshotAI/Kimi-K2/blob/main/docs/deploy_.
           | ..
           | 
           | "The smallest deployment unit for Kimi-K2 FP8 weights with
           | 128k seqlen on mainstream H200 or H20 platform is a cluster
           | with 16 GPUs with either Tensor Parallel (TP) or "data
           | parallel + expert parallel" (DP+EP)."
           | 
           | 16 GPUs costing ~$30k each. No one is running a ~$500k server
           | at home.
        
             | pxc wrote:
             | I think what GP means is that because the (hopefully)
             | pending OpenAI release is also "too big to run at home",
             | these two models may be close enough in size that they seem
             | more directly comparable, meaning that it's even more
             | important for OpenAI to outperform Kimi K2 on some key
             | benchmarks.
        
             | ls612 wrote:
             | This is a dumb question I know, but how expensive is model
             | distillation? How much training hardware do you need to
             | take something like this and create a 7B and 12B version
             | for consumer hardware?
        
               | johnb231 wrote:
               | The process involves running the original model. You can
               | rent these big GPUs for ~$10 per hour, so that is ~$160
               | per hour for as long as it takes
        
               | qeternity wrote:
               | You can rent H100s for $1.50/gpu/hr these days.
        
             | weitendorf wrote:
             | For most people, before it makes sense to just buy all the
             | hardware yourself, you probably should be renting GPUs by
             | the hour from the various providers serving that need. On
             | Modal, I think should cost about $72/hr to serve Kimi K2
             | https://modal.com/pricing
             | 
             | Once that's running it can serve the needs of many
             | users/clients simultaneously. It'd be too expensive and
             | underutilized for almost any individual to use regularly,
             | but it's not unreasonable for them to do it in short
             | intervals just to play around with it. And it might
             | actually be reasonable for a small number of students or
             | coworkers to share a $70/hr deployment for ~40hr/week in a
             | lot of cases; in other cases, that $70/hr expense could be
             | shared across a large number of coworkers or product users
             | if they use it somewhat infrequently.
             | 
             | So maybe you won't host it at home, but it's actually quite
             | feasible to self-host, and is it ever really worth
             | physically hosting anything at home except as a hobby?
        
             | spaceman_2020 wrote:
             | The real users for these open source models are businesses
             | that want something on premises for data privacy reasons
             | 
             | Not sure if they'll trust a Chinese model but dropping
             | $50-100k for a quantized model that replaces, say, 10
             | paralegals is good enough for a law firm
        
               | MaxPock wrote:
                | An on-premises, open source Chinese model for my business,
                | or a closed source American model from a company that's a
                | defense contractor. Shouldn't be too difficult a decision
                | to make.
        
         | cubefox wrote:
         | According to the benchmarks, Kimi K2 beats GPT-4.1 in many
         | ways. So to "compete", OpenAI would have to release the GPT-4.1
         | weights, or a similar model. Which, I guess, they likely won't
         | do.
        
       | satvikpendem wrote:
        | This is not open source; they have a "modified MIT license" that
        | puts additional restrictions on users over a certain threshold:
        | 
        |     Our only modification part is that, if the Software (or any
        |     derivative works thereof) is used for any of your commercial
        |     products or services that have more than 100 million monthly
        |     active users, or more than 20 million US dollars (or
        |     equivalent in other currencies) in monthly revenue, you shall
        |     prominently display "Kimi K2" on the user interface of such
        |     product or service.
        
         | diggan wrote:
         | That seems like a combination of Llama's "prominently display
         | "Built with Llama"" and "greater than 700 million monthly
         | active users" terms but put into one and masquerading as
         | "slightly changed MIT".
        
           | mrob wrote:
           | The difference is it doesn't include Llama's usage
           | restrictions that disqualify it from being an Open Source
           | license.
        
         | kragen wrote:
         | I feel like those restrictions don't violate the OSD (or the
         | FSF's Free Software Definition, or Debian's); there are similar
         | restrictions in the GPLv2, the GPLv3, the 4-clause BSD license,
         | and so on. They just don't have user or revenue thresholds. The
         | GPLv2, for example, says:
         | 
         | > _c) If the modified program normally reads commands
         | interactively when run, you must cause it, when started running
         | for such interactive use in the most ordinary way, to print or
         | display an announcement including an appropriate copyright
         | notice and a notice that there is no warranty (or else, saying
         | that you provide a warranty) and that users may redistribute
         | the program under these conditions, and telling the user how to
         | view a copy of this License. (Exception: if the Program itself
         | is interactive but does not normally print such an
         | announcement, your work based on the Program is not required to
         | print an announcement.)_
         | 
         | And the 4-clause BSD license says:
         | 
         | > _3. All advertising materials mentioning features or use of
         | this software must display the following acknowledgement: This
         | product includes software developed by_ the organization.
         | 
         | Both of these licenses are not just non-controversially open-
         | source licenses; they're such central open-source licenses that
         | IIRC much of the debate on the adoption of the OSD was centered
         | on ensuring that they, or the more difficult Artistic license,
         | were not excluded.
         | 
         | It's sort of nonsense to talk about neural networks being "open
         | source" or "not open source", because there isn't source code
         | that they could be built from. The nearest equivalent would be
         | the training materials and training procedure, which isn't
         | provided, but running that is not very similar to
         | recompilation: it costs millions of dollars and doesn't produce
         | the same results every time.
         | 
         | But that's not a question about the _license_.
        
           | mindcrime wrote:
           | It may not violate the OSD, but I would still argue that this
           | license is a Bad Idea. Not because what they're trying to do
           | is inherently bad in any way, but simply because it's yet
           | another new, unknown, not-fully-understood license to deal
            | with. The fact that we're having this conversation
            | illustrates that very fact.
           | 
           | My personal feeling is that almost every project (I'll hedge
           | a little because life is complicated) should prefer an OSI
           | certified license and NOT make up their own license (even if
           | that new license is "just" a modification of an existing
           | license). License proliferation[1] is generally considered a
           | Bad Thing for good reason.
           | 
           | [1]: https://en.wikipedia.org/wiki/License_proliferation
        
             | wongarsu wrote:
             | Aren't most licenses "not fully understood" in any
             | reasonable legal sense? To my knowledge only the Artistic
             | License and the GPL have seen the inside of a court room.
             | And yet to this day nobody really knows how the GPL works
             | with languages that don't follow C's model of a compile and
             | a link step. And the boundaries of what's a derivative work
             | in the GPL are still mostly set by convention, not a legal
             | framework.
             | 
             | What makes us comfortable with the "traditional open source
             | licenses" is that people have been using them for decades
             | and nothing bad has happened. But that's mostly because
             | breaking an open source license is rarely litigated
              | against, not because we have some special knowledge of what
              | those licenses mean and how to abide by them.
        
               | mindcrime wrote:
               | _Aren 't most licenses "not fully understood" in any
               | reasonable legal sense?_
               | 
               | OK, fair enough. Pretend I said "not well understood"
               | instead. The point is, the long-standing, well known
               | licenses that have been around for decades are better
               | understood that some random "I made up my own thing"
               | license. And yes, some of that may be down to just norms
               | and conventions, and yes, not all of these licenses have
               | been tested in court. But I think most people would feel
               | more comfortable using an OSI approved license, and are
               | hesitant to foster the creation of even more licenses.
               | 
               | If nothing else, license proliferation is bad because of
               | the combinatorics of understanding license compatibility
               | issues. Every new license makes the number of
               | permutations that much bigger, and creates more unknown
               | situations.
        
             | user_7832 wrote:
             | I'm of the personal opinion that it's quite reasonable for
             | the creators to want attribution in case you manage to
             | build a "successful product" off their work. The fact that
             | it's a new or different license is a much smaller thing.
             | 
             | A lot of open source, copyleft things already have
              | attribution clauses. You're allowed commercial use of
             | someone else's work already, regardless of scale.
             | Attribution is a very benign ask.
        
               | mindcrime wrote:
               | I personally have no (or at least little) problem with
               | attribution. As you say, quite a few licenses have some
               | degree of attribution required. There's even a whole
               | dedicated (and OSI approved) license who's raison d'etre
               | is about attribution:
               | 
               | https://en.wikipedia.org/wiki/Common_Public_Attribution_L
               | ice...
               | 
               | What I'm saying, if I'm saying anything at all, is that
               | it might have been better to pick one of these existing
               | licenses that has some attribution requirement, rather
               | than adding to the license proliferation problem.
        
               | hnfong wrote:
               | You speak as if "license proliferation" is actually a
               | problem.
               | 
               | But is it really?
               | 
                | Sure, it may make some licenses incompatible with each
                | other, but that's basically equivalent to whining that
                | somebody released their code under the GPL and it can't be
                | used in a project that uses MIT...
               | 
               | And your argument that the terms are "less understood"
               | really doesn't matter. It's not like people know the
               | Common Public Attribution License in and out either. (I'm
               | going to argue that 99% devs don't even know the GPL
               | well.) Poor drafting could be an issue, but I don't think
               | this is the case here.
               | 
               | And on an ideological standpoint, I don't think people
               | should be shamed into releasing their code under terms
               | they aren't 100% comfortable with.
        
           | ensignavenger wrote:
           | The OSD does not allow for discrimination:
           | 
           | "The license must not discriminate against any person or
           | group of persons."
           | 
           | "The license must not restrict anyone from making use of the
           | program in a specific field of endeavor. For example, it may
           | not restrict the program from being used in a business, or
           | from being used for genetic research."
           | 
           | By having a clause that discriminates based on revenue, it
           | cannot be Open Source.
           | 
           | If they had required everyone to provide attribution in the
           | same manner, then we would have to examine the specifics of
           | the attribution requirement to determine if it is
           | compatible... but since they discriminate, it violates the
           | open source definition, and no further analysis is necessary.
        
             | sophiebits wrote:
             | This license with the custom clause seems equivalent to
             | dual-licensing the product under the following licenses
             | combined:
             | 
             | * Small companies may use it without attribution
             | 
             | * Anyone may use it with attribution
             | 
             | The first may not be OSI compatible, but if the second
             | license is then it's fair to call the offering open
             | weights, in the same way that dual-licensing software under
             | GPL and a commercial license is a type of open source.
             | 
             | Presumably the restriction on discrimination relates to
             | license terms which grant _no_ valid open source license to
             | some group of people.
        
         | moffkalast wrote:
         | That's basically less restrictive than OpenStreetMap.
        
         | echelon wrote:
         | > This is not open source
         | 
         | OSI purism is deleterious and has led to industry capture.
         | 
         | Non-viral open source is simply a license for hyperscalers to
         | take advantage. To co-opt offerings and make hundreds of
         | millions without giving anything back.
         | 
         | We need more "fair source" licensing to support sustainable
         | engineering that rewards the small ICs rather than mega
         | conglomerate corporations with multi-trillion dollar market
         | caps. The same companies that are destroying the open web.
         | 
         | This license isn't even that protective of the authors. It just
         | asks for credit if you pass a MAU/ARR threshold. They should
         | honestly ask for money if you hit those thresholds and should
         | blacklist the Mag7 from usage altogether.
         | 
         | The resources put into building this are significant and
         | they're giving it to you for free. We should applaud it.
        
           | teiferer wrote:
           | > small ICs
           | 
           | The majority of open source code is contributed by companies,
           | typically very large corporations. The thought of the open
           | source ecosystem being largely carried by lone hobbyist
           | contributors in their spare time after work is a myth. There
           | are such folks (heck I'm one of them) and they are
            | appreciated and important, but their perceived role far
            | exceeds their real one in the open source ecosystem.
        
             | wredcoll wrote:
              | I've heard people go back and forth on this before, but you
              | seem pretty certain about it. Can you share some stats so I
              | can see them too?
        
           | satvikpendem wrote:
           | That's great, nothing wrong with giving away something for
           | free, just don't call it open source.
        
           | Intermernet wrote:
           | Yep, awesome stuff. Call it "fair source" if you want to.
           | Don't call it open source. I'm an absolutist about very few
           | things, but the definition of open source is one of them.
           | Every bit of variation given in the definition is a win for
           | those who have ulterior motives for polluting the definition.
           | Open source isn't a vague concept, it's a defined term with a
           | legally accepted meaning. Very much like "fair use". It's
           | dangerous to allow this definition to be altered. OpenAI (A
           | deliberate misnomer if ever there was one) and friends would
           | really love to co-opt the term.
        
         | alt187 wrote:
         | What part of this goes against the four fundamental freedoms?
         | Can you point at it?
        
           | Alifatisk wrote:
            | Exactly. I wouldn't mind adding that text to our service if
            | we made $20M; the parent made it sound like a huge clause.
        
             | tonyhart7 wrote:
              | Yeah, it's fair for them if they want a little bit of credit
             | 
             | nothing gucci there
        
           | simonw wrote:
           | "The freedom to run the program as you wish, for any purpose
           | (freedom 0)."
           | 
           | Being required to display branding in that way contradicts
           | "run the program as you wish".
        
             | a2128 wrote:
             | Being required to store the GPL license notice on my hard
             | drive is contradicting my wishes. And I'm not even earning
             | $20 million US dollars per month off GPL software!
        
             | weitendorf wrote:
             | You are still free to run the program as you wish, you just
             | have to provide attribution to the end user. It's
             | essentially CC BY but even more permissive, because the
              | attribution only kicks in when specific, relatively
             | uncommon conditions are met.
             | 
             | I think basically everybody considers CC BY to be open
             | source, so a strictly more permissive license should be
             | too, I think.
        
             | owebmaster wrote:
             | This freedom might be against the freedom of others to get
             | your modifications.
        
         | drawnwren wrote:
         | It's silly, but in the LLM world - "open source" is usually
         | used to mean "weights are published". This is not to be
         | confused with the software licensing meaning of "open source".
        
           | simonw wrote:
           | The more tasteful corners of the LLM world use "open weights"
           | instead of "open source" for licenses that aren't OSI.
        
         | randomNumber7 wrote:
         | This is just so Google doesn't build a woke version of it and
         | calls it gemini-3.0-pro
        
       | bhouston wrote:
       | Impressive benchmarks!
        
       | emacdona wrote:
       | To me, K2 is a mountain and SOTA is "summits on the air". I saw
       | that headline and thought "holy crap" :-)
        
         | esafak wrote:
         | To me K2 is the Kotlin 2.0 compiler.
         | https://blog.jetbrains.com/kotlin/2023/02/k2-kotlin-2-0/
        
       | 38 wrote:
       | The web chat has extremely low limits FYI. I ran into the limit
       | twice before getting a sane answer and gave up
        
         | awestroke wrote:
         | You can use it on OpenRouter without limits (paid API calls)
        
       | exegeist wrote:
       | Technical strengths aside, I've been impressed with how non-
       | robotic Kimi K2 is. Its personality is closer to Anthropic's
       | best: pleasant, sharp, and eloquent. A small victory over botslop
       | prose.
        
         | orbital-decay wrote:
         | I have a different experience in chatting/creative writing. It
         | tends to overuse certain speech patterns without repeating them
         | verbatim, and is strikingly close to the original R1 writing,
         | without being "chaotic" like R1 - unexpected and overly
         | dramatic sci-fi and horror story turns, "somewhere, X happens"
         | at the end etc.
         | 
         | Interestingly enough, EQ-Bench/Creative Writing Bench doesn't
         | spot this despite clearly having it in their samples. This
         | makes me trust it even less.
        
       | pxc wrote:
       | So far, I like the answer quality and its voice (a bit less
       | obsequious than either ChatGPT or DeepSeek, more direct), but it
       | seems to badly mangle the format of its answers more often than
       | I've seen with SOTA models (I'd include DeepSeek in that
       | category, or close enough).
        
         | irthomasthomas wrote:
         | Which host did you use? I noticed the same using parasail.
         | Switching to novita and temp 0.4 solved it.
        
           | pxc wrote:
           | The host was Moonshot AI at Kimi dot com :)
        
       | jacooper wrote:
       | The problem with Chinese models is finding decent hosting. The
       | best you can find right now for kimi k2 is only 30 tps, not
       | great.
        
       | sagarpatil wrote:
        | All the AI models are now using em-dashes. ChatGPT keeps using
        | them even after being explicitly told not to. Anybody know
        | what's up with these models?
        
         | cristoperb wrote:
         | I don't know, but as someone who likes using em-dashes in my
         | writing it is disappointing that they have become a marker of
         | LLM slop.
        
       | ksec wrote:
       | _Kimi K2 is the large language model series developed by Moonshot
       | AI team._
       | 
        |  _Moonshot AI [1] (Moonshot; Chinese: 月之暗面; pinyin: Yuè Zhī
        | Ànmiàn) is an artificial intelligence (AI) company based in
        | Beijing, China. As of 2024, it has been dubbed one of China's
       | "AI Tiger" companies by investors with its focus on developing
       | large language models._
       | 
        | I guess everyone is up to date with AI stuff, but this is the
        | first time I've heard of Kimi and Moonshot, and I was wondering
        | where it is from. It wasn't obvious from a quick glance at the
        | comments.
       | 
       | [1] https://en.wikipedia.org/wiki/Moonshot_AI
        
       | RandyOrion wrote:
       | This is an open weight model, which is in contrast with closed-
       | source models.
       | 
        | However, 1T parameters makes local inference nearly impossible,
        | let alone fine-tuning.
        
       | lvl155 wrote:
       | I love the fact that I can use this right away and test it out in
       | practice. The ecosystem around LLM is simply awesome and
       | improving by the day.
        
       ___________________________________________________________________
       (page generated 2025-07-13 23:01 UTC)