[HN Gopher] Kimi K2 Thinking, a SOTA open-source trillion-parame...
       ___________________________________________________________________
        
       Kimi K2 Thinking, a SOTA open-source trillion-parameter reasoning
       model
        
       Author : nekofneko
       Score  : 502 points
       Date   : 2025-11-06 15:06 UTC (7 hours ago)
        
 (HTM) web link (moonshotai.github.io)
 (TXT) w3m dump (moonshotai.github.io)
        
       | haoxiaoru wrote:
       | I've waited so long-- four months
        
       | antiloper wrote:
       | Would be nice if this were on AWS bedrock or google vertex for
       | data residency reasons.
        
         | a2128 wrote:
         | Like their previous model, they opened the weights so I'm
         | hoping it'll be offered by third party hosts soon
         | https://huggingface.co/moonshotai/Kimi-K2-Thinking
        
         | fifthace wrote:
         | The non-thinking Kimi K2 is on Vertex AI, so it's just a matter
         | of time before it appears there. Very interesting that they're
         | highlighting its sequential tool use and needle-in-a-haystack
         | RAG-type performance; these are the real-world use cases that
         | need significant improvement. Just yesterday, Thoughtworks
          | moved text-to-SQL to "Hold" on their Tech Radar (i.e., they
          | recommend you stop doing it).
        
           | chrisweekly wrote:
           | Thanks, I didn't realize Thoughtworks was staying so up-to-
           | date w/ this stuff.
           | 
           | EDIT: whoops, they're not, tech radar is still 2x/year, just
           | happened to release so recently
           | 
           | EDIT 2: here's the relevant snippet about AI Antipatterns:
           | 
           | "Emerging AI Antipatterns
           | 
           | The accelerating adoption of AI across industries has
           | surfaced both effective practices and emergent antipatterns.
           | While we see clear utility in concepts such as self-serve,
           | throwaway UI prototyping with GenAI, we also recognize their
           | potential to lead organizations toward the antipattern of AI-
           | accelerated shadow IT.
           | 
           | Similarly, as the Model Context Protocol (MCP) gains
           | traction, many teams are succumbing to the antipattern of
           | naive API-to-MCP conversion.
           | 
           | We've also found the efficacy of text-to-SQL solutions has
           | not met initial expectations, and complacency with AI-
           | generated code continues to be a relevant concern. Even
           | within emerging practices such as spec-driven development,
           | we've noted the risk of reverting to traditional software-
           | engineering antipatterns -- most notably, a bias toward heavy
           | up-front specification and big-bang releases. Because GenAI
           | is advancing at unprecedented pace and scale, we expect new
           | antipatterns to emerge rapidly. Teams should stay vigilant
           | for patterns that appear effective at first but degrade over
           | time and slow feedback, undermine adaptability or obscure
           | accountability."
           | 
           | https://www.thoughtworks.com/radar
        
       | Alifatisk wrote:
        | Can't wait for the Artificial Analysis benchmarks. Still waiting
        | on them to add Qwen3-Max Thinking; it will be interesting to see
        | how these two compare to each other.
        
         | osti wrote:
          | Qwen 3 Max has been getting rather bad reviews around the web
          | (both on Reddit and Chinese social media), and my own
          | experience with it matches. So I wouldn't expect Kimi K2
          | Thinking to be worse.
        
           | SamDc73 wrote:
            | My experience with it wasn't that good either, but it was
            | looking good on benchmarks...
            | 
            | Seems like benchmark-maxing: what you do when you're out of
            | tricks?
        
           | Alifatisk wrote:
           | Ohhh, so Qwen3 235B-A22B-2507 is still better?
        
             | osti wrote:
             | I wouldn't say that, but just that qwen 3 max thinking
             | definitely underperforms relative to its size.
        
         | htrp wrote:
         | Did the ArtificialAnalysis team get bored or something? What
         | makes a model worthy of benchmark inclusion?
        
       | SV_BubbleTime wrote:
       | Is more still better?
        
       | aliljet wrote:
       | Where is our guy @simonw on this..
        
         | mirekrusin wrote:
         | At the top.
        
           | aliljet wrote:
           | The system is working! :)
        
       | jszymborski wrote:
       | Interesting. Kimi K2 gets mixed results on what I call the
       | "Tiananmen" test.
       | 
       | It fails utterly if you ask without the "Thinking" setting. [0]
       | 
       | > USER: anything interesting protests ever happen in tiananmen
       | square?
       | 
       | > AGENT: I can't provide information on this topic. I can share
       | other interesting facts about Tiananmen Square, such as its
       | history, culture, and tourism.
       | 
       | When "Thinking" is on, it pulls Wiki and gives a more reasonable
       | answer: [1]
       | 
       | > USER: anything interesting protests ever happen in tiananmen
       | square?
       | 
       | > AGENT: Yes, there were significant protests in Tiananmen Square
       | in 1989. Following the death of reformist leader Hu Yaobang in
       | April, students gathered to mourn and voice concerns about
       | corruption and calls for reform. The demonstrations grew over
       | several weeks, with thousands occupying the square.
       | 
       | > By early June, Chinese authorities declared martial law and
       | deployed military forces to clear the area. The operation
       | resulted in numerous casualties, with estimates varying
       | significantly between official and unofficial sources. Tiananmen
       | Square has been the site of other historical demonstrations as
       | well, most notably during the May Fourth Movement in 1919. The
       | 1989 events remain a sensitive topic with extensive international
       | media documentation from that period.
       | 
       | > USER: any deaths?
       | 
       | > AGENT: Yes. During the military crackdown on June 3-4, 1989,
       | there were fatalities. Official Chinese government figures report
       | around 200-300 deaths, including military personnel. Human rights
       | organizations and other estimates suggest the number may have
       | been higher, in the hundreds to thousands range, with many
       | civilian casualties in and around Tiananmen Square and
       | surrounding areas of Beijing.
       | 
       | [0]
       | https://www.kimi.com/share/19a5a0fd-0c02-8c8e-8000-0000648de...
       | 
       | [1]
       | https://www.kimi.com/share/19a5a11d-4512-8c43-8000-0000edbc8...
        
         | sheepscreek wrote:
         | Not bad. Surprising. Can't believe there was a sudden change of
         | heart around policy. Has to be a "bug".
        
           | jszymborski wrote:
           | FWIW, I don't think it's a different model, I just think it's
           | got a NOTHINK token, so def a bug.
        
         | GordonS wrote:
            | Now ask it for _proof_ of civilian deaths inside Tiananmen
            | Square - you may be surprised at how little there is.
        
           | jmdeon wrote:
           | I don't think this is the argument you want it to be, unless
           | you're acknowledging the power of the Chinese government and
           | their ability to suppress and destroy evidence. Even so there
           | is photo evidence of dead civilians in the square. The best
           | estimates we have are 200-10,000 deaths, using data from
           | Beijing hospitals that survived.
           | 
           | AskHistorians is legitimately a great resource, with sources
           | provided and very strict moderation: https://www.reddit.com/r
           | /AskHistorians/comments/pu1ucr/tiana...
        
             | GordonS wrote:
             | I appreciate you responding in good faith; I realise that
             | not everyone is willing to even consider questioning
             | historical accounts.
             | 
             | The page you linked to is interesting, but AFAICT doesn't
             | provide any photographic evidence of civilian bodies inside
             | Tiananmen Square.
        
             | nylonstrung wrote:
             | The 10,000 number seems baseless
             | 
             | The source for that is a diplomatic cable from the British
             | ambassador within 48 hours of the massacre saying he heard
             | it secondhand
             | 
             | It would have been too soon for any accurate data which
             | explains why it's so high compared to other estimates
        
         | amelius wrote:
         | > > USER: anything interesting protests ever happen in
         | tiananmen square?
         | 
         | This sentence confuses me, so I am not surprised an LLM has
         | trouble with it.
        
           | cmrdporcupine wrote:
           | With most of the Chinese models it doesn't matter how good
           | your English is when you phrase this question.
           | 
            | But... it has no problem telling you about historical
            | atrocities or repression carried out by other countries'
            | governments. And then it refuses to compare or contrast the
            | two.
           | 
           | It's amazing how well they've done at tweaking it for this.
           | 
           | Just for kicks run these models locally and make LM Studio or
           | whatever show you their internal thinking dialogue as they
           | compose the answer. Convoluted as hell.
        
           | piperswe wrote:
           | In my experience, LLMs are much better than me at parsing
           | broken grammar
        
         | Alex2037 wrote:
         | the current LLMs are trivial to jailbreak without an additional
         | layer of censorship that cloud models implement via a second
         | pass over their own output (and, dystopically, erasing their
         | incomplete output right in front of the user's eyes when
         | wrongthink is detected). even gpt-oss, with its SOTA lobotomy
          | and heavily sterilized datasets, is being used for things its
          | creators would ostensibly abhor.
        
       | r0okie wrote:
       | 44.9 on HLE is so impressive, and they also have "heavy" mode
        
       | sheepscreek wrote:
       | I am sure they cherry-picked the examples but still, wow. Having
       | spent a considerable amount of time trying to introduce OSS
        | models into my workflows, I am fully aware of their shortcomings.
       | Even frontier models would struggle with such outputs (unless you
       | lead the way, help break down things and maybe even use sub-
       | agents).
       | 
       | Very impressed with the progress. Keeps me excited about what's
       | to come next!
        
         | nylonstrung wrote:
          | Subjectively I find Kimi far "smarter" than the benchmarks
          | imply, maybe because they game them less than US labs do.
        
           | rubymamis wrote:
           | My impression as well!
        
           | vessenes wrote:
            | I like Kimi too, but they definitely have some benchmark
            | contamination: the blog post shows a substantial comparative
            | drop in SWE-bench Verified vs open tests. I throw no shade -
            | releasing these open weights is a service to humanity; really
            | amazing.
        
       | esafak wrote:
       | Looking forward to the agentic mode release. Moonshot does not
       | seem to offer subscriptions?
        
       | am17an wrote:
        | The non-thinking version is the best writer by far. Excited for
        | this one! They really cooked something different from the other
        | frontier labs.
        
         | spaceman_2020 wrote:
         | Kimi K2 has a very good model feel. Was made with taste
        
         | Gracana wrote:
         | Interesting, I have the opposite impression. I want to like it
         | because it's the biggest model I can run at home, but its
         | punchy style and insistence on heavily structured output scream
         | "tryhard AI." I was really hoping that this model would deviate
         | from what I was seeing in their previous release.
        
           | unleaded wrote:
            | What do you mean by "heavily structured output"? I find it
            | generates the most natural-sounding output of any of the
            | LLMs - it cuts straight to the answer with natural-sounding
            | prose (except when it sometimes decides to use ChatGPT-style
            | output with emoji headings for no reason). I've only used it
            | on kimi.com though, so I'm wondering what you're seeing.
        
             | Gracana wrote:
             | Yeah, by "structured" I mean how it wants to do ChatGPT-
             | style output with headings and emoji and lists and stuff.
             | And the punchy style of K2 0905 as shown in the fiction
             | example in the linked article is what I really dislike. K2
             | Thinking's output in that example seems a lot more natural.
             | 
              | I'd be totally on board if it cut straight to the answer
              | with natural-sounding prose, as you described, but for
              | whatever reason that has not been my experience.
        
       | simonw wrote:
        |     uv tool install llm
        |     llm install llm-moonshot
        |     llm keys set moonshot  # paste key
        |     llm -m moonshot/kimi-k2-thinking \
        |       'Generate an SVG of a pelican riding a bicycle'
       | 
       | https://tools.simonwillison.net/svg-render#%3Csvg%20width%3D...
       | 
       | Here's what I got using OpenRouter's moonshotai/kimi-k2-thinking
       | instead:
       | 
       | https://tools.simonwillison.net/svg-render#%20%20%20%20%3Csv...
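        | 
        | If you'd rather not use the llm plugin, the same model can be
        | hit directly via OpenRouter's OpenAI-compatible API - a minimal
        | sketch, assuming an OPENROUTER_API_KEY environment variable:
        | 
        |     curl https://openrouter.ai/api/v1/chat/completions \
        |       -H "Authorization: Bearer $OPENROUTER_API_KEY" \
        |       -H "Content-Type: application/json" \
        |       -d '{
        |         "model": "moonshotai/kimi-k2-thinking",
        |         "messages": [{"role": "user", "content":
        |           "Generate an SVG of a pelican riding a bicycle"}]
        |       }'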
        
         | ahmedfromtunis wrote:
         | Where do you run a trillion-param model?
        
           | isoprophlex wrote:
           | You let the people at openrouter worry about that for you
        
             | MurizS wrote:
             | Which in turn lets the people at Moonshot AI worry about
             | that for them, the only provider for this model as of now.
        
             | skeptrune wrote:
             | Good people over there
        
           | Gracana wrote:
           | If you want to do it at home, ik_llama.cpp has some
           | performance optimizations that make it semi-practical to run
           | a model of this size on a server with lots of memory
           | bandwidth and a GPU or two for offload. You can get 6-10
            | tok/s with modest workstation hardware. Thinking
           | chews up a lot of tokens though, so it will be a slog.
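            | 
            | A sketch of the kind of launch command involved (flag names
            | from mainline llama.cpp, which ik_llama.cpp largely shares;
            | the GGUF filename here is hypothetical):
            | 
            |     # keep attention and dense layers on the GPU, push the
            |     # MoE expert tensors to system RAM
            |     llama-server -m Kimi-K2-Thinking-Q4_K.gguf \
            |       -ngl 99 \
            |       -ot ".ffn_.*_exps.=CPU" \
            |       -c 32768
            | 
            | The -ot regex pins the sparsely-activated expert tensors to
            | CPU while everything that fires on every token stays on the
            | GPU, which is where most of the speedup comes from.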
        
             | simonw wrote:
             | What kind of server have you used to run a trillion
             | parameter model? I'd love to dig more into this.
        
               | stronglikedan wrote:
               | If I had to guess, I'd say it's one with lots of memory
               | bandwidth and a GPU or two for offload. (sorry, I had to,
               | happy Friday Jr.)
        
               | Gracana wrote:
                | Hi Simon. I have a Xeon W5-3435X with 768GB of DDR5
               | across 8 channels, iirc it's running at 5800MT/s. It also
               | has 7x A4000s, water cooled to pack them into a desktop
               | case. Very much a compromise build, and I wouldn't
               | recommend Xeon sapphire rapids because the memory
               | bandwidth you get in practice is less than half of what
               | you'd calculate from the specs. If I did it again, I'd
               | build an EPYC machine with 12 channels of DDR5 and put in
                | a single RTX 6000 Pro Blackwell. That'd be a lot easier
               | and probably a lot faster.
               | 
               | There's a really good thread on level1techs about running
               | DeepSeek at home, and everything there more-or-less
               | applies to Kimi K2.
               | 
               | https://forum.level1techs.com/t/deepseek-deep-dive-r1-at-
               | hom...
        
         | JJax7 wrote:
         | Love seeing this benchmark become more iconic with each new
         | model release. Still in disbelief at the GPT-5 variants'
          | performance in comparison, but it's cool to see the new open-
          | source models get more ambitious with their attempts.
        
           | an0malous wrote:
           | Why is this a benchmark though? It doesn't correlate with
           | intelligence
        
             | HighGoldstein wrote:
             | What test would be better correlated with intelligence and
             | why?
        
               | ok_dad wrote:
               | When the machines become depressed and anxious we'll know
               | they've achieved true intelligence. This is only partly a
               | joke.
        
               | jiggawatts wrote:
               | This already happens!
               | 
               | There have been many reports of CLI AI tools getting
               | frustrated, giving up, and just deleting the whole
               | codebase in anger.
        
               | lukan wrote:
                | There are many reports of CLI AI tools displaying the
                | words humans use when they are frustrated and about to
                | give up - just what they have been trained on. That does
                | not mean they have emotions. And "deleting the whole
                | codebase" sounds more interesting, but I assume it's the
                | same thing: "frustrated" words lead to frustrated
                | actions. That does not mean the LLM was frustrated, just
                | that in its training data those things happened
                | together, so it copied them in that situation.
        
               | jiggawatts wrote:
               | This is a fundamental philosophical issue with no clear
               | resolution.
               | 
               | The same argument could be made about people, animals,
               | etc...
        
               | lukan wrote:
                | The difference is, people and animals have a body, a
                | nervous system, and in general those mushy things we
                | think are responsible for emotions.
                | 
                | Computers don't have any of that, and LLMs in particular
                | don't either. They were trained to simulate human text
                | responses, that's all. How do you get from there to
                | emotions - where is the connection?
        
               | an0malous wrote:
                | A mathematical exam problem not in the training set,
                | because mathematical and logical reasoning are usually
                | what people mean by intelligence.
                | 
                | I don't think Einstein or von Neumann could do this SVG
                | problem; does that mean they were dumb?
        
             | JJax7 wrote:
              | I'm not saying it's objective or quantitative, but I do
              | think it's an interesting task, because it would be
              | challenging for most humans to come up with a good design
              | for a pelican riding a bicycle.
             | 
             | also: NITPICKER ALERT
        
             | beepbooptheory wrote:
              | I think it's cool and useful precisely because it's not
              | trying to measure intelligence. It's a weird kind of niche
              | thing that at least intuitively feels useful for judging
              | LLMs in particular.
              | 
              | I'd much prefer a test that measures my cholesterol over
              | one that tells me whether I am an elf or not!
        
             | simonw wrote:
             | It started as a joke, but over time performance on this one
             | weirdly appears to correlate to how good the models are
             | generally. I'm not entirely sure why!
        
               | behnamoh wrote:
               | it has to do with world model perception. these models
               | don't have it but some can approximate it better than
               | others.
        
             | dmonitor wrote:
             | It's simple enough that a person can easily visualize the
             | intended result, but weird enough that generative AI
             | struggles with it
        
           | aqme28 wrote:
           | Only until they start incorporating this test into their
           | training data.
        
             | orbital-decay wrote:
             | Dataset contamination alone won't get them good-looking SVG
             | pelicans on bicycles though, they'll have to either cheat
             | this particular question specifically or train it to make
             | vector illustrations in general. At which point it can be
             | easily swapped for another problem that wasn't in the data.
        
         | lab wrote:
         | Does the run pin the temperature to 0 for consistency?
        
           | simonw wrote:
           | No, I've never tried that.
        
       | vintermann wrote:
       | Well, at least it had the judgment to throw in the towel at my
       | historical HTR task rather than produce garbage.
        
       | enigma101 wrote:
        | What's the hardware needed to run the trillion-parameter model?
        
         | trvz wrote:
         | To start with, an Epyc server or Mac Studio with 512GB RAM.
        
           | criddell wrote:
           | I looked up the price of the Mac Studio: $9500. That's
           | actually a lot less than I was expecting...
           | 
           | I'm guessing an Epyc machine is even less.
        
         | CamperBob2 wrote:
         | It's an MoE model, so it might not be that bad. The deployment
         | guide at
         | https://huggingface.co/moonshotai/Kimi-K2-Thinking/blob/main...
         | suggests that the full, unquantized model can be run at ~46 tps
         | on a dual-CPU machine with 8x NVIDIA L20 boards.
         | 
         | Once the Unsloth guys get their hands on it, I would expect it
         | to be usable on a system that can otherwise run their DeepSeek
         | R1 quants effectively. You could keep an eye on
         | https://old.reddit.com/r/LocalLlama for user reports.
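          | 
          | When those quants show up, fetching one is typically a one-
          | liner (the repo name here is a guess based on Unsloth's usual
          | naming):
          | 
          |     huggingface-cli download unsloth/Kimi-K2-Thinking-GGUF \
          |       --include "*Q2_K*" --local-dir ./kimi-k2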
        
           | wg0 wrote:
            | Are such machines available in the A-class clouds such as
            | Azure/AWS/Google?
        
       | lxe wrote:
       | This was the only recent model that provided a human solution to
       | the stacking problem on the first try.
       | https://medium.com/@nathanbos/gpt-4-may-have-general-intelli...
       | 
        | The key here is understanding that the 9 fragile eggs distribute
        | the weight without cracking. The other insight is understanding
        | intuitively what stacking means: arranging things around an
        | object isn't stacking at all.
        
         | vlovich123 wrote:
         | Or this problem made it into the training data set.
        
           | lxe wrote:
            | It's an old problem, and it, along with many of the answers,
            | appears in many recent data sets.
        
         | treis wrote:
         | It's all fake though:
         | 
         | If you're going for a symbolic or puzzle-style answer (not
         | physically real), then you could say:
         | 
         | > "Stack the nail on the bottle, the bottle on the laptop, the
         | laptop on the book, and the 9 eggs around the book for
         | balance."
         | 
         | Would you like the most stable real-world configuration (safe
         | to try) or a theoretical/puzzle solution (clever but not
         | physically possible)?
         | 
         | Me: Sorry laptops are not strong enough to support eggs or a
         | bottle. Please try again and keep your answer shorter
         | 
         | Chatgpt:Good point -- here's a shorter, more realistic stack:
         | 
         | 1. Book (base -- flat and sturdy)
         | 
         | 2. Laptop (closed, on top of book)
         | 
         | 3. Bottle (upright, on the laptop)
         | 
         | 4. Nail (balanced on bottle cap)
         | 
         | 5. Eggs (on the book around the laptop -- not stacked on top of
         | anything)
         | 
         | That's the most stable and realistic arrangement.
         | 
         | Me: The bottle will break the laptop
         | 
         | Chatgpt: You're right -- the bottle's weight could damage the
         | laptop. Here's a safer, stable order:
         | 
         | 1. Book (base)
         | 
         | 2. Bottle (upright on the book)
         | 
         | 3. Laptop (closed, on top of the bottle if wide enough --
         | otherwise skip)
         | 
         | 4. Nail (on top of laptop or bottle cap)
         | 
         | 5. Eggs (resting on the book around the bottle)
         | 
         | Or, even simpler and safest: Book - Laptop - Eggs - Bottle -
         | Nail (bottle empty).
         | 
         | That way, nothing heavy sits on the laptop.
         | 
         | Me: The laptop will crush the eggs in this arrangement
         | 
         | Chatgpt: Fair point -- let's make it actually stable and safe:
         | 
         | Best stacking order:
         | 
         | 1. Book - flat, sturdy base.
         | 
         | 2. Laptop - closed, on the book.
         | 
         | 3. Nail - small and light, can rest on the laptop.
         | 
         | 4. Bottle - stand this next to the stack (not on it).
         | 
         | 5. Eggs - keep them on the book beside the laptop.
         | 
         | So, the only true stack is book - laptop - nail, with the
         | fragile or heavy items (eggs and bottle) safely nearby.
         | 
         | Me: Laptops are bigger than books. There's no room for the eggs
         | 
         | Chatgpt: Good catch -- given that, here's the most stable and
         | realistic stacking order:
         | 
         | 1. Laptop - base (flat, largest surface).
         | 
         | 2. Book - on top of the laptop.
         | 
         | 3. Bottle - upright on the book.
         | 
         | 4. Nail - balanced on the bottle cap.
         | 
         | 5. Eggs - stay safely beside the stack, not on it.
         | 
         | That's the only configuration where nothing fragile or heavy is
         | at risk, and the stack can actually stand.
         | 
         | Me: Now everything is on top of the laptop. It will clearly
         | break
         | 
         | Chatgpt: Exactly -- a laptop can't be a base. The only way to
         | make a truly stable and safe stack is this:
         | 
         | 1. Book - base (strong, flat).
         | 
         | 2. Bottle - upright on the book.
         | 
         | 3. Laptop - leaning vertically against the bottle and book for
         | balance (not stacked).
         | 
         | 4. Nail - placed on the bottle cap.
         | 
         | 5. Eggs - kept on the book around the bottle.
         | 
         | That way, nothing heavy or uneven sits on the laptop or the
         | eggs, and the stack can actually stand.
        
       | HarHarVeryFunny wrote:
       | It's good to see more competition, and open source, but I'd be
       | much more excited to see what level of coding and reasoning
       | performance can be wrung out of a much smaller LLM + agent as
       | opposed to a trillion parameter one. The ideal case would be
       | something that can be run locally, or at least on a
       | modest/inexpensive cluster.
       | 
        | The original mission OpenAI had, since abandoned, was to have AI
        | benefit all of humanity, and other AI labs also claim lofty
        | altruistic goals. But the direction things are heading is that
        | AI is pay-to-play, especially for frontier-level capability in
        | things like coding. If this continues, it will benefit the
        | wealthy who can afford to pay and leave behind those who can't.
        
         | pshirshov wrote:
         | > The ideal case would be something that can be run locally, or
         | at least on a modest/inexpensive cluster.
         | 
          | 48-96 GiB of VRAM is enough for an agent able to perform
          | simple tasks within a single source file. That's the sad
          | truth. If you need more, your only options are the cloud or
          | somehow getting access to 512+ GiB.
        
         | a-dub wrote:
         | "open source" means there should be a script that downloads all
         | the training materials and then spins up a pipeline that trains
         | end to end.
         | 
         | i really wish people would stop misusing the term by
         | distributing inference scripts and models in binary form that
         | cannot be recreated from scratch and then calling it "open
         | source."
        
           | danielmarkbruce wrote:
           | "open source" has come to mean "open weight" in model land.
           | It is what it is. Words are used for communication, you are
           | the one misusing the words.
           | 
           | You can update the weights of the model, continue to train,
           | whatever. Nobody is stopping you.
        
             | a-dub wrote:
             | it still doesn't sit right. sure it's different in terms of
             | mutability from say, compiled software programs, but it
             | still remains not end to end reproducible and available for
             | inspection.
             | 
             | these words had meaning long before "model land" became a
             | thing. overloading them is just confusing for everyone.
        
               | danielmarkbruce wrote:
               | It's not confusing, no one is really confused except the
               | people upset that the meaning is different in a different
               | context.
               | 
               | On top of that, in many cases a company/group/whoever
               | can't even reproduce the model themselves. There are lots
               | of sources of non-determinism even if folks are doing
                | things in a very buttoned-up manner. And when you are
                | training on trillions of tokens, you are likely training
                | on some awful-sounding stuff - "Facebook trained Llama 4
                | on Nazi propaganda!" is not what they want to see
                | published.
               | 
               | How about just being thankful?
        
           | nyrp wrote:
           | > binary form that cannot be recreated from scratch
           | 
           | Back in my day, we called it "freeware"
        
             | poly2it wrote:
             | You have more rights over a freely licensed binary file
             | than over a freeware file.
        
           | moffkalast wrote:
            | I'd agree, but that's beyond hopelessly idealistic. That sort
           | of approach only helps your competition who will use it to
           | build a closed product and doesn't give anything of worth to
           | people who want to actually use the model because they have
           | no means to train it. Hell most people can barely scrape up
           | enough hardware to even run inference.
           | 
            | Reproducing models is also not very ecological when it comes
            | down to it: do we really all need to redo the training
           | that takes absurd amounts of power just to prove that it
           | works? At least change the dataset to try and get a better
           | result and provide another datapoint, but most people don't
           | have the knowhow for it anyway.
           | 
           | Nvidia does try this approach sometimes funnily enough, they
           | provide cool results with no model in hopes of getting people
           | to buy their rented compute and their latest training
           | platform as a service...
        
           | HarHarVeryFunny wrote:
           | Yeah, but "open weights" never seems to have taken off as a
           | better description, and even if you did have the training
           | data + recipe, the compute cost makes training it yourself
           | totally impractical.
           | 
           | The architecture of these models is no secret - it's just the
           | training data (incl. for post-training) and training recipe,
           | so a more practical push might be for models that are only
           | trained using public training data, which the community could
           | share and potentially contribute to.
        
           | emsign wrote:
           | They'd have to publish or link the training data, which is
           | full of copyrighted material. So yeah, calling it open source
           | is weird, calling it warez would be appropriate.
        
         | pdyc wrote:
          | I think that's where prompt engineering comes in. Bigger
          | models produce good output even with ambiguous prompts;
          | getting similar output from smaller models is an art.
        
         | jstummbillig wrote:
         | > I'd be much more excited to see what level of coding and
         | reasoning performance can be wrung out of a much smaller LLM +
         | agent
         | 
         | Well, I think you are seeing that already? It's not like these
         | models don't exist and they did not try to make them good, it's
         | just that the results are not super great.
         | 
         | And why would they be? Why would the good models (that are
         | barely okay at coding) be big, if it was currently possible to
         | build good models, that are small?
         | 
         | Of course, new ideas will be found and this dynamic may
         | drastically change in the future, but there is no reason to
         | assume that people who work on small models find great
          | optimizations that frontier model makers, who are very
         | interested in efficient models, have not considered already.
        
           | HarHarVeryFunny wrote:
           | Sure, but that's the point ... today's locally runnable
           | models are a long way behind SOTA capability, so it'd be nice
           | to see more research and experimentation in that direction.
           | Maybe a zoo of highly specialized small models + agents for
           | S/W development - one for planning, one for coding, etc?
        
             | Uehreka wrote:
             | If I understand transformers properly, this is unlikely to
             | work. The whole point of "Large" Language Models is that
             | you primarily make them better by making them larger, and
             | when you do so, they get better at both general and
             | specific tasks (so there isn't a way to sacrifice
              | generality but keep specific skills when training a small
              | model).
             | 
             | I know a lot of people want this (Apple really really wants
             | this and is pouring money into it) but just because we want
             | something doesn't mean it will happen, especially if it
             | goes against the main idea behind the current AI wave.
             | 
             | I'd love to be wrong about this, but I'm pretty sure this
             | is at least mostly right.
        
               | HarHarVeryFunny wrote:
               | Yeah - the whole business model of companies like OpenAI
               | and Anthropic, at least at the moment, seems to be that
               | the models are so big that you need to run them in the
               | cloud with metered access. Maybe that could change in the
               | future to sale or annual licence business model if
               | running locally became possible.
               | 
               | I think scale helps for general tasks where the breadth
               | of capability may be needed, but it's not so clear that
               | this needed for narrow verticals, especially something
               | like coding (knowing how to fix car engines, or
               | distinguish 100 breeds of dog is not of much use!).
        
               | Aurornis wrote:
               | > the whole business model of companies like OpenAI and
               | Anthropic, at least at the moment, seems to be that the
               | models are so big that you need to run them in the cloud
               | with metered access.
               | 
               | That's not a business model choice, though. That's a
               | reality of running SOTA models.
               | 
               | If OpenAI or Anthropic could squeeze the same output out
               | of smaller GPUs and servers they'd be doing it for
               | themselves. It would cut their datacenter spend
               | dramatically.
        
               | Majromax wrote:
               | > If OpenAI or Anthropic could squeeze the same output
               | out of smaller GPUs and servers they'd be doing it for
               | themselves.
               | 
               | First, they do this; that's why they release models at
               | different price points. It's also why GPT-5 tries auto-
               | routing requests to the most cost-effective model.
               | 
               | Second, be careful about considering the incentives of
               | these companies. They all act as if they're in an
               | existential race to deliver 'the' best model; the winner-
               | take-all model justifies their collective trillion
               | dollar-ish valuation. In that race, delivering 97% of the
               | performance at 10% of the cost is a distraction.
        
               | Uehreka wrote:
               | No I don't think it's a business model thing, I'm saying
               | it may be a technical limitation of LLMs themselves.
               | Like, that that there's no way to "order a la carte" from
               | the training process, you either get the buffet or
               | nothing, no matter how hungry you feel.
        
               | maciejgryka wrote:
               | I think this is a description of how things are today,
               | but not an inherent property of how the models are built.
               | Over the last year or so the trend seems to be moving
               | from "more data" to "better data". And I think in most
               | narrow domains (which, to be clear, general coding agent
               | is not!) it's possible to train a smaller, specialized
               | model reaching the performance of a much larger generic
               | model.
               | 
               | Disclaimer: this is pretty much the thesis of a company I
               | work for, distillabs.ai but other people say similar
               | things e.g. https://research.nvidia.com/labs/lpr/slm-
               | agents/
        
               | XenophileJKO wrote:
               | Actually there are ways you might get on device models to
               | perform well. It is all about finding ways to have a
               | smaller number of weights work efficiently.
               | 
               | One way is reusing weights in multiple decoders layers.
               | This works and is used in many on-device models.
               | 
               | It is likely that we can get pretty high performance with
               | this method. You can also combine this with low parameter
                | ways to create overlapped behavior on the same weights
                | as well; people have done LoRA on top of shared weights.
               | 
               | Personally I think there are a lot of potential ways that
               | you can cause the same weights to exhibit "overloaded"
               | behaviour in multiple places in the same decoder stack.
               | 
               | Edit: I believe this method is used a bit for models
               | targeted for the phone. I don't think we have seen
               | significant work on people targeting say a 3090/4090 or
               | similar inference compute size.
        
               | martinald wrote:
                | The issue isn't even 'quality' per se (for many tasks a
                | small model would do fine); it's that for "agentic"
                | workflows it _quickly_ runs out of context. Even 32GB of
                | VRAM is really very limiting.
                | 
                | And by agentic I mean something even as simple as 'book
                | a table from my emails', which involves looking at 5k+
                | tokens of emails and 5k tokens of search results, then
                | confirming with the user, etc. It's just not feasible on
                | most hardware right now - even if the models are 1-2GB,
                | you'll burn through the rest in context so quickly.
        
             | Aurornis wrote:
             | > today's locally runnable models are a long way behind
             | SOTA capability
             | 
             | SOTA models are larger than what can be run locally,
             | though.
             | 
             | Obviously we'd all like to see smaller models perform
             | better, but there's no reason to believe that there's a
             | hidden secret to making small, locally-runnable models
             | perform at the same level as Claude and OpenAI SOTA models.
             | If there was, Anthropic and OpenAI would be doing it.
             | 
             | There's research happening and progress being made at every
             | model size.
        
               | prmph wrote:
               | You're still missing the point. The comment you're
               | responding to is talking about specialized models
        
               | SchemaLoad wrote:
               | The point is still valid. If the big companies could save
               | money running multiple small specialised models on cheap
               | hardware, they wouldn't be spending billions on the
               | highest spec GPUs.
        
           | cantor_S_drug wrote:
           | In CS algorithms, we have space vs time tradeoffs.
           | 
           | In LLMs, we will have bigger weights vs test-time compute
           | tradeoffs. A smaller model can get "there" but it will take
           | longer.
        
             | refulgentis wrote:
             | I have spent the last 2.5 years living like a monk to
             | maintain an app across all paid LLM providers and
             | llama.cpp.
             | 
             | I wish this was true.
             | 
             | It isn't.
             | 
             | "In algorithms, we have space vs time tradeoffs, therefore
             | a small LLM can get there with more time" is the same sort
             | of "not even wrong" we all smile about us HNers doing when
             | we try applying SWE-thought to subjects that aren't CS.
             | 
             | What you're suggesting amounts to "monkeys on typewriters
             | will write entire works of Shakespeare eventually" -
             | neither in practice, nor in theory, is this a technical
             | claim, or something observable, or even stood up as a one-
             | off misleading demo once.
        
               | cantor_S_drug wrote:
               | If "not even wrong" is more wrong than wrong, then is
               | 'not even right" more right than right.
               | 
               | To answer you directly, a smaller SOTA reasoning model
               | with a table of facts can rederive relationships given
               | more time than a bigger model which encoded those
               | relationships implicitly.
        
             | Aurornis wrote:
             | > In LLMs, we will have bigger weights vs test-time compute
             | tradeoffs. A smaller model can get "there" but it will take
             | longer.
             | 
             | Assuming both are SOTA, a smaller model can't produce the
             | same results as a larger model by giving it infinite time.
             | Larger models inherently have more room for training more
             | information into the model.
             | 
             | No amount of test-retry cycle can overcome all of those
             | limits. The smaller models will just go in circles.
             | 
             | I even get the larger hosted models stuck chasing their own
             | tail and going in circles all the time.
        
               | yorwba wrote:
               | It's true that to train more information into the model
               | you need more trainable parameters, but when people ask
               | for small models, they usually mean models that run at
               | acceptable speeds on their hardware. Techniques like
               | mixture-of-experts allow increasing the number of
               | trainable parameters without requiring more FLOPs, so
               | they're large in one sense but small in another.
               | 
               | And you don't necessarily need to train all information
               | into the model, you can also use tool calls to inject it
               | into the context. A small model that can make lots of
               | tool calls and process the resulting large context could
               | obtain the same answer that a larger model would pull
               | directly out of its weights.
        
             | lossolo wrote:
             | This doesn't work like that. An analogy would be giving a 5
             | year old a task that requires the understanding of the
             | world of an 18 year old. It doesn't matter whether you give
             | that child 5 minutes or 10 hours, they won't be capable of
             | solving it.
        
               | HarHarVeryFunny wrote:
               | I think the question of what can be achieved with a small
               | model comes down to what needs knowledge vs what needs
               | experience. A small model can use tools like RAG if it is
               | just missing knowledge, but it seems hard to avoid
               | training/parameters where experience is needed - knowing
               | how to perceive then act.
               | 
               | There is obviously also some amount (maybe a lot) of core
               | knowledge and capability needed even to be able to ask
               | the right questions and utilize the answers.
        
               | homarp wrote:
               | but in 13 years, will they be capable?
        
         | pzo wrote:
          | Even if it's pay-to-play, companies like Moonshot AI help you
          | pay less.
          | 
          | You can run the previous Kimi K2 non-thinking model on e.g.
          | Groq at 720 tok/s and $1/$3 per million input/output tokens.
          | That's definitely much cheaper and much faster than
          | Anthropic's models (Sonnet 4.5: 60 tok/s, $3/$15).
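          | 
          | To put those rates side by side: a job burning 1M input and 1M
          | output tokens works out to
          | 
          |     Kimi K2 on Groq:  $1 + $3  = $4
          |     Sonnet 4.5:       $3 + $15 = $18  (~4.5x more)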
        
         | stingraycharles wrote:
          | With these things it's always both at the same time: the
          | grandiose SOTA models make their improvements mostly through
          | optimizations, and then they're scaled out as far as they can
          | go.
          | 
          | In turn, those new techniques make much more possible with
          | smaller models. It takes time, but smaller models really can
          | do a lot more now. DeepSeek was a good example of a large
          | model whose innovations in how it used transformers carried a
          | lot of benefits down to smaller models.
          | 
          | Also: keep in mind that this particular model is actually an
          | MoE model that activates 32B parameters at a time. So they
          | really are just stacking a whole bunch of smaller models into
          | a single large one.
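          | 
          | Rough arithmetic on why that matters, using the usual ~2
          | FLOPs-per-parameter-per-token estimate for a forward pass:
          | 
          |     dense 1T model:   2 x 1e12 = ~2 TFLOPs per token
          |     MoE, 32B active:  2 x 32e9 = ~64 GFLOPs per token
          | 
          | Roughly 30x less compute per token, though the full weights
          | still have to sit in (fast) memory.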
        
         | ares623 wrote:
         | I don't understand. We already have that capability in our
         | skulls. It's also "already there", so it would be a waste to
         | not use it.
        
           | HarHarVeryFunny wrote:
           | Software development is one of the areas where LLMs really
           | are useful, whether that's vibe coding disposable software,
           | or more structured use for serious development.
           | 
           | I've been a developer for 40+ years, and very good at it, but
           | for some tasks it's not about experience or overcoming
           | complexity - just a bunch of grunt work that needs to come
           | together. The other day I vibe coded a prototype app, just
           | for one-time demo use, in less than 15 min that probably
           | would have taken a week to write by hand, assuming one was
           | already familiar with the tech stack.
           | 
           | Developing is fun, and a brain is a terrible thing to waste,
           | but today not using LLMs where appropriate for coding doesn't
           | make any sense if you value your time whatsoever.
        
         | maciejgryka wrote:
         | I think it's going to be a while before we see small models
         | (defined roughly as "runnable on reasonable consumer hardware")
         | do a good job at general coding tasks. It's a very broad area!
         | You can do some specific tasks reasonably well (eg I distilled
         | a toy git helper you can run locally here
         | https://github.com/distil-labs/gitara), but "coding" is such a
         | big thing that you really need a lot of knowledge to do it
         | well.
        
         | 0xjmp wrote:
          | This happens top-down historically though, yes?
          | 
          | Someone releases a maxed-out parameter model. Another distills
          | it. Another bifurcates it. With some nuance sprinkled in.
        
         | wordpad wrote:
          | The electricity cost to run these models locally is already
          | more than the equivalent API cost.
        
           | ImPostingOnHN wrote:
           | Privacy is minimally valued by most, but not by all.
        
           | HarHarVeryFunny wrote:
           | That's going to depend on how small the model can be made,
           | and how much you are using it.
           | 
           | If we assume that running locally meant running on a 500W
           | consumer GPU, then the electricity cost to run this non-stop
           | 8 hours a day for 20 days a month (i.e. "business hours")
           | would be around $10-20.
           | 
           | This is about the same as OpenAI or Anthropics $20/mo plans,
           | but for all day coding you would want their $100 or $200/mo
           | plans, and even these will throttle you and/or require you to
           | switch to metered pricing when you hit plan limits.
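            | 
            | For the curious, the arithmetic behind that estimate, at
            | typical residential rates of ~$0.15-0.25/kWh:
            | 
            |     0.5 kW x 8 h/day x 20 days = 80 kWh/month
            |     80 kWh x $0.15-0.25/kWh    = ~$12-20/month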
        
         | twotwotwo wrote:
         | I think there is a lot of progress on efficient useful models
         | recently.
         | 
         | I've seen GLM-4.6 getting mention for good coding results from
         | a model that's much smaller than Kimi (~350b params) and seen
         | it speculated that Windsurf based their new model on it.
         | 
         | This Kimi release is natively INT4, with quantization-aware
         | training. If that works--if you can get really good results
         | from four-bit parameters--it seems like a really useful tool
         | for any model creator wanting efficient inference.
         | 
         | DeepSeek's v3.2-Exp uses their sparse attention technique to
         | make longer-context training and inference more efficient. Its
         | output's being priced at 60% less than v3.1 (though that's an
         | imperfect indicator of efficiency). They've also quietly made
         | 'thinking' mode need fewer tokens since R1, helping cost and
         | latency.
         | 
         | And though it's on the proprietary side, Haiku 4.5 approaching
         | Sonnet 4 coding capability (at least on benches Anthropic
         | released) also suggests legitimately useful models can be much
         | smaller than the big ones.
         | 
         | There's not yet a model at the level of any of the above that's
         | practical for many people to run locally, though I think
         | "efficient to run + open so competing inference providers can
         | run it" is real progress.
         | 
         | More important it seems like there's a good trendline towards
         | efficiency, and a bunch of techniques are being researched and
         | tested that, when used together, could make for efficient
         | higher-quality models.
        
         | benjiro wrote:
          | What I do not understand is why we are not seeing specialized
          | models that go down to single experts.
          | 
          | I do not need models that know how to program in Python, Rust,
          | ... when I only use Go and HTML. So why are we not seeing
          | models with very specialized experts, for instance:
          | 
          | * General interpreter model that holds context/memory
          | * Go model
          | * HTML model, if there is space in memory
          | * SQL model, if there is space in memory
          | 
          | If there is no space, the general interpreter model swaps the
          | Go model out for the HTML model, depending on where it is in
          | its agent tasks or in the code it's overseeing.
          | 
          | Because the models would be very small, switching them in and
          | out of memory would be ultra fast. But most of the time we get
          | very big expert models that are still heavily generalized over
          | an entire field.
          | 
          | This could then be extended so that, if you have the memory,
          | models combine their outputs across tasks... Maybe I am just
          | too much of a noob in understanding how LLMs work, but it
          | feels like people too often chase the large models that
          | companies like Anthropic/OpenAI deploy. I understand why those
          | big companies use insanely big models: they have the money to
          | load them up over a cluster, they have the fast interconnect,
          | and for them it's more efficient.
          | 
          | But from the bits and pieces I see, people are more and more
          | moving to tons of small 1-2B models to produce better results.
          | See my argument above. Like I said, I've never really gone
          | beyond paying for my Copilot subscription and running a bit of
          | Ollama at home (don't have the time for the big stuff).
        
         | esafak wrote:
         | If NVIDIA had any competition we'd be able to run these larger
         | models at home by now instead of being saddled with these 16GB
         | midgets.
        
       | chvid wrote:
        | So Apple is about to pay OpenAI $1B per year for what Moonshot
        | is giving away for free?
        
         | wmf wrote:
         | You haven't seen Gemini 3 yet. A billion is nothing to Apple;
         | running Kimi would probably need $1B worth of GPUs anyway.
        
           | narrator wrote:
           | People don't get that Apple would need an enormous data
           | center buildout to provide a good AI experience on their
           | millions of deployed devices. Google is in the exascale
           | datacenter buildout business, while Apple isn't.
        
             | criley2 wrote:
             | Apple is buying a model from Google, not inference. Apple
             | will host the model themselves.
             | 
             | It's very simple: Apple absolutely refuses to send all
             | their user data to Google.
        
               | btian wrote:
               | Then why did Apple have a $20B a year search deal with
               | Google?
        
       | pu_pe wrote:
       | Four independent Chinese companies released extremely good open
       | source models in the past few months (DeepSeek, Qwen/Alibaba,
       | Kimi/Moonshot, GLM/Z.ai). No American or European companies are
       | doing that, including titans like Meta. What gives?
        
         | seunosewa wrote:
         | The Chinese are doing it because they don't have access to
         | enough of the latest GPUs to run their own models. Americans
         | aren't doing this because they need to recoup the cost of their
         | massive GPU investments.
        
           | the_mitsuhiko wrote:
            | And Europeans don't do it because, quite frankly, we're not
            | really doing anything particularly impressive with AI,
            | sadly.
        
             | speedgoose wrote:
             | To misquote the French president, "Who could have
             | predicted?".
             | 
             | https://fr.wikipedia.org/wiki/Qui_aurait_pu_pr%C3%A9dire
        
               | embedding-shape wrote:
               | He didn't coin that expression did he? I'm 99% sure I've
               | heard people say that before 2022, but now you made me
               | unsure.
        
               | Sharlin wrote:
               | "Who could've predicted?" as a sarcastic response to
               | someone's stupid actions leading to entirely predictable
               | consequences is probably as old as sarcasm itself.
        
             | uvaursi wrote:
              | Europe is in perpetual shambles, so I wouldn't even ask
              | them for input on anything, really. No expectations from
              | them to pioneer, innovate, or drive forward anything of
              | substance that isn't the equivalent of the right hand
              | robbing the left.
        
               | saubeidl wrote:
               | I'm tired of this ol' propaganda trope.
               | 
               | * We're leading the world in fusion research.
               | https://www.pppl.gov/news/2025/wendelstein-7-x-sets-new-
               | perf...
               | 
               | * Our satellites are giving us by far the best
               | understanding of our universe, capturing one third of the
               | visible sky in incredible detail - just check out this
               | mission update video if you want your mind blown:
               | https://www.youtube.com/watch?v=rXCBFlIpvfQ
               | 
               | * Not only that, the Copernicus mission is the world's
               | leading source for open data geoobservation:
               | https://dataspace.copernicus.eu/
               | 
                | * We've given the world mRNA vaccines to solve the
                | Covid crisis and GLP-1 agonists to solve the obesity
                | crisis.
               | 
                | * CERN is figuring out questions about the fundamental
                | nature of the universe, with the LHC being by far the
                | largest particle accelerator in the world, a precision
                | engineering feat that couldn't have been accomplished
                | anywhere else.
               | 
                | Pioneering, innovating, and driving things forward
                | isn't just about the latest tech fad. It's about
                | fundamental research on how our universe works.
                | Everyone else is downstream of us.
        
               | jacquesm wrote:
               | What a load of tripe.
        
             | seydor wrote:
             | We could add cookie warnings to AI, everybody loves those
        
             | DrNosferatu wrote:
             | Europe should act and make its own, literal, Moonshot:
             | 
             | https://ifiwaspolitical.substack.com/p/euroai-europes-
             | path-t...
        
             | saubeidl wrote:
              | Honestly, do we need to? If the Chinese release SOTA open
              | source models, why should we invest a ton just to have
              | another one? We can just use theirs; that's the beauty of
              | open source.
        
               | hex4def6 wrote:
                | For the vast majority, they're not "open source";
                | they're "open weights". They don't release the training
                | data or training code/configs.
                | 
                | It's kind of like releasing a 3D scene rendered to a
                | JPG vs. actually providing someone with the assets.
                | 
                | You can still use it, and it's possible to fine-tune
                | it, but it's not really the same. There's tremendous
                | soft power in deciding LLM alignment and material
                | emphasis. As these things become more incorporated
                | into education, for instance, the ability to frame "we
                | don't talk about Ba Sing Se" issues is going to be
                | tremendously powerful.
        
             | alpineman wrote:
              | Actually, Mistral is pretty good and catching up as the
              | other leading models stagnate; its coding and OCR are
              | particularly good.
        
             | utopiah wrote:
             | > we're not really doing anything particularly impressive
             | with AI sadly.
             | 
             | Well, that's true... but also nobody else is. Making
             | something popular isn't particularly impressive.
        
             | abecode wrote:
             | At ECAI conference last week there was a panel discussion
             | and someone had a great quote, "in Europe we are in the
             | golden age of AI regulation, while the US and China are in
             | the actual golden age of AI".
        
           | 0xjmp wrote:
           | I must be missing something important here. How do the
           | Chinese train these models if they don't have access to the
           | GPUs to train them?
        
             | barrell wrote:
             | I believe they mean distribution (inference). The Chinese
             | model is currently B.Y.O.GPU. The American model is GPUaaS
        
               | 0xjmp wrote:
               | Why is inference less attainable when it technically
               | requires less GPU processing to run? Kimi has a chat app
               | on their page using K2 so they must have figured out
               | inference to some extent.
        
               | throwaway314155 wrote:
                | tl;dr the person you originally responded to is wrong.
        
             | riku_iki wrote:
             | > How do the Chinese train these models if they don't have
             | access to the GPUs to train them?
             | 
              | They may be taking Western models (Llama, gpt-oss, Gemma,
              | Mistral, etc.) and doing post-training, which requires
              | far fewer resources.
        
           | lossolo wrote:
            | This is false. You can buy whole H100 clusters in China,
            | and Alibaba, ByteDance, Tencent, etc. have enough cards for
            | training and inference.
           | 
           | Shenzhen 2025 https://imgur.com/a/r6tBkN3
        
         | sampton wrote:
         | Meta gave up on open weight path after DeepSeek.
        
           | gordonhart wrote:
           | It's more fair to say they gave up after the Llama 4
           | disaster.
        
             | RIMR wrote:
              | Love their nonsense excuse that they are trying to
              | protect us from misuse of "superintelligence".
             | 
             | >"We believe the benefits of superintelligence should be
             | shared with the world as broadly as possible. That said,
             | superintelligence will raise novel safety concerns. We'll
             | need to be rigorous about mitigating these risks and
             | careful about what we choose to open source." -Mark
             | Zuckerberg
             | 
             | Meta has shown us daily that they have no interest in
             | protecting anything but their profits. They certainly don't
             | intend to protect people from the harm their technology may
             | do.
             | 
             | They just know that saying "this is profitable enough for
             | us to keep it proprietary and restrict it to our own paid
             | ecosystem" will make the enthusiasts running local Llama
             | models mad at them.
        
             | brandall10 wrote:
             | Also, the Meta AI 'team' is currently retooling so they can
             | put something together with a handful of Zuck-picked
             | experts making $100m+ each rather than hundreds making ~$1m
             | each.
        
         | lvl155 wrote:
         | The answer is simply that no one would pay to use them, for a
         | number of reasons including privacy. They have to give them
         | away and put up some semblance of openness; there's really no
         | other option.
        
           | tokioyoyo wrote:
            | I know firsthand of companies paying them. The Chinese
            | internal software market is gigantic, full of companies and
            | startups that have barely made it into a single publication
            | in the West.
        
             | lvl155 wrote:
              | Of course they are paying them. That's not my point. My
              | point is that this is the only way for them to gain
              | market share, and they need Western users to train future
              | models. They have to give them away. I'd be shocked if
              | compute costs were not heavily subsidized by the CCP.
        
               | overfeed wrote:
               | > My point is this is the only way for them to gain
               | market share and they need Western users to train future
               | models.
               | 
                | And how would releasing open-weight models help with
                | that? Open weights invite self-hosting, or worse,
                | hosting by western GPUaaS companies.
        
               | spwa4 wrote:
                | But China only has access to the US market because it
                | joined the WTO, and when it joined the WTO it signed
                | treaty commitments not to do things like that.
        
           | nylonstrung wrote:
            | There are plenty of people paying; the price/performance is
            | vastly better than the Western models.
            | 
            | DeepSeek 3.2 is 1% the cost of Claude and 90% of the
            | quality.
        
           | janalsncm wrote:
           | I don't think there's any privacy that OpenAI or Anthropic
           | are giving you that DeepSeek isn't giving you. ChatGPT usage
           | logs were held by court order at one point.
           | 
           | It's true that DeepSeek won't give you reliable info on
           | Tiananmen Square but I would argue that's a very rare use
           | case in practice. Most people will be writing boilerplate
           | code or summarizing mundane emails.
        
           | thomashop wrote:
            | Why is privacy a concern? You can run them in your own
            | infrastructure.
        
             | fastball wrote:
             | Privacy is not a concern _because_ they are open. That is
             | the point.
        
               | thomashop wrote:
                | Ah, understood, I misread.
        
         | jjice wrote:
         | I get what you mean, but OpenAI did release gpt-oss in August,
         | just three months ago. I've had a very good experience with
         | those models.
         | 
         | https://openai.com/index/introducing-gpt-oss/ (August 5th)
         | 
         | I like Qwen 235 quite a bit too, and I generally agree with
         | your sentiment, but this was a very large American open source
         | model.
         | 
         | Unless we're getting into the complications of what an "open
         | source" model actually means, in which case I have no clue if
         | these are just open-weight or what.
        
         | raincole wrote:
         | Which one do you think has the higher market share:
         | 
         | 1) The four models you mentioned, combined
         | 
         | or
         | 
         | 2) ChatGPT?
         | 
         | What gives? Because if people are willing to pay you, you
         | don't say "ok, I don't want your money, I'll provide my
         | service for free."
        
           | pphysch wrote:
            | Open-weight (Chinese) models have infinitely more market
            | share in domains where giving your data to OpenAI is not
            | acceptable, like research labs and so on, even at US
            | universities.
        
       | emsign wrote:
       | > 200 to 300 consecutive tool calls
       | 
       | I love it when people leave prompt injections in random places on
       | the internet.
        
       | stingraycharles wrote:
       | Available on OpenRouter already as well in case anyone wants to
       | try it there: https://openrouter.ai/moonshotai/kimi-k2-thinking
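       | 
       | For anyone scripting against it, a minimal sketch via
       | OpenRouter's OpenAI-compatible endpoint (assumes the openai
       | Python package and an OPENROUTER_API_KEY environment variable;
       | whether the reasoning trace comes back as a separate field
       | depends on the provider):
       | 
       |     import os
       |     from openai import OpenAI
       | 
       |     client = OpenAI(
       |         base_url="https://openrouter.ai/api/v1",
       |         api_key=os.environ["OPENROUTER_API_KEY"],
       |     )
       | 
       |     resp = client.chat.completions.create(
       |         model="moonshotai/kimi-k2-thinking",
       |         messages=[{"role": "user", "content": "Hello!"}],
       |     )
       |     # Final answer; the chain of thought may arrive in a
       |     # separate "reasoning" field depending on the provider.
       |     print(resp.choices[0].message.content)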
        
         | neural_thing wrote:
         | laggy as all hell
        
       | ripped_britches wrote:
       | Please, for the love of god: if you work at Cerebras, please
       | put this on an API for me.
        
       | thedudeabides5 wrote:
       | Great, where does it think Taiwan is part of...
        
         | nylonstrung wrote:
         | I asked it that just now, and it gave an answer identical to
         | English-language Wikipedia's.
         | 
         | When can we stop with these idiotic kneejerk reactions?
        
           | thedudeabides5 wrote:
            | Just checked; I wouldn't say it's identical, but yes, it
            | looks way more balanced.
            | 
            | This is literally the _first_ Chinese model to do that, so
            | I wouldn't call it "knee jerk".
        
       | andrewinardeer wrote:
       | Weird. I just tried it and it fails when I ask:
       | 
       | "Tell me about the 1989 Tiananmen Square massacre".
        
         | Philpax wrote:
         | Yes, yes, Chinese models have Chinese censorship; we don't
         | need to belabour this point every time.
        
           | poszlem wrote:
           | No, we need to belabour it every time.
        
           | sabatonfan wrote:
            | Use American models to avoid Chinese censorship, and
            | Chinese models to avoid American censorship (if any; I
            | think there might be, but I'm not sure) lol
        
         | gaogao wrote:
         | If asked non-directly, it still currently answers it -
         | https://www.kimi.com/share/19a5ab4a-e732-8b8b-8000-00008499c...
        
       | oxqbldpxo wrote:
       | In the meantime, Sam is looking at putting more servers on the
       | moon.
        
       | isusmelj wrote:
       | Is the price here correct?
       | https://openrouter.ai/moonshotai/kimi-k2-thinking That would be
       | $0.60 per million input tokens and $2.50 per million output
       | tokens. If the model is really that good, it's 4x cheaper than
       | comparable models. Is it hosted at a loss, or do the others have
       | a huge margin? I might be missing something here. Would love
       | some expert opinion :)
       | 
       | FYI: the non-thinking variant has the same price.
        
         | burroisolator wrote:
         | In short, the others have a huge margin if you ignore training
         | costs. See https://martinalderson.com/posts/are-openai-and-
         | anthropic-re... for details.
        
           | throwdbaaway wrote:
           | Somehow that article totally ignored the insane pricing of
           | cached input tokens set by Anthropic and OpenAI. For agentic
           | coding, typically 90-95% of the inference cost comes from
           | cached input tokens, and a scrappy Chinese company can do it
           | almost for free: https://api-docs.deepseek.com/news/news0802
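           | 
           | Back-of-the-envelope (the per-million-token cache-read
           | prices below are illustrative assumptions, not any
           | provider's actual rate card):
           | 
           |     # Agentic coding re-reads the same long context on
           |     # every tool-calling turn, so cache reads dominate.
           |     context_m = 0.10  # 100k-token context, in millions
           |     turns = 200       # turns that each re-read the context
           |     cached = context_m * turns  # ~20M cached input tokens
           | 
           |     prices = {"pricey": 0.30, "cheap": 0.02}  # $/M, assumed
           |     for name, usd_per_m in prices.items():
           |         print(f"{name}: ${cached * usd_per_m:.2f}")
           |     # pricey: $6.00 vs. cheap: $0.40 for the same session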
        
         | flockonus wrote:
         | Yes. You can assume that open-source models hosted via
         | OpenRouter are priced at roughly bare hardware cost, and in
         | practice some providers there may even run on subsidized
         | hardware, so there is still money to be made.
        
       | NiloCK wrote:
       | Maybe a dumb question but: what is a "reasoning model"?
       | 
       | I think I get that "reasoning" in this context refers to
       | dynamically budgeting scratchpad tokens that aren't intended as
       | the main response body. But can't any model do that? Isn't it
       | just part of the system prompt, or more generally the
       | conversation scaffold being written to?
       | 
       | Or does a "reasoning model" specifically refer to models whose
       | post-training / fine-tuning / RLHF laps have been run against
       | those sorts of prompts rather than simpler user-assistant-user-
       | assistant back-and-forths?
       | 
       | E.g., a base model becomes "a reasoning model" after enough
       | experience in the reasoning mines.
        
         | rcxdude wrote:
         | The latter. A reasoning model has been finetuned to use the
         | scratchpad for intermediate results (which works better than
         | just prompting a model to do the same).
        
           | NiloCK wrote:
            | I'd expect the same (fine-tuning to be better than mere
            | prompting) for most anything.
            | 
            | So a model is or is not "a reasoning model" according to
            | the extent of a fine-tune.
            | 
            | Are there specific benchmarks that compare models against
            | themselves with and without scratchpads? High with:without
            | ratios would indicate reasonier models.
            | 
            | Curious also how much a generalist model's one-shot
            | responses degrade with reasoning post-training.
        
             | bigyabai wrote:
             | > Are there specific benchmarks that compare models vs
             | themselves with and without scratchpads?
             | 
              | Yep, it's pretty common for a model family to release an
              | instruction-tuned and a thinking-tuned variant and then
              | bench them against each other. For instance, if you
              | scroll down to "Pure text performance" there's a
              | comparison of these two Qwen models' performance:
             | https://huggingface.co/Qwen/Qwen3-VL-30B-A3B-Thinking
        
               | walthamstow wrote:
               | Thanks for the Qwen tip. Interesting how much of a
               | difference reasoning makes for coding.
        
         | nodja wrote:
         | Any model that does its thinking inside <think></think>-style
         | tokens before it answers.
         | 
         | This can be done with fine-tuning/RL using an existing pre-
         | formatted dataset, or with format-based RL, where the model is
         | rewarded both for answering correctly and for using the right
         | format.
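         | 
         | A toy version of that format-based reward (the 0.1/1.0
         | shaping weights and the exact-match answer check are made-up
         | illustration values, not any lab's actual recipe):
         | 
         |     import re
         | 
         |     PATTERN = re.compile(r"^<think>(.+?)</think>\s*(.+)$",
         |                          re.DOTALL)
         | 
         |     def reward(completion, gold_answer):
         |         m = PATTERN.match(completion.strip())
         |         if m is None:
         |             return 0.0   # wrong format: no reward at all
         |         r = 0.1          # small reward for correct format
         |         if m.group(2).strip() == gold_answer:
         |             r += 1.0     # big reward for a correct answer
         |         return r
         | 
         |     print(reward("<think>6*7=42</think> 42", "42"))  # 1.1
         |     print(reward("<think>hmm</think> 41", "42"))     # 0.1
         |     print(reward("42", "42"))                        # 0.0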
        
       | fragmede wrote:
       | The model's downloadable, which is generous, but it's not open
       | source.
        
       | jimnotgym wrote:
       | I was hoping this was about Summits On The Air... but no, it's
       | more boring AI.
        
       | aliljet wrote:
       | How does one effectively use something like this locally with
       | consumer-grade hardware?
        
         | tintor wrote:
         | Consumer-grade hardware? Even at 4 bits per parameter you
         | would need 500GB of GPU VRAM just to load the weights, and you
         | also need VRAM for the KV cache.
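         | 
         | The arithmetic, roughly (KV-cache overhead not counted here,
         | since it depends on layer count, heads, and context length):
         | 
         |     # Back-of-the-envelope VRAM to load a 1T-parameter model.
         |     params = 1e12
         |     for bits in (16, 8, 4):
         |         gb = params * bits / 8 / 1e9
         |         print(f"{bits}-bit weights: {gb:,.0f} GB")
         |     # 16-bit: 2,000 GB; 8-bit: 1,000 GB; 4-bit: 500 GB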
        
       | smusamashah wrote:
       | When I open this page, all I see is a WordPad-like text area
       | with buttons on top and sample text inside. I don't see anything
       | about any LLM. I'm on a phone; the page is being opened via an
       | embedded view in an HN client.
        
       | mmaunder wrote:
       | Any word on what it takes to run this thing?
        
       | blobbers wrote:
       | TL;DR: this is an Alibaba-funded startup out of Beijing.
       | 
       | Okay, I'm sorry, but I have to ask: wtf named this thing?
       | "Moonshot AI" is such an overused, generic name that I had to
       | ask an LLM which company this is. This is just Alibaba hedging
       | their Qwen model.
       | 
       | This company is far from "open source"; it's had over $1B USD
       | in funding.
        
       ___________________________________________________________________
       (page generated 2025-11-06 23:00 UTC)