[HN Gopher] Kimi-K2 Tech Report [pdf]
       ___________________________________________________________________
        
       Kimi-K2 Tech Report [pdf]
        
       Author : swyx
       Score  : 47 points
       Date   : 2025-07-21 20:03 UTC (2 days ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | dang wrote:
       | Related. Others?
       | 
       |  _China's moonshot launches free AI model Kimi K2 that
       | outperforms GPT4_ - https://news.ycombinator.com/item?id=44575309
       | - July 2025 (3 comments)
       | 
       |  _Kimi K2 and when "DeepSeek Moments" become normal_ -
       | https://news.ycombinator.com/item?id=44561565 - July 2025 (2
       | comments)
       | 
       |  _Kimi K2 is a state-of-the-art mixture-of-experts (MoE) language
       | model_ - https://news.ycombinator.com/item?id=44533403 - July
       | 2025 (178 comments)
        
       | jtrn wrote:
       | The results without the fluff:
       | 
       | Model Architecture
       | 
       | * Type: Mixture-of-Experts (MoE) transformer model.
       | * Total Parameters: 1 trillion.
       | * Activated Parameters: 32 billion.
       | * Experts: 384 total, with 8 activated per token (see the routing
       |   sketch below).
       | * Attention Heads: 64.
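       | 
       | A minimal sketch of that top-8 routing, assuming a plain softmax
       | gate (the gate design is an assumption, not something stated in
       | this summary):
       | 
       |     import numpy as np
       | 
       |     NUM_EXPERTS, TOP_K = 384, 8
       | 
       |     def route(hidden, gate_weights):
       |         # hidden: (d,); gate_weights: (NUM_EXPERTS, d)
       |         logits = gate_weights @ hidden     # one score per expert
       |         top = np.argsort(logits)[-TOP_K:]  # the 8 best experts
       |         probs = np.exp(logits[top] - logits[top].max())
       |         probs /= probs.sum()               # softmax over winners
       |         return top, probs  # only these experts run for the token
       | 
       | Only the selected experts' weights are used for a given token,
       | which is how 1T total parameters shrink to roughly 32B activated.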
       | 
       | Pre-training
       | 
       | * Optimizer: a novel optimizer named MuonClip, which integrates
       |   the Muon optimizer with a QK-Clip mechanism to address training
       |   instability (see the sketch below).
       | * Dataset: the model was pre-trained on 15.5 trillion tokens.
       | * Training process: Kimi K2 was trained with zero loss spikes.
       |   The initial context window was 4,096 tokens, later extended to
       |   128k tokens using the YaRN method.
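       | 
       | A rough sketch of the QK-Clip idea as summarized above: after a
       | Muon update, any attention head whose maximum pre-softmax logit
       | exceeds a threshold gets its query/key projections shrunk until
       | the logit falls back under the cap. The threshold value and the
       | per-head bookkeeping below are illustrative assumptions.
       | 
       |     import numpy as np
       | 
       |     def qk_clip(W_q, W_k, max_logit_per_head, tau=100.0):
       |         # W_q, W_k: (num_heads, d_head, d_model) projections
       |         for h, s_max in enumerate(max_logit_per_head):
       |             if s_max > tau:
       |                 gamma = tau / s_max
       |                 # split the shrinkage so q . k scales by gamma
       |                 W_q[h] *= np.sqrt(gamma)
       |                 W_k[h] *= np.sqrt(gamma)
       |         return W_q, W_k
       | 
       | Per the summary above, capping the logits this way is what kept
       | the 15.5T-token run free of loss spikes.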
       | 
       | Post-training
       | 
       | * The model underwent a multi-stage process featuring a large-
       |   scale agentic data synthesis pipeline and a joint reinforcement
       |   learning (RL) stage.
       | * The RL framework combines verifiable rewards with a self-
       |   critique rubric reward mechanism (a toy sketch follows below).
       | * The data synthesis pipeline generated tens of thousands of
       |   tool-use training examples.
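       | 
       | A toy sketch of mixing a verifiable reward with a self-critique
       | rubric score; the names, the 50/50 weighting, and the rubric
       | being scored in [0, 1] are illustrative assumptions, not the
       | paper's recipe:
       | 
       |     from dataclasses import dataclass
       |     from typing import Callable
       | 
       |     @dataclass
       |     class RewardSpec:
       |         verifier: Callable[[str], bool]   # e.g. run unit tests
       |         critique: Callable[[str], float]  # rubric score in [0, 1]
       |         alpha: float = 0.5                # mix of the two signals
       | 
       |     def reward(rollout: str, spec: RewardSpec) -> float:
       |         verifiable = 1.0 if spec.verifier(rollout) else 0.0
       |         rubric = spec.critique(rollout)
       |         return spec.alpha * verifiable + (1 - spec.alpha) * rubric
       | 
       |     # toy usage: a string check as "verifier", a constant critic
       |     spec = RewardSpec(verifier=lambda r: "42" in r,
       |                       critique=lambda r: 0.8)
       |     print(reward("the answer is 42", spec))  # -> 0.9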
       | 
       | Performance Benchmarks (non-thinking mode)
       | 
       | * SWE-bench Verified: 65.8%
       | * SWE-bench Multilingual: 47.3%
       | * LiveCodeBench v6: 53.7%
       | * OJBench: 27.1%
       | * Tau2-Bench micro-average: 66.1
       | * ACEBench (en): 76.5
       | * AIME 2025: 49.5
       | * GPQA-Diamond: 75.1
       | * LMSYS Arena Leaderboard (July 17, 2025): ranked 1st among
       |   open-source models and 5th overall
        
       ___________________________________________________________________
       (page generated 2025-07-23 23:00 UTC)