[HN Gopher] Kimi K1.5: Scaling Reinforcement Learning with LLMs
       ___________________________________________________________________
        
       Kimi K1.5: Scaling Reinforcement Learning with LLMs
        
       Author : noch
       Score  : 171 points
       Date   : 2025-01-21 08:53 UTC (14 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | NitpickLawyer wrote:
        | Really unfortunate timing with DeepSeek-R1 and the distills
        | coming out at basically the same time. Hard for people to pay
        | attention to, and open source > API, even if the results are
        | a bit lower.
        
         | miohtama wrote:
          | DeepSeek is not open source, as the source of its 14.8T
          | high-quality training tokens is not disclosed.
        
           | NitpickLawyer wrote:
            | Yawn. This has been debated ad nauseam. If you want to feel
            | that way, it's up to you, but disclosing the means and
            | hidden information behind a model has never been a
            | requirement for open source. As long as the thing is
            | licensed under a permissive license (MIT in this case), and
            | you can see the data, change it, and re-publish it, it's
            | open source.
        
       | cuuupid wrote:
       | I really, really dislike when companies use GitHub to promote
       | their product by posting a "research paper" and a code sample.
       | 
        | It's not even an SDK, library, etc.; it's just advertising.
       | 
       | I've noticed a number of China-based labs do this; they will
       | often post a really cool demo, some images, and then either an
        | API or just nothing except advertising for their company (e.g.
        | the model may not even exist). Often they will also promise in
        | some GitHub issue that they will release the weights, and never
        | do.
       | 
        | I'd love to see some sort of study here: what % of "omg
        | really cool AI model!!!" hype papers [1] never provide an API,
        | [2] cannot be reproduced at all, and/or [3] promise but never
        | provide weights? If this were any other field, academics would
        | be up in arms about likely fraud, false advertising, etc.
        
         | diggan wrote:
          | It's not just Chinese labs that do this; lots of companies
          | upload a README to a GitHub repository and then link that
          | repository from their website, I guess so they can have a
          | GitHub icon somewhere on the page.
          | 
          | The submission is basically a form for requesting access to
          | their closed API (which, ironically, is called "OpenPlatform"
          | for some reason).
        
           | rfoo wrote:
            | > which, ironically, is called "OpenPlatform" for some
            | reason
           | 
            | This is pretty weird. The original term is Kai Fang Ping
            | Tai (开放平台, literally "open platform"), but it's basically
            | just another name for "API" in China.
           | 
           | Not sure who started this, but it's really popular, for
           | example, WeChat has an "Open Platform":
           | https://open.weixin.qq.com/. AliPay too:
           | https://open.alipay.com/. And peak strangeness, Alibaba Cloud
           | (whose API is largely an AWS clone): https://open.aliyun.com/
        
             | diggan wrote:
              | Same thing in English: you have huge enterprises which
              | basically operate on the complete opposite end of the
              | spectrum and end up calling themselves things like
              | "OpenAI".
              | 
              | It even bleeds into marketing pages: go to the Llama
              | website and you see "open source model" plastered all over
              | the place, completely misusing both the "open" and
              | "source" parts of it.
        
             | v3ss0n wrote:
             | How about OpenAI?
        
               | llm_trw wrote:
               | You mean the charity* foundation** Open***AI?
        
         | visarga wrote:
          | That is unfortunate, but they do present some theoretical
          | insights about scaling context length and probably a more
          | efficient way to do RL. Even just knowing about it can
          | influence the next iterations from other labs.
        
         | prjkt wrote:
          | These types of "repositories" should carry some kind of
          | flag/indication that they contain no source code, similar to
          | when a repo is archived.
        
           | whimsicalism wrote:
            | really? it takes like 1 second looking at the file
            | structure to see what it is, maybe 2 seconds if you're
            | hopeful "images" somehow refers to a Dockerfile or something
        
         | whimsicalism wrote:
          | ...but they do provide an API.
         | 
         | HN is really not beating the bikeshedding allegations
        
           | ensignavenger wrote:
           | Do they? I see a note that says it will be "available soon"?
        
           | cuuupid wrote:
            | They don't; it's just promised in the future(tm). And even
            | then, it should be a page on their website or in their API
            | documentation, not a GitHub repo.
           | 
           | It's not bikeshedding to expect a source code repository to
           | have source code...
        
         | sgtpepper13 wrote:
          | I'm one of the authors of the paper. Thanks for raising a good
          | point. It would be better if we uploaded the paper to arXiv,
          | but it's MLK Day in the US, so submissions will be delayed by
          | a couple of days, and we just couldn't wait to share some of
          | the knowledge we gained from our experiments. We hope it will
          | be useful for the community, and would much appreciate any
          | ideas about a better site for this. That said, our API
          | requests are open, and we'll roll out more in the next few
          | days depending on our server resources.
        
       | asah wrote:
       | The set of math/logic problems behind AIME 2024 appears to be...
       | https://artofproblemsolving.com/wiki/index.php/2024_AIME_I_P...
       | 
        | Impressive stuff! But it's unclear to me whether it's literally
        | just these 15 problems or whether there's a larger problem
        | set...
        
         | whimsicalism wrote:
          | doesn't seem too hard to me, shame i was never exposed to
          | this stuff in high school
         | 
         | e: oh i see, they get progressively harder
        
         | codelion wrote:
         | The full dataset is here - https://huggingface.co/datasets/AI-
         | MO/aimo-validation-aime you can use the eval script I have in
         | optillm to benchmark on it -
         | https://github.com/codelion/optillm/blob/main/scripts/eval_a...
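          | 
          | A minimal sketch of loading that dataset with the HuggingFace
          | `datasets` library (this is not the optillm eval script; the
          | "train" split and the "problem"/"answer" field names are
          | assumptions about the dataset's schema):
          | 
          |     from datasets import load_dataset
          | 
          |     # Dataset name taken from the link above; the "train"
          |     # split and the field names are assumptions.
          |     ds = load_dataset("AI-MO/aimo-validation-aime",
          |                       split="train")
          | 
          |     for row in ds:
          |         # Each row is a dict; preview the problem statement
          |         # alongside its reference answer.
          |         print(row["problem"][:80], "->", row["answer"])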
        
       | zurfer wrote:
       | Is it fair to say that 2 of the 3 leading models are from Chinese
       | labs? It's really incredible how fast China has caught up.
        
         | idiotsecant wrote:
          | It's not all that surprising that the country with nearly 20%
          | of the population of Earth has some smart people in it. What
          | is, I think, surprising and fascinating is how China has been
          | focusing on doing more with less: their underdog position
          | w.r.t. hardware has pushed a huge focus on model efficiency
          | and distillation, to the benefit of us all.
          | 
          | I think it's a distinct possibility that while the first AGI
          | to say 'hello world' might do it in English, the first
          | open-source AGI running on consumer hardware will probably say
          | it in Mandarin.
        
           | whimsicalism wrote:
            | > It's not all that surprising that the country with nearly
            | 20% of the population of Earth has some smart people in it.
            | 
            | where are india's reasoning models? what about the entire
            | continent of africa? i'd be curious whether they even have a
            | single h100 on the continent
        
             | airstrike wrote:
              | I guess central planning can be very effective when
              | playing catch-up on greenfield projects of massive scale.
        
               | eunos wrote:
                | Well, DeepSeek hadn't really caught the eye of the
                | leadership before they released R1.
        
       | joaohkfaria wrote:
        | But wait, which LLMs were used to train Kimi? It wasn't clear
        | in the report.
        
       ___________________________________________________________________
       (page generated 2025-01-21 23:01 UTC)