[HN Gopher] DeepSeek: Inference-Time Scaling for Generalist Reward Modeling
       ___________________________________________________________________
        
       DeepSeek: Inference-Time Scaling for Generalist Reward Modeling
        
       Author : tim_sw
       Score  : 105 points
       Date   : 2025-04-04 04:50 UTC (18 hours ago)
        
 (HTM) web link (arxiv.org)
 (TXT) w3m dump (arxiv.org)
        
       | ALLTaken wrote:
        | Not just being impressed that every paper coming out is SOTA,
        | but they also lead the way in being open-source in the pure
        | definition of OSS, even with permissive licensing.
       | 
        | Let's not confuse the company with the country by over-fitting
        | a narrative. Popular media reinforces hatred, or whatever its
        | sponsors want, especially toward weaker groups. Fewer
        | repercussions and more clicks/money to be made, I guess.
       | 
        | While politicians may hate each other, scientists love to work
        | with other aspiring scientists who share their ambitions, and
        | the only competition is in achieving measurable success and the
        | reward it brings to the greater public.
       | 
        | Bias aside, it's genuinely admirable when companies release
        | their sources to enable faster cycles of scientific progress.
        | It's ironic that this company is dedicated to finance, yet
        | shares its progress, while non-profits and companies dedicated
        | purely to AI lock away all knowledge of their findings.
       | 
       | Are there other companies like DeepSeek that you know of that
       | commonly release great papers? I am following Mistral already,
       | but I'd love to enrich my sources of publications that I consume.
       | Highly appreciated!
        
         | wood_spirit wrote:
          | When OpenAI surged ahead, Meta ended up giving away its
          | incredibly expensive-to-make Llama model to drag down
          | OpenAI's valuation.
          | 
          | Is DeepSeek's openness in part meant to undercut the big
          | American tech companies?
        
           | ALLTaken wrote:
            | Correlation isn't causation; I hate to say it, but it
            | really applies here. Facebook aka Meta has always been very
            | open-source. Let's not talk about the license, though. :)
           | 
           | Why do you imply malice in OSS companies? Or for profit
           | companies opensourcing their models and sourcecode?
        
             | mwigdahl wrote:
             | Personally I don't impute any malice whatsoever -- these
             | are soulless corporate entities -- but a for-profit company
             | with fiduciary duty to shareholders releasing expensive,
             | in-house-developed intellectual property for free certainly
             | deserves some scrutiny.
             | 
             | I tend to believe this is a "commoditize your complement"
             | strategy on Meta's part, myself. No idea what Deepseek's
             | motivation is, but it wouldn't surprise me if it was a
             | similar strategy.
        
             | throwaway314155 wrote:
             | Meta is decidedly not an "OSS company" no matter how much
             | they put out.
        
               | SXX wrote:
                | In this case there are very few truly "OSS companies"
                | except for Red Hat and a few other Linux distribution
                | maintainers. Even companies centered around open
                | source, like GitLab, usually generate most of their
                | revenue from proprietary products or use licenses like
                | the BSL.
        
           | phoronixrly wrote:
           | If only totalitarian nation states used their subjects' money
           | to undermine the dominance of US-based software vendors by
           | releasing open-source alternatives created with slave
           | labour... Oh wait, it can't work because software patents are
           | here to the rescue again ... Wait, open source is communism?
           | Always has been. /s
        
         | Febra33 wrote:
         | > Let's not confuse the company with the country
         | 
         | What's wrong with China? They're wonderful in the OSS
         | ecosystem.
        
           | echelon wrote:
            | It varies on a company-by-company basis. BOOX, for
            | instance, are notorious GPL violators.
           | 
            | There's also significant alpha in releasing open-weights
            | models. You get to slow down the market leaders to make
            | sure they don't have runaway success. It reduces moats,
            | slows funding, creates a wealth of competition, and
            | squeezes margins. It's a really smart move if you want to
            | make sure there's a future where you can compete with
            | Google, OpenAI, etc. There's even a chance it makes those
            | companies bleed a little. The value chain shifts toward
            | differently shaped companies (tools, infra), leaving room
            | for consumer and product plays that won't necessarily be
            | won by the "labs" companies.
        
         | refulgentis wrote:
          | I love open source and the good vibes you're bringing,
          | but... this isn't SOTA, or close, even on the paper's own
          | terms. (i.e. excluding models released in the last 6 months,
          | including their own, which is a strange, yet understandable,
          | choice given the results they report)
         | 
         | Quickest way to show this:
         | 
         | - Table 2, top of page 7
         | 
          | - Gemma 2 27B, 0 interventions, has 94.1/56.6/60.2
          | 
          | - Gemma 2 27B, with all _their_ interventions, has 86/64/69.
          | 
          | - Gemma 2 27B, with all _their_ interventions, sampled 32
          | times, is at 90.4/67.2/70.3.
          | 
          | - Gemma 2 27B came out in... June 2024. :/
         | 
         | Quick heuristics employed here:
         | 
          | - What models did they compare against? (This isn't
          | _strictly_ an issue; the big screaming tell is "What models
          | did they compare against, _compared to their last N
          | papers_?")
         | 
         | - How quickly does the paper have to move towards N samples,
         | and how big does N get before they're happy enough to conclude?
         | (32). How much does that improve performance on their chosen
         | metric? (1.8%)
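          | 
          | For intuition, a minimal sketch (mine, not the paper's code;
          | sample_judgment is a stand-in) of what "sampled 32 times"
          | means: run the reward model N times at temperature > 0 and
          | aggregate by majority vote:
          | 
          |     import random
          |     from collections import Counter
          | 
          |     def sample_judgment(prompt: str) -> str:
          |         # Stand-in for one stochastic reward-model call at
          |         # temperature > 0; returns the preferred response.
          |         return random.choice(["A", "B"])
          | 
          |     def vote_at_n(prompt: str, n: int = 32) -> str:
          |         # Inference-time scaling: draw n independent
          |         # judgments and keep the majority vote.
          |         votes = [sample_judgment(prompt) for _ in range(n)]
          |         return Counter(votes).most_common(1)[0][0]
          | 
          |     print(vote_at_n("Is response A or B better?"))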
        
       | resters wrote:
        | DeepSeek R1 is by far the best at writing prose of any model,
        | including Grok-3, GPT-4o, o1-pro, o3, Claude, etc.
       | 
       | Paste in a snippet from a book and ask the model to continue the
       | story in the style of the snippet. It's surprising how bad most
       | of the models are.
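        | 
        | A quick way to try it (my sketch; the base_url and the
        | "deepseek-reasoner" model name assume DeepSeek's
        | OpenAI-compatible API serves R1 under that name):
        | 
        |     from openai import OpenAI
        | 
        |     client = OpenAI(base_url="https://api.deepseek.com",
        |                     api_key="YOUR_KEY")
        |     # Paste a few paragraphs of a book into excerpt.txt first.
        |     snippet = open("excerpt.txt").read()
        |     resp = client.chat.completions.create(
        |         model="deepseek-reasoner",  # assumed name for R1
        |         messages=[{"role": "user",
        |                    "content": "Continue this story in the "
        |                               "same style:\n\n" + snippet}],
        |     )
        |     print(resp.choices[0].message.content)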
       | 
       | Grok-3 comes in a close second, likely because it is actually
       | DeepSeek R1 with a few mods behind the scenes.
        
         | vessenes wrote:
          | Why do you think that Grok 3 is DeepSeek, out of curiosity?
        
           | azinman2 wrote:
           | Yes that's a pretty giant accusation, especially given
           | they're buying boatloads of GPUs and have previous versions
           | as well (it's not like they're starting with 3).
        
       | ftbsqcfjm wrote:
       | Interesting work on open-ending language models to foster
       | imagination and narrative generation. The idea of role-playing as
       | different characters is novel. I wonder how well it would
       | generalize to non-fantasy domains and if the lack of grounding
       | could lead to hallucinations. Excited to see where this research
       | goes!
        
         | NitpickLawyer wrote:
         | > The idea of role-playing as different characters is novel.
         | 
          | It is not. I remember Karpathy being really excited about
          | the "1 million gpt personas" dataset and highlighting it as
          | a way to avoid reward hacking in RLAIF. That was 3-6 months
          | ago, I believe.
         | 
         | Of course paper / code / weights beats idea, and it's exciting
         | to see how far this can go.
        
       | mentalgear wrote:
        | Happy to see DeepSeek using the correct (and much more
        | idiomatic) term "inference-time scaling", instead of the
        | grotesque construction "test-time compute" that OpenAI came up
        | with.
        
       | bilsbie wrote:
        | Any idea why I lost interest in DeepSeek? I used it and Grok-3
        | a whole bunch when they first came out, but now I've fallen
        | back to Claude for everything.
        
       ___________________________________________________________________
       (page generated 2025-04-04 23:01 UTC)