[HN Gopher] DeepSeek: Inference-Time Scaling for Generalist Rewa...
___________________________________________________________________
DeepSeek: Inference-Time Scaling for Generalist Reward Modeling
Author : tim_sw
Score : 149 points
Date : 2025-04-04 04:50 UTC (1 days ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| ALLTaken wrote:
| Not just being impressed that every paper coming out is SOTA, but
| also that they lead the way in being open source in the pure
| definition of OSS, even with permissive licensing.
|
| Let's not confuse the company with the country by over-fitting a
| narrative. Popular media is reinforcing hatred, or anything that
| sponsors them, especially toward weaker groups. Fewer repercussions
| and more clicks/money to be made, I guess.
|
| While politicians may hate each other, scientists love to work
| with other aspiring scientists who have similar ambitions, and the
| only competition is in achieving measurable success and the
| reward it brings to the greater public.
|
| Without any bias, it's genuinely admirable when companies
| release their sources to enable faster scientific progress
| cycles. It's ironic that this company is dedicated to finance,
| yet shares its progress, while non-profits and companies
| dedicated purely to AI are locking away all knowledge of their
| findings.
|
| Are there other companies like DeepSeek that you know of that
| commonly release great papers? I am following Mistral already,
| but I'd love to enrich my sources of publications that I consume.
| Highly appreciated!
| wood_spirit wrote:
| When OpenAI surged ahead, Meta ended up giving away its
| incredibly expensive-to-make Llama model to undercut OpenAI's
| valuation.
|
| Is DeepSeek's openness in part a move to undercut the big
| American tech companies?
| ALLTaken wrote:
| Correlation isn't causation; I hate to say it, but here it's
| really applicable. Facebook, aka Meta, has always been very
| open source. Let's not talk about the license though. :)
|
| Why do you imply malice in OSS companies? Or in for-profit
| companies open-sourcing their models and source code?
| mwigdahl wrote:
| Personally I don't impute any malice whatsoever -- these
| are soulless corporate entities -- but a for-profit company
| with fiduciary duty to shareholders releasing expensive,
| in-house-developed intellectual property for free certainly
| deserves some scrutiny.
|
| I tend to believe this is a "commoditize your complement"
| strategy on Meta's part, myself. No idea what Deepseek's
| motivation is, but it wouldn't surprise me if it was a
| similar strategy.
| eidifikwn24 wrote:
| In its ideal form, the sum of every participant
| commoditising their complements is how competition should
| benefit everyone -- albeit at the expense of excess
| returns
| astrange wrote:
| Companies basically don't have fiduciary duties to
| shareholders. Also, Zuck has all the votes and can do
| whatever he wants.
| ALLTaken wrote:
| This I think is closer to the truth, there can be despite
| all fiducuiary duty an executive who just wants his way.
| I admire being bold. OSS is in my opinion a "Co-Operation
| request" and co-operation is in game theory a winning
| move.
| throwaway314155 wrote:
| Meta is decidedly not an "OSS company" no matter how much
| they put out.
| SXX wrote:
| In this case there are very few truly "OSS companies"
| except for Red Hat and few other Linux distribution
| maintainers. Even companies centered around open source
| like Gitlab are usually generate most of their revenue of
| proprietary products or use liceses like BSL.
| throwaway314155 wrote:
| > In this case there are very few truly "OSS companies"
| except for Red Hat and a few other Linux distribution
| maintainers.
|
| Okay then. Fine by me.
|
| > Gitlab
|
| Perfect example. They have OSS offerings. They are not an
| OSS _company_.
|
| This also serves to exclude the hundreds of VC-backed
| "totally open source, 100% not going to enshittify this
| when our investors come asking for returns" companies. Which,
| again, I'm fine with.
|
| The business model of the purist OSS company is not one
| that's been found to be terribly successful.
| Nevertheless, it _is_ one which has a sort of moral high
| ground at least. I would prefer to leave definitions as
| is so as to keep that distinction (of having the moral
| high ground) crystal clear.
|
| Does that make sense?
| phoronixrly wrote:
| If only totalitarian nation states used their subjects' money
| to undermine the dominance of US-based software vendors by
| releasing open-source alternatives created with slave
| labour... Oh wait, it can't work because software patents are
| here to the rescue again ... Wait, open source is communism?
| Always has been. /s
| Febra33 wrote:
| > Let's not confuse the company with the country
|
| What's wrong with China? They're wonderful in the OSS
| ecosystem.
| echelon wrote:
| It varies from company to company. BOOX, for instance, is a
| notorious GPL violator.
|
| There's also significant alpha in releasing open weights
| models. You get to slow down the market leaders to make sure
| they don't have runaway success. It reduces moats, slows
| funding, creates a wealth of competition, reduces margin.
| It's a really smart move if you want to make sure there's a
| future where you can compete with Google, OpenAI, etc.
| There's even a chance it makes those companies bleed a
| little. The value chain moves to differently shaped companies
| (tools, infra) leaving space for consumer and product to not
| necessarily be won by the "labs" companies.
| ALLTaken wrote:
| If you look at releasing "everything" purely from the
| perspective of a quant, then the objective of dominating a
| metric relevant to the quant is obviously the motive. But
| it's impossible to prove, and a very strong assumption with
| little to no data. If DeepSeek's parent company traded on
| the data and release of DeepSeek, with quant models that
| target affected firms with shorts before release, then
| that's a whole new level of WOW, and honestly, great funds do
| that. But this is too big and bold a move to underpin
| motive.
|
| But believing a man could achieve such a feat alone is
| inspiring, to be frank.
| ALLTaken wrote:
| I didn't want to be politically correct, but also not
| insensitive. Many countries produce great things, but if we
| measure these countries rigorously, just a few stand out.
| Unfortunately, from here on it gets messy, political,
| unsubstantiated, or backed by data that is inherently biased
| by its selection criteria and weighting.
|
| It's very difficult to be truly unbiased and neutral, and that's
| not my goal; I just think this is a common thought that needs
| to be challenged. Associating the products and results of
| scientists, quants, engineers, and the companies that employ
| them with an entire nation is inherently simplistic.
|
| In that case, why did the CIA/NSA develop Tor and make it
| OSS? If the governments in the UK/France/Turkey are so
| brutally against encryption, why does the USA release safe
| encryption products?
|
| If the world were absolute, we would absolutely be doomed. I
| hope to be part of a world where freedom of thought,
| individual responsibility, constructive cooperation, and a mesh
| of companies can work and produce value from and with each
| other, permissionlessly. A world where copyright and patents are
| no longer needed, because a stronger framework supports the
| individual contributor and also companies. Leftist, rightist, and
| centrist views of how an economy should look are flawed,
| because they introduce ideologies into a mathematical, non-
| linear, partially closed but mostly open system.
|
| Every idealistic concept shouldn't just be believed, but
| explored. Hating one system over another is also flawed,
| because it doesn't produce data and forces hypothesis testing
| without following the conclusions through. The economy is too
| complex for one man to design. It shouldn't be forced onto a
| canvas of restricted operations; its circuits would need to be
| developed locally. If we empower small communities and allow
| changes to be made quicker with less bureaucracy, this
| seemingly grand introduction of chaos leads to the emergence of
| a larger stability of the whole. We are so far away from that,
| man...
| refulgentis wrote:
| I love open source and the general vibe of good vibes you're
| bringing, but... this isn't SOTA, or close, even on the paper's
| own terms (i.e. excluding models released in the last 6 months,
| including their own, which is a strange, yet understandable,
| choice given the results they report).
|
| Quickest way to show this:
|
| - Table 2, top of page 7
|
| - Gemma 2 27B, 0 interventions, has 94.1/56.6/60.2
|
| - Gemma 2 27B, with all _their_ interventions, has 86/64/69.
|
| - Gemma 2 27B, with all _their_ interventions, sampled 32
| times, is at 90.4/67.2/70.3.
|
| - Gemma 2 27B came out in... June 2024. :/
|
| Quick heuristics employed here:
|
| - What models did they compare against? (This isn't _strictly_
| an issue; the big screaming tell is "What models did they
| compare against _compared to their last N papers_?")
|
| - How quickly does the paper have to move towards N samples,
| and how big does N get before they're happy enough to conclude?
| (32). How much does that improve performance on their chosen
| metric? (1.8%)
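The sampling@32 result above is a form of best-of-N voting at inference time: query the reward model N times and take the majority preference. A minimal sketch of the idea, with a hypothetical stochastic `sample_judgment` mocked by a biased coin (not DeepSeek's actual method or API):

```python
import random
from collections import Counter

def sample_judgment(prompt: str) -> str:
    """One stochastic reward-model call that returns which of two
    candidate responses it prefers ("A" or "B"). Mocked here with a
    biased coin purely for illustration."""
    return "A" if random.random() < 0.7 else "B"

def vote_at_n(prompt: str, n: int = 32) -> str:
    """Inference-time scaling by voting: sample the judge n times
    and return the majority preference."""
    votes = Counter(sample_judgment(prompt) for _ in range(n))
    return votes.most_common(1)[0][0]

random.seed(0)
print(vote_at_n("Which response is better, A or B?", n=32))
```

Gains from voting tend to saturate quickly as N grows, which is why a ~1.8% improvement at N=32 is worth weighing against the 32x inference cost.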
| resters wrote:
| DeepSeek R1 is by far the best at writing prose of any model,
| including Grok-3, GPT-4o, o1-pro, o3, Claude, etc.
|
| Paste in a snippet from a book and ask the model to continue the
| story in the style of the snippet. It's surprising how bad most
| of the models are.
|
| Grok-3 comes in a close second, likely because it is actually
| DeepSeek R1 with a few mods behind the scenes.
| vessenes wrote:
| why do you think that grok 3 is deepseek, out of curiosity?
| azinman2 wrote:
| Yes that's a pretty giant accusation, especially given
| they're buying boatloads of GPUs and have previous versions
| as well (it's not like they're starting with 3).
| resters wrote:
| 1) Grok-2 was akin to GPT-3.5
|
| 2) Grok-3 comes out a month after DeepSeek R1 was open
| sourced. I think Grok-3 is DeepSeek R1 with some added
| params and about a month of training on the giant cluster,
| possibly a bit of in-house secret sauce added to the model
| or training methodology.
|
| What are the chances that XAI just happened to have a
| thinking model close to as good as revolutionary DeepSeek
| but happened to launch it 30 days later?
|
| It was both smart and pragmatic for XAI to simply use the
| best available open source stuff and layer their own stuff
| on top of it. Imagine they doubled the parameter count and
| trained it for 30 days; that would not even use half of the
| GPU power!
| vessenes wrote:
| > What are the chances that XAI just happened to have a
| thinking model close to as good as revolutionary DeepSeek
| but happened to launch it 30 days later?
|
| Extremely, extremely good. That was in fact the real
| point of the deepseek paper - it was extremely cheap to
| turn a frontier(ish?) model into a reasoning model. There
| is nothing suspicious about this timeline from an ML Ops
| point of view.
|
| In fact DeepSeek themselves in a sort of victory lap
| released six OTHER models from other providers finetuned
| with reasoning as part of the initial drop.
| resters wrote:
| I replied to the child of your comment
| gmerc wrote:
| If it was, Elon is even more stupid than he lets on, because:
|
| DS3: 5M training run. Grok 3: 400M training run.
|
| For a 2% difference in the benchmarks.
| ftbsqcfjm wrote:
| Interesting work on open-ending language models to foster
| imagination and narrative generation. The idea of role-playing as
| different characters is novel. I wonder how well it would
| generalize to non-fantasy domains and if the lack of grounding
| could lead to hallucinations. Excited to see where this research
| goes!
| NitpickLawyer wrote:
| > The idea of role-playing as different characters is novel.
|
| It is not. I remember Karpathy being really excited about the
| "1 million gpt personas" dataset and highlighted it as a way to
| avoid reward hacking in RLAIF. That was 3-6 months ago I
| believe.
|
| Of course paper / code / weights beats idea, and it's exciting
| to see how far this can go.
| mentalgear wrote:
| Happy to see DeepSeek using the correct (and much more idiomatic)
| term "inference-time scaling" instead of the grotesque
| construction "test-time compute" that OpenAI came up with.
| bilsbie wrote:
| Any idea why I lost interest in DeepSeek? I used it and Grok 3 a
| whole bunch when they first came out, but now I've fallen back to
| Claude for everything.
| manmal wrote:
| For coding, I'm finding Claude's responses the most to-the-point
| and on-task, while many other models try to extrapolate, lecture,
| or patronize. DeepSeek is pretty good though. Maybe
| it's the high latency (probably due to prompt processing)?
| UltraSane wrote:
| Claude is love. Claude is life.
___________________________________________________________________
(page generated 2025-04-05 23:02 UTC)