[HN Gopher] DeepSeek: Inference-Time Scaling for Generalist Rewa...
___________________________________________________________________
DeepSeek: Inference-Time Scaling for Generalist Reward Modeling
Author : tim_sw
Score : 105 points
Date : 2025-04-04 04:50 UTC (18 hours ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| ALLTaken wrote:
| Not just being impressed that every paper coming out is SOTA,
| but also that they lead the way in being open source in the pure
| definition of OSS, even with permissive licensing.
|
| Let's not confuse the company with the country by over-fitting a
| narrative. Popular media keeps reinforcing hatred of whatever its
| sponsors point it at, especially toward weaker groups. Fewer
| repercussions and more clicks/money to be made, I guess.
|
| While politicians may hate each other, scientists love to work
| with other aspiring scientists who share similar ambitions; the
| only competition is in achieving measurable success and the
| benefit it brings to the greater public.
|
| Without any bias: it's genuinely admirable when companies
| release their sources to enable faster cycles of scientific
| progress. It's ironic that this company is dedicated to finance,
| yet shares its progress, while non-profits and companies
| dedicated purely to AI lock all knowledge of their findings
| away.
|
| Are there other companies like DeepSeek that you know of that
| commonly release great papers? I am following Mistral already,
| but I'd love to broaden the sources of publications I consume.
| Highly appreciated!
| wood_spirit wrote:
| When OpenAI surged ahead, Meta ended up giving away its
| incredibly expensive-to-train Llama model to undercut OpenAI's
| valuation.
|
| Is DeepSeek's openness, in part, meant to undercut the big
| American tech companies?
| ALLTaken wrote:
| Correlation isn't causation. I hate to say it, but that really
| applies here. Facebook, a.k.a. Meta, has always been very open
| source. Let's not talk about the license though. :)
|
| Why do you imply malice on the part of OSS companies? Or of
| for-profit companies open-sourcing their models and source code?
| mwigdahl wrote:
| Personally I don't impute any malice whatsoever -- these
| are soulless corporate entities -- but a for-profit company
| with fiduciary duty to shareholders releasing expensive,
| in-house-developed intellectual property for free certainly
| deserves some scrutiny.
|
| I tend to believe this is a "commoditize your complement"
| strategy on Meta's part, myself. No idea what DeepSeek's
| motivation is, but it wouldn't surprise me if it was a
| similar strategy.
| throwaway314155 wrote:
| Meta is decidedly not an "OSS company" no matter how much
| they put out.
| SXX wrote:
| In that case there are very few truly "OSS companies" except
| for Red Hat and a few other Linux distribution maintainers. Even
| companies centered around open source, like GitLab, usually
| generate most of their revenue from proprietary products or use
| licenses like the BSL.
| phoronixrly wrote:
| If only totalitarian nation states used their subjects' money
| to undermine the dominance of US-based software vendors by
| releasing open-source alternatives created with slave
| labour... Oh wait, it can't work, because software patents come
| to the rescue again... Wait, open source is communism?
| Always has been. /s
| Febra33 wrote:
| > Let's not confuse the company with the country
|
| What's wrong with China? They're wonderful in the OSS
| ecosystem.
| echelon wrote:
| It varies on a company to company basis. BOOX, for instance,
| are notorious GPL violators.
|
| There's also significant alpha in releasing open weights
| models. You get to slow down the market leaders to make sure
| they don't have runaway success. It reduces moats, slows
| funding, creates a wealth of competition, reduces margin.
| It's a really smart move if you want to make sure there's a
| future where you can compete with Google, OpenAI, etc.
| There's even a chance it makes those companies bleed a
| little. The value chain shifts toward differently shaped
| companies (tools, infra), leaving room for the consumer and
| product layers not to be won by the "labs" companies.
| refulgentis wrote:
| I love open source and the good vibes you're bringing,
| but... this isn't SOTA, or close, even on the paper's own terms
| (i.e. excluding models released in the last 6 months, including
| their own, which is a strange, yet understandable, choice given
| the results they report).
|
| Quickest way to show this:
|
| - Table 2, top of page 7
|
| - Gemma 2 27B, 0 interventions, has 94.1/56.6/60.2
|
| - Gemma 2 27B, with all _their_ interventions, has 86/64/69.
|
| - Gemma 2 27B, with all _their_ interventions, sampled 32
| times, is at 90.4/67.2/70.3.
|
| - Gemma 2 27B came out in...June 2024. :/
|
| Quick heuristics employed here:
|
| - What models did they compare against? (This isn't _strictly_
| an issue; the big screaming tell is "What models did they
| compare against _compared to their last N papers_?")
|
| - How quickly does the paper have to move towards N samples,
| and how big does N get before they're happy enough to conclude?
| (32). How much does that improve performance on their chosen
| metric? (1.8%)
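|
| (For context, the "sampled 32 times" rows are the inference-time
| scaling being measured: sample the generative reward model k
| times and aggregate its verdicts. A minimal sketch of that idea,
| with hypothetical names; the paper's actual voting over generated
| principles and critiques is more involved:
|
|     # Sketch: aggregate k stochastic reward-model judgments by
|     # majority vote. `judge` is a placeholder for one sampled
|     # call to a generative RM that returns the index of the
|     # preferred candidate response.
|     from collections import Counter
|
|     def vote_preferred(judge, prompt, responses, k=32):
|         votes = Counter(judge(prompt, responses) for _ in range(k))
|         return votes.most_common(1)[0][0]
|
| More samples buy a lower-variance estimate of the judge's
| preference, which is essentially what the scaling curves here
| are measuring.)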
| resters wrote:
| DeepSeek R1 is by far the best at writing prose of any model,
| including Grok-3, GPT-4o, o1-pro, o3, Claude, etc.
|
| Paste in a snippet from a book and ask the model to continue the
| story in the style of the snippet. It's surprising how bad most
| of the models are.
|
| Grok-3 comes in a close second, likely because it is actually
| DeepSeek R1 with a few mods behind the scenes.
| vessenes wrote:
| Why do you think that Grok 3 is DeepSeek, out of curiosity?
| azinman2 wrote:
| Yes, that's a pretty giant accusation, especially given
| they're buying boatloads of GPUs and have previous versions
| as well (it's not like they're starting with 3).
| ftbsqcfjm wrote:
| Interesting work on open-ended language models to foster
| imagination and narrative generation. The idea of role-playing as
| different characters is novel. I wonder how well it would
| generalize to non-fantasy domains and if the lack of grounding
| could lead to hallucinations. Excited to see where this research
| goes!
| NitpickLawyer wrote:
| > The idea of role-playing as different characters is novel.
|
| It is not. I remember Karpathy being really excited about the
| "1 million GPT personas" dataset and highlighting it as a way to
| avoid reward hacking in RLAIF. That was 3-6 months ago, I
| believe.
|
| Of course paper / code / weights beats idea, and it's exciting
| to see how far this can go.
| mentalgear wrote:
| Happy to see DeepSeek using the correct (and much more
| idiomatic) term "inference-time scaling" instead of the
| grotesque construction "test-time compute" that OpenAI came up
| with.
| bilsbie wrote:
| Any idea why I lost interest in DeepSeek? I used it and Grok 3 a
| whole bunch when they first came out, but now I've fallen back
| to Claude for everything.
___________________________________________________________________
(page generated 2025-04-04 23:01 UTC)