[HN Gopher] DeepSeek: Inference-Time Scaling for Generalist Rewa...
___________________________________________________________________
DeepSeek: Inference-Time Scaling for Generalist Reward Modeling
Author : tim_sw
Score : 149 points
Date : 2025-04-04 04:50 UTC (1 days ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| ALLTaken wrote:
| Not just being impressed that every paper coming out is SOTA, but
| also that they lead the way in being open source in the pure
| definition of OSS, even with permissive licensing.
|
| Let's not confuse the company with the country by over-fitting a
| narrative. Popular media is reinforcing hatred, or anything that
| sponsors them, especially toward weaker groups. Fewer repercussions
| and more clicks/money to be made, I guess.
|
| While politicians may hate each other, scientists love to work
| with other aspiring scientists who have similar ambitions, and the
| only competition is in achieving measurable success and the
| reward it brings to the greater public.
|
| Without any bias, it's genuinely admirable when companies
| release their sources to enable faster scientific progress
| cycles. It's ironic that this company is dedicated to finance,
| yet shares its progress, while non-profits and companies
| dedicated purely to AI are locking away all knowledge of their
| findings.
|
| Are there other companies like DeepSeek that you know of that
| commonly release great papers? I am following Mistral already,
| but I'd love to enrich my sources of publications that I consume.
| Highly appreciated!
| wood_spirit wrote:
| When OpenAI surged ahead, Meta ended up giving away its
| incredibly expensive-to-make Llama model to undercut OpenAI's
| valuation.
|
| Is DeepSeek's openness in part a move to undercut the big
| American tech companies?
| ALLTaken wrote:
| Correlation isn't causation; I hate to say it, but here it's
| really applicable. Facebook, aka Meta, has always been very
| open source. Let's not talk about the license though. :)
|
| Why do you imply malice in OSS companies? Or in for-profit
| companies open-sourcing their models and source code?
| mwigdahl wrote:
| Personally I don't impute any malice whatsoever -- these
| are soulless corporate entities -- but a for-profit company
| with fiduciary duty to shareholders releasing expensive,
| in-house-developed intellectual property for free certainly
| deserves some scrutiny.
|
| I tend to believe this is a "commoditize your complement"
| strategy on Meta's part, myself. No idea what Deepseek's
| motivation is, but it wouldn't surprise me if it was a
| similar strategy.
| eidifikwn24 wrote:
| In its ideal form, the sum of every participant
| commoditising their complements is how competition should
| benefit everyone -- albeit at the expense of excess
| returns
| astrange wrote:
| Companies basically don't have fiduciary duties to
| shareholders. Also, Zuck has all the votes and can do
| whatever he wants.
| ALLTaken wrote:
| This I think is closer to the truth, there can be despite
| all fiducuiary duty an executive who just wants his way.
| I admire being bold. OSS is in my opinion a "Co-Operation
| request" and co-operation is in game theory a winning
| move.
| throwaway314155 wrote:
| Meta is decidedly not an "OSS company" no matter how much
| they put out.
| SXX wrote:
| In this case there are very few truly "OSS companies"
| except for Red Hat and few other Linux distribution
| maintainers. Even companies centered around open source
| like Gitlab are usually generate most of their revenue of
| proprietary products or use liceses like BSL.
| throwaway314155 wrote:
| > In this case there are very few truly "OSS companies"
| except for Red Hat and a few other Linux distribution
| maintainers.
|
| Okay then. Fine by me.
|
| > Gitlab
|
| Perfect example. They have OSS offerings. They are not an
| OSS _company_.
|
| This also serves to exclude the hundreds of VC-backed
| "totally open source, 100% not going to enshittify this
| when our investors come asking for returns" companies. Which,
| again, I'm fine with.
|
| The business model of the purist OSS company is not one
| that's been found to be terribly successful.
| Nevertheless, it _is_ one which has a sort of moral high
| ground at least. I would prefer to leave definitions as
| is so as to keep that distinction (of having the moral
| high ground) crystal clear.
|
| Does that make sense?
| phoronixrly wrote:
| If only totalitarian nation states used their subjects' money
| to undermine the dominance of US-based software vendors by
| releasing open-source alternatives created with slave
| labour... Oh wait, it can't work because software patents are
| here to the rescue again ... Wait, open source is communism?
| Always has been. /s
| Febra33 wrote:
| > Let's not confuse the company with the country
|
| What's wrong with China? They're wonderful in the OSS
| ecosystem.
| echelon wrote:
| It varies from company to company. BOOX, for instance, is a
| notorious GPL violator.
|
| There's also significant alpha in releasing open weights
| models. You get to slow down the market leaders to make sure
| they don't have runaway success. It reduces moats, slows
| funding, creates a wealth of competition, reduces margin.
| It's a really smart move if you want to make sure there's a
| future where you can compete with Google, OpenAI, etc.
| There's even a chance it makes those companies bleed a
| little. The value chain moves to differently shaped companies
| (tools, infra) leaving space for consumer and product to not
| necessarily be won by the "labs" companies.
| ALLTaken wrote:
| If you look at releasing "everything" purely from the
| perspective of a quant, then the objective of dominating a
| metric relevant to the quant is obviously the motive. But
| it's impossible to prove, and a very strong assumption with
| little to no data. If DeepSeek's parent company traded on
| the data and release of DeepSeek, with quant models that
| target affected firms with shorts before release, then
| that's a whole new level of WOW, and honestly, great funds do
| that. But this is too big and bold a move to underpin
| motive.
|
| But believing a man could achieve such a feat alone is
| inspiring, to be frank.
| ALLTaken wrote:
| I didn't want to be politically correct, but also not
| insensitive. Many countries produce great things, but if we
| measure these countries rigorously, just a few stand out.
| Unfortunately, from here on it gets messy, political,
| unsubstantiated, or backed by data that is inherently biased
| by its selection criteria and weighting.
|
| It's very difficult to be truly unbiased and neutral, and that's
| not my goal; I just think this is a common thought that needs
| to be challenged. Associating the products and results of
| scientists, quants, engineers, and the companies that employ
| them with an entire nation is inherently simplistic.
|
| In that case, why did the CIA/NSA develop Tor and make it
| OSS? If the governments in the UK/France/Turkey are so
| brutally against encryption, why does the USA release safe
| encryption products?
|
| If the world were absolute, we would absolutely be doomed. I
| hope to be part of a world where freedom of thought,
| individual responsibility, constructive cooperation, and a mesh
| of companies can work and produce value from and with each
| other, permissionlessly. A world where copyright and patents are
| no longer needed, because a stronger framework supports the
| individual contributor and also companies. Leftist, rightist, and
| centrist views of how an economy should look are flawed,
| because they introduce ideologies into a mathematical, non-
| linear, partially closed but mostly open system.
|
| Every idealistic concept shouldn't just be believed, but
| explored. Hating one system over another is also flawed,
| because it doesn't produce data and forces hypothesis testing
| without following the conclusions through. The economy is too
| complex for one man to design. It shouldn't be forced onto a
| canvas of restricted operations; its circuits would need to be
| developed locally. If we empower small communities and allow
| changes to be made quicker with less bureaucracy, this
| seemingly grand introduction of chaos leads to the emergence of
| a larger stability of the whole. We are so far away from that,
| man...
| refulgentis wrote:
| I love open source and the general vibe of good vibes you're
| bringing, but... this isn't SOTA, or close, even on the paper's
| own terms (i.e. excluding models released in the last 6 months,
| including their own, which is a strange, yet understandable,
| choice given the results they report).
|
| Quickest way to show this:
|
| - Table 2, top of page 7
|
| - Gemma 2 27B, 0 interventions, has 94.1/56.6/60.2
|
| - Gemma 2 27B, with all _their_ interventions, has 86/64/69.
|
| - Gemma 2 27B, with all _their_ interventions, sampled 32
| times, is at 90.4/67.2/70.3.
|
| - Gemma 2 27B came out in... June 2024. :/
|
| Quick heuristics employed here:
|
| - What models did they compare against? (This isn't _strictly_
| an issue; the big screaming tell is "What models did they
| compare against _compared to their last N papers_?")
|
| - How quickly does the paper have to move towards N samples,
| and how big does N get before they're happy enough to conclude?
| (32). How much does that improve performance on their chosen
| metric? (1.8%)
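The sampling@32 result above is a form of best-of-N voting at inference time: query the reward model N times and take the majority preference. A minimal sketch of the idea, with a hypothetical stochastic `sample_judgment` mocked by a biased coin (not DeepSeek's actual method or API):

```python
import random
from collections import Counter

def sample_judgment(prompt: str) -> str:
    """One stochastic reward-model call that returns which of two
    candidate responses it prefers ("A" or "B"). Mocked here with a
    biased coin purely for illustration."""
    return "A" if random.random() < 0.7 else "B"

def vote_at_n(prompt: str, n: int = 32) -> str:
    """Inference-time scaling by voting: sample the judge n times
    and return the majority preference."""
    votes = Counter(sample_judgment(prompt) for _ in range(n))
    return votes.most_common(1)[0][0]

random.seed(0)
print(vote_at_n("Which response is better, A or B?", n=32))
```

Gains from voting tend to saturate quickly as N grows, which is why a ~1.8% improvement at N=32 is worth weighing against the 32x inference cost.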
| resters wrote:
| DeepSeek R1 is by far the best at writing prose of any model,
| including Grok-3, GPT-4o, o1-pro, o3, Claude, etc.
|
| Paste in a snippet from a book and ask the model to continue the
| story in the style of the snippet. It's surprising how bad most
| of the models are.
|
| Grok-3 comes in a close second, likely because it is actually
| DeepSeek R1 with a few mods behind the scenes.
| vessenes wrote:
| why do you think that grok 3 is deepseek, out of curiosity?
| azinman2 wrote:
| Yes that's a pretty giant accusation, especially given
| they're buying boatloads of GPUs and have previous versions
| as well (it's not like they're starting with 3).
| resters wrote:
| 1) Grok-2 was akin to GPT-3.5
|
| 2) Grok-3 comes out a month after DeepSeek R1 was open
| sourced. I think Grok-3 is DeepSeek R1 with some added
| params and about a month of training on the giant cluster,
| possibly a bit of in-house secret sauce added to the model
| or training methodology.
|
| What are the chances that XAI just happened to have a
| thinking model close to as good as revolutionary DeepSeek
| but happened to launch it 30 days later?
|
| It was both smart and pragmatic for XAI to simply use the
| best available open source stuff and layer their own stuff
| on top of it. Imagine they doubled the parameter count and
| trained it for 30 days; that would not even use half of the
| GPU power!
| vessenes wrote:
| > What are the chances that XAI just happened to have a
| thinking model close to as good as revolutionary DeepSeek
| but happened to launch it 30 days later?
|
| Extremely, extremely good. That was in fact the real
| point of the deepseek paper - it was extremely cheap to
| turn a frontier(ish?) model into a reasoning model. There
| is nothing suspicious about this timeline from an ML Ops
| point of view.
|
| In fact DeepSeek themselves in a sort of victory lap
| released six OTHER models from other providers finetuned
| with reasoning as part of the initial drop.
| resters wrote:
| I replied to the child of your comment
| gmerc wrote:
| If it was, Elon is even more stupid than he lets on, because:
|
| DS3: 5M training run. Grok 3: 400M training run.
|
| For a 2% difference in the benchmarks.
| ftbsqcfjm wrote:
| Interesting work on open-ending language models to foster
| imagination and narrative generation. The idea of role-playing as
| different characters is novel. I wonder how well it would
| generalize to non-fantasy domains and if the lack of grounding
| could lead to hallucinations. Excited to see where this research
| goes!
| NitpickLawyer wrote:
| > The idea of role-playing as different characters is novel.
|
| It is not. I remember Karpathy being really excited about the
| "1 million gpt personas" dataset and highlighted it as a way to
| avoid reward hacking in RLAIF. That was 3-6 months ago I
| believe.
|
| Of course paper / code / weights beats idea, and it's exciting
| to see how far this can go.
| mentalgear wrote:
| Happy to see DeepSeek using the correct (and much more idiomatic)
| term "inference-time scaling" instead of the grotesque
| construction "test-time compute" that OpenAI came up with.
| bilsbie wrote:
| Any idea why I lost interest in DeepSeek? I used it and Grok 3 a
| whole bunch when they first came out, but now I've fallen back to
| Claude for everything.
| manmal wrote:
| For coding, I'm finding Claude's responses the most to-the-point
| and on-task, while many other models try to extrapolate, lecture,
| or patronize. DeepSeek is pretty good though. Maybe
| it's the high latency (probably due to prompt processing)?
| UltraSane wrote:
| Claude is love. Claude is life.
___________________________________________________________________
(page generated 2025-04-05 23:02 UTC)