[HN Gopher] Reinforcement Pre-Training
___________________________________________________________________
Reinforcement Pre-Training
Author : frozenseven
Score : 51 points
Date : 2025-06-10 05:30 UTC (17 hours ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| hzia wrote:
| This is very exciting! Existing data will become a lot more
| valuable, and it brings us one step closer to how we learn as
| humans!
|
| The downside is that this is going to be extremely expensive, so
| the datasets used for RL will need to be curated.
| watsonmusic wrote:
| Cannot wait to see how it goes beyond the current LLM training
| pipeline.
| nsagent wrote:
| It's clear that you're either one of the authors or a friend
| of theirs. You created this account 8 months ago to comment
| on another paper [1] that was released by the same authors.
|
| [1]: https://news.ycombinator.com/item?id=41776324
| dgshsg wrote:
| I notice that you can do this recursively to arbitrary depth. The
| cost is terrible though.
| watsonmusic wrote:
| It could be adaptive: only high-value tokens would be allocated
| more compute.
| babelfish wrote:
| So marginally better (and occasionally worse) performance for an
| order-of-magnitude increase in training cost...?
| watsonmusic wrote:
| The 14B model performs comparably to a 32B model. The improvement
| is huge.
| 85392_school wrote:
| Are we only comparing them in terms of text-completion
| accuracy? Does it also improve performance on benchmarks?
| watsonmusic wrote:
| A new scaling paradigm finally comes out!
| beauzero wrote:
| Interesting
| NotAnOtter wrote:
| I'm interested in how an innovation like this affects the
| business prospects.
|
| Let's assume this is a paradigm shift on the scale of
| Transformers / `Attention is all you need`. Companies build out
| new models and pump another $100 billion through them. And then a
| year from now, another innovation comes out. Same circus. And
| again.
|
| No one wants to be left behind, but trying to keep up will sink
| smaller companies.
| curious_cat_163 wrote:
| I am not sure why this ought to require "pump another $100
| billion". Could you elaborate?
|
| Yes, the more recent generations of GPUs optimize for attention
| math. But they are still fairly "general-purpose" accelerators
| as well. So when I see papers like this (interesting idea,
| btw!), my mental model for costs suggests that the CapEx to buy
| up the GPUs and build out the data centers would get re-used
| for this and hundreds of other ideas and experiments.
|
| And then the hope is that the best ideas will occupy more of
| the available capacity...
| gessha wrote:
| Sir, this is an arxiv paper
| NotAnOtter wrote:
| So true, just like this one: https://arxiv.org/abs/1706.03762
| Imnimo wrote:
| This is an interesting way of squeezing extra feedback from raw
| text, but I'm a little skeptical that it's the best way to spend
| training FLOPs. It feels like most "next tokens" are pretty
| low-information (even after filtering for entropy like they do).
| Does it make sense to spend a bunch of compute on a reasoning
| trace for them? Maybe if you're severely data-limited, but not
| compute-limited?
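|
| For concreteness, here is roughly what that entropy filtering
| looks like as a sketch -- the proxy model and threshold are my
| own assumptions, not the paper's exact recipe:
|
|     import torch
|     from transformers import AutoModelForCausalLM, AutoTokenizer
|
|     # Score each position with a small proxy LM; only positions
|     # where the next token is hard to predict (high entropy)
|     # would get the expensive reasoning rollout.
|     name = "Qwen/Qwen2.5-0.5B"  # assumed proxy; any small LM works
|     tok = AutoTokenizer.from_pretrained(name)
|     proxy = AutoModelForCausalLM.from_pretrained(name).eval()
|
|     def high_entropy_positions(text, threshold=2.0):  # nats, assumed
|         ids = tok(text, return_tensors="pt").input_ids
|         with torch.no_grad():
|             logits = proxy(ids).logits              # [1, T, vocab]
|         probs = torch.softmax(logits, dim=-1)
|         ent = -(probs * probs.clamp_min(1e-9).log()).sum(-1)
|         # entropy at position t = uncertainty about token t+1
|         return [t for t in range(ids.shape[1] - 1)
|                 if ent[0, t].item() > threshold]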
| rafaelero wrote:
| This should be used for high-entropy tokens during pre-training.
| ntonozzi wrote:
| Is there any work related to using some kind of soft tokens for
| reasoning? It seems so inefficient to try to compress so much
| information into a single token for the next pass of the model,
| when you could output a large vector for each forward pass, have
| a drastically larger working memory/scratchpad, and have much
| higher bandwidth for the model to pass information forward to
| the next token call. If a single token carries about 17 bits of
| information, a vector of 1024 32-bit floats could carry 32,768
| bits.
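|
| A toy of what I have in mind (GPT-2 is just a stand-in here, and
| the loop is my own illustration, not any existing method's API):
|
|     import math
|     import torch
|     from transformers import AutoModelForCausalLM, AutoTokenizer
|
|     # Instead of sampling a discrete token, feed the last hidden
|     # state straight back in as the next input embedding. A model
|     # would have to be trained to make use of these "soft" vectors.
|     tok = AutoTokenizer.from_pretrained("gpt2")
|     model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
|
|     ids = tok("The answer is", return_tensors="pt").input_ids
|     embeds = model.get_input_embeddings()(ids)        # [1, T, 768]
|     with torch.no_grad():
|         for _ in range(4):                            # 4 soft steps
|             out = model(inputs_embeds=embeds,
|                         output_hidden_states=True)
|             soft = out.hidden_states[-1][:, -1:, :]   # last position
|             embeds = torch.cat([embeds, soft], dim=1)
|
|     # Back-of-envelope bandwidth: one token from a ~131k vocab is
|     # at most log2(131072) = 17 bits; a 1024-dim float32 vector is
|     # 1024 * 32 = 32,768 bits of raw width.
|     print(math.log2(131072), 1024 * 32)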
___________________________________________________________________
(page generated 2025-06-10 23:01 UTC)