[HN Gopher] Outcome-Based Reinforcement Learning to Predict the ...
___________________________________________________________________
Outcome-Based Reinforcement Learning to Predict the Future
Author : bturtel
Score : 76 points
Date : 2025-05-27 13:33 UTC (9 hours ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| ctoth wrote:
| Do you want paperclips? Because this is how you get paperclips!
|
| Eliminate all agents, all sources of change, all complexity -
| anything that could introduce unpredictability, and it suddenly
| becomes far easier to predict the future, no?
| JoshTriplett wrote:
| > Do you want paperclips? Because this is how you get
| paperclips!
|
| Don't^W worry, there are many other ways of getting paperclips,
| and we're doing all of them.
| sitkack wrote:
| Even explaining how _not_ to get paper clips gets you paper
| clips when you can invert the loss function. Paper clips for
| everyone!
| vlovich123 wrote:
| I don't know. Paperclips are awful useful. Would it be so bad
| to build more of them?
| throwaway71271 wrote:
| https://www.decisionproblem.com/paperclips/index2.html go
| ahead :)
| Ygg2 wrote:
| That's all fun and games until paperclip maximizers start
| looking at your blood as a source of iron.
| valine wrote:
| So instead of next token prediction it's next event prediction. At
| some point this just loops around and we're back to teaching
| models to predict the next token in the sequence.
| lumost wrote:
| Tokens are an awfully convenient way to describe an event.
| phyalow wrote:
| Tokens are just discretized state representations.
| ww520 wrote:
| It's the next state. So instead of spitting out words, it will
| spit out a whole movie, or a sequence of world states in a game
| or simulation.
| amelius wrote:
| Why would you use RL if you're not going to control the
| environment, but just predict it?
| jldugger wrote:
| From the abstract
|
| > A simple trading rule turns this calibration edge into $127 of
| hypothetical profit versus $92 for o1 (p = 0.037).
|
| I'm lazy: is this hypothetical shooting fish in a barrel, or is
| it a real edge?
___________________________________________________________________
(page generated 2025-05-27 23:00 UTC)