[HN Gopher] Outcome-Based Reinforcement Learning to Predict the ...
       ___________________________________________________________________
        
       Outcome-Based Reinforcement Learning to Predict the Future
        
       Author : bturtel
       Score  : 76 points
       Date   : 2025-05-27 13:33 UTC (9 hours ago)
        
 (HTM) web link (arxiv.org)
 (TXT) w3m dump (arxiv.org)
        
       | ctoth wrote:
       | Do you want paperclips? Because this is how you get paperclips!
       | 
       | Eliminate all agents, all sources of change, all complexity -
       | anything that could introduce unpredictability, and it suddenly
       | becomes far easier to predict the future, no?
        
         | JoshTriplett wrote:
         | > Do you want paperclips? Because this is how you get
         | paperclips!
         | 
         | Don't^W worry, there are many other ways of getting paperclips,
         | and we're doing all of them.
        
           | sitkack wrote:
            | Even explaining how _not_ to get paper clips gets you paper
            | clips when you can invert the loss function. Paper clips for
            | everyone!
        
         | vlovich123 wrote:
         | I don't know. Paperclips are awful useful. Would it be so bad
         | to build more of them?
        
           | throwaway71271 wrote:
           | https://www.decisionproblem.com/paperclips/index2.html go
           | ahead :)
        
           | Ygg2 wrote:
            | That's all fun and games until paperclip maximizers start
            | looking at your blood as a source of iron.
        
       | valine wrote:
        | So instead of next token prediction it's next event prediction.
        | At some point this just loops around and we're back to teaching
        | models to predict the next token in the sequence.
        
         | lumost wrote:
         | Tokens are an awfully convenient way to describe an event.
        
           | phyalow wrote:
           | Tokens are just discretized state representations.
        
         | ww520 wrote:
         | It's the next state. So instead of spitting out words, it will
         | spit out a whole movie, or a sequence of world states in a game
         | or simulation.
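          | 
          | A toy sketch of what I mean (made-up setup, nothing from the
          | paper): the autoregressive loop is the same whether the thing
          | you predict is a token id or a full state vector; only the
          | model behind predict_next changes.
          | 
          |   import numpy as np
          | 
          |   # Stand-in "model": a real system would use a learned
          |   # next-state predictor; here we just nudge the last state so
          |   # the rollout loop runs end to end.
          |   def predict_next(history):
          |       return history[-1] + np.random.normal(scale=0.01,
          |                                             size=history[-1].shape)
          | 
          |   state = np.zeros(16)      # toy 16-dim "world state"
          |   trajectory = [state]
          |   for _ in range(100):      # same shape as next-token decoding
          |       trajectory.append(predict_next(trajectory))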
        
       | amelius wrote:
       | Why would you use RL if you're not going to control the
       | environment, but just predict it?
        
       | jldugger wrote:
       | From the abstract
       | 
       | > A simple trading rule turns this calibration edge into $127 of
       | hypothetical profit versus $92 for o1 (p = 0.037).
       | 
       | I'm lazy: is this hypothetical shooting fish in a barrel, or is
       | it a real edge?
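        | 
        | The abstract doesn't spell the rule out, but I'd guess it's
        | something of this shape (the 0.05 margin and $1 stake below are
        | my guesses, not the paper's):
        | 
        |   def trade(model_prob, market_price, margin=0.05, stake=1.0):
        |       # Buy YES when the model thinks the market underprices the
        |       # event, NO when it overprices it, otherwise stay out.
        |       # Shares pay $1 each if the bet resolves in your favour.
        |       if model_prob - market_price > margin:
        |           return "yes", stake / market_price, stake
        |       if market_price - model_prob > margin:
        |           return "no", stake / (1.0 - market_price), stake
        |       return None, 0.0, 0.0
        | 
        |   def hypothetical_profit(history):
        |       # history: iterable of (model_prob, market_price, resolved_yes)
        |       pnl = 0.0
        |       for p, price, resolved_yes in history:
        |           side, shares, cost = trade(p, price)
        |           if side == "yes":
        |               pnl += shares * (1.0 if resolved_yes else 0.0) - cost
        |           elif side == "no":
        |               pnl += shares * (0.0 if resolved_yes else 1.0) - cost
        |       return pnl
        | 
        | Whether $127 vs $92 on a book like that counts as a real edge
        | presumably depends on fees and how many markets were traded.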
        
       ___________________________________________________________________
       (page generated 2025-05-27 23:00 UTC)