[HN Gopher] Show HN: RULER - Easily apply RL to any agent
       ___________________________________________________________________
        
       Show HN: RULER - Easily apply RL to any agent
        
       Hey HN, Kyle here, one of the co-founders of OpenPipe.
       
       Reinforcement learning is one of the best techniques for making
       agents more reliable, and has been widely adopted by frontier labs.
       However, adoption in the outside community has been slow because
       it's so hard to implement.
       
       One of the biggest challenges when adapting RL to a new task is the
       need for a task-specific "reward function" (a way of measuring
       success). This is often difficult to define, and requires
       high-quality labeled data and/or significant domain expertise to
       generate.
       
       RULER is a drop-in reward function that works across different
       tasks without any of that complexity.
       
       It works by showing N trajectories to an LLM judge and asking it to
       rank them relative to each other. This sidesteps the calibration
       issues that plague most LLM-as-judge approaches. Combined with GRPO
       (which only cares about relative scores within groups), it just
       works (surprisingly well!).
       
       We have a full writeup on the blog, including results on 4
       production tasks. On all 4 tasks, small Qwen 2.5 models trained
       with RULER+GRPO beat the best prompted frontier model, despite
       being significantly smaller and cheaper to run. Surprisingly, they
       even beat models trained with hand-crafted reward functions on 3/4
       tasks! https://openpipe.ai/blog/ruler
       
       Repo: https://github.com/OpenPipe/ART
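       
       To illustrate why relative scores are enough for GRPO, here is a
       minimal sketch of group-relative advantage normalization as it is
       commonly described (subtract the group mean, divide by the group
       standard deviation). It is not taken from the ART codebase: the
       function name group_relative_advantages and the judge scores are
       hypothetical, and the real implementation may differ.

```python
from statistics import mean, pstdev


def group_relative_advantages(scores, eps=1e-8):
    """Normalize rewards within one group of trajectories.

    Hypothetical helper: a common GRPO-style formulation, assumed here
    for illustration only.
    """
    mu = mean(scores)
    sigma = pstdev(scores)  # population std; 0.0 if all scores are equal
    return [(s - mu) / (sigma + eps) for s in scores]


# Made-up scores from two judges that agree on the ranking of the same
# four trajectories but use different scales (judge_b = 0.5 + 0.5 * judge_a).
judge_a = [0.2, 0.5, 0.9, 0.4]
judge_b = [0.6, 0.75, 0.95, 0.7]

adv_a = group_relative_advantages(judge_a)
adv_b = group_relative_advantages(judge_b)

# The advantages match up to floating-point noise, so the absolute
# calibration of the judge does not affect the training signal.
max_gap = max(abs(a - b) for a, b in zip(adv_a, adv_b))
print(adv_a)
print(max_gap < 1e-6)
```

       In this sketch any shift or positive rescaling of the judge's
       scores leaves the advantages essentially unchanged, which is one
       way to read the claim that only the within-group comparison
       matters.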
        
       Author : kcorbitt
       Score  : 39 points
       Date   : 2025-07-11 17:47 UTC (5 hours ago)
        
 (HTM) web link (openpipe.ai)
 (TXT) w3m dump (openpipe.ai)
        
       | someoneontenet wrote:
       | Love these write ups!
        
         | kcorbitt wrote:
         | Thanks! If there are any topics that you'd find particularly
         | interesting, let me know and I can try to find time. :)
        
       | sadiq wrote:
       | Excellent, look forward to giving this a go.
       | 
       | I was looking at: https://arxiv.org/abs/2506.18254 but your
       | approach is even more general.
        
       | spmurrayzzz wrote:
       | Might end up being some confusion with the RULER benchmark from
       | NVIDIA given the (somewhat shared) domain:
       | https://github.com/NVIDIA/RULER
       | 
       | EDIT: by shared I only mean the adjacency to LLMs/AI/ML. RL is a
       | pretty big differentiator though, and the project looks great.
        
       | ndgold wrote:
       | Dope
        
       ___________________________________________________________________
       (page generated 2025-07-11 23:00 UTC)