[HN Gopher] Show HN: RULER - Easily apply RL to any agent
___________________________________________________________________
Show HN: RULER - Easily apply RL to any agent
Hey HN, Kyle here, one of the co-founders of OpenPipe.
Reinforcement learning is one of the best techniques for making
agents more reliable, and it has been widely adopted by frontier
labs. However, adoption in the broader community has been slow
because it's so hard to implement.

One of the biggest challenges when applying RL to a new task is the
need for a task-specific "reward function" (a way of measuring
success). This is often difficult to define, and it requires
high-quality labeled data, significant domain expertise, or both.

RULER is a drop-in reward function that works across different tasks
without any of that complexity. It works by showing N trajectories to
an LLM judge and asking it to rank them relative to each other. This
sidesteps the calibration issues that plague most LLM-as-judge
approaches. Combined with GRPO (which only cares about relative
scores within groups), it just works (surprisingly well!).
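To make that concrete, here's a minimal sketch of the idea. This is
illustrative only, not the actual ART/RULER API; the helper names,
judge prompt, and model choice are all assumptions:

    # Illustrative sketch only, NOT the actual ART/RULER API. The judge
    # prompt, helper names, and model choice are assumptions.
    import json
    import statistics

    from openai import OpenAI

    client = OpenAI()

    def judge_group(trajectories: list[str]) -> list[float]:
        """Score N trajectories relative to each other in one judge call."""
        listing = "\n\n".join(
            f"Trajectory {i}:\n{t}" for i, t in enumerate(trajectories, 1)
        )
        prompt = (
            "These are attempts at the same task. Score each trajectory "
            "from 0 to 1 relative to the others in this group. "
            "Reply with only a JSON list of floats.\n\n" + listing
        )
        resp = client.chat.completions.create(
            model="gpt-4o",  # any strong judge model would do here
            messages=[{"role": "user", "content": prompt}],
        )
        # Assumes the judge complied with the requested output format.
        return json.loads(resp.choices[0].message.content)

    def grpo_advantages(rewards: list[float]) -> list[float]:
        """GRPO-style normalization: subtract the group mean and divide
        by the group std, so only relative scores within the group matter."""
        mu = statistics.mean(rewards)
        sigma = statistics.pstdev(rewards) or 1.0  # guard against zero std
        return [(r - mu) / sigma for r in rewards]

    scores = judge_group(["attempt A ...", "attempt B ...", "attempt C ..."])
    advantages = grpo_advantages(scores)

Because GRPO normalizes within each group, the judge never needs a
globally calibrated scale; it only has to order sibling trajectories
sensibly.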
We have a full writeup on the blog, including results on 4 production
tasks. On all 4 tasks, small Qwen 2.5 models trained with RULER+GRPO
beat the best prompted frontier model, despite being significantly
smaller and cheaper to run. Surprisingly, they even beat models
trained with hand-crafted reward functions on 3 of the 4 tasks!

Blog: https://openpipe.ai/blog/ruler
Repo: https://github.com/OpenPipe/ART
Author : kcorbitt
Score : 39 points
Date : 2025-07-11 17:47 UTC (5 hours ago)
(HTM) web link (openpipe.ai)
(TXT) w3m dump (openpipe.ai)
| someoneontenet wrote:
| Love these write ups!
| kcorbitt wrote:
| Thanks! If there are any topics that you'd find particularly
| interesting, let me know and I can try to find time. :)
| sadiq wrote:
| Excellent, look forward to giving this a go.
|
| I was looking at: https://arxiv.org/abs/2506.18254 but your
| approach is even more general.
| spmurrayzzz wrote:
| There might end up being some confusion with the RULER benchmark
| from NVIDIA, given the (somewhat shared) domain:
| https://github.com/NVIDIA/RULER
|
| EDIT: by "shared" I only mean the adjacency to LLMs/AI/ML; RL is a
| pretty big differentiator, though, and the project looks great
| ndgold wrote:
| Dope
___________________________________________________________________
(page generated 2025-07-11 23:00 UTC)