[HN Gopher] Show HN: LlamaGym - fine-tune LLM agents with online...
___________________________________________________________________
Show HN: LlamaGym - fine-tune LLM agents with online reinforcement
learning
Author : KhoomeiK
Score : 140 points
Date : 2024-03-10 12:40 UTC (10 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| 3abiton wrote:
| Interesting project, basically a wrapper around OpenAI Gym-like
| functionality that can handle open LLMs.
| KhoomeiK wrote:
| Yup, it does simplify LLM agent inference on Gym environments,
| but the main technical contribution is reducing the code
| overhead you'd otherwise write for online RL.
| raidicy wrote:
| Thanks for creating this!
| KhoomeiK wrote:
| Twitter thread:
| https://x.com/khoomeik/status/1766805213644800011?s=46
| internet101010 wrote:
| Thank you for making this. Simplifying any aspect of RL is always
| welcome.
| KhoomeiK wrote:
| Thanks! Yeah RL for LLMs is pretty underexplored I think beyond
| the RLHF stuff. Pretty tough to get working tho.
| dennisy wrote:
| Can this be used outside of OpenAI environments? If yes I think
| an example would be great!
| KhoomeiK wrote:
| Gymnasium is now maintained by the Farama Foundation, an
| open-source consortium, not OpenAI. But most RL environment work
| for the past 5+ years has been Gym-compliant. The TextWorld
| example in the repo, for example, instantiates a Gym-style
| environment but it doesn't import from Gymnasium (uses
| textworld.gym instead).
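
To illustrate what "Gym-style without importing from Gymnasium" can look like, here is a minimal sketch. It is not taken from the LlamaGym repo: GuessNumberEnv and run_episode are hypothetical names, and the only assumption is the Gymnasium-style contract in which reset() returns (observation, info) and step() returns (observation, reward, terminated, truncated, info).

```python
import random
from typing import Any, Callable, Dict, Tuple


class GuessNumberEnv:
    """Hypothetical toy text environment that follows the Gymnasium-style
    reset/step contract without importing gymnasium."""

    def __init__(self, low: int = 1, high: int = 10, max_steps: int = 5, seed: int = 0):
        self.low, self.high, self.max_steps = low, high, max_steps
        self._rng = random.Random(seed)
        self._target = low
        self._steps = 0

    def reset(self) -> Tuple[str, Dict[str, Any]]:
        # Start a new episode: pick a hidden target and return (observation, info).
        self._target = self._rng.randint(self.low, self.high)
        self._steps = 0
        return f"Guess a number between {self.low} and {self.high}.", {}

    def step(self, action: int) -> Tuple[str, float, bool, bool, Dict[str, Any]]:
        # Returns (observation, reward, terminated, truncated, info).
        self._steps += 1
        terminated = action == self._target
        truncated = (not terminated) and self._steps >= self.max_steps
        if terminated:
            obs, reward = "Correct!", 1.0
        elif action < self._target:
            obs, reward = "Too low.", 0.0
        else:
            obs, reward = "Too high.", 0.0
        return obs, reward, terminated, truncated, {}


def run_episode(env: Any, policy: Callable[[str], int]) -> float:
    """Roll out one episode with any object exposing reset() and step()."""
    obs, _info = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, terminated, truncated, _info = env.step(policy(obs))
        total += reward
        done = terminated or truncated
    return total
```

Any agent loop that only calls reset() and step() in this shape can drive such an environment, which is presumably why an environment built with textworld.gym can stand in for one imported from Gymnasium.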
| neodypsis wrote:
| Very interesting!
| adawg4 wrote:
| Thanks for making this! Helps simplify it nicely
___________________________________________________________________
(page generated 2024-03-10 23:00 UTC)