[HN Gopher] Show HN: LlamaGym - fine-tune LLM agents with online...
       ___________________________________________________________________
        
       Show HN: LlamaGym - fine-tune LLM agents with online reinforcement
       learning
        
       Author : KhoomeiK
       Score  : 140 points
       Date   : 2024-03-10 12:40 UTC (10 hours ago)
        
 (HTM) web link (github.com)
        
       | 3abiton wrote:
       | Interesting project. It's basically a wrapper around OpenAI
       | Gym-like functionality that can handle open LLMs.
        
         | KhoomeiK wrote:
         | Yup, it does simplify LLM agent inference on Gym environments,
         | but the main technical contribution is cutting the boilerplate
         | code you would otherwise write for online RL.
        
       | raidicy wrote:
       | Thanks for creating this!
        
       | KhoomeiK wrote:
       | Twitter thread:
       | https://x.com/khoomeik/status/1766805213644800011?s=46
        
       | internet101010 wrote:
       | Thank you for making this. Simplifying any aspect of RL is always
       | welcome.
        
         | KhoomeiK wrote:
         | Thanks! Yeah, I think RL for LLMs is pretty underexplored
         | beyond the RLHF stuff. It's pretty tough to get working, though.
        
       | dennisy wrote:
       | Can this be used outside of OpenAI environments? If so, I think
       | an example would be great!
        
         | KhoomeiK wrote:
         | Gymnasium is now maintained by the Farama Foundation, an
         | open-source consortium, not OpenAI. But most RL environment
         | work for the past 5+ years has been Gym-compliant. The
         | TextWorld example in the repo, for example, instantiates a
         | Gym-style environment but doesn't import from Gymnasium (it
         | uses textworld.gym instead).
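
A minimal sketch of what "Gym-style" can mean without importing
Gymnasium, assuming the Gymnasium convention where reset() returns
(observation, info) and step() returns (observation, reward,
terminated, truncated, info). CountdownEnv and run_episode are
hypothetical names made up for illustration; they are not part of
LlamaGym, Gymnasium, or TextWorld, and older Gym-style environments
may return a four-tuple from step() instead.

```python
class CountdownEnv:
    """Hypothetical toy environment with a Gymnasium-style interface.

    It does not import or subclass anything from Gymnasium; it only
    follows the same reset/step calling convention (duck typing).
    """

    def __init__(self, start=3):
        self.start = start
        self.remaining = start

    def reset(self, seed=None):
        # Gymnasium-style reset: returns (observation, info).
        self.remaining = self.start
        return f"{self.remaining} steps left", {}

    def step(self, action):
        # The "dec" action moves toward the goal and earns a reward;
        # any other action does nothing.
        if action == "dec":
            self.remaining -= 1
            reward = 1.0
        else:
            reward = 0.0
        terminated = self.remaining == 0
        truncated = False  # this toy environment has no time limit
        return f"{self.remaining} steps left", reward, terminated, truncated, {}


def run_episode(env, policy, max_steps=20):
    """Run one episode against any object exposing reset() and step()."""
    obs, info = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(obs)  # an LLM agent would map the text obs to an action here
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        if terminated or truncated:
            break
    return total_reward


total_reward = run_episode(CountdownEnv(start=3), lambda obs: "dec")
idle_reward = run_episode(CountdownEnv(start=3), lambda obs: "noop")
```

Because run_episode only relies on the reset/step calling convention,
any environment that follows the same convention could in principle be
passed in, whichever package it comes from.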
        
       | neodypsis wrote:
       | Very interesting!
        
       | adawg4 wrote:
       | Thanks for making this! Helps simplify it nicely
        
       ___________________________________________________________________
       (page generated 2024-03-10 23:00 UTC)