# LlamaGym

Fine-tune LLM agents with online reinforcement learning

"Agents" originated in reinforcement learning, where they learn by interacting with an environment and receiving a reward signal. However, LLM-based agents today do not learn online (i.e. continuously in real time) via reinforcement.

OpenAI created Gym to standardize and simplify RL environments, but if you try dropping an LLM-based agent into a Gym environment for training, you'd find it's still quite a bit of code to handle LLM conversation context, episode batches, reward assignment, PPO setup, and more. LlamaGym seeks to simplify fine-tuning LLM agents with RL. Right now, it's a single `Agent` abstract class that handles all the issues mentioned above, letting you quickly iterate and experiment with agent prompting & hyperparameters across any Gym environment.
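To make that division of labor concrete, here is a rough sketch of the shape such an abstract class can take. The subclass hooks (`get_system_prompt`, `format_observation`, `extract_action`) and the methods you call from your RL loop (`act`, `assign_reward`, `terminate_episode`) match the usage example below; the internal wiring shown here is an illustrative assumption, not LlamaGym's actual implementation.

```python
# Illustrative sketch only -- the real class lives in llamagym/; the internals
# below are assumptions about how such an abstraction could be wired up.
from abc import ABC, abstractmethod


class Agent(ABC):
    def __init__(self, model, tokenizer, device):
        self.model = model
        self.tokenizer = tokenizer
        self.device = device
        # Running conversation context, seeded with the system prompt.
        self.messages = [{"role": "system", "content": self.get_system_prompt()}]

    # --- Task-specific pieces you implement when subclassing ---
    @abstractmethod
    def get_system_prompt(self) -> str: ...

    @abstractmethod
    def format_observation(self, observation) -> str: ...

    @abstractmethod
    def extract_action(self, response: str): ...

    # --- Conversation and RL bookkeeping the base class handles ---
    def act(self, observation):
        # Append the formatted observation, generate a reply, parse an action.
        self.messages.append(
            {"role": "user", "content": self.format_observation(observation)}
        )
        prompt = self.tokenizer.apply_chat_template(
            self.messages, tokenize=False, add_generation_prompt=True
        )
        inputs = self.tokenizer(prompt, return_tensors="pt").to(self.device)
        output = self.model.generate(**inputs, max_new_tokens=32)
        response = self.tokenizer.decode(
            output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
        )
        self.messages.append({"role": "assistant", "content": response})
        return self.extract_action(response)

    def assign_reward(self, reward):
        # Attach the environment reward to the current episode's turns.
        ...

    def terminate_episode(self):
        # Close out the episode; once enough episodes are batched,
        # run a PPO update over the collected conversations.
        ...
```

In practice you only write the three task-specific methods; everything below them is the boilerplate LlamaGym is meant to take off your hands.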
## Usage

Fine-tuning an LLM-based agent to play in a Gym-style environment with RL has never been easier! Once you install LlamaGym:

```bash
pip install llamagym
```

First, implement 3 abstract methods on the `Agent` class:

```python
from llamagym import Agent


class BlackjackAgent(Agent):
    def get_system_prompt(self) -> str:
        return "You are an expert blackjack player."

    def format_observation(self, observation) -> str:
        return f"Your current total is {observation[0]}"

    def extract_action(self, response: str):
        return 0 if "stay" in response else 1
```

Then, define your base LLM (as you would for any fine-tuning job) and instantiate your agent:

```python
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead

device = "cuda"  # or "cpu"
model = AutoModelForCausalLMWithValueHead.from_pretrained("Llama-2-7b").to(device)
tokenizer = AutoTokenizer.from_pretrained("Llama-2-7b")
agent = BlackjackAgent(model, tokenizer, device)
```

Finally, write your RL loop as usual and simply call your agent to act, reward, and terminate:

```python
import gymnasium as gym
from tqdm import trange

env = gym.make("Blackjack-v1")

for episode in trange(5000):
    observation, info = env.reset()
    done = False

    while not done:
        action = agent.act(observation)  # act based on observation
        observation, reward, terminated, truncated, info = env.step(action)
        agent.assign_reward(reward)  # provide reward to agent
        done = terminated or truncated

    train_stats = agent.terminate_episode()  # trains if batch is full
```

Some reminders:

- The code snippets above are mildly simplified, but a fully working example is available in `examples/blackjack.py`.
- Getting online RL to converge is notoriously difficult, so you'll have to mess with hyperparameters to see improvement.
  - Your model may also benefit from a supervised fine-tuning stage on sampled trajectories before running RL (we may add this feature in the future).
- Our implementation values simplicity, so it is not as compute-efficient as e.g. Lamorel, but it is easier to start playing around with.
- LlamaGym is a weekend project and still a WIP, but we love contributions!

## Relevant Work

- Grounding Large Language Models with Online Reinforcement Learning
  - Lamorel: Language Models for Reinforcement Learning
- True Knowledge Comes from Practice: Aligning LLMs with Embodied Environments via Reinforcement Learning

## Citation

```bibtex
@misc{pandey2024llamagym,
  title = {LlamaGym: Fine-tune LLM agents with Online Reinforcement Learning},
  author = {Rohan Pandey},
  year = {2024},
  howpublished = {GitHub},
  url = {https://github.com/KhoomeiK/LlamaGym}
}
```