https://github.com/wgryc/phasellm

# PhaseLLM

Large language model evaluation and workflow framework from Phase AI.

The coming months and years will bring thousands of new products and experiences powered by large language models (LLMs) like ChatGPT and its growing number of variants. Whether you're using OpenAI's ChatGPT, Anthropic's Claude, or something else altogether, you'll want to test how well your models and prompts perform against user needs.
As more models are launched, you'll also have a bigger range of options. PhaseLLM is a framework designed to help manage and test LLM-driven experiences -- products, content, or other experiences that product and brand managers might be driving for their users.

Here's what PhaseLLM does:

1. We standardize API calls so you can plug and play models from OpenAI, Cohere, Anthropic, or other providers.
2. We've built evaluation frameworks so you can compare outputs and decide which ones are driving the best experiences for users.
3. We're adding automations so you can use advanced models (e.g., GPT-4) to evaluate simpler models (e.g., GPT-3) and determine which combinations of prompts yield the best experiences, especially when taking into account the cost and speed of model execution.

PhaseLLM is open source and we envision building more features to help with model understanding. We want to help developers, data scientists, and others launch new, robust products as easily as possible. If you're working on an LLM product, please reach out. We'd love to help.

## Example: Evaluating Travel Chatbot Prompts with GPT-3.5, Claude, and more

PhaseLLM makes it incredibly easy to plug and play LLMs and evaluate them, in some cases with other LLMs. Suppose you're building a travel chatbot and want to test Claude and Cohere against each other, using GPT-3.5 as the judge. What's awesome about this approach is that (1) you can plug and play models and prompts as needed, and (2) the entire workflow takes very little code. This simple example can easily be scaled to much more complex workflows.

So, time for the code...

First, load your API keys.

```python
import os  # needed for os.getenv below

from dotenv import load_dotenv

import llms  # llms.py sits at the root of this repository

load_dotenv()
openai_api_key = os.getenv("OPENAI_API_KEY")
anthropic_api_key = os.getenv("ANTHROPIC_API_KEY")
cohere_api_key = os.getenv("COHERE_API_KEY")
```

Next, we set up the evaluator, which takes two LLM outputs and decides which one is better for the objective at hand.

```python
# We'll use GPT-3.5 as the evaluator.
e = llms.GPT35Evaluator(openai_api_key)
```

Now it's time to set up the experiment. We define an objective describing what we're trying to achieve with our chatbot, and provide 5 examples of chats that users have started.

```python
objective = "We're building a chatbot to discuss a user's travel preferences and provide advice."

# Chats that have been launched by users.
travel_chat_starts = [
    "I'm planning to visit Poland in spring.",
    "I'm looking for the cheapest flight to Europe next week.",
    "I am trying to decide between Prague and Paris for a 5-day trip",
    "I want to visit Europe but can't decide if spring, summer, or fall would be better.",
    "I'm unsure if I should visit Spain by flying via the UK or via France."
]
```

Now we set up our Cohere and Claude models.

```python
# Note: the Cohere wrapper name below is assumed by analogy with ClaudeWrapper;
# only the Claude line appeared in the original snippet.
cohere_model = llms.CohereWrapper(cohere_api_key)
claude_model = llms.ClaudeWrapper(anthropic_api_key)
```

Finally, we launch our test. We run an experiment where both models generate a chat response, and then GPT-3.5 evaluates the two responses.

```python
for tcs in travel_chat_starts:
    messages = [{"role": "system", "content": objective},
                {"role": "user", "content": tcs}]
    response_cohere = cohere_model.complete_chat(messages, "assistant")
    response_claude = claude_model.complete_chat(messages, "assistant")
    pref = e.choose(objective, tcs, response_cohere, response_claude)
    print(f"{pref}")
```

In this case, we simply print which of the two models was preferred. Voila! You've got a suite to test your models and can plug and play three major LLMs.
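The loop above prints each preference individually. If you'd rather see an overall head-to-head score, you can tally the evaluator's choices. A minimal sketch, assuming `e.choose(...)` returns `"1"` when the first response passed in is preferred and `"2"` otherwise; check the evaluator's actual return format before relying on this:

```python
from collections import Counter

# Tally which model the evaluator prefers across all chat starts.
# Assumes e.choose() returns "1" (first response wins) or "2" (second wins).
tally = Counter()
for tcs in travel_chat_starts:
    messages = [{"role": "system", "content": objective},
                {"role": "user", "content": tcs}]
    response_cohere = cohere_model.complete_chat(messages, "assistant")
    response_claude = claude_model.complete_chat(messages, "assistant")
    pref = e.choose(objective, tcs, response_cohere, response_claude)
    tally["cohere" if pref.strip() == "1" else "claude"] += 1

print(f"Cohere preferred: {tally['cohere']}, Claude preferred: {tally['claude']}")
```

Aggregating this way makes it easier to compare prompt variants: rerun the tally with a different `objective` and see whether the preference split shifts.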
## Contact Us

If you have questions, requests, ideas, etc., please reach out at w (at) phaseai (dot) com.