https://github.com/ianarawjo/ChainForge
# ChainForge

An open-source visual programming environment for battle-testing prompts to LLMs.

*(screenshot: prompt-injection-test)*

ChainForge is a data-flow prompt engineering environment for analyzing and evaluating LLM responses. It is geared towards early-stage, quick-and-dirty exploration of prompts and response quality that goes beyond ad hoc chatting with individual LLMs. With ChainForge, you can:

* Query multiple LLMs at once to test prompt ideas and variations quickly and effectively.
* Compare response quality across prompt permutations and across models to choose the best prompt and model for your use case.
* Set up an evaluation metric (scoring function) and immediately visualize results across prompts, prompt parameters, and models.

This is an open alpha of ChainForge. Functionality is powerful but limited. We currently support the OpenAI models GPT-3.5 and GPT-4, Anthropic's Claude, Google PaLM 2 (text-bison), and Alpaca 7B (through Dalai) at default settings. Visualization nodes support numeric and boolean evaluation metrics. Try it and let us know what you'd like to see in the future! :)

ChainForge is built on ReactFlow and Flask.

## Installation

To install the ChainForge alpha, make sure you have Python 3.8 or higher, then run `pip install chainforge`. Once installed, run `chainforge serve` and open localhost:8000 in a Google Chrome browser (other browsers are currently unsupported). You can set your API keys by clicking the Settings icon in the top-right corner. If you prefer not to do this every time you open ChainForge, we recommend saving your OpenAI, Anthropic, and/or Google PaLM API keys to your local environment. For more details, see the Installation Guide.

## Example evaluation flows

In the examples/ folder, we've prepared a couple of example flows to give you a sense of what's possible with ChainForge. Click the Import button at the top of the screen and select one. Here is basic_comparison.cforge, plotting the length of responses across different models and arguments for the prompt parameter {game}:

*(screenshot: basic-compare)*

For more details about features and available nodes, check out the User Guide.

## Features

A key goal of ChainForge is facilitating comparison and evaluation of prompts and models, and (in the near future) prompt chains. Basic features are:

* Prompt permutations: Set up a prompt template and feed it variations of input variables. ChainForge will prompt all selected LLMs with all possible permutations of the input prompt, so that you can get a better sense of prompt quality. You can also chain prompt templates at arbitrary depth (e.g., to compare templates). A small sketch of how this expansion works appears just after this list.
* Evaluation nodes: Probe LLM responses in a chain and test them (classically) for some desired behavior. At a basic level, this is Python-script based. We plan to add preset evaluator nodes for common use cases in the near future (e.g., named-entity recognition). Note that you can also chain LLM responses into prompt templates to help evaluate outputs cheaply before more extensive evaluation methods.
* Visualization nodes: Visualize evaluation results on plots like grouped box-and-whisker (for numeric metrics) and histograms (for boolean metrics). Currently we only support numeric and boolean metrics. We aim to give users more control and options for plotting in the future.
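To make the permutation behavior in the first bullet concrete, here is a minimal sketch in plain Python. This is not ChainForge's internal code, and the template text, variable values, and model names are invented for illustration; it only shows how a template with two variables fans out into every combination of values, each sent to every selected model.

```python
from itertools import product

# Illustrative template, variables, and model names (not real ChainForge data).
template = "What is the objective of the game {game}, explained for a {audience}?"
variables = {
    "game": ["chess", "go", "poker"],
    "audience": ["child", "grandmaster"],
}
models = ["gpt-3.5-turbo", "gpt-4", "claude-v1"]

# Fill the template with every combination of variable values: 3 x 2 = 6 prompts.
prompts = [
    template.format(**dict(zip(variables, combo)))
    for combo in product(*variables.values())
]

# Each prompt is then dispatched to each selected model: 6 x 3 = 18 queries.
for model in models:
    for prompt in prompts:
        print(f"[{model}] {prompt}")
```

In this sketch, three values of {game}, two of {audience}, and three models fan out into 18 queries from a single template; ChainForge performs this expansion and dispatch for you when you connect input variables to a prompt template.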
Taken together, these three features let you easily:

* Compare across prompts and prompt parameters: Choose the set of prompts that best maximizes your target evaluation metrics (e.g., lowest code error rate). Or, see how changing parameters in a prompt template affects the quality of responses.
* Compare across models: Compare responses for every prompt across models.

We've also found that some users simply want to use ChainForge to make tons of parametrized queries to LLMs (e.g., chaining prompt templates into prompt templates), possibly score them, and then output the results to a spreadsheet (Excel xlsx). To do this, attach an Inspect node to the output of a Prompt node and click Export Data.

For more specific details, see the User Guide.

## Development

ChainForge was created by Ian Arawjo, a postdoctoral scholar in Harvard HCI's Glassman Lab, with support from the whole Harvard HCI community, especially PhD student Priyan Vaithilingam. This work was partially funded by NSF grant IIS-2107391. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. We provide ongoing releases of this tool in the hopes that others find it useful for their projects.

## Future Planned Features

Highest priority:

* Model settings: Change settings for individual models, so one can test the same model under different settings.
* LLM annotator nodes: Select an LLM to evaluate and "tag" responses (for instance, named-entity recognition). Currently, one can chain prompt nodes into prompt nodes, but the final output loses information about which LLM generated the input response.

Medium-to-low priority:

* Compare across response batches: Run an evaluator over all N responses generated for each prompt, to measure factors like variability or parseability (e.g., how many code outputs pass a basic smell test? A rough sketch of such a check appears after this list.)
* System prompts: Ability to change the system prompt for models that support it (e.g., ChatGPT). Try out different system prompts and compare response quality.
* Collapse nodes: Nodes should be collapsible, to save screen space.
* LMQL and Microsoft guidance nodes: Support for prompt pipelines that involve LMQL and {{guidance}} code, especially inspecting masked response variables.
* AI assistance for prompt engineering: Spur creative ideas and quickly iterate on variations of prompts through interaction with GPT-4.
* Compare fine-tuned to base models: Beyond comparing between different models like Alpaca and ChatGPT, support comparison between versions of the same model (e.g., a base model and a fine-tuned one). Help users detect where fine-tuning resulted in 'breaking changes' elsewhere.
* Export to code: In the future, export prompts and (potentially) chains using a programming API like LangChain.
* Dark mode: A dark mode theme.
* Compare across chains: If a prompt P is used across chains C1, C2, etc., how does changing it affect all downstream events?
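As a concrete, hypothetical illustration of the "compare across response batches" idea, here is a minimal sketch in plain standard-library Python. It is not a ChainForge API, and the example responses are invented; it only shows the kind of "does the code even parse?" smell test such a batch evaluator could compute over the N responses collected for one prompt.

```python
import ast

def fraction_parseable(responses):
    """Return the share of responses that are syntactically valid Python."""
    if not responses:
        return 0.0
    ok = 0
    for text in responses:
        try:
            ast.parse(text)
            ok += 1
        except SyntaxError:
            continue
    return ok / len(responses)

# Three invented model outputs for one prompt; two of them parse, so ~0.67.
batch = [
    "def add(a, b):\n    return a + b",
    "print('hello world')",
    "def broken(:",
]
print(fraction_parseable(batch))
```

A numeric score like this is the kind of metric the existing Visualization nodes can already plot across prompts, prompt parameters, and models.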
See a feature you'd like that isn't here? Open an Issue.

## Inspiration and Links

ChainForge is meant to be general-purpose and is not developed for a specific API or LLM back-end. Our ultimate goal is integration into other tools for the systematic evaluation and auditing of LLMs. We hope to help others who are developing prompt-analysis flows for LLMs, or otherwise auditing LLM outputs. This project was inspired by our own use case, but also shares some camaraderie with two related (closed-source) research projects, both led by Sherry Wu:

* "PromptChainer: Chaining Large Language Model Prompts through Visual Programming" (Wu et al., CHI '22 LBW)
* "AI Chains: Transparent and Controllable Human-AI Interaction by Chaining Large Language Model Prompts" (Wu et al., CHI '22)

Unlike these projects, we are focusing on supporting evaluation across prompts, prompt parameters, and models.

## How to collaborate?

We are looking for open-source collaborators. The best way to do this, at the moment, is simply to implement the requested feature / bug fix and submit a Pull Request. If you want to report a bug or request a feature, open an Issue.

## License

ChainForge is released under the MIT License.