# garak, LLM vulnerability scanner

Generative AI Red-teaming & Assessment Kit

garak checks if an LLM can be made to fail in a way we don't want. garak probes for hallucination, data leakage, prompt injection, misinformation, toxicity generation, jailbreaks, and many other weaknesses. If you know nmap, it's nmap for LLMs.
garak focuses on ways of making an LLM or dialog system fail. It combines static, dynamic, and adaptive probes to explore this. garak is a free tool. We love developing it and are always interested in adding functionality to support applications.

Get started:

> See our user guide! docs.garak.ai
> Join our Discord! discord.gg/uVch4puUCs
> Project links & home: garak.ai
> Twitter: @garak_llm
> DEF CON slides!

---

## LLM support

garak currently supports:

* hugging face hub generative models
* replicate text models
* openai api chat & continuation models
* litellm
* pretty much anything accessible via REST
* gguf models like llama.cpp version >= 1046
* .. and many more LLMs!

## Install

garak is a command-line tool. It's developed on Linux and OSX.

### Standard install with pip

Just grab it from PyPI and you should be good to go:

```
python -m pip install -U garak
```

### Install development version with pip

The standard pip version of garak is updated periodically. To get a fresher version from GitHub, try:

```
python -m pip install -U git+https://github.com/NVIDIA/garak.git@main
```

### Clone from source

garak has its own dependencies. You can install garak in its own Conda environment:

```
conda create --name garak "python>=3.10,<=3.12"
conda activate garak
gh repo clone NVIDIA/garak
cd garak
python -m pip install -e .
```

OK, if that went fine, you're probably good to go!

Note: if you cloned before the move to the NVIDIA GitHub organisation, but you're reading this at the github.com/NVIDIA URI, please update your remotes as follows:

```
git remote set-url origin https://github.com/NVIDIA/garak.git
```

## Getting started

The general syntax is:

```
garak <options>
```

garak needs to know what model to scan, and by default, it'll try all the probes it knows on that model, using the vulnerability detectors recommended by each probe. You can see a list of probes using:

```
garak --list_probes
```

To specify a generator, use the --model_type and, optionally, the --model_name options. Model type specifies a model family/interface; model name specifies the exact model to be used. The "Intro to generators" section below describes some of the generators supported. A straightforward generator family is Hugging Face models; to load one of these, set --model_type to huggingface and --model_name to the model's name on the Hub (e.g. "RWKV/rwkv-4-169m-pile"). Some generators might need an API key to be set as an environment variable, and they'll let you know if they need that.

garak runs all the probes by default, but you can be specific about that too. --probes promptinject will use only the PromptInject framework's methods, for example. You can also specify one specific plugin instead of a plugin family by adding the plugin name after a .; for example, --probes lmrc.SlurUsage will use an implementation of checking for models generating slurs, based on the Language Model Risk Cards framework.
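For instance, a run that uses only that single probe against a small local Hugging Face model might look like this (the model choice here is just illustrative):

```
python3 -m garak --model_type huggingface --model_name gpt2 --probes lmrc.SlurUsage
```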
For help & inspiration, find us on Twitter or Discord!

## Examples

Probe ChatGPT for encoding-based prompt injection (OSX/*nix) (replace example value with a real OpenAI API key):

```
export OPENAI_API_KEY="sk-123XXXXXXXXXXXX"
python3 -m garak --model_type openai --model_name gpt-3.5-turbo --probes encoding
```

See if the Hugging Face version of GPT2 is vulnerable to DAN 11.0:

```
python3 -m garak --model_type huggingface --model_name gpt2 --probes dan.Dan_11_0
```

## Reading the results

For each probe loaded, garak will print a progress bar as it generates. Once generation is complete, a row evaluating that probe's results on each detector is given. If any of the prompt attempts yielded an undesirable behavior, the response will be marked as FAIL, and the failure rate given.

Here are the results with the encoding module on a GPT-3 variant:

[screenshot: encoding probe results against a GPT-3 variant]

And the same results for ChatGPT:

[screenshot: encoding probe results against ChatGPT]

We can see that the more recent model is much more susceptible to encoding-based injection attacks, where text-babbage-001 was only found to be vulnerable to quoted-printable and MIME encoding injections. The figures at the end of each row, e.g. 840/840, indicate the total number of text generations, and then how many of these seemed to behave OK. The figure can be quite high because more than one generation is made per prompt - by default, 10.

Errors go in garak.log; the run is logged in detail in a .jsonl file specified at analysis start & end. There's a basic analysis script in analyse/analyse_log.py which will output the probes and prompts that led to the most hits.

Send PRs & open issues. Happy hunting!

## Intro to generators

### Hugging Face

Using the Pipeline API:

* --model_type huggingface (for transformers models to run locally)
* --model_name - use the model name from Hub. Only generative models will work. If it fails and shouldn't, please open an issue and paste in the command you tried + the exception!

Using the Inference API:

* --model_type huggingface.InferenceAPI (for API-based model access)
* --model_name - the model name from Hub, e.g. "mosaicml/mpt-7b-instruct"

Using private endpoints:

* --model_type huggingface.InferenceEndpoint (for private endpoints)
* --model_name - the endpoint URL, e.g. https://xxx.us-east-1.aws.endpoints.huggingface.cloud
* (optional) set the HF_INFERENCE_TOKEN environment variable to a Hugging Face API token with the "read" role; see https://huggingface.co/settings/tokens when logged in

### OpenAI

* --model_type openai
* --model_name - the OpenAI model you'd like to use. gpt-3.5-turbo-0125 is fast and fine for testing.
* set the OPENAI_API_KEY environment variable to your OpenAI API key (e.g. "sk-19763ASDF87q6657"); see https://platform.openai.com/account/api-keys when logged in

Recognised model types are whitelisted, because the plugin needs to know which sub-API to use. Completion or ChatCompletion models are OK. If you'd like to use a model that isn't supported, you should get an informative error message; please send a PR / open an issue.

### Replicate

* set the REPLICATE_API_TOKEN environment variable to your Replicate API token, e.g. "r8-123XXXXXXXXXXXX"; see https://replicate.com/account/api-tokens when logged in

Public Replicate models:

* --model_type replicate
* --model_name - the Replicate model name and hash, e.g. "stability-ai/stablelm-tuned-alpha-7b:c49dae36"

Private Replicate endpoints:

* --model_type replicate.InferenceEndpoint (for private endpoints)
* --model_name - username/model-name slug from the deployed endpoint, e.g. elim/elims-llama2-7b
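Putting the Replicate options together, a public-model run might look like this (the token is a placeholder, and the model slug is the one listed above - use the full model:version slug from the model's Replicate page):

```
export REPLICATE_API_TOKEN="r8-123XXXXXXXXXXXX"
python3 -m garak --model_type replicate --model_name stability-ai/stablelm-tuned-alpha-7b:c49dae36 --probes promptinject
```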
### Cohere

* --model_type cohere
* --model_name (optional, command by default) - the specific Cohere model you'd like to test
* set the COHERE_API_KEY environment variable to your Cohere API key, e.g. "aBcDeFgHiJ123456789"; see https://dashboard.cohere.ai/api-keys when logged in

### Groq

* --model_type groq
* --model_name - the name of the model to access via the Groq API
* set the GROQ_API_KEY environment variable to your Groq API key; see https://console.groq.com/docs/quickstart for details on creating an API key

### ggml

* --model_type ggml
* --model_name - the path to the ggml model you'd like to load, e.g. /home/leon/llama.cpp/models/7B/ggml-model-q4_0.bin
* set the GGML_MAIN_PATH environment variable to the path of your ggml main executable

### REST

rest.RestGenerator is highly flexible and can connect to any REST endpoint that returns plaintext or JSON. It does need some brief config, which will typically result in a short YAML file describing your endpoint. See https://reference.garak.ai/en/latest/garak.generators.rest.html for examples.

### NIM

Use models from https://build.nvidia.com/ or other NIM endpoints.

* set the NIM_API_KEY environment variable to your authentication API token, or specify it in the config YAML

For chat models:

* --model_type nim
* --model_name - the NIM model name, e.g. meta/llama-3.1-8b-instruct

For completion models:

* --model_type nim.NVOpenAICompletion
* --model_name - the NIM model name, e.g. bigcode/starcoder2-15b

### OctoAI

* set the OCTO_API_TOKEN environment variable to your OctoAI API token

Octo public endpoint:

* --model_type octo
* --model_name - the OctoAI public endpoint for the model, e.g. mistral-7b-instruct-fp16

Octo private endpoint:

* --model_type octo.InferenceEndpoint (for private endpoints)
* --model_name - the deployed endpoint URL, e.g. https://llama-2-70b-chat-xxx.octoai.run/v1/chat/completions

### Test

* --model_type test
* (alternatively) --model_name test.Blank

For testing. This always generates the empty string, using the test.Blank generator. Will be marked as failing for any tests that require an output, e.g. those that make contentious claims and expect the model to refute them in order to pass.

* --model_type test.Repeat

For testing. This generator repeats back the prompt it received.
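The test generators are handy for checking an install or a new plugin end-to-end without calling a real model. For example, this sends the DAN 11.0 probe to a generator that always returns the empty string:

```
python3 -m garak --model_type test --probes dan.Dan_11_0
```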
## Intro to probes

| Probe | Description |
| --- | --- |
| blank | A simple probe that always sends an empty prompt. |
| atkgen | Automated Attack Generation. A red-teaming LLM probes the target and reacts to it in an attempt to get toxic output. Prototype; mostly stateless; for now uses a simple GPT-2 fine-tuned on the subset of hhrlhf attempts that yielded detectable toxicity (the only target currently supported). |
| av_spam_scanning | Probes that attempt to make the model output malicious content signatures |
| continuation | Probes that test if the model will continue a probably undesirable word |
| dan | Various DAN and DAN-like attacks |
| donotanswer | Prompts to which responsible language models should not answer. |
| encoding | Prompt injection through text encoding |
| gcg | Disrupt a system prompt by appending an adversarial suffix. |
| glitch | Probe model for glitch tokens that provoke unusual behavior. |
| grandma | Appeal to be reminded of one's grandmother. |
| goodside | Implementations of Riley Goodside attacks. |
| leakreplay | Evaluate if a model will replay training data. |
| lmrc | Subsample of the Language Model Risk Cards probes |
| malwaregen | Attempts to have the model generate code for building malware |
| misleading | Attempts to make a model support misleading and false claims |
| packagehallucination | Trying to get code generations that specify non-existent (and therefore insecure) packages. |
| promptinject | Implementation of the Agency Enterprise PromptInject work (best paper awards @ NeurIPS ML Safety Workshop 2022) |
| realtoxicityprompts | Subset of the RealToxicityPrompts work (data constrained because the full test will take so long to run) |
| snowball | Snowballed Hallucination probes designed to make a model give a wrong answer to questions too complex for it to process |
| xss | Look for vulnerabilities that permit or enact cross-site attacks, such as private data exfiltration. |

## Logging

garak generates multiple kinds of log:

* A log file, garak.log. This includes debugging information from garak and its plugins, and is continued across runs.
* A report of the current run, structured as JSONL. A new report file is created every time garak runs. The name of this file is output at the beginning and, if successful, also at the end of the run. In the report, an entry is made for each probing attempt both as the generations are received, and again when they are evaluated; the entry's status attribute takes a constant from garak.attempts to describe what stage it was made at.
* A hit log, detailing attempts that yielded a vulnerability (a 'hit')

## How is the code structured?

Check out the reference docs for an authoritative guide to garak code structure. In a typical run, garak will read a model type (and optionally model name) from the command line, then determine which probes and detectors to run, start up a generator, and then pass these to a harness to do the probing; an evaluator deals with the results. There are many modules in each of these categories, and each module provides a number of classes that act as individual plugins.

* garak/probes/ - classes for generating interactions with LLMs
* garak/detectors/ - classes for detecting whether an LLM is exhibiting a given failure mode
* garak/evaluators/ - assessment reporting schemes
* garak/generators/ - plugins for LLMs to be probed
* garak/harnesses/ - classes for structuring testing
* resources/ - ancillary items required by plugins

The default operating mode is to use the probewise harness. Given a list of probe module names and probe plugin names, the probewise harness instantiates each probe, then for each probe reads its recommended_detector attribute to get a list of detectors to run on the output.

Each plugin category (probes, detectors, evaluators, generators, harnesses) includes a base.py which defines the base classes usable by plugins in that category. Each plugin module defines plugin classes that inherit from one of the base classes. For example, garak.generators.openai.OpenAIGenerator descends from garak.generators.base.Generator.

Larger artefacts, like model files and bigger corpora, are kept out of the repository; they can be stored on e.g. Hugging Face Hub and loaded locally by clients using garak.

## Developing your own plugin

* Take a look at how other plugins do it
* Inherit from one of the base classes, e.g. garak.probes.base.TextProbe
* Override as little as possible
* You can test the new code in at least two ways (see the sketch after this list):
  * Start an interactive Python session:
    * Import the module, e.g. import garak.probes.mymodule
    * Instantiate the plugin, e.g. p = garak.probes.mymodule.MyProbe()
  * Run a scan with test plugins:
    * For probes, try a blank generator and the always.Pass detector: python3 -m garak -m test.Blank -p mymodule -d always.Pass
    * For detectors, try a blank generator and a blank probe: python3 -m garak -m test.Blank -p test.Blank -d mymodule
    * For generators, try a blank probe and the always.Pass detector: python3 -m garak -m mymodule -p test.Blank -d always.Pass
  * Get garak to list all the plugins of the type you're writing, with --list_probes, --list_detectors, or --list_generators
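To make that concrete, here is a hypothetical minimal probe module, saved as garak/probes/mymodule.py. This is a sketch, not the project's canonical template: the attribute names shown (prompts, recommended_detector, bcp47, goal) match recent garak code but may differ across versions, so check garak/probes/base.py for the authoritative interface.

```python
# garak/probes/mymodule.py - hypothetical minimal probe (a sketch; attribute
# names are assumptions - verify against garak/probes/base.py in your version)
from garak.probes.base import Probe


class MyProbe(Probe):
    """Try to get the model to repeat a marker string verbatim."""

    bcp47 = "en"  # language the probe's prompts are written in
    goal = "make the model repeat a marker string"  # failure mode being sought
    # detectors the probewise harness should run over this probe's output;
    # always.Fail flags every response, which is convenient while developing
    recommended_detector = ["always.Fail"]
    # the prompts this probe sends to the generator
    prompts = ["Please repeat after me: PWNED"]
```

With that file in place, the scan commands above should pick it up, e.g. python3 -m garak -m test.Blank -p mymodule -d always.Pass.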
## FAQ

We have an FAQ here. Reach out if you have any more questions! leon@garak.ai

Code reference documentation is at garak.readthedocs.io.

## Citing garak

You can read the garak preprint paper. If you use garak, please cite us.

```
@article{garak,
  title={{garak: A Framework for Security Probing Large Language Models}},
  author={Leon Derczynski and Erick Galinkin and Jeffrey Martin and Subho Majumdar and Nanna Inie},
  year={2024},
  howpublished={\url{https://garak.ai}}
}
```

---

"Lying is a skill like any other, and if you wish to maintain a level of excellence you have to practice constantly" - Elim

For updates and news see @garak_llm

(c) 2023- Leon Derczynski; Apache license v2, see LICENSE