https://github.com/setzer22/llama-rs
# LLaMA-rs

**Do the LLaMA thing, but now in Rust**

*A llama riding a crab, AI-generated. Image by @darthdeus, using Stable Diffusion.*

*(Gif showcasing language generation using llama-rs)*

LLaMA-rs is a Rust port of the llama.cpp project. This allows running inference for Facebook's LLaMA model on a CPU with good performance using full precision, f16 or 4-bit quantized versions of the model.

Just like its C++ counterpart, it is powered by the ggml tensor library, achieving the same performance as the original code.

## Getting started

Make sure you have a Rust toolchain set up.

1. Get a copy of the model's weights[^1]
2. Clone the repository
3. Build (`cargo build --release`)
4. Run with `cargo run --release --`

**NOTE**: Make sure to build and run in release mode. Debug builds are currently broken.

For example, you can try the following prompt:

```shell
cargo run --release -- -m /data/Llama/LLaMA/7B/ggml-model-q4_0.bin -p "Tell me how cool the Rust programming language is"
```

## Q&A

- **Q: Why did you do this?**
- **A:** It was not my choice. Ferris appeared to me in my dreams and asked me to rewrite this in the name of the Holy crab.

- **Q: Seriously now.**
- **A:** Come on! I don't want to get into a flame war. You know how it goes: something something memory, something something cargo is nice. Don't make me say it; everybody knows this already.

- **Q: I insist.**
- **A:** *Sheesh! Okaaay.* After seeing the huge potential of llama.cpp, the first thing I did was to see how hard it would be to turn it into a library to embed in my projects. I started digging into the code and realized that the heavy lifting is done by ggml (a C library, easy to bind to Rust) and that the whole project was just around ~2k lines of C++ code (not so easy to bind). After a couple of (failed) attempts to build an HTTP server into the tool, I realized I'd be much more productive if I just ported the code to Rust, where I'm more comfortable.

- **Q: Is this the real reason?**
- **A:** Haha. Of course not. I just like collecting imaginary internet points, in the form of little stars, that people seem to give to me whenever I embark on pointless quests for rewriting X thing, but in Rust.

## Known issues / To-dos

Contributions welcome! Here are a few pressing issues:

- [ ] The code only sets the right CFLAGS on Linux. The `build.rs` script in `ggml_raw` needs to be fixed, so inference will be very slow on every other OS. (A sketch of one possible approach is included after the footnotes.)
- [ ] The quantization code has not been ported (yet). You can still use the quantized models with llama.cpp. (A simplified sketch of the block-quantization idea is also included after the footnotes.)
- [ ] The code needs to be "library"-fied. It is nice as a showcase binary, but the real potential for this tool is to allow embedding in other services.
- [ ] No crates.io release. The name llama-rs is reserved and I plan to do this soon-ish.
- [ ] Debug builds are currently broken.
- [ ] Anything from the original C++ code.

## Footnotes

[^1]: The only legal source to get the weights at the time of writing is this repository. The choice of words also may or may not hint at the existence of other kinds of sources.
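Regarding the CFLAGS to-do above: this is mostly a build-script problem. Below is a minimal sketch of a `build.rs` that picks compiler flags per target using the `cc` crate. The file paths, the `GGML_USE_ACCELERATE` define, and the specific flag choices are assumptions made for illustration; this is not the project's actual build script.

```rust
// build.rs — a minimal sketch, assuming the ggml C sources are vendored under `ggml/`
// and compiled with the `cc` crate. Paths and flags are illustrative only.
fn main() {
    let target_os = std::env::var("CARGO_CFG_TARGET_OS").unwrap_or_default();
    let target_arch = std::env::var("CARGO_CFG_TARGET_ARCH").unwrap_or_default();

    let mut build = cc::Build::new();
    build.file("ggml/ggml.c").include("ggml");

    // SIMD flags: `flag_if_supported` silently skips flags the active compiler rejects,
    // so the same script works with GCC, Clang, and MSVC.
    if target_arch == "x86_64" {
        build
            .flag_if_supported("-mavx")
            .flag_if_supported("-mavx2")
            .flag_if_supported("-mfma")
            .flag_if_supported("-mf16c");
    }

    // On macOS, ggml can use the Accelerate framework (assumption: the vendored ggml
    // supports the GGML_USE_ACCELERATE define).
    if target_os == "macos" {
        build.define("GGML_USE_ACCELERATE", None);
        println!("cargo:rustc-link-lib=framework=Accelerate");
    }

    build.compile("ggml");

    // Rebuild when the vendored sources change.
    println!("cargo:rerun-if-changed=ggml/ggml.c");
}
```

The point of `flag_if_supported` is portability: a GCC-style flag that MSVC rejects is simply dropped instead of failing the build, which is one way to stop the "Linux-only CFLAGS" problem from breaking other platforms.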
- About Run LLaMA inference on CPU, with Rust Resources Readme License MIT license Stars 466 stars Watchers 5 watching Forks 10 forks Releases No releases published Sponsor this project * * ko_fi ko-fi.com/setzer22 Learn more about GitHub Sponsors Packages 0 No packages published Contributors 3 * @setzer22 setzer22 * @philpax philpax Philpax * @mwbryant mwbryant Languages * Rust 100.0% Footer (c) 2023 GitHub, Inc. Footer navigation * Terms * Privacy * Security * Status * Docs * Contact GitHub * Pricing * API * Training * Blog * About You can't perform that action at this time. You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session.
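For the quantization to-do, here is a simplified, self-contained sketch of the general idea behind block-wise 4-bit quantization. This is **not** ggml's exact q4_0 layout; the block size, rounding, and packing here are assumptions chosen to keep the example short.

```rust
// A simplified sketch of block-wise 4-bit quantization, in the spirit of (but not
// identical to) ggml's 4-bit formats. Layout details are assumptions for illustration.

const BLOCK_SIZE: usize = 32;

/// One quantized block: a single f32 scale plus 32 values packed as 4-bit codes (two per byte).
struct BlockQ4 {
    scale: f32,
    codes: [u8; BLOCK_SIZE / 2],
}

fn quantize_block(values: &[f32; BLOCK_SIZE]) -> BlockQ4 {
    // Symmetric scaling: map the largest magnitude onto the 4-bit range.
    let amax = values.iter().fold(0.0f32, |m, &v| m.max(v.abs()));
    let scale = amax / 7.0;
    let inv = if scale != 0.0 { 1.0 / scale } else { 0.0 };

    let mut codes = [0u8; BLOCK_SIZE / 2];
    for (i, pair) in values.chunks_exact(2).enumerate() {
        // Shift into [0, 15] so each value fits in one nibble.
        let lo = ((pair[0] * inv).round() as i32 + 8).clamp(0, 15) as u8;
        let hi = ((pair[1] * inv).round() as i32 + 8).clamp(0, 15) as u8;
        codes[i] = lo | (hi << 4);
    }
    BlockQ4 { scale, codes }
}

fn dequantize_block(block: &BlockQ4) -> [f32; BLOCK_SIZE] {
    let mut out = [0.0f32; BLOCK_SIZE];
    for (i, &byte) in block.codes.iter().enumerate() {
        out[2 * i] = ((byte & 0x0f) as i32 - 8) as f32 * block.scale;
        out[2 * i + 1] = ((byte >> 4) as i32 - 8) as f32 * block.scale;
    }
    out
}

fn main() {
    let input: [f32; BLOCK_SIZE] = std::array::from_fn(|i| (i as f32 - 16.0) * 0.25);
    let q = quantize_block(&input);
    let deq = dequantize_block(&q);
    // Lossy: each weight is stored in 4 bits plus a shared per-block scale.
    println!("scale = {}, first values: {:?} -> {:?}", q.scale, &input[..4], &deq[..4]);
}
```

Storing a shared scale per small block is what lets the quantized models stay compact while keeping the dequantization error local to 32 values at a time.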