https://github.com/setzer22/llama-rs
# LLaMA-rs

**Do the LLaMA thing, but now in Rust**

*A llama riding a crab, AI-generated. Image by @darthdeus, using Stable Diffusion.*

*(Gif showcasing language generation using llama-rs)*

LLaMA-rs is a Rust port of the llama.cpp project. This allows running inference for Facebook's LLaMA model on a CPU with good performance using full precision, f16 or 4-bit quantized versions of the model.

Just like its C++ counterpart, it is powered by the ggml tensor library, achieving the same performance as the original code.

## Getting started

Make sure you have a Rust toolchain set up.

1. Get a copy of the model's weights[^1]
2. Clone the repository
3. Build (`cargo build --release`)
4. Run with `cargo run --release --`

**NOTE**: Make sure to build and run in release mode. Debug builds are currently broken.

For example, you can try the following prompt:

```shell
cargo run --release -- -m /data/Llama/LLaMA/7B/ggml-model-q4_0.bin -p "Tell me how cool the Rust programming language is"
```

## Q&A

- **Q: Why did you do this?**
- **A:** It was not my choice. Ferris appeared to me in my dreams and asked me to rewrite this in the name of the Holy crab.

- **Q: Seriously now.**
- **A:** Come on! I don't want to get into a flame war. You know how it goes: something something memory, something something cargo is nice. Don't make me say it; everybody knows this already.

- **Q: I insist.**
- **A:** *Sheesh! Okaaay.* After seeing the huge potential of llama.cpp, the first thing I did was to see how hard it would be to turn it into a library to embed in my projects. I started digging into the code and realized that the heavy lifting is done by ggml (a C library, easy to bind to Rust) and that the whole project was just around ~2k lines of C++ code (not so easy to bind). After a couple of (failed) attempts to build an HTTP server into the tool, I realized I'd be much more productive if I just ported the code to Rust, where I'm more comfortable.

- **Q: Is this the real reason?**
- **A:** Haha. Of course not. I just like collecting imaginary internet points, in the form of little stars, that people seem to give to me whenever I embark on pointless quests for rewriting X thing, but in Rust.

## Known issues / To-dos

Contributions welcome! Here are a few pressing issues:

- [ ] The code only sets the right CFLAGS on Linux. The `build.rs` script in `ggml_raw` needs to be fixed, so inference will be very slow on every other OS. (A sketch of one possible approach is included after the footnotes.)
- [ ] The quantization code has not been ported (yet). You can still use the quantized models with llama.cpp. (A simplified sketch of the block-quantization idea is also included after the footnotes.)
- [ ] The code needs to be "library"-fied. It is nice as a showcase binary, but the real potential for this tool is to allow embedding in other services.
- [ ] No crates.io release. The name llama-rs is reserved and I plan to do this soon-ish.
- [ ] Debug builds are currently broken.
- [ ] Anything from the original C++ code.

## Footnotes

[^1]: The only legal source to get the weights at the time of writing is this repository. The choice of words also may or may not hint at the existence of other kinds of sources.
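Regarding the CFLAGS to-do above: this is mostly a build-script problem. Below is a minimal sketch of a `build.rs` that picks compiler flags per target using the `cc` crate. The file paths, the `GGML_USE_ACCELERATE` define, and the specific flag choices are assumptions made for illustration; this is not the project's actual build script.

```rust
// build.rs — a minimal sketch, assuming the ggml C sources are vendored under `ggml/`
// and compiled with the `cc` crate. Paths and flags are illustrative only.
fn main() {
    let target_os = std::env::var("CARGO_CFG_TARGET_OS").unwrap_or_default();
    let target_arch = std::env::var("CARGO_CFG_TARGET_ARCH").unwrap_or_default();

    let mut build = cc::Build::new();
    build.file("ggml/ggml.c").include("ggml");

    // SIMD flags: `flag_if_supported` silently skips flags the active compiler rejects,
    // so the same script works with GCC, Clang, and MSVC.
    if target_arch == "x86_64" {
        build
            .flag_if_supported("-mavx")
            .flag_if_supported("-mavx2")
            .flag_if_supported("-mfma")
            .flag_if_supported("-mf16c");
    }

    // On macOS, ggml can use the Accelerate framework (assumption: the vendored ggml
    // supports the GGML_USE_ACCELERATE define).
    if target_os == "macos" {
        build.define("GGML_USE_ACCELERATE", None);
        println!("cargo:rustc-link-lib=framework=Accelerate");
    }

    build.compile("ggml");

    // Rebuild when the vendored sources change.
    println!("cargo:rerun-if-changed=ggml/ggml.c");
}
```

The point of `flag_if_supported` is portability: a GCC-style flag that MSVC rejects is simply dropped instead of failing the build, which is one way to stop the "Linux-only CFLAGS" problem from breaking other platforms.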
- About Run LLaMA inference on CPU, with Rust Resources Readme License MIT license Stars 466 stars Watchers 5 watching Forks 10 forks Releases No releases published Sponsor this project * * ko_fi ko-fi.com/setzer22 Learn more about GitHub Sponsors Packages 0 No packages published Contributors 3 * @setzer22 setzer22 * @philpax philpax Philpax * @mwbryant mwbryant Languages * Rust 100.0% Footer (c) 2023 GitHub, Inc. Footer navigation * Terms * Privacy * Security * Status * Docs * Contact GitHub * Pricing * API * Training * Blog * About You can't perform that action at this time. You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session.
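For the quantization to-do, here is a simplified, self-contained sketch of the general idea behind block-wise 4-bit quantization. This is **not** ggml's exact q4_0 layout; the block size, rounding, and packing here are assumptions chosen to keep the example short.

```rust
// A simplified sketch of block-wise 4-bit quantization, in the spirit of (but not
// identical to) ggml's 4-bit formats. Layout details are assumptions for illustration.

const BLOCK_SIZE: usize = 32;

/// One quantized block: a single f32 scale plus 32 values packed as 4-bit codes (two per byte).
struct BlockQ4 {
    scale: f32,
    codes: [u8; BLOCK_SIZE / 2],
}

fn quantize_block(values: &[f32; BLOCK_SIZE]) -> BlockQ4 {
    // Symmetric scaling: map the largest magnitude onto the 4-bit range.
    let amax = values.iter().fold(0.0f32, |m, &v| m.max(v.abs()));
    let scale = amax / 7.0;
    let inv = if scale != 0.0 { 1.0 / scale } else { 0.0 };

    let mut codes = [0u8; BLOCK_SIZE / 2];
    for (i, pair) in values.chunks_exact(2).enumerate() {
        // Shift into [0, 15] so each value fits in one nibble.
        let lo = ((pair[0] * inv).round() as i32 + 8).clamp(0, 15) as u8;
        let hi = ((pair[1] * inv).round() as i32 + 8).clamp(0, 15) as u8;
        codes[i] = lo | (hi << 4);
    }
    BlockQ4 { scale, codes }
}

fn dequantize_block(block: &BlockQ4) -> [f32; BLOCK_SIZE] {
    let mut out = [0.0f32; BLOCK_SIZE];
    for (i, &byte) in block.codes.iter().enumerate() {
        out[2 * i] = ((byte & 0x0f) as i32 - 8) as f32 * block.scale;
        out[2 * i + 1] = ((byte >> 4) as i32 - 8) as f32 * block.scale;
    }
    out
}

fn main() {
    let input: [f32; BLOCK_SIZE] = std::array::from_fn(|i| (i as f32 - 16.0) * 0.25);
    let q = quantize_block(&input);
    let deq = dequantize_block(&q);
    // Lossy: each weight is stored in 4 bits plus a shared per-block scale.
    println!("scale = {}, first values: {:?} -> {:?}", q.scale, &input[..4], &deq[..4]);
}
```

Storing a shared scale per small block is what lets the quantized models stay compact while keeping the dequantization error local to 32 values at a time.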