https://github.com/intel-analytics/ipex-llm

> [!IMPORTANT]
> bigdl-llm has now become ipex-llm (see the migration guide here); you may find the original BigDL project here.

---

# IPEX-LLM

IPEX-LLM is a PyTorch library for running LLM on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max) with very low latency[^1].

> [!NOTE]
> * It is built on top of Intel Extension for PyTorch (IPEX), as well as the excellent work of llama.cpp, bitsandbytes, vLLM, qlora, AutoGPTQ, AutoAWQ, etc.
> * It provides seamless integration with llama.cpp, Text-Generation-WebUI, HuggingFace transformers, HuggingFace PEFT, LangChain, LlamaIndex, DeepSpeed-AutoTP, vLLM, FastChat, HuggingFace TRL, AutoGen, ModelScope, etc.
> * 50+ models have been optimized/verified on ipex-llm (including LLaMA2, Mistral, Mixtral, Gemma, LLaVA, Whisper, ChatGLM, Baichuan, Qwen, RWKV, and more); see the complete list here.
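For a first taste of the transformers-style API, the snippet below is a minimal sketch of INT4 inference. The model id, prompt, and the `xpu` device line are illustrative placeholders, and they assume the Intel GPU setup described in the Quickstart section further down.

```python
# Minimal sketch of INT4 (4-bit) inference with the ipex-llm transformers-style API.
# The model id, prompt and the "xpu" device are illustrative; they assume a model you
# have access to and the Intel GPU environment from the Quickstart below.
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM  # drop-in replacement for the HF class

model_path = "meta-llama/Llama-2-7b-chat-hf"  # placeholder: any supported model id or local path

# load_in_4bit=True applies INT4 weight-only quantization while the weights are loaded
model = AutoModelForCausalLM.from_pretrained(model_path,
                                             load_in_4bit=True,
                                             trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

model = model.to("xpu")  # move to Intel GPU; omit this line (and the .to("xpu") below) to stay on CPU

prompt = "What is IPEX-LLM?"
input_ids = tokenizer.encode(prompt, return_tensors="pt").to("xpu")
with torch.inference_mode():
    output = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```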
## Latest Update

* [2024/03] bigdl-llm has now become ipex-llm (see the migration guide here); you may find the original BigDL project here.
* [2024/02] ipex-llm now supports directly loading models from ModelScope.
* [2024/02] ipex-llm added initial INT2 support (based on the llama.cpp IQ2 mechanism), which makes it possible to run large LLMs (e.g., Mixtral-8x7B) on an Intel GPU with 16GB VRAM.
* [2024/02] Users can now use ipex-llm through the Text-Generation-WebUI GUI.
* [2024/02] ipex-llm now supports Self-Speculative Decoding, which in practice brings ~30% speedup for FP16 and BF16 inference on Intel GPU and CPU, respectively.
* [2024/02] ipex-llm now supports a comprehensive set of LLM finetuning techniques on Intel GPU (including LoRA, QLoRA, DPO, QA-LoRA and ReLoRA).
* [2024/01] Using ipex-llm QLoRA, we managed to finetune LLaMA2-7B in 21 minutes and LLaMA2-70B in 3.14 hours on 8 Intel Max 1550 GPUs for Stanford-Alpaca (see the blog here).

### More updates

* [2023/12] ipex-llm now supports ReLoRA (see "ReLoRA: High-Rank Training Through Low-Rank Updates").
* [2023/12] ipex-llm now supports Mixtral-8x7B on both Intel GPU and CPU.
* [2023/12] ipex-llm now supports QA-LoRA (see "QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models").
* [2023/12] ipex-llm now supports FP8 and FP4 inference on Intel GPU.
* [2023/11] Initial support for directly loading GGUF, AWQ and GPTQ models into ipex-llm is available.
* [2023/11] ipex-llm now supports vLLM continuous batching on both Intel GPU and CPU.
* [2023/10] ipex-llm now supports QLoRA finetuning on both Intel GPU and CPU.
* [2023/10] ipex-llm now supports FastChat serving on both Intel CPU and GPU.
* [2023/09] ipex-llm now supports Intel GPU (including iGPU, Arc, Flex and Max).
* [2023/09] ipex-llm tutorial is released.

## ipex-llm Demos

See the optimized performance of chatglm2-6b and llama-2-13b-chat models on 12th Gen Intel Core CPU and Intel Arc GPU below.

| Platform | Model demos |
|---|---|
| 12th Gen Intel Core CPU | chatglm2-6b, llama-2-13b-chat |
| Intel Arc GPU | chatglm2-6b, llama-2-13b-chat |

## ipex-llm Quickstart

### Install ipex-llm

* Windows GPU: installing ipex-llm on Windows with Intel GPU
* Linux GPU: installing ipex-llm on Linux with Intel GPU
* Docker: using ipex-llm dockers on Intel CPU and GPU
* For more details, please refer to the installation guide

### Run ipex-llm

* llama.cpp: running ipex-llm for llama.cpp (using the C++ interface of ipex-llm as an accelerated backend for llama.cpp on Intel GPU)
* vLLM: running ipex-llm in vLLM on both Intel GPU and CPU
* FastChat: running ipex-llm in FastChat serving on both Intel GPU and CPU
* LangChain-Chatchat RAG: running ipex-llm in LangChain-Chatchat (Knowledge Base QA using a RAG pipeline)
* Text-Generation-WebUI: running ipex-llm in oobabooga WebUI
* Benchmarking: running (latency and throughput) benchmarks for ipex-llm on Intel CPU and GPU

### Code Examples

* Low-bit inference
  + INT4 inference: INT4 LLM inference on Intel GPU and CPU
  + FP8/FP4 inference: FP8 and FP4 LLM inference on Intel GPU
  + INT8 inference: INT8 LLM inference on Intel GPU and CPU
  + INT2 inference: INT2 LLM inference (based on the llama.cpp IQ2 mechanism) on Intel GPU
* FP16/BF16 inference
  + FP16 LLM inference on Intel GPU, with possible self-speculative decoding optimization
  + BF16 LLM inference on Intel CPU, with possible self-speculative decoding optimization
* Save and load (see the sketch after this list)
  + Low-bit models: saving and loading ipex-llm low-bit models
  + GGUF: directly loading GGUF models into ipex-llm
  + AWQ: directly loading AWQ models into ipex-llm
  + GPTQ: directly loading GPTQ models into ipex-llm
* Finetuning
  + LLM finetuning on Intel GPU, including LoRA, QLoRA, DPO, QA-LoRA and ReLoRA
  + QLoRA finetuning on Intel CPU
* Integration with community libraries
  + HuggingFace transformers
  + Standard PyTorch model
  + DeepSpeed-AutoTP
  + HuggingFace PEFT
  + HuggingFace TRL
  + LangChain
  + LlamaIndex
  + AutoGen
  + ModelScope
* Tutorials

For more details, please refer to the ipex-llm document website.
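As an illustration of the save-and-load entries above, the following sketch quantizes a model once, persists the low-bit weights, and reloads them directly in a later run. The model id and output directory are placeholders, and the `save_low_bit`/`load_low_bit` method names follow the ipex-llm documentation but should be verified against the installed release.

```python
# Sketch of the low-bit save/load workflow: quantize once, persist the low-bit weights,
# and reload them directly in later runs (faster start-up, lower peak memory).
# Paths and the model id are placeholders.
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM

model_path = "meta-llama/Llama-2-7b-chat-hf"   # placeholder
saved_dir = "./llama-2-7b-chat-hf-int4"        # placeholder output directory

# First run: quantize to INT4 while loading, then persist the quantized weights
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True, trust_remote_code=True)
model.save_low_bit(saved_dir)
AutoTokenizer.from_pretrained(model_path, trust_remote_code=True).save_pretrained(saved_dir)

# Later runs: reload the already-quantized weights without touching the original checkpoint
model = AutoModelForCausalLM.load_low_bit(saved_dir, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(saved_dir)
```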
## Verified Models

Over 50 models have been optimized/verified on ipex-llm, including LLaMA/LLaMA2, Mistral, Mixtral, Gemma, LLaVA, Whisper, ChatGLM2/ChatGLM3, Baichuan/Baichuan2, Qwen/Qwen-1.5, InternLM and more; see the list below.

| Model | CPU Example | GPU Example |
|-------|-------------|-------------|
| LLaMA (such as Vicuna, Guanaco, Koala, Baize, WizardLM, etc.) | link1, link2 | link |
| LLaMA 2 | link1, link2 | link |
| ChatGLM | link | |
| ChatGLM2 | link | link |
| ChatGLM3 | link | link |
| Mistral | link | link |
| Mixtral | link | link |
| Falcon | link | link |
| MPT | link | link |
| Dolly-v1 | link | link |
| Dolly-v2 | link | link |
| Replit Code | link | link |
| RedPajama | link1, link2 | |
| Phoenix | link1, link2 | |
| StarCoder | link1, link2 | link |
| Baichuan | link | link |
| Baichuan2 | link | link |
| InternLM | link | link |
| Qwen | link | link |
| Qwen1.5 | link | link |
| Qwen-VL | link | link |
| Aquila | link | link |
| Aquila2 | link | link |
| MOSS | link | |
| Whisper | link | link |
| Phi-1_5 | link | link |
| Flan-t5 | link | link |
| LLaVA | link | link |
| CodeLlama | link | link |
| Skywork | link | |
| InternLM-XComposer | | link |
| WizardCoder-Python | | link |
| CodeShell | link | |
| Fuyu | link | |
| Distil-Whisper | link | link |
| Yi | link | link |
| BlueLM | link | link |
| Mamba | link | link |
| SOLAR | link | link |
| Phixtral | link | link |
| InternLM2 | link | link |
| RWKV4 | | link |
| RWKV5 | | link |
| Bark | link | link |
| SpeechT5 | | link |
| DeepSeek-MoE | link | |
| Ziya-Coding-34B-v1.0 | link | |
| Phi-2 | link | link |
| Yuan2 | link | link |
| Gemma | link | link |
| DeciLM-7B | link | link |
| Deepseek | link | link |
| StableLM | link | link |
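The models above are typically loaded through the transformers-style API shown earlier. For a model that is already instantiated as a plain PyTorch module, ipex-llm also exposes a general `optimize_model` entry point (the "Standard PyTorch model" integration listed under Code Examples). The sketch below assumes that entry point; the model id and the `low_bit` value are illustrative.

```python
# Sketch of the PyTorch-model path: load a model with the plain Hugging Face classes,
# then let ipex-llm optimize it in place. The model id and low_bit value are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer  # plain HF classes, not ipex_llm ones
from ipex_llm import optimize_model

model_path = "meta-llama/Llama-2-7b-chat-hf"  # placeholder
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype="auto", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Apply ipex-llm low-bit optimizations to the already-loaded model
model = optimize_model(model, low_bit="sym_int4")

# The optimized model is then used exactly like the original one, e.g. model.generate(...)
```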
[^1]: Performance varies by use, configuration and other factors. ipex-llm may not optimize to the same degree for non-Intel products. Learn more at www.Intel.com/PerformanceIndex.