https://github.com/intel-analytics/ipex-llm

> [!IMPORTANT]
> bigdl-llm has now become ipex-llm (see the migration guide here); you may find the original BigDL project here.

---

# IPEX-LLM

IPEX-LLM is a PyTorch library for running LLM on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max) with very low latency[^1].

> [!NOTE]
> * It is built on top of Intel Extension for PyTorch (IPEX), as well as the excellent work of llama.cpp, bitsandbytes, vLLM, qlora, AutoGPTQ, AutoAWQ, etc.
> * It provides seamless integration with llama.cpp, Text-Generation-WebUI, HuggingFace transformers, HuggingFace PEFT, LangChain, LlamaIndex, DeepSpeed-AutoTP, vLLM, FastChat, HuggingFace TRL, AutoGen, ModelScope, etc.
> * 50+ models have been optimized/verified on ipex-llm (including LLaMA2, Mistral, Mixtral, Gemma, LLaVA, Whisper, ChatGLM, Baichuan, Qwen, RWKV, and more); see the complete list here.
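For a first taste of the transformers-style API, the snippet below is a minimal sketch of INT4 inference. The model id, prompt, and the `xpu` device line are illustrative placeholders, and they assume the Intel GPU setup described in the Quickstart section further down.

```python
# Minimal sketch of INT4 (4-bit) inference with the ipex-llm transformers-style API.
# The model id, prompt and the "xpu" device are illustrative; they assume a model you
# have access to and the Intel GPU environment from the Quickstart below.
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM  # drop-in replacement for the HF class

model_path = "meta-llama/Llama-2-7b-chat-hf"  # placeholder: any supported model id or local path

# load_in_4bit=True applies INT4 weight-only quantization while the weights are loaded
model = AutoModelForCausalLM.from_pretrained(model_path,
                                             load_in_4bit=True,
                                             trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

model = model.to("xpu")  # move to Intel GPU; omit this line (and the .to("xpu") below) to stay on CPU

prompt = "What is IPEX-LLM?"
input_ids = tokenizer.encode(prompt, return_tensors="pt").to("xpu")
with torch.inference_mode():
    output = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```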
## Latest Update

* [2024/03] bigdl-llm has now become ipex-llm (see the migration guide here); you may find the original BigDL project here.
* [2024/02] ipex-llm now supports directly loading models from ModelScope.
* [2024/02] ipex-llm added initial INT2 support (based on the llama.cpp IQ2 mechanism), which makes it possible to run large LLMs (e.g., Mixtral-8x7B) on an Intel GPU with 16GB VRAM.
* [2024/02] Users can now use ipex-llm through the Text-Generation-WebUI GUI.
* [2024/02] ipex-llm now supports Self-Speculative Decoding, which in practice brings ~30% speedup for FP16 and BF16 inference on Intel GPU and CPU, respectively.
* [2024/02] ipex-llm now supports a comprehensive set of LLM finetuning techniques on Intel GPU (including LoRA, QLoRA, DPO, QA-LoRA and ReLoRA).
* [2024/01] Using ipex-llm QLoRA, we managed to finetune LLaMA2-7B in 21 minutes and LLaMA2-70B in 3.14 hours on 8 Intel Max 1550 GPUs for Stanford-Alpaca (see the blog here).

### More updates

* [2023/12] ipex-llm now supports ReLoRA (see "ReLoRA: High-Rank Training Through Low-Rank Updates").
* [2023/12] ipex-llm now supports Mixtral-8x7B on both Intel GPU and CPU.
* [2023/12] ipex-llm now supports QA-LoRA (see "QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models").
* [2023/12] ipex-llm now supports FP8 and FP4 inference on Intel GPU.
* [2023/11] Initial support for directly loading GGUF, AWQ and GPTQ models into ipex-llm is available.
* [2023/11] ipex-llm now supports vLLM continuous batching on both Intel GPU and CPU.
* [2023/10] ipex-llm now supports QLoRA finetuning on both Intel GPU and CPU.
* [2023/10] ipex-llm now supports FastChat serving on both Intel CPU and GPU.
* [2023/09] ipex-llm now supports Intel GPU (including iGPU, Arc, Flex and Max).
* [2023/09] ipex-llm tutorial is released.

## ipex-llm Demos

See the optimized performance of chatglm2-6b and llama-2-13b-chat models on 12th Gen Intel Core CPU and Intel Arc GPU below.

| Platform | Model demos |
|---|---|
| 12th Gen Intel Core CPU | chatglm2-6b, llama-2-13b-chat |
| Intel Arc GPU | chatglm2-6b, llama-2-13b-chat |

## ipex-llm Quickstart

### Install ipex-llm

* Windows GPU: installing ipex-llm on Windows with Intel GPU
* Linux GPU: installing ipex-llm on Linux with Intel GPU
* Docker: using ipex-llm dockers on Intel CPU and GPU
* For more details, please refer to the installation guide

### Run ipex-llm

* llama.cpp: running ipex-llm for llama.cpp (using the C++ interface of ipex-llm as an accelerated backend for llama.cpp on Intel GPU)
* vLLM: running ipex-llm in vLLM on both Intel GPU and CPU
* FastChat: running ipex-llm in FastChat serving on both Intel GPU and CPU
* LangChain-Chatchat RAG: running ipex-llm in LangChain-Chatchat (Knowledge Base QA using a RAG pipeline)
* Text-Generation-WebUI: running ipex-llm in oobabooga WebUI
* Benchmarking: running (latency and throughput) benchmarks for ipex-llm on Intel CPU and GPU

### Code Examples

* Low-bit inference
  + INT4 inference: INT4 LLM inference on Intel GPU and CPU
  + FP8/FP4 inference: FP8 and FP4 LLM inference on Intel GPU
  + INT8 inference: INT8 LLM inference on Intel GPU and CPU
  + INT2 inference: INT2 LLM inference (based on the llama.cpp IQ2 mechanism) on Intel GPU
* FP16/BF16 inference
  + FP16 LLM inference on Intel GPU, with possible self-speculative decoding optimization
  + BF16 LLM inference on Intel CPU, with possible self-speculative decoding optimization
* Save and load (see the sketch after this list)
  + Low-bit models: saving and loading ipex-llm low-bit models
  + GGUF: directly loading GGUF models into ipex-llm
  + AWQ: directly loading AWQ models into ipex-llm
  + GPTQ: directly loading GPTQ models into ipex-llm
* Finetuning
  + LLM finetuning on Intel GPU, including LoRA, QLoRA, DPO, QA-LoRA and ReLoRA
  + QLoRA finetuning on Intel CPU
* Integration with community libraries
  + HuggingFace transformers
  + Standard PyTorch model
  + DeepSpeed-AutoTP
  + HuggingFace PEFT
  + HuggingFace TRL
  + LangChain
  + LlamaIndex
  + AutoGen
  + ModelScope
* Tutorials

For more details, please refer to the ipex-llm document website.
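As an illustration of the save-and-load entries above, the following sketch quantizes a model once, persists the low-bit weights, and reloads them directly in a later run. The model id and output directory are placeholders, and the `save_low_bit`/`load_low_bit` method names follow the ipex-llm documentation but should be verified against the installed release.

```python
# Sketch of the low-bit save/load workflow: quantize once, persist the low-bit weights,
# and reload them directly in later runs (faster start-up, lower peak memory).
# Paths and the model id are placeholders.
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM

model_path = "meta-llama/Llama-2-7b-chat-hf"   # placeholder
saved_dir = "./llama-2-7b-chat-hf-int4"        # placeholder output directory

# First run: quantize to INT4 while loading, then persist the quantized weights
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True, trust_remote_code=True)
model.save_low_bit(saved_dir)
AutoTokenizer.from_pretrained(model_path, trust_remote_code=True).save_pretrained(saved_dir)

# Later runs: reload the already-quantized weights without touching the original checkpoint
model = AutoModelForCausalLM.load_low_bit(saved_dir, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(saved_dir)
```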
## Verified Models

Over 50 models have been optimized/verified on ipex-llm, including LLaMA/LLaMA2, Mistral, Mixtral, Gemma, LLaVA, Whisper, ChatGLM2/ChatGLM3, Baichuan/Baichuan2, Qwen/Qwen-1.5, InternLM and more; see the list below.

| Model | CPU Example | GPU Example |
|-------|-------------|-------------|
| LLaMA (such as Vicuna, Guanaco, Koala, Baize, WizardLM, etc.) | link1, link2 | link |
| LLaMA 2 | link1, link2 | link |
| ChatGLM | link | |
| ChatGLM2 | link | link |
| ChatGLM3 | link | link |
| Mistral | link | link |
| Mixtral | link | link |
| Falcon | link | link |
| MPT | link | link |
| Dolly-v1 | link | link |
| Dolly-v2 | link | link |
| Replit Code | link | link |
| RedPajama | link1, link2 | |
| Phoenix | link1, link2 | |
| StarCoder | link1, link2 | link |
| Baichuan | link | link |
| Baichuan2 | link | link |
| InternLM | link | link |
| Qwen | link | link |
| Qwen1.5 | link | link |
| Qwen-VL | link | link |
| Aquila | link | link |
| Aquila2 | link | link |
| MOSS | link | |
| Whisper | link | link |
| Phi-1_5 | link | link |
| Flan-t5 | link | link |
| LLaVA | link | link |
| CodeLlama | link | link |
| Skywork | link | |
| InternLM-XComposer | | link |
| WizardCoder-Python | | link |
| CodeShell | link | |
| Fuyu | link | |
| Distil-Whisper | link | link |
| Yi | link | link |
| BlueLM | link | link |
| Mamba | link | link |
| SOLAR | link | link |
| Phixtral | link | link |
| InternLM2 | link | link |
| RWKV4 | | link |
| RWKV5 | | link |
| Bark | link | link |
| SpeechT5 | | link |
| DeepSeek-MoE | link | |
| Ziya-Coding-34B-v1.0 | link | |
| Phi-2 | link | link |
| Yuan2 | link | link |
| Gemma | link | link |
| DeciLM-7B | link | link |
| Deepseek | link | link |
| StableLM | link | link |
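The models above are typically loaded through the transformers-style API shown earlier. For a model that is already instantiated as a plain PyTorch module, ipex-llm also exposes a general `optimize_model` entry point (the "Standard PyTorch model" integration listed under Code Examples). The sketch below assumes that entry point; the model id and the `low_bit` value are illustrative.

```python
# Sketch of the PyTorch-model path: load a model with the plain Hugging Face classes,
# then let ipex-llm optimize it in place. The model id and low_bit value are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer  # plain HF classes, not ipex_llm ones
from ipex_llm import optimize_model

model_path = "meta-llama/Llama-2-7b-chat-hf"  # placeholder
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype="auto", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Apply ipex-llm low-bit optimizations to the already-loaded model
model = optimize_model(model, low_bit="sym_int4")

# The optimized model is then used exactly like the original one, e.g. model.generate(...)
```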
[^1]: Performance varies by use, configuration and other factors. ipex-llm may not optimize to the same degree for non-Intel products. Learn more at www.Intel.com/PerformanceIndex.