# LLaMA-Omni: Seamless Speech Interaction with Large Language Models

**Authors:** Qingkai Fang, Shoutao Guo, Yan Zhou, Zhengrui Ma, Shaolei Zhang, Yang Feng*

**Paper:** https://arxiv.org/abs/2409.06666

LLaMA-Omni is a speech-language model built upon Llama-3.1-8B-Instruct. It supports low-latency, high-quality speech interaction, simultaneously generating both text and speech responses from speech instructions.

## Highlights

* Built on Llama-3.1-8B-Instruct, ensuring high-quality responses.
* Low-latency speech interaction, with latency as low as 226 ms.
* Simultaneous generation of both text and speech responses.
* Trained in less than 3 days using just 4 GPUs.

Demo video: demo.mp4

## Install

1. Clone this repository.

```shell
git clone https://github.com/ictnlp/LLaMA-Omni
cd LLaMA-Omni
```

2. Install packages.
```shell
conda create -n llama-omni python=3.10
conda activate llama-omni
pip install pip==24.0
pip install -e .
```

3. Install fairseq.

```shell
git clone https://github.com/pytorch/fairseq
cd fairseq
pip install -e . --no-build-isolation
```

4. Install flash-attention.

```shell
pip install flash-attn --no-build-isolation
```

## Quick Start

1. Download the Llama-3.1-8B-Omni model from Huggingface.

2. Download the Whisper-large-v3 model.

```python
import whisper
model = whisper.load_model("large-v3", download_root="models/speech_encoder/")
```

3. Download the unit-based HiFi-GAN vocoder.

```shell
wget https://dl.fbaipublicfiles.com/fairseq/speech_to_speech/vocoder/code_hifigan/mhubert_vp_en_es_fr_it3_400k_layer11_km1000_lj/g_00500000 -P vocoder/
wget https://dl.fbaipublicfiles.com/fairseq/speech_to_speech/vocoder/code_hifigan/mhubert_vp_en_es_fr_it3_400k_layer11_km1000_lj/config.json -P vocoder/
```

## Gradio Demo

1. Launch a controller.

```shell
python -m omni_speech.serve.controller --host 0.0.0.0 --port 10000
```

2. Launch a Gradio web server.

```shell
python -m omni_speech.serve.gradio_web_server --controller http://localhost:10000 --port 8000 --model-list-mode reload --vocoder vocoder/g_00500000 --vocoder-cfg vocoder/config.json
```

3. Launch a model worker.

```shell
python -m omni_speech.serve.model_worker --host 0.0.0.0 --controller http://localhost:10000 --port 40000 --worker http://localhost:40000 --model-path Llama-3.1-8B-Omni --model-name Llama-3.1-8B-Omni --s2s
```

4. Visit http://localhost:8000/ and interact with LLaMA-3.1-8B-Omni!

**Note:** Because streaming audio playback in Gradio is unstable, we have only implemented streaming audio synthesis without enabling autoplay. If you have a good solution, feel free to submit a PR. Thanks!

## Local Inference

To run inference locally, organize the speech instruction files according to the format in the `omni_speech/infer/examples` directory, then refer to the following script.

```shell
bash omni_speech/infer/run.sh omni_speech/infer/examples
```

## License

Our code is released under the Apache-2.0 License.
Since our model is built on Llama 3.1, it must also comply with the Llama 3.1 License.

## Acknowledgements

* LLaVA: the codebase we built upon.
* SLAM-LLM: we borrowed some code for the speech encoder and speech adaptor.

## Citation

If you have any questions, please feel free to submit an issue or contact fangqingkai21b@ict.ac.cn.

If our work is useful for you, please cite as:

```
@article{fang-etal-2024-llama-omni,
  title={LLaMA-Omni: Seamless Speech Interaction with Large Language Models},
  author={Fang, Qingkai and Guo, Shoutao and Zhou, Yan and Ma, Zhengrui and Zhang, Shaolei and Feng, Yang},
  journal={arXiv preprint arXiv:2409.06666},
  year={2024}
}
```
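If you want a quick smoke test of the audio front end and don't have a speech recording handy, a plain 16 kHz mono WAV is sufficient, since the Whisper speech encoder operates on 16 kHz mono input. Below is a stdlib-only sketch that synthesizes such a file; the output file name and the sine-tone content are arbitrary placeholders, not part of the LLaMA-Omni pipeline itself.

```python
import math
import struct
import wave

# Whisper expects 16 kHz mono audio, so we write a 1-second,
# 16-bit PCM WAV at that sample rate.
SAMPLE_RATE = 16000   # Hz
DURATION_S = 1.0      # seconds
FREQ_HZ = 440.0       # placeholder tone (A4)

n_samples = int(SAMPLE_RATE * DURATION_S)
samples = [
    int(32767 * 0.5 * math.sin(2 * math.pi * FREQ_HZ * i / SAMPLE_RATE))
    for i in range(n_samples)
]

with wave.open("test_input.wav", "wb") as wav:
    wav.setnchannels(1)          # mono
    wav.setsampwidth(2)          # 16-bit PCM
    wav.setframerate(SAMPLE_RATE)
    wav.writeframes(struct.pack(f"<{n_samples}h", *samples))
```

A real speech instruction will of course produce far more meaningful responses, but a file like this is enough to verify that the audio loading and encoding path runs end to end.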