https://github.com/fishaudio/fish-speech

fishaudio / fish-speech Public

  * Fork 1.1k
  * Star 14.9k

SOTA Open Source TTS

speech.fish.audio

Folders and files (612 Commits)

  * .github
  * docs
  * fish_speech
  * tools
  * .dockerignore
  * .gitignore
  * .pre-commit-config.yaml
  * .project-root
  * .readthedocs.yaml
  * API_FLAGS.txt
  * LICENSE
  * README.md
  * docker-compose.dev.yml
  * dockerfile
  * dockerfile.dev
  * entrypoint.sh
  * inference.ipynb
  * install_env.bat
  * mkdocs.yml
  * pyproject.toml
  * pyrightconfig.json
  * run_cmd.bat
  * start.bat

                             Fish Speech

 

English | Simplified Chinese | Portuguese | Japanese | Korean


This codebase and all models are released under the CC-BY-NC-SA-4.0
license. Please refer to LICENSE for more details.

---------------------------------------------------------------------

Fish Agent

 

We are very excited to announce that we have open-sourced our
self-developed agent demo. You can now try the agent demo online at
demo for instant English chat, or run English and Chinese chat locally
by following the docs.

Please note that the content is released under a CC BY-NC-SA 4.0
license. The demo is an early alpha test version: inference speed
still needs to be optimized, and there are many bugs waiting to be
fixed. If you find a bug or want to fix one, we would be very happy to
receive an issue or a pull request.

Features

 

Fish Speech

 

 1. Zero-shot & Few-shot TTS: Provide a 10- to 30-second vocal sample
    to generate high-quality TTS output. For detailed guidelines, see
    Voice Cloning Best Practices.

 2. Multilingual & Cross-lingual Support: Simply copy and paste
    multilingual text into the input box; there is no need to specify
    the language. Currently supports English, Japanese, Korean,
    Chinese, French, German, Arabic, and Spanish.

 3. No Phoneme Dependency: The model has strong generalization
    capabilities and does not rely on phonemes for TTS. It can handle
    text in any language script.

 4. Highly Accurate: Achieves a low CER (Character Error Rate) and
    WER (Word Error Rate) of around 2% for 5-minute English texts.

 5. Fast: With fish-tech acceleration, the real-time factor is
    approximately 1:5 on an Nvidia RTX 4060 laptop and 1:15 on an
    Nvidia RTX 4090.

 6. WebUI Inference: Features an easy-to-use, Gradio-based web UI
    compatible with Chrome, Firefox, Edge, and other browsers.

 7. GUI Inference: Offers a PyQt6 graphical interface that works
    seamlessly with the API server. Supports Linux, Windows, and
    macOS. See GUI.

 8. Deploy-Friendly: Easily set up an inference server with native
    support for Linux, Windows, and macOS, minimizing speed loss.
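
To make the accuracy and speed figures in items 4 and 5 concrete, here
is a minimal Python sketch of how such metrics are commonly computed.
It is not taken from the Fish Speech codebase: the function names
(edit_distance, wer, cer, speedup) are hypothetical, and it assumes
that a ratio such as 1:5 means one second of compute per five seconds
of generated audio.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion from the reference
                curr[j - 1] + 1,     # insertion into the reference
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: char-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def speedup(audio_seconds: float, compute_seconds: float) -> float:
    """Seconds of audio produced per second of compute (assumed meaning of 1:N)."""
    if compute_seconds <= 0:
        raise ValueError("compute_seconds must be positive")
    return audio_seconds / compute_seconds


if __name__ == "__main__":
    target_text = "the quick brown fox"   # text given to the TTS model
    transcript = "the quick brown box"    # hypothetical ASR transcript of the output
    print(f"WER: {wer(target_text, transcript):.3f}")
    print(f"CER: {cer(target_text, transcript):.3f}")
    # 60 s of audio generated in 12 s of compute would match the 1:5 figure.
    print(f"speedup: {speedup(60.0, 12.0):.1f}x")
```

Under these definitions, a WER of around 2% corresponds to roughly two
word errors per 100 reference words. For TTS, such scores are typically
obtained by transcribing the synthesized audio with an ASR model and
comparing the transcript against the input text.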

Fish Agent

 

 1. Completely End-to-End: Integrates the ASR and TTS parts
    automatically, with no need to plug in other models; that is, it
    is truly end-to-end rather than three-stage (ASR+LLM+TTS).

 2. Timbre Control: Can use reference audio to control the speech
    timbre.

 3. Emotional: The model can generate speech with strong emotion.

Disclaimer

 

We take no responsibility for any illegal use of the codebase. Please
refer to your local laws regarding the DMCA and other related
legislation.

Online Demo

 

Fish Audio

Fish Agent

Quick Start for Local Inference

 

inference.ipynb

Videos

 

V1.4 Demo Video: Youtube

 

Documents

 

  * English
  * Chinese
  * Japanese
  * Portuguese (Brazil)

Samples (2024/10/02 V1.4)

 

  * English
  * Chinese
  * Japanese
  * Portuguese (Brazil)

Credits

 

  * VITS2 (daniilrobnikov)
  * Bert-VITS2
  * GPT VITS
  * MQTTS
  * GPT Fast
  * GPT-SoVITS

Tech Report (V1.4)

 

@misc{fish-speech-v1.4,
      title={Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis},
      author={Shijia Liao and Yuxuan Wang and Tianyu Li and Yifan Cheng and Ruoyi Zhang and Rongzhi Zhou and Yijin Xing},
      year={2024},
      eprint={2411.01156},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2411.01156},
}

Sponsor

 
Data processing sponsored by 6Block
Fish Audio is served on Lepton.AI

Topics

tts transformer llama valle vqvae vits vqgan


Releases 11

 
V1.4.3 Latest
Nov 29, 2024
+ 10 releases

Contributors 51

  * @leng-yue
  * @github-actions[bot]
  * @pre-commit-ci[bot]
  * @AnyaCoder
  * @Stardust-minus
  * @PoTaTo-Mika
  * @Whale-Dolphin
  * @v3ucn
  * @OedoSoldier
  * @Tps-F
  * @duliangang
  * @faceair
  * @eltociear
  * @wojiushixiaobai

+ 37 contributors

Languages

  * Python 95.9%
  * Batchfile 1.7%
  * Jupyter Notebook 1.1%
  * JavaScript 0.6%
  * CSS 0.6%
  * HTML 0.1%
