https://github.com/abus-aikorea/voice-pro

abus-aikorea / voice-pro Public
Comprehensive Gradio WebUI for audio processing, powered by Whisper
engines (Whisper, Faster-Whisper, Whisper-Timestamped). Features
Voice Changer, zero-shot Voice Cloning (E2, F5-TTS), YouTube
downloading, vocal isolation(UVR5), Text-to-Speech (Edge-TTS), and
multi-language translation. Perfect for content creators and
developers.

https://www.youtube.com/watch?v=z8g8LMhoh_o&list=PLwx5dnMDVC9Y7dAjm9r26CZUw1uU5VIeq

License

MIT license
Folders and files (61 commits)

  * .github
  * app
  * docs
  * model
  * rvc
  * src
  * .gitattributes
  * .gitignore
  * LICENSE
  * README.md
  * configure.bat
  * one_click.cp310-win_amd64.pyd
  * requirements-voice-cpu.txt
  * requirements-voice-gpu.txt
  * start-abus.py
  * start-voice.py
  * start.bat
  * uninstall.bat

Voice-Pro: The best gradio web-ui for transcription, translation and
text-to-speech 

 

 Korean | English | Simplified Chinese | Traditional Chinese | Japanese

Voice-Pro is the best gradio WebUI for transcription, translation and
text-to-speech. It can be easily installed with one click. The
installer creates a virtual environment using Miniconda that runs
completely separate from the Windows system (fully portable).
Voice-Pro supports real-time transcription and translation, as well
as batch mode.

  * YouTube Downloader: You can download YouTube videos and extract
    the audio (mp3, wav, flac).
  * Vocal Remover: Use MDX-Net supported in UVR5 and the Demucs
    engine developed by Meta for voice separation.
  * STT: Supports speech-to-text conversion with Whisper,
    Faster-Whisper, and whisper-timestamped.
  * Translator: Google Translator. Short text translation, subtitle
    file translation.
  * TTS: Text to Speech. Edge-TTS. E2 and F5-TTS that support
    zero-shot voice cloning.
  * We provide Celeb voices for free. Try creating your own podcast.
    You can check it in the F5-TTS tab.
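
The YouTube Downloader item above can be illustrated with yt-dlp, which is listed in the Credits. This is a minimal sketch, not Voice-Pro's own code: the function names `build_audio_options` and `download_audio` are hypothetical, and the option values are one plausible configuration for extracting audio as mp3, wav, or flac. ffmpeg must be installed for the conversion step.

```python
from pathlib import Path

SUPPORTED_CODECS = ("mp3", "wav", "flac")  # formats named in the feature list


def build_audio_options(codec: str, out_dir: str) -> dict:
    """Build a yt-dlp options dict that extracts audio in the given codec."""
    if codec not in SUPPORTED_CODECS:
        raise ValueError(f"unsupported codec: {codec}")
    return {
        "format": "bestaudio/best",
        # %(title)s and %(ext)s are yt-dlp output-template fields.
        "outtmpl": str(Path(out_dir) / "%(title)s.%(ext)s"),
        "postprocessors": [
            {"key": "FFmpegExtractAudio", "preferredcodec": codec}
        ],
    }


def download_audio(url: str, codec: str = "mp3", out_dir: str = ".") -> None:
    """Download one video and convert its audio track (needs yt-dlp and ffmpeg)."""
    # Imported lazily so build_audio_options() works without yt-dlp installed.
    from yt_dlp import YoutubeDL

    with YoutubeDL(build_audio_options(codec, out_dir)) as ydl:
        ydl.download([url])
```

Calling `download_audio(url, codec="flac", out_dir="downloads")` would write a flac file named after the video title into `downloads`.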

 Run screen

 

  * TTS tab : Podcast Production using F5-TTS

    f5-tts-demo-elon-zuckerberg-1115-3.mp4

  * Studio tab : Transcription, Translation & Text-to-Speech

    voice-pro-demo-v1.6.7-1080p.mp4

 Key Features

 

  * Studio tab
      + Provides integrated environment for YouTube downloader, noise
        removal, subtitles, translation, and TTS
      + All video/audio formats supported by ffmpeg can be used
      + Selectable output audio format (wav, flac, mp3)
      + Speech recognition and subtitle creation for 100 languages
      + Select subtitle creation options suitable for PC performance
        (Whisper Model & Compute Type)
      + Translation into over 100 languages and voice generation
        through TTS
      + The BGM and sound effects from the original video are
        maintained in the multilingual video.
      + Supports TTS voice speed, volume, and pitch adjustment

                             [main_page]

  * Whisper Caption tab

      + A tab dedicated to creating subtitles. Supports over 90
        languages
      + Display subtitles created with the video
      + Word-Level Highlight function provided
      + Denoise function provided (1-Demucs, 2-MDXNet)
  * Translate tab

      + Dedicated tab for translation. Supports over 100 languages
      + Supports subtitle files (ass, ssa, srt, mpl2, tmp, vtt,
        microdvd, json)
      + Direct text input is also possible
      + Automatically detects the language of uploaded files
  * TTS tab

      + Edge-TTS and F5-TTS are supported.
      + Edge-TTS supports over 100 languages and more than 400
        voices.
      + Pitch, Volume, and Speed can be adjusted.
      + F5-TTS supports Zero-Shot Voice Cloning.
      + You can create podcasts using Celeb Voices.

                            [tts_f5_mul]

  * Live Translation tab

      + Real-time voice recognition & translation support
      + Select audio input source such as Mic, Speaker, etc.
      + Provides the ability to save captured audio, recognized
        subtitles, and translated subtitles
  * Batch tab

      + Batch processing for large amounts of files
      + Subtitles, translation, TTS
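
The subtitle-file translation described for the Translate tab can be sketched for the srt format as follows. This is an illustration only, not Voice-Pro's implementation: `translate_srt` is a hypothetical helper, and the `translate` argument stands in for whatever translation backend is used (the README names Google Translator). The idea is to translate only the caption text and leave cue numbers and timings untouched.

```python
import re
from typing import Callable

# An SRT timing line looks like "00:00:01,000 --> 00:00:02,500".
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")


def translate_srt(srt_text: str, translate: Callable[[str], str]) -> str:
    """Translate only the caption text of an SRT file, keeping numbers and timings."""
    out = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        # Find the timing line; everything after it is caption text.
        idx = next(
            (i for i, ln in enumerate(lines) if TIMING.match(ln.strip())), None
        )
        if idx is None:
            out.append(block)  # not a well-formed cue: pass through unchanged
            continue
        header = lines[: idx + 1]
        text = [translate(ln) for ln in lines[idx + 1:]]
        out.append("\n".join(header + text))
    return "\n\n".join(out) + "\n"
```

Any function that maps a string to a string can be passed as `translate`, which keeps the parsing logic independent of the translation service.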

 Execution environment

 

  * OS: Windows 10/11 (64-bit). Linux and macOS are not supported.
  * GPU: NVIDIA graphics card supporting CUDA 12.1 recommended.
  * VRAM: 4GB or more. 8GB or more recommended.
  * RAM: 4GB or more
  * HDD: At least 20GB of free space during installation
  * Internet connection required (installation and translation work)
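
The requirements above can be turned into a quick pre-install check. The sketch below is an assumption-laden illustration, not part of Voice-Pro: `check_environment` and `probe_local` are hypothetical names, and VRAM is passed in by the caller because the standard library cannot query it.

```python
import platform
import shutil
from typing import List, Optional

MIN_FREE_GB = 20  # free disk space required during installation
MIN_VRAM_GB = 4   # minimum VRAM; 8 GB or more is recommended


def check_environment(
    system: str, bits: str, free_gb: float, vram_gb: Optional[float]
) -> List[str]:
    """Return a list of problems; an empty list means the basics look fine."""
    problems = []
    if system != "Windows":
        problems.append("Only Windows 10/11 is supported.")
    if bits != "64bit":
        problems.append("A 64-bit system is required.")
    if free_gb < MIN_FREE_GB:
        problems.append(f"At least {MIN_FREE_GB} GB of free disk space is needed.")
    # vram_gb is None when the caller could not determine it; skip the check then.
    if vram_gb is not None and vram_gb < MIN_VRAM_GB:
        problems.append(f"At least {MIN_VRAM_GB} GB of VRAM is needed.")
    return problems


def probe_local(path: str = ".") -> List[str]:
    """Run the check against the current machine (VRAM is not probed)."""
    free_gb = shutil.disk_usage(path).free / 1024**3
    return check_environment(
        platform.system(), platform.architecture()[0], free_gb, None
    )
```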

 Installation

 

Voice-Pro can be easily installed with one click. Just run
configure.bat, then start.bat.

step 1. Package preparation

 

  * Clone or download the latest release (Source code (zip)) from
    GitHub Release

git clone https://github.com/abus-aikorea/voice-pro.git

step 2. Install and run the program

 

 1.  Run configure.bat
      + Install git, ffmpeg and CUDA (if using NVIDIA GPU) on
        Windows.
      + You only need to run it the first time.
      + An internet connection is required, and it may take over an
        hour depending on the system.
      + Never close the Windows-Command window during installation.
 2.  Run start.bat
      + Start Voice-Pro. Web-UI will run automatically.
      + When running for the first time, Voice-Pro is installed
        first.
      + An internet connection is required, and it may take over an
        hour depending on the system.
      + Never close the Windows-Command window during installation.
      + If a problem occurs during installation, delete the
        installer_files folder and run start.bat again.
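
The recovery step in the last item (delete the installer_files folder, then run start.bat again) can be scripted. This is a minimal sketch under the assumption that installer_files sits directly inside the Voice-Pro folder; `reset_installation` is a hypothetical helper, not something shipped with the project.

```python
import shutil
from pathlib import Path


def reset_installation(install_dir: str) -> bool:
    """Delete installer_files so the next start.bat run reinstalls from scratch.

    Returns True if the folder existed and was removed, False otherwise.
    """
    target = Path(install_dir) / "installer_files"
    if not target.is_dir():
        return False
    shutil.rmtree(target)
    return True
```

After `reset_installation(...)` returns, start.bat still has to be run manually; the helper only removes the folder.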

step 3. Uninstall program

 

  * Run uninstall.bat:
      + Remove the installer_files folder.
      + Remove ffmpeg, git and CUDA packages installed on Windows (if
        selected)
  * Voice-Pro has portable installation as standard. To uninstall the
    program, deleting the installation folder is sufficient.

Tips & Tricks

 

If Browser does not run automatically

 

  * Close the Windows-Command window and run start.bat again.
  * Run the browser directly and enter the address displayed in the
    Windows-Command window (e.g. http://127.0.0.1:7892) in the
    address bar.
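
Before opening the browser by hand, it can help to confirm that the Web-UI is actually listening. The sketch below uses only the Python standard library; the function names are hypothetical, and port 7892 is just the example shown above, so use whatever address the Windows-Command window displays.

```python
import socket
import webbrowser


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def open_webui(host: str = "127.0.0.1", port: int = 7892) -> bool:
    """Open the Web-UI in the default browser if the server is reachable."""
    if not is_listening(host, port):
        return False
    return webbrowser.open(f"http://{host}:{port}")
```

If `open_webui()` returns False, the server is probably still starting or failed to start, and restarting start.bat is the next thing to try.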

If a CUDA Out-Of-Memory error occurs

 

  * Check the GPU memory status in Windows Task Manager - Performance
    tab.
  * Set the Denoise level to 0 or 1. Denoise level 2 requires at
    least 8GB of GPU memory.
  * Set Compute Type to int type. The float type has better quality,
    but requires more GPU memory.
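
The advice above amounts to stepping settings down until the job fits in GPU memory. A rough sketch of that policy follows. Only the 8 GB threshold for denoise level 2 comes from the text; the names `pick_settings` and `step_down`, the order of the fallback steps, and the compute-type labels `float16` and `int8` (borrowed from faster-whisper) are assumptions and may not match the labels in the Voice-Pro UI.

```python
from typing import Optional


def pick_settings(free_vram_gb: float) -> dict:
    """Suggest a starting denoise level and compute type for the available VRAM."""
    # From the text: denoise level 2 requires at least 8 GB of GPU memory.
    denoise_level = 2 if free_vram_gb >= 8 else 1
    # Assumption: with less than 8 GB, start with a quantized (int) type.
    compute_type = "float16" if free_vram_gb >= 8 else "int8"
    return {"denoise_level": denoise_level, "compute_type": compute_type}


def step_down(settings: dict) -> Optional[dict]:
    """Return the next less demanding settings, or None if nothing is left to try."""
    s = dict(settings)
    if s["denoise_level"] == 2:
        s["denoise_level"] = 1          # drop the most memory-hungry denoiser first
    elif s["compute_type"] != "int8":
        s["compute_type"] = "int8"      # then switch from float to int
    elif s["denoise_level"] == 1:
        s["denoise_level"] = 0          # finally disable denoising
    else:
        return None
    return s
```

A caller could apply `step_down` after each CUDA Out-Of-Memory error and retry until it returns None.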

How to improve the quality of subtitles?

 

  * The quality of subtitles tends to improve with larger Whisper
    models (large > medium > small > base > tiny), but a larger model
    is not guaranteed to give a better result.
  * Among compute types, the float types give the best accuracy. The
    int types use model quantization to reduce GPU memory usage and
    increase speed, at the cost of some accuracy.
  * If you increase the denoise level, more background sounds will be
    removed, and only the remaining voice will be used for voice
    recognition. It does not always guarantee good results.
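
One way to act on the model ordering above is to pick the largest model that fits the available GPU memory. The sketch below is illustrative only: `largest_model_that_fits` is a hypothetical helper, and the VRAM figures are approximate values as I recall them from the openai-whisper README, so treat them as rough assumptions rather than Voice-Pro requirements.

```python
from typing import Optional

# Ordered from highest to lowest expected quality, as in the text above.
MODELS = ["large", "medium", "small", "base", "tiny"]

# Approximate VRAM needs in GB (assumed; see the lead-in).
APPROX_VRAM_GB = {"large": 10, "medium": 5, "small": 2, "base": 1, "tiny": 1}


def largest_model_that_fits(vram_gb: float) -> Optional[str]:
    """Return the highest-ranked model whose approximate VRAM need fits."""
    for name in MODELS:
        if APPROX_VRAM_GB[name] <= vram_gb:
            return name
    return None  # not even the smallest model fits
```

Because a larger model does not always give better subtitles, the result is a starting point for comparison, not a final choice.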

 Caution

 

Windows Defender may warn about an untrusted application and block
further execution of Voice-Pro. If the SmartScreen security level is
set to "Warn", click "More info" and then click "Run anyway". If
SmartScreen is set to "Block", there will be no button to run the
installation. In this case, open the properties of the start.bat
file, check "Unblock", apply the change and run start.bat again.

                            [windows_sm]

When Windows Defender mistakenly recognizes a batch file as a Trojan,
this is often called a 'False Positive'. To solve this problem, you
can go through the following steps:

 1. File exception handling: In Windows Defender, you can set certain
    files or processes to skip security scanning. To do this, follow
    the steps below:
      + Click the 'Start' button and go to 'Settings'.
      + Click 'Update & Security'.
      + Select 'Windows Security' and go to 'Virus & threat
        protection'.
      + Click 'Manage Virus & Threat Protection Settings'.
      + Select 'Add exception' in 'Virus & threat protection
        settings'.
      + Select 'File or Folder', find the batch file in question and
        add it as an exception.
 2. Temporarily disable Windows Defender: This may be a temporary
    solution. However, you must be careful when using this method as
    it may expose your computer to other threats.
 3. Report the problem to the anti-virus vendor: If you are sure that
    the file is not a Trojan horse, you can report it to Microsoft as
    a False Positive. Microsoft will review this and take any
    necessary action.

 Contact us

 

  * e-mail: abus.aikorea@gmail.com
  * homepage(Korean): https://abuskorea.imweb.me
  * Amazon(US): https://www.amazon.com/dp/B0DBR69JPL
  * Amazon(Japan): https://www.amazon.co.jp/dp/B0DBVRJ542
  * Amazon(Singapore): https://www.amazon.sg/dp/B0DCGKL8R4
  * Amazon(UAE): https://www.amazon.ae/dp/B0DCGKM7FF
  * Naver Smart Store (S/W): https://smartstore.naver.com/abus/products/10385660040
  * Naver Smart Store (Solution): https://smartstore.naver.com/abus/products/10298346364

 YouTube

 

  * Product Information: https://www.youtube.com/watch?v=z8g8LMhoh_o&list=PLwx5dnMDVC9Y7dAjm9r26CZUw1uU5VIeq
  * Home Karaoke (Pop): https://www.youtube.com/watch?v=MqQP3ewvJUk&list=PLwx5dnMDVC9bVxfGo58U-R-w3fUHqwiD6
  * Home Karaoke (K-Pop): https://www.youtube.com/watch?v=v6qjf_ELsLA&list=PLwx5dnMDVC9Z8kB01tQKfzTysaCCxC3C8
  * Home Karaoke (J-Pop): https://www.youtube.com/watch?v=KKLzoWHFAxw&list=PLwx5dnMDVC9bd6y3wXs-bOas2cXIi-GAq

 Credits

 

  * Demucs: https://github.com/facebookresearch/demucs
  * yt-dlp: https://github.com/yt-dlp/yt-dlp
  * gradio: https://github.com/gradio-app/gradio
  * edge-TTS: https://github.com/rany2/edge-tts
  * F5-TTS: https://github.com/SWivid/F5-TTS.git
  * openai-whisper: https://github.com/openai/whisper
  * faster-whisper: https://github.com/SYSTRAN/faster-whisper
  * whisper-timestamped: https://github.com/linto-ai/whisper-timestamped

 Copyright

 

[ABUS-logo] by ABUS

Topics

text-to-speech translator translation podcasts tts speech-synthesis 
subtitles speech-recognition webui speech-to-text transcription 
gradio stt whisper voice-conversion voice-cloning yt-dlp 
faster-whisper


Stars

1.7k stars

Watchers

14 watching

Forks

125 forks

Releases 7

 
v1.6.7 Latest
Nov 24, 2024
+ 6 releases

Sponsor this project

  * buy_me_a_coffee buymeacoffee.com/abus


Languages

  * Python 89.8%
  * CSS 5.0%
  * JavaScript 3.3%
  * Batchfile 1.9%
