https://github.com/abus-aikorea/voice-pro
abus-aikorea / voice-pro (Public)

Comprehensive Gradio WebUI for audio processing, powered by Whisper engines (Whisper, Faster-Whisper, Whisper-Timestamped). Features Voice Changer, zero-shot Voice Cloning (E2, F5-TTS), YouTube downloading, vocal isolation (UVR5), Text-to-Speech (Edge-TTS), and multi-language translation. Perfect for content creators and developers.

https://www.youtube.com/watch?v=z8g8LMhoh_o&list=PLwx5dnMDVC9Y7dAjm9r26CZUw1uU5VIeq

License: MIT license. 1.7k stars, 125 forks, 61 commits.

Voice-Pro: The best gradio web-ui for transcription, translation and text-to-speech

Korean | English | Simplified Chinese | Traditional Chinese | Japanese

Voice-Pro is the best Gradio WebUI for transcription, translation and text-to-speech. It can be easily installed with one click.
It creates a virtual environment using Miniconda that runs completely separate from the Windows system (fully portable). It supports real-time transcription and translation, as well as batch mode.

* YouTube Downloader: Download YouTube videos and extract the audio (mp3, wav, flac).
* Vocal Remover: Voice separation using MDX-Net (as supported in UVR5) and the Demucs engine developed by Meta.
* STT: Speech-to-text conversion with Whisper, Faster-Whisper, and whisper-timestamped.
* Translator: Google Translator. Short text translation and subtitle file translation.
* TTS: Text-to-speech with Edge-TTS, plus E2 and F5-TTS, which support zero-shot voice cloning.
* We provide Celeb voices for free. Try creating your own podcast. You can check it in the F5-TTS tab.

Run screen

* TTS tab: Podcast production using F5-TTS
  f5-tts-demo-elon-zuckerberg-1115-3.mp4
* Studio tab: Transcription, Translation & Text-to-Speech
  voice-pro-demo-v1.6.7-1080p.mp4

Key Features

* Studio tab
  + Provides an integrated environment for the YouTube downloader, noise removal, subtitles, translation, and TTS
  + All video/audio formats supported by ffmpeg can be used
  + Selectable output audio format (wav, flac, mp3)
  + Speech recognition and subtitle creation for 100 languages
  + Subtitle creation options can be chosen to suit PC performance (Whisper Model & Compute Type)
  + Translation into over 100 languages and voice generation through TTS
  + The BGM and sound effects from the original video are kept in the multilingual video
  + Supports TTS voice speed, volume, and pitch adjustment
* Whisper Caption tab
  + A tab dedicated to creating subtitles. Supports over 90 languages
  + Displays the created subtitles together with the video
  + Word-level highlight function provided
  + Denoise function provided (1-Demucs, 2-MDXNet)
* Translate tab
  + Dedicated tab for translation.
    Supports over 100 languages
  + Supports subtitle files (ass, ssa, srt, mpl2, tmp, vtt, microdvd, json)
  + Direct text input is also possible
  + Automatically detects the language of uploaded files
* TTS tab
  + Edge-TTS and F5-TTS are supported.
  + Edge-TTS supports over 100 languages and more than 400 voices.
  + Pitch, Volume, and Speed can be adjusted.
  + F5-TTS supports Zero-Shot Voice Cloning.
  + You can create podcasts using Celeb Voices.
* Live Translation tab
  + Real-time voice recognition & translation support
  + Select an audio input source such as Mic, Speaker, etc.
  + Can save the captured audio, the recognized subtitles, and the translated subtitles
* Batch tab
  + Batch processing for large numbers of files
  + Subtitles, translation, TTS

Execution environment

* OS: Windows 10/11 (64-bit)
* Linux and Mac OS are not supported.
* GPU: NVIDIA graphics card supporting CUDA 12.1 recommended.
* VRAM: 4GB or more. 8GB or more recommended.
* RAM: 4GB or more
* HDD: At least 20GB of free space during installation
* Internet connection required (installation and translation work)

Installation

Voice-Pro can be easily installed with one click. Just run configure.bat and start.bat.

step 1. Package preparation

* Clone the repository, or download the latest release (Source code (zip)) from GitHub Release.

  git clone https://github.com/abus-aikorea/voice-pro.git

step 2. Install and run the program

1. Run configure.bat
   + Installs git, ffmpeg and CUDA (if using an NVIDIA GPU) on Windows.
   + You only need to run it the first time.
   + An internet connection is required, and it may take over an hour depending on the system.
   + Never close the Windows-Command window during installation.
2. Run start.bat
   + Starts Voice-Pro. The Web-UI will open automatically.
   + When running for the first time, Voice-Pro is installed first.
   + An internet connection is required, and it may take over an hour depending on the system.
   + Never close the Windows-Command window during installation.
   + If a problem occurs during installation, delete the installer_files folder and run start.bat again.

step 3. Uninstall program

* Run uninstall.bat:
  + Removes the installer_files folder.
  + Removes the ffmpeg, git and CUDA packages installed on Windows (if selected).
* Voice-Pro uses a portable installation by default, so deleting the installation folder is sufficient to uninstall the program.

Tips & Tricks

If the browser does not open automatically

* Close the Windows-Command window and run start.bat again.
* Open the browser yourself and enter the address displayed in the Windows-Command window (e.g. http://127.0.0.1:7892) in the address bar.

If a CUDA Out-Of-Memory error occurs

* Check the GPU memory status in the Performance tab of Windows Task Manager.
* Set the Denoise level to 0 or 1. Denoise level 2 requires at least 8GB of GPU memory.
* Set Compute Type to an int type. The float types give better quality but require more GPU memory.

How to improve the quality of subtitles?

* Subtitle quality tends to improve with larger Whisper models, but this is not always the case. Model size order: large > medium > small > base > tiny
* Among compute types, the float types give better recognition quality. The int types use model quantization to reduce GPU memory usage and increase speed, at the cost of some quality.
* Increasing the denoise level removes more background sound, so that only the remaining voice is used for speech recognition. This does not always guarantee better results.

caution

Windows Defender may warn about an untrusted application and block Voice-Pro from running. If the SmartScreen security level is set to "Warn", click "More info" and then "Run anyway". If SmartScreen is set to "Block", there will be no button to run the installation. In this case, open the properties of the start.bat file, check "Unblock", apply the change, and run start.bat again.
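The "Unblock" checkbox mentioned above removes the file's `Zone.Identifier` alternate data stream, the marker Windows attaches to downloaded files. A minimal Python sketch of the same operation is shown here. It assumes the file lives on an NTFS volume, and the function names `zone_stream_path` and `unblock` are hypothetical helpers, not part of Voice-Pro. Use it only on files you trust.

```python
import os


def zone_stream_path(path: str) -> str:
    """Return the name of the alternate data stream that marks a downloaded file."""
    # Assumption: NTFS, where "<file>:Zone.Identifier" addresses that stream.
    return path + ":Zone.Identifier"


def unblock(path: str) -> bool:
    """Delete the Zone.Identifier stream of `path`.

    Returns True if a stream was removed, False if the file was not marked.
    The file itself is left untouched.
    """
    try:
        os.remove(zone_stream_path(path))
        return True
    except FileNotFoundError:
        return False
```

Calling `unblock("start.bat")` from the installation folder would have the same effect as ticking "Unblock" in the file's properties dialog.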
When Windows Defender mistakenly identifies a batch file as a Trojan, this is called a 'False Positive'. To resolve it, you can take the following steps:

1. File exception handling: In Windows Defender, you can set certain files or processes to skip security scanning. To do this:
   + Click the 'Start' button and go to 'Settings'.
   + Click 'Update & Security'.
   + Select 'Windows Security' and go to 'Virus & threat protection'.
   + Click 'Manage Virus & Threat Protection Settings'.
   + Select 'Add exception' in 'Virus & threat protection settings'.
   + Select 'File or Folder', find the batch file in question and add it as an exception.
2. Temporarily disable Windows Defender: This can serve as a temporary workaround, but use it with care because it may expose your computer to other threats.
3. Report the problem to the anti-virus vendor: If you are sure that the file is not a Trojan horse, you can report it to Microsoft as a False Positive. Microsoft will review the report and take any necessary action.
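Returning to the tips on CUDA Out-Of-Memory errors and subtitle quality: the trade-offs described there can be sketched as a small Python helper that maps available GPU memory to conservative settings. The 8GB requirement for denoise level 2 comes from the tips themselves. The model-size cut-offs, the names `Settings` and `suggest_settings`, and the compute type strings `int8` and `float16` (borrowed from faster-whisper's naming) are illustrative assumptions, not Voice-Pro's actual logic.

```python
from dataclasses import dataclass


@dataclass
class Settings:
    whisper_model: str      # one of: large, medium, small, base, tiny
    compute_type: str       # "float16" (better quality) or "int8" (less memory)
    max_denoise_level: int  # 0, 1 or 2


def suggest_settings(vram_gb: float) -> Settings:
    """Pick conservative settings for a GPU with `vram_gb` GB of memory."""
    if vram_gb >= 8:
        # Denoise level 2 needs at least 8GB of GPU memory (from the tips).
        return Settings("large", "float16", 2)
    if vram_gb >= 4:
        # Assumption: below 8GB, trade some quality for memory with an int type.
        return Settings("medium", "int8", 1)
    # Assumption: under the 4GB minimum, use a small model and skip denoising.
    return Settings("small", "int8", 0)


if __name__ == "__main__":
    for gb in (2, 6, 12):
        print(gb, suggest_settings(gb))
```

Values in this shape correspond to the Whisper Model, Compute Type, and Denoise options in the UI. Since larger models do not always give better subtitles, treat the result as a starting point and adjust from there.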
Contact us

* e-mail: abus.aikorea@gmail.com
* homepage (Korean): https://abuskorea.imweb.me
* Amazon (US): https://www.amazon.com/dp/B0DBR69JPL
* Amazon (Japan): https://www.amazon.co.jp/dp/B0DBVRJ542
* Amazon (Singapore): https://www.amazon.sg/dp/B0DCGKL8R4
* Amazon (UAE): https://www.amazon.ae/dp/B0DCGKM7FF
* Naver Smart Store (S/W): https://smartstore.naver.com/abus/products/10385660040
* Naver Smart Store (Solution): https://smartstore.naver.com/abus/products/10298346364

YouTube

* Product Information: https://www.youtube.com/watch?v=z8g8LMhoh_o&list=PLwx5dnMDVC9Y7dAjm9r26CZUw1uU5VIeq
* Home Karaoke (Pop): https://www.youtube.com/watch?v=MqQP3ewvJUk&list=PLwx5dnMDVC9bVxfGo58U-R-w3fUHqwiD6
* Home Karaoke (K-Pop): https://www.youtube.com/watch?v=v6qjf_ELsLA&list=PLwx5dnMDVC9Z8kB01tQKfzTysaCCxC3C8
* Home Karaoke (J-Pop): https://www.youtube.com/watch?v=KKLzoWHFAxw&list=PLwx5dnMDVC9bd6y3wXs-bOas2cXIi-GAq

Credits

* Demucs: https://github.com/facebookresearch/demucs
* yt-dlp: https://github.com/yt-dlp/yt-dlp
* gradio: https://github.com/gradio-app/gradio
* edge-TTS: https://github.com/rany2/edge-tts
* F5-TTS: https://github.com/SWivid/F5-TTS.git
* openai-whisper: https://github.com/openai/whisper
* faster-whisper: https://github.com/SYSTRAN/faster-whisper
* whisper-timestamped: https://github.com/linto-ai/whisper-timestamped

(c) Copyright by ABUS
Topics: text-to-speech, translator, translation, podcasts, tts, speech-synthesis, subtitles, speech-recognition, webui, speech-to-text, transcription, gradio, stt, whisper, voice-conversion, voice-cloning, yt-dlp, faster-whisper

Releases: 7. Latest: v1.6.7 (Nov 24, 2024)

Sponsor this project: buymeacoffee.com/abus

Languages: Python 89.8%, CSS 5.0%, JavaScript 3.3%, Batchfile 1.9%