# june-va

Local Voice Chatbot: Ollama + HF Transformers + Coqui TTS Toolkit

## OVERVIEW

**june-va** is a local voice chatbot that combines the power of Ollama (for language model capabilities), Hugging Face Transformers (for speech recognition), and the Coqui TTS Toolkit (for text-to-speech synthesis). It provides a flexible, privacy-focused solution for voice-assisted interactions on your local machine, ensuring that no data is sent to external servers.

![demo-text-only-interaction](demo.gif)

### Interaction Modes

- **Text Input/Output:** Provide text input to the assistant and receive text responses.
- **Voice Input/Text Output:** Use your microphone to give voice input and receive text responses from the assistant.
- **Text Input/Audio Output:** Provide text input and receive both text and synthesised audio responses from the assistant.
- **Voice Input/Audio Output (Default):** Use your microphone for voice input and receive responses in both text and synthesised audio form.

## INSTALLATION

### Prerequisites

- Ollama
- Python 3.10+ (with pip)

You will also need the following native package installed on your machine:

```sh
apt install portaudio19-dev  # required by PyAudio
```

### From Source

To install directly from source:

```sh
git clone https://github.com/mezbaul-h/june.git
cd june
pip install .
```

## USAGE

Pull the language model (the default is `llama3:8b-instruct-q4_0`) with Ollama first, if you haven't already:

```sh
ollama pull llama3:8b-instruct-q4_0
```

Next, run the program (with the default configuration):

```sh
june-va
```

This uses `llama3:8b-instruct-q4_0` for LLM capabilities, `openai/whisper-small.en` for speech recognition, and `tts_models/en/ljspeech/glow-tts` for audio synthesis.

You can also customize the program's behaviour with a JSON configuration file:

```sh
june-va --config path/to/config.json
```

> **Note:** The configuration file is optional. To learn more about the structure of the config file, see the Configuration section.

### Regarding Voice Input

After the `Listening for sound...` message appears, you can speak directly into the microphone. Unlike typical voice assistants, no wake command is required: simply start speaking, and the tool will automatically detect and process your voice input. Once you finish speaking, remain silent for 3 seconds so the assistant can process your input.

### Voice Conversion

Many of the models supported by Coqui's TTS Toolkit (e.g., `tts_models/multilingual/multi-dataset/xtts_v2`) support voice cloning. You can use your own speaker profile with a small audio clip (approximately 1 minute for most models).
Once you have the clip, you can instruct the assistant to use it with a custom configuration like the following:

```json
{
  "tts": {
    "model": "tts_models/multilingual/multi-dataset/xtts_v2",
    "generation_args": {
      "language": "en",
      "speaker_wav": "/path/to/your/target/voice.wav"
    }
  }
}
```

## CONFIGURATION

The application can be customised using a configuration file. The config file must be a JSON file. The default configuration is as follows:

```json
{
  "llm": {
    "disable_chat_history": false,
    "model": "llama3:8b-instruct-q4_0"
  },
  "stt": {
    "device": "torch device identifier (`cuda` if available; otherwise `cpu`)",
    "generation_args": {
      "batch_size": 8
    },
    "model": "openai/whisper-small.en"
  },
  "tts": {
    "device": "torch device identifier (`cuda` if available; otherwise `cpu`)",
    "model": "tts_models/en/ljspeech/glow-tts"
  }
}
```

When you use a configuration file, it overrides the default configuration but does not overwrite it, so you can modify the configuration partially. For instance, if you do not wish to use speech recognition and only want to provide prompts through text, you can disable it with a config file containing:

```json
{
  "stt": null
}
```

Similarly, you can disable the audio synthesiser, or both, to use the virtual assistant in text-only mode.

If you only want to change the device on which a particular type of model is loaded, without changing the model's other default attributes, you could use:

```json
{
  "tts": {
    "device": "cpu"
  }
}
```

### Configuration Attributes

#### `llm` - Language Model Configuration

- `llm.device`: Torch device identifier (e.g., `cpu`, `cuda`, `mps`) on which the pipeline will be allocated.
- `llm.disable_chat_history`: Boolean indicating whether to disable chat history. Enabling chat history makes interactions more dynamic, since the model has access to previous context, but consumes more processing power. Disabling it results in less interactive conversations but uses fewer processing resources.
- `llm.model`: Name of the text-generation model tag on Ollama. Ensure this is a valid model tag that exists on your machine.
- `llm.system_prompt`: A system prompt for the model. If the underlying model does not support system prompts, an error will be raised.

#### `stt` - Speech-to-Text Model Configuration

- `stt.device`: Torch device identifier (e.g., `cpu`, `cuda`, `mps`) on which the pipeline will be allocated.
- `stt.generation_args`: Object containing generation arguments accepted by Hugging Face's speech recognition pipeline.
- `stt.model`: Name of the speech recognition model on Hugging Face. Ensure this is a valid model ID that exists on Hugging Face.

#### `tts` - Text-to-Speech Model Configuration

- `tts.device`: Torch device identifier (e.g., `cpu`, `cuda`, `mps`) on which the pipeline will be allocated.
- `tts.generation_args`: Object containing generation arguments accepted by Coqui's TTS API.
- `tts.model`: Name of a text-to-speech model supported by Coqui's TTS Toolkit. Ensure this is a valid model ID.

## LICENSE

MIT license
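The "override, don't overwrite" merge behaviour described in the Configuration section can be pictured as a recursive dictionary merge. The sketch below is purely illustrative: the function name `merge_config` and the inlined defaults are hypothetical and do not reflect june-va's actual implementation, which may differ.

```python
# Hypothetical sketch of partial-config merging: user values override
# defaults key by key, nested objects are merged rather than replaced,
# and null/None disables a component entirely (e.g. {"stt": null}).
# NOTE: merge_config and DEFAULT_CONFIG are illustrative, not june-va's API.

DEFAULT_CONFIG = {
    "llm": {"disable_chat_history": False, "model": "llama3:8b-instruct-q4_0"},
    "stt": {
        "device": "cpu",
        "generation_args": {"batch_size": 8},
        "model": "openai/whisper-small.en",
    },
    "tts": {"device": "cpu", "model": "tts_models/en/ljspeech/glow-tts"},
}


def merge_config(defaults, overrides):
    """Recursively merge a user config over the defaults."""
    if overrides is None:
        # A component explicitly set to null is disabled outright.
        return None
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested objects merge key by key, keeping untouched defaults.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


# Disable speech recognition, move TTS to cuda, keep everything else default.
config = merge_config(DEFAULT_CONFIG, {"stt": None, "tts": {"device": "cuda"}})
```

With this input, `config["stt"]` is `None` (speech recognition disabled), while `config["tts"]` keeps the default model and only changes its device, matching the partial-override examples above.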