https://github.com/lifeiteng/OmniSenseVoice
# Omni SenseVoice

Omni SenseVoice: High-Speed Speech Recognition with word timestamps

## The Ultimate Speech Recognition Solution

Built on SenseVoice, Omni SenseVoice is optimized for fast inference and precise timestamps, giving you a smarter, faster way to handle audio transcription.

## Install

    pip install .

## Usage

    omnisense transcribe [OPTIONS] AUDIO_PATH

Key Options:

* `--language`: Language of the audio. Use `auto` for automatic detection, or specify one of `zh`, `en`, `yue`, `ja`, `ko`.
* `--textnorm`: Whether to apply inverse text normalization (`withitn` for inverse-normalized output, `woitn` for raw output).
* `--device-id`: GPU to run on (default: `-1` for CPU).
* `--quantize`: Use a quantized model for faster processing.
* `--help`: Display detailed help information.

## Benchmark

    omnisense benchmark -s -d --num-workers 2 --device-id 0 --batch-size 10 --textnorm woitn --language en benchmark/data/manifests/libritts/libritts_cuts_dev-clean.jsonl

| Optimize | GPU | WER | RTF | Speed Up |
|---|---|---|---|---|
| baseline (onnx) | NVIDIA L4 GPU | 4.47% | 0.1200 | 1x |
| torch | NVIDIA L4 GPU | 5.02% | 0.0022 | 50x |

* On this benchmark, the torch path runs about 50x faster than the ONNX baseline, while WER rises from 4.47% to 5.02%.

To reproduce the benchmark on LibriTTS dev-clean:

    # LibriTTS
    DIR=benchmark/data
    lhotse download libritts -p dev-clean benchmark/data/LibriTTS
    lhotse prepare libritts -p dev-clean benchmark/data/LibriTTS/LibriTTS benchmark/data/manifests/libritts

    lhotse cut simple --force-eager -r benchmark/data/manifests/libritts/libritts_recordings_dev-clean.jsonl.gz \
        -s benchmark/data/manifests/libritts/libritts_supervisions_dev-clean.jsonl.gz \
        benchmark/data/manifests/libritts/libritts_cuts_dev-clean.jsonl

    omnisense benchmark -s -d --num-workers 2 --device-id 0 --batch-size 10 --textnorm woitn --language en benchmark/data/manifests/libritts/libritts_cuts_dev-clean.jsonl

    omnisense benchmark -s --num-workers 4 --device-id 0 --batch-size 16 --textnorm woitn --language en benchmark/data/manifests/libritts/libritts_cuts_dev-clean.jsonl

## Contributing

### Step 1: Code Formatting

Set up pre-commit hooks:

    pip install pre-commit==3.6.0
    pre-commit install

### Step 2: Pull Request

Submit your awesome improvements through a PR.
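The RTF and Speed Up columns of the benchmark table can be related with a few lines of Python. This is a minimal sketch, assuming the usual definition of real-time factor (processing time divided by audio duration). The function names `real_time_factor` and `speed_up` are hypothetical helpers for illustration and are not part of the `omnisense` package.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Assumed definition: RTF = processing time / duration of the audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def speed_up(baseline_rtf: float, optimized_rtf: float) -> float:
    """Ratio of two RTF values; a result above 1 means the optimized run is faster."""
    if optimized_rtf <= 0:
        raise ValueError("optimized_rtf must be positive")
    return baseline_rtf / optimized_rtf


# RTF values copied from the benchmark table (NVIDIA L4 GPU).
BASELINE_ONNX_RTF = 0.1200
TORCH_RTF = 0.0022

if __name__ == "__main__":
    ratio = speed_up(BASELINE_ONNX_RTF, TORCH_RTF)
    print(f"speed-up over baseline: {ratio:.1f}x")

    # Processing time implied by each RTF for one hour of audio.
    one_hour = 3600.0
    print(f"baseline: {BASELINE_ONNX_RTF * one_hour:.1f} s per hour of audio")
    print(f"torch:    {TORCH_RTF * one_hour:.1f} s per hour of audio")
```

With the table's values the ratio comes out slightly above 50, which is consistent with the rounded "50x" entry.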