[HN Gopher] Coqui.ai TTS: A Deep Learning Toolkit for Text-to-Sp...
___________________________________________________________________
Coqui.ai TTS: A Deep Learning Toolkit for Text-to-Speech
Author : stefankuehnel
Score : 111 points
Date : 2024-06-11 16:25 UTC (6 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| spacemanspiff01 wrote:
| I believe the company behind this shit down at the end of 2023
| giancarlostoro wrote:
| One of my favorite typos. ;) Also, coqui is a frog in Puerto
| Rico (that wound up in Hawaii, sneaking into someone's luggage
| or something to that effect). When you hear them at night, what
| you are hearing is their mating call, if I remember correctly.
| Jayakumark wrote:
| It's good except for the license.
| sa-code wrote:
| Is the license still relevant if the company has shut down?
| cal85 wrote:
| Yes
| marcooliv wrote:
| how?
| dlx wrote:
| The license forbids commercial use unless you buy a
| license. The problem is, no one seems to be selling one
| ;)
| nishithfolly wrote:
| This was a great team. Sad to see they had to shut down.
| modeless wrote:
| XTTSv2 is only slightly behind StyleTTS 2 near the top of the TTS
| Arena leaderboard, though they are both far behind Eleven Labs:
| https://huggingface.co/spaces/TTS-AGI/TTS-Arena
|
| Personally I prefer StyleTTS 2, and it has a better license. But
| XTTSv2 has a streaming mode with pretty low latency which is
| nice. I did run into hallucination issues though. It will
| hallucinate nonsense words or insert extra syllables in words,
| pretty frequently.
|
| As others mentioned, they shut down, so there won't be any
| updates to XTTS.
| eginhard wrote:
| They just shared the paper for XTTS, which got accepted to
| Interspeech and might be the reason for this being posted now:
| https://arxiv.org/abs/2406.04904
| jsemrau wrote:
| Interesting. I got quite good results for my long-form Substack
| by combining XTTSv2 with NVIDIA's NeMo.
| WhitneyLand wrote:
| Anyone have a sense for how these compare to OpenAI's TTS?
| vessenes wrote:
| NB: Coqui is no longer actively maintained. I'm not sure what the
| team is up to now. The open market is definitely in need of an
| upgraded TTS offering; ElevenLabs is far ahead at the moment.
| eginhard wrote:
| We do maintain a fork, mostly with bug fixes for now:
| https://github.com/idiap/coqui-ai-TTS PRs welcome :)
| dlx wrote:
| Any progress on the license situation? I'd love to work more
| on it, but I'm worried about it being a bit of a dead end due
| to uncertainty about the future of the license and not being
| able to use it in any commercial projects.
| personjerry wrote:
| Not surprising. When I was researching options for a client I
| tried a few companies including ElevenLabs and Play.ht, each
| seemed happy to talk to us... except Coqui. I think I went as
| far as reporting bugs to them, just to have them aggressively
| ignore me. I guess they're more of a research team than a
| business?
| ritonlajoie wrote:
| Are there any projects that can do TTS in my own voice after
| some training on it?
| mttpgn wrote:
| Yes, ElevenLabs can.
| eginhard wrote:
| Yes, you can train/fine-tune models on your own voice with
| Coqui
| willwade wrote:
| ElevenLabs, Coqui, Piper, Microsoft, Google, Apple. Seriously.
| They all can these days. Don't forget Acapela or Nuance.
| Kerbonut wrote:
| I really like Parler TTS on the TTS Arena.
| phyce wrote:
| Coqui is great, but another fantastic tool for TTS I recommend
| checking out is Piper. The voice quality is great, it's extremely
| lightweight, and it's fast enough to generate TTS in real time:
| https://github.com/rhasspy/piper
___________________________________________________________________
(page generated 2024-06-11 23:00 UTC)