[HN Gopher] Coqui.ai TTS: A Deep Learning Toolkit for Text-to-Sp...
       ___________________________________________________________________
        
       Coqui.ai TTS: A Deep Learning Toolkit for Text-to-Speech
        
       Author : stefankuehnel
       Score  : 111 points
       Date   : 2024-06-11 16:25 UTC (6 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | spacemanspiff01 wrote:
       | I believe the company behind this shit down at the end of 2023
        
         | giancarlostoro wrote:
         | One of my favorite typos. ;) Also, the coqui is a frog from Puerto
         | Rico (it wound up in Hawaii by sneaking into someone's luggage,
         | or something to that effect). What you hear at night is their
         | mating call, if I remember correctly.
        
       | Jayakumark wrote:
       | It's good except for the license.
        
         | sa-code wrote:
         | Is the license still relevant if the company has shut down?
        
           | cal85 wrote:
           | Yes
        
             | marcooliv wrote:
             | how?
        
               | dlx wrote:
               | The license forbids commercial use unless you buy a
               | license. The problem is, no one seems to be selling one
               | ;)
        
       | nishithfolly wrote:
       | This was a great team. Sad to see they had to shut down.
        
       | modeless wrote:
       | XTTSv2 is only slightly behind StyleTTS 2 near the top of the TTS
       | Arena leaderboard, though they are both far behind Eleven Labs:
       | https://huggingface.co/spaces/TTS-AGI/TTS-Arena
       | 
       | Personally I prefer StyleTTS 2, and it has a better license. But
       | XTTSv2 has a streaming mode with pretty low latency which is
       | nice. I did run into hallucination issues though. It will
       | hallucinate nonsense words or insert extra syllables in words,
       | pretty frequently.
       | 
       | As others mentioned, they shut down, so there won't be any updates
       | to XTTS.
        
         | eginhard wrote:
         | They just shared the paper for XTTS, which got accepted to
         | Interspeech and might be the reason for this being posted now:
         | https://arxiv.org/abs/2406.04904
        
         | jsemrau wrote:
         | Interesting. I got quite good results for my long-form Substack
         | by combining XTTSv2 with NVIDIA's NeMo.
        
         | WhitneyLand wrote:
         | Anyone have a sense for how these compare to OpenAI's TTS?
        
       | vessenes wrote:
       | NB: Coqui is no longer actively maintained. I'm not sure what the
       | team is up to now. The open market is definitely in need of an
       | upgraded TTS offering; ElevenLabs is far ahead at the moment.
        
         | eginhard wrote:
         | We do maintain a fork, mostly with bug fixes for now:
         | https://github.com/idiap/coqui-ai-TTS PRs welcome :)
        
           | dlx wrote:
           | Any progress on the license situation? I'd love to work more
           | on it, but worried about it being a bit of a dead end due to
           | uncertainty about the future of the license and not being
           | able to use it in any commercial projects.
        
         | personjerry wrote:
         | Not surprising. When I was researching options for a client I
         | tried a few companies including ElevenLabs and Play.ht, each
         | seemed happy to talk to us... except Coqui. I think I went as
         | far as reporting bugs to them, just to have them aggressively
         | ignore me. I guess they're more of a research team than a
         | business?
        
       | ritonlajoie wrote:
       | Are there any projects that can do TTS in my own voice after
       | some training on my voice?
        
         | mttpgn wrote:
         | Yes, ElevenLabs can.
        
         | eginhard wrote:
         | Yes, you can train/fine-tune models on your own voice with
         | Coqui
        
         | willwade wrote:
         | ElevenLabs, Coqui, Piper, Microsoft, Google, Apple. Seriously.
         | They all can these days. Don't forget Acapela or Nuance.
        
       | Kerbonut wrote:
       | I really like Parler TTS on the TTS Arena.
        
       | phyce wrote:
       | Coqui is great, but another fantastic tool for TTS I recommend
       | checking out is Piper. The voice quality is great, it's extremely
       | lightweight, and it's fast enough to generate TTS in real time:
       | https://github.com/rhasspy/piper
        
       ___________________________________________________________________
       (page generated 2024-06-11 23:00 UTC)