[HN Gopher] TTS: Text-to-Speech for All
___________________________________________________________________
TTS: Text-to-Speech for All
Author : doener
Score : 173 points
Date : 2021-04-13 11:49 UTC (1 days ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| mileycyrusXOXO wrote:
| The example is very impressive! Sounds very natural.
| banana_giraffe wrote:
| It's a cool demo. Though, to my ears it's still a bit far from my
| dream of having something cheap or free I can feed more niche
| books I like and use to create an audiobook version of them.
|
| https://vocaroo.com/1oOjiLNCagur
| Causality1 wrote:
| That's still my personal TTS dream as well. Google's Read Aloud
| voice blows everything else out of the water, but I've found by
| experimentation that it will only read the first three and a
| half hours of web page text.
| Abishek_Muthian wrote:
| TTS tech is so accessible that product developers should consider
| integrating it into their products for the sake of the visually
| impaired, rather than leaving the product at the mercy of the
| operating system's accessibility features.
|
| I feel the biggest casualty of online advertising is
| accessibility. Those with eyes (eyesight) are more valuable to
| the megacorps than those without, and so the Internet is full of
| rich graphics, making the lives of those without proper vision
| miserable.
| suyash wrote:
| Always be 100% compatible with OS-provided accessibility
| guidelines before adding additional support. Most disabled
| users are familiar with native accessibility tools, and those
| should come first.
| mnemotronic wrote:
| The acronym would be more fun if the product was Text-InTo-
| Speech. Yea.... -1
| ancarda wrote:
| Will there be a wide choice of accents? The link in the README
| <https://erogol.github.io/ddc-samples/> seemed to have only a
| single voice.
| erogol wrote:
| Yep. The aim is to solve TTS for all languages one at a time.
|
| You can check out the released models page for the other models
| and languages.
|
| https://github.com/coqui-ai/TTS/releases
| ftyers wrote:
| When are you going to do Chuvash? ;)
| sandreas wrote:
| Maybe interesting:
|
| https://colab.research.google.com/drive/1SPl226SwzrfMZltrVag...
|
| https://github.com/keithito/tacotron
|
| https://www.youtube.com/watch?v=ijhZR43TOwc
|
| https://heartbeat.fritz.ai/a-2019-guide-to-speech-synthesis-...
| gxqoz wrote:
| Hrmm, the link to an example from Pocket leads me to hope that
| these are coming to that app. The current TTS for listening to
| saved articles is decent but certainly not state of the art.
| Raed667 wrote:
| Is there a way to get this working in Firefox?
| Isn0gud wrote:
| It seems like this is another dead mozilla project now, given
| that the people who worked on this started a new project:
| https://github.com/coqui-ai
| echelon wrote:
| I work on TTS (created https://vo.codes) and my impression of
| the Mozilla project was that it was incredibly understaffed.
| Unrealistically so to ever lead to any kind of product or
| platform.
|
| Maybe this new organization can accomplish the goal of easy and
| open trainable TTS. I'd really like to see it.
| kdavis wrote:
| You can see some Coqui[0] TTS examples here[1].
|
| [0] https://coqui.ai/
|
| [1] https://erogol.github.io/ddc-samples/
| erogol wrote:
| Check out Coqui TTS where we continue the work.
|
| https://github.com/coqui-ai/TTS
|
| Mozilla TTS is not maintained anymore (at least ATM).
|
| Disclaimer: I've created both of the projects.
| adkadskhj wrote:
| In an example[1], it sounds decent, but I noticed a fuzzy white
| noise whenever the voice is talking. Is this the algorithm, or
| compression? If it's the algorithm, why?
|
| [1]: https://soundcloud.com/user-565970875/pocket-article-wavernn...
| throwawaysea wrote:
| I actually don't hear the fuzzy white noise, but maybe it's
| because of my tinnitus. Is it during a certain part of the
| recording? To my ears this sounds surprisingly high fidelity
| and natural sounding.
| adkadskhj wrote:
| It's only when the... "person" talks, which makes it
| quite noticeable to me because it starts and stops. It is
| rather faint, so I might not even notice it if it were
| consistent.
| erogol wrote:
| It mainly reflects the quality of the training dataset, the
| earlier stages of the project, and some experiments.
|
| I suggest you check the latest uploads on SoundCloud.
| xcodevn wrote:
| This is a well-known problem. The noise is due to mu-law
| compression. The 16-bit audio samples are compressed to 8, 9,
| or 10 bits before being fed to the neural net, because
| predicting a categorical distribution over 2^16 values
| requires too many parameters. The noise was also present in
| samples from DeepMind's famous WaveNet (which used 8-bit mu-law).
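A minimal Python sketch of the mu-law round trip described above. It assumes samples already scaled to [-1, 1] and mu = 2**bits - 1 (so mu = 255 for the 8-bit case); the function names are illustrative and not taken from the Mozilla/Coqui TTS code. The decoded value only approximates the input, and if this explanation applies to the recording, that per-sample error is the faint hiss. Because the companding curve spends most of its levels near zero, the error grows with amplitude, which would fit the noise appearing only while the voice is active.

```python
import math

def mu_law_encode(x: float, bits: int = 8) -> int:
    """Map a sample in [-1, 1] to an integer class in [0, 2**bits - 1]."""
    mu = 2 ** bits - 1  # assumption: mu tied to the bit depth (255 for 8 bits)
    # Companding: compress the dynamic range logarithmically.
    y = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    # Uniformly quantize y from [-1, 1] onto mu + 1 integer levels.
    return int(round((y + 1) / 2 * mu))

def mu_law_decode(q: int, bits: int = 8) -> float:
    """Approximate inverse of mu_law_encode; the lost detail is the noise."""
    mu = 2 ** bits - 1
    y = 2 * q / mu - 1
    return math.copysign(((1 + mu) ** abs(y) - 1) / mu, y)

def max_roundtrip_error(bits: int) -> float:
    """Worst encode/decode error over a sweep of 16-bit PCM values."""
    worst = 0.0
    for s in range(-32768, 32768, 16):  # every 16th value keeps it fast
        x = s / 32768.0
        worst = max(worst, abs(mu_law_decode(mu_law_encode(x, bits), bits) - x))
    return worst

for bits in (8, 10):
    print(bits, max_roundtrip_error(bits))
```

More bits shrink the error, which is why 9- or 10-bit variants hiss less than 8-bit mu-law, at the cost of a larger output layer.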
|
| There are two ways to avoid this: 1. predict the 8 high (coarse)
| bits and the 8 low (fine) bits separately, as in the original
| WaveRNN paper; 2. use a mixture of logistic distributions as the
| predictive output, as in the recent Lyra vocoder from Google.
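Option 1 can be sketched as a lossless split of each 16-bit sample into two 8-bit classes, so a network needs two 256-way outputs instead of one 65,536-way output. This Python sketch assumes an offset-binary (unsigned) representation; the helper names are hypothetical, and it is not the WaveRNN reference code.

```python
from typing import Tuple

def split_sample(s: int) -> Tuple[int, int]:
    """Split a signed 16-bit PCM sample into (coarse, fine) classes, each 0..255."""
    if not -32768 <= s <= 32767:
        raise ValueError(f"{s} is not a 16-bit sample")
    u = s + 32768             # shift to offset binary: 0..65535
    return u >> 8, u & 0xFF   # high byte = coarse, low byte = fine

def join_sample(coarse: int, fine: int) -> int:
    """Inverse of split_sample: recombine the two bytes into a signed sample."""
    return ((coarse << 8) | fine) - 32768

print(split_sample(-1234))   # (123, 46)
print(join_sample(123, 46))  # -1234
```

Unlike mu-law, nothing is discarded: both halves together recover the exact 16-bit value, so no quantization hiss is introduced by the output representation itself.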
| Tade0 wrote:
| How does the number of parameters scale with resolution?
|
| Specifically, how much slower would this be if the audio were,
| say, 10 bits?
|
| I recall a lab exercise in college where we were supposed to
| increase the resolution of a quantizer until we reached a
| decent tone and 10 bits were the point at which we reached
| satisfying quality.
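On the lab exercise: the textbook rule of thumb for a uniform quantizer driven by a full-scale sine is SQNR ≈ 6.02*N + 1.76 dB, so each extra bit buys about 6 dB; 10 bits gives roughly 62 dB and 16 bits roughly 98 dB. The Python sketch below checks that numerically; the test tone (997 cycles over 100,000 samples) and the mid-rise quantizer are arbitrary illustrative choices.

```python
import math

def quantize(x: float, bits: int) -> float:
    """Uniform mid-rise quantizer on [-1, 1] with 2**bits levels."""
    levels = 2 ** bits
    step = 2.0 / levels
    # Index of the cell x falls in; clamp so x == 1.0 maps to the top cell.
    k = min(int((x + 1.0) / step), levels - 1)
    return -1.0 + (k + 0.5) * step  # reconstruct at the cell midpoint

def sine_sqnr_db(bits: int, n: int = 100_000, cycles: int = 997) -> float:
    """Measured signal-to-quantization-noise ratio of a full-scale sine, in dB."""
    signal = noise = 0.0
    for i in range(n):
        x = math.sin(2 * math.pi * cycles * i / n)
        err = quantize(x, bits) - x
        signal += x * x
        noise += err * err
    return 10 * math.log10(signal / noise)

for bits in (8, 10, 16):
    # measured value vs. the 6.02*N + 1.76 dB rule of thumb
    print(bits, round(sine_sqnr_db(bits), 1), round(6.02 * bits + 1.76, 1))
```

This measures noise, not speed; the cost side of adding bits is about the size of the network's output layer.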
| xcodevn wrote:
| It is a single matrix multiplication to predict
| probabilities of all possible outputs. For example, with a
| hidden state of 1024 dimensions, and 8 bits output, it is
| 1024x256 parameters. 10 bits will need 1024x1024 params.
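The arithmetic in this reply generalizes to hidden * 2**bits weights for the output projection (bias ignored), so every extra bit doubles that layer: 10 bits is 4x the 8-bit layer, and a single 16-bit softmax is 256x. A small Python sketch using the 1024-wide hidden state from the example; it counts only the final projection, not the rest of the network, so it is a rough indicator rather than a measure of overall speed.

```python
def output_layer_params(hidden: int, bits: int) -> int:
    """Weights in a dense layer mapping a hidden state to 2**bits logits (no bias)."""
    return hidden * 2 ** bits

for bits in (8, 10, 16):
    print(bits, output_layer_params(1024, bits))
# 8 262144
# 10 1048576
# 16 67108864

# A coarse/fine split (two 8-bit heads) for comparison:
print(2 * output_layer_params(1024, 8))  # 524288
```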
| eddyg wrote:
| I hear it as well, even when using the speaker on my phone and
| not headphones (where it seems like it would be even more
| noticeable).
| marcodiego wrote:
| FLOSS TTS and STT are badly needed right now. Being able to use
| voice recognition and speech synthesis should not be restricted
| to a small oligopoly.
| synesthesiam wrote:
| Shameless plug for Rhasspy:
| https://rhasspy.readthedocs.io/en/latest/
| monkeydust wrote:
| One of my products involves providing a lot of dense data to
| traders, overlaid with performance measures based on proprietary
| models.
|
| We are working on automatically extracting some insights for the
| user and using NLP to present them like news articles.
|
| It wouldn't take a huge lift from there to use TTS to provide
| another way for users to digest the data.
|
| It would make for a cool demo, but I wonder how sticky it would be.
| cromwellian wrote:
| NVidia pimped this at GTC21 as "state of the art TTS", which is
| why I think it's getting renewed attention, but to my ears it
| doesn't sound anywhere near WaveNet (Google), Siri, or Alexa.
| swiley wrote:
| I'm personally very suspicious of any software coming from
| NVidia at this point.
| uniqueid wrote:
| What's the plan with this? Is it to incorporate it into Firefox
| to improve its Web Speech API implementation?
| hjek wrote:
| I hope so. The examples sound so much better than Espeak.
|
| Edit: Oh, I see this project _uses_ Espeak. Interesting.
| synesthesiam wrote:
| Larynx TTS has a similar goal: https://rhasspy.github.io/larynx/
|
| It was originally based on Mozilla TTS, but I've since moved to
| exporting models to ONNX for speed.
___________________________________________________________________
(page generated 2021-04-14 23:00 UTC)