Posts by joemo@mastodon.social
Post #ARTKrk7OAsIRvelQSe by joemo@mastodon.social
2023-01-09T16:29:33Z
0 likes, 0 repeats
@hn50 Open-source tortoise-TTS has been able to do this for 6+ months now (maybe MSFT just forked it?), and it is itself architecturally modeled on DALL-E. The issue is not so much accuracy as how compute-intensive (GPU-intensive, really) it is to do this sort of careful mimicking with good prosody. Tortoise takes ~5 seconds on a $1,200 GPU to generate one second of speech. https://github.com/neonbjb/tortoise-tts