[HN Gopher] EzAudio: Enhancing Text-to-Audio Generation with Eff...
___________________________________________________________________
EzAudio: Enhancing Text-to-Audio Generation with Efficient
Diffusion Transformer
Author : blacktechnology
Score : 81 points
Date : 2024-09-24 03:16 UTC (19 hours ago)
(HTM) web link (haidog-yaqub.github.io)
(TXT) w3m dump (haidog-yaqub.github.io)
| maxglute wrote:
| >A man talking as water splashes and gurgles and a motor engine
| hums in the background.
|
| This is the first time I've heard AI Simlish. I wonder what the
| training data was. Seems like the work was done by Johns Hopkins and
| Tencent, but the fake AI language sounds... Indic? Are there
| other examples of AI generating speech in... hallucinated
| languages?
| alex_duf wrote:
| Simlish is the first thing that came to my mind too.
| ben_w wrote:
| > Are there other examples of AI generating speech in...
| hallucinated languages?
|
| Sure:
| https://suno.com/song/0c05e4bd-5879-4e1d-9bdd-555d76569501
|
| No chance that it's getting ancient Sumerian correct.
| Y_Y wrote:
| Of course, you can't just mathematically derive a language
| that isn't in the training set.
|
| Except Sanskrit, naturally.
| mhuffman wrote:
| If you go to their demo[0] and type in a prompt to ask it to
| say something (e.g. a person says "hello") it seems to
| hallucinate a response in a made-up language ... maybe, I don't
| speak every language.
|
| [0]https://huggingface.co/spaces/OpenSound/EzAudio
| tigermafia wrote:
| ElevenLabs started rolling out a generator for very basic sound
| effects. Using it made me wonder what the application for things
| like this would be. If it was realtime it could be used for games
| but then there is the lack of predictable quality control.
|
| For (cinematic) sound design the quality is not nearly good enough
| yet. For simple home-style videos dozens of (more fun) options
| exist - foley, free sound libraries, freesound.org, going out
| with a phone and recording stuff.
| earthnail wrote:
| Same as image generation. When it gets to a certain quality
| level, it's much faster to describe what you want than to
| search for it.
| mhuffman wrote:
| >Using it made me wonder what the application for things like
| this would be.
|
| Almost certainly in video shorts or high-volume video content,
| I would think.
| doctorpangloss wrote:
| The quality of the audio is giving me these vibes:
|
| https://www.youtube.com/watch?v=ngZ0K3lWKRc
|
| Hayao Miyazaki, 7 years ago, on AI generated motion capture.
| CamperBob2 wrote:
| (Shrug) Art, like science, advances one funeral at a time.
| doctorpangloss wrote:
| I don't know. "Hayao Miyazaki bad" is a loser idea. It is
| insane to me that this board will waste millions of
| characters litigating the opensourceyness of licenses, but
| when it comes to using their own ears and forming a gut opinion,
| something millions are capable of doing every day: no.
|
| For music specifically, until someone invents a viable
| alternative to Spotify, which is the same as inventing an
| audience that pays more for music, I am not sure how EzAudio,
| even if it were good - or anything that generated unlimited,
| high-quality music - would change much.
|
| Good music generation will strengthen, not weaken, Spotify.
| It will change who gets paid by Spotify, in that the sons
| and daughters of record and banking executives can truly be
| talentless, but it will not change who is doing the paying.
|
| Anyway, isn't this the status quo right now? There are maybe
| 10,000x more high-quality music tracks than an individual
| could ever practicably listen to in his lifetime. And it
| _does_ make Spotify a better value every day.
| owenpalmer wrote:
| "A man yells, slams a door and then speaks."
|
| These are hilarious.
| zaptrem wrote:
| _Classic_ "code and weights released at X." But when you go to
| the repo at X there's nothing there and possibly never will be.
| cchance wrote:
| People don't realize that there's an entire job field of creating
| these sounds in post for videos and movies today. As this sort of
| model improves, that field's basically gone.
| smrtinsert wrote:
| Depends on quality. Birds in a stream. Are the birds accurate
| to the movie's location? Is the fidelity good enough? Does it
| match the movie? I don't think it's as simple as prompt and
| done.
___________________________________________________________________
(page generated 2024-09-24 23:01 UTC)