[HN Gopher] EzAudio: Enhancing Text-to-Audio Generation with Eff...
       ___________________________________________________________________
        
       EzAudio: Enhancing Text-to-Audio Generation with Efficient
       Diffusion Transformer
        
       Author : blacktechnology
       Score  : 81 points
       Date   : 2024-09-24 03:16 UTC (19 hours ago)
        
 (HTM) web link (haidog-yaqub.github.io)
 (TXT) w3m dump (haidog-yaqub.github.io)
        
       | maxglute wrote:
       | >A man talking as water splashes and gurgles and a motor engine
       | hums in the background.
       | 
       | This is the first time I've heard AI Simlish. I wonder what the
       | training data was. Seems like work is done by Johns Hopkins and
       | Tencent, but the fake AI language sounds... Indic? Are there
       | other examples of AI generating speech in... hallucinated
       | languages?
        
         | alex_duf wrote:
         | Simlish is the first thing that came to my mind too.
        
         | ben_w wrote:
         | > Are there other examples of AI generating speech in...
         | hallucinated languages?
         | 
         | Sure:
         | https://suno.com/song/0c05e4bd-5879-4e1d-9bdd-555d76569501
         | 
         | No chance that it's getting ancient Sumerian correct.
        
           | Y_Y wrote:
           | Of course, you can't just mathematically derive a language
           | that isn't in the training set.
           | 
           | Except Sanskrit, naturally.
        
         | mhuffman wrote:
         | If you go to their demo[0] and type in a prompt to ask it to
         | say something (e.g. a person says "hello") it seems to
         | hallucinate a response in a made up language ... maybe, I don't
         | speak every language.
         | 
         | [0]https://huggingface.co/spaces/OpenSound/EzAudio
        
       | tigermafia wrote:
       | Elevenlabs started rolling out a generator for very basic sound
       | effects. Using it made me wonder what the application for things
       | like this would be. If it was realtime it could be used for games
       | but then there is the lack of predictable quality control.
       | 
       | For (cinematic) sound design the quality is not nearly good
       | enough yet. For simple home-style videos dozens of (more fun)
       | options exist - foley, free sound libraries, freesound.org, going
       | out with a phone and recording stuff.
        
         | earthnail wrote:
         | Same as image generation. When it gets to a certain quality
         | level, it's much faster to describe what you want than to
         | search for it.
        
         | mhuffman wrote:
         | >Using it made me wonder what the application for things like
         | this would be.
         | 
         | Almost certainly in video shorts or high volume video content,
         | I would think.
        
       | doctorpangloss wrote:
       | The quality of the audio is giving me these vibes:
       | 
       | https://www.youtube.com/watch?v=ngZ0K3lWKRc
       | 
       | Hayao Miyazaki, 7 years ago, on AI generated motion capture.
        
         | CamperBob2 wrote:
         | (Shrug) Art, like science, advances one funeral at a time.
        
           | doctorpangloss wrote:
           | I don't know. "Hayao Miyazaki bad" is a loser idea. It is
           | insane to me that this board will waste millions of
           | characters litigating the opensourceyness of licenses, but
           | when it comes to using their own ears and forming a gut
           | opinion, which millions are capable of doing every day: no.
           | 
           | For music specifically, until someone invents a viable
           | alternative to Spotify, which is the same as inventing an
           | audience that pays more for music, I am not sure how EzAudio,
           | even if it were good - or anything that generated unlimited,
           | high quality music - would change much.
           | 
           | Good music generation will strengthen, not weaken, Spotify.
           | It will change who gets paid by Spotify, in that the sons
           | and daughters of record and banking executives can truly be
           | talentless, but it will not change who is doing the paying.
           | 
           | Anyway, isn't this the status quo right now? There are maybe
           | 10,000x more high quality music tracks than an individual
           | could ever listen to in his lifetime practicably. And it
           | _does_ make Spotify a better value every day.
        
       | owenpalmer wrote:
       | "A man yells, slams a door and then speaks."
       | 
       | These are hilarious.
        
       | zaptrem wrote:
       | _Classic_ "code and weights released at X." But when you go to
       | the repo at X there's nothing there and possibly never will be.
        
       | cchance wrote:
       | People don't realize that there is an entire job field of
       | creating these sounds today in post for videos and movies. As
       | this sort of model improves, that field's basically gone.
        
         | smrtinsert wrote:
         | Depends on quality. Birds in a stream. Are the birds accurate
         | to the movie's location? Is the fidelity good enough? Does it
         | match the movie? I don't think it's as simple as prompt and
         | done.
        
       ___________________________________________________________________
       (page generated 2024-09-24 23:01 UTC)