[HN Gopher] Show HN: Voice-Pro - AI Voice Cloning
___________________________________________________________________
Show HN: Voice-Pro - AI Voice Cloning
Imagine creating a podcast where Mark Zuckerberg interviews Elon
Musk - using their actual voices? What sounds like science fiction
is now reality. Voice-Pro is an open-source Gradio WebUI that
breaks the boundaries of audio manipulation. Powered by
cutting-edge Whisper engines, this tool turns voice replication
into child's play.

Key Features:
- Zero-shot Voice Cloning
- Voice Changer with 50+ Celebrity Voices
- YouTube Audio Downloading
- Vocal Isolation
- Multi-Language Text-to-Speech (Edge-TTS, F5-TTS)
- Multi-Language Translation
- Powered by Whisper Engines (Whisper, Faster-Whisper,
  Whisper-Timestamped)

Video Demos:
1. Voice-Pro Usage Tutorial: https://youtu.be/z8g8LMhoh_o
2. Voice Cloning Celebrity Podcast Demo: https://youtu.be/Wfo7vQCD4no
3. Full Demo Playlist:
   https://www.youtube.com/playlist?list=PLwx5dnMDVC9Y7dAjm9r26...

Whether you're a content creator, developer, or audio experiment
enthusiast, Voice-Pro provides a user-friendly interface to push
the boundaries of audio manipulation.

GitHub: https://github.com/abus-aikorea/voice-pro
Author : abuskorea
Score : 222 points
Date : 2024-11-28 02:37 UTC (20 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| jncfhnb wrote:
| Is there speech to speech? I have been hoping for a model I can
| use to do voice acting with inflection
| amrrs wrote:
| Do you mean Inflection's Pi?
| bryanrasmussen wrote:
| I think they mean speech "in the style of" the same as
| repaint this picture in the style of Van Gogh, so they will
| do the audio and put the correct inflection on things but
| then rerender it with the voice of Katharine Hepburn for
| example.
|
| on edit: example of course showing the difficulty as so much
| of Hepburn was her inflection.
| jncfhnb wrote:
| More so I wish to voice act a line and then have the bot
| mimic it with a different voice but with the same
| contextual voicing.
|
| "I'm going to kill you" could be delivered (laughing
| jokingly / seething with rage / ominously and creepily).
| I'd like a bot that can mimic the delivery in a different
| voice.
| muglug wrote:
| These tools make it very easy to scam vulnerable people, and have
| pretty limited use otherwise.
| tsujamin wrote:
| Bulldozing grandma is just the cost of technological progress
| /s
| uh_uh wrote:
| This tech is going to be ubiquitous, it's just too easy to
| distribute it. Grandma better start adapting now.
| thejazzman wrote:
| Because people make it so, not because the natural order of
| the world gets us there
|
| For some reason because we can validates that we should.
| Any jackass has the power of a research team of phds. It's
| kinda weird.
| uh_uh wrote:
| Demanding responsible behaviour from everybody is not
| going to work. Some people don't care about negative
| externalities that much and it's enough if only a few of
| them decide not to play ball. So either grandma needs to
| adapt which will upset some people or distributing the
| tech should be regulated/prosecuted which will upset
| another group of people.
| rockemsockem wrote:
| I think either way grandma needs to adapt though since
| Russian scammers and trolls are still going to run scams
| with fake voices.
| 123yawaworht456 wrote:
| how very politically correct of you to pretend it's
| Russians who scam your grandmas
| chefandy wrote:
| Indeed. Humans ascended to dominance because we can
| cooperate. This every-man-for-themself idea is an
| aberration, not the natural order as so many claim. It's
| rather astounding to think otherwise considering the
| logistics of how we're communicating right now.
| uh_uh wrote:
| Cooperation works if the potential damage caused by a
| rogue actor is sufficiently low. Otherwise, it's too easy
| to sabotage things. This is why we don't want random
| rogue states to have nukes. AI will give so much leverage
| to rogue actors that it will significantly shift the game
| theory in favour of not cooperating.
| chefandy wrote:
| > Cooperation works if the potential damage caused by a
| rogue actor is sufficiently low. Otherwise, it's too easy
| to sabotage things. This is why we don't want random
| rogue states to have nukes. AI will give so much leverage
| to rogue actors that it will significantly shift the game
| theory in favour of not cooperating.
|
| Governments successfully collectively controlling
| dangerous things so they don't fall into the hands of
| rogue bad actors fundamentally opposes the extreme
| individualist every-man-for-himself perspective in every
| conceivable way. It's the absolute opposite of "it's
| everybody's responsibility to protect themselves because
| everybody else is only going to look out for themselves."
|
| And when individuals have that much leverage, collective
| action is the only conceivable way to oppose it. Some of
| those things might be cultural, like mores, some might be
| laws, some might be more martial. I don't see how extreme
| individualism even theoretically could be more powerful.
| uh_uh wrote:
| Are you suggesting government action against putting up
| code like this to GitHub? It's ok if you are, but I want
| to put into more concrete terms what we're talking about.
| chefandy wrote:
| You're the one that made the direct government control
| analogy. I mentioned a number of non-individualistic
| mechanisms in my previous comment. I'm not going to keep
| engaging in a fishing expedition of things to argue about
| -- I think it's pretty clear what aspect of your stance I
| disagree with-- and am going to leave it at that.
| chefandy wrote:
| You can't adapt around brain age making it more difficult
| to distinguish truth from lies.
| casey2 wrote:
| Yeah, I don't really get the hullabaloo, if granny doesn't
| have the mental fortitude to keep up with the times she
| shouldn't be managing her own money. I guess better her
| son/daughter than a scammer but both are better than
| letting money rot. Put granny on food stamps and pay $1 for
| her rent-controlled housing and be done with it.
| zelphirkalt wrote:
| Are we forgetting that there are many elderly people
| without living descendants?
| weq wrote:
| This tech is not only great for bulldozing grandma, it's great
| at stealing content from other creators and rebranding it as
| your own. Based on the GitHub, it kind of seems like that's
| exactly what's being advertised as the use case. Steal content
| from BBC, cut it up and pull the noise out/vocals/revoice the
| content so the algorithm can't detect the plagiarism easily.
| The image detection is nowhere near the audio detection for
| copyright strikes.
|
| There is a massive problem with this on youtube. Pretty much
| every category on youtube now has a host of these bots
| trolling content and playing the youtube strike system like a
| banjo. There are channels dedicated to showing you how to
| set up these content mills. This tool can make you good money.
| sfjailbird wrote:
| First generative AI destroyed Google search, and now it has
| pretty much destroyed YouTube. Social platforms, including
| this one, are probably goners too. We live in interesting
| times.
| Larrikin wrote:
| I'm absolutely using celebrity voices for my Home Assistant
| voice. Amazon has spent the last couple years removing the
| voices for Alexa that people had paid for.
| nickthegreek wrote:
| I'd love some more info on using custom voices in HA. I have
| an esp32-s3-box that I am setting up over the holiday to do
| voice with HA.
| pmarreck wrote:
| If you have a how-to, I'd love to work on one for my home. I
| feel like this is all right around the corner...
| chefandy wrote:
| Gen AI space to everyone else: _"Your computer scientists were
| so preoccupied with whether or not they should, they didn't
| stop to think if they could just do it anyway"_
| chefandy wrote:
| To be fair, they've got pretty serious potential for letting
| tech companies get paid for a seasoned voice actor's unique
| delivery, tone, inflection, etc rather than the voice actor
| themselves.
| whaaaaat wrote:
| > they've got pretty serious potential for letting tech
| companies get paid for a seasoned voice actor's unique
| delivery, tone, inflection, etc rather than the voice actor
| themselves.
|
| I think you mean "steal the labor of an actor"?
| chefandy wrote:
| Sure, and people that already agree with you will feel good
| reading it, but other people who don't agree see it as an
| attack. It's pretty much impossible to slip a new idea into
| someone's mind if your approach made them slam the door
| before even considering it. So what's the benefit of saying
| it like that?
| gmueckl wrote:
| It calls attention to the ethical implications of using a
| part of someone else's personal identity without their
| direct involvement.
| MrDrMcCoy wrote:
| Indirect involvement can still be ok within the confines
| of a license agreement for using the actor's voice.
| gmueckl wrote:
| But this requires a legal framework that mandates such
| licenses and effective enforcement / prosecution of
| violations.
|
| As far as I know, most countries are lagging behind when
| it comes to updating legislation to set binding rules
| around that.
| ideashower wrote:
| > Indirect involvement can still be ok within the
| confines of a license agreement for using the actor's
| voice.
|
| This assumes existence of a license agreement or
| likeness/right of publicity law that prevents
| unauthorized use. But this is far from the case.
|
| Companies have shown willingness to use actors' voices to
| create synthetic voices without permission, compensation,
| or regard for their livelihoods. [1][2][3]
|
| [1] https://animehunch.com/popular-japanese-voice-actors-
| band-to...
|
| [2] https://www.theatlantic.com/technology/archive/2024/0
| 5/eleve...
|
| [3] https://www.yahoo.com/entertainment/morgan-freeman-
| calls-una...
| MrDrMcCoy wrote:
| Of course we need laws in place to require such
| licensing. The fact that people are having their voice
| stolen now does not mean that there should never be a
| case where a voice can legally be cloned and used by a
| third party.
| chefandy wrote:
| So does what I said. Someone taking pay for someone
| else's work is pretty unambiguously shitty. But when you
| call taking anything that isn't a physical item theft, a
| large percentage of people-- especially in the 'data
| wants to be free' crowd-- will roll their eyes, think
| "that's ridiculous... they aren't stealing anything. That
| voice actor still has their voice" and just stop
| listening. The only people that feel the impact of
| statements like that are people that already agree. It
| turns it from an intellectual discussion to a
| reinforcement of existing tribes. Divisive language works
| for rallying those who already agree around a specific
| cause but it's not even useless-- it's
| counterproductive-- for changing people's minds. When's
| the last time someone you disagreed with changed your
| mind by being more aggressive towards your stance, and
| more terse in their portrayal of the dichotomy? If you
| can even think of one time that it has, you're in the
| extreme minority.
| ranger_danger wrote:
| How many victims will it take for lawmakers to do something
| about this?
| tiborsaas wrote:
| It's already illegal to scam somebody. While it's always
| positive to protect people more, what can be done here? Any
| alternative I can imagine is massively oppressive of the
| current state of the software industry.
|
| You can regulate large companies, you can regulate published
| software sold for profit, but it's impossible to regulate
| free and open source tools.
|
| You essentially have to regulate access to computing power if
| you want to prevent bad actors doing bad things using these
| sort of tools.
| bryanrasmussen wrote:
| >You can regulate large companies, you can regulate
| published software sold for profit, but it's impossible to
| regulate free and open source tools.
|
| Regulation is putting legal limitations on things, if it is
| impossible to regulate free and open source tools then it
| would be impossible to regulate murder and lots of other
| things, but it turns out it isn't impossible, sure - murder
| happens - but people get caught for it and punished.
|
| Sorry, but this argument is much like the early internet
| triumphalism - back when people said it was impossible to
| regulate. Turns out lots of countries now regulate it.
| tiborsaas wrote:
| It depends on what you do with the tool. Going with your
| murder analogy, if there's a stabbing epidemic what do
| you do? 1) Ban knives 2) invest in public safety 3)
| investigate the root causes and improve on them?
|
| I'm also not sure what's so regulated about the internet
| besides net neutrality in certain countries. Of course
| the government can put limits on the network, like
| banning services, but it's easy since they are rather
| easy to target. With content traveling on the network
| it's much harder to say if it's legit or not.
|
| > lots of countries
|
| What about those countries that don't regulate it and
| people will keep pumping out better, leaner and faster
| models from there? Spreading software is trivial, all you
| achieve is the public won't be aware of what's possible.
|
| The more I think about it if anything should be regulated
| that's a requirement to provide third party (probably
| government backed) ID verification system so it would be
| possible for my mom to know it's me calling here.
| Basically kill caller ID spoofing.
| bryanrasmussen wrote:
| >I'm also not sure what's so regulated about the internet
| besides net neutrality in certain countries.
|
| generally things are regulated on the internet that were
| not going to ever be regulated because it was on the
| internet - example - sales taxes, perhaps you are old
| enough to remember when sales tax collection would not
| ever be enforceable on internet transactions - those
| idiot lawyers don't know, it's on the internet, the sale
| didn't happen in that country or in that state no sales
| taxes will never happen on the internet hah hah. It's
| unenforceable, it is logically undoable, there are so
| many edge cases - ugh, the law just does not understand
| technology!
|
| oops, sales taxes now on internet purchases.
|
| GDPR is another example of things that are regulated on
| the internet that basically most of HN years before it
| happened was completely convinced would be impossible!!
|
| If this thing becomes too big a problem for the societies
| regulations will be done, with varying levels of
| effectiveness I'm sure.
|
| And then in twenty years time we will be saying what, you
| can't regulate genital eating viral synths because a guy
| can make those in his garage and spread them via nasal
| spray, this technology is unstoppable and unregulatable,
| not like some open source deepfake library!!
| bavell wrote:
| It's always amusing listening to techies' musings on
| law... lots of misunderstandings, I suspect due to the
| helpful but inaccurate "code but for humans" analogy.
|
| Obligatory/relevant xkcd: https://xkcd.com/538/
| vunderba wrote:
| Lots of countries impose exactly what specific
| regulations with respect to open source tooling?
|
| The closest thing I can think of is maybe the regulation
| of DRM ripping tools, but they're still out there in the
| wild and determined actors can easily get ahold of them.
| So I'm not at all confident that regulation will have any
| measurable meaningful effect.
| notTooFarGone wrote:
| The fable of the "determined actor".
|
| The "determined actor" can get bombs, tanks, fissile
| material. There no one says "WHELP they can get it anyway
| so why bother regulating it LMAO" - somehow this is
| different in anything not physical?
| bryanrasmussen wrote:
| >Lots of countries impose exactly what specific
| regulations with respect to open source tooling?
|
| that something is not currently regulated does not mean
| it can never be regulated, further it does not seem
| likely that they would regulate open source tooling but
| rather some uses and if the open source tooling allowed
| those uses then what would happen is -
|
| github and other big sources of code would refuse to host
| it as containing not legally allowed things, so for
| example if they regulated it in the U.S then Github stops
| allowing it, and everyone moves to some European git
| provider.
|
| At the same time bigger companies will stop using the
| library because liability.
|
| Europe then regulates and it can't be in European git
| repos.. at some point many devs abandon particular
| library because not worth it (I get it this is actually
| for the love of doing the illegal thing so they won't
| abandon but despite the power of love most things in this
| world do not actually run on it)
|
| Can determined actors get ahold of them and do the things
| with them the law forbids them to do, sure! That's called
| crime. Then law enforcement catches determined actors and
| puts them in prison, that's called the real world!
|
| Will criminals stop - nope because there is benefit to
| what they're doing. Maybe some will stop because they
| will think screw it I can make more money working for the
| man. And some will be caught sooner or later. And maybe
| in version two of the regulations there will be AI
| enhancements - this crime was committed with AI allowing
| us to take all your belongings and add 10 years to your
| sentence and deprive you of the right to ever own a
| computing device again...etc. etc. And some people will
| stop and others will get more violent and aggressive
| about their criminal business.
|
| I don't know necessarily what measurable meaningful
| effect means, for some people it will be measurable and
| meaningful, for some not, for some of society the
| regulation would in many ways be worse than what it is
| fighting against. I'm not saying regulation will solve
| problems 100%, I'm just saying this whole they can't
| regulate us thing because "TECH!!!" that developers seem
| to regularly go through with anything they set their eye
| on is a pipe dream.
| mnau wrote:
| > impossible to regulate free and open source tools
|
| BS. Can you imagine a legislation? Yes, thus it can be
| done.
|
| As an early example, the CRA (Cyber Resilience Act) already
| contains provisions about open source stewards and
| security. So far they are legal persons, aka foundations,
| but could easily relate to any contributor or maintainer.
| tiborsaas wrote:
| As I made the comment, I can't really imagine anything
| that's not so absurd that has a more than zero chance of
| happening.
|
| Seriously, what can anybody do about random hacker Joe
| publishing under the name XoX? Even if they burn GitHub
| and friends to the ground, if something is useful it will
| be really really hard to get rid of it. Remember youtube-
| dl? It's now https://github.com/yt-dlp/yt-dlp
|
| If they make anything that cripples open source
| development they will feel it quite soon when they
| realize that it also cripples their world as much of the
| tooling and infrastructure also depends on it.
|
| Killing open source is like killing the internet itself.
| russell_h wrote:
| Serious question: what do you think lawmakers should do?
| ideashower wrote:
| For people's image being used without their permission:
| strengthen U.S. right of publicity laws with private right
| of action, enabling people to sue for unauthorized use of
| their voice or likeness.
| ranger_danger wrote:
| Digital signatures as part of audio/video that can't be
| easily modified or faked which can trace the origin of a
| piece of media. Some camera manufacturers are already
| working on it.
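
A minimal sketch of the sign-then-verify idea described above, assuming a hypothetical per-device key (`DEVICE_KEY`) and made-up helper names (`sign_media`, `verify_media`). A keyed hash (HMAC) from the Python standard library stands in here for the public-key signature a real camera would more plausibly use, so this only illustrates tamper detection, not how any manufacturer actually does it:

```python
import hashlib
import hmac

# Hypothetical device key. A real scheme would more likely keep a private
# signing key on the device and publish the matching public key.
DEVICE_KEY = b"example-device-key"


def sign_media(media: bytes, key: bytes = DEVICE_KEY) -> str:
    """Return a hex tag that binds the key holder to these exact bytes."""
    return hmac.new(key, media, hashlib.sha256).hexdigest()


def verify_media(media: bytes, tag: str, key: bytes = DEVICE_KEY) -> bool:
    """Recompute the tag and compare it in constant time."""
    expected = hmac.new(key, media, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag)


original = b"\x00\x01fake-audio-frames"  # placeholder for recorded audio
tag = sign_media(original)

print(verify_media(original, tag))            # True: recording untouched
print(verify_media(original + b"\x00", tag))  # False: recording was edited
```

Any change to the bytes invalidates the tag, which is the property the comment relies on. The sketch says nothing about media that was never signed in the first place.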
| CamperBob2 wrote:
| How do you propose to keep watermark-free models out of
| the hands of evildoers? I can't build my own digital
| camera or laser printer, but I can certainly write
| software.
| 123yawaworht456 wrote:
| how many victims did it take for lawmakers to do something
| about Photoshop/GIMP/etc?
| rockemsockem wrote:
| Quit being a doomer or keep it to yourself. This reminds me of
| the sound boards that were popular in the early 2000s except
| way more versatile. Some things are just good for people to
| have fun and THAT'S OKAY.
| whaaaaat wrote:
| People are allowed to recognize the realistic negative
| outcomes of technology, especially on a forum that frequently
| discusses the tradeoffs of modern, cutting edge technologies.
| rockemsockem wrote:
| So many AI posts are overrun with this kind of complaining
| from folks with limited imaginations.
|
| On a forum that frequently discusses technology with
| enthusiasm you'd think there'd be more enthusiasm and more
| constructive criticism instead of blanket write-offs.
| Mordisquitos wrote:
| I would argue that being able to see the drawbacks and
| potential negative externalities of a new technology is
| not a sign of a "limited imagination", but quite the
| contrary. An actual display of a limited imagination is
| the inability to imagine how a new technology can (and
| will) be abused in society by bad actors.
| Ukv wrote:
| Developing some insight on its negative potential could
| demonstrate imagination, but the claim that it could be
| used to scam people is pretty much just rote repetition
| by now - an obligatory point made in every article and
| under every post about this tech (and not something that
| I think actually works out in practice the way most
| imagine it, since cold-call scam operations that dial
| numbers at a huge scale expecting most not to pick up
| can't really find a voice clip prior to each automated
| call).
|
| As for positive applications, some I see:
|
| * Allowing those with speech impairments to communicate
| using their natural voice again
|
| * Allowing those uncomfortable with their natural voice,
| such as transgender people, to communicate closer to how
| they wish to be perceived
|
| * Translation of a user's voice, maintaining emotion and
| intonation, for natural cross-language communication on
| calls
|
| * Professional-quality audio from cheap microphone setups
| (for video tutorials, indie games, etc.)
|
| * Doing character voices for a D&D session, audiobook,
| etc.
|
| * Customization of voice assistants, such as to use a
| native accent/dialect
|
| * Movies, podcasts, audiobooks, news broadcasts, etc.
| made available in a huge range of languages
|
| * If integrated with something like airpods, babelfish-
| like automatic isolation and translation of any speech
| around you
|
| * Privacy from being able to communicate online or record
| videos without revealing your real voice, which I think
| is why many (myself included) currently resort to text-
| only
|
| * New forms of interactive media - customised movies,
| audio dramas where the listener plays a role, videogame
| NPCs that react with more than just prerecorded lines,
| etc.
|
| * And of course: memes, satire, and parody
|
| I appreciate HN's general view on technologies like
| encrypted messaging - not falling into "we need to ban
| this now because pedophiles could use it" hysteria. But
| for anything involving machine learning, I'm concerned
| how often the hacker mentality seems to go out the window
| and we instead get people advocating for it to be made
| illegal to host the code, for instance.
| Mordisquitos wrote:
| Of the 11 positive applications that you listed, only the
| 1st, 3rd, 11th and arguably the 4th would benefit from
| voice _cloning_, which is what's being promoted here.
| The rest are solved merely by (improved) TTS and do not
| require the cloning of any actual human voice.
|
| Also, notice how the legitimate use-cases 1, 3 and 4
| imply the user consenting to clone _their own_ voice,
| which is fine. However, the only use-case which would
| require cloning a specific human voice belonging to a
| third party, use-case 11, is _"memes, satire, and
| parody"_... and not much imagination is needed to see how
| steep and buttery that Teflon slippery slope is.
| Ukv wrote:
| > Of the 11 positive applications that you listed, only
| the 1st, 3rd, 11th and arguably the 4th would benefit
| from voice cloning, which is what's being promoted here.
| The rest are solved merely by (improved) TTS and do not
| require the cloning of any actual human voice.
|
| 2, 5, 6, 9: It's true that in theory all you need is some
| way to capture the characteristics of a desired voice,
| but voice-cloning methods are the way to do this
| currently. If you want a voice assistant with a native
| accent, you fine-tune on the voice of a native speaker -
| opposed to turning a bunch of dials manually.
|
| 7, 8, 10: Here I think there _is_ benefit specifically
| from sounding like a particular person. The dynamically
| generated lines of movie characters/videogame NPCs
| should be consistent with the actor's pre-recorded lines,
| for instance, and hearing someone in their own voice is
| more natural for communication and makes conversation
| easier to follow.
|
| Pedantically, what's promoted here is a tool which
| features voice cloning prominently but not exclusively -
| other workflows demonstrated (like generating subtitles)
| seem mostly unobjectionable.
|
| > Also, notice how the legitimate use-cases 1, 3 and 4
| imply the user consenting to clone their own voice, which
| is fine
|
| I think all, outside of potentially 8 and 11, could be
| done with full consent of the voice being cloned - an
| agreement with the movie actor to use their voice for
| dubbing to other languages, for example. That's already a
| significant number of use-cases for this tool.
|
| > use-case 11, is "memes, satire, and parody"... and not
| much imagination is needed to see how steep and buttery
| that Teflon slippery slope is.
|
| IMO prohibition around satire/parody would be the
| slippery slope, particularly with the potential for
| selective enforcement.
| casey2 wrote:
| I like tools like these cause they make zero trust default even
| more obvious, and their "pretty limited use" is saving people
| hours of work.
| anonzzzies wrote:
| They are pretty good for leaving messages for my blind friend.
| I generally find calling / voice texts a waste of time (I type
| and read far faster than I talk or listen, not to mention the
| ability to reread etc), but my blind friend prefers getting
| voice messages when on his phone and this works for us. I type
| and send and when he comes back with something, Whisper makes
| it into text for me.
| mistercow wrote:
| It's weird to me that people look at a technology and then
| assume from their reckoning that they know all the uses for
| that technology immediately. Most technological progress
| happens because someone notices a creative use for something
| that already exists which nobody else has noticed.
| yawnxyz wrote:
| > When Windows Defender mistakenly recognizes a [virus] as a
| Trojan, this is often called a 'False Positive'. To solve this
| problem, you can go through the following steps:
| kfarr wrote:
| Yeah I also noticed the install instructions are to run this batch
| file that gets administrator access and starts downloading
| things...
| gruez wrote:
| It's not any worse than all the projects on github with
| "easy" install instructions of "curl ... | sudo sh". Heck,
| even an innocent "sudo make install" command can easily
| contain a malicious payload.
| chefandy wrote:
| Yeah it's not great but it's definitely not unusual. And
| windows reputation-based execution blocking does have false
| positives. I work for a company that has some very very
| popular products and some that only see a few dozen
| downloads per week, and despite being signed, it still
| takes a while for new versions to build enough rep to not
| trigger the block.
| tonyedgecombe wrote:
| It's not really the sort of tool that should require admin
| rights though.
| elif wrote:
| Yea not to mention the entire homebrew ecosystem is built
| around trusting random people's shell scripts.
|
| MacOS devs blindly trust it like it's the app store.
| pmarreck wrote:
| A simple `brew cat <packagename>` (possibly piping to bat
| if you want syntax highlighting) should spit out the ruby
| install formula for that package, for inspection.
| nozzlegear wrote:
| The assumption is that maintainers at Homebrew are
| reviewing each pull request before being merged, though
| it's obviously not a full security audit. Homebrew will
| also use macOS's sandboxing if a formula needs to be
| built during installation, which will limit file access
| to specific Homebrew directories and restrict network
| access.
|
| But I agree that everyone should review the Homebrew
| install script for any package they're installing if
| they're concerned about security.
| safeimp wrote:
| Project looks interesting. Are there short term plans to support
| MacOS?
|
| If not, any recommendations for alternative projects?
| ilrwbwrkhv wrote:
| There are a bunch of yc start-ups who are building new models and
| stuff in the space. I fear they are going to get decimated really
| soon as the quality of local llamas keeps improving.
| shannifin wrote:
| I don't have much real use for celebrity voices (other than fun
| experimentation), but I'd love to be able to clone my own voice
| and character voices for the purposes of creating audiobooks /
| audioplays without having to pay monthly fees with monthly usage
| limits. So I'm excited by this sort of project!
|
| P.S. Are there any tools for synthetic voice creation? Maybe
| melding two or more voices together, or just exploring latent
| space? Would be fun for character creation to create completely
| new voices.
| dyauspitr wrote:
| I've used tortoise tts before and trained it on my voice and a
| mix of voices. It's not perfect but still impressive.
| thelittleone wrote:
| Have you tried eleven labs? I used that. Had to record 3 hours
| of training audio reading books and news articles. But the
| result was really good.
| shannifin wrote:
| They're great! They just cost too much for how much output I
| want.
| stavros wrote:
| How much did the training cost?
| vunderba wrote:
| I'd be interested as well. This is where I imagine the space is
| going - particularly as the potential for litigation increases
| around cloning.
|
| Game studios will spin up a bunch of unique virtual voices for
| all the dialogue of extras. It'll probably be longer before we
| see replacements of main characters though. There's been some
| research in speech-to-speech transference as well - this means
| that company employee A records the character B's line with the
| appropriate emotional nuance (angry, sad, etc.) and the
| emotional aspect is copied on top of the generated TTS.
| jerpint wrote:
| StyleTTSv2 is pretty good and open source, you can easily
| traverse its latent space for voice
| joshdavham wrote:
| Looks cool! Also, is there a reason you went with a Web-UI
| instead of making a native desktop app?
| harryf wrote:
| Have you considered supporting whisper-at -
| https://github.com/YuanGongND/whisper-at ? Being able to identify
| sounds on a timeline can be useful e.g. politicians speech and
| how the audience is reacting to it (e.g. clapping, applauding)
| newusertoday wrote:
| are there any TTS models which are decent but can work on devices
| without GPU and have relatively low RAM(4GB)
| grahamgooch wrote:
| Great stuff well done. What is your latency for real time Audio?
| whaaaaat wrote:
| > Imagine creating a podcast where Mark Zuckerberg interviews
| Elon Musk - using their actual voices?
|
| I'm imagining it. It _sucks_ to imagine.
|
| I'm imagining it being used to scam people. I'm imagining it to
| leech off of performers who have worked very hard to build a
| recognizable voice (and it _is_ a lot of work to speak like a
| performer). I'm imagining how this will be used in revenge porn.
| I'm imagining how this will be used to circumvent access to voice
| controlled things.
|
| This is bad. You should feel bad.
|
| And I know you are thinking, "Wait, but I worked really hard on
| this!" Sorry, I appreciate that it might be technically
| impressive, but you've basically come out with "we've invented a
| device that mixes bleach and ammonia automatically in your
| bedroom! It's so efficient at mixing those two, we can fill a
| space with chlorine gas in under 10 seconds! Imagine a world
| where every bedroom could become a toxic site with only the push
| of a button."
|
| That this is posted here, proudly, is quite frankly astoundingly
| embarrassing for you.
| farzd wrote:
| You do realise this is not the first AI release to clone
| voices?
| yyuugg wrote:
| I don't think the parent said they were. "I'm the Nth person
| to do a shitty thing!" doesn't absolve them of doing a shitty
| thing. Just because there are other thieves doesn't make
| theft ok.
| cess11 wrote:
| Sure, and PoisonIvy wasn't the first RAT. So what? Does it
| get more ethical to assist fraudsters and so on once more
| people are doing it?
| Ukv wrote:
| I'd claim the way most people imagine it being used for
| scamming, cold-calls impersonating someone the victim knows,
| doesn't really end up working out in practice because scam
| operations dial numbers at a huge scale expecting most not to
| pick up a "scam likely" call (or be away, or a dead number,
| etc.). Having to find a voice clip prior to each unanswered
| call would tank the quantity they're able to make.
|
| For spear-phishing (impersonate CEO, tell assistant to transfer
| money) it's more feasible, but I hope it forces acceptance that
| "somebody sounds like X over the phone" is not and has never
| been a good verification method - people have been falling for
| scams like those fake ransom calls[0] for decades.
|
| Not that there aren't potential harms, but I think they're
| outweighed by positive applications. Those uncomfortable with
| their natural voice, such as transgender people, can
| communicate closer to how they wish to be perceived - or
| someone whose voice has been impaired (whether just a temporary
| cold or a permanent disorder/illness/accident) can use it from
| previous recordings. Privacy benefits from being able to
| communicate online or record videos without revealing your real
| voice, which I think is why many (myself included) currently
| resort to text-only. There's huge potential in the translation
| and vocal isolation aspects aiding communication - feels to me
| as though we're heading towards creating our own babelfish.
| There's also a bunch of creative applications - doing character
| voices for a D&D session or audiobook, memes/satire, and likely
| new forms of interactive media (customised movies, audio dramas
| where the listener plays a role, videogame NPCs that react with
| more than just prerecorded lines, etc.)
|
| [0]: https://www.fbi.gov/news/stories/virtual-kidnapping
| yyuugg wrote:
| I think most people in America are more wary of foreign
| sounding voices. If the person on the other end sounds like a
| good ol boy, they get more trust.
|
| Scammers don't have to sound like a specific person to be
| helped by software like this.
| Ukv wrote:
| That aspect feels to me like "I used to racially profile
| people on the street to judge risk, but winter clothing now
| obscures skin color at a distance". There are heuristics
| that give non-zero information but are harmful to use, with
| the cost borne by some marginalized group, and I don't see
| it as a negative for use of such heuristics to be made less
| feasible. Reducing people's use of accent as a factor would
| be a positive for the ~1.5B Indians that aren't scammers,
| for instance.
|
| I think there's also an autonomy argument to be made, if
| the alternative is to the effect of ensuring that people
| cannot use tools to hide their accent (and particularly if, as
| above, the intent is so they can be discriminated against
| based on it). Even though it isn't something we've really
| been able to do before, I think it's generally a person's
| own right to modify their voice.
| aboardRat4 wrote:
| Without Linux support it is going to have a very limited
| audience.
| okwhateverdude wrote:
| There is nothing in here that precludes you from running this
| on any OS that supports python + CUDA. They use miniconda for
| installation of python and python packages, but this could just
| as easily be a venv + system CUDA install or even better: a
| container. This is only one tiny Dockerfile away from running
| anywhere.
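|
| A minimal sketch of the venv route mentioned above, using only
| Python's standard library. The directory name voice-env and the
| helper create_env are hypothetical and not part of the Voice-Pro
| repository; installing the project's packages and CUDA is left out.

```python
import sys
import tempfile
import venv
from pathlib import Path


def create_env(env_dir: Path) -> Path:
    """Create a virtual environment and return the path to its interpreter."""
    # with_pip=False keeps this sketch fast and offline; a real setup would
    # pass with_pip=True and then install the project's requirements.
    venv.create(env_dir, with_pip=False)
    bin_dir = "Scripts" if sys.platform == "win32" else "bin"
    exe = "python.exe" if sys.platform == "win32" else "python"
    return env_dir / bin_dir / exe


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        interpreter = create_env(Path(tmp) / "voice-env")
        print(interpreter)
```

| The same isolation could come from miniconda or a container; the
| venv module is used here only because it ships with Python.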
| vunderba wrote:
| I do think that voice cloning for personal usage has actual
| genuine uses - in fact there was a relatively interesting news
| article about a person who was irrevocably losing their voice who
| had their vocal pattern cloned.
|
| https://www.voanews.com/a/illness-took-away-her-voice-ai-cre...
|
| That being said, it does seem a bit bizarre that the repo's home
| page is proudly trumpeting the ability to co-opt other people's
| identities without their permission (and yes your unique vocal
| pattern is definitely part of your identity - I mean it's used in
| some forms of biometric data). They're doing the project a bit of
| a disservice.
| onetokeoverthe wrote:
| _proudly trumpeting the ability to co-opt other people's
| identities without their permission_
|
| EXACTLY. Clone the wrong person's voice and it's game over.
| satvikpendem wrote:
| It's useful for some things, like satire. Presidents Play is a
| good series on YouTube where it uses US presidents' cloned
| voices for comedic satire.
| bbarnett wrote:
| A gun is useful to shoot someone, what has that to do with it
| being right or wrong?
| satvikpendem wrote:
| Not sure you picked the most cogent example because lots of
| people will debate you on that topic...
| VPenkov wrote:
| It does have actual genuine uses. I'm in the process of
| recording a series of tutorials for my peers but I'd like them
| to hear things in my voice so it doesn't sound like I have
| offloaded the work to someone else.
|
| I don't know if this helps or harms the credibility but I can't
| really talk more than an hour without seriously straining my
| voice. So cloning it sounds like a great use-case for someone
| with a similar problem.
|
| Looking forward to trying this.
| vunderba wrote:
| I like this idea. I've been playing with the idea of having
| all my blog entries have corresponding narration with my own
| voice but I'd love to see some kind of voice cloner + gradio
| interface that lets me make some adjustments to things like
| cadence, delivery, etc. (I mean beyond just making me sound
| like Alvin and the Chipmunks).
| NoMoreNicksLeft wrote:
| When my IoT geiger counter starts going off, I do want the
| in-home PA system's voice to be Admiral Adama warning my family of
| an imminent radiological threat, and preparing the Vipers for
| launch.
|
| Edward James Olmos if you're reading this, I'm willing to pay a
| license fee, but then I expect actual recordings and not just
| AI bullshit. I'm not pirating your voice, you're refusing to
| let me hire it.
| ranger_danger wrote:
| Randy Travis also used AI on his last album after losing his
| voice.
| chefandy wrote:
| Of course there are legitimate uses, which means everyone
| should have completely unfettered access and nobody selling it
| should be responsible for irresponsible users. Personally, I'm
| sick of the government limiting my artistic freedom because the
| mediums I use might be misused by a tiny group of bad actors.
| For example, it's unnecessarily difficult to source pineapple
| grenades for my large scale abstract punched tin crafts. The
| other people who live in my apartment building haven't
| complained when I asked if they had a problem with it, so
| what's the problem? And when I can get ahold of it, white
| phosphorous makes a great addition to my annual deep-woods
| pyrotechnic light shows. I just don't understand this nanny
| state garbage.
| notpachet wrote:
| Take my upvote you greedy bastard.
| wingworks wrote:
| Just a heads up, this is a trial; you have to pay to use it after
| 30 minutes.
|
| Easier and (cheaper?) to just use elevenlabs.
| vulcanidic wrote:
| It's a bit of a hassle, but after closing the Windows command,
| you can restart the program and use it indefinitely. The
| results you worked on will still remain in the workspace
| folder.
| ldoughty wrote:
| Yeah, felt like it positions itself as open source project here
| and on GitHub, but buries the cost in other pages... Doesn't
| even say the subscription cost anywhere I could find (in
| English). Not a huge fan of this advertising model.
| jamesy0ung wrote:
| I haven't looked at the code, but can you just patch out the 30
| minute limit?
| batch12 wrote:
| Looks to me like the app code is compiled into pyd files. One
| could try and decompile. Interestingly, it's licensed as MIT.
| XorNot wrote:
| The real utility of something like this is for reducing the
| creative costs of voice-acting. i.e. something like this is a
| massive boon for mod-makers where making fully voiced anything
| is a huge undertaking - i.e. while my friends and family could
| probably provide their voice if I asked, getting a decent
| recording and performance out of them is just not going to be
| possible.
|
| But if I can get the performance I want and shift it to another
| voice, then fully voicing free works becomes very accessible
| (even better would be generative AI which could take a sample of
| what you want and re-render it into something which sounds like a
| more professional performance - voice in-fill I suppose).
| youngNed wrote:
| I'm looking down the comments, but not really seeing much about
| what this actually is, by my very quick look, it's a front end
| for f5-tts with a yt-dlp and whisper?
|
| Is there anything new in this?
| dangoodmanUT wrote:
| Yeah they made an easy to use frontend. Don't be the dropbox
| guy
| vulcanidic wrote:
| I completely agree with you. This is just a web front-end,
| and there's nothing new about it. However, it's very easy.
| It's not easy to create something like this.
| Uehreka wrote:
| We can't just keep saying "Don't be the dropbox guy" as a
| comeback to criticism of new technology. Anyone who uses that
| phrase should have to place a bet in a prediction market that
| only pays out if the product they're talking about succeeds.
| Blindly supporting stuff out of a sort of "Pascal's Wager
| against looking foolish later" should have some cost if
| you're wrong.
| bn-l wrote:
| Let's default to being supportive and very careful with
| being negative.
| Uehreka wrote:
| That kind of imbalance makes it easier for scammers and
| hucksters to get away with things. It is not a feelgood
| prescription with no cost.
| youngNed wrote:
| Wind your neck in.
|
| I simply asked "is there anything new in this?" because I
| was interested to know if, you know, there was anything new
| in this.
| OceanBreeze77 wrote:
| Are banks moving away from voice verification as a means to
| identity checks? It seems like it's getting easier and easier to
| clone voices.
| tgv wrote:
| I'm with the nay-sayers. Your product doesn't bring any good to
| this world, but it does make it easier to harm people. It's a
| disgrace.
| Ylpertnodi wrote:
| "If, by whiskey...."
| Hard_Space wrote:
| This doesn't appear to have any training facility, so its misuse
| would seem to be limited to the pre-trained voices supplied - for
| the casual user (and the ease-of-use seems to be the central
| issue in these comments).
| throwaway314155 wrote:
| My experience with voice cloning is that training is typically
| not required for it to work. You just embed a bit of audio of
| the desired voice to be cloned using the backing VAE and the
| model can do the rest.
|
| Is it not the same with this project?
| deskr wrote:
| Isn't it funny how some text changes the voice in your head? Now
| you're hearing the best voice. It's amazing. I tell you. It's the
| greatest voice. Everybody's talking about it. They are saying
| it's incredible. They say they've never heard as beautiful a
| voice before.
| cies wrote:
| I needed until "Everybody's talking about it" to hear it in
| _his_ voice :)
|
| Please no spoilers!
| amazingamazing wrote:
| Voices can be beautiful.
| bitwize wrote:
| When Arnold Schwarzenegger was governor of California, he
| refused clemency for notorious gang founder Stanley "Tookie"
| Williams, who was sentenced to death for four murders in 1979.
|
| https://www.ocregister.com/2005/12/12/governors-full-stateme...
|
| Reading over the governor's statement explaining his reasons
| for denial of clemency, my brain couldn't help but do so in an
| Arnold voice. Sometimes, to amuse friends, I would read
| portions of it aloud while doing the voice.
|
| Maybe it's a bit tasteless, like the anime-girl Demon Core
| memes, but there's just something about hearing the legal and
| administrative justification for proceeding with an execution
| in the voice of the Terminator.
|
| I'm the same way with famous YouTubers. If I see "Guru Larry"
| Bundy Jr. or Clint "LGR" Basinger leave a comment on someone
| else's video, my brain reads it in their voice.
| giarc wrote:
| My neighbour is a detective and did a course on crypto scams. He
| told me scammers call someone's cell phone, record their
| voicemail greeting and use that to clone their voice. They can
| then have a very real life conversation with their grandparent
| and take their money.
|
| I'm all for innovation, but I don't really see the use case of
| cloning random voices to make podcasts? Listening to Zuck
| interview Elon? ok...?
| eurekin wrote:
| Technically, wouldn't a simple "Hold on, I'll call you back"
| test call stop that?
| stitched2gethr wrote:
| Yes, if the callee has reason to believe the caller isn't who
| they say they are. But this will never enter the mind of
| someone who's retirement age.
| bagels wrote:
| Some old people become very gullible.
| Loughla wrote:
| In all fairness, the number of old people who even know
| that realistic recreations of their loved ones' voices is
| even possible is probably pretty low.
| a2128 wrote:
| Scammers will use pressure and emotion. "Grandpa they put me
| in jail, I need you to bail me out please, there's not much
| time!" The last thing on the victim's mind is to hang up on
| what sounds like their crying distressed grandson to call
| them back. Sometimes even calling back won't work, the real
| grandson isn't picking up their phone and the scammer is
| saying that it's because they're in jail and their phone was
| taken.
| botanical76 wrote:
| I've been thinking a lot about this possibility. I think
| people will have to come up with family passwords
| eventually. A word or phrase that is regularly practised,
| but strictly private, for verification in times of crisis.
|
| For example, my family's passphrase is- just kidding.
| hollerith wrote:
| Either that or Android and iOS will add something like
| Caller ID but with actual authentication.
| notpachet wrote:
| My family already does this.
| alias_neo wrote:
| It's really easy for a technical person to do as well.
|
| I use Coqui TTS[0] as part of my home automation, I wrote a
| small python script that lets me upload a voice clip for it to
| clone after I got the idea from HeyWillow[1], and a small shim
| that lets me send the output to a Home Assistant media player
| instead of using their standard output device. I run the TTS
| container on a VM with a Tesla P4 (~£100 to buy) and get about
| 1x-2x (roughly the same time it'd take to say it, to process)
| using the large model.
|
| Just for a giggle, I uploaded a few 3-5 second clips of myself
| speaking and cloned my voice, then executed a command to our
| living room media player to call my wife into the room; from
| another room, she was 100% convinced it was myself speaking
| words I'd never spoken.
|
| I tried playing with a variety of sentences for a few hours and
| overall, it sounded almost exactly like me, to me, with the
| exception of some "attitude" and "intonation" I know I wouldn't
| use in my speech. I didn't notice much of an improvement using
| much longer clips; the short ones were "good enough".
|
| Tangentially, it really bugs me that most phone providers in
| the UK insist you record a "personal greeting" now before
| they'll let you check your voice mail box, I just record
| silence, because the last thing I want/need is a voicemail
| greeting in my voice confirming to some randomer I didn't want
| calling me, who I am and that my number is active, even more so
| knowing how I can clone any voice to a reasonably good accuracy
| with just a few seconds of audio.
|
| [0] https://github.com/coqui-ai/TTS [1] https://heywillow.io/
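|
| One way to read the "1x-2x" figure above is as a real-time factor:
| processing time divided by the length of the generated audio. The
| sketch below is only that arithmetic; the function name and the
| sample timings are made up for illustration and are not
| measurements from the setup described.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by the duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# Hypothetical numbers: a 4-second sentence that took 6 seconds to synthesize.
print(real_time_factor(6.0, 4.0))  # 1.5
```

| Under this reading, a value of 1.0 means synthesis takes exactly as
| long as speaking the sentence, and 2.0 means twice as long.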
| morkalork wrote:
| Just need to use this with some recordings of Majel Barrett, make
| a voice interface for Claude's computer use agent and we'll be
| all set.
| pmarreck wrote:
| > Linux and Mac OS are not supported
|
| Well, that's a big old fail. Just a reminder: The given (and
| proper) home of open source is on an open source OS.
| bguberfain wrote:
| Thanks for sharing this! But I have some doubts about hidden
| installation procedures. It imports all functions from one_click
| (from one_click import *), which points to a compiled file. It
| then runs functions like install_webui and
| install_extra_packages. At least suspicious.
| vulcanidic wrote:
| Try recording the installation process with a camera. The
| entire installation process is displayed in the Windows
| command. It's just installing Python packages and downloading
| the AI model and audio files. That's all.
| didibus wrote:
| Pretty easy for a script to not print everything it does at
| the command line. You have to inspect the code if you want to
| be sure.
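|
| As a first step in that kind of inspection, a short sketch using
| Python's standard ast module can list wildcard imports in a
| script, like the from one_click import * mentioned earlier in the
| thread. The function name find_star_imports and the sample source
| are illustrative assumptions; this only reads source text and
| cannot see inside compiled modules.

```python
import ast


def find_star_imports(source: str) -> list[str]:
    """Return the module names that are imported with `from module import *`."""
    modules = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom) and any(a.name == "*" for a in node.names):
            # node.module is None for relative forms such as `from . import *`.
            modules.append(node.module or ".")
    return modules


# Hypothetical launcher source, not copied from the repository.
sample = "import os\nfrom one_click import *\nfrom pathlib import Path\n"
print(find_star_imports(sample))  # ['one_click']
```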
| bguberfain wrote:
| The file I mentioned is just the beginning... there is a
| folder full of .dll files, renamed to .pyd. I understand that
| this is the proprietary part, that limits usage for 30
| minutes, but I think it is too closed for a MIT license.
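|
| A hedged sketch for taking inventory of such files: on Windows,
| .pyd extension modules are DLLs, and PE files begin with the bytes
| MZ, so a script can walk a folder and list them. The function name
| find_native_modules and the demo file names are hypothetical; the
| check says nothing about what the binaries actually do.

```python
import tempfile
from pathlib import Path


def find_native_modules(root: Path) -> list[Path]:
    """Return .pyd/.dll files under root that start with the PE 'MZ' magic bytes."""
    hits = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in {".pyd", ".dll"}:
            with path.open("rb") as fh:
                if fh.read(2) == b"MZ":
                    hits.append(path)
    return hits


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "core.pyd").write_bytes(b"MZ\x90\x00")   # fake PE header
        (root / "notes.pyd").write_bytes(b"plain text")  # wrong magic bytes
        (root / "start.py").write_text("print('hi')\n")  # ordinary source file
        print([p.name for p in find_native_modules(root)])  # ['core.pyd']
```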
| lysace wrote:
| I have resorted to using separate physical computers + vlan
| network separation when exploring untrusted AI workloads. Yes,
| it costs, but so does a breach.
|
| Thanks for raising this aspect.
|
| Btw https://github.com/haimgel/display-switch helps a lot.
| jordimash wrote:
| If you are looking for automatic dubbing without voice cloning:
| https://github.com/Softcatala/open-dubbing
| kristopolous wrote:
| The syncing of the original English is way off. I don't really
| know how they got that to be so broken.
| sroussey wrote:
| The description, since many commentators are not clicking through
| but asking questions this answers:
|
| Comprehensive Gradio WebUI for audio processing, powered by
| Whisper engines (Whisper, Faster-Whisper, Whisper-Timestamped).
| Features Voice Changer, zero-shot Voice Cloning (E2, F5-TTS),
| YouTube downloading, vocal isolation (UVR5), Text-to-Speech
| (Edge-TTS), and multi-language translation. Perfect for content
| creators and developers.
___________________________________________________________________
(page generated 2024-11-28 23:01 UTC)