hngopher.com

       [HN Gopher] Show HN: Voice-Pro - AI Voice Cloning
       ___________________________________________________________________
        
       Show HN: Voice-Pro - AI Voice Cloning
        
       Imagine creating a podcast where Mark Zuckerberg interviews Elon
       Musk - using their actual voices?  What sounds like science fiction
       is now reality.  Voice-Pro is an open-source Gradio WebUI that
       breaks the boundaries of audio manipulation.  Powered by
       cutting-edge Whisper engines, this tool turns voice replication
       into child's play.

       Key Features:
       - Zero-shot Voice Cloning
       - Voice Changer with 50+ Celebrity Voices
       - YouTube Audio Downloading
       - Vocal Isolation
       - Multi-Language Text-to-Speech (Edge-TTS, F5-TTS)
       - Multi-Language Translation
       - Powered by Whisper Engines (Whisper, Faster-Whisper,
         Whisper-Timestamped)

       Video Demos:
       1. Voice-Pro Usage Tutorial: https://youtu.be/z8g8LMhoh_o
       2. Voice Cloning Celebrity Podcast Demo: https://youtu.be/Wfo7vQCD4no
       3. Full Demo Playlist:
          https://www.youtube.com/playlist?list=PLwx5dnMDVC9Y7dAjm9r26...

       Whether you're a content creator, developer, or audio experiment
       enthusiast, Voice-Pro provides a user-friendly interface to push
       the boundaries of audio manipulation.

       GitHub: https://github.com/abus-aikorea/voice-pro
        
       Author : abuskorea
       Score  : 222 points
       Date   : 2024-11-28 02:37 UTC (20 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | jncfhnb wrote:
       | Is there speech to speech? I have been hoping for a model I can
       | use to do voice acting with inflection
        
         | amrrs wrote:
         | Do you mean Inflection's Pi?
        
           | bryanrasmussen wrote:
           | I think they mean speech "in the style of" the same as
           | repaint this picture in the style of Van Gogh, so they will
           | do the audio and put the correct inflection on things but
           | then rerender it with the voice of Katharine Hepburn for
           | example.
           | 
           | on edit: example of course showing the difficulty as so much
           | of Hepburn was her inflection.
        
             | jncfhnb wrote:
             | More so I wish to voice act a line and then have the bot
             | mimic it with a different voice but with the same
             | contextual voicing.
             | 
             | "I'm going to kill you" could be delivered (laughing
             | jokingly / seething with rage / ominously and creepily).
             | I'd like a bot that can mimic the delivery in a different
             | voice.
        
       | muglug wrote:
       | These tools make it very easy to scam vulnerable people, and have
       | pretty limited use otherwise.
        
         | tsujamin wrote:
         | Bulldozing grandma is just the cost of technological progress
         | /s
        
           | uh_uh wrote:
           | This tech is going to be ubiquitous, it's just too easy to
           | distribute it. Grandma better start adapting now.
        
             | thejazzman wrote:
             | Because people make it so, not because the natural order of
             | the world gets us there
             | 
             | For some reason because we can validates that we should.
             | Any jackass has the power of a research team of phds. It's
             | kinda weird.
        
               | uh_uh wrote:
               | Demanding responsible behaviour from everybody is not
               | going to work. Some people don't care about negative
               | externalities that much and it's enough if only a few of
               | them decide not to play ball. So either grandma needs to
               | adapt which will upset some people or distributing the
               | tech should be regulated/prosecuted which will upset
               | another group of people.
        
               | rockemsockem wrote:
               | I think either way grandma needs to adapt though since
               | Russian scammers and trolls are still going to run scams
               | with fake voices.
        
               | 123yawaworht456 wrote:
               | how very politically correct of you to pretend it's
               | Russians who scam your grandmas
        
               | chefandy wrote:
               | Indeed. Humans ascended to dominance because we can
               | cooperate. This every-man-for-themself idea is an
               | aberration, not the natural order as so many claim. It's
               | rather astounding to think otherwise considering the
               | logistics of how we're communicating right now.
        
               | uh_uh wrote:
               | Cooperation works if the potential damage caused by a
               | rogue actor is sufficiently low. Otherwise, it's too easy
               | to sabotage things. This is why we don't want random
               | rogue states to have nukes. AI will give so much leverage
               | to rogue actors that it will significantly shift the game
               | theory in favour of not cooperating.
        
               | chefandy wrote:
               | > Cooperation works if the potential damage caused by a
               | rogue actor is sufficiently low. Otherwise, it's too easy
               | to sabotage things. This is why we don't want random
               | rogue states to have nukes. AI will give so much leverage
               | to rogue actors that it will significantly shift the game
               | theory in favour of not cooperating.
               | 
               | Governments successfully collectively controlling
               | dangerous things so they don't fall into the hands of
               | rogue bad actors fundamentally opposes the extreme
               | individualist every-man-for-himself perspective in every
               | conceivable way. It's the absolute opposite of "it's
               | everybody's responsibility to protect themselves because
               | everybody else is only going to look out for themselves."
               | 
               | And when individuals have that much leverage, collective
               | action is the only conceivable way to oppose it. Some of
               | those things might be cultural, like mores, some might be
               | laws, some might be more martial. I don't see how extreme
               | individualism even theoretically could be more powerful.
        
               | uh_uh wrote:
               | Are you suggesting government action against putting up
               | code like this to GitHub? It's ok if you are, but I want
               | to put into more concrete terms what we're talking about.
        
               | chefandy wrote:
               | You're the one that made the direct government control
               | analogy. I mentioned a number of non-individualistic
               | mechanisms in my previous comment. I'm not going to keep
               | engaging in a fishing expedition of things to argue about
               | -- I think it's pretty clear what aspect of your stance I
               | disagree with-- and am going to leave it at that.
        
             | chefandy wrote:
             | You can't adapt around brain age making it more difficult
             | to distinguish truth from lies.
        
             | casey2 wrote:
             | Yeah, I don't really get the hullabaloo, if granny doesn't
             | have the mental fortitude to keep up with the times she
             | shouldn't be managing her own money. I guess better her
             | son/daughter than a scammer but both are better than
             | letting money rot. Put granny on food stamps and pay $1 for
             | her rent-controlled housing and be done with it.
        
               | zelphirkalt wrote:
               | Are we forgetting, that there are many elderly people
               | without living descendants?
        
           | weq wrote:
           | This tech is not only great for bulldozing grandma, it's great
           | at stealing content from other creators and rebranding it as
           | your own. Based on the GitHub, it kind of seems like that's
           | exactly what's being advertised as the use case. Steal content
           | from BBC, cut it up and pull the noise out/vocals/revoice the
           | content so the algorithm can't detect the plagiarism easily.
           | The image detection is nowhere near the audio detection for
           | copyright strikes.
           | 
           | There is a massive problem with this on YouTube. Pretty much
           | every category on YouTube now has a host of these bots
           | trolling content and playing the YouTube strike system like a
           | banjo. There are channels dedicated to showing you how to
           | set up these content mills. This tool can make you good money.
        
             | sfjailbird wrote:
             | First generative AI destroyed Google search, and now it has
             | pretty much destroyed YouTube. Social platforms, including
             | this one, are probably goners too. We live in interesting
             | times.
        
         | Larrikin wrote:
         | I'm absolutely using celebrity voices for my Home Assistant
         | voice. Amazon has spent the last couple years removing the
         | voices for Alexa that people had paid for.
        
           | nickthegreek wrote:
           | I'd love some more info on using custom voices in HA. I have
           | an esp32-s3-box that I am setting up over the holiday to do
           | voice with HA.
        
           | pmarreck wrote:
           | If you have a how-to, I'd love to work on one for my home. I
           | feel like this is all right around the corner...
        
         | chefandy wrote:
         | Gen AI space to everyone else: _"Your computer scientists were
         | so preoccupied with whether or not they should, they didn't
         | stop to think if they could just do it anyway"_
        
         | chefandy wrote:
         | To be fair, they've got pretty serious potential for letting
         | tech companies get paid for a seasoned voice actor's unique
         | delivery, tone, inflection, etc rather than the voice actor
         | themselves.
        
           | whaaaaat wrote:
           | > they've got pretty serious potential for letting tech
           | companies get paid for a seasoned voice actor's unique
           | delivery, tone, inflection, etc rather than the voice actor
           | themselves.
           | 
           | I think you mean "steal the labor of an actor"?
        
             | chefandy wrote:
             | Sure, and people that already agree with you will feel good
             | reading it, but other people who don't agree see it as an
             | attack. It's pretty much impossible to slip a new idea into
             | someone's mind if your approach made them slam the door
             | before even considering it. So what's the benefit of saying
             | it like that?
        
               | gmueckl wrote:
               | It calls attention to the ethical implications of using a
               | part of someone else's personal identity without their
               | direct involvement.
        
               | MrDrMcCoy wrote:
               | Indirect involvement can still be ok within the confines
               | of a license agreement for using the actor's voice.
        
               | gmueckl wrote:
               | But this requires a legal framework that mandates such
               | licenses and effective enforcement / prosecution of
               | violations.
               | 
               | As far as I know, most countries are lagging behind when
               | it comes to updating legislation to set binding rules
               | around that.
        
               | ideashower wrote:
               | > Indirect involvement can still be ok within the
               | confines of a license agreement for using the actor's
               | voice.
               | 
               | This assumes existence of a license agreement or
               | likeness/right of publicity law that prevents
               | unauthorized use. But this is far from the case.
               | 
               | Companies have shown willingness to use actors' voices to
               | create synthetic voices without permission, compensation,
               | or regard for their livelihoods. [1][2][3]
               | 
               | [1] https://animehunch.com/popular-japanese-voice-actors-
               | band-to...
               | 
               | [2] https://www.theatlantic.com/technology/archive/2024/0
               | 5/eleve...
               | 
               | [3] https://www.yahoo.com/entertainment/morgan-freeman-
               | calls-una...
        
               | MrDrMcCoy wrote:
               | Of course we need laws in place to require such
               | licensing. The fact that people are having their voice
               | stolen now does not mean that there should never be a
               | case where a voice can legally be cloned and used by a
               | third party.
        
               | chefandy wrote:
               | So does what I said. Someone taking pay for someone
               | else's work is pretty unambiguously shitty. But when you
               | call taking anything that isn't a physical item theft, a
               | large percentage of people-- especially in the 'data
               | wants to be free' crowd-- will roll their eyes, think
               | "that's ridiculous... they aren't stealing anything. That
               | voice actor still has their voice" and just stop
               | listening. The only people that feel the impact of
               | statements like that are people that already agree. It
               | turns it from an intellectual discussion to a
               | reinforcement of existing tribes. Divisive language works
               | for rallying those who already agree around a specific
               | cause but it's not even useless-- it's
               | counterproductive-- for changing people's minds. When's
               | the last time someone you disagreed with changed your
               | mind by being more aggressive towards your stance, and
               | more terse in their portrayal of the dichotomy? If you
               | can even think of one time that it has, you're in the
               | extreme minority.
        
         | ranger_danger wrote:
         | How many victims will it take for lawmakers to do something
         | about this?
        
           | tiborsaas wrote:
           | It's already illegal to scam somebody. While it's always
           | positive to protect people more, what can be done here? Any
           | alternative I can imagine is massively oppressive of the
           | current state of the software industry.
           | 
           | You can regulate large companies, you can regulate published
           | software sold for profit, but it's impossible to regulate
           | free and open source tools.
           | 
           | You essentially have to regulate access to computing power if
           | you want to prevent bad actors doing bad things using these
           | sort of tools.
        
             | bryanrasmussen wrote:
             | >You can regulate large companies, you can regulate
             | published software sold for profit, but it's impossible to
             | regulate free and open source tools.
             | 
             | Regulation is putting legal limitations on things, if it is
             | impossible to regulate free and open source tools then it
             | would be impossible to regulate murder and lots of other
             | things, but it turns out it isn't impossible, sure - murder
             | happens - but people get caught for it and punished.
             | 
             | Sorry, but this argument is much like the early internet
             | triumphalism - back when people said it was impossible to
             | regulate. Turns out lots of countries now regulate it.
        
               | tiborsaas wrote:
               | It depends on what you do with the tool. Going with your
               | murder analogy, if there's a stabbing epidemic what do
               | you do? 1) Ban knives 2) invest in public safety 3)
               | investigate the root causes and improve on them?
               | 
               | I'm also not sure what's so regulated about the internet
               | besides net neutrality in certain countries. Of course
               | the government can put limits on the network, like
               | banning services, but it's easy since they are rather
               | easy to target. With content traveling on the network
               | it's much harder to say if it's legit or not.
               | 
               | > lots of countries
               | 
               | What about those countries that don't regulate it and
               | people will keep pumping out better, leaner and faster
               | models from there? Spreading software is trivial, all you
               | achieve is the public won't be aware of what's possible.
               | 
               | The more I think about it, if anything should be regulated
               | it's a requirement to provide a third-party (probably
               | government-backed) ID verification system so it would be
               | possible for my mom to know it's me calling here.
               | Basically kill caller ID spoofing.
        
               | bryanrasmussen wrote:
               | >I'm also not sure what's so regulated about the internet
               | besides net neutrality in certain countries.
               | 
               | generally things are regulated on the internet that were
               | not going to ever be regulated because it was on the
               | internet - example - sales taxes, perhaps you are old
               | enough to remember when sales tax collection would not
               | ever be enforceable on internet transactions - those
               | idiot lawyers don't know, it's on the internet, the sale
               | didn't happen in that country or in that state no sales
               | taxes will never happen on the internet hah hah. It's
               | unenforceable, it is logically undoable, there are so
               | many edge cases - ugh, the law just does not understand
               | technology!
               | 
               | oops, sales taxes now on internet purchases.
               | 
               | GDPR is another example of things that are regulated on
               | the internet that basically most of HN years before it
               | happened was completely convinced would be impossible!!
               | 
               | If this thing becomes too big a problem for societies,
               | regulations will be made, with varying levels of
               | effectiveness I'm sure.
               | 
               | And then in twenty years time we will be saying what, you
               | can't regulate genital eating viral synths because a guy
               | can make those in his garage and spread them via nasal
               | spray, this technology is unstoppable and unregulatable,
               | not like some open source deepfake library!!
        
               | bavell wrote:
               | It's always amusing listening to techies' musings on
               | law... lots of misunderstandings, I suspect due to the
               | helpful but inaccurate "code but for humans" analogy.
               | 
               | Obligatory/relevant xkcd: https://xkcd.com/538/
        
               | vunderba wrote:
               | Lots of countries impose exactly what specific
               | regulations with respect to open source tooling?
               | 
               | The closest thing I can think of is maybe the regulation
               | of DRM ripping tools, but they're still out there in the
               | wild and determined actors can easily get ahold of them.
               | So I'm not at all confident that regulation will have any
               | measurable meaningful effect.
        
               | notTooFarGone wrote:
               | The fable of the "determined actor".
               | 
               | The "determined actor" can get bombs, tanks, fissile
               | material. There no one says "WHELP they can get it anyway
               | so why bother regulating it LMAO" - somehow this is
               | different in anything not physical?
        
               | bryanrasmussen wrote:
               | >Lots of countries impose exactly what specific
               | regulations with respect to open source tooling?
               | 
               | that something is not currently regulated does not mean
               | it can never be regulated, further it does not seem
               | likely that they would regulate open source tooling but
               | rather some uses and if the open source tooling allowed
               | those uses then what would happen is -
               | 
               | github and other big sources of code would refuse to host
               | it as containing not legally allowed things, so for
               | example if they regulated it in the U.S then Github stops
               | allowing it, and everyone moves to some European git
               | provider.
               | 
               | At the same time bigger companies will stop using the
               | library because liability.
               | 
               | Europe then regulates and it can't be in European git
               | repos... at some point many devs abandon particular
               | library because not worth it (I get it this is actually
               | for the love of doing the illegal thing so they won't
               | abandon but despite the power of love most things in this
               | world do not actually run on it)
               | 
               | Can determined actors get ahold of them and do the things
               | with them the law forbids them to do, sure! That's called
               | crime. Then law enforcement catches determined actors and
               | puts them in prison, that's called the real world!
               | 
               | Will criminals stop - nope because there is benefit to
               | what they're doing. Maybe some will stop because they
               | will think screw it I can make more money working for the
               | man. And some will be caught sooner or later. And maybe
               | in version two of the regulations there will be AI
               | enhancements - this crime was committed with AI allowing
               | us to take all your belongings and add 10 years to your
               | sentence and deprive you of the right to ever own a
               | computing device again...etc. etc. And some people will
               | stop and others will get more violent and aggressive
               | about their criminal business.
               | 
               | I don't know necessarily what measurable meaningful
               | effect means, for some people it will be measurable and
               | meaningful, for some not, for some of society the
               | regulation would in many ways be worse than what it is
               | fighting against. I'm not saying regulation will solve
               | problems 100%, I'm just saying this whole they can't
               | regulate us thing because "TECH!!!" that developers seem
               | to regularly go through with anything they set their eye
               | on is a pipe dream.
        
             | mnau wrote:
             | > impossible to regulate free and open source tools
             | 
             | BS. Can you imagine a legislation? Yes, thus it can be
             | done.
             | 
             | As an early example, the CRA (Cyber Resilience Act) already
             | contains provisions about open source stewards and
             | security. So far they are legal persons, aka foundations,
             | but could easily relate to any contributor or maintainer.
        
               | tiborsaas wrote:
               | As I made the comment, I can't really imagine anything
               | that's not so absurd that has a more than zero chance of
               | happening.
               | 
               | Seriously, what can anybody do about random hacker Joe
               | publishing under the name XoX? Even if they burn GitHub
               | and friends to the ground, if something is useful it will
               | be really really hard to get rid of it. Remember youtube-
               | dl? It's now https://github.com/yt-dlp/yt-dlp
               | 
               | If they make anything that cripples open source
               | development they will feel it quite soon when they
               | realize that it also cripples their world as much of the
               | tooling and infrastructure also depends on it.
               | 
               | Killing open source is like killing the internet itself.
        
           | russell_h wrote:
           | Serious question: what do you think lawmakers should do?
        
             | ideashower wrote:
             | For people's image being used without their permission:
             | strengthen U.S. right of publicity laws with private right
             | of action, enabling people to sue for unauthorized use of
             | their voice or likeness.
        
             | ranger_danger wrote:
             | Digital signatures as part of audio/video that can't be
             | easily modified or faked which can trace the origin of a
             | piece of media. Some camera manufacturers are already
             | working on it.
        
               | CamperBob2 wrote:
               | How do you propose to keep watermark-free models out of
               | the hands of evildoers? I can't build my own digital
               | camera or laser printer, but I can certainly write
               | software.
        
           | 123yawaworht456 wrote:
           | how many victims did it take for lawmakers to do something
           | about Photoshop/GIMP/etc?
        
         | rockemsockem wrote:
         | Quit being a doomer or keep it to yourself. This reminds me of
         | the sound boards that were popular in the early 2000s except
         | way more versatile. Some things are just good for people to
         | have fun and THAT'S OKAY.
        
           | whaaaaat wrote:
           | People are allowed to recognize the realistic negative
           | outcomes of technology, especially on a forum that frequently
           | discusses the tradeoffs of modern, cutting edge technologies.
        
             | rockemsockem wrote:
             | So many AI posts are overrun with this kind of complaining
             | from folks with limited imaginations.
             | 
             | On a forum that frequently discusses technology with
             | enthusiasm you'd think there'd be more enthusiasm and more
             | constructive criticism instead of blanket write-offs.
        
               | Mordisquitos wrote:
               | I would argue that being able to see the drawbacks and
               | potential negative externalities of a new technology is
               | not a sign of a "limited imagination", but quite the
               | contrary. An actual display of a limited imagination is
               | the inability to imagine how a new technology can (and
               | will) be abused in society by bad actors.
        
               | Ukv wrote:
               | Developing some insight on its negative potential could
               | demonstrate imagination, but the claim that it could be
               | used to scam people is pretty much just rote repetition
               | by now - an obligatory point made in every article and
               | under every post about this tech (and not something that
               | I think actually works out in practice the way most
               | imagine it, since cold-call scam operations that dial
               | numbers at a huge scale expecting most not to pick up
               | can't really find a voice clip prior to each automated
               | call).
               | 
               | As for positive applications, some I see:
               | 
               | * Allowing those with speech impairments to communicate
               | using their natural voice again
               | 
               | * Allowing those uncomfortable with their natural voice,
               | such as transgender people, to communicate closer to how
               | they wish to be perceived
               | 
               | * Translation of a user's voice, maintaining emotion and
               | intonation, for natural cross-language communication on
               | calls
               | 
               | * Professional-quality audio from cheap microphone setups
               | (for video tutorials, indie games, etc.)
               | 
               | * Doing character voices for a D&D session, audiobook,
               | etc.
               | 
               | * Customization of voice assistants, such as to use a
               | native accent/dialect
               | 
               | * Movies, podcasts, audiobooks, news broadcasts, etc.
               | made available in a huge range of languages
               | 
               | * If integrated with something like airpods, babelfish-
               | like automatic isolation and translation of any speech
               | around you
               | 
               | * Privacy from being able to communicate online or record
               | videos without revealing your real voice, which I think
               | is why many (myself included) currently resort to text-
               | only
               | 
               | * New forms of interactive media - customised movies,
               | audio dramas where the listener plays a role, videogame
               | NPCs that react with more than just prerecorded lines,
               | etc.
               | 
               | * And of course: memes, satire, and parody
               | 
               | I appreciate HN's general view on technologies like
               | encrypted messaging - not falling into "we need to ban
               | this now because pedophiles could use it" hysteria. But
               | for anything involving machine learning, I'm concerned
               | how often the hacker mentality seems to go out the window
               | and we instead get people advocating for it to be made
               | illegal to host the code, for instance.
        
               | Mordisquitos wrote:
               | Of the 11 positive applications that you listed, only the
               | 1st, 3rd, 11th and arguably the 4th would benefit from
               | voice _cloning_, which is what's being promoted here.
               | The rest are solved merely by (improved) TTS and do not
               | require the cloning of any actual human voice.
               | 
               | Also, notice how the legitimate use-cases 1, 3 and 4
               | imply the user consenting to clone _their own_ voice,
               | which is fine. However, the only use-case which would
               | require cloning a specific human voice belonging to a
               | third party, use-case 11, is _"memes, satire, and
               | parody"_... and not much imagination is needed to see how
               | steep and buttery that Teflon slippery slope is.
        
               | Ukv wrote:
               | > Of the 11 positive applications that you listed, only
               | the 1st, 3rd, 11th and arguably the 4th would benefit
               | from voice cloning, which is what's being promoted here.
               | The rest are solved merely by (improved) TTS and do not
               | require the cloning of any actual human voice.
               | 
               | 2, 5, 6, 9: It's true that in theory all you need is some
               | way to capture the characteristics of a desired voice,
               | but voice-cloning methods are the way to do this
               | currently. If you want a voice assistant with a native
               | accent, you fine-tune on the voice of a native speaker -
               | opposed to turning a bunch of dials manually.
               | 
               | 7, 8, 10: Here I think there _is_ benefit specifically
               | from sounding like a particular person. The dynamically
               | generated lines of movie characters/videogame NPCs
               | should be consistent with the actor's pre-recorded lines,
               | for instance, and hearing someone in their own voice is
               | more natural for communication and makes conversation
               | easier to follow.
               | 
               | Pedantically, what's promoted here is a tool which
               | features voice cloning prominently but not exclusively -
               | other workflows demonstrated (like generating subtitles)
               | seem mostly unobjectionable.
               | 
               | > Also, notice how the legitimate use-cases 1, 3 and 4
               | imply the user consenting to clone their own voice, which
               | is fine
               | 
               | I think all, outside of potentially 8 and 11, could be
               | done with full consent of the voice being cloned - an
               | agreement with the movie actor to use their voice for
               | dubbing to other languages, for example. That's already a
               | significant number of use-cases for this tool.
               | 
               | > use-case 11, is "memes, satire, and parody"... and not
               | much imagination is needed to see how steep and buttery
               | that Teflon slippery slope is.
               | 
               | IMO prohibition around satire/parody would be the
               | slippery slope, particularly with the potential for
               | selective enforcement.
        
         | casey2 wrote:
         | I like tools like these cause they make zero trust default even
         | more obvious, and their "pretty limited use" is saving people
         | hours of work.
        
         | anonzzzies wrote:
         | They are pretty good for leaving messages for my blind friend.
         | I generally find calling / voice texts a waste of time (I type
         | and read far faster than I talk or listen, not to mention the
         | ability to reread etc), but my blind friend prefers getting
         | voice messages when on his phone and this works for us. I type
         | and send and when he comes back with something, Whisper makes
         | it into text for me.
        
         | mistercow wrote:
         | It's weird to me that people look at a technology and then
         | assume from their reckoning that they know all the uses for
         | that technology immediately. Most technological progress
         | happens because someone notices a creative use for something
         | that already exists which nobody else has noticed.
        
       | yawnxyz wrote:
       | > When Windows Defender mistakenly recognizes a [virus] as a
       | Trojan, this is often called a 'False Positive'. To solve this
       | problem, you can go through the following steps:
        
         | kfarr wrote:
         | Yeah, I also noticed the install instructions are to run this
         | batch file that gets administrator access and starts
         | downloading things...
        
           | gruez wrote:
           | It's not any worse than all the projects on github with
           | "easy" install instructions of "curl ... | sudo sh". Heck,
           | even an innocent "sudo make install" command can easily
           | contain a malicious payload.
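One common mitigation for the pipe-to-shell pattern discussed above is to save the script to disk first, read it, and compare its hash against a digest the project publishes. Below is a minimal Python sketch of the comparison step only. It assumes the project publishes a SHA-256 digest at all (many do not), and the function names are hypothetical rather than taken from any project in this thread.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a lowercase hex string."""
    return hashlib.sha256(data).hexdigest()


def matches_published_digest(data: bytes, published_hex: str) -> bool:
    """Check downloaded bytes against a digest published by the project.

    The published value is normalised (whitespace stripped, lowercased)
    because checksum files often carry a trailing newline or uppercase hex.
    """
    expected = published_hex.strip().lower()
    # compare_digest does a constant-time comparison of the two hex strings.
    return hmac.compare_digest(sha256_hex(data), expected)
```

A matching digest only shows that the file is the one the publisher hashed. It says nothing about whether the script is safe, so reading it before running it with elevated rights still matters.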
        
             | chefandy wrote:
             | Yeah it's not great but it's definitely not unusual. And
             | windows reputation-based execution blocking does have false
             | positives. I work for a company that has some very very
             | popular products and some that only see a few dozen
             | downloads per week, and despite being signed, it still
             | takes a while for new versions to build enough rep to not
             | trigger the block.
        
             | tonyedgecombe wrote:
             | It's not really the sort of tool that should require admin
             | rights though.
        
             | elif wrote:
             | Yea not to mention the entire homebrew ecosystem is built
             | around trusting random people's shell scripts.
             | 
             | MacOS devs blindly trust it like it's the app store.
        
               | pmarreck wrote:
               | A simple `brew cat <packagename>` (possibly piping to bat
               | if you want syntax highlighting) should spit out the ruby
               | install formula for that package, for inspection.
        
               | nozzlegear wrote:
               | The assumption is that maintainers at Homebrew are
               | reviewing each pull request before being merged, though
               | it's obviously not a full security audit. Homebrew will
               | also use macOS's sandboxing if a formula needs to be
               | built during installation, which will limit file access
               | to specific Homebrew directories and restrict network
               | access.
               | 
               | But I agree that everyone should review the Homebrew
               | install script for any package they're installing if
               | they're concerned about security.
        
       | safeimp wrote:
       | Project looks interesting. Are there short term plans to support
       | MacOS?
       | 
       | If not, any recommendations for alternative projects?
        
       | ilrwbwrkhv wrote:
       | There are a bunch of yc start-ups who are building new models and
       | stuff in the space. I fear they are going to get decimated really
       | soon as the quality of local llamas keeps improving.
        
       | shannifin wrote:
       | I don't have much real use for celebrity voices (other than fun
       | experimentation), but I'd love to be able to clone my own voice
       | and character voices for the purposes of creating audiobooks /
       | audioplays without having to pay monthly fees with monthly usage
       | limits. So I'm excited by this sort of project!
       | 
       | P.S. Are there any tools for synthetic voice creation? Maybe
       | melding two or more voices together, or just exploring latent
       | space? Would be fun for character creation to create completely
       | new voices.
        
         | dyauspitr wrote:
         | I've used tortoise tts before and trained it on my voice and a
         | mix of voices. It's not perfect but still impressive.
        
         | thelittleone wrote:
         | Have you tried eleven labs? I used that. Had to record 3 hours
         | of training audio reading books and news articles. But the
         | result was really good.
        
           | shannifin wrote:
           | They're great! They just cost too much for how much output I
           | want.
        
           | stavros wrote:
           | How much did the training cost?
        
         | vunderba wrote:
         | I'd be interested as well. This is where I imagine the space is
         | going - particularly as the potential for litigation increases
         | around cloning.
         | 
         | Game studios will spin up a bunch of unique virtual voices for
         | all the dialogue of extras. It'll probably be longer before we
         | see replacements of main characters though. There's been some
         | research in speech-to-speech transference as well - this means
         | that company employee A records the character B's line with the
         | appropriate emotional nuance (angry, sad, etc.) and the
         | emotional aspect is copied on top of the generated TTS.
        
         | jerpint wrote:
         | StyleTTSv2 is pretty good and open source, you can easily
         | traverse its latent space for voice
        
       | joshdavham wrote:
       | Looks cool! Also, is there a reason you went with a Web-UI
       | instead of making a native desktop app?
        
       | harryf wrote:
       | Have you considered supporting whisper-at -
       | https://github.com/YuanGongND/whisper-at ? Being able to identify
       | sounds on a timeline can be useful e.g. politicians speech and
       | how the audience is reacting to it (e.g. clapping, applauding)
        
       | newusertoday wrote:
       | are there any TTS models which are decent but can work on devices
       | without GPU and have relatively low RAM(4GB)
        
       | grahamgooch wrote:
       | Great stuff well done. What is your latency for real time Audio?
        
       | whaaaaat wrote:
       | > Imagine creating a podcast where Mark Zuckerberg interviews
       | Elon Musk - using their actual voices?
       | 
       | I'm imagining it. It _sucks_ to imagine.
       | 
       | I'm imagining it being used to scam people. I'm imagining it to
       | leech off of performers who have worked very hard to build a
       | recognizable voice (and it _is_ a lot of work to speak like a
       | performer). I'm imagining how this will be used in revenge porn.
       | I'm imagining how this will be used to circumvent access to voice
       | controlled things.
       | 
       | This is bad. You should feel bad.
       | 
       | And I know you are thinking, "Wait, but I worked really hard on
       | this!" Sorry, I appreciate that it might be technically
       | impressive, but you've basically come out with "we've invented a
       | device that mixes bleach and ammonia automatically in your
       | bedroom! It's so efficient at mixing those two, we can fill a
       | space with chlorine gas in under 10 seconds! Imagine a world
       | where every bedroom could become a toxic site with only the push
       | of a button."
       | 
       | That this is posted here, proudly, is quite frankly astoundingly
       | embarrassing for you.
        
         | farzd wrote:
         | You do realise this is not the first AI release to clone
         | voices?
        
           | yyuugg wrote:
           | I don't think the parent said they were. "I'm the Nth person
           | to do a shitty thing!" doesn't absolve them of doing a shitty
           | thing. Just because there are other thieves doesn't make
           | theft ok.
        
           | cess11 wrote:
           | Sure, and PoisonIvy wasn't the first RAT. So what? Does it
           | get more ethical to assist fraudsters and so on once more
           | people are doing it?
        
         | Ukv wrote:
         | I'd claim the way most people imagine it being used for
         | scamming, cold-calls impersonating someone the victim knows,
         | doesn't really end up working out in practice because scam
         | operations dial numbers at a huge scale expecting most not to
         | pick up a "scam likely" call (or be away, or a dead number,
         | etc.). Having to find a voice clip prior to each unanswered
         | call would tank the quantity they're able to make.
         | 
         | For spear-phishing (impersonate CEO, tell assistant to transfer
         | money) it's more feasible, but I hope it forces acceptance that
         | "somebody sounds like X over the phone" is not and has never
         | been a good verification method - people have been falling for
         | scams like those fake ransom calls[0] for decades.
         | 
         | Not that there aren't potential harms, but I think they're
         | outweighed by positive applications. Those uncomfortable with
         | their natural voice, such as transgender people, can
         | communicate closer to how they wish to be perceived - or
         | someone whose voice has been impaired (whether just a temporary
         | cold or a permanent disorder/illness/accident) can use it from
         | previous recordings. Privacy benefits from being able to
         | communicate online or record videos without revealing your real
         | voice, which I think is why many (myself included) currently
         | resort to text-only. There's huge potential in the translation
         | and vocal isolation aspects aiding communication - feels to me
         | as though we're heading towards creating our own babelfish.
         | There's also a bunch of creative applications - doing character
         | voices for a D&D session or audiobook, memes/satire, and likely
         | new forms of interactive media (customised movies, audio dramas
         | where the listener plays a role, videogame NPCs that react with
         | more than just prerecorded lines, etc.)
         | 
         | [0]: https://www.fbi.gov/news/stories/virtual-kidnapping
        
           | yyuugg wrote:
           | I think most people in America are more wary of foreign
           | sounding voices. If the person on the other end sounds like a
           | good ol boy, they get more trust.
           | 
           | Scammers don't have to sound like a specific person to be
           | helped by software like this.
        
             | Ukv wrote:
             | That aspect feels to me like "I used to racially profile
             | people on the street to judge risk, but winter clothing now
             | obscures skin color at a distance". There are heuristics
             | that give non-zero information but are harmful to use, with
             | the cost borne by some marginalized group, and I don't see
             | it as a negative for use of such heuristics to be made less
             | feasible. Reducing people's use of accent as a factor would
             | be a positive for the ~1.5B Indians that aren't scammers,
             | for instance.
             | 
             | I think there's also an autonomy argument to be made, if
             | the alternative is to the effect of ensuring that people
             | cannot use tools to hide their accent (and particularly if, as
             | above, the intent is so they can be discriminated against
             | based on it). Even though it isn't something we've really
             | been able to do before, I think it's generally a person's
             | own right to modify their voice.
        
       | aboardRat4 wrote:
       | Without Linux support it is going to have a very limited
       | audience.
        
         | okwhateverdude wrote:
         | There is nothing in here that precludes you from running this
         | on any OS that supports python + CUDA. They use miniconda for
         | installation of python and python packages, but this could just
         | as easily be a venv + system CUDA install or even better: a
         | container. This is only one tiny Dockerfile away from running
         | anywhere.
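For anyone trying the venv route described above, a quick pre-flight check can help before installing anything. This is a minimal sketch using only the Python standard library; the function name is hypothetical. Finding `nvidia-smi` on the PATH is used here only as a rough hint that an NVIDIA driver is installed. It does not prove that a CUDA toolkit or a GPU-enabled build of any particular package is present.

```python
import shutil
import sys


def environment_report() -> dict:
    """Collect a few basic facts about the current Python environment."""
    return {
        # Inside a venv, sys.prefix points at the venv while
        # sys.base_prefix points at the interpreter it was created from.
        "in_virtualenv": sys.prefix != getattr(sys, "base_prefix", sys.prefix),
        # Rough hint only: the NVIDIA driver ships the nvidia-smi tool.
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
        "python_version": "%d.%d" % sys.version_info[:2],
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```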
        
       | vunderba wrote:
       | I do think that voice cloning for personal usage has actual
       | genuine uses - in fact there was a relatively interesting news
       | article about a person who was irrevocably losing their voice who
       | had their vocal pattern cloned.
       | 
       | https://www.voanews.com/a/illness-took-away-her-voice-ai-cre...
       | 
       | That being said, it does seem a bit bizarre that the repo's home
       | page is proudly trumpeting the ability to co-opt other people's
       | identities without their permission (and yes your unique vocal
       | pattern is definitely part of your identity - I mean it's used in
       | some forms of biometric data). They're doing the project a bit of
       | a disservice.
        
         | onetokeoverthe wrote:
         | _proudly trumpeting the ability to co-opt other people's
         | identities without their permission_
         | 
         | EXACTLY. Clone the wrong person's voice and it's game over.
        
         | satvikpendem wrote:
         | It's useful for some things, like satire. Presidents Play is a
         | good series on YouTube where it uses US presidents' cloned
         | voices for comedic satire.
        
           | bbarnett wrote:
           | A gun is useful to shoot someone, what has that to do with it
           | being right or wrong?
        
             | satvikpendem wrote:
             | Not sure you picked the most cogent example because lots of
             | people will debate you on that topic...
        
         | VPenkov wrote:
         | It does have actual genuine uses. I'm in the process of
         | recording a series of tutorials for my peers but I'd like them
         | to hear things in my voice so it doesn't sound like I have
         | offloaded the work to someone else.
         | 
         | I don't know if this helps or harms the credibility but I can't
         | really talk more than an hour without seriously straining my
         | voice. So cloning it sounds like a great use-case for someone
         | with a similar problem.
         | 
         | Looking forward to trying this.
        
           | vunderba wrote:
           | I like this idea. I've been playing with the idea of having
           | all my blog entries have corresponding narration with my own
           | voice but I'd love to see some kind of voice cloner + gradio
           | interface that lets me make some adjustments to things like
           | cadence, delivery, etc. (I mean beyond just making me sound
           | like Alvin and the Chipmunks).
        
         | NoMoreNicksLeft wrote:
         | When my IoT geiger counter starts going off, I do want the in-
         | home PA system's voice to be Admiral Adama warning my family of
         | an imminent radiological threat, and preparing the Vipers for
         | launch.
         | 
         | Edward James Olmos if you're reading this, I'm willing to pay a
         | license fee, but then I expect actual recordings and not just
         | AI bullshit. I'm not pirating your voice, you're refusing to
         | let me hire it.
        
         | ranger_danger wrote:
         | Randy Travis also used AI on his last album after losing his
         | voice.
        
         | chefandy wrote:
         | Of course there are legitimate uses, which means everyone
         | should have completely unfettered access and nobody selling it
         | should be responsible for irresponsible users. Personally, I'm
         | sick of the government limiting my artistic freedom because the
         | mediums I use might be misused by a tiny group of bad actors.
         | For example, it's unnecessarily difficult to source pineapple
         | grenades for my large scale abstract punched tin crafts. The
         | other people who live in my apartment building haven't
         | complained when I asked if they had a problem with it, so
         | what's the problem? And when I can get ahold of it, white
         | phosphorous makes a great addition to my annual deep-woods
         | pyrotechnic light shows. I just don't understand this nanny
         | state garbage.
        
           | notpachet wrote:
           | Take my upvote you greedy bastard.
        
       | wingworks wrote:
       | Just heads up, this is a trial, you have to pay to use it after
       | 30mins..
       | 
       | Easier and (cheaper?) to just use elevenlabs.
        
         | vulcanidic wrote:
         | It's a bit of a hassle, but after closing the Windows command,
         | you can restart the program and use it indefinitely. The
         | results you worked on will still remain in the workspace
         | folder.
        
         | ldoughty wrote:
         | Yeah, felt like it positions itself as open source project here
         | and on GitHub, but buries the cost in other pages... Doesn't
         | even say the subscription cost anywhere I could find (in
         | English). Not a huge fan of this advertising model.
        
         | jamesy0ung wrote:
         | I haven't looked at the code, but can you just patch out the 30
         | minute limit?
        
           | batch12 wrote:
           | Looks to me like the app code is compiled into pyd files. One
           | could try and decompile. Interestingly, it's licensed as MIT.
        
       | XorNot wrote:
       | The real utility of something like this is for reducing the
       | creative costs of voice-acting. i.e. something like this is a
       | massive boon for mod-makers where making fully voiced anything
       | is a huge undertaking - i.e. while my friends and family could
       | probably provide their voice if I asked, getting a decent
       | recording and performance out of them is just not going to be
       | possible.
       | 
       | But if I can get the performance I want and shift it to another
       | voice, then fully voicing free works becomes very accessible
       | (even better would be generative AI which could take a sample of
       | what you want and re-render it into something which sounds like a
       | more professional performance - voice in-fill I suppose).
        
       | youngNed wrote:
       | I'm looking down the comments, but not really seeing much about
       | what this actually is, by my very quick look, it's a front end
       | for f5-tts with a yt-dlp and whisper?
       | 
       | Is there anything new in this?
        
         | dangoodmanUT wrote:
         | Yeah they made an easy to use frontend. Don't be the dropbox
         | guy
        
           | vulcanidic wrote:
           | I completely agree with you. This is just a web front-end,
           | and there's nothing new about it. However, it's very easy.
           | It's not easy to create something like this.
        
           | Uehreka wrote:
           | We can't just keep saying "Don't be the dropbox guy" as a
           | comeback to criticism of new technology. Anyone who uses that
           | phrase should have to place a bet in a prediction market that
           | only pays out if the product they're talking about succeeds.
           | Blindly supporting stuff out of a sort of "Pascal's Wager
           | against looking foolish later" should have some cost if
           | you're wrong.
        
             | bn-l wrote:
             | Let's default to being supportive and very careful with
             | being negative.
        
               | Uehreka wrote:
               | That kind of imbalance makes it easier for scammers and
               | hucksters to get away with things. It is not a feelgood
               | prescription with no cost.
        
           | youngNed wrote:
           | Wind your neck in.
           | 
           | I simply asked "is there anything new in this?" because, i
           | was interested to know if, you know, there was anything new
           | in this.
        
       | OceanBreeze77 wrote:
       | Are banks moving away from voice verification as a means to
       | identity checks? It seems like it's getting easier and easier to
       | clone voices.
        
       | tgv wrote:
       | I'm with the nay-sayers. Your product doesn't bring any good to
       | this world, but it does make it easier to harm people. It's a
       | disgrace.
        
         | Ylpertnodi wrote:
         | "If, by whiskey...."
        
       | Hard_Space wrote:
       | This doesn't appear to have any training facility, so its misuse
       | would seem to be limited to the pre-trained voices supplied - for
       | the casual user (and the ease-of-use seems to be the central
       | issue in these comments).
        
         | throwaway314155 wrote:
         | My experience with voice cloning is that training is typically
         | not required for it to work. You just embed a bit of audio of
         | the desired voice to be cloned using the backing VAE and the
         | model can do the rest.
         | 
         | Is it not the same with this project?
        
       | deskr wrote:
       | Isn't it funny how some text changes the voice in your head? Now
       | you're hearing the best voice. It's amazing. I tell you. It's the
       | greatest voice. Everybody's talking about it. They are saying
       | it's incredible. They say they've never heard as beautiful a
       | voice before.
        
         | cies wrote:
         | I needed until "Everybody's talking about it" to hear it in
         | _his_ voice :)
         | 
         | Please no spoilers!
        
         | amazingamazing wrote:
         | Voices can be beautiful.
        
         | bitwize wrote:
         | When Arnold Schwarzenegger was governor of California, he
         | refused clemency for notorious gang founder Stanley "Tookie"
         | Williams, who was sentenced to death for four murders in 1979.
         | 
         | https://www.ocregister.com/2005/12/12/governors-full-stateme...
         | 
         | Reading over the governor's statement explaining his reasons
         | for denial of clemency, my brain couldn't help but do so in an
         | Arnold voice. Sometimes, to amuse friends, I would read
         | portions of it aloud while doing the voice.
         | 
         | Maybe it's a bit tasteless, like the anime-girl Demon Core
         | memes, but there's just something about hearing the legal and
         | administrative justification for proceeding with an execution
         | in the voice of the Terminator.
         | 
         | I'm the same way with famous YouTubers. If I see "Guru Larry"
         | Bundy Jr. or Clint "LGR" Basinger leave a comment on someone
         | else's video, my brain reads it in their voice.
        
       | giarc wrote:
       | My neighbour is a detective and did a course on crypto scams. He
       | told me scammers call someone's cell phone, record their
       | voicemail greeting and use that to clone their voice. They can
       | then have a very real life conversation with their grandparent
       | and take their money.
       | 
       | I'm all for innovation, but I don't really see the use case of
       | cloning random voices to make podcasts? Listening to Zuck
       | interview Elon? ok...?
        
         | eurekin wrote:
         | Technically, wouldn't a simple "Hold on, I'll call you back"
         | test call stop that?
        
           | stitched2gethr wrote:
           | Yes, if the callee has reason to believe the caller isn't who
           | they say they are. But this will never enter the mind of
           | someone who's retirement age.
        
           | bagels wrote:
           | Some old people become very gullible.
        
             | Loughla wrote:
             | In all fairness, the number of old people who even know
             | that realistic recreations of their loved ones' voices is
             | even possible is probably pretty low.
        
           | a2128 wrote:
           | Scammers will use pressure and emotion. "Grandpa they put me
           | in jail, I need you to bail me out please, there's not much
           | time!" The last thing on the victim's mind is to hang up on
           | what sounds like their crying distressed grandson to call
           | them back. Sometimes even calling back won't work, the real
           | grandson isn't picking up their phone and the scammer is
           | saying that it's because they're in jail and their phone was
           | taken.
        
             | botanical76 wrote:
             | I've been thinking a lot about this possibility. I think
             | people will have to come up with family passwords
             | eventually. A word or phrase that is regularly practised,
             | but strictly private, for verification in times of crisis.
             | 
             | For example, my family's passphrase is- just kidding.
        
               | hollerith wrote:
               | Either that or Android and iOS will add something like
               | Caller ID but with actual authentication.
        
               | notpachet wrote:
               | My family already does this.
        
         | alias_neo wrote:
         | It's really easy for a technical person to do as well.
         | 
         | I use Coqui TTS[0] as part of my home automation, I wrote a
         | small python script that lets me upload a voice clip for it to
         | clone after I got the idea from HeyWillow[1], and a small shim
         | that lets me send the output to a Home Assistant media player
         | instead of using their standard output device. I run the TTS
         | container on a VM with a Tesla P4 (~£100 to buy) and get about
         | 1x-2x (roughly the same time it'd take to say it, to process)
         | using the large model.
         | 
         | Just for a giggle, I uploaded a few 3s-5s clips of myself
         | speaking and cloned my voice, then executed a command to our
         | living room media player to call my wife into the room; from
         | another room, she was 100% convinced it was myself speaking
         | words I'd never spoken.
         | 
         | I tried playing with a variety of sentences for a few hours and
         | overall, it sounded almost exactly like me, to me, with the
         | exception of some "attitude" and "intonation" I know I wouldn't
         | use in my speech. I didn't notice much of an improvement using
         | much longer clips; the short ones were "good enough".
         | 
         | Tangentially, it really bugs me that most phone providers in
         | the UK insist you record a "personal greeting" now before
         | they'll let you check your voice mail box, I just record
         | silence, because the last thing I want/need is a voicemail
         | greeting in my voice confirming to some randomer I didn't want
         | calling me, who I am and that my number is active, even more so
         | knowing how I can clone any voice to a reasonably good accuracy
         | with just a few seconds of audio.
         | 
         | [0] https://github.com/coqui-ai/TTS [1] https://heywillow.io/
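The "1x-2x" figure in the comment above reads like a real-time factor: how long synthesis takes relative to how long the resulting audio lasts. The comment does not say which way the ratio runs, so this sketch assumes processing time divided by audio duration (1.0 means it takes as long to generate as to say). The helper names are hypothetical, and the callable passed to `timed` is a stand-in for whatever TTS call is actually used; nothing here is Coqui's API.

```python
import time
from typing import Callable, Tuple


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by the duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def timed(fn: Callable[[], object]) -> Tuple[object, float]:
    """Run `fn` once and return its result plus elapsed wall-clock seconds."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start
```

For example, if generating a 4-second clip takes 6 seconds, `real_time_factor(6.0, 4.0)` gives 1.5, which would sit inside the range quoted above under this reading.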
        
       | morkalork wrote:
       | Just need to use this with some recordings of Majel Barrett, make
       | a voice interface for Claude's computer use agent and we'll be
       | all set.
        
       | pmarreck wrote:
       | > Linux and Mac OS are not supported
       | 
       | Well, that's a big old fail. Just a reminder: The given (and
       | proper) home of open source is on an open source OS.
        
       | bguberfain wrote:
       | Thanks for sharing this! But I have some doubts about hidden
       | installation procedures. It imports all functions from one_click
       | (from one_click import *), which points to a compiled file. It
       | then runs functions like install_webui and
       | install_extra_packages. That is suspicious, to say the least.
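
A minimal sketch of how wildcard imports like the one described above could be located without running any of the project's code. It uses Python's standard `ast` module to parse source files; the helper names `find_star_imports` and `scan_tree` are hypothetical and are not part of Voice-Pro. It only reports where `from X import *` appears. It cannot say what a compiled target module actually does.

```python
import ast
from pathlib import Path


def find_star_imports(source: str) -> list[str]:
    """Return the module names that `source` pulls in via `from X import *`."""
    tree = ast.parse(source)  # parses only; nothing in `source` is executed
    modules = []
    for node in ast.walk(tree):
        if isinstance(node, ast.ImportFrom) and any(
            alias.name == "*" for alias in node.names
        ):
            # node.module is None for relative forms such as `from . import *`
            modules.append("." * node.level + (node.module or ""))
    return modules


def scan_tree(root: str) -> dict[str, list[str]]:
    """Map each .py file under `root` to the modules it star-imports."""
    hits = {}
    for path in sorted(Path(root).rglob("*.py")):
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
            found = find_star_imports(text)
        except (SyntaxError, ValueError):
            continue  # skip files that do not parse as Python source
        if found:
            hits[str(path)] = found
    return hits
```

Running `scan_tree` on a local checkout gives a list of files to read by hand first; any module it names that exists only in compiled form is a candidate for closer inspection.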
        
         | vulcanidic wrote:
         | Try recording the installation process with a camera. The
         | entire installation process is displayed in the Windows
         | command prompt. It's just installing Python packages and
         | downloading the AI model and audio files. That's all.
        
           | didibus wrote:
           | Pretty easy for a script to not print everything it does at
           | the command line. You have to inspect the code if you want to
           | be sure.
        
           | bguberfain wrote:
           | The file I mentioned is just the beginning... there is a
           | folder full of .dll files, renamed to .pyd. I understand that
           | this is the proprietary part, which limits usage to 30
           | minutes, but I think it is too closed for an MIT license.
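
A rough sketch of how one might take inventory of compiled files in a checkout before running anything, in the spirit of the concern above. The function name `find_opaque_modules` and the suffix list are assumptions chosen for illustration, not anything taken from the Voice-Pro repository. A file is reported when no same-named `.py` source sits next to it, which is a heuristic and not proof that the file is harmful or harmless.

```python
from pathlib import Path

# Suffixes treated as "compiled" here; an assumed list, adjust as needed.
BINARY_SUFFIXES = {".pyd", ".dll", ".so", ".pyc"}


def find_opaque_modules(root: str) -> list[str]:
    """List compiled files under `root` that have no matching .py source."""
    root_path = Path(root)
    opaque = []
    for path in sorted(root_path.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in BINARY_SUFFIXES:
            continue
        # Strip ABI tags such as "mod.cp310-win_amd64.pyd" down to "mod".
        stem = path.name.split(".", 1)[0]
        # Cached bytecode lives in __pycache__; its source is one level up.
        source_dir = path.parent
        if source_dir.name == "__pycache__":
            source_dir = source_dir.parent
        if (source_dir / f"{stem}.py").exists():
            continue  # readable source is available for this module
        opaque.append(path.relative_to(root_path).as_posix())
    return opaque
```

Anything this returns cannot be reviewed by reading source, so it would need other measures, such as running it only on an isolated machine.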
        
         | lysace wrote:
         | I have resorted to using separate physical computers + VLAN
         | network separation when exploring untrusted AI workloads. Yes,
         | it costs, but so does a breach.
         | 
         | Thanks for raising this aspect.
         | 
         | Btw https://github.com/haimgel/display-switch helps a lot.
        
       | jordimash wrote:
       | If you are looking for automatic dubbing without voice cloning:
       | https://github.com/Softcatala/open-dubbing
        
         | kristopolous wrote:
         | The syncing of the original English is way off. I don't really
         | know how they got that to be so broken.
        
       | sroussey wrote:
       | The description, since many commenters are not clicking through
       | but are asking questions that it answers:
       | 
       | Comprehensive Gradio WebUI for audio processing, powered by
       | Whisper engines (Whisper, Faster-Whisper, Whisper-Timestamped).
       | Features Voice Changer, zero-shot Voice Cloning (E2, F5-TTS),
       | YouTube downloading, vocal isolation (UVR5), Text-to-Speech
       | (Edge-TTS), and multi-language translation. Perfect for content
       | creators and developers.
        
       ___________________________________________________________________
       (page generated 2024-11-28 23:01 UTC)