[HN Gopher] OpenVoice: Versatile Instant Voice Cloning
       ___________________________________________________________________
        
       OpenVoice: Versatile Instant Voice Cloning
        
       Author : saeedesmaili
       Score  : 258 points
       Date   : 2024-01-01 15:16 UTC (7 hours ago)
        
 (HTM) web link (arxiv.org)
 (TXT) w3m dump (arxiv.org)
        
       | peddling-brink wrote:
       | GitHub: https://github.com/myshell-ai/OpenVoice
       | Checkpoint:
       | hxxps://myshell-public-repo-hosting.s3.amazonaws.com/checkpoints_1226.zip
       | 
       | (Checkpoint link defanged because I'm allergic to direct links to
       | zip files hosted on Amazon. Nor have I reviewed what the file
       | contains.)
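
For readers unfamiliar with the term, "defanging" here just means rewriting the URL scheme (`https` to `hxxps`) so that clients do not turn the text into a clickable link. The sketch below is a minimal illustration of that idea; `defang` and `refang` are hypothetical helper names chosen for this example, not functions from any library. It also shows that the transformation is trivially reversible, which is the crux of the disagreement in the replies.

```python
def defang(url: str) -> str:
    """Make a URL non-clickable by swapping the scheme's 'tt' for 'xx'."""
    if url.startswith("https://"):
        return "hxxps://" + url[len("https://"):]
    if url.startswith("http://"):
        return "hxxp://" + url[len("http://"):]
    return url


def refang(url: str) -> str:
    """Undo defang(): restore the original scheme."""
    if url.startswith("hxxps://"):
        return "https://" + url[len("hxxps://"):]
    if url.startswith("hxxp://"):
        return "http://" + url[len("hxxp://"):]
    return url


# The checkpoint link as quoted in the comment above.
link = "https://myshell-public-repo-hosting.s3.amazonaws.com/checkpoints_1226.zip"
defanged = defang(link)
print(defanged)
print(refang(defanged) == link)
```

Nothing about the target file changes; only the text of the link does, so the reader still has to decide whether to fetch it.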
        
         | crazysim wrote:
         | Thanks for the link to the repo. It's very useful.
         | 
         | As for the checkpoint, I'm not allergic and I don't do security
         | theater:
         | 
         | https://github.com/myshell-ai/OpenVoice?tab=readme-ov-file#i...
         | links to
         | 
         | https://myshell-public-repo-hosting.s3.amazonaws.com/checkpo...
        
           | peddling-brink wrote:
           | Why do you call that security theater? I found and provided
           | the information, but didn't make it clickable. Anyone can
           | decide for themselves to navigate there.
           | 
           | Your comment comes off as passive aggressive.
        
             | IshKebab wrote:
             | I think he's referring to your "defanging" which you
             | implied was security related but doesn't actually achieve
             | anything at all.
        
               | fieldcny wrote:
               | They are making you think about what you are doing before
               | you click the link. That's not theatre; that's keeping
               | people from clicking arbitrary links to zip files which
               | can auto-execute code once downloaded.
               | 
               | I'd suggest that those who think it is theatre probably
               | don't understand the implications of that action.
        
               | arccy wrote:
               | Just downloading a zip file won't auto-execute anything,
               | and you can't meaningfully review it without downloading
               | it, so it pretty much is security theatre.
        
               | seabass-labrax wrote:
               | On which operating systems can Zip files automatically
               | self-execute? Android .APKs come to mind, although in
               | this case, Android asks you whether you want to install
               | the application and thus gives you a chance to prevent
               | the execution.
        
               | IshKebab wrote:
               | We understand exactly the implications of that action.
               | There are no implications.
               | 
               | Simply downloading a zip from Amazon has zero risk. Even
               | _opening_ an arbitrary zip has essentially zero risk. RCE
               | from opening a zip is obviously a really critical and
               | valuable vulnerability and would not be wasted with a
               | public link.
               | 
               | Combine that with the fact that this comes from a voice
               | cloning GitHub repo and the chance of this having some
               | 0-day is infinitesimal.
               | 
               | Finally, just making the link non-clickable does not add
               | security. Nobody can take any action to increase their
               | security because they have to slightly edit a link (not
               | that they would, because it's sensibly a clickable link
               | in the GitHub readme).
               | 
               | So yes, I fully understand the implications and it is
               | definitely security theatre.
               | 
               | I suggest that those who think that it _isn't_ probably
               | haven't really thought about the threat model.
        
             | janalsncm wrote:
             | What is the threat vector of the functional https link that
             | hxxps solves?
        
         | dotancohen wrote:
         | What does allergic mean in this context?
        
           | peddling-brink wrote:
           | That file could contain anything. I don't know the authors or
           | have any idea of their reputation.
           | 
           | I wanted to expose it so people didn't have to comb through
           | the github, but decided to make it unclickable out of an
           | abundance of caution. This appears to have offended people.
           | 
           | I would not have hesitated to link to hugging face. That is a
           | known quantity.
        
             | chrisweekly wrote:
             | FWIW I appreciate the courtesy and context; agreed that
             | it's not the best idea to link directly to zip files (let
             | alone those of questionable provenance).
        
       | dcreater wrote:
       | Any GitHub link?
        
         | saeedesmaili wrote:
         | Github: https://github.com/myshell-ai/OpenVoice Demo:
         | https://research.myshell.ai/open-voice
        
       | smellf wrote:
       | Examples: https://research.myshell.ai/open-voice
       | 
       | Seems impressive!
        
       | colesantiago wrote:
       | > This repository is licensed under a Creative Commons
       | Attribution-NonCommercial 4.0 International License, which
       | prohibits commercial usage. MyShell reserves the ability to
       | detect whether an audio is generated by OpenVoice, no matter
       | whether the watermark is added or not.
       | 
       | So it is not 'open' then and you cannot make money out of this?
        
         | DandyDev wrote:
         | It is open, just not by your definition. You can view, use and
         | modify the code to your heart's content. Sounds pretty open to
         | me!
        
           | jahewson wrote:
           | Open for business!
           | 
           | No wait...
        
           | CaptainFever wrote:
           | To be specific, while it is not a bad license, it does not
           | qualify for the Free Cultural Works mark as defined by the
           | Creative Commons and Freedom Defined:
           | https://creativecommons.org/public-domain/freeworks/
        
           | bbor wrote:
           | Well... "use" isn't exactly free, thus the complaint. On a
           | scale of free to not free, "cannot use this for my work" is a
           | pretty big jump to the latter end IMO
        
             | c0pium wrote:
             | Careful, you're saying the quiet part out loud; freedom is
             | about profiting off the uncompensated work of others.
        
               | bbor wrote:
               | Well ultimately we all need to eat. If someone wants to
               | be compensated in today's society, they either need to
               | join a gift-based sub-society (see: OSS foundations, NGOs
               | in general) or sell something. Trust me, I totally agree
               | that freedom of information should be a completely
               | separate concern from resource allocation
               | 
               | EDIT: I guess there's a third option, "work another job
               | and use OSS on your off hours". Which feels... idk,
               | disrespectful of the whole enterprise. OSS software
               | development is important enough to deserve a wage IMO, to
               | say the least
        
               | satvikpendem wrote:
               | To your last point, people pay what the market will bear.
               | In this case, it's free, so don't be surprised that if
               | you give something away for free that people, well, take
               | it for free. Importance has nothing to do with it.
        
           | beardog wrote:
           | As long as your heart's content isn't commercial
        
           | cjbprime wrote:
           | And not by opensource.org's definition, which prohibits use
           | restrictions. It's not reasonable to act like OP is being
           | idiosyncratic when this fails to meet the protected
           | definition of "open source".
        
             | gpm wrote:
             | The term "open source" is not protected, the OSI
             | (opensource.org) attempted and failed to acquire a
             | trademark on that term.
        
               | cjbprime wrote:
               | Fair enough. Is there _any_ shared definition of "open
               | source" which permits use restrictions, then?
        
           | abetusk wrote:
           | By the commonly held definition of open, in the context of
           | "open source", it is not open.
           | 
           | > You can view, use and modify the code to your heart's
           | content.
           | 
           | The non-commercial clause of their license specifically
           | prohibits commercial use, so we cannot use this source, and
           | presumably the data that the source uses, to our hearts'
           | content.
           | 
           | The OSI has a definition of open source that clearly states
           | commercial use must be permitted [0].
           | 
           | Wikipedia's entry on Open Source Licensing also stipulates
           | that commercial re-use must be permitted [1].
           | 
           | There is a term called "source available" which is more in
           | line with your intent.
           | 
           | [0] https://opensource.org/osd/
           | 
           | [1] https://en.wikipedia.org/wiki/Open-source_license
        
         | throwup238 wrote:
         | You can't. Scammers who don't care about noncommercial licenses
         | sure can!
        
           | pclmulqdq wrote:
           | Yep, this is one of those "only bad actors" licenses,
           | probably as a cash grab.
           | 
           | It will _definitely_ stop those bad actors from scamming
           | people this time, right? Right?
        
           | evanmoran wrote:
           | This is the most insightful take. Licenses like this prevent
           | certain businesses in certain countries, but it is quite
           | harmful as it adds a powerful tool for
           | propaganda/scammers/etc who don't care about the laws.
           | 
           | Additionally, it only really hurts small businesses &
           | startups as the big companies all have teams that can make
           | their own version or pay for 3rd-party APIs easily. So
           | yeah, us startup folks won't like this license much as it
           | basically is aimed at us the most.
           | 
           | Either way, congrats on the tech. It does look very
           | impressive!
        
             | cyanydeez wrote:
             | Erm, its existence provides for scammers.
             | 
             | Unless you're proposing its use in detecting itself is
             | somehow symmetrical, which I really don't think is
             | anything but unproven conjecture.
        
       | iAkashPaul wrote:
       | Those watermark detection rights at the end are real sus
        
         | diggan wrote:
         | What exactly are you talking about? The paper doesn't mention
         | any watermark at all, as far as I can see/search.
        
           | cwillu wrote:
           | The readme on the linked github reads: "MyShell reserves the
           | ability to detect whether an audio is generated by OpenVoice,
           | no matter whether the watermark is added or not."
        
             | lostlogin wrote:
             | As you say, right at the bottom:
             | https://github.com/myshell-ai/OpenVoice
        
               | diggan wrote:
               | Ah, thank you. Guess it's OK for the company/service to
               | do whatever they want; the paper/technique doesn't
               | involve watermarks, so it'd be easy to remove/modify
               | whatever they do in the library/service itself.
        
         | fbdab103 wrote:
         | At least right now, there is a literal add_watermark function,
         | so probably easy enough to remove that surface level. Unless
         | they added something cute to the training data to poison the
         | well.
         | 
         | https://github.com/myshell-ai/OpenVoice/blob/a33963c3d764bee...
        
       | huqedato wrote:
       | Welcome to the new era of fakes and scams beyond our wildest
       | imagination!
        
         | danielbln wrote:
         | ElevenLabs has been around for a while now. The genie has been
         | out of the bottle for a bit, and the sooner the notion that
         | anything digital can be easily faked seeps into the wider
         | consciousness the better. Trust nothing.
        
           | ethanbond wrote:
           | It can both be true that people need to adapt/"trust
           | nothing," and that this is bad.
        
           | ignu wrote:
           | I've seen some prank calls (a YouTuber cloned Tucker
           | Carlson's voice and called Alex Jones) but he just had a
           | sound bank with a few pre-generated lines and it fell apart
           | pretty quickly.
           | 
           | At least for now there's too much lag to do a real time
           | conversation with a cloned voice.
           | 
           | Speech to Text > LLM Response > Generate Audio
           | 
           | If that time can shrink to subsecond, I think there'll be
           | madness. (Specifically thinking of romance scammers)
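
The three-stage pipeline above runs sequentially, so the delay before the cloned voice answers is roughly the sum of the stage latencies. The sketch below is a minimal illustration of that budget check; the stage names mirror the comment, but the numbers are hypothetical placeholders, not measurements of any real system.

```python
# Hypothetical per-stage latencies in seconds; placeholders, not measurements.
STAGES = [
    ("speech_to_text", 0.40),
    ("llm_response", 0.80),
    ("generate_audio", 0.60),
]


def total_latency(stages):
    """Stages run one after another, so their latencies add up."""
    return sum(seconds for _, seconds in stages)


def is_subsecond(stages):
    """True when a whole conversational turn finishes in under one second."""
    return total_latency(stages) < 1.0


print(f"total: {total_latency(STAGES):.2f}s, subsecond: {is_subsecond(STAGES)}")
```

With these assumed numbers the turn takes well over a second; every stage would have to shrink before the total drops below the threshold the comment describes.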
        
             | shinycode wrote:
             | Awful, bots on their own having real conversations with
             | people with the voice of a loved one. Scamming on steroids
        
             | ben_w wrote:
             | At last summer's WeAreDevelopers World Congress in Berlin,
             | one of the talks I went to was by someone who did this with
             | their own voice, to better respond to (really long?)
             | WhatsApp messages they kept getting.
             | 
             | It worked a bit too well, as it could parse the sound file
             | and generate a complete response faster than real-time,
             | leading people to ask if he'd actually listened to the
             | messages they sent him.
             | 
             | Also they had trouble believing him when he told them how
             | he'd done it.
        
           | smt88 wrote:
           | > _the notion that anything digital can be easily faked
           | seeps into the wider consciousness the better. Trust
           | nothing._
           | 
           | This is a society-destroying idea.
           | 
           | Most of us, especially younger people, only know how to vote,
           | where there are wars, or even what our parents are doing by
           | using digital media.
           | 
           | If digital media becomes untrustworthy, everyone will live in
           | a warped and fragile alternate reality that no one can agree
           | on.
        
             | diggan wrote:
             | > Trust nothing
             | 
             | > This is a society-destroying idea.
             | 
             | Believe it or not, this is how much of the population saw
             | The Internet when it first came close to being mainstream.
             | Everyone and their mother said "Don't believe anything you
             | read on the cybernet", which ended up ironic as everyone
             | and their mother ended up being the ones to believe
             | anything on the cybernet anyways.
             | 
             | > everyone will live in a warped and fragile alternate
             | reality that no one can agree on.
             | 
             | How is this any different from today? The various corners
             | of the internet (which is mostly divided by languages:
             | English, Russian, Spanish, Chinese and Portuguese) already
             | have these vastly different realities and ground-truths.
             | 
             | I'm sure we could survive another Internet-Winter where
             | people trust everything a bit less than today.
        
               | smt88 wrote:
               | It's vastly different than today because today (or at
               | least a few years ago), I could trust videos and voices
               | delivered digitally. I can't do that anymore.
        
               | ilikehurdles wrote:
               | Technology and society will adapt. Just as we adapted
               | encryption to verify credentials and secure banking data
               | online, we'll end up with a validation signal for video
               | and audio.
        
         | treprinum wrote:
         | VALL-E has been on GitHub for over a year already...
        
         | underlines wrote:
         | This era is hardly new. Look at how old some of the projects
         | are:
         | 
         | https://github.com/underlines/awesome-ml/blob/master/audio-a...
         | 
         | The thing that changes is the complexity to run it. I was
         | training my wife's voice and my voice for fun and needed 15min
         | of audio and trained on my 3080 for 40 minutes.
         | 
         | Now it's 2 Minutes.
        
           | thfuran wrote:
           | Yes, and the more accessible it is, the more widespread it
           | will be.
        
         | ponector wrote:
         | Maybe this will teach people to raise their awareness and take
         | personal security seriously. Like not trusting anyone who is
         | calling, especially from a legacy line. Phone numbers and
         | voices can be easily cloned.
        
       | gnfargbl wrote:
       | I had to phone my bank, which is one of the bigger players in the
       | UK high street market, a couple of days ago. They're _still_
       | encouraging me to enroll in their idiotic "my voice is my
       | password" programme. At this stage in the evolution of AI, that
       | feels simply negligent.
        
         | toss1 wrote:
         | Fidelity Investments just did something even worse ~a week ago
         | - It asked me to reply to a few questions, then announced that
         | I'd just been enrolled in its voice identification program (or
         | whatever they call it).
         | 
         | Now I've got Just Another Item on my ToDo list, to get that
         | undone. Gawd, does every company promote its stupidest people
         | to management?
        
           | crazysim wrote:
           | Clone management's voices and post it to their social
           | media/etc. Super undo it!
        
           | hasty_pudding wrote:
           | They promote their best schmoozers to management.
           | 
           | They have so much money that competence no longer matters and
           | bootlicking will get you much farther.
        
           | throwup238 wrote:
           | _> Gawd, does every company promote its stupidest people to
           | management?_
           | 
           | Yes: https://en.wikipedia.org/wiki/Dilbert_principle
           | 
           | Ironically, this is the place where they can do the _least_
           | damage.
        
           | ben_w wrote:
           | I don't know if GDPR (or any of its cousins) applies to you,
           | but this kind of thing sounds _exactly_ like the sort of
           | thing it's supposed to outlaw.
        
             | jokethrowaway wrote:
             | How? Your bank stores personal data covered by GDPR, but
             | enabling crappy security systems is not the domain of GDPR.
             | 
             | Most likely this is caused by SCA, another European
             | directive that ruined our lives with extra security hoops
             | (for payment providers) for little extra security - or even
             | worse, in the case of voice passwords or security questions.
        
               | ben_w wrote:
               | A person's voice is, I believe, personal data.
               | 
               | > Processing personal data is generally prohibited,
               | unless it is expressly allowed by law, or the data
               | subject has consented to the processing
               | 
               | - https://gdpr-info.eu/issues/consent/
        
               | seabass-labrax wrote:
               | Importantly, you can also revoke consent at any time
               | under the GDPR. Unlimited consent isn't possible, so the
               | bank would have to make the (dubious) claim that such
               | processing did not require permission at all.
        
         | Havoc wrote:
         | Investec? Yeah thinking I need to phone them to disable mine
        
       | cwillu wrote:
       | From the github readme:
       | 
       | "MyShell reserves the ability to detect whether an audio is
       | generated by OpenVoice, no matter whether the watermark is added
       | or not."
       | 
       | Call me skeptical...
        
       | hasty_pudding wrote:
       | Holy cow! If this works without curated audio...this is amazing!
        
       | senthilnayagam wrote:
       | The current leader in open source voice cloning is RVC; I would
       | like to see how this compares to it.
        
         | echelon wrote:
         | RVC is voice conversion (audio to audio), and it's typically
         | finetuned.
         | 
         | This is zero shot TTS. Samples create vector encodings that
         | serve as input to inference. There's no retraining the model
         | unless you want it to generalize or perform better.
        
       | pclmulqdq wrote:
       | Wonderful company, not a scam at all:
       | https://docs.myshell.ai/tokenomics
        
       | SubiculumCode wrote:
       | What's with the crypto thing?
        
       | programjames wrote:
       | I love this paper. It reads very much like "this is what we did,
       | and we want to help others do it too." Also, the section "Remark
       | on Novelty" is golden: "OpenVoice does not intend to invent the
       | submodules in the model structure ... The contribution of
       | OpenVoice is the decoupled framework that seperates the voice
       | style and language control from the tone color cloning." They
       | don't try to hype up their contribution.
        
       | jamespattn wrote:
       | Can someone give me a practical use case where this adds a net
       | benefit to society?
        
         | shinycode wrote:
         | Aside from the fact that it will be easier to scam people, I
         | fail to see benefits. We can already translate everything with
         | the same synthetic voice
        
         | Lerc wrote:
         | Unifying a voice in tutorial videos so that the difference in
         | voice does not distract the learner.
         | 
         | Auto non-toxic rephrasing of online chat in video games, let
         | people hear their voice but paraphrase what they said in a
         | manner that doesn't turn the platform into a cesspit.
         | 
         | Cloning your own voice so that you can turn a script into audio
         | without 50 takes and then having to remove a million Ums and
         | errs.
        
           | grayhatter wrote:
           | > Auto non-toxic rephrasing of online chat in video games,
           | let people hear their voice but paraphrase what they said in
           | a manner that doesn't turn the platform into a cesspit.
           | 
           | that feels very orwellian
        
             | ben_w wrote:
             | George Orwell -- 'If you want a picture of the future,
             | imagine a boot stamping on a human face--for ever.'
             | 
             | I think this is closer to the direction of Huxley in Brave
             | New World, where a deeper understanding of how to
             | manipulate without brute force creates a very different
             | dystopian society than 1984.
        
               | haroldp wrote:
               | "Don't you see that the whole aim of Newspeak is to
               | narrow the range of thought? In the end we shall make
               | thoughtcrime literally impossible, because there will be
               | no words in which to express it."
        
               | ben_w wrote:
               | Censorship by itself doesn't stop people thinking (or
               | even expressing) forbidden thoughts, it stops a person's
               | words reaching other people.
               | 
               | BNW had a similar effect by conditioning, rather than by
               | applying the strong form of the Sapir-Whorf hypothesis.
        
           | paradox460 wrote:
           | Real time translation in the speaker's own voice.
        
             | treetalker wrote:
             | While listening to the examples given, I noted the cross-
             | language ones. I'm eager to improve my accents in my
             | nonnative languages by cloning my voice and comparing
             | recordings of how I do sound with how I would sound as a
             | native speaker!
        
             | abetusk wrote:
             | This is an exceptional use case!
             | 
             | Mr Beast talked about translating his videos to other
             | languages to get more reach. This can be done for people
             | with limited budget or just in general so people can watch
             | videos without needing subtitles.
             | 
             | I wouldn't be surprised if we saw this incorporated into YT
             | in the near future.
        
         | diggan wrote:
         | Person A used to be able to speak, but lost their voice in an
         | accident/because of reason Y. Luckily, there is surviving
         | audio/video with their voice on it, so a text-to-voice with
         | their own voice could be created for them to use.
        
         | goodluckchuck wrote:
         | Possibly speech therapy.
         | 
         | Certainly entertainment. Movies / TV. It opens a new
         | opportunity for videogames with generative characters.
        
         | kushie wrote:
         | apple has Personal Voice for accessibility
        
         | ldoughty wrote:
         | From an indie game dev standpoint, I can probably say a
         | sentence or two in a given way using my standard headset
         | microphone.. and something like this would allow for clean
         | voice lines fairly easily, as long as they don't need to stress
         | too much emotion... But for a $0 game, that would still be
         | beneficial. Imagine all the 2D Zelda/FF like games that don't
         | get played today because people would rather listen to dialogue
         | than read.
         | 
         | Of course, there's also the preservation of the voice of a
         | loved one. I would probably pay to hear my father's voice again
         | but there's probably only one or two VHS tapes with his voice
         | on it.
        
         | nickpsecurity wrote:
         | My pastor has an injured vocal cord that makes him sound
         | gritty at times. A technology like this applied to old copies
         | of his speaking might make him sound like he used to. I don't
         | know if he'd use something like that since we mostly rely on
         | the Spirit of Christ to open hearts to the truth.
         | 
         | Outside public speakers, there are probably other people who've
         | lost their voice or have trouble vocalizing who might want to
         | sound like their old selves. This could help them.
         | 
         | Disclaimer: I think these techs will more often do damage than
         | good. I'm just brainstorming an answer to your question.
        
         | grayhatter wrote:
         | No.
         | 
         | The real answer is yes, I could probably come up with some
         | contrived examples, like I lost my voice in a freak LLM
         | accident and now want to clone my old voice. But this doesn't
         | (you don't?) really need a net benefit reason to figure it out
         | and publish it. Because why? I assume, because "this shouldn't
         | exist!" which is just a more palatable way to phrase "won't
         | someone think of the children".
         | 
         | Society doesn't benefit from ignorance, so given it can exist,
         | what's the problem with it existing? Why does it need a
         | practical reason? Because people will do bad things with it?
         | Duh, but I'd rather everyone know than just the bad guys
        
           | johnnyworker wrote:
           | > Why does it need a practical reason?
           | 
           | To at least give us something as a consolation for all the
           | havoc all sorts of deep fakes will wreak on societies. It's
           | like asking what a knife can be used for other than murder.
           | It's a valid question.
        
           | jamespattn wrote:
           | My question wasn't to imply that I don't think a given
           | technology should or shouldn't exist.
           | 
           | I was curious to see if anyone could name at the top of their
           | head some practical use cases that they feel net out the
           | potential harms of cloning and misusing someone else's voice.
           | 
           | There's some nice and certainly practical examples, but I
           | don't feel any of them would net out the harms.
           | 
           | Perhaps there's a use case that we can't even comprehend yet
           | that would though!
        
           | lbrunson wrote:
           | By this logic there shouldn't be regulation on anything,
           | because the bad guys will have it anyway.
           | 
           | While you can't make it go away, you can disincentivize
           | propagation and use which can be the difference between
           | thousands of cases of scams/extortions and millions. Until
           | there's a stronger argument for voice cloning models (talking
           | to a dead loved one is creepy and not a positive argument)
           | then we shouldn't encourage tools with overwhelmingly
           | nefarious utility.
        
         | abetusk wrote:
         | James Earl Jones, presumably hedging against his eventual
         | demise, has allowed his voice to be used for things like the
         | Star Wars franchise [0].
         | 
         | Small, independent film makers can now use a skeleton crew to
         | voice parts.
         | 
         | I can't imagine it would be anything other than a niche
         | service, but hearing the voice and, potentially, interacting
         | with a chatbot/LLM with the voice of a passed loved one.
         | 
         | This is off the top of my head. I would also guess that this
         | technology is a stepping stone for other weird, interesting and
         | profoundly helpful uses.
         | 
         | [0] https://www.theverge.com/2022/9/24/23370097/darth-vader-jame...
        
         | stale2002 wrote:
         | Well we could just look at the obvious and existing use cases
         | for text to speech stuff.
         | 
         | Alexa, Siri, and similar are all commonplace.
         | 
         | Another huge use case would be anything to do with voice acting.
         | Either in video games, cartoons, or the like.
         | 
         | This would completely democratize voice acting material, and
         | would empower anyone to be able to do this for cheap.
        
           | mattlondon wrote:
           | ... and put 99% of voice actors out of business. We'll
           | eventually end up with _every_ TV show, movie, and video
           | game being voiced by Ryan Gosling and Beyonce because market
           | research.
        
         | dqv wrote:
         | If you've ever done voice prompt recordings for a phone system,
         | voice cloning would be super helpful for doing one-off tweaks,
         | especially if you have to record a bunch. Instead of
         | rerecording 20 messages, which can sometimes take hours, you
         | can use a clone of your own voice to make the necessary
         | modifications. My friend does a lot of recordings as part of
         | his job and when I showed him the Adobe voice editing preview
         | he got really excited. It has the potential to make tweaks a
         | lot easier, less time-consuming, and reduce voice strain.
        
         | userbinator wrote:
         | You would be able to translate media into the language of your
         | choice while also retaining the original voices.
        
       | qwertox wrote:
       | And suddenly it becomes a bit weird:
       | 
       | https://docs.myshell.ai/tokenomics
       | 
       | Tokenomics
       | 
       | Disclaimer: MyShell is currently in the testing phase, and the
       | content of the whitepaper may be subject to change in the future.
       | 
       | $SHELL is the token used for user incentive, governance and in-
       | app utility.
       | 
       | The total supply of $SHELL is 1,000,000,000
        
         | diggan wrote:
         | And luckily, this submission seems to be about the
         | paper/technology OpenVoice, not about the company MyShell
         | (whatever that is).
        
           | qwertox wrote:
           | License[0]: This repository is licensed under a Creative
           | Commons Attribution-NonCommercial 4.0 International License,
           | which prohibits commercial usage. MyShell reserves the
           | ability to detect whether an audio is generated by OpenVoice,
           | no matter whether the watermark is added or not.
           | 
           | [0] https://github.com/myshell-ai/OpenVoice
        
       | z991 wrote:
       | I commend the authors on making this easy to try! However, it
       | doesn't work very well for me for general voice cloning. I read
       | the first paragraph of the Wikipedia page on books and had it
       | generate the next sentence. It's obviously computer generated to
       | my ear.
       | 
       | Audio sample: https://storage.googleapis.com/dalle-
       | party/sample.mp3
       | 
       | Cloned voice (converted to mp3):
       | https://storage.googleapis.com/dalle-party/output_en_default...
       | 
       | All I did was install the packages with pip and then run
       | "demo_part1.ipynb" with my audio sample plugged in. Ran almost
       | instantly on my laptop 3070 Ti / 8GB. (Also, I admit to not
       | reading the paper, I just ran the code)
        
         | pclmulqdq wrote:
         | Looking at the website and the examples, it's pretty clearly
         | set up to make stylized anime voices.
        
         | fbdab103 wrote:
         | Thanks for the real example. Sounded quite generated to my ear
         | as well. Wonder if it can do any better with more source
         | material.
        
         | thorum wrote:
         | My experience with other tools like xtts is you really need to
         | have a studio-quality voice sample to get the best results.
        
           | amluto wrote:
           | The most obvious problem to my ears is the syllable timing
           | and inflection of the generated speech, and, intuitively,
           | this doesn't seem like a recording quality issue. It's as if
           | it did a mostly credible job of emulating the speaker trying
           | to talk like a robot.
        
             | hwillis wrote:
             | The biggest trip-up is the pronunciation of
             | "prototypically", and you had "typically" in your original.
             | Maybe it's overfitting to a stilted proto-typically? Could
             | try with a different, less similar sentence.
        
         | dijksterhuis wrote:
         | > It's obviously computer generated to my ear.
         | 
         | From the README:
         | 
         | > Disclaimer: This is an open-source implementation that
         | > approximates the performance of the internal voice clone
         | > technology of myshell.ai. The online version in myshell.ai has
         | > better 1) audio quality, 2) voice cloning similarity, 3) speech
         | > naturalness and 4) computational efficiency.
        
       | tremarley wrote:
       | Their Tokenomics page says:
       | 
       | $SHELL is the token used for user incentive, governance and in-
       | app utility.
       | 
       | The total supply of $SHELL is 1,000,000,000
       | 
       | Team, Treasury, Advisors & Private Sale = 55% allocation
       | 
       | Community Incentive = 40% allocation
       | 
       | Liquidity = 5%
        
       | monkeydust wrote:
       | So I guess we could (legally) now create a voice chatbot using
       | Mickey Mouse audio from Steamboat Willie?
        
         | andylynch wrote:
         | Possibly, except there is no dialogue in it.
        
       | starwin1159 wrote:
       | I hope someone can handle Cantonese one day
        
       | RagnarD wrote:
       | My first and ongoing thought is that immoral/criminal uses of
       | voice cloning vastly exceed any legitimate ones.
        
         | airstrike wrote:
         | Which just means we need to build protocols around this risk,
         | rather than foolishly trying to shove the genie back in the
         | bottle, lest we be left with _only_ the criminal uses
        
         | squigz wrote:
         | Out of curiosity, what/how many legitimate use cases have you
         | considered?
        
         | graphe wrote:
         | What of commercial uses being greater than illegitimate ones?
         | YouTube will give people the ability to hear it in their own
         | localized language in the author's voice.
        
       | ijhuygft776 wrote:
       | Is there some similar software that allows you to add, let's say,
       | 40 years to a voice?
        
       | anotherevan wrote:
       | Is it possible to use this (or Eleven Labs) to generate a voice
       | model to plug into an Android phone's TTS?
       | 
       | I have a friend with a paralysed larynx who is often using his
       | phone or a small laptop to type in order to communicate. I know
       | he would love it if it was possible to take old recordings of him
       | speaking and use that to give him back "his" voice, at least in
       | some small measure.
        
         | Share6323 wrote:
         | That would be awesome
        
       ___________________________________________________________________
       (page generated 2024-01-01 23:00 UTC)