[HN Gopher] Launch HN: Play.ht (YC W23) - Generate and clone voices from 20 seconds of audio
       ___________________________________________________________________
        
       Launch HN: Play.ht (YC W23) - Generate and clone voices from 20
       seconds of audio
        
       Hey HN, we are Mahmoud and Hammad, co-founders of Play.ht, a text-
       to-speech synthesis platform. We're building Large Language Speech
       Models across all languages with a focus on voice expressiveness
       and control.  Today, we are excited to share beta access to our
       latest model, Parrot, which is capable of cloning any voice with a
       few seconds of audio and generating expressive speech from text.
       You can try it out here: https://playground.play.ht. And there are
       demo videos at https://www.youtube.com/watch?v=aL_hmxTLHiM and
       https://www.youtube.com/watch?v=fdEEoODd6Kk.  The model also
       captures accents well and is able to speak in all English accents.
       Even more interesting, it can make non-English speakers speak
       English while preserving their original accent. Just upload a non-
       English speaker clip and try it yourself.  Existing text-to-speech
       models lack expressiveness, control, or directability of the
       voice: for example, making a voice speak in a specific way, or
       emphasizing a certain word or part of the speech. Our goal is
       to solve these across all languages. Since the voices are built on
       LLMs they are able to express emotions based on the context of the
       text.  Our previous speech model, Peregrine, which we released last
       September, is able to laugh, scream and express other emotions:
       https://play.ht/blog/introducing-truly-realistic-text-to-spe.... We
       posted it to HN here:
       https://news.ycombinator.com/item?id=32945504.  With Parrot, we've
       taken a slightly different approach and trained it on a much larger
       data set. Both Parrot and Peregrine only speak English at the
       moment but we are working on other languages and are seeing
       impressive early results that we plan to share soon.  Content
       creators of all kinds (gaming, media production, elearning) spend a
       lot of time and effort recording and editing high-quality audio. We
       solve that and make it as simple as writing and editing text. Our
       users range from individual creators looking to voice their videos,
       podcasts, etc., to teams at various companies creating dynamic audio
       content.  We initially built this product for ourselves, to
       listen to books and articles online, and found the quality of
       existing TTS very low. So we kept working on the product until,
       eventually, we trained our own models and built a business
       around it. There are
       many robotic TTS services out there, but ours allows people to
       generate truly human-level expressive speech and allows anyone to
       clone voices instantly with strong resemblance. We initially used
       existing TTS models and APIs but when we started talking to our
       customers in gaming, media production, and others, people didn't
       like the monotone robotic TTS style. So we doubled down on
       training a new model based on newly emerging architectures using
       transformers and self-supervised learning.  On our platform, we
       offer two types of voice cloning: high-fidelity and zero-shot.
       High-fidelity voice cloning requires around 20 minutes of audio
       data and creates an expressive voice that is more robust and
       captures the accent of the target voice with all its nuances. Zero-
       shot clones the voice with only a few seconds of audio and captures
       most of the accent and tone, but isn't as nuanced because it has
       less data to work with. We also offer a diverse library of over a
       hundred voices for various use cases.  We offer two ways to use
       these models on the platform: (1) our text to voice editor, that
       allows users to create and manage their audio files in projects,
       etc.; and (2) our API - https://docs.play.ht/reference/api-getting-
       started. The API supports streaming and polling and we are working
       on reducing the latency to make it real time. We have a free plan
       and transparent pricing available for anyone to upgrade.  We are
       thrilled to be sharing our new model, and look forward to feedback!
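
       As a sketch of how the polling mode of a TTS API like this is
       typically consumed (the function names and response fields below
       are illustrative assumptions, not Play.ht's actual API):

```python
import time

def poll_until_ready(fetch_status, interval_s=1.0, timeout_s=30.0):
    """Call fetch_status() until the audio job reports completion,
    then return the result URL.

    fetch_status is a hypothetical callable standing in for an HTTP
    status request; it is assumed to return a dict such as
    {"state": "processing"} or {"state": "done", "url": "..."}.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = fetch_status()
        if status.get("state") == "done":
            return status["url"]
        time.sleep(interval_s)  # back off between status checks
    raise TimeoutError("audio job did not finish in time")
```

       Streaming instead delivers audio chunks as they are produced;
       polling like this trades latency for simplicity.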
        
       Author : hammadh
       Score  : 249 points
       Date   : 2023-03-27 16:27 UTC (6 hours ago)
        
       | Natfan wrote:
       | This is already being used for scams.
       | 
       | https://playground.play.ht/listen/1079 (https://archive.ph/HKjue)
       | 
       | How exactly do you expect to combat this type of content?
        
         | hammadh wrote:
         | The intention for this playground was to let people try the
         | model. We actually have auto moderation on the user facing
         | platform (https://play.ht/) and malicious text gets blocked and
         | the user gets flagged.
        
         | bradleysz wrote:
         | This is not a full solution, just spitballing, but I wonder how
         | effective it would be to have a flagging system built with GPT4
         | where the prompt was some form of "This is text submitted to a
         | text-to-voice model. Determine the probability that this is
         | being used maliciously." Then manually review anything that
         | returns >X%.
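
         The review flow proposed in this comment could be sketched
         like so (score_fn stands in for the suggested GPT-4
         classifier; the names and threshold are assumptions for
         illustration):

```python
def triage(texts, score_fn, threshold=0.5):
    """Split submitted TTS texts into auto-approved and
    manual-review buckets.

    score_fn is a hypothetical classifier (e.g. an LLM prompted to
    estimate misuse probability) returning a float in [0, 1].
    """
    auto_ok, needs_review = [], []
    for text in texts:
        # Anything scoring above the threshold goes to a human.
        (needs_review if score_fn(text) > threshold else auto_ok).append(text)
    return auto_ok, needs_review
```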
        
       | ms7892 wrote:
       | play.ht user here! Awesome service and thanks to you guys I made
        | my first $50 generating voices and using them in a short
       | explainer video.
        
         | skrebbel wrote:
         | nice voting ring you got there
        
           | icemelt8 wrote:
           | Are the founders' names causing you panic? Can't you
           | appreciate a good startup?
        
             | skrebbel wrote:
             | I had to re-read your comment four times before I even
             | understood what you were on about.
             | 
             | I'm mildly agitated by your comment, so I'd like to take
             | the liberty of pointing out that _you_ are the only one in
             | this entire thread linking the names Hammad and Mahmoud to
             | racism. Everybody else in the entire thread is talking
             | about the product on its merits. There's a heated debate
             | _and nobody gives a fuck about where the founders are
             | from_. That's how it should be. Stop making the world a
             | worse place than you found it.
             | 
             | And, FWIW, I think that the product looks pretty neat. And
             | that the voting ring was just too obvious a play :-)
        
             | JohnFen wrote:
             | Huh? What do the founders' names have to do with anything?
             | 
             | Personally, this sounds like an extremely irresponsible
             | startup, but I don't know much about it so I'm trying to
             | reserve judgement.
        
       | roboboy wrote:
       | wicked cool
        
       | MattRix wrote:
       | I think this tech is super cool, but why is the API priced with
       | subscription tiers rather than just some per-word rate? It would
       | make it easier to develop with and budget for if the cost was
       | based on actual usage (like the OpenAI API is, for example).
        
         | hammadh wrote:
         | Yes, we are working on making the API pay-as-you-go soon.
         | Thanks for the feedback!
        
       | howon92 wrote:
       | I'm not sure the negative comments regarding the misuse of the
       | tech here are warranted. Doesn't Google's Speech API allow you to
       | train a model for custom voice too?
        
       | cpill wrote:
       | Can't see a paper reference. Not interested.
        
       | MuffinFlavored wrote:
       | https://play.ht/app/voice-cloning > Clone a voice now
       | 
       | Pops a modal: Try Voice Cloning for Free!
       | 
       | Enter a credit card for $0.00/mo with no other information on
       | screen
       | 
       | Bounce.
       | 
       | Why not let me play around with it a little without asking for a
       | credit card?
        
         | devmunchies wrote:
         | I think if you are cloning voices, you _should_ be required to
           | have a credit card or some other KYC identifier. Even if it's
         | free. This kind of highly abusable tech should have a paper
         | trail IMO.
        
           | MuffinFlavored wrote:
           | I guess I misunderstood/didn't think it all the way through.
           | Not sure what the balance should be but... I just wanted to
           | see how it would be at cloning _my_ voice (not  "a" voice
           | that doesn't belong to me) as a quick gauge to "is this
           | technology ready to play around with".
        
           | antibasilisk wrote:
           | yeah because that's working great for crypto lol
        
             | flangola7 wrote:
             | What do you mean? KYC is required on every US exchange.
        
               | antibasilisk wrote:
               | It's trivial to get your money to exchanges run by
               | people/machines who don't care to comply with US law, or
               | to render the KYC worthless in the first instance.
        
               | Firmwarrior wrote:
               | Exactly, that's why the crypto space doesn't have any
               | scams
        
           | delgaudm wrote:
           | As someone whose voice has been cloned without my consent, I
           | could not agree more.
        
         | hammadh wrote:
         | It's an effort to prevent abuse. We previously asked users to
         | pay upfront but most people want to try it out first.
        
           | MuffinFlavored wrote:
           | Do you accept anonymous Visa/Mastercard/etc. gift cards in
           | this payment method? If you do... are you actually preventing
           | abuse or just making it slightly more complicated to pull
           | off?
        
           | jameshiew wrote:
           | I would mention something to that effect in the modal because
           | it wasn't clear to me why it was asking for card details at
           | that point for "$0.00/mo" (though I guessed the reason).
           | Maybe something like "To prevent abuse, we require card
           | details, but you won't be charged", but worded better.
        
             | hammadh wrote:
             | Thank you. We'll fix this.
        
             | MuffinFlavored wrote:
             | > but you won't be charged
             | 
             | "no matter what based on your usage/you are locked into the
             | free tier" would have helped for sure
             | 
             | I still would've bounced, because I just wanted to goof
             | off with it quickly while it had my attention. Requiring
             | a payment method is terrible friction for users trying
             | to quickly test one of the key features you advertise.
             | But I guess if fraud concerns are that bad, that's the
             | tradeoff you have to accept?
        
       | delgaudm wrote:
       | How do you assert that the cloned voice has been truly permitted
       | by the voice owner? I've had my voice cloned without my consent
       | by other people using Descript and Eleven Labs.
       | 
       | What is your process for verifying consent?
        
         | mikecoles wrote:
         | TIL, the Booth Junkie is on HN. Love your work, sir.
        
           | delgaudm wrote:
           | Thanks my friend!
        
         | 1xdevloper wrote:
         | It's mentioned in the second demo video that they have a strict
         | process to prevent cases like yours. I think Descript started
         | asking for identity verification after its service was abused.
         | This one probably has a similar process too.
        
           | dkdbejwi383 wrote:
           | I think the previous comment wants to know what the "strict
           | process" is exactly.
        
           | yellow_lead wrote:
           | But they don't say what it is
        
           | whywhywouldyou wrote:
           | Right, and I'm sure their "strict process" is something like
           | "we take it down after you notify us and provide proof that
           | the voice is yours".
        
           | [deleted]
        
         | ros86 wrote:
         | When I tried this service previously, you had to read (out
         | loud) something saying that you were giving consent.
        
           | Aachen wrote:
           | I'd be curious what the false positive rate on that is. Can
           | you clone anyone's voice by collecting a set of ten voices
           | with unique timbre reading the required statement plus
           | unique timbre reading the required statement plus pitch
           | control to get close enough? A hundred? Or can you trick the
           | neural net by giving it something that sounds like white
           | noise to humans until the NN triggers in the right way and
           | goes "ok yep that's a match, you're authorised now"?
           | 
           | Probably not something we'll get to hear as part of the PR
           | pitch.
           | 
           | Or is the consent statement the thing that will be cloned and
           | is there no separate training audio? Then it might actually
           | work and you'll just have to get close enough that the human
           | you're trying to fool can't distinguish anymore (defeating
           | the need for this tech in the first place, at least in
           | targeted rather than automated cases).
        
             | ros86 wrote:
             | Yeah, good point - don't know. When I tried I actually did
             | get a (personal?) email saying that it didn't match closely
             | enough. After uploading another sample (based on a
             | different text) it went through.
             | 
             | I like your idea of just training on the consent text! That
             | wasn't the case when I tried it as you needed around 3h
             | (optimally) of training data.
        
           | [deleted]
        
           | barking_biscuit wrote:
           | Just use another voice cloning service to do that.
        
       | iJohnDoe wrote:
       | Warning! Don't put sensitive info or PII in your tests.
       | Everything you create is publicly shown on the playground and
       | site. Even stuff from your account.
        
       | jascii wrote:
       | I'm having a hard time coming up with a non-nefarious use case
       | for this.
        
         | bovermyer wrote:
         | I'd get a kick out of having my own blog posts read to me in
         | James Earl Jones's voice.
         | 
         | Or, heck, my own voice. Though it'd be surreal to hear not-me-
         | but-me saying things I've never said.
        
         | jeroenhd wrote:
         | Voice generator tech has created some decent surreal memes
         | (like audio recordings of Biden, Obama, and Trump playing video
         | games together).
         | 
         | Outside of memes or maybe the occasional well-intentioned
         | prank, I really can't think of anything either.
        
           | Rubinsalamander wrote:
           | Massively reducing costs for voice-over in video games. It
           | should even make it feasible to create mods with voiced
           | audio, which would be great :)
        
             | inerte wrote:
             | I think "talking" with dead relatives or friends will
             | become real pretty soon.
             | 
             | If people can find comfort hearing their mom say words of
             | encouragement in a tough situation, I think a lot of people
             | would do it. Kinda hard because for some others that would
             | mean never getting closure.
             | 
             | Weird stuff is certainly about to happen...
        
               | starkparker wrote:
               | The last thing on earth I'd want is for any aspect of my
               | dead relatives to be reanimated through technology. No.
               | That's absolutely fucking horrific to consider. I don't
               | need a hallucinating AI pretending to be my dead wife.
               | That's literally shambolic.
               | 
               | There is vastly more potential for that to be abused by
               | others than used in any emotionally or socially
               | constructive way.
        
             | jeroenhd wrote:
             | I would consider studios taking voice actors' voices and
             | using them to generate new content beyond their contract to
             | be abuse. I'm sure big corporations are rubbing their hands
             | in anticipation, but I'm sure killing the VA industry will
             | make the world just a tiny bit worse for everyone else.
             | 
             | Mods are more difficult to attach a moral judgement to. I
             | don't think I'd really consider them malicious, as long as
             | they're not sold, but there's a very thin line between a
             | high quality mod and stealing someone's voice.
        
               | buu700 wrote:
               | On the other hand, why shouldn't voice actors _benefit_
               | from this tech?
               | 
               | I can easily imagine a future where AI-generated
               | impersonations are deemed by courts or new legislation to
               | be protected by personality rights. In that world, voice
               | actors could expand their business by offering deeply
               | discounted rates for AI-generated work.
               | 
               | Alternatively, if/when tech like Play.ht is consistently
               | good enough, maybe it just becomes a standard practice
               | for all voice acting work to include a combination of
               | human- and AI-generated content, like a programmer using
               | Copilot or a writer using GPT.
        
               | gamblor956 wrote:
               | I'm sure programmers would love to expand their business
               | opportunities by offering deeply discounted rates for
               | creating AI-generated code.
               | 
               | No? Then why do you assume that someone else would want
               | to do the same in their profession?
               | 
               | As AI-generated content is not protectable under IP law,
               | it's a non-starter for games, film, TV, or music for
               | anything except background filler.
        
               | buu700 wrote:
               | Sure, why not? If you could earn more money and produce
               | more value to society with the same amount of labor, and
               | the legal/regulatory environment supported it, I wouldn't
               | see a reason not to.
               | 
               | If you had a solo contracting business, and the
               | technology existed to fully outsource a development
               | project to AI based on carefully documented requirements,
               | using it would be a cheaper alternative to
               | subcontracting. Rather than writing every line of code by
               | hand, you would transition to becoming an architect,
               | project manager, code reviewer, and QA tester. Now you're
               | one person with the resources and earning potential of an
               | entire development shop.
               | 
               | I have my fair share of complaints about AI coding tools,
               | but that isn't one of them. Maybe the increase in supply
               | would result in a lower average software engineering
               | income, but it wouldn't have to if demand kept pace with
               | supply.
               | 
               | Furthermore, code is more fungible than a person's voice.
               | If someone wants a particular celebrity's voice, that
               | celebrity has a monopoly on it. Thus, it's not obvious
               | that increasing the supply of one's voice acting work
               | would decrease its value. (I suspect the opposite to be
               | the case, until a point of diminishing returns.)
               | 
               | Although the voice acting case has a similar concern;
               | will we get an explosion in new and/or higher-quality
               | media, or will we see a consolidation to a smaller number
               | of well-known voice actors taking an outsized share of
               | the work? Another issue, if we look beyond
               | impersonation specifically, is that human voices may
               | become marginalized over time in favor of entirely
               | synthetic voices. I imagine that this would start with
               | synthetic voices playing minor roles alongside
               | human/human-impersonated voices, but over time certain
               | synthetic voices would organically become recognizable in
               | their own rights.
               | 
               | Again, I see plenty of concerns with AI in general, but
               | more of a mixed bag than strictly negative, and there
               | isn't anything inherently nefarious about this product in
               | particular.
               | 
               | Personally, I'm optimistic about what society looks like
               | in the long run if humanity proves to be a responsible
               | steward of increasingly advanced AI. By the time we're at
               | a point where 90% of people can be effectively automated
               | out of a job, we'll have had to have figured out some
               | alternative way of distributing resources among the
               | population, i.e. a meaningful UBI backed by continued
               | growth of our species' collective wealth and
               | productivity. I can easily imagine a not- _too_ -distant
               | world that is effectively post-scarcity, where it's not
               | frowned upon to spend years (or lifetimes) on non-income-
               | generating pursuits, and where the only jobs performed by
               | humans are entrepreneur, executive, politician, judge,
               | general, teacher, and other things that must be done
               | by humans for one reason or another.
               | 
               | So am I happy that AI is encroaching on skilled labor? In
               | the short term, not necessarily. But it's not necessarily
               | bad either, it's the reality that we're in, and long-term
               | I'm more optimistic than not.
        
               | [deleted]
        
         | mahmoudfelfel wrote:
         | We have been seeing genuine use cases such as: YouTube
         | creators, audiobooks, elearning videos, podcasts,
         | commercials, dubbing, and gaming.
        
         | atentaten wrote:
         | Generating audio for an audio book: If an author could speak
         | for 20 minutes and then generate audio for an entire book from
         | the book's text and the model, I think that would be very
         | useful.
        
           | sva_ wrote:
           | 20 seconds*
        
             | atentaten wrote:
             | The OP mentioned that so-called "high-fidelity voice
             | cloning" takes 20 minutes of audio. I think a
             | book author would want the best quality possible to
             | reproduce their voice.
        
               | JohnFen wrote:
               | Why reproduce their voice? There's no value-add there.
        
               | sva_ wrote:
               | Many people prefer an audiobook version of a book to be
               | read by the original author, which isn't always the case.
               | If an author could make that version happen by using 20
               | minutes of their time + text2speech of the whole book,
               | that would be an immensely positive value proposition on
               | the side of this company.
               | 
               | But I'm not sure. Part of why I'd prefer the original
               | author to read a book is that they vocally emphasize
               | certain parts of the book, and I don't think these models
               | could do that at this point.
        
               | JohnFen wrote:
               | > Many people prefer an audiobook version of a book to be
               | read by the original author
               | 
               | Right, but having AI read the book in the author's voice
               | is definitely not the author reading the work.
               | 
               | As you mention, the reason that people like to hear the
               | author read it is because it's the author reading it,
               | theoretically emphasizing and acting things out according
               | to what was intended. It's not just to hear the author's
               | voice.
               | 
               | So I don't see what the value-add is.
        
         | zanderwohl wrote:
         | I am toying with the idea of building virtual puppet software in
         | the style of watchmeforever. I have a number of voices I do for
         | the stage and DnD that I would be willing to train a few models
         | on so I could give my puppets unique voices.
        
         | rockemsockem wrote:
         | Anything written can be listened to with this tech. Any news
         | article, any short story, a draft of a piece of writing you're
         | working on. There is too much text for human beings to read it
         | all.
        
           | scrollaway wrote:
           | > _There is too much text for human beings to read it all._
           | 
           | So your logic is that all that text should be audio and
           | people will consume more? Because I got news for you:
           | reading is faster than listening.
        
       | devinprater wrote:
       | Using a screen reader to browse the page, there are a few
       | unlabeled buttons and links. After the "Load 7 new" button, there
       | is an unlabeled button, followed by the time of the recording. If
       | this doesn't sound better, I'll keep using 11Labs. That one is
       | more accessible.
        
       | jeroenhd wrote:
       | Listening to the demos I'm not entirely convinced by this
       | (https://playground.play.ht/listen/189 was pretty funny). I
       | wonder if this company will end up taking down (and subsequently
       | pricing out most people using this tech for fun) arbitrary voice
       | generation just like its competitors have so far.
       | 
       | Going to the demo page and hearing a random snippet of Musk-
       | worship was pretty weird. Out of all audio tracks to place at the
       | top of your demos, you chose this?
        
         | mugr wrote:
         | Wow, I call on the team behind this: I really STRONGLY think
         | you should at least implement some sort of unguessable URL
         | scheme. I'm not a web security expert, but this reminds me
         | of a talk where some company made medical records 'public'
         | in just this way.
        
           | mikrotikker wrote:
           | On the contrary, this should be accessible so we can see what
           | people are generating.
        
           | h1fra wrote:
           | Oopsie, the infamous auto-incrementing integer ID
        
             | airstrike wrote:
             | https://playground.play.ht/listen/1339 and
             | https://playground.play.ht/listen/210 are hilarious
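
             The enumeration issue called out in this sub-thread
             (sequential /listen/ IDs) is conventionally avoided by
             issuing unguessable identifiers; a minimal sketch,
             assuming a Python backend:

```python
import secrets

def new_clip_id(nbytes=16):
    # A URL-safe, unguessable token instead of a sequential integer,
    # so clip URLs can't be enumerated by counting up from /listen/1.
    return secrets.token_urlsafe(nbytes)
```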
        
         | yreg wrote:
         | The demo page says 'Recently generated'; you have listened
         | to the latest snippet someone made.
        
           | jeroenhd wrote:
           | I know the demo page was user generated. My Musk comment
           | referred to this page: https://play.ht/ultra-realistic-
           | voices/
        
       | JimmyRuska wrote:
       | This is horrifying in terms of scamming, ransom threats, and
       | phishing. E.g., calling as the CEO in the CEO's voice urgently
       | asking for a password, wire, or data. People calling your
       | family saying it's you, based on some YouTube video, asking
       | for immediate financial help. People saying someone has been
       | kidnapped and they need a ransom. This is uncommon in the US
       | but happens all the time to the elderly in places like Mexico.
       | With this tech, scammer cartels can puppet your voice from a
       | prompt into distressed requests to the people you care about.
       | In my opinion, these types of services should be banned by
       | some kind of regulation.
        
       | achr2 wrote:
       | The world is going to become a much worse place over the next few
       | years. I want to be an optimist, but it will take huge leaps in
       | humanity's societal structures for AI not to be a net negative
       | for the vast majority of people on this planet in the short term.
        
         | jader201 wrote:
         | I'm starting to see early signs that some of those Black Mirror
         | episodes really aren't that far off.
        
         | mikrotikker wrote:
         | Yea, I already started moving to the very edges of society
         | and am trying to move even further. Had pretty successful
         | crops; just need to scale it up and reduce the need for
         | outside resources more and more.
        
         | 01100011 wrote:
         | AI is just one threat and there are others. Regardless of
         | whether or not COVID-19 was man made, it could have been, and
         | it's just the first of what will be many pandemics in the next
         | few decades. Barrier to entry for genetic engineering and
         | bioweaponry is lower than ever, and within the reach of
         | hobbyists or NGOs.
        
           | flangola7 wrote:
           | OpenAI has a protein synthesis plugin.
        
         | chatmasta wrote:
         | Did the world become a much worse place when the internet
         | arrived, or did the positives end up outweighing the negatives?
         | There were a few years where Nigerian scammers could convince
         | grandma they were a prince who needed a bank transfer, but
         | eventually grandma figured out the scam. I don't see why AI
         | would be any different - sure, there may be an increase in new
         | scams for a few years, but people will learn and adapt, like
         | they always have. And meanwhile the positive aspects of AI will
         | have a positive impact on society. Let's not throw the baby out
         | with the bathwater because we can imagine all the ways to abuse
         | new technology.
         | 
         | Maybe instead of freaking out and trying to restrict
         | innovation, we should be working on insurance products to
         | mitigate the financial risk of scams, and educational content
         | to reduce their effectiveness. The fact that many scams are
         | even possible in the first place stems from the absurd idea
         | that "identity theft" is _your_ fault [0], so maybe we could
         | start there.
         | 
         | If my bank uses my voice as my password, or if my phone company
         | is willing to present fraudulent caller ID telling me it really
         | is my son calling to ask for money, then is the problem really
         | the scammers, or is it the easily defraudable systems with no
         | incentives to reduce abuse of their platforms?
         | 
         | [0] https://youtube.com/watch?v=CS9ptA3Ya9E
        
       | secondbreakfast wrote:
       | Do you have an API where you can get audio clips back reasonably
       | quickly? Like if I wanted to use this in a voice support bot,
       | could I send a text blurb to an API and fairly quickly get back
       | an audio file?
        
       | [deleted]
        
       | barking_biscuit wrote:
       | What we really need is something on par with this or Eleven Labs
       | that's open source. Then the real fun will begin. At this point I
       | think it's just a matter of time.
        
       | JohnFen wrote:
       | This is a good reminder that we all need to have a "safe word"
       | that we can use to verify to the important people in our life
       | that the voice they may be hearing on the phone or elsewhere is
       | really us.
       | 
       | Get a panicky call from "me" in the middle of the night? If I
       | don't include my safe word, that call isn't from me.
        
         | gus_massa wrote:
          | That scam was popular here in Argentina a few years ago. We
          | call it "virtual kidnapping"
          | (https://www.fbi.gov/news/stories/virtual-kidnapping): nobody
          | is actually kidnapped; it's just a scam run over a phone
          | call.
         | 
          | It's not even very important that the voice be similar to
          | the supposed victim's. The person on the call is usually
          | weeping, which makes the voice very difficult to recognize,
          | and a confusing voice at 2am may be interpreted as any of
          | your relatives or friends. An exact voice, on the other
          | hand, can be interpreted as only one person, and that makes
          | it easier to confirm that that person is safe.
        
           | JohnFen wrote:
           | Some scammers tried to pull this scam off on my stepfather
           | years ago. He got a call that, through the wailing and tears,
           | told him that I'd been thrown into a Mexican prison and
           | needed bail money immediately.
           | 
           | He was 90% convinced that it was true, but my mother made him
           | call me before doing anything, which saved him about $10k.
           | She thought it was suspicious that I would have left the
           | country without mentioning it to her.
           | 
            | If the person he'd been talking to had been relatively
            | calm and had sounded like me, the scam might have
            | succeeded.
        
         | mlboss wrote:
         | Very good suggestion
        
       | bfeynman wrote:
       | Wow, the actual speech part is terrible; the number of
       | mispronunciations is surprising.
        
       | [deleted]
        
       | ed_balls wrote:
       | You probably won't want to use sequential IDs
       | https://playground.play.ht/listen/1
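The point about sequential IDs is worth spelling out: with auto-incrementing identifiers, anyone can walk /listen/1, /listen/2, ... and scrape every generated clip. A minimal sketch of the usual fix in Python; the variable names are illustrative and nothing here reflects Play.ht's actual code:

```python
import secrets
import uuid

# Option 1: a version-4 UUID -- 122 random bits in a standard format.
clip_id = str(uuid.uuid4())

# Option 2: a shorter URL-safe token -- 16 random bytes, base64url-encoded.
token = secrets.token_urlsafe(16)

# With either scheme, /listen/<id> URLs can no longer be enumerated:
# a crawler would have to guess one of roughly 2**122 (or 2**128)
# values instead of just incrementing an integer.
```

Random identifiers only stop enumeration; clips that should be private still need an ownership check on the route itself.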
        
       | felipelalli wrote:
       | Something went wrong. Please try again. (Firebase: Error
       | (auth/popup-blocked).}))
        
       | andion wrote:
       | Remarkable, I spoke spanish on the training audio without
       | realising. Then every two options were one with a latino-accented
       | english and one with an indian-accented english
        
       | sys32768 wrote:
       | Crank calling will be so much better with this tech.
        
       | girthbrooks wrote:
       | You should do the right thing and eradicate this immediately.
        
       | jbaczuk wrote:
       | Is there any positive use case for this technology? YC, care to
       | comment?
        
         | rockemsockem wrote:
         | I'm honestly a bit baffled at the lack of thought here.
         | 
         | Anything written can be listened to with this tech. Any news
         | article, any short story, a draft of a piece of writing you're
         | working on. There is too much text for human beings to read it
         | all.
         | 
         | Translating from one medium to another is extremely useful.
        
           | arroz wrote:
           | Stop being lazy
           | 
           | It is faster to read than to listen to something
        
             | snerbles wrote:
             | I'm not going to read while driving.
             | 
             | TTS with natural inflection opens up a world of narration
             | to stories with audiences too small for human narration.
        
             | kirkbackus wrote:
             | Not for everybody, also it would be great to consume
             | textual content in a variety of voices, or even my own.
        
             | flangola7 wrote:
             | Some of us have processing disorders. I can listen about 3x
             | as fast as I can read.
        
       | midenginedcoupe wrote:
       | Well, nothing could possibly go wrong, eh?
       | 
       | If your homepage is toadying up to Musk by claiming he has
       | "limitless intellect" then I've already heard enough.
       | 
       | We have a duty to consider how what we build can be used to harm
       | others. If the obvious and many ways this could be abused aren't
       | covered anywhere I can find on your website, then I'm going to
       | conclude you haven't considered them. Which is terrible.
        
         | cpill wrote:
         | yeah, we should just make all AI research illegal. I mean,
         | without gatekeepers the world will fall apart! Did you know
         | that you can draw and write _anything_ with a pencil? We
         | need to get onto that next. And the internet: you can
         | publish anything you want on there!
        
           | coreyisthename wrote:
           | You seem to be intentionally missing the point.
        
       | h1fra wrote:
        | Congrats on launching. People have already given a lot of
        | feedback on the product itself, so I'll keep mine short.
        | 
        | Just a few notes on the UX:
       | 
        | - Recording your own voice should include a script to read;
        | that could help improve the quality of the sample, because I
        | struggled to say anything relevant.
        | 
        | - Also on recording: there is no timer, so it's hard to tell
        | when it's okay to stop
       | 
        | - You enforce the checkbox "not [...] to generate any sexual
        | content", yet you have a filter to display only NSFW content
       | 
        | - It doesn't work at all with non-English voices; maybe you
        | could add a warning or a way to fine-tune depending on the
        | language?
       | 
        | - There is no way to delete a voice or an account; that's a
        | huge red flag, especially when dealing with PII like this.
       | 
        | - Another person has said it already, but generated voices
        | are identified by an auto-incrementing ID, making it easy to
        | access another person's PII. I would recommend at the very
        | least a random string or a UUID
       | 
        | - All generated voices are public, and there is no way to
        | delete them
        
         | hammadh wrote:
         | Thanks, we intended the playground to be merely a testing tool
         | for the new model we're building. We'll improve based on your
         | feedback!
        
       | alex_lav wrote:
       | Is there a way for me to preemptively request (demand) my voice
       | (likeness) never be used by this service? How would one go about
       | doing that?
        
         | jeroenhd wrote:
         | Looking at the voices they use as a demo on their blog post
         | (https://play.ht/blog/introducing-truly-realistic-text-to-
         | spe... is just one example) I don't think consent is really on
         | their radar.
         | 
         | Their FAQ says
         | 
         | > Can I clone anyone's voice?
         | 
         | > Yes, we allow you to clone another person's voice if you have
         | their consent. As you can imagine, cloning a voice which sounds
         | exactly like the person is a powerful thing and can be easily
         | misused. We deeply care about ethics and privacy and have
          | implemented verification processes and regulations to avoid
         | people cloning anyone's voice without their consent.
         | 
         | But I very much doubt that they've gotten the consent of even
         | half the celebrity voices they're using to promote their
         | service.
        
       | ipv6ipv4 wrote:
       | The claim is that it needs only 20 seconds of audio to clone a
       | voice. I gave it a short clean recording, and the clone failed
       | with a request for a 2-3 hour recording.
       | 
       | Didn't work for me.
        
       | Dowwie wrote:
        | I recommend you immediately add identity verification (state-
        | issued ID verification), set up an appropriate secrets store
        | for PII, and audit-trail EVERYTHING your users are doing,
        | storing the contents in a secure location. Yesterday. This
        | service will be used to harm others, shortly. I do think that
        | there are exciting, honest things that can be done with this
        | service, but you need to set up some friction for use. Know-
        | your-customer rules are going to apply to this category in
        | short order.
       | 
       | People here are talking about taking this service offline but I
       | think everyone needs to be thinking about countermeasures,
       | working on those services next. The genie is already out of the
       | bottle. The degree of effort to put this together is low enough
       | that it will be replicated around the world.
        
         | abirch wrote:
         | I'm imagining the legal implications though I'm not a lawyer.
         | If granny gets ripped off by someone impersonating me with this
         | site, seems like Granny could sue Play.ht.
         | 
         | Play.ht will want to have as much information as possible about
         | their users.
        
           | hammadh wrote:
            | You are right, and unfortunately that is a possibility.
            | We are working on putting measures in place to guard
            | against such attempts. We have auto-moderation on the
            | input text that will block such audio from being
            | generated, and users who attempt it are flagged in the
            | system.
        
             | Avicebron wrote:
             | What are you filtering for in the input text that would
             | block something like a phone scam?
        
           | yreg wrote:
           | How would granny prove the scammer used play.ht?
        
             | tomesco wrote:
             | If law enforcement ever busts a scammer and discovers a
             | tool like this was essential to the scam, that would
             | generate lawsuits.
        
               | yreg wrote:
               | True
        
         | KaiserPro wrote:
         | Like this example here: https://playground.play.ht/listen/1554
         | which says:
         | 
         | > "Hi Mom, I need some help. Some guys hit me over the head and
         | put me in a van, and they're saying they'll kill me if you
         | don't wire money to this bank account."
         | 
         | top class.
         | 
          |  _EDIT_ this was about one page down on the "see what
          | people are generating" page
        
           | muyuu wrote:
           | On the bright side, it's not a very convincing rendition of a
           | human.
        
             | braingenious wrote:
             | I agree. That guy sounds very nonchalant for being in life-
             | threatening distress.
        
               | muyuu wrote:
               | not just that, he sounds remarkably computer-like
               | 
                | PS on the downvote: sorry if I hurt someone's
                | feelings, but it's the truth
        
               | SirLJ wrote:
               | Totally agree, this sounds awful
        
               | bentlegen wrote:
               | It doesn't matter. Given enough time and progress, it
               | will be indistinguishable.
        
           | mdrzn wrote:
           | Damn that's a perfect example.
        
           | _tom_ wrote:
            | My stepmother tells me she has been getting this type of
            | scam call, minus the accurate voice, for years - about
            | one a year.
           | 
           | I'm not sure she would have spotted the scam if it had
           | sounded right.
        
         | chatmasta wrote:
         | Gasp! Yawn. HN has become so pearl-clutchingly alarmist
         | recently. Everybody relax.
         | 
         | The solution to scams is to educate people on scams, as quickly
         | as you can do so in the changing environment, by publishing
         | information about what's possible with the latest technology.
         | The solution is _not_ to require onerous identity verification
         | for every software product that could be used by scammers,
          | because they'll just move to the next product that doesn't
         | require it, or they'll simply provide fraudulent documents. Or
         | you'll get "resellers" who provide their own fraudulent KYC
         | documents and then sell access to their account to other
         | criminals on the black market, making it even more difficult to
         | monitor for abuse.
         | 
         | If you want a startup offering such tools to protect people
         | from scams, they can do it by collecting data on what the tools
         | are used for - it should be pretty obvious based on transcripts
         | who is using it to scam people.
        
         | gsich wrote:
         | Or it will be used for memes.
        
         | hammadh wrote:
         | Couldn't agree more with your comment. We are working on
         | countermeasures like manual verification of voices, a
         | classifier to detect cloned speech, etc. As of now we have
         | auto-moderation in place that detects and blocks
         | hateful/harmful speech.
        
           | Firmwarrior wrote:
           | The cat's out of the bag, I'd say you guys should just go
           | full steam ahead and make sure it's your names in the
           | headlines
           | 
           | No need for a bunch of onerous kyc or anything IMO
        
             | woeirua wrote:
             | Yes, definitely take this advice from some random user on
             | HN. Can't possibly go wrong.
        
         | digitallyfree wrote:
         | While verification could be done for a cloud service like this
         | one, what's more concerning is that locally run models with
          | this tech will be coming soon (think of LLaMA and Stable
         | Diffusion). KYC is merely a stopgap and honestly we'll need
         | effective solutions for detecting vocal cloning impersonation
         | in the future.
        
         | twodave wrote:
         | While I agree with you, the problem is far bigger than any one
         | company in my opinion. These tools are already accessible
         | enough to individuals that no audio or video is trustworthy,
         | regardless of its source. I suspect we can still detect whether
         | most faked audio/video is authentic or not algorithmically, but
         | that's going to turn into an arms race eventually. And IMO none
         | of the "answers" are ones that you really want to see made
         | real, either.
         | 
         | We're in for some really strange times.
        
           | Pxtl wrote:
           | I feel like this will be the thing that finally forces
           | digital signing into the public eye. "Wait, is that video
           | real?" "Well, it was signed by a reputable news source."
        
             | twodave wrote:
             | Right, which leads to a place where nothing is trusted
             | unless it came from some central authority or from some
             | trusted piece of hardware. I'm not looking forward to the
             | day when I have to use e.g. an Apple or Google piece of
             | hardware or some locked down kiosk or "be famous" in order
             | to conduct business.
        
               | mikrotikker wrote:
               | I'll be living in a cabin in the woods by that point.
        
               | Pxtl wrote:
               | The film industry has been pointing cameras at screens
               | for decades. Trusted hardware won't work.
        
               | twodave wrote:
               | I assume trusted hardware would include things like LIDAR
               | and biometrics, but if you're assuming those can be
               | beaten then it will be a different kind of arms race, for
               | sure.
        
         | perlwle wrote:
         | A couple in Canada were reportedly scammed out of $21,000 after
         | getting a call from an AI-generated voice pretending to be
         | their son.
         | 
         | https://www.businessinsider.com/couple-canada-reportedly-los...
        
         | braingenious wrote:
         | How is it that
         | 
         | > I recommend you immediately add identity verification (state-
         | issued identification verification)
         | 
         | and
         | 
         | > The genie is already out of the bottle. The degree of effort
         | to put this together is low enough that it will be replicated
         | around the world.
         | 
         | are thoughts that end up in the same post?
         | 
         | If the genie is out of the bottle, it's your proposed solution
         | that _everybody_ that runs a model like this implements bank-
         | style KYC?
         | 
         | What do you propose should happen _when_ this sort of software
         | becomes freely available for everyone? When (not if) that
         | happens, what will your suggestion have accomplished?
        
       | selflesssieve wrote:
       | I can't wait for spoofed messages from my loved ones.
        
         | apazzolini wrote:
         | The scam via voicemail possibilities are endless!
        
       | phkahler wrote:
       | This is the first startup here where I think the tech should
       | essentially be illegal.
       | 
       | It's cool tech, yes I'm impressed at the achievement. Nuclear
       | weapons are impressive too.
       | 
       | OTOH this kind of thing is getting easier and easier to do, so
       | what's a realistic way forward?
        
         | Pxtl wrote:
         | Honestly I'm starting to wonder this about AI in general. I
         | mean, realistically there's a decent chance we'll be looking
         | at general AI soon. The best-case endgame of that is
         | creating a benevolent God. It might be time to start asking
         | ourselves if that's what we want.
        
         | skybrian wrote:
         | You may get your wish. The FTC posted an article about this a
         | week ago. [1]
         | 
         | > The FTC Act's prohibition on deceptive or unfair conduct can
         | apply if you make, sell, or use a tool that is effectively
         | designed to deceive - even if that's not its intended or sole
         | purpose.
         | 
         | It seems like an awfully broad rule? But they probably could go
         | after this startup if they noticed it.
         | 
         | There are some kinds of businesses where making sure the
         | regulators like what you're doing is pretty much a
         | prerequisite. On the other hand, plenty of companies got where
         | they are today by pushing the limits.
         | 
         | [1] https://www.ftc.gov/business-
         | guidance/blog/2023/03/chatbots-...
        
           | mikrotikker wrote:
            | Great, ensure only the 3-letter agencies have access to
            | the tech and can sink or float a political candidate
            | algorithmically.
           | 
           | Democracy is dead.
        
             | twojacobtwo wrote:
             | Wouldn't fewer people having access make your proposed
             | scenario less likely, regardless?
             | 
              | If other people continue to have access as well as the
              | '3 letter agencies', the same power will still exist
              | for the agencies, except that there will also be an
              | essentially unlimited number of other people who could
              | be used as scapegoats.
             | 
             | If only '3 letter agencies' have access, they would
             | obviously be the first ones to come under scrutiny if a
             | case of misuse were discovered.
        
               | mikrotikker wrote:
                | Yes, but people will continue generating meme
                | recordings like the ones going around showing the
                | recent POTUSes in gaming dialogue, thus showing
                | everyone not to trust anything.
               | 
               | Without that, we'll just never know, and Joe Blow who
               | never saw a deepfake of Joe Biden praising the stickiness
               | of the latest OG Kush will trust anything.
        
           | burkaman wrote:
           | Wow, this is a great article. Obviously writing is easier
           | than enforcing, but I'm pretty impressed with whoever at the
           | FTC is already thinking so clearly about this stuff.
        
           | marak830 wrote:
           | (apologies for going off topic here)
           | 
           | Wow. I would have imagined an article from the FTC to be
           | more... Bland, for want of a better term.
        
             | fwlr wrote:
             | The FTC consistently has one of the absolute best author
             | voices in all of government. Pick a blog post at random and
             | see what I mean. Their index on tech is probably the area
             | you have the most domain knowledge in and so it's probably
             | the best area to evaluate them:
             | https://www.ftc.gov/business-guidance/blog/term/1428
             | 
             | Clear, direct, confident, not overloaded with qualifiers,
             | not afraid of metaphor, self-summarizing, signposting, and
             | most importantly it always has an _energy_ of some kind
             | that government communication (in seeking to appear
             | neutral) regularly lacks - having that energy is why it
             | doesn't feel "bland". I wonder if they have internal
             | documents to guide their writers, or if it's mostly
             | information stored in the heads of Lesley Fair and Michael
              | Atleson (who between them seem to write most - all? -
              | of the posts).
        
           | hammadh wrote:
           | Thanks for sharing this ^
        
         | rgrieselhuber wrote:
         | We're getting to the point where all voice conversations will
         | need to be authenticated via OTP, even between family members,
         | on the phone. Especially for banking, etc.
        
           | yreg wrote:
           | Voice conversations on FaceTime, WhatsApp, etc. are already
           | authenticated. Perhaps it's time to stop using non-VoIP
           | calls?
        
             | rgrieselhuber wrote:
             | Can you call your bank with that?
        
               | yreg wrote:
               | I wish I could. But unlike my family members, the bank
               | doesn't authenticate me by my voice.
        
         | burkaman wrote:
         | I guess eventually people will go back to only meeting face to
         | face for important communications. I don't know what the way
         | forward is for news.
         | 
         | I truly do not understand people like these founders, obviously
         | they understand the future they're creating. "If not us,
         | someone else would do it" is not an excuse. Neither is "I like
         | money".
        
           | mikrotikker wrote:
           | I stopped trusting the news years ago, between whoring out
           | for engaging but divisive content and obvious political bias,
           | it's been a crapshoot since GWBs cronies gutted the FCC.
        
           | andai wrote:
           | There's a book I've been waiting years for the audiobook to
           | come out. Plenty of legitimate uses for this tech. Plenty of
           | horrible ones too. It's the same with many technologies, no?
           | I don't think outright banning it makes any sense.
        
             | burkaman wrote:
             | Maybe there are legitimate uses, but that isn't one. There
             | is no need for an audiobook narrator to sound like a real
             | person, an AI narrator should be a realistic-sounding but
             | completely fabricated voice.
             | 
             | Example: https://blog.elevenlabs.io/enter-the-new-year-
             | with-a-bang/
        
           | samstave wrote:
            | This is what every "organized crime" group (people who
            | don't want prying eyes) has done for centuries.
            | 
            | Next, normal people will adopt the Mafia's habit of
            | covering the mouth while pretending to use a toothpick
            | whilst talking, to prevent lip reading by remote viewers
            | (the same thing sports people do currently).
           | 
           | -
           | 
            | My grandmother was deaf for the latter half of her life.
            | She became an expert lip reader.
            | 
            | It was fun going to restaurants with her, as she would
            | tell me what people at tables far away were talking
            | about: "oh, that couple isn't having a happy time..."
        
           | bagels wrote:
           | People still accept contracts based (in part) on scribbles on
           | paper. Fraud will happen, just like it does for signatures.
           | I'm sure sometimes countermeasures will be done (including
           | meeting in person), but it's not like video chat or phone
           | calls will completely disappear.
        
           | alcover wrote:
           | > _face to face for important communications_
           | 
           | Funny how computing perfected communication and ultimately
           | will undermine itself.
           | 
           | > _I don 't know what the way forward is for news._
           | 
           | I'd say every packet of voice/img will have to be signed by
           | the recording device and checked at rendering time.
           | 
           | > _I truly do not understand people like these founders_
           | 
           | Me neither. Don't do it. We don't need that, and the
           | malevolent use of this will confuse people to an extreme
           | point.
           | 
           | Even the 'good' use of having a deceased relative utter new
           | sentences is beyond strange. This is too far gone. And I'm no
           | luddite.
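The sign-at-capture idea sketched above is roughly what content-provenance efforts like C2PA propose: the recording device signs each chunk of media, and players verify the signature before rendering. A toy illustration of that verify step using only Python's standard library; a real system would use asymmetric signatures (e.g. Ed25519) so players never hold a signing key, and the HMAC here is just a stdlib stand-in with hypothetical names:

```python
import hashlib
import hmac

# Hypothetical per-device key. In a real design this would be the
# private half of a keypair held in the camera's secure element.
DEVICE_KEY = b"per-device-secret"

def sign_chunk(media: bytes) -> bytes:
    """Tag the device would attach to each recorded chunk at capture time."""
    return hmac.new(DEVICE_KEY, media, hashlib.sha256).digest()

def verify_chunk(media: bytes, tag: bytes) -> bool:
    """Check performed at rendering time; any edit invalidates the tag."""
    return hmac.compare_digest(sign_chunk(media), tag)

chunk = b"\x00\x01voice-packet"
tag = sign_chunk(chunk)
assert verify_chunk(chunk, tag)              # untouched media verifies
assert not verify_chunk(chunk + b"x", tag)   # any tampering is caught
```

The hard part is not the cryptography but the trust model: who issues device keys, and what happens when one leaks.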
        
             | mikrotikker wrote:
             | This is democratising the tech. Otherwise only the
             | intelligence agencies will have it and we will continue to
             | be duped not knowing what is possible.
        
               | twojacobtwo wrote:
               | We don't need to all be able to use the tech for it to be
               | known publicly.
               | 
               | Apply your same logic to any other easily misused tech:
               | 
               | "We must all have easy access to bio-engineered viruses.
               | Otherwise only..."
               | 
               | "We all need to have access to nuclear weapons. Otherwise
               | only..."
               | 
               | Not all tech should be in everyone's hands.
        
               | mikrotikker wrote:
               | Its a different kind of tech. A society changing tech
               | that can be used surreptitiously. It needs to be in
               | people faces in (for example) the form of over the top
               | and ridiculous memes.
               | 
               | That is not possible with or comparable to, things such
               | as bioweapons.
        
           | bitL wrote:
            | There are some cool uses, like dubbing movies in foreign
            | languages while keeping the original "voice styles", or
            | having your long dead relatives talking to you in some
            | memorabilia, etc. It could also cause an unexpected
            | creativity explosion, e.g. in games or fan-fiction
            | movies. To avoid misuse, we might perhaps find the only
            | good use of blockchain.
        
             | psychoslave wrote:
             | >long dead relatives talking to you in some memorabilia
             | etc.
             | 
              | It seems a bit weird to me, though. I mean, looking
              | back at old recordings can still pass as mere
              | nostalgia, but wanting new sentences pronounced in the
              | guise of a lost relative's voice doesn't seem very
              | respectful of that person, to share my own feelings.
              | 
              | Also, I guess there is now not much preventing
              | completely new songs with whatever lyrics, starring
              | the voices of Elvis, Hendrix and Pavarotti. Actually,
              | a continuous flow of lyrics generated on the fly seems
              | perfectly plausible at this level, doesn't it?
        
               | bitL wrote:
               | My grandpa wrote a fiction book so having him read it to
               | me would be kinda cool, even if he's long gone. Still, he
               | technically exists in the 4D universe but the time
               | dimension no longer overlaps with mine.
        
               | burkaman wrote:
               | It is weird, and as usual there is a Black Mirror episode
               | with this exact premise. The unforeseen consequences in
               | the episode even seem pretty realistic based on current
               | GPT behavior.
               | 
               | https://en.wikipedia.org/wiki/Be_Right_Back
        
             | sroussey wrote:
              | Foreign-language dubbing is a great use case, and the
              | ability to alter the video so that the lips are synced
              | to the dubbed version would be a great addition. I
              | can't believe studios aren't using these things
              | already (the video part in particular).
        
               | squarefoot wrote:
                | I think we're not that far from the day all movies
                | will be produced by AI, including all parts in
                | various languages using the most popular actors for
                | a given market, all accurately translated and of
                | course perfectly synced, since there would be no
                | dubbing but creation on the fly. First they'll use
                | virtual copies of real actors by purchasing rights
                | from their estates, until the public slowly accepts
                | fully virtual and cheaper ones. I give it 20 years
                | max, and I'm being optimistic (pessimistic?).
        
             | sroussey wrote:
             | About the blockchain comment... For years, I've been
             | expecting camera makers (including phone makers) to offer
             | image hash verification on blockchain at the moment of
             | image capture. I'm surprised it's not routine.
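Hash-at-capture itself needs no blockchain; any append-only, timestamped log would do. What the published hash actually buys you is narrow, as a short sketch shows (plain Python, illustrative names):

```python
import hashlib

def capture_fingerprint(image_bytes: bytes) -> str:
    """SHA-256 digest a camera could compute at the moment of capture."""
    return hashlib.sha256(image_bytes).hexdigest()

raw = b"...raw sensor bytes..."
digest = capture_fingerprint(raw)

# Publishing `digest` to an append-only log proves these exact bytes
# existed at that time. But any later step -- retouching, compression,
# even re-encoding -- yields different bytes and a different digest,
# so the published hash no longer matches what people actually see:
assert capture_fingerprint(raw) == digest
assert capture_fingerprint(raw + b"\x00") != digest
```

That sensitivity to any transformation is the substance of the "which image do you verify?" objection raised in the replies.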
        
               | burkaman wrote:
               | 1. Expensive
               | 
               | 2. Requires internet
               | 
               | 3. What image do you verify? Between auto-retouching,
               | manual retouching, compression, filetype conversion, an
               | image file might be invisibly transformed 10 times in
               | between capture and Instagram upload.
               | 
               | 4. Useless for disproving fake images until every camera
               | manufacturer in the world has implemented this.
               | 
               | 5. Hostile to customers, now your picture doesn't get the
               | green verified badge or whatever if you decide to crop it
               | or something.
        
               | bitL wrote:
               | Ad 3) all of them. They will be recorded in a
                | cryptographically linked chain and you'll be able to
               | backtrack all steps.
        
               | SketchySeaBeast wrote:
               | I'm now wondering how many images are generated a second
               | that would all need to be recorded. How much is it going
               | to cost to take a photo?
        
               | palata wrote:
               | I guess a mix between impractical and completely useless?
        
             | palata wrote:
             | The only thing that blockchain can do that couldn't be done
             | before is cryptocurrencies (not sharing my opinion about
             | them here).
             | 
             | Pretty sure this is not a good use of blockchain, and I
             | don't see how it would remotely avoid misuses.
        
           | JohnFen wrote:
           | > I guess eventually people will go back to only meeting face
           | to face for important communications
           | 
           | This seems to be the only realistic future. This sort of
           | technology literally makes it impossible to trust anything
           | electronic.
           | 
           | People were worried about the balkanization of the internet,
           | but now they look like optimists.
        
             | tempest_ wrote:
              | PGP exists; it's just that no one uses it
        
               | JohnFen wrote:
               | PGP doesn't help you with phone and video calls.
        
               | ROTMetro wrote:
               | Thanks, off to work on a new startup!
        
               | anoonmoose wrote:
               | not really the point. a PGP-style tech could easily exist
               | for phone or video tomorrow, if it doesn't already. but
               | PGP-style tech for email (called "PGP") has existed for
               | 32 years and basically no one uses it. whether or not the
               | tech exists doesn't matter nearly as much as whether or
               | not people actually use it.
        
               | palata wrote:
               | It's called signing.
               | 
               | I send you a bunch of bytes signed with my private key
               | (which somehow you have to verify in a trusted way) and
               | you can be sure that I am the person who signed those
               | bytes (unless I was compromised).
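The signing scheme described above can be sketched with Ed25519 (this uses the third-party `cryptography` package; the message and variable names are illustrative, not any real protocol):

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Sender: generate a keypair and sign the message bytes.
private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()  # must reach the recipient via a trusted channel
message = b"Hi, it's really me."
signature = private_key.sign(message)

# Recipient: verification succeeds only for the exact signed bytes.
public_key.verify(signature, message)  # no exception raised: authentic

try:
    public_key.verify(signature, b"tampered message")
except InvalidSignature:
    print("forgery detected")
```

The hard part, as the comment notes, is the parenthetical: distributing and verifying the public key in a trusted way, which is exactly where PGP-style schemes have historically failed to get adoption.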
        
               | JohnFen wrote:
               | I'm unclear as to what your point is, then...
        
               | anoonmoose wrote:
               | you said: "This sort of technology literally makes it
               | impossible to trust anything electronic."
               | 
               | we said: "No, because there is also technology that makes
               | it possible to trust anything electronic with very nearly
               | 100% reliability. But no one uses it"
               | 
               | I think your first statement is both technically wrong
               | and generally wrong. Electronic trust is a solved
               | problem...it's just that right now, it's really not as
               | big a deal as some people are worried about it being, so
                | we haven't generally implemented the solution. We could
                | have made electric cars for a long time before other
                | things made them commercially viable.
        
               | mikrotikker wrote:
               | You can short circuit all that shit by merely
               | compromising the device.
        
               | JohnFen wrote:
               | Ahh, I see.
               | 
               | I disagree that electronic trust is a solved problem. It
               | is mathematically solved, yes, but the reason that it
               | isn't widely used is because it's still intrusive and
               | painful to do. A solution that isn't acceptable to the
               | masses isn't an effective solution.
               | 
               | If it could be done in a way that is invisible (like
               | HTTPS, for instance), then it would be ubiquitous. That's
               | the part of the problem space that still needs
               | resolution.
        
               | DANmode wrote:
               | It doesn't have to be PGP specifically to satisfy the
               | goal of strong keypair-based security finally having
               | traction -- does it?
               | 
                | USB and similar hardware keys can be used to protect
                | accounts like Gmail, Coinbase, web hosting services, and
                | many more now.
               | 
               | Fun fact: it's possible to receive Facebook user
               | notification emails encrypted against your public PGP
               | key.
        
         | patall wrote:
         | But what is a realistic way forward? Do you think that scammers
         | won't have this technology in 2 years? Can we really prevent
         | any illegal use of neural networks at this point? With weapons
         | that you actually have to physically buy, you can intervene on
         | a country level (to some degree). But already with those 3D
         | printed ones, we are basically doomed. Of course it's a tragedy
         | of the commons type of situation. But banning all legal uses
         | does not prevent the illegal ones.
        
           | jeroenhd wrote:
           | Most scammers are incredibly lazy and honestly not all that
           | competent. There's no need for them to change that if you can
           | prey on the weak and vulnerable.
           | 
           | The difference between "the paper is out there" and "there's
           | a button to do this" is quite obvious in cases like software
           | exploits. A report of finding a vulnerability rarely leads to
           | a massive automated exploitation campaign, but if that report
            | also contains a proof of concept, the number of automated
            | attacks radically increases. I believe the same is true for
           | many other types of crime: even a mild bar to entry will
           | prevent a significant amount of criminals from advancing
           | their techniques.
           | 
           | I think the negative impact of these voice changers is much
           | bigger than the advantage we gain as a society. Criminals
           | will always exist, even crafty ones, but "we can't prevent
           | crime so let's not bother trying to do anything about it" is
           | not a great take in my opinion.
        
           | anigbrowl wrote:
           | _Of course it 's a tragedy of the commons type of situation_
           | 
           | TOTC is about resource depletion. GYI. It's not applicable
           | here.
        
           | [deleted]
        
         | madeofpalk wrote:
         | I have a "large" (~40K 10s lines) corpus of captioned dialogue
         | from a video game that I briefly investigated training a model
         | similar to this to "clone voices" with, but pretty quickly came
          | to the realisation that doing so would be pretty unethical to
         | all involved.
         | 
         | It became more apparent to me how icky this is as the voice
         | actor of one of the most iconic characters in the game died
         | suddenly 10 days ago...
        
         | elicash wrote:
         | What if the voice sample is somebody saying they give specific
         | consent to be cloned by that service?
         | 
         | You could of course clone a voice to generate that "consent" --
         | but at that point there's no additional harm done because
         | they'd already have the clone.
         | 
         | It's unrealistic that this tech won't exist somewhere, even if
         | the big actors stay away for ethical reasons. A voice auth
         | practice strikes me as a good compromise.
        
           | [deleted]
        
           | JohnFen wrote:
           | > You could of course clone a voice to generate that
           | "consent" -- but at that point there's no additional harm
           | done because they'd already have the clone.
           | 
           | But this possibility renders the idea unviable in the first
           | place, does it not?
        
         | dxbydt wrote:
         | It is a big issue in India. We have a few Bollywood celebrities
         | with "trademark voices" - voices so distinct you would
          | instantly associate them with that celeb. There is a huge mimicry
         | culture with 100s of extremely talented mimics who can clone
         | any voice. On top of which, there is a gigantic radio audience,
         | so the celebs despite making millions in Bollywood films,
         | advertise cement, coconut oil, fountain pens, tobacco, beauty
         | creams, online casinos etc in radio clips, using their
         | distinctive voice.
         | 
         | This makes for a rather explosive combination. I could, as some
         | tobacco exec, hire some mimic to promote cigarette sales using
         | a celebrity's distinct voice. By the time the regulators catch
         | up, the spot has aired a few million times & made a potload of
         | money.
         | 
         | A bunch of celebs[1][2] have trademarked their voice...but
         | enforcement is spotty.
         | 
         | [1] https://economictimes.indiatimes.com/news/new-
         | updates/amitab... [2]
         | https://www.financialexpress.com/archive/when-celebrities-se...
        
         | ugh123 wrote:
         | Do you want to tell us why it should be illegal? Comparing
         | something like this to nuclear weapons is a bit hyperbolic, at
         | least without giving more context.
        
           | gus_massa wrote:
            | Nuclear weapons are only illegal if you don't have them.
        
         | manuelmoreale wrote:
          | I'm with you on this. I honestly can't think of a good use
          | case for the average user to generate audio this way. Maybe
          | some niche use case in movie or TV production where you can
          | generate a missing line without flying in an actor or
          | something. Or maybe for generating dialogue for videogames.
          | But those are business use cases, not things for the general
          | public.
        
           | notafraudster wrote:
           | My Dad died before my kids were born. When he got cancer, he
           | recorded himself reading "The Night Before Christmas", which
           | is about enough audio for the high quality version of this
           | technology. Is it ghoulish on my part to want to hear his
           | voice again or for my kids to hear it? Maybe. Do I really
           | care what you think (or really what _I_ think) about that?
           | No.
        
             | manuelmoreale wrote:
             | You shouldn't care what I think. You shouldn't care what
             | anyone here thinks. Creating fake memories is not something
             | I'd ever consider doing but that's just me.
        
             | kossTKR wrote:
             | It doesn't really sound healthy to generate content of
             | loved ones.
             | 
             | Yes in a few years you would be able to generate a complete
             | avatar of someone, but it isn't them, and i think it will
             | mess with you mentally.
        
             | peteforde wrote:
             | People who would tell you not to use your recorded audio to
             | create more simulations of your father speaking are the
             | same sort of folks with strong opinions about what other
             | people do in the bedroom.
             | 
             | I happen to be someone who believes that it's wonderful
             | your dad left you with this artifact. It was a touching
             | sentiment then, and now it can serve his obvious purpose
             | many times over.
             | 
             | He didn't record himself as a side-effect of disease, or
             | because he loved that particular story in the sound of his
             | voice. He wanted people in the future to be able to hear
             | what he sounded like!
             | 
             | Given that he could not have foreseen voice cloning (and
             | therefore not explicitly asked for it) I cannot think of a
             | more obvious example of someone wanting their voice to
             | survive them.
             | 
             | I wish more folks would record The Night Before Christmas.
        
             | ss108 wrote:
             | Sorry to hear. Hope your kids enjoy his recording.
             | 
             | But yes, it would be weird to generate more stuff spoken by
             | your father by using this technology. And beyond that,
             | what's even the point? It's not your dad.
        
           | rtp4me wrote:
           | Personally, I would find this very useful. I (used to) create
           | internal tech training videos for our organization and would
           | routinely stumble when doing voice overlay. Even though
           | everything was scripted out, it took lots of editing time to
           | get the audio and video aligned without the vocal stumbling
           | (ahs, ehs, silence, voice redos, etc). Just my $0.02
        
           | RobotToaster wrote:
           | Pretty much any hobby video game development or animation?
        
             | arroz wrote:
              | So basically stealing other people's voices for your hobby?
             | Great!
        
               | skeaker wrote:
               | I'm 100% certain that people will offer up their own
               | voices as open-source. I would be happy to, albeit maybe
               | anonymously.
        
               | exolymph wrote:
               | Who says it has to be someone else's voice?
        
               | manuelmoreale wrote:
               | I mean, if it is your voice, can't you just record it
               | directly? Isn't that better than going through an
               | artificial middle man that has to be primed first with
               | some recording of yours?
        
               | pseudo0 wrote:
               | It would be a huge time saver. Typically a pro does
               | around 100 lines per hour, an amateur doing multiple
               | takes would be significantly slower. So a character with
               | 1000 lines could easily be 20-30 hours of work, just for
               | a first draft. It would be pretty amazing to be able to
               | just revise the script and auto-generate a new recording,
               | even if the quality is only 90% there.
               | 
               | Just like with image generation models, this will
               | massively raise the bar for what amateurs can do with a
               | limited budget and limited time. It's hard to justify
               | spending thousands of dollars on voice acting and art for
               | a hobby project, but now amateurs can get something that
               | is 90% there and substitute professional work if the
               | project takes off.
        
           | imjonse wrote:
           | Artificially generated but faithful to the original voice for
           | people losing theirs or being unable to speak for various
           | medical reasons. Obviously the fun/deceiving use-cases are
           | much more numerous.
        
           | biomcgary wrote:
           | When my sons were young, I would tell them elaborate stories
           | where they were the main characters. I recorded most of the
           | stories, but it is full of verbal fillers (um,ahh), since I
           | was making it up as I went. I would love to convert the audio
           | to text with Whisper, filter out fillers and then output the
           | cleaned up version in my own voice. I could see this type of
           | workflow being very popular with podcasters.
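The cleanup step described above (after a Whisper transcription has produced text) can be sketched as a simple filler-word filter; the filler list and the example transcript are illustrative, and a real podcast workflow would likely work on Whisper's word-level timestamps rather than plain text:

```python
import re

FILLERS = {"um", "uh", "ahh", "er", "hmm"}

def remove_fillers(transcript: str) -> str:
    """Drop standalone filler words, ignoring punctuation and case."""
    words = transcript.split()
    kept = [w for w in words if re.sub(r"[^\w]", "", w).lower() not in FILLERS]
    return " ".join(kept)

print(remove_fillers("Once upon a time, um, there was, uh, a dragon."))
# -> "Once upon a time, there was, a dragon."
```

The filtered text would then be fed to the voice-cloning model to regenerate the story in the original speaker's voice.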
        
             | peteforde wrote:
             | You should absolutely do this, but please skip the Whisper.
             | 
             | The reason is that if you speak with lots of verbal
             | fillers, that's actually an important part of how you sound
             | to other people. It makes sense to clean up audio for a
             | podcast, but not for your great grandchildren.
             | 
             | A voice cloner doesn't care that you say "um" too much.
             | It's parsing audio for phonemes.
        
         | elil17 wrote:
         | Right on the top of their page is the example: "good afternoon
         | sir, I will just need your credit card number and security code
         | to proceed." Wow.
        
         | krashidov wrote:
         | Serious question. What is the difference in implications
         | between this and a professional voice impersonator? I don't
         | think it's as dangerous as we think it is. All of the
         | consequences that Play.ht bring to society are already possible
         | today and have been for some time. The difference is that it
         | will be easier, but I don't think that makes it any more
         | dangerous.
        
           | jtr1 wrote:
            | > The difference is that it will be easier
            | 
            | Seems like you know what the difference is, you just haven't
            | assigned it the proper weight.
        
           | palata wrote:
           | Put the recent tech advances together: we can now ask bots to
           | generate an identity online that looks like a legit human
           | (with pictures, audio, text).
           | 
           | Of course a human could do that manually, but with those AIs
           | it's a completely different scale, and it can be automated
           | (so someone with no skills can click a button "generate 10k
           | fake identities online").
           | 
           | Maybe even with one click, those techs could generate a fake
           | coworker and send phishing e-mails. Suddenly every single
           | e-mail you receive (or friend request or call) could be a
           | very convincing fake. You don't have to be a high-value
           | target anymore, it's all automated.
           | 
           | That makes it much more dangerous: from "can be forged
           | manually with time and resources" to "everyone can do it at
           | scale for free".
        
           | kossTKR wrote:
           | Scaling + ease of use.
           | 
           | Compare:
           | 
            | 1) Spend lots of time finding a person who can impersonate a
            | specific other person. Unless you give them a lot of money
            | or threats to keep quiet, you can't use them to deceive
            | someone in real time.
           | 
           | 2) Clone 1 million voices from tiktok in 1 minute. Contact 10
           | million relatives with a synth voice that is intelligent
           | enough to answer questions.
           | 
           | We will have billions of AI's, containers, programs, agents
           | running around trying to deceive absolutely everyone and
           | their grandmother 24/7 soon.
        
           | detrites wrote:
           | Two differences I can think of, either side the debate:
           | 
           | 1) The impersonation can be carried out in real-time by the
           | criminal themselves. No need to employ anyone else. (No trail
           | leading to them.)
           | 
           | 2) Pro impersonators aren't common in society. They are
           | limited as an asset and not duplicatable. So, using one
           | cannot spread like wildfire and overwhelm our awareness that
           | voice impersonation is something of a common risk.
           | 
           | Maybe the second could hold the first in check. I think
           | disruptive tech like this & similar advances in visuals come
           | with a societal impact that lessens potentials for realising
           | the bigger fears. But people just love fears.
        
           | 123pie123 wrote:
            | there's a massively higher bar in effort and cost in getting
            | an impersonator
            | 
            | whilst this is cheap and easy - increasing the potential for
            | scams in a big way - even to the point of automating the scam
        
         | detrites wrote:
         | It's amazing how many think declaring something illegal will
         | stop criminally-minded people having and using it.
        
           | jvalencia wrote:
           | It won't stop it, but will allow enforcement agencies to
           | enforce. Otherwise, they have no legal recourse to do so.
        
             | chatmasta wrote:
             | It's already illegal to defraud someone into sending you
             | money.
        
             | detrites wrote:
             | Enforce what?
             | 
             | Why should the ability to impersonate a persons voice
             | suddenly become a crime in itself?
             | 
             | Should we arrest Jim Carrey?
             | 
             | Isn't it when the thing was used to do something _else_
             | illegal when enforcement is required?
        
               | squeaky-clean wrote:
               | Sure but they can make extra penalties for using these in
               | illegal acts. For example robbing someone, and robbing
               | someone while using a gun get different sentences.
        
           | JohnFen wrote:
           | Nobody thinks that.
        
         | VoodooJuJu wrote:
         | >Introducing the National Postal Service - send a letter to
         | anyone for a nominal fee. No need for a personal courier, armed
         | escort, or patrician status.
         | 
         | >This is the kind of thing that should be illegal. Now, any
         | Plebian could essentially write a letter to anyone,
         | impersonating anyone. Forged letters could drag us into a war
         | with Persia - for Jupiter's sake!
        
           | burkaman wrote:
           | Yes, good point, mail fraud used to be a major problem and we
           | started passing laws to deal with it 150 years ago.
           | 
           | https://www.uspis.gov/history-spotlight/history-of-the-
           | mail-...
           | 
           | Maybe we'll need a new specialized law enforcement agency
           | like the Postal Inspectors to deal with the inevitable wave
           | of AI-assisted crime.
        
           | nkrisc wrote:
           | No, it's more like those "stress tester" services that you're
           | definitely-certainly-fingers-crossed supposed to only point
           | at your own servers.
           | 
           | Sure, this is marketed as generating your own voice to read
            | scripts for your YouTube channel, but are they actually
            | verifying whose voice you're generating?
        
           | palata wrote:
           | You completely miss the point: scale.
           | 
           | Try to deceive people by learning about their contacts,
           | writing a convincing letter and sending it. How long does it
           | take you to prepare one letter?
           | 
           | Now those AIs potentially allow you to just generate millions
           | of those with one click. The problem is the scale: everyone
           | can do it at no cost and at scale.
        
           | whywhywouldyou wrote:
           | This is a hilariously bad attempt at discrediting the
           | original argument. There's a vast difference between forging
           | a letter and replicating the unique vocal fingerprint of any
           | human being, on demand.
           | 
           | I suppose if we approach the point that we can create robotic
           | clones of anyone, anywhere, that look, sound, and move like
           | anyone on the planet, that will be just like the post office
           | too, right?
        
             | salad-tycoon wrote:
             | What are some of the differences? Besides the glaringly
             | obvious text vs audio. I mean prior to telegraphs if I got
              | a letter from my sweetheart with a lock of hair or
             | something and a request for funds I'd probably believe it,
             | especially if it took days or weeks to communicate back and
             | forth?
        
               | burkaman wrote:
               | Impersonating a letter is similar to having an
               | impressionist record an impersonation of someone's voice.
               | It's difficult, very imperfect, and not very scalable.
               | 
               | The analogy for this technology would be a robot that can
               | perfectly imitate someone's handwriting and vocabulary
               | using one letter as a reference.
        
         | oceanplexian wrote:
          | Making them illegal would accomplish nothing since it's
         | already out in the wild. You can generate audio with high
         | quality on fine-tuned versions of Tortoise TTS, which was
         | originally trained on a cluster of NVIDIA 3090's, so it's
         | within reach for any smart person to train a from-scratch model
         | on consumer hardware. Realistically? We have to accept that
         | this tech exists and there will be both positive and negative
         | outcomes from it.
        
           | JohnFen wrote:
            | > Making them illegal would accomplish nothing since it's
           | already out in the wild.
           | 
           | Not true. Making it illegal wouldn't make it nonexistent --
           | that's true. But making it illegal would provide at least
           | some method of mitigating some of the harm.
           | 
           | That's more than what we have right now.
           | 
           | > We have to accept that this tech exists and there will be
           | both positive and negative outcomes from it.
           | 
           | Of course. But that doesn't mean it's futile to try to reduce
           | the negative outcomes.
        
             | oceanplexian wrote:
             | > Not true. Making it illegal wouldn't make it nonexistent
             | -- that's true. But making it illegal would provide at
             | least some method of mitigating some of the harm.
             | 
             | It's already illegal to impersonate someone to steal money
             | or scam them, and those laws were on the books before
             | computers existed.
             | 
             | > Of course. But that doesn't mean it's futile to try to
             | reduce the negative outcomes.
             | 
             | You can run something on a consumer GPU and it's every bit
             | as good if you know how to dial it in. By the end of the
             | year you'll be able to download a nicely packaged "voice
             | cloner" from a torrent that runs on a cheap laptop. IMHO
             | any effort on regulation is far better spent informing
             | people rather than trying to put the cat back in the bag.
        
               | knodi123 wrote:
               | > It's already illegal to impersonate someone to steal
               | money or scam them, and those laws were on the books
               | before computers existed.
               | 
               | There are two hurdles a criminal has to get past:
               | 
                | 1. Decide to break the law
                | 
                | 2. Figure out how to pull off their scam
               | 
               | It sounds like you're saying that since hurdle #1 already
               | exists, hurdle #2 is irrelevant? No, of course it isn't.
                | That's like saying that gun control can't possibly help
               | because it's already illegal to shoot someone.
               | 
               | Adding difficulty to a crime reduces (but does not
               | eliminate) the prevalence of the crime.
        
               | JohnFen wrote:
               | > It's already 100% futile.
               | 
               | I don't think so at all. There are all sorts of things
               | you can technically do with ease that are illegal for
               | good reason. Laws against them aren't futile.
               | 
               | But I admit that perhaps I'm being overly optimistic
               | here. I'm just trying very hard to see any way that this
               | stuff can end up not being a complete societal disaster.
        
           | throw10920 wrote:
           | Scale matters. The difference between 50K smart and dedicated
           | criminals being able to use this technique vs _anyone with a
           | web browser_ is significant.
        
         | hammadh wrote:
          | You are right, the technology will become ubiquitous;
          | therefore, at least for platforms like us, it's a
          | responsibility to have countermeasures and safeguards to
          | prevent abuse and harm to people. There'll always be people
          | who will find ways to abuse it, but making that more and more
          | difficult, and evolving on that front, seems like a way
          | forward.
         | 
         | We have these measures in place and are working on others to
         | make sure the technology is used towards the betterment of
         | humanity.
         | 
          | 1/ Auto moderation on text to block harmful/malicious speech.
          | 
          | 2/ As someone pointed out in the comments, we had a manual
          | review process in place where the user is required to read out
          | a consent statement and a member from Play.ht would review it
          | before approving the voice. We're working on improving and
          | adding this back.
          | 
          | 3/ The user-facing service is paywalled so we don't allow
          | everyone in.
          | 
          | 4/ Users trying to create malicious content are flagged and
          | reviewed.
          | 
          | 5/ A classifier to detect AI-generated speech.
        
         | jimlongton wrote:
         | Our only hope is that politicians and celebrities get sick of
         | their voices and likeness being used to scam people or sell
         | crypto or viagra and get laws passed against this type of
         | impersonation.
        
           | p0pcult wrote:
           | actually, this is the way. force their hand.
        
         | mahmoudfelfel wrote:
          | Co-founder here. What you see in the above demo is a very
          | rate-limited version of our upcoming model. We realize how
          | dangerous this technology can be and have built a lot of
          | mitigations into our main product (Play.ht) to reduce possible
          | abuse:
          | 
          | - We strictly moderate the generated text for any sexual,
          | offensive, racist, or threatening content. It automatically
          | gets detected and blocked.
         | 
         | - We built and are offering for free a tool that can identify
         | AI generated vs human-generated audio (https://play.ht/voice-
         | classifier-detect-ai-voices/), we will continue to invest in
         | this tool, and we hope it helps with deploying this technology
         | safely.
         | 
         | - If we get any reports of a cloned voice without consent, we
         | block the user and remove the voice instantly.
         | 
         | - The price of high-fidelity voice cloning is too high for
         | scammers to use at scale; we have been live with it for four
         | months and haven't had any cases of abuse so far.
         | 
         | Like any technology, it has the potential to be abused, and we
         | are working hard to mitigate that and deploy it safely. We will
         | continue to observe the use cases and user feedback and improve
         | the safety of the service accordingly.
         | 
         | Since we launched voice cloning 4 months ago, we have seen
         | enough genuine use cases which motivated us to keep moving
         | forward and figure out safe ways to make the technology useful
         | for all.
        
           | hn_throwaway_99 wrote:
           | > We strictly moderate the generated text of any sexual,
           | offensive, racist, or threatening content.
           | 
           | This is exactly what makes me so angry about "AI safety"
           | initiatives: they are largely worrying about the wrong thing.
           | People have been so focused on the "this may make some
           | obscene joke, or be biased against some skin colors" that
           | they have completely missed out on the much more serious
           | harms that AI will cause with respect to, in this case,
           | impersonation scams.
           | 
           | Congrats, people can't say the N-word with your technology,
           | but they can say "Hi Bob, just calling to verify that we did
           | indeed change the target account where you should wire your
           | invoice payment."
        
           | Loughla wrote:
           | >We strictly moderate the generated text of any sexual,
           | offensive, racist, or threatening content.
           | 
           | This won't be the problem. My voice calling my parents asking
           | for money to be sent to a random account will be the problem.
           | And none of that will be sexual, offensive, racist, or
           | threatening.
           | 
           | >we are working hard to mitigate that and deploy it safely.
           | 
           | How?
           | 
           | >we have seen enough genuine use cases
           | 
           | What?
        
             | hansoolo wrote:
             | Exactly. Would love to see some testimonials on that...
        
           | JohnFen wrote:
           | > haven't had any cases of abuse so far.
           | 
           | How do you know this?
        
         | fortyseven wrote:
         | > This is the first startup here where I think the tech should
         | essentially be illegal.
         | 
         | Yes, I agree, only criminals should be allowed to freely run
         | it.
        
       | downboots wrote:
       | Yikes
        
       | swader999 wrote:
       | Just in time for April fools day.
        
       | degun wrote:
       | This is the stuff of nightmares. I tried to create a voice based
       | on Jorge Luis Borges. I generated a voice and then a sample from
       | a text and it sounded like a haunted spirit coming to collect my
       | soul.
       | 
       | Alas, there is no stopping now.
        
       | jeremyis wrote:
       | Hi! Congrats on launching!
       | 
       | We recently evaluated play.ht for TTS but decided against it
       | because you had an async API which was harder to implement.
       | Alternatives have sync APIs (including Google Cloud). Do you have
       | plans to release a sync client for standard TTS?
        
       | jschveibinz wrote:
       | Come up with a shibboleth for your family group(s) and keep it to
       | yourselves. That will help to combat the scammers.
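        A shibboleth is just a shared secret agreed on out of band. As a
        minimal sketch of the verification step in Python (the phrase and
        names below are invented placeholders, not from this thread):

```python
import hmac

# Toy sketch of the family-shibboleth idea: the secret phrase was
# agreed on in person; a caller proving identity must say it back.
FAMILY_SHIBBOLETH = "purple-walrus-1987"  # illustrative placeholder

def caller_is_verified(spoken_phrase: str) -> bool:
    # hmac.compare_digest compares in constant time, so an attacker
    # can't recover the secret character-by-character from timing.
    return hmac.compare_digest(spoken_phrase, FAMILY_SHIBBOLETH)
```

        The code is only the comparison step; the real defense is that
        the secret was exchanged offline where no scraper could hear it.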
        
       | joshmn wrote:
       | Playing with this now, wow.
       | 
       | My mom passed away a few years ago. I always let her calls go to
       | my voicemail so I could have them. I was using Google Voice at
       | the time so this worked wonderfully. Unfortunately, I will not
       | listen to many of them -- she was an alcoholic and I can't bear
       | to listen to her while drunk. The few I have of her when she's
       | sober I listen to occasionally.
       | 
        | Having said that, this is really nice.
        
         | testmasterflex wrote:
         | Sorry man. :( I wish you well
        
       | nsxwolf wrote:
       | This is going to be the shortest gold rush in history. Make your
       | money now because in a couple years you'll be able to build and
       | deploy your own Play.ht for free with a single ChatGPT prompt.
        
       | Josh5 wrote:
        | Way to make me sound like an American!
        
       | koolala wrote:
       | Why doesn't your launch info-post here mention anything about
       | safety and the obvious concerns here? "We deeply care about
        | ethics and privacy and have implemented verification processes and
       | regulations to avoid people cloning anyone's voice without their
       | consent." I found this in your FAQ.
        
       | rvz wrote:
       | All the better to accelerate the dystopia. At this point, it is
       | clear that no-one cares and it is every AI maker for themselves.
        
       | rolph wrote:
        | Might there be some way to DRM this, so a key is required to
        | access the media?
        | 
        | I'm thinking this would provide an opportunity to out the
        | media as AI-generated.
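        A lightweight cousin of DRM for this purpose is watermarking: stamp
        the generated audio so tools can flag it later. A toy Python sketch
        of the idea (the tag is arbitrary; real systems use robust,
        inaudible spectral watermarks, not this fragile LSB trick):

```python
# Toy watermark sketch: embed a known bit pattern into the
# least-significant bits of 16-bit PCM samples. Any re-encode or
# resample destroys it, so this is purely illustrative.
MARK = [1, 0, 1, 1, 0, 0, 1, 0]  # arbitrary 8-bit tag

def embed_mark(samples):
    """Overwrite each sample's LSB with the repeating tag bits."""
    return [(s & ~1) | MARK[i % len(MARK)] for i, s in enumerate(samples)]

def read_mark(samples, n=8):
    """Recover the first n embedded bits from the LSBs."""
    return [s & 1 for s in samples[:n]]
```

        Changing only the LSB perturbs each 16-bit sample by at most 1 part
        in 32768, far below audibility; surviving compression and editing is
        the hard part that production watermarks actually solve.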
        
       | mathverse wrote:
        | This is actually something I would use myself. I've checked out
        | a few AI voices, but they are not of the same quality. Most of
        | them still sound very robotic and not like the real person.
        | The only passable one for me was "William".
        
       | arroz wrote:
       | This should be illegal
        
       | alexb_ wrote:
       | It's astounding to me how quickly HN turned into "We need to
       | track people who use technology in case they use it for crime"
       | when it comes to this. No we don't! Technology does not need to
       | be relentlessly tracked by government agencies "for your
       | protection" - haven't you learned anything?
        
       | RobotToaster wrote:
       | Can it sing?
        
         | WakoMan12 wrote:
         | [flagged]
        
       | mmayberry wrote:
       | The lack of moderation and NSFW content in the playground is
       | absolutely horrific. Why would you even have that option?
        
         | [deleted]
        
       | mugr wrote:
       | This is somewhat related, though I do not know how it was made.
       | https://lexman.rocks
        
         | rahimnathwani wrote:
         | They fine-tuned https://github.com/neonbjb/tortoise-tts
         | 
         | Neither they nor the tortoise-tts author have made public their
         | code/techniques for fine-tuning.
        
       | rockzom wrote:
       | Some part of me views this moment in history as such a force
       | multiplier that it seems myopic to squabble about the nickels and
       | dimes we'd get for our bird sounds and cave paintings. I wish I
       | was smarter and already caught up enough to take advantage of all
       | of this.
       | 
       | I tried resisting the urge to stop myself from even posting this
       | comment, but I'm willing to make an ass of myself so that
       | somebody who knows more about this than me can try to steel-man
       | this for me.
       | 
       | What do we do? The penny is the smallest bit of USD we have, and
       | hyper-fractional parts that incrementally make up an unimaginably
       | large whole are now the world we live in. It's difficult to
       | imagine a world where you receive 175 billion royalty
       | transactions of 1/1,000,000,000th of a $0.01 in a given year, but
       | maybe that's a reasonable scenario to think about when it comes
       | to the couple of bucks an average teen or adult should get from
       | their presumed default contributions to large language models.
       | 
       | Remember trying to hit that minimum threshold payment for Google
       | Adsense with your Blogspot, then finally getting your check after
       | 15 years? If nothing else, we shouldn't blithely tolerate that
       | again, because we didn't sign up for this. (We signed up for
       | stuff even worse than this, technically, but in those cases at
       | least we clicked "I Agree.")
        
       | geophph wrote:
       | <insert-dr-Ian-malcolm-gif-here>
       | 
       | Did you ever wonder if you really should do something like this?
        
       | [deleted]
        
       | yding wrote:
       | Very cool! Amazing work.
        
       | patrickmcnamara wrote:
       | This did not work for me at all. I tried my own voice and it just
       | made me sound like a young American instead of my actual Irish
       | accent. I almost sounded like Microsoft Sam in both samples.
        
       | anigbrowl wrote:
       | _Hey HN, we are Mahmoud and Hammad_
       | 
       | Are you though? You might just be computer-generated.
       | 
       | While I'm very impressed with this technically (and as a pro-
       | audio person I feel validated to see my predictions of a few
       | years back coming true so dramatically), I don't see anything
       | about risk management in here. Your tech absolutely will get used
       | by scammers, given the overabundance of voice data on the open
       | internet. How are you going to hedge against that?
        
         | mahmoudfelfel wrote:
          | We have many mitigations in place to increase the safety of
          | this service; I mentioned some of them here:
          | https://news.ycombinator.com/item?id=35331310
        
           | anigbrowl wrote:
           | That's interesting. But I think it's a mistake to focus on
           | relying on price to prevent abuse at scale. The use case for
           | abuse of this technology is in highly targeted frauds, not
           | broad-spectrum scams like insurance robocalls. Additionally,
           | this will be zero deterrent to deep-pocketed actors like
           | political action committees that generate fakery to influence
           | elections and the like.
           | 
           | I'm trying not to be reflexively dismissive, and I know the
           | technology is evolving so fast that your individual company
           | can't necessarily pre-empt it, any more than an email
           | software supplier is responsible for the existence of
           | phishing. But I work adjacent to the security space (studying
           | violent extremists) and I can think of a _ton_ of ways to
           | abuse this where economics would be absolutely zero
           | deterrent.
        
       | TOMDM wrote:
       | Listening to the examples, this feels like an all around worse
       | version of Eleven Labs.
        
       | [deleted]
        
       | icemelt8 wrote:
        | Wow, Hammad, here on HN :P I couldn't have believed it a few
        | years ago.
        
       | hn_throwaway_99 wrote:
       | This is currently the top example for me (with the NSFW check
        | _off_):
       | 
       | > I want you to lay me down, gently, and show me why you're known
       | as the most agile tongue this side of the Mississippi
       | 
       | Whoever wrote that, bravo, I needed a good laugh today.
        
       | zoklet-enjoyer wrote:
       | I'm going to use this to pump out podcasts
        
       | woeirua wrote:
       | The potential for scamming is limitless with this. Elderly people
       | were vulnerable to phone calls from their "relatives" before when
       | the voices didn't even sound that close. Can you imagine what the
       | hit rate is going to be on these scams when the voices are nearly
       | _identical_ to the voice of their relative? Also, at some point I
       | expect that even answering the phone and saying  "Hello" will be
       | enough for some AI model to zero-shot clone your voice with
       | enough fidelity to pass to most people. Tech like this is going
       | to absolutely destroy what little remains of voice conversations
       | over phones.
        
       | mugr wrote:
       | I don't know about you, but I just listen to all the uploads from
       | everyone uploading stuff there at
       | https://playground.play.ht/listen/$tracknumber and also download
       | all of them with the nice download button provided (I don't). But
       | it would be really nice to have these credible (after some
       | editing) recordings of senior officials saying all sorts of
       | politically incorrect things.
        
       | dejobaan wrote:
       | I love the voice quality, and have been talking with a bunch of
       | other game devs about how this and other TTS solutions have been
       | making remarkable strides recently (also visited your GDC booth
       | this past week!). Some years ago now, I worked on an experiment
       | that auto-generated a gaming-centric TV show on Steam[1], but one
       | of the big hurdles was that TTS was pretty flat (Amazon Polly);
       | we couldn't get as expressive as [2] for instance. A few years
       | ago, you could get emotive performances from TTS, but you needed
       | to put in a lot of post-processing work from an audio engineer
       | (e.g., Sonantic's TTS[3]). Stuff like ElevenLabs/PlayHT etc. seem
       | like they'd solve that part of the problem.
       | 
       | As an independent game dev, I think we'll use TTS for placeholder
       | VO a bunch - the writers can try out a pile of different
       | material, and we only have to have a VO actor record at the end.
       | And the current $600/year subscription for your "Ultra Realistic
       | Voices" is a steal when used for that part of production. But for
       | smaller studios, the pricing structure can make it tough to
       | evaluate a new tool properly. What I really want to do is to
       | spend 6 months having someone play around with the tech,
       | integrating it into our toolchains, testing it out on
       | playtesters, and so forth (and the 5,000 word free version won't
       | do that for us). That $600 to try it out really isn't
       | unreasonable, but when I'm also testing alongside Polly,
       | ElevenLabs, Altered.ai, Uberwhatsit, Murf, and whatever other
       | subscriptions, it's easy enough to say, "okay, well, maybe we
       | don't need to add one more."
       | 
       | I'm not sure what the solution is, but I think smaller studios,
       | who will be the ones to experiment with/benefit from this tech
       | most in the coming year will give it a pass because we're all
       | penny-pinchers.
       | 
       | [1] https://store.steampowered.com/labs/ultracast
       | 
       | [2]
       | https://cdn.cloudflare.steamstatic.com/store/labs/ultracast/...
       | 
       | [3] https://venturebeat.com/games/sonantic-uses-ai-to-infuse-
       | emo...
        
         | hammadh wrote:
          | Thanks for the feedback. We certainly want to support gaming
          | studios of all sizes and are working with a few of them to
          | understand their workflows. What we've seen is that not
          | everyone wants a high-fidelity clone (which costs more);
          | most voiceovers can be done with zero-shot cloning (quick
          | clones that don't cost much).
        
       | antibasilisk wrote:
       | the thing about these models is the latency is always way too
       | high for any sort of voice-assistant
        
       | praveenhm wrote:
        | Is there any free open-source alternative available for voice
        | cloning? How far does Whisper go?
        
         | UncleEntity wrote:
         | Whisper does speech-to-text.
         | 
         | And there are open-source alternatives but I don't think the
         | quality is super good.
         | 
         | There's also enough information out there to do this yourself
         | with a bunch of GPU time, I have some ideas I want to try out
         | but don't have the (GPU) time.
        
       | devmunchies wrote:
       | How is the latency for real-time TTS? I remember kicking the
       | tires several months back but went with one of the big 3 cloud
       | providers since they had lower latency.
       | 
       | I also like that the cloud provider supports SSML and I can
       | explicitly configure the emotion, whereas Playht dynamically
       | changed the emotion based on context of the text.
        
         | oceanplexian wrote:
         | Low latency would open up a whole lot of interesting
         | applications. Even Elevenlabs doesn't seem to have low enough
         | latency in my testing to work as a convincing voice assistant
         | or to, for example, work in real time on a phone call. For that
         | we likely need QUIC or some kind of streaming protocol.
        
         | hammadh wrote:
         | The latency is not real-time yet but we're working on getting
         | it to near real time. Regarding controlling the voice, we've
         | added a few params like rate, voice guidance and temperature
         | but for the most part the emotion is dependent on the text for
         | now.
        
       | gigel82 wrote:
       | Results are pretty good. But I've got slightly better sounding
       | cloning from Tortoise TTS, and I can run that locally:
       | https://github.com/neonbjb/tortoise-tts
        
       | olup wrote:
       | Too expensive. Eleven labs is somewhat cheaper, but imo there
       | won't be a clear leader in this market until the prices are at
       | least 10x cheaper (which will happen soon enough)
        
       | 0xDEF wrote:
        | When OpenAI and Google insist on "safety", it leaves the door
       | open for startups that do things like this.
        
       | lfciv wrote:
        | Seems like we'll inevitably end up with some sort of cross-
        | site verifiable identity on the internet. All content requiring
       | some sort of verified user backing it. Generally will be
       | interesting to see what an internet with less anonymity looks
       | like.
        
         | JohnFen wrote:
         | > Generally will be interesting to see what an internet with
         | less anonymity looks like.
         | 
         | It will be much more dangerous, in all probability.
        
       | bithavoc wrote:
       | So you could grab leaked info from a YouTuber and fully
       | impersonate a celebrity in any service that doesn't support
       | 2FA(?), this is also very bad for any podcaster.
        
         | Firmwarrior wrote:
         | Voices aren't unique at all. Nobody should have been using a
         | voice pattern as authentication at any point
        
       | afro88 wrote:
       | Tried it out and it made me sound British (I'm Australian, but I
       | only have a mild accent). It seems to have gotten my tone of
       | voice close but not my accent.
       | 
       | And then my pacing seems really off. Even a simple "Hey this is
       | afro88. How's it going?" sounded inhuman.
        
         | mahmoudfelfel wrote:
         | You can try the high-fidelity voice cloning here
         | https://play.ht/voice-cloning/
        
       | peteforde wrote:
       | When does the GPT4 Play.ht plugin launch?
       | 
       | It's already trivial for a developer to wire up GPT output to API
       | calls, so pearl clutching isn't helpful. I'd rather focus on
       | potential positive outcomes.
        
       | cloudking wrote:
       | Congrats on the launch, your text to speech quality is
       | unparalleled.
        
       | juliennakache wrote:
        | Looks great. I've been waiting for a service like this ever
        | since Microsoft released their paper on speech synthesis from
        | voice samples. Feature requests:
        | 
        | - make voice generation available via API so devs can embed it
        | in their apps
        | 
        | - expose a streaming API like Polly, so we can feed it text in
        | real time and get the voice back as an audio stream
        | 
        | - make it HIPAA compliant and offer plans that include signing
        | a BAA
       | 
       | I'll be your first customer if you do this! You can get in touch
       | with me at @juliennakache
        
         | hammadh wrote:
         | We have an API - https://docs.play.ht/reference/api-getting-
         | started
         | 
         | We have a beta streaming endpoint but the latency is not real
         | time yet (something we're working on) and are adding an
         | endpoint to create voices.
        
       | tanepiper wrote:
       | "Trusted by 7000+ users and teams of all sizes" [posts a bunch of
       | company logos]
       | 
       | You've just launched in beta, how can you claim this? I'm always
       | very suspicious of this (I take this from the position of being a
        | tech lead at a multi-billion-euro retailer whose logo you'll
       | never be able to use)
       | 
       | Is this one developer? A team? Or is this just marketing bullshit
       | for VCs who somehow don't verify if this is true or not?
        
         | swyx wrote:
         | exactly. i hate this practice of just spraying logos all over
         | with no context. give me 3 logos but each with a written case
         | study or zoom conversation or even a tweet saying what they use
         | it for and you get more trust than 100 logos
        
         | hammadh wrote:
          | We launched playground.play.ht in beta to share the new
         | speech model we are working on. We've been operating play.ht
         | for a while and have teams from these companies using the
         | platform.
        
       | sinuhe69 wrote:
        | How authentic is the result compared to what John Mayer did
        | with Steve Jobs' voice?
       | https://twitter.com/BEASTMODE/status/1637613704312242176
        
         | code51 wrote:
         | Sounding quite authentic to me. I tried to compare:
         | 
         | Sentence: "Undoubtedly the biggest global event that occurred
         | in 2020 was the COVID-19 pandemic."
         | 
         | - https://soundcloud.com/kynes-0/steve-jobs-cloned-voice-
         | bigge...
         | 
         | Sentence: "we've been working on artificial general
         | intelligence for many years, and we believe that we're on the
         | cusp of a major breakthrough."
         | 
         | - https://soundcloud.com/kynes-0/steve-jobs-cloned-voice-
         | break...
        
           | code51 wrote:
           | adding another example to reflect the high influence of style
           | and why either contextual awareness or voice-to-voice
           | guidance is essential for these tasks:
           | 
           | Steve Jobs (cloned voice) reading: "Do not go gentle into
           | that good night"
           | 
           | - https://soundcloud.com/kynes-0/steve-jobs-cloned-voice-do-
           | no...
        
       | SergeAx wrote:
        | Great product, giving it a try. Here you say that 20 seconds
        | is enough, but the "clone" page instructs you to provide 30
        | minutes for better results. Is there any kind of guide on how
        | to create a good voice sample? For example, should I speak
        | English, or will any language do? Do you have any stats on the
        | correlation between sample length and generation quality?
        | Thank you!
        
         | hammadh wrote:
          | Thanks. What we've shared here is a demo tool to show our new
          | speech model, which can clone a voice with a few seconds of audio.
         | You can try that with English or non-English recordings, but
         | the generated voice can only speak English at the moment. If
         | you are looking for high-fidelity cloning, you can sign up and
         | try it in our app here - https://play.ht/voice-cloning/
         | 
         | High-fidelity cloning requires at least 20 mins of good quality
         | audio. The more the better.
        
       | SergeAx wrote:
        | To the alarmists here: just search the internet for "voice
        | cloning ML". There are literally YouTube tutorials on it. Stop
        | being Luddites; you cannot stop progress.
        
       | 1xdevloper wrote:
       | This is the only platform that seems to be offering unlimited
       | voice generation for a fixed monthly price. Does this have a
       | real-time streaming option?
        
       | mkarliner wrote:
        | Useless. Try training it on non-US voices. I speak English, not
        | American, and the generated voice sounded nothing like me. By
        | the way, I was SVP of Engineering at a voice modification
        | company.
        
         | yreg wrote:
         | This demo seems very far from useless.
        
       | EntrePrescott wrote:
       | > We are thrilled to be sharing our new model
       | 
       | really? that's a nice gesture... so where can I download it?
        
       | boosteri wrote:
        | Nice, I wonder how long it will take for banks to cotton on
        | and get rid of the stupid "my voice is my password"
        | verification mechanism.
        
       | paul7986 wrote:
       | YC is assisting / help to fund the next big telephone scam in
       | which more and more people across the world will fall victim to
       | cloned voice audio scams. Grandma is called by her grandsons
       | voice asking for money, but it's not him. YEAH!
       | 
       | I had envisioned the scammers leveling up to this scam last week
        | or so in a comment here. Though Google News a few days later
       | showed me it's already happening...
       | 
       | https://www.dailymail.co.uk/news/article-11897239/Houston-co...
        
       ___________________________________________________________________
       (page generated 2023-03-27 23:00 UTC)