[HN Gopher] VASA-1: Lifelike audio-driven talking faces generate...
       ___________________________________________________________________
        
       VASA-1: Lifelike audio-driven talking faces generated in real time
        
       Author : vyrotek
       Score  : 198 points
        Date   : 2024-04-18 00:55 UTC (1 day ago)
        
 (HTM) web link (www.microsoft.com)
 (TXT) w3m dump (www.microsoft.com)
        
       | nowhereai wrote:
       | woah. so far not in the news. this is the only article
       | https://www.ai-gen.blog/2024/04/microsoft-vasa-1-ai-technolo...
        
       | acidburnNSA wrote:
       | Oh no. "Cameras on please!" will be replaced by "AI generated
       | faces off please!" in teams.
        
       | mdrzn wrote:
       | Holy shit these are really high quality and basically in realtime
       | on a 4090. What a time to be alive.
        
         | rbinv wrote:
         | It really is something. 40 FPS on a 4090, damn.
        
       | nojvek wrote:
        | I like the considerations section.
       | 
        | There's likely also an unsaid statement: this is for us only,
        | and we'll be the only ones making money from it, with our
        | definition of "safety" and "positive".
        
       | gavi wrote:
        | The GPU requirements for realtime video generation are very
        | minimal in the grand scheme of things. An assault on reality
        | itself.
        
       | nycdatasci wrote:
       | "We have no plans to release an online demo, API, product,
       | additional implementation details, or any related offerings until
       | we are certain that the technology will be used responsibly and
       | in accordance with proper regulations."
        
         | feyman_r wrote:
         | /s it doesn't have the phrase LLM in the title
        
         | justinclift wrote:
         | > until we are certain that the technology will be used
         | responsibly ...
         | 
         | That's basically "never" then, so we'll see how long they hold
         | out.
         | 
         | Scammers are already using the existing voice/image/video
         | generation apparently fairly successfully. :(
        
           | ilaksh wrote:
           | Eventually someone will implement one of these really good
           | recent ones as open source and then it will be on replicate
           | etc. right now the open source ones like SadTalker and Video
           | Retalking are not live and are unconvincing.
        
           | spacemanspiff01 wrote:
           | Having a delay, where people can see what's coming down the
           | pipe, does have value. In a year there may/will be a open
           | source model.
           | 
           | But knowing that this is possible is important to know.
           | 
           | I'm fairly clued in, and am constantly surprised at how fast
           | things are changing.
        
             | justinclift wrote:
             | > But knowing that this is possible ...
             | 
             |  _Who_ knowing this is possible?
             | 
              | The average elderly person isn't going to know any time
              | soon. The SV IT people probably will.
             | 
             | It's not an even distribution of knowledge. ;/
        
         | sitzkrieg wrote:
         | money will change that
        
         | araes wrote:
         | Translation: "We're attempting to preserve our moat, and this
         | is the correct PR blurb. We'll release an API once we're far
         | enough ahead and extracted enough money."
         | 
          | Like somebody on Ars noted, "anybody notice it's an election
          | year?" You don't need to release an API; all online videos are
          | now of suspect authenticity. Somebody, make a video of Trump
          | or Biden's eyes following the mouse cursor around. Real videos
          | turned into fake videos.
        
       | qwertox wrote:
       | So an ugly person will be able to present his or her ideas on the
       | same visual level as a beautiful person. Is this some sort of
       | democratization?
        
       | fullstackchris wrote:
       | lol how does something like this get only 50ish votes but some
       | hallucinating video slop generator from some of the _other_
       | competitors gets thousands?
        
       | fluffet wrote:
       | This is absolutely crazy. And it'll only get better from here.
       | Imagine "VASA-9" or whatever.
       | 
       | I thought deepfakes were still quite a bit away but after this I
        | will have to be way more careful online. It's not far from being
        | something that could show up in your "YouTube Shorts" feed and
        | trick you if you didn't already know it was AI.
        
         | smusamashah wrote:
          | This is good but nowhere near as good as EMO
         | https://humanaigc.github.io/emote-portrait-alive/
         | (https://news.ycombinator.com/item?id=39533326)
         | 
         | This one has too much movement and looks eerie/robotic/uncanny
         | valley. While EMO looks just perfect.
        
           | vessenes wrote:
            | Hard disagree -- I think you might be misremembering how EMO
            | looks in practice -- I'm sure we'll learn VASA-1 "telltales"
            | but to my eyes there are far fewer than with EMO - zero of
            | the EMO videos were 'perfect' for me, and many show little
            | glitches or missing sync. VASA-1 still blinks a bit more
            | than I think is natural, but it looks much more fluid.
           | 
           | Both are, BTW, AMAZING!! Pretty crazy.
        
             | smusamashah wrote:
              | In VASA there is way too much body movement instead of
              | just the head, as if the camera is moving in strong wind.
              | EMO is a lot more human-like. In the very first video on
              | the EMO page I still cannot see it as a generated video,
              | it's that real. The lip movement and the expressions are
              | almost in perfect sync with the voice. That is absolutely
              | not the case with VASA.
        
       | gedy wrote:
       | My first thought was "oh no the interview fakes", but then I
       | realized - what if they just kept using the face? Would I care?
        
         | acidburnNSA wrote:
          | Yeah, even if they just use LLMs to do all the work, or are an
          | LLM themselves, as long as they can do the work I guess.
         | 
         | Weird implications for various regulations though.
        
       | pxoe wrote:
       | maybe making a webpage with 27 videos isn't the greatest web
       | design idea
        
         | sitzkrieg wrote:
          | the two busted scrolling sections on mobile really don't help
        
       | IshKebab wrote:
       | Oh god don't watch their teeth! Proper creepy.
       | 
       | Still, apart from the teeth this looks extremely convincing!
        
         | ygjb wrote:
          | Yeah: the teeth, the tongue movement and lack of tongue shape,
          | and the "stretching" of the skin around the cheeks pushed the
          | videos right into the uncanny valley for me.
        
       | ilaksh wrote:
        | The paper mentions it uses Diffusion Transformers. The open
        | source implementation that comes up on Google is Facebook
        | Research's PyTorch implementation, which is under a non-
        | commercial license: https://github.com/facebookresearch/DiT
       | 
       | Is there something equivalent but MIT or Apache?
       | 
       | I feel like diffusion transformers are key now.
       | 
        | I wonder if OpenAI implemented their Sora stuff from scratch or
        | if they built on the Facebook Research diffusion transformers
        | library. It would be interesting if they violated the non-
        | commercial part.
       | 
        | Hm. Found one:
        | https://github.com/milmor/diffusion-transformer-keras
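        | 
        | Since the paper only names the technique, here is a minimal toy
        | sketch of the core DiT idea in PyTorch: a transformer block
        | whose layer norms are modulated by the conditioning signal
        | ("adaLN-Zero", following Peebles & Xie). This is an illustration
        | under my own assumptions, not VASA-1's actual (unpublished)
        | architecture:
        | 
        |   import torch
        |   import torch.nn as nn
        | 
        |   class DiTBlock(nn.Module):
        |       def __init__(self, dim, n_heads):
        |           super().__init__()
        |           self.norm1 = nn.LayerNorm(dim,
        |                                     elementwise_affine=False)
        |           self.attn = nn.MultiheadAttention(dim, n_heads,
        |                                             batch_first=True)
        |           self.norm2 = nn.LayerNorm(dim,
        |                                     elementwise_affine=False)
        |           self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim),
        |                                    nn.GELU(),
        |                                    nn.Linear(4 * dim, dim))
        |           # adaLN-Zero: the conditioning vector (diffusion
        |           # timestep plus, in a talking-face model, audio
        |           # features) regresses per-block scale/shift/gate
        |           # parameters; zero init makes each block start out
        |           # as the identity map.
        |           self.ada = nn.Sequential(nn.SiLU(),
        |                                    nn.Linear(dim, 6 * dim))
        |           nn.init.zeros_(self.ada[1].weight)
        |           nn.init.zeros_(self.ada[1].bias)
        | 
        |       def forward(self, x, cond):
        |           # x: (batch, tokens, dim); cond: (batch, dim)
        |           s1, b1, g1, s2, b2, g2 = \
        |               self.ada(cond).chunk(6, dim=-1)
        |           h = self.norm1(x) * (1 + s1[:, None]) + b1[:, None]
        |           x = x + g1[:, None] * self.attn(
        |               h, h, h, need_weights=False)[0]
        |           h = self.norm2(x) * (1 + s2[:, None]) + b2[:, None]
        |           return x + g2[:, None] * self.mlp(h)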
        
       | jazzyjackson wrote:
       | i get why this is interesting but why is it desirable?
       | 
       | real jurassic park "too preoccupied with whether they could"
       | vibes
        
         | acidburnNSA wrote:
         | Now I can join the meeting "in a suit" while being out
         | paddleboarding!
        
       | m3kw9 wrote:
        | If you see talking heads with static/simple/blurred backgrounds
        | from now on, assume they are fake. In the near future they will
        | be accompanied by realistic backgrounds and be even less
        | detectable; we will have to assume all videos could be faked.
        
         | Retric wrote:
         | I still find the faces themselves to be really obviously wrong.
         | The sound is just off, close enough to tell who is being
         | imitated but not particularly good.
        
           | tredre3 wrote:
           | Especially the hair "physics" and sometimes the teeth shift
           | around a bit.
           | 
            | But that's nitpicking. It's good enough to fool someone not
            | watching too closely. And the fact that the result is _this_
            | good with a single photo is truly astonishing; we used to
            | have to train models on thousands of photos for days only to
            | end up with a worse result!
        
         | hypeatei wrote:
         | I wonder how video evidence in court is going to be affected by
         | this. Both from a defense and prosecution perspective.
         | 
         | Technically videos could've been faked before but it would
         | require a ton of effort and skill that no average person would
         | have.
        
           | greenavocado wrote:
           | There will be a new cottage industry of AI detectives that
           | serve as expert witnesses and they will attest to the
           | originality of media to the court
        
       | alfalfasprout wrote:
       | What this is starting to reveal is that there's a clear need for
       | some kind of chain of custody system that guarantees the
       | authenticity of what we see. Nikon/Canon tried doing this in the
       | past, but improper storage of private keys lead to
       | vulnerabilities. As far as I'm aware it's never extended to video
       | either.
       | 
       | With modern secure hardware keys it may yet be possible. The
       | difficulty is that any kind of photo/video manipulation would
       | break the signature (and there are practical reasons to want to
       | be able to edit videos obviously).
       | 
        | In an ideal world, any mutation of the content could be traced
        | back to the original source. But that's not an easy problem to
        | solve.
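        | 
        | For a flavor of what traceable mutation could look like, here is
        | a minimal sketch of edit provenance as a hash chain, loosely in
        | the spirit of C2PA-style manifests. The record format is made up
        | for illustration and is not any real standard's API:
        | 
        |   import hashlib, json
        | 
        |   def digest(data: bytes) -> str:
        |       return hashlib.sha256(data).hexdigest()
        | 
        |   def new_manifest(original: bytes) -> list:
        |       # The capture record commits to the raw content.
        |       return [{"action": "capture",
        |                "content": digest(original), "prev": None}]
        | 
        |   def record_edit(chain: list, edited: bytes, action: str):
        |       # Each edit commits to the previous record, so a verifier
        |       # can walk any derived video back to the capture record.
        |       prev = digest(json.dumps(chain[-1],
        |                                sort_keys=True).encode())
        |       return chain + [{"action": action,
        |                        "content": digest(edited), "prev": prev}]
        | 
        |   # A crop is recorded rather than breaking verification:
        |   chain = new_manifest(b"raw camera frames")
        |   chain = record_edit(chain, b"cropped frames", "crop")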
        
         | throw__away7391 wrote:
         | No, we are merely returning to the pre-photography state of
         | things where a mere printed image is not sufficient evidence
         | for anything.
        
           | tass wrote:
           | There goes the dashcam industry...
        
             | barbazoo wrote:
              | You're being downvoted but I think the comment raises a
              | good question: what will happen when someone gets accused
              | of doctoring their dashcam footage? Or any footage used as
              | evidence.
        
           | hx8 wrote:
           | True, an image, audio clip, or video is not enough evidence
           | to establish truth.
           | 
           | We still need a way to establish truth. It's important for
           | security cameras, for politics, and for public figures. Here
           | are some things we could start looking into.
           | 
            | * Cameras that sign their output (see the sketch at the end
            | of this comment). Yes, this camera caught this video, and it
            | hasn't been modified. This is a must for recordings used as
            | court evidence IMO. Otherwise framing a crime is as easy as
            | a few deep fakes and planting some DNA or fingerprints at
            | the scene of the crime.
           | 
           | * People digitally signing pictures/audio/videos of them.
           | Even if they digitally modified the data it shows that they
           | consent to having their image associated with that message.
           | It reduces the strength of the attack vector of deep fake
           | videos for reputation sabotage.
           | 
           | * Malicious content source detection and flagging. Think
           | email spam filter type tagging of fake content. Community
           | notes on X would be another good example.
           | 
           | * Digital manipulation detection. I'm less than hopeful this
           | will be the way in the long term, but could be used to
           | disprove some fraud.
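            | 
            | A minimal sketch of that first idea, camera-signed output,
            | assuming the Python "cryptography" package's Ed25519 API; in
            | practice the private key would sit in secure hardware, not
            | in application code:
            | 
            |   from cryptography.exceptions import InvalidSignature
            |   from cryptography.hazmat.primitives.asymmetric import (
            |       ed25519)
            | 
            |   # Provisioned at manufacture; the vendor publishes the
            |   # public half so anyone can verify.
            |   device_key = ed25519.Ed25519PrivateKey.generate()
            |   public_key = device_key.public_key()
            | 
            |   def sign_frame(frame: bytes) -> bytes:
            |       return device_key.sign(frame)
            | 
            |   def verify_frame(frame: bytes, sig: bytes) -> bool:
            |       try:
            |           public_key.verify(sig, frame)
            |           return True
            |       except InvalidSignature:
            |           return False
            | 
            |   sig = sign_frame(b"raw sensor data")
            |   assert verify_frame(b"raw sensor data", sig)
            |   # Any change to the pixels breaks the signature:
            |   assert not verify_frame(b"edited pixels", sig)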
        
             | alchemist1e9 wrote:
             | Blockchains can be used for cryptography time-stamping.
             | 
             | I've always had a suspicion that governments and large
             | companies would prefer a world without hard cryptographic
             | proofs. After wikileaks they noticed DKIM can cause them
             | major blowback. Somehow general public isn't aware all the
             | emails were proven authentic with DKIM signatures and even
             | in fairly educated circles people believe the "emails were
             | fake" but it's not actually possible.
        
             | alex_suzuki wrote:
             | Signing is great, but the hard part is managing keys and
             | trust.
        
           | BobaFloutist wrote:
           | Pre-photography it at least took effort, practice, and time,
           | to draw something convincing. Any skill with that much of a
           | barrier to entry kind of automatically reduces the ability to
           | be anonymous. And we didn't have the ability to
           | instantaneously distribute images world-wide.
        
           | anigbrowl wrote:
            |  _merely_
            | 
            | You say this as if it were not a big deal, but losing a
            | century's worth of authentication infrastructure/practices
            | is a Bad Thing which will have large negative externalities.
        
         | bonton89 wrote:
         | I expect this type of system to be implemented in my lifetime.
         | It will allow whistleblowers and investigative sources to be
         | discredited or tracked down and persecuted.
        
       | TriangleEdge wrote:
       | Why is this research being done? Is this some kind of arms race?
       | The only purpose of this technology I can think of is getting
       | spies to abuse others.
       | 
       | Am I going to have to do AuthN and AuthZ on every phone call and
       | zoom now?
        
         | Arnavion wrote:
         | On the other hand, if deepfaking becomes common enough that
         | everyone stops trusting everything they read / see on the
         | internet, it would be a net good against the spread of
         | disinformation compared to today.
        
           | hiatus wrote:
           | I don't see that as an outcome. We have already seen a grand
           | erosion of trust in institutions. Moving to an even lower
           | trust society does not sound like it would have positive
           | consequences for discourse, public policy, or society at
           | large.
        
             | rightbyte wrote:
              | Ironically, low effort deep fakes might increase trust in
              | organizations that have had the budget to fake stuff since
              | their inception. The losers are "citizen journalists"
              | broadcasting on YouTube etc.
        
           | notaustinpowers wrote:
           | I don't see the extinction of trust through the introduction
           | of garbage falsehoods to be a net good.
           | 
           | Believing that everything you eat is poisoned is no way to
           | live. Believing that everything you see is a lie is also no
           | way to live.
        
           | anigbrowl wrote:
           | _everyone stops trusting everything_
           | 
           | Why would you expect this to happen? Lots of people are
           | gullible, if it were otherwise a lot of well-known
           | politicians would be out of a job or would never have been
           | elected to begin with.
        
             | ryandrake wrote:
              | If it becomes even more common than "common enough", then
              | anyone could at least try to help their gullible friends
              | and family by sending them a deepfake video of them
              | doing/saying something they've never said. A lot of people
              | will suddenly wise up when a problem affects them directly.
        
           | piva00 wrote:
            | That's the whole issue though: the spread of disinformation
            | eroded trust, and furthering this into the obliteration of
            | all trust is not a good outcome.
        
         | tithe wrote:
         | I get the feeling it's "someone's going to do this, so it might
         | as well be us."
         | 
         | It's fascinating how research can take on a life of its own and
         | will be pushed, by someone, to its own conclusion. Even for
         | immensely destructive technologies (e.g., atomic weapons,
         | viruses), the impact of a technology is its own attractor
         | (could you say that's risk-seeking behavior?)
         | 
         | > Am I going to have to do AuthN and AuthZ on every phone call
         | and zoom now?
         | 
         | "Alexa, I need an alibi for yesterday at noon."
        
         | andybak wrote:
          | Because the tech for this is only a slight variation of the
          | tech for a broad range of legitimate applications?
         | 
         | Because even this precise tech has legitimate use cases?
         | 
         | > The only purpose of this technology I can think of is getting
         | spies to abuse others.
         | 
         | Can you really not think of any other use cases?
        
         | 1659447091 wrote:
         | Advertising. Now you and your friends star in the streaming
         | commercials and digital billboards near you! (whether you want
         | to or not)
        
       | balls187 wrote:
       | I'm curious what is the reason for deepfake research and what the
       | practical application is.
       | 
       | Can someone explain the commercial need to take someones likeness
       | and generate video content?
       | 
        | If I were an A-list celebrity, I would give permission for Coke
        | to make a commercial with my likeness, provided I was allowed
        | final approval of the finished ad.
       | 
       | Do I have an avatar that attends my zoom work calls?
        
         | bugglebeetle wrote:
         | State disinformation and propaganda campaigns.
        
           | NortySpock wrote:
           | Corporate disinformation and propaganda campaigns.
           | 
           | Personal disinformation and propaganda campaigns.
           | 
           | Oh Brave New World, that has such fake people in it!
        
         | hypeatei wrote:
         | Entertainment maybe? I know that's not necessarily an ethical
         | reason but some have made hilarious AI-generated songs already.
        
         | mensetmanusman wrote:
         | The purpose is to give remote workers the ability to clone
         | themselves and automate their many jobs. /s
         | 
         | (but actually, because laziness is the driver of all
         | innovation, I wouldn't be surprised if this happens).
        
         | JamesBarney wrote:
          | Video games, entertainment, and avatars seem like the big
          | ones.
        
           | HeatrayEnjoyer wrote:
           | If that is really the reason then this is insane and everyone
           | involved should put their keyboards down and stop what they
           | are doing.
           | 
           | This would be as if we invented and sold nuclear weapons to
           | dig out quarry mines faster. The inconvenience it saves us
           | quickly disappears into the overwhelming shadow of the
           | enormous harm now enabled.
        
             | ImPostingOnHN wrote:
             | _> This would be as if we invented and sold nuclear weapons
             | to dig out quarry mines faster._
             | 
             | "Project Plowshare was the overall United States program
             | for the development of techniques to use nuclear explosives
             | for peaceful construction purposes." _[0]_
             | 
             |  _0:https://en.wikipedia.org/wiki/Project_Plowshare_
        
               | wumeow wrote:
               | Yeah, and it was terminated. Much harder to put this
               | genie back in the bottle.
        
         | r1chardnl wrote:
         | Apple Vision Pro personas competition
        
           | selimnairb wrote:
           | Thank god. I will finally be able to use my EyeMac with
           | dignity.
        
         | SkyPuncher wrote:
          | On the surface, it's a simple, understandable demo for the
          | masses. At the same time, it hints at deeper commercial
          | usage.
          | 
          | Disney has been using digital likenesses to maintain
          | characters whose actors/actresses have died. Princess Leia is
          | the most prominent example. Arguably, there is significant
          | real value in being able to generate a human-like character
          | that doesn't have to be recast. That character can be any
          | age, at any time, and look exactly like the actor/actress.
         | 
         | For actors/actresses, I suspect many of them will start
         | licensing their image/likeness as they look to wind down their
         | careers. It gives them on-going income with very little effort.
        
         | szundi wrote:
          | Imagine being the CEO: you just grab your salary and options,
          | go home, and sit in the hot tub while one of the interns
          | carefully prompts GPT and VASA to generate you giving a
          | speech online about strategic directions. /s
        
         | jdietrich wrote:
         | In this case, replacing humans in service jobs. From the paper:
         | 
         |  _" Such technology holds the promise of enriching digital
         | communication, increasing accessibility for those with
         | communicative impairments, transforming education methods with
         | interactive AI tutoring, and providing therapeutic support and
         | social interaction in healthcare."_
         | 
         | A convincing simulacrum of empathy could plausibly be the most
         | profitable product since oil.
        
         | bonton89 wrote:
         | Propaganda, political manipulation, narrative nudging, regular
         | scams and advertising.
         | 
          | Even though most of those things are illegal, you could just
          | have foreign cat's-paw firms do it. Maybe you fire them for
          | "going too far" after the damage is done, assuming someone
          | even manages to connect the dots.
        
         | criddell wrote:
         | If beautiful people have an advantage in the job market, maybe
         | people will use deepfake technology when doing zoom interviews?
         | Maybe they will use it to alter their accent?
        
       | physhster wrote:
       | A fantastic technological advance for election interference!
        
         | RGamma wrote:
          | Such an _exciting_ startup idea! I'm _thrilled_!
        
       | karaterobot wrote:
       | We need some clear legislation around this right now.
        
         | stronglikedan wrote:
         | counterpoint: we don't need any more legislation
        
           | qwertox wrote:
           | I tend towards agreeing with you. Many of the problems, like
           | impersonation, are already illegal.
           | 
            | And replacing a person who spreads lies, as can be seen in
            | most TV or glossy cover ads, shouldn't trigger some new
            | legal action. The only difference is that now the actor is
            | also a lie.
           | 
           | And countries which use actors or news anchors for spreading
           | propaganda surely won't see an issue with replacing them with
           | AI characters.
           | 
           | People who then get to read that their most favorite,
           | stunningly beautiful Instagram or TikTok influencer is
           | nothing but a fat, chips-eating ugly person using AI, may try
           | to raise some legal issues to soothe their disappointment.
           | They then might raise a point which sounds reasonable, but
           | which would then force politicians to also tackle the lies
           | which are spread in TV/Magazines ads.
           | 
            | Maybe clearly labeling any use of this tech, perhaps with a
            | QR code linking to the owner of the AI, similar to QR codes
            | on meat packaging which let you trace the origin of the
            | meat, would be something laws could help with, in the
            | spirit of transparency.
        
         | 4ndrewl wrote:
         | In which jurisdiction?
        
           | karaterobot wrote:
           | What jurisdiction would not benefit from legislation around
           | duplicating people's identities using AI?
        
         | CamperBob2 wrote:
         | Legislation only impairs the good guys.
        
       | binkHN wrote:
        | Full details at
        | https://www.microsoft.com/en-us/research/project/vasa-1/
        
       | FredPret wrote:
       | Anyone have any good ideas for how we're going to do politics
       | now?
       | 
       | Today a big ML model can do this and it's somewhat regulate-able,
       | tomorrow people can do this on their contact-lens supercomputers
       | and anyone can generate a video of anything.
       | 
       | Is going back to personally knowing your local representative the
       | only way? How will we vote for national candidates if nobody
       | knows what they think or say?
        
         | qup wrote:
         | People in my circles have been saying this for a few years now,
         | and we've yet to see it happen.
         | 
         | I've got my popcorn ready.
         | 
         | But you can rest easy. Everyone just votes for the candidate
         | their party picked, anyway.
        
           | FredPret wrote:
           | It'll happen - deepfakes aren't good enough yet. But when
           | they become ubiquitous and hard to spot, it'll be chaos until
           | the average person is mentally inoculated against believing
           | any video / anything on the internet.
           | 
           | I wonder if it's possible to digitally sign footage as it's
           | captured? It'd be nice to have some share-able demonstrably
           | true media.
           | 
           | Edit: I'm a centrist and I definitely would lean one way or
           | the other based on who the options are (or who I think they
           | are).
        
         | T-A wrote:
         | > Today a big ML model can do this
         | 
         | Not that big:
         | 
         | https://github.com/Zejun-Yang/AniPortrait
         | 
         | https://huggingface.co/ZJYang/AniPortrait/tree/main
        
           | cchance wrote:
            | Hadn't seen that one, pretty cool. Not as good as EMO or
            | VASA, but pretty good.
        
         | hx8 wrote:
         | Hyper targeted placement of generated content designed to
         | entice you to donate to political campaigns and to vote.
         | Perhaps leading to a point where entire video clips are
         | generated for a single viewer. Politicians and political
         | commentators will lease their likeness and voice out for
         | targeted messaging to be generated using their likeness. Less
         | reputable platforms will allow disinformation campaigns to
         | spread.
        
         | 4ndrewl wrote:
         | DNS? Might be that we need a radical (for some) change of
         | viewpoint.
         | 
          | Just as there's no privacy on the internet, how about "there's
          | very little trust on the internet"? Assume everything not
          | securely signed by a trusted party is false.
        
         | kmlx wrote:
         | > How will we vote for national candidates if nobody knows what
         | they think or say?
         | 
         | i'm going to burst your bubble here, but most voters have no
         | idea about policies or candidates. most voters vote based on
         | inertia or minimal cues, not on policies or candidates.
         | 
         | i suggest you look up "The American Voter", "The Democratic
         | Dilemma: Can Citizens Learn What They Need to Know?" and
         | "American National Election Studies".
        
         | dwb wrote:
          | We already rely on chains of trust going back to the original
          | source, and still will. I find these alarmist posts a bit
          | mystifying - before photography, anyone could fake a quote of
          | anyone, and human civilisation got quite far. We had a bit
          | over a hundred years where photographic-quality images were
          | possible and very hard to fake (which did and still does vary
          | with technology), but clearly now we're past that. We'll
          | manage!
        
           | BobaFloutist wrote:
           | Yeah I mean tabloids have been fooling people with doctored
           | photos for decades.
           | 
           | Potentially we'll need slightly tighter regulations on formal
           | press (so that people that care for accurate information have
           | a place they can get it) and definitely we'll want to steer
           | the culture back towards holding them accountable for
           | misinformation, but credulous people have always had easy
           | access to bad information.
           | 
            | I'm _much_ more worried about the potential abuse cases
            | that involve ordinary people who aren't public figures, and
            | who have much less ability to defend themselves. Heck, even
            | celebrities are more vulnerable targets than politicians.
        
           | marcusverus wrote:
           | Presidential elections are frequently pretty close. Taking
           | the electoral college into account (not the popular vote,
           | which doesn't matter) Donald Trump won the 2016 election by a
           | grand total of ~80,000 votes in three states[0].
           | 
           | Knowing that retractions rarely get viral exposure, it's not
           | difficult to imagine that a few sufficiently-viral videos
           | could swing enough votes to impact a presidential election.
            | _Especially_ when considering that the average person is not
            | up to speed on the current state of the tech, and so has not
            | been prompted to build up the mindset that's required to
            | fend off this new threat.
           | 
           | [0] https://www.washingtonpost.com/news/the-
           | fix/wp/2016/12/01/do...
        
             | dwb wrote:
             | Plausible. I was thinking over the longer-term.
        
           | woleium wrote:
           | The issue is better phrased as "how will we survive the
           | transition while some folk still believe the video they are
           | seeing is irrefutable proof the event happened?"
        
           | GeoAtreides wrote:
           | In the before times we didn't have social media and its
           | algorithms and reach. Does it matter that the chains of trust
           | debunk a viral lie 24 hours after it had spread? Not that
           | there's a lot of trust in the chains of trust to begin with.
           | And if you still have trust, then you're not the target of
           | the viral lie. And if you still have trust, then how long can
           | you hold on that trust when the lies keep coming 24/7 one
           | after another without end. As one movie critic once put it:
           | You might not have noticed it, but your brain did. Very
           | malleable this brain of ours.
           | 
           | The civilization might be fine, sure. Now, democracy, on the
           | other hand...
        
         | hooverd wrote:
         | People already believe any quote you slap on a JPEG.
        
         | TimedToasts wrote:
         | > Anyone have any good ideas for how we're going to do politics
         | now?
         | 
         | If a business is showing a demo of this you can be assured that
         | the Government already has this tech and has for a period of
         | time.
         | 
         | > How will we vote for national candidates if nobody knows what
         | they think or say?
         | 
         | You don't know what they think or say _now_ - hopefully this
         | disabuses people of this notion.
        
       | SirMaster wrote:
       | It looks all warpy and stretchy. That's not how skin and face
       | muscles work. Looks fake to me.
        
       | cs702 wrote:
       | And it's only going to get faster, better, easier, cheaper.[a]
       | 
       | Meanwhile, yesterday my credit card company asked me if I wanted
       | to use voice authentication for verifying my identity "more
       | securely" on the phone. Surely the company spent many millions of
       | dollars to enable this new security-theater feature.
       | 
        | It raises the question: is every single executive and manager at
        | my credit card company _completely unaware_ that right now
        | anyone can clone anyone else's voice from a short sample audio
        | clip taken from any social network? If anyone is aware, why is
        | the company acting like this?
       | 
        | Corporate America is _so far behind the times_ it's not even
        | funny.
       | 
       | ---
       | 
       | [a] With apologies to Daft Punk.
        
         | user_7832 wrote:
         | > Is every single executive and manager at my credit card
         | company completely unaware that right now anyone can clone
         | anyone else's voice by obtaining a short sample audio clip
         | taken from any social network?
         | 
         | Your mistake is assuming the company cares. The "company" is a
         | hundred different disjointed departments that only care about
         | not getting caught Equifax-style (or filing for bankruptcy if
         | caught). If the marketing director sees a shiny new thing that
         | might boost some random KPI they may not really care about
         | security.
         | 
          | However, on the off chance that your bank is actually half
          | decent, I'd suggest contacting their IT/security teams about
          | your concerns. Maybe you'll save some folks from getting
          | scammed?
        
           | cyanydeez wrote:
           | Also, this feature is probably just some midd level execs
           | plan for a bonus, not a rigorously reviewed and planned. It's
           | also probably in the pipeline for a decade so if they don't
           | push it out, suddenly they get no bonus for cancelling a
           | project.
           | 
           | Corporations are ultimately no better than governments and
           | likely worse depending on what their regulatory environment
           | looks like.
        
         | fragmede wrote:
         | I mean, what do you want them to do? If we think their security
         | officers are freaking out and holding meetings right now about
         | what to do, or if they're asleep at the wheel, we'd be seeing
         | the same thing from the outside, no?
        
         | ryandrake wrote:
         | Any time you add a "new" security gate to your product, it
         | should be _in addition to_ and not _instead of_ the existing
          | gates. Biometrics should not replace username/password; they
          | should be in addition to it. Security questions like "What was
         | your first pet's name" should not be able to get you in the
         | backdoor. SMS verification alone should not allow you to reset
         | your password. Same with this voice authentication stuff. It
         | should be another layer, not a replacement of your actual
         | credentials.
         | 
         | If you treat it as OR instead of AND, then your security is
         | only as good as the worst link in the chain.
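          | 
          | In code terms the difference is just the boolean connective.
          | A toy illustration, with made-up factor names:
          | 
          |   def weakest_link_auth(password_ok, voice_ok, sms_ok):
          |       # OR-composition: any single factor grants access, so
          |       # security degrades to the weakest factor (e.g., a
          |       # cloned voice).
          |       return password_ok or voice_ok or sms_ok
          | 
          |   def layered_auth(password_ok, voice_ok, sms_ok):
          |       # AND-composition: every factor must pass, so each new
          |       # layer can only strengthen the gate, never weaken it.
          |       return password_ok and voice_ok and sms_ok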
        
       | andrewstuart wrote:
        | Despite vast investment in AI by VCs and vast numbers of
        | startups in the field, these sorts of things remain unavailable
        | as simple consumer-installable software.
        | 
        | Every second day HN has some post about some new amazing AI
        | system, never available to download, run, and use.
       | 
       | Why the vast investment and no startup selling consumer
       | downloadable software to do it?
        
       | egberts1 wrote:
        | Cool! Now we can expect to see an endless stream of dead
        | presidents' speeches "LIVE" from the White House.
       | 
       | This should end well.
        
       | BobaFloutist wrote:
       | Oh good!
        
       | smusamashah wrote:
       | This is good but nowhere as good as EMO
       | https://humanaigc.github.io/emote-portrait-alive/
       | (https://news.ycombinator.com/item?id=39533326)
       | 
       | This one has too much fake looking body movement and looks
       | eerie/robotic/uncanny valley. The lips don't sync properly in
       | many places. Eye movement and over all head and body movement is
       | not very natural at all.
       | 
       | While EMO looks just perfect mostly. The very first two videos on
       | EMO page are perfect example of that. See the rap near the end to
       | see how good EMO is at lip sync.
        
         | cchance wrote:
         | Another research project with 0 model release
        
       | RcouF1uZ4gsC wrote:
       | > To show off the model, Microsoft created a VASA-1 research page
       | featuring many sample videos of the tool in action
       | 
       | With AI stuff, I have learned to be very skeptical until and
       | unless a relatively publicly accessible demo with user specified
       | inputs is available.
       | 
        | It is way too easy for humans to cherry-pick the nice outputs,
        | or to take advantage of biases in the training data to generate
        | nice outputs, and that is not at all reflective of how a model
        | holds up in the real world.
        | 
        | Part of the reason ChatGPT, Stable Diffusion, and DALL-E had
        | such an impact is that people could try them and see for
        | themselves, without being told how awesome they were by the
        | people making them.
        
       | dang wrote:
       | Related: https://arstechnica.com/information-
       | technology/2024/04/micro...
       | 
       | (via https://news.ycombinator.com/item?id=40088826, but we merged
       | that thread hither)
        
       ___________________________________________________________________
       (page generated 2024-04-19 23:01 UTC)