[HN Gopher] VASA-1: Lifelike audio-driven talking faces generate...
___________________________________________________________________
VASA-1: Lifelike audio-driven talking faces generated in real time
Author : vyrotek
Score : 198 points
Date   : 2024-04-18 00:55 UTC (1 day ago)
(HTM) web link (www.microsoft.com)
(TXT) w3m dump (www.microsoft.com)
| nowhereai wrote:
| woah. so far not in the news. this is the only article
| https://www.ai-gen.blog/2024/04/microsoft-vasa-1-ai-technolo...
| acidburnNSA wrote:
| Oh no. "Cameras on please!" will be replaced by "AI generated
| faces off please!" in teams.
| mdrzn wrote:
| Holy shit these are really high quality and basically in realtime
| on a 4090. What a time to be alive.
| rbinv wrote:
| It really is something. 40 FPS on a 4090, damn.
| nojvek wrote:
| I like the considerations section.
|
| There's likely also an unsaid statement: this is for us only,
| and we'll be the only ones making money from it, with our
| definition of "safety" and "positive".
| gavi wrote:
| The GPU requirements for realtime video generation are very
| minimal in the grand scheme of things. An assault on reality
| itself.
| nycdatasci wrote:
| "We have no plans to release an online demo, API, product,
| additional implementation details, or any related offerings until
| we are certain that the technology will be used responsibly and
| in accordance with proper regulations."
| feyman_r wrote:
| /s it doesn't have the phrase LLM in the title
| justinclift wrote:
| > until we are certain that the technology will be used
| responsibly ...
|
| That's basically "never" then, so we'll see how long they hold
| out.
|
| Scammers are already using the existing voice/image/video
| generation, apparently fairly successfully. :(
| ilaksh wrote:
| Eventually someone will implement one of these really good
| recent ones as open source, and then it will be on Replicate
| etc. Right now the open source ones like SadTalker and Video
| Retalking are not live and are unconvincing.
| spacemanspiff01 wrote:
| Having a delay, where people can see what's coming down the
| pipe, does have value. In a year there may (or will) be an
| open source model.
|
| But knowing that this is possible is important.
|
| I'm fairly clued in, and am constantly surprised at how fast
| things are changing.
| justinclift wrote:
| > But knowing that this is possible ...
|
| _Who_ knowing this is possible?
|
| The average elderly person isn't going to know any time
| soon. The SV IT people probably will.
|
| It's not an even distribution of knowledge. ;/
| sitzkrieg wrote:
| money will change that
| araes wrote:
| Translation: "We're attempting to preserve our moat, and this
| is the correct PR blurb. We'll release an API once we're far
| enough ahead and extracted enough money."
|
| Like somebody on Ars noted, "anybody notice it's an election
| year?" You don't need to release an API; all online videos are
| now of suspect authenticity. Somebody make a video of Trump or
| Biden's eyes following the mouse cursor around. Real videos
| turned into fake videos.
| qwertox wrote:
| So an ugly person will be able to present his or her ideas on the
| same visual level as a beautiful person. Is this some sort of
| democratization?
| fullstackchris wrote:
| lol how does something like this get only 50ish votes but some
| hallucinating video slop generator from some of the _other_
| competitors gets thousands?
| fluffet wrote:
| This is absolutely crazy. And it'll only get better from here.
| Imagine "VASA-9" or whatever.
|
| I thought deepfakes were still quite a bit away, but after this
| I will have to be way more careful online. It's not far from
| being something that can show up in your "YouTube Shorts" feed
| and trick you if you didn't already know it was AI.
| smusamashah wrote:
| This is good but nowhere near as good as EMO
| https://humanaigc.github.io/emote-portrait-alive/
| (https://news.ycombinator.com/item?id=39533326)
|
| This one has too much movement and looks eerie/robotic/uncanny
| valley, while EMO looks just perfect.
| vessenes wrote:
| Hard disagree -- I think you might be misremembering how EMO
| looks in practice. I'm sure we'll learn VASA-1 "telltales",
| but to my eyes there are far fewer than with EMO: zero of the
| EMO videos were "perfect" for me, and many show little glitches
| or broken sync. VASA-1 still blinks a bit more than I think
| is natural, but it looks much more fluid.
|
| Both are, BTW, AMAZING!! Pretty crazy.
| smusamashah wrote:
| In VASA there is way too much body movement instead of just
| the head, as if the camera is moving in strong wind. EMO is
| a lot more human-like. In the very first video on the EMO
| page I still cannot see that it's a generated video, it's
| that real. The lip movement and the expressions are almost
| in perfect sync with the voice. That is absolutely not the
| case with VASA.
| gedy wrote:
| My first thought was "oh no the interview fakes", but then I
| realized - what if they just kept using the face? Would I care?
| acidburnNSA wrote:
| Yeah, even if they just use LLMs to do all the work, or are an
| LLM themselves, as long as they can do the work I guess.
|
| Weird implications for various regulations though.
| pxoe wrote:
| maybe making a webpage with 27 videos isn't the greatest web
| design idea
| sitzkrieg wrote:
| the two busted scrolling sections on mobile really don't help
| IshKebab wrote:
| Oh god don't watch their teeth! Proper creepy.
|
| Still, apart from the teeth this looks extremely convincing!
| ygjb wrote:
| yeah, the teeth, the tongue movement (and lack of tongue
| shape), and the "stretching" of the skin around the cheeks
| pushed the videos right into the uncanny valley for me.
| ilaksh wrote:
| The paper mentions it uses Diffusion Transformers. The open
| source implementation that comes up on Google is Facebook
| Research's PyTorch implementation, which is under a non-
| commercial license. https://github.com/facebookresearch/DiT
|
| Is there something equivalent but MIT or Apache?
|
| I feel like diffusion transformers are key now.
|
| I wonder if OpenAI implemented their Sora stuff from scratch or
| if they built on the Facebook Research diffusion transformers
| library. That would be interesting if they violated the non-
| commercial part.
|
| Hm. Found one:
| https://github.com/milmor/diffusion-transformer-keras
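|
| Not VASA-1's or Facebook's code, just a toy sketch of the core
| idea in PyTorch: a transformer block whose LayerNorm scale and
| shift are predicted from the diffusion timestep embedding (the
| adaLN-style conditioning the DiT paper describes):
|
|     import torch
|     import torch.nn as nn
|
|     class ToyDiTBlock(nn.Module):
|         def __init__(self, dim=256, heads=4):
|             super().__init__()
|             self.norm1 = nn.LayerNorm(dim, elementwise_affine=False)
|             self.attn = nn.MultiheadAttention(dim, heads,
|                                               batch_first=True)
|             self.norm2 = nn.LayerNorm(dim, elementwise_affine=False)
|             self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim),
|                                      nn.GELU(),
|                                      nn.Linear(4 * dim, dim))
|             # Timestep embedding -> per-block scales and shifts.
|             self.ada = nn.Linear(dim, 4 * dim)
|
|         def forward(self, x, t_emb):
|             s1, b1, s2, b2 = (self.ada(t_emb).unsqueeze(1)
|                               .chunk(4, dim=-1))
|             h = self.norm1(x) * (1 + s1) + b1
|             x = x + self.attn(h, h, h, need_weights=False)[0]
|             h = self.norm2(x) * (1 + s2) + b2
|             return x + self.mlp(h)
|
|     tokens = torch.randn(1, 16, 256)  # 16 latent "patch" tokens
|     t_emb = torch.randn(1, 256)       # embedded diffusion timestep
|     print(ToyDiTBlock()(tokens, t_emb).shape)  # (1, 16, 256)
|
| Stack a few dozen of those over latent patches and you have the
| denoiser backbone; the rest is training.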
| jazzyjackson wrote:
| i get why this is interesting but why is it desirable?
|
| real jurassic park "too preoccupied with whether they could"
| vibes
| acidburnNSA wrote:
| Now I can join the meeting "in a suit" while being out
| paddleboarding!
| m3kw9 wrote:
| If you see talking heads with static/simple/blurred backgrounds
| from now on, assume they are fake. In the near future they will
| be accompanied by realistic backgrounds and even less detectable
| fakes; we will have to assume all videos could be faked.
| Retric wrote:
| I still find the faces themselves to be really obviously wrong.
| The sound is just off, close enough to tell who is being
| imitated but not particularly good.
| tredre3 wrote:
| Especially the hair "physics" and sometimes the teeth shift
| around a bit.
|
| But that's nitpicking. It's good enough to fool someone not
| watching too closely. And the fact that the result is _this_
| good with a single photo is truly astonishing; we used to
| have to train models on thousands of photos for days only to
| end up with a worse result!
| hypeatei wrote:
| I wonder how video evidence in court is going to be affected by
| this. Both from a defense and prosecution perspective.
|
| Technically videos could've been faked before but it would
| require a ton of effort and skill that no average person would
| have.
| greenavocado wrote:
| There will be a new cottage industry of AI detectives that
| serve as expert witnesses, attesting to the authenticity of
| media in court.
| alfalfasprout wrote:
| What this is starting to reveal is that there's a clear need for
| some kind of chain of custody system that guarantees the
| authenticity of what we see. Nikon/Canon tried doing this in the
| past, but improper storage of private keys led to
| vulnerabilities. As far as I'm aware it was never extended to
| video either.
|
| With modern secure hardware keys it may yet be possible. The
| difficulty is that any kind of photo/video manipulation would
| break the signature (and there are practical reasons to want to
| be able to edit videos obviously).
|
| In an ideal world, any mutation of the source content would be
| traceable back to the original. But that's not an easy problem
| to solve.
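|
| For the signing half, the mechanics are the easy part. A minimal
| sketch with Ed25519 via Python's cryptography package (the
| camera-held key is the hypothetical bit; provisioning and
| trusting that key is the actual unsolved problem):
|
|     from cryptography.exceptions import InvalidSignature
|     from cryptography.hazmat.primitives.asymmetric.ed25519 import (
|         Ed25519PrivateKey,
|     )
|
|     # Hypothetical camera: an embedded private key signs every
|     # file the sensor writes out.
|     device_key = Ed25519PrivateKey.generate()
|     footage = b"...raw video bytes..."  # stand-in for a real file
|     signature = device_key.sign(footage)
|
|     # Anyone holding the device's public key can check the bytes
|     # are exactly what the camera produced.
|     public_key = device_key.public_key()
|     public_key.verify(signature, footage)  # passes silently
|
|     # The tension described above: any edit, even a harmless
|     # re-encode or crop, changes the bytes and kills the proof.
|     try:
|         public_key.verify(signature, footage + b" (edited)")
|     except InvalidSignature:
|         print("signature no longer verifies")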
| throw__away7391 wrote:
| No, we are merely returning to the pre-photography state of
| things, where a printed image is not sufficient evidence
| for anything.
| tass wrote:
| There goes the dashcam industry...
| barbazoo wrote:
| You're being downvoted, but I think the comment raises a
| good question: what will happen when someone gets accused
| of doctoring their dashcam footage? Or any footage used for
| evidence?
| hx8 wrote:
| True, an image, audio clip, or video is not enough evidence
| to establish truth.
|
| We still need a way to establish truth. It's important for
| security cameras, for politics, and for public figures. Here
| are some things we could start looking into.
|
| * Cameras that sign their output. Yes, this camera caught
| this video, and it hasn't been modified. This is a must for
| recordings used as court evidence, IMO. Otherwise
| framing a crime is as easy as a few deep fakes and planting
| some DNA or fingerprints at the scene of the crime.
|
| * People digitally signing pictures/audio/videos of themselves.
| Even if the data has been digitally modified, it shows that
| they consent to having their image associated with that
| message. It reduces the strength of deep fake videos as an
| attack vector for reputation sabotage.
|
| * Malicious content source detection and flagging. Think
| email spam filter type tagging of fake content. Community
| notes on X would be another good example.
|
| * Digital manipulation detection. I'm less than hopeful this
| will be the way in the long term, but could be used to
| disprove some fraud.
| alchemist1e9 wrote:
| Blockchains can be used for cryptographic time-stamping.
|
| I've always had a suspicion that governments and large
| companies would prefer a world without hard cryptographic
| proofs. After WikiLeaks they noticed DKIM can cause them
| major blowback. Somehow the general public isn't aware that
| the emails were proven authentic by their DKIM signatures,
| and even in fairly educated circles people believe the
| "emails were fake", though faking them isn't actually
| possible.
| alex_suzuki wrote:
| Signing is great, but the hard part is managing keys and
| trust.
| BobaFloutist wrote:
| Pre-photography, it at least took effort, practice, and time
| to draw something convincing. Any skill with that much of a
| barrier to entry kind of automatically reduces the ability to
| be anonymous. And we didn't have the ability to
| instantaneously distribute images world-wide.
| anigbrowl wrote:
| _merely_
|
| You say this as if it were not a big deal, but losing a
| century's worth of authentication infrastructure/practices is
| a Bad Thing which will have large negative externalities.
| bonton89 wrote:
| I expect this type of system to be implemented in my lifetime.
| It will allow whistleblowers and investigative sources to be
| discredited or tracked down and persecuted.
| TriangleEdge wrote:
| Why is this research being done? Is this some kind of arms race?
| The only purpose of this technology I can think of is getting
| spies to abuse others.
|
| Am I going to have to do AuthN and AuthZ on every phone call and
| zoom now?
| Arnavion wrote:
| On the other hand, if deepfaking becomes common enough that
| everyone stops trusting everything they read / see on the
| internet, it would be a net good against the spread of
| disinformation compared to today.
| hiatus wrote:
| I don't see that as an outcome. We have already seen a grand
| erosion of trust in institutions. Moving to an even lower
| trust society does not sound like it would have positive
| consequences for discourse, public policy, or society at
| large.
| rightbyte wrote:
| Ironically, low effort deep fakes might increase trust in
| organizations that have had the budget to fake stuff since
| their inception. The losers are "citizen journalists"
| broadcasting on YouTube etc.
| notaustinpowers wrote:
| I don't see the extinction of trust through the introduction
| of garbage falsehoods to be a net good.
|
| Believing that everything you eat is poisoned is no way to
| live. Believing that everything you see is a lie is also no
| way to live.
| anigbrowl wrote:
| _everyone stops trusting everything_
|
| Why would you expect this to happen? Lots of people are
| gullible, if it were otherwise a lot of well-known
| politicians would be out of a job or would never have been
| elected to begin with.
| ryandrake wrote:
| If it's even more common than "common enough", then anyone
| could at least try to help their gullible friends and
| family by sending them a deepfake video of themselves
| doing or saying something they never did. A lot of people
| will suddenly wise up when a problem affects them directly.
| piva00 wrote:
| That's the whole issue though: the spread of disinformation
| eroded trust, and furthering this into the obliteration of all
| trust is not a good outcome.
| tithe wrote:
| I get the feeling it's "someone's going to do this, so it might
| as well be us."
|
| It's fascinating how research can take on a life of its own and
| will be pushed, by someone, to its own conclusion. Even for
| immensely destructive technologies (e.g., atomic weapons,
| viruses), the impact of a technology is its own attractor
| (could you say that's risk-seeking behavior?)
|
| > Am I going to have to do AuthN and AuthZ on every phone call
| and zoom now?
|
| "Alexa, I need an alibi for yesterday at noon."
| andybak wrote:
| Because the tech for this is only a slight variation of the
| tech for a broad range of legitimate applications?
|
| Because even this precise tech has legitimate use cases?
|
| > The only purpose of this technology I can think of is getting
| spies to abuse others.
|
| Can you really not think of any other use cases?
| 1659447091 wrote:
| Advertising. Now you and your friends star in the streaming
| commercials and digital billboards near you! (whether you want
| to or not)
| balls187 wrote:
| I'm curious what the reason for deepfake research is, and what
| the practical applications are.
|
| Can someone explain the commercial need to take someone's
| likeness and generate video content?
|
| If I were an A-list celebrity, I might give permission for Coke
| to make a commercial with my likeness, provided I am allowed
| final approval of the finished ad.
|
| Do I have an avatar that attends my zoom work calls?
| bugglebeetle wrote:
| State disinformation and propaganda campaigns.
| NortySpock wrote:
| Corporate disinformation and propaganda campaigns.
|
| Personal disinformation and propaganda campaigns.
|
| Oh Brave New World, that has such fake people in it!
| hypeatei wrote:
| Entertainment maybe? I know that's not necessarily an ethical
| reason but some have made hilarious AI-generated songs already.
| mensetmanusman wrote:
| The purpose is to give remote workers the ability to clone
| themselves and automate their many jobs. /s
|
| (but actually, because laziness is the driver of all
| innovation, I wouldn't be surprised if this happens).
| JamesBarney wrote:
| Video games, entertainment, and avatars seem like the big
| ones.
| HeatrayEnjoyer wrote:
| If that is really the reason then this is insane and everyone
| involved should put their keyboards down and stop what they
| are doing.
|
| This would be as if we invented and sold nuclear weapons to
| dig out quarry mines faster. The inconvenience it saves us
| quickly disappears into the overwhelming shadow of the
| enormous harm now enabled.
| ImPostingOnHN wrote:
| _> This would be as if we invented and sold nuclear weapons
| to dig out quarry mines faster._
|
| "Project Plowshare was the overall United States program
| for the development of techniques to use nuclear explosives
| for peaceful construction purposes." _[0]_
|
| _0:https://en.wikipedia.org/wiki/Project_Plowshare_
| wumeow wrote:
| Yeah, and it was terminated. Much harder to put this
| genie back in the bottle.
| r1chardnl wrote:
| Apple Vision Pro personas competition
| selimnairb wrote:
| Thank god. I will finally be able to use my EyeMac with
| dignity.
| SkyPuncher wrote:
| On the surface, it's a simple, understandable demo for the
| masses. At the same time, it hints at deeper commercial
| usage.
|
| Disney has been using digital likenesses to maintain characters
| whose actors/actresses have died. Princess Leia is the most
| prominent example. Arguably, there is significant real
| value in being able to generate a human-like character that
| doesn't have to be recast. That character can be any age, at
| any time, and look exactly like the actor/actress.
|
| For actors/actresses, I suspect many of them will start
| licensing their image/likeness as they look to wind down their
| careers. It gives them on-going income with very little effort.
| szundi wrote:
| Imagine being the CEO: you just grab your salary and
| options, go home, and sit in the hot tub while one of the
| interns carefully prompts GPT and VASA to have you give a
| speech online about strategic directions. /s
| jdietrich wrote:
| In this case, replacing humans in service jobs. From the paper:
|
| _" Such technology holds the promise of enriching digital
| communication, increasing accessibility for those with
| communicative impairments, transforming education methods with
| interactive AI tutoring, and providing therapeutic support and
| social interaction in healthcare."_
|
| A convincing simulacrum of empathy could plausibly be the most
| profitable product since oil.
| bonton89 wrote:
| Propaganda, political manipulation, narrative nudging, regular
| scams and advertising.
|
| Even though most of those things are illegal, you could just
| have foreign cat's-paw firms do it. Maybe you fire them for
| "going too far" after the damage is done, assuming someone even
| manages to connect the dots.
| criddell wrote:
| If beautiful people have an advantage in the job market, maybe
| people will use deepfake technology when doing zoom interviews?
| Maybe they will use it to alter their accent?
| physhster wrote:
| A fantastic technological advance for election interference!
| RGamma wrote:
| Such an _exciting_ startup idea! I'm _thrilled_!
| karaterobot wrote:
| We need some clear legislation around this right now.
| stronglikedan wrote:
| counterpoint: we don't need any more legislation
| qwertox wrote:
| I tend towards agreeing with you. Many of the problems, like
| impersonation, are already illegal.
|
| And replacing a person who spreads lies, as can be seen in
| most TV or glossy cover ads, shouldn't trigger some new legal
| action. The only difference is that now the actor is also a
| lie.
|
| And countries which use actors or news anchors for spreading
| propaganda surely won't see an issue with replacing them with
| AI characters.
|
| People who then get to read that their favorite,
| stunningly beautiful Instagram or TikTok influencer is
| nothing but a fat, chips-eating ugly person using AI may try
| to raise some legal issues to soothe their disappointment.
| They might then raise a point which sounds reasonable, but
| which would force politicians to also tackle the lies
| spread in TV/magazine ads.
|
| Maybe clearly labeling any use of this tech, perhaps with
| a QR code linking to the owner of the AI (similar to the
| QR codes on meat packaging which let you trace the
| origin of the meat), would be something laws could
| help with, in the spirit of transparency.
| 4ndrewl wrote:
| In which jurisdiction?
| karaterobot wrote:
| What jurisdiction would not benefit from legislation around
| duplicating people's identities using AI?
| CamperBob2 wrote:
| Legislation only impairs the good guys.
| binkHN wrote:
| Full details at
| https://www.microsoft.com/en-us/research/project/vasa-1/
| FredPret wrote:
| Anyone have any good ideas for how we're going to do politics
| now?
|
| Today a big ML model can do this and it's somewhat regulatable;
| tomorrow people will do this on their contact-lens supercomputers
| and anyone will be able to generate a video of anything.
|
| Is going back to personally knowing your local representative the
| only way? How will we vote for national candidates if nobody
| knows what they think or say?
| qup wrote:
| People in my circles have been saying this for a few years now,
| and we've yet to see it happen.
|
| I've got my popcorn ready.
|
| But you can rest easy. Everyone just votes for the candidate
| their party picked, anyway.
| FredPret wrote:
| It'll happen - deepfakes aren't good enough yet. But when
| they become ubiquitous and hard to spot, it'll be chaos until
| the average person is mentally inoculated against believing
| any video / anything on the internet.
|
| I wonder if it's possible to digitally sign footage as it's
| captured? It'd be nice to have some share-able demonstrably
| true media.
|
| Edit: I'm a centrist and I definitely would lean one way or
| the other based on who the options are (or who I think they
| are).
| T-A wrote:
| > Today a big ML model can do this
|
| Not that big:
|
| https://github.com/Zejun-Yang/AniPortrait
|
| https://huggingface.co/ZJYang/AniPortrait/tree/main
| cchance wrote:
| Didn't see that one, pretty cool. Not as good as EMO or VASA,
| but pretty good.
| hx8 wrote:
| Hyper-targeted placement of generated content designed to
| entice you to donate to political campaigns and to vote,
| perhaps leading to a point where entire video clips are
| generated for a single viewer. Politicians and political
| commentators will lease out their likeness and voice for
| targeted messaging generated in their image. Less
| reputable platforms will allow disinformation campaigns to
| spread.
| 4ndrewl wrote:
| DNS? Might be that we need a radical (for some) change of
| viewpoint.
|
| Just as there's no privacy on the internet, how about "there's
| very little trust on the internet"? Assume everything not
| securely signed by a trusted party is false.
| kmlx wrote:
| > How will we vote for national candidates if nobody knows what
| they think or say?
|
| i'm going to burst your bubble here, but most voters have no
| idea about policies or candidates. most voters vote based on
| inertia or minimal cues, not on policies or candidates.
|
| i suggest you look up "The American Voter", "The Democratic
| Dilemma: Can Citizens Learn What They Need to Know?" and
| "American National Election Studies".
| dwb wrote:
| We already rely on chains of trust going back to the original
| source, and still will. I find these alarmist posts a bit
| mystifying: before photography, anyone could fake a quote of
| anyone, and human civilisation got quite far. We had a bit over
| a hundred years where photographic-quality images were possible
| and very hard to fake (which did and still does vary with
| technology), but clearly now we're past that. We'll manage!
| BobaFloutist wrote:
| Yeah I mean tabloids have been fooling people with doctored
| photos for decades.
|
| Potentially we'll need slightly tighter regulations on the
| formal press (so that people who care about accurate
| information have a place they can get it), and we'll
| definitely want to steer the culture back towards holding
| the press accountable for misinformation, but credulous
| people have always had easy access to bad information.
|
| I'm _much_ more worried about the potential abuse cases that
| involve ordinary people who aren't public figures and have
| much less ability to defend themselves. Heck, even
| celebrities are more vulnerable targets than politicians.
| marcusverus wrote:
| Presidential elections are frequently pretty close. Taking
| the electoral college into account (not the popular vote,
| which doesn't matter) Donald Trump won the 2016 election by a
| grand total of ~80,000 votes in three states[0].
|
| Knowing that retractions rarely get viral exposure, it's not
| difficult to imagine that a few sufficiently-viral videos
| could swing enough votes to impact a presidential election.
| _Especially_ when considering that the average person is not
| up to speed on the current state of the tech, and so has not
| been prompted to build up the mindset that's required to
| fend off this new threat.
|
| [0] https://www.washingtonpost.com/news/the-fix/wp/2016/12/01/do...
| dwb wrote:
| Plausible. I was thinking over the longer-term.
| woleium wrote:
| The issue is better phrased as "how will we survive the
| transition while some folk still believe the video they are
| seeing is irrefutable proof the event happened?"
| GeoAtreides wrote:
| In the before times we didn't have social media with its
| algorithms and reach. Does it matter that the chains of trust
| debunk a viral lie 24 hours after it has spread? Not that
| there's a lot of trust in the chains of trust to begin with.
| And if you still have trust, then you're not the target of
| the viral lie. And if you do still have trust, how long can
| you hold on to it when the lies keep coming 24/7, one after
| another, without end? As one movie critic once put it: you
| might not have noticed it, but your brain did. Very
| malleable, this brain of ours.
|
| Civilization might be fine, sure. Now, democracy, on the
| other hand...
| hooverd wrote:
| People already believe any quote you slap on a JPEG.
| TimedToasts wrote:
| > Anyone have any good ideas for how we're going to do politics
| now?
|
| If a business is showing a demo of this, you can be assured
| that the government already has this tech, and has had it for
| some time.
|
| > How will we vote for national candidates if nobody knows what
| they think or say?
|
| You don't know what they think or say _now_ - hopefully this
| disabuses people of this notion.
| SirMaster wrote:
| It looks all warpy and stretchy. That's not how skin and face
| muscles work. Looks fake to me.
| cs702 wrote:
| And it's only going to get faster, better, easier, cheaper.[a]
|
| Meanwhile, yesterday my credit card company asked me if I wanted
| to use voice authentication for verifying my identity "more
| securely" on the phone. Surely the company spent many millions of
| dollars to enable this new security-theater feature.
|
| It raises the question: Is every single executive and manager
| at my credit card company _completely unaware_ that right now
| anyone can clone anyone else's voice from a short sample audio
| clip taken from any social network? If anyone is aware, why is
| the company acting like this?
|
| Corporate America is _so far behind the times_ it's not even
| funny.
|
| ---
|
| [a] With apologies to Daft Punk.
| user_7832 wrote:
| > Is every single executive and manager at my credit card
| company completely unaware that right now anyone can clone
| anyone else's voice by obtaining a short sample audio clip
| taken from any social network?
|
| Your mistake is assuming the company cares. The "company" is a
| hundred different disjointed departments that only care about
| not getting caught Equifax-style (or filing for bankruptcy if
| caught). If the marketing director sees a shiny new thing that
| might boost some random KPI they may not really care about
| security.
|
| However, on the off chance that your bank is actually half
| decent, I'd suggest contacting their IT/security teams about
| your concerns. Maybe you'll save some folks from getting
| scammed?
| cyanydeez wrote:
| Also, this feature is probably just some mid-level exec's
| plan for a bonus, not something rigorously reviewed and
| planned. It's also probably been in the pipeline for a decade,
| so if they don't push it out, suddenly they get no bonus for
| a cancelled project.
|
| Corporations are ultimately no better than governments and
| likely worse depending on what their regulatory environment
| looks like.
| fragmede wrote:
| I mean, what do you want them to do? Whether their security
| officers are freaking out and holding meetings right now about
| what to do, or they're asleep at the wheel, we'd be seeing
| the same thing from the outside, no?
| ryandrake wrote:
| Any time you add a "new" security gate to your product, it
| should be _in addition to_ and not _instead of_ the existing
| gates. Biometrics should not replace username/password; they
| should be in addition to it. Security questions like "What was
| your first pet's name" should not be able to get you in through
| the backdoor. SMS verification alone should not allow you to
| reset your password. Same with this voice authentication stuff.
| It should be another layer, not a replacement for your actual
| credentials.
|
| If you treat it as OR instead of AND, then your security is
| only as good as the worst link in the chain.
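|
| In code, the difference is a single operator. A toy sketch in
| Python (the factor checks are stand-ins for real ones):
|
|     # Each stand-in returns whether that factor passed.
|     def password_ok(passed: bool) -> bool: return passed
|     def voiceprint_ok(passed: bool) -> bool: return passed  # cloneable!
|     def sms_code_ok(passed: bool) -> bool: return passed
|
|     def login_layered(pw, voice, sms):
|         # AND: an attacker must defeat every factor.
|         return password_ok(pw) and voiceprint_ok(voice) \
|             and sms_code_ok(sms)
|
|     def login_backdoored(pw, voice, sms):
|         # OR: the weakest factor alone is enough, so a cloned
|         # voice defeats the whole scheme.
|         return password_ok(pw) or voiceprint_ok(voice) \
|             or sms_code_ok(sms)
|
|     # Attacker shows up with only a cloned voice:
|     print(login_layered(False, True, False))     # False: locked out
|     print(login_backdoored(False, True, False))  # True: breached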
| andrewstuart wrote:
| Despite vast investment in AI by VCs and vast numbers of
| startups in the field, these sorts of things remain unavailable
| as simple consumer-installable software.
|
| Every second day HN has some post about some new amazing AI
| system. Never available to download, run, and use.
|
| Why the vast investment, and no startup selling consumer-
| downloadable software to do it?
| egberts1 wrote:
| Cool! Now we can expect to see an endless stream of dead
| presidents' speeches "LIVE" from the White House.
|
| This should end well.
| BobaFloutist wrote:
| Oh good!
| smusamashah wrote:
| This is good but nowhere near as good as EMO
| https://humanaigc.github.io/emote-portrait-alive/
| (https://news.ycombinator.com/item?id=39533326)
|
| This one has too much fake-looking body movement and looks
| eerie/robotic/uncanny valley. The lips don't sync properly in
| many places, and the eye movement and overall head and body
| movement are not very natural at all.
|
| EMO, meanwhile, looks almost perfect. The first two videos on
| the EMO page are a perfect example of that. See the rap near
| the end for how good EMO is at lip sync.
| cchance wrote:
| Another research project with 0 model release
| RcouF1uZ4gsC wrote:
| > To show off the model, Microsoft created a VASA-1 research page
| featuring many sample videos of the tool in action
|
| With AI stuff, I have learned to be very skeptical until and
| unless a relatively publicly accessible demo with user-specified
| inputs is available.
|
| It is way too easy for humans to cherry-pick the nice outputs,
| or to take advantage of biases in the training data to generate
| nice outputs, and that is not at all reflective of how it holds
| up in the real world.
|
| Part of the reason ChatGPT, Stable Diffusion, and DALL-E had
| such an impact is that people could try them and see for
| themselves, without being told how awesome they were by the
| people making them.
| dang wrote:
| Related:
| https://arstechnica.com/information-technology/2024/04/micro...
|
| (via https://news.ycombinator.com/item?id=40088826, but we merged
| that thread hither)
___________________________________________________________________
(page generated 2024-04-19 23:01 UTC)