[HN Gopher] Highly realistic talking head video generation
       ___________________________________________________________________
        
       Highly realistic talking head video generation
        
       Author : HuiLi1998
       Score  : 100 points
       Date   : 2024-06-16 03:04 UTC (19 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | xixixao wrote:
       | It's getting there, but you can still see that certain sounds are
       | not realistically rendered. For example, for the V in "vengeance" the
       | bottom lip has to touch the upper teeth, but in the renderings
       | it's only approximated.
        
       | Anotheroneagain wrote:
       | I just have to think about the implications of going the other
       | way round.
        
         | Loughla wrote:
         | I don't know what that means. What do you mean?
        
         | CoastalCoder wrote:
         | Automated lip-reading?
         | 
         | Ugh... paired with laws that restrict audio recording but not
         | video recording.
        
         | amelius wrote:
         | It means that soon we'll all be wearing burqas to mask our
         | lips.
        
           | 123yawaworht456 wrote:
           | have you overslept 2020-2022?
        
             | amelius wrote:
             | I could never stand the ear-straps.
        
       | 42lux wrote:
       | It's still on the wav2lip[1] level from 5 years ago (look at the
       | teeth) just slightly higher resolution. The only real player with
       | a moat is probably flawlessai. [2]
       | 
       | [1] https://github.com/Rudrabha/Wav2Lip
       | 
       | [2] https://www.flawlessai.com/
        
         | HuiLi1998 wrote:
         | Actually, our work not only targets lip movements, it can also
         | produce more realistic head movements and facial muscle
         | movements. PS: It also achieved a good SyncNet score on 200
         | randomly collected in-the-wild image-audio pairs. And it is
         | open source. :)
        
           | 42lux wrote:
           | I get exactly what you are doing but I don't understand what
           | would be novel about it? It really looks just like a simple
           | SD/wav2vec/insightface/animatediff pipeline, everyone can
           | plug together in comfy or with diffusers. The muscle claim is
           | also a bit dubious...
           | 
           | PS: With insightface models in your reqs the oss aspect is
           | also pretty much void for any use other than research and
           | your readme should reflect that.
        
         | lossolo wrote:
         | What's SOTA open source?
        
         | Loughla wrote:
         | The flawlessai is interesting, but even in their formal example
         | for marketing there are weird tells. The actress's eyes are too
         | jerky and weird during the changed sections. It's strange, but
         | in a way that I'm not sure I could tag without the context of
         | the rest of the video.
         | 
         | Is this what we're in for? Human movement that is just normal
         | enough to pass, but not really natural enough to be comfortable
         | with?
        
           | kevindamm wrote:
           | We're sure to be in an uncanny valley for a little while
           | first, the real question is whether it is followed by a
           | winter or some of that sweet verisimilitude.
        
       | smusamashah wrote:
       | I have found EMO (not open though) [1] to be the best yet.
       | 
       | Look at the rapping example near the end. The lip sync is nearly
       | flawless. The first black and white lady singing is also almost
       | perfect. It even gives them the subtle jerk to pause for
       | breathing. Unless you know and really are looking for flaws, you
       | won't find anything that stands out making them look real.
       | 
       | [1] https://humanaigc.github.io/emote-portrait-alive/
        
         | bamboozled wrote:
         | The black and white lady is nightmare fuel for me personally.
        
           | daveguy wrote:
           | Audrey Hepburn? Was she in a scary movie or play someone
           | scary?
        
             | ziggy_star wrote:
             | She was a very graceful lady in wholesome movies.
             | 
             | The juxtaposition of modern facial expressions of an
             | influencer type singer covering Ed Sheeran at some X factor
             | type television show is what makes it creepy. It is
             | somehow doubly fake and extremely out of character if you
             | are familiar with her.
        
             | cess11 wrote:
             | In the synthetic video she looks like some kind of
             | Frankenstein's monster, brought to life with electrodes or
             | hidden motors, similar to the other video.
             | 
             | Both 'move' in ways that are very unnatural.
        
               | hammyhavoc wrote:
               | Glad it isn't just me.
        
           | smusamashah wrote:
           | What irks you in her? I haven't seen her before at all; maybe
           | that's the reason I am not seeing anything too strange.
        
         | ziggy_star wrote:
         | You should show this to a woman without explaining. I hesitate
         | to elaborate. Although I guess there is no universal line where
         | the uncanny valley ends as people have different perceptions.
         | 
         | If you really feel like it is flawless and it isn't just an AI
         | believer Rorschach test it is a high watermark of sorts. But it
         | sincerely made my skin crawl.
        
           | smusamashah wrote:
           | Ok. So I showed the first 2 videos to my wife. She noticed
           | the teeth merging and looking different each time, and then
           | the ear. But that was all.
           | 
           | For me, lip sync and body movement is what excited me most.
           | They are closest to real when compared with any similar tech.
        
             | ziggy_star wrote:
             | Crikey. Well I don't know what to think anymore. I guess
             | it got "good enough" for some things. I can still tell.
             | This is going to suck for some people (it feels
             | uncomfortable).
        
               | hammyhavoc wrote:
               | IMO, it sucks for me beyond the level of quality.
               | 
               | For starters, consent is the first problem I have. Yes,
               | lots of examples, but none of the individuals consented
               | to having their likeness used to say things they didn't.
               | Now, abstract this problem of a lack of consent beyond
               | "examples"--the creators of this have no problem with the
               | ethics of not asking for consent, thus the world at large
               | will not either.
               | 
               | Then we have the problem of _how_ it is going to be
               | abused and what problems will exist because of it.
        
         | The_Colonel wrote:
         | It's awesome and I hate it.
         | 
         | This singular piece of technology makes me pessimistic about
         | the future. Until now, a video recording was considered very
         | good evidence. Let's say you argue with a person about what X
         | person said. You show them a video and they will be like "ok,
         | he did say that, but...". You could at least set some facts
         | straight, and then discuss the interpretations.
         | 
         | But that will now be gone. You can now generate massive amounts of
         | real looking fakes and at the same time label anything you
         | don't like as fake. There's really no independent evidence now,
         | you can only put trust into the medium of your choosing
         | (youtube channel, newspapers, tv station) that they report
         | honestly.
         | 
         | This seems to have only minimal benefits for the society, but
         | huge negatives. But there's no stopping here...
        
           | grepfru_it wrote:
           | This was an inevitable outcome of the advancement of
           | technology. I would argue that we lost trust in all mediums a
           | long time ago; it is just now being realized by the masses.
           | 
           | But as usual, we shall adapt and overcome.
        
             | The_Colonel wrote:
             | > But as usual, we shall adapt and overcome.
             | 
             | I don't believe this is a problem we can "overcome". We
             | will need to learn to live with the "alternative facts"
             | being more prominent than now, but I'm not looking forward
             | to it.
        
               | czl wrote:
               | > I don't believe this is a problem we can "overcome".
               | 
               | Can digital signatures not remedy this problem? When you
               | log in to your bank, how do you know you are logging into
               | your bank? In the future a recording without signatures
               | will be like a bank login without HTTPS is today.
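
The idea of a tamper-evident recording can be sketched as follows. This is
only a minimal illustration under stated assumptions: it uses HMAC-SHA256
from the Python standard library, which is a symmetric keyed tag and NOT a
public-key digital signature. A real scheme would presumably use asymmetric
signatures so that verifiers cannot forge tags. The function names, the key,
and the sample bytes are all hypothetical.

```python
import hashlib
import hmac


def tag_recording(recording: bytes, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag computed over the raw recording bytes."""
    return hmac.new(key, recording, hashlib.sha256).hexdigest()


def verify_recording(recording: bytes, key: bytes, tag: str) -> bool:
    """Recompute the tag and compare it in constant time."""
    expected = tag_recording(recording, key)
    return hmac.compare_digest(expected, tag)


if __name__ == "__main__":
    key = b"hypothetical-device-key"       # assumption: key held by the recorder
    original = b"hypothetical video bytes"  # assumption: stand-in for a file
    tag = tag_recording(original, key)

    # The untouched recording verifies; any modified copy does not.
    print(verify_recording(original, key, tag))
    print(verify_recording(original + b" edited", key, tag))
```

Because the same key both creates and checks the tag, this only helps parties
who already share a secret; it does not by itself prove origin to the public.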
        
               | SpicyLemonZest wrote:
               | I know I'm logging into my bank because I initiated the
               | connection, and refuse to believe anyone in any other
               | context who claims to be my bank. People are routinely
               | defrauded by scammers who claim to be their bank, and
               | banks are routinely scammed by people who claim to be an
               | account holder.
        
               | czl wrote:
               | > I know I'm logging into my bank because I initiated the
               | connection, ...
               | 
               | Just because you initiated the connection how do you know
               | the other end is your bank? Do you trust every internet
               | company that carries your packets to the bank? Trust
               | their employees? Trust their security practices? Do you
               | trust firmware on all the devices involved?
               | 
               | > People are routinely defrauded by scammers who claim to
               | be their bank,
               | 
               | I have read about this in the news just like I read about
               | snakes with two heads etc yet I have yet to meet someone
               | that has had this happen to them. What fraction of people
               | that you know have had this happen?
               | 
               | Could it be that these people believe like you do that "I
               | know I'm logging into my bank because I initiated the
               | connection" as opposed to checking the digital signatures
               | on the connection?
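
Checking the signatures on a connection, as opposed to trusting it because
you initiated it, can be sketched with Python's standard `ssl` module. This
is a hedged illustration, not a description of what any browser or bank
does: the helper names are hypothetical, and the host passed in is whatever
the caller chooses. The default context verifies the server's certificate
chain against the system trust store and checks the hostname.

```python
import socket
import ssl


def make_verifying_context() -> ssl.SSLContext:
    """Build a client context that requires a valid, hostname-matching cert."""
    return ssl.create_default_context()


def fetch_peer_certificate(host: str, port: int = 443, timeout: float = 5.0) -> dict:
    """Connect over TLS and return the validated peer certificate as a dict.

    Raises ssl.SSLCertVerificationError if the chain or hostname check fails.
    """
    context = make_verifying_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()


if __name__ == "__main__":
    ctx = make_verifying_context()
    # These settings are what make the handshake fail on a bad certificate.
    print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)
```

The handshake fails before any data is exchanged if verification does not
succeed, which is the property the comment above is pointing at.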
        
           | merryje wrote:
           | The Polk County Sheriff's Office recently announced a
           | partnership with Florida Polytechnic University to start
           | working on this, dubbed the Sheriff's Artificial Intelligence
           | Laboratory (SAIL).
           | 
           | https://www.polksheriff.org/news-investigations/polk-
           | county-...
           | 
           | The conference video at 1:00 starts off with a generated clip
           | of Elon Musk saying he's going to move to Polk County. The
           | Sheriff highlights your concerns as well as many others.
           | 
           | Conference video (29:56):
           | https://www.youtube.com/watch?v=DHj18pOcXHc
        
           | jokethrowaway wrote:
           | Why do you care so much about what people said? Before video
           | recording was a thing people didn't have to constantly watch
           | their back and monitor what they were saying in fear of
           | losing their jobs. What happened at a party, stayed at a
           | party.
           | 
           | You may say it's important only for public officials. But why
           | is it important? Because you're giving a huge amount of power
           | to single individuals and somehow we're taught that's a good
           | thing - or at least that it's inevitable for keeping peace or
           | to keep crime at bay. What a load of bs. I hope distrust in
           | centralisation increases. It should have been there in the
           | first place.
        
             | czl wrote:
             | You raise a good point yet it is not what people say that
             | matters but what it predicts about them. Modern society is
             | built on trust and the things you want to know but can not
             | observe can often be predicted from what you can observe
             | such as things being said.
        
           | ryandrake wrote:
           | Maybe relying on video "evidence" to prove something, is
           | actually the bug/vulnerability, and this technology will
           | finally "fix" the bug by calling into question all video
           | evidence. I'd rather the tech be widely publicized and out
           | there, so people know it's a thing and can be convinced to
           | disregard video "evidence", than it be kept secret and the
           | public just unknowingly trusting video. Just like people know
           | photoshop is a thing and (hopefully) don't by default believe
           | images they see on the Internet.
        
         | simple10 wrote:
         | Wow! EMO is impressive. Do they plan on open sourcing it?
         | 
         | The page has a link to github[1] right at the top but the repo
         | is basically empty.
         | 
         | [1] https://github.com/HumanAIGC/EMO
        
           | simple10 wrote:
           | Issue comments in the EMO repo point to the V-Express repo [1],
           | which was released 2 weeks ago and appears to be fully
           | functioning and open source?
           | 
           | [1] https://github.com/tencent-ailab/V-Express
        
       | amelius wrote:
       | Looks like it allows only 1 reference image, so I'm not sure how
       | realistic this is going to be in practice.
        
       | crazygringo wrote:
       | Perhaps realistic physically, but not emotionally.
       | 
       | It's truly bizarre to watch these talking heads because their
       | lips are moving, but their eyes and cheeks aren't moving along
       | with them, except for blinking.
       | 
       | Real people speak with their whole face, not just their lips.
       | 
       | Of course, to do that "right", you need to actually understand
       | the emotional content of what is being spoken. And I'm not
       | talking about highly "emotional" content like in TV drama -- even
       | in a technical presentation, the speaker's face contains lots of
       | emotional signals. Whether warmth, or a sense of humor, or being
       | proud, or excitement of what they're about to reveal, or
       | curiosity about whether the audience understands, etc.
        
         | kleiba wrote:
         | _> It's truly bizarre to watch these talking heads because
         | their lips are moving, but their eyes and cheeks aren't moving
         | along with them, except for blinking._
         | 
         | Not true?!
         | 
         | Please rewatch the video at the top: especially the eyebrows
         | are quite animated, and the contours of the face also change as
         | well as the shadows that indicate muscle movement in the face.
        
           | crazygringo wrote:
           | It's just not looking real to me at all. Yeah there's a
           | little bit of random movement, but none of the patterns of
           | movement reflect the ways in which people's faces are
           | actually expressive when speaking.
           | 
           | They look kind of lobotomized, sure with maybe some random
           | eyebrow raises thrown in. It's nothing like whole-face
           | expression.
        
             | ryandrake wrote:
             | I think the point of comparison should be news anchors and
             | other "talking head" media, not real life humans emoting.
             | Maybe it's just me, but people looking directly into and
             | speaking in front of a TV camera about boring things also
             | look lobotomized and uncanny. Check out your local
             | newscasters some time and tell me they're really more
             | realistic than these results.
        
         | czl wrote:
         | You raise good points about the current version of this tech
         | but trained on larger datasets it will likely become _more_
         | emotionally realistic than the average person. Video filters
         | for "tuning up" emotional expressions will likely be common in
         | the future giving all the emotional range that talented actors
         | have. If you want this ability it may be democratizing but for
         | talented actors it may be dystopian.
        
       | add-sub-mul-div wrote:
       | The AI slop shovelware spam posts will continue until morale
       | improves.
        
       | closedsans wrote:
       | The most comforting thought I've had in a while is that AI just
       | somehow fades away. I'm all for AI and use it regularly, but I
       | can't help but feel uneasy about where this is all headed.
        
       | m3kw9 wrote:
       | Lmao, it is not realistic at all, every movement is grossly
       | exaggerated.
        
       | rhelz wrote:
       | I had Logo on my Atari 800 back in '82. After having learned
       | BASIC, it took a while for me to wrap my head around list-
       | oriented programming. After a few weeks of beating my head against
       | the manual, however, I got halfway good at it.
       | 
       | Then, my little brother, 12 years younger than me, in 2nd grade,
       | sits down and 2 days later he's programming circles around me. He
       | could make that Atari dance like Fred Astaire.
       | 
       | And I was the one who wanted to be a computer programmer. He had
       | no inclination in that direction at all, and became a
       | businessman.
       | 
       | I learned a very painful and embarrassing lesson about how
       | learning the wrong programming language can give you brain
       | damage.
        
       ___________________________________________________________________
       (page generated 2024-06-16 23:02 UTC)