[HN Gopher] Highly realistic talking head video generation
___________________________________________________________________
Highly realistic talking head video generation
Author : HuiLi1998
Score : 100 points
Date : 2024-06-16 03:04 UTC (19 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| xixixao wrote:
| It's getting there, but you can still see that certain sounds are
| not realistically rendered. For example, for the V in "vengeance" the
| bottom lip has to touch the upper teeth, but in the renderings
| it's only approximated.
| Anotheroneagain wrote:
| I just have to think about the implications of going the other
| way round.
| Loughla wrote:
| I don't know what that means. What do you mean?
| CoastalCoder wrote:
| Automated lip-reading?
|
| Ugh... paired with laws that restrict audio recording but not
| video recording.
| amelius wrote:
| It means that soon we'll all be wearing burqas to mask our
| lips.
| 123yawaworht456 wrote:
| Did you sleep through 2020-2022?
| amelius wrote:
| I could never stand the ear-straps.
| 42lux wrote:
| It's still on the wav2lip[1] level from 5 years ago (look at the
| teeth) just slightly higher resolution. The only real player with
| a moat is probably flawlessai. [2]
|
| [1] https://github.com/Rudrabha/Wav2Lip
|
| [2] https://www.flawlessai.com/
| HuiLi1998 wrote:
| Actually, our work not only targets lip movements, it can also
| produce more realistic head movements and facial muscle
| movements. PS: It also achieved a good SyncNet score on 200
| randomly collected in-the-wild image-audio pairs. And it is open
| sourced. :)
| 42lux wrote:
| I get exactly what you are doing but I don't understand what
| would be novel about it? It really looks just like a simple
| SD/wav2vec/insightface/animatediff pipeline that anyone can
| plug together in ComfyUI or with diffusers. The muscle claim is
| also a bit dubious...
|
| PS: With insightface models in your reqs the oss aspect is
| also pretty much void for any use other than research and
| your readme should reflect that.
| lossolo wrote:
| What's SOTA open source?
| Loughla wrote:
| flawlessai is interesting, but even in their formal marketing
| example there are weird tells. The actress's eyes are too
| jerky and weird during the changed sections. It's strange, but
| in a way that I'm not sure I could tag without the context of
| the rest of the video.
|
| Is this what we're in for? Human movement that is just normal
| enough to pass, but not really natural enough to be comfortable
| with?
| kevindamm wrote:
| We're sure to be in an uncanny valley for a little while
| first; the real question is whether it is followed by a
| winter or some of that sweet verisimilitude.
| smusamashah wrote:
| I have found EMO (not open though) [1] to be the best yet.
|
| Look at the rapping example near the end. The lip sync is nearly
| flawless. The first black and white lady singing is also almost
| perfect. It even gives them the subtle jerk to pause for
| breathing. Unless you know and are really looking for flaws, you
| won't find anything that stands out; they look real.
|
| [1] https://humanaigc.github.io/emote-portrait-alive/
| bamboozled wrote:
| The black and white lady is nightmare fuel for me personally.
| daveguy wrote:
| Audrey Hepburn? Was she in a scary movie, or did she play
| someone scary?
| ziggy_star wrote:
| She was a very graceful lady in wholesome movies.
|
| The juxtaposition of modern facial expressions of an
| influencer type singer covering Ed Sheeran at some X factor
| type television show is what makes it creepy. It is
| somehow doubly fake and extremely out of character if you
| are familiar with her.
| cess11 wrote:
| In the synthetic video she looks like some kind of
| Frankenstein's monster, brought to life with electrodes or
| hidden motors, similar to the other video.
|
| Both 'move' in ways that are very unnatural.
| hammyhavoc wrote:
| Glad it isn't just me.
| smusamashah wrote:
| What irks you about her? I haven't seen her before at all; maybe
| that's the reason I am not seeing anything too strange.
| ziggy_star wrote:
| You should show this to a woman without explaining. I hesitate
| to elaborate. Although I guess there is no universal line where
| the uncanny valley ends as people have different perceptions.
|
| If you really feel like it is flawless, and it isn't just an AI
| believer Rorschach test, it is a high-water mark of sorts. But it
| sincerely made my skin crawl.
| smusamashah wrote:
| OK. So I showed the first 2 videos to my wife. She noticed
| the teeth merging and looking different each time, and then the
| ear. But that was all.
|
| For me, lip sync and body movement are what excited me most.
| They are closest to real when compared with any similar tech.
| ziggy_star wrote:
| Crikey. Well I don't know what to think anymore. I guess
| it got "good enough" for some things. I can still tell.
| This is going to suck for some people (it feels
| uncomfortable).
| hammyhavoc wrote:
| IMO, it sucks for me beyond the level of quality.
|
| For starters, consent is the first problem I have. Yes,
| lots of examples, but none of the individuals consented
| to having their likeness used to say things they didn't.
| Now, abstract this problem of a lack of consent beyond
| "examples"--the creators of this have no problem with the
| ethics of not asking for consent, thus the world at large
| will not either.
|
| Then we have the problem of _how_ it is going to be
| abused and what problems will exist because of it.
| The_Colonel wrote:
| It's awesome and I hate it.
|
| This singular piece of technology makes me pessimistic about
| the future. Until now, a video recording was considered very
| good evidence. Let's say you argue with a person about what X
| person said. You show them a video and they will be like "ok,
| he did say that, but...". You could at least set some facts
| straight, and then discuss the interpretations.
|
| But that will now be gone. You can now generate massive amounts
| of real-looking fakes and at the same time label anything you
| don't like as fake. There's really no independent evidence now,
| you can only put trust into the medium of your choosing
| (youtube channel, newspapers, tv station) that they report
| honestly.
|
| This seems to have only minimal benefits for society, but
| huge negatives. But there's no stopping here...
| grepfru_it wrote:
| This was an inevitable outcome of the advancement of
| technology. I would argue that we lost trust in all mediums a
| long time ago; it is just now being realized by the masses.
|
| But as usual, we shall adapt and overcome.
| The_Colonel wrote:
| > But as usual, we shall adapt and overcome.
|
| I don't believe this is a problem we can "overcome". We
| will need to learn to live with the "alternative facts"
| being more prominent than now, but I'm not looking forward
| to it.
| czl wrote:
| > I don't believe this is a problem we can "overcome".
|
| Can digital signatures not remedy this problem? When you
| log in to your bank, how do you know you are logging into
| your bank? In the future a recording without signatures
| will be what a bank login without HTTPS is today.
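As a rough illustration of the signed-recording idea above, here is a minimal sketch of signing and verifying a recording's bytes. It is a toy under stated assumptions: the function names (`generate_keypair`, `sign`, `verify`) are hypothetical, and it uses a Lamport one-time signature built only from SHA-256 so that it runs with the Python standard library. A real camera or publisher would more likely use an established scheme such as Ed25519, and each Lamport key pair must sign only one recording.

```python
import hashlib
import secrets


def _h(data: bytes) -> bytes:
    """SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).digest()


def _bits(digest: bytes):
    """Yield the bits of a digest, most significant bit of each byte first."""
    for byte in digest:
        for shift in range(7, -1, -1):
            yield (byte >> shift) & 1


def generate_keypair():
    """Create a one-time Lamport key pair (hypothetical helper).

    The private key is 256 pairs of random 32-byte secrets; the public
    key is the SHA-256 hash of each secret.
    """
    private = [(secrets.token_bytes(32), secrets.token_bytes(32))
               for _ in range(256)]
    public = [(_h(a), _h(b)) for a, b in private]
    return private, public


def sign(recording: bytes, private):
    """Reveal one secret per bit of the recording's SHA-256 hash."""
    return [private[i][bit] for i, bit in enumerate(_bits(_h(recording)))]


def verify(recording: bytes, signature, public) -> bool:
    """Check that each revealed secret hashes to the matching public value."""
    if len(signature) != 256:
        return False
    return all(
        _h(secret) == public[i][bit]
        for i, (secret, bit) in enumerate(zip(signature, _bits(_h(recording))))
    )
```

With this sketch, `verify(recording, sign(recording, private), public)` returns True for the untouched bytes, while any edit to the recording changes its hash and makes verification fail. It shows only that the bytes match what the key holder signed, not that the key holder is honest.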
| SpicyLemonZest wrote:
| I know I'm logging into my bank because I initiated the
| connection, and refuse to believe anyone in any other
| context who claims to be my bank. People are routinely
| defrauded by scammers who claim to be their bank, and
| banks are routinely scammed by people who claim to be an
| account holder.
| czl wrote:
| > I know I'm logging into my bank because I initiated the
| connection, ...
|
| Just because you initiated the connection, how do you know
| the other end is your bank? Do you trust every internet
| company that carries your packets to the bank? Trust
| their employees? Trust their security practices? Do you
| trust firmware on all the devices involved?
|
| > People are routinely defrauded by scammers who claim to
| be their bank,
|
| I have read about this in the news, just like I read about
| snakes with two heads etc., yet I have yet to meet someone
| who has had this happen to them. What fraction of people
| that you know have had this happen?
|
| Could it be that these people believe like you do that "I
| know I'm logging into my bank because I initiated the
| connection" as opposed to checking the digital signatures
| on the connection?
| merryje wrote:
| The Polk County Sheriff's Office recently announced a
| partnership with Florida Polytechnic University to start
| working on this, dubbed the Sheriff's Artificial Intelligence
| Laboratory (SAIL).
|
| https://www.polksheriff.org/news-investigations/polk-
| county-...
|
| The conference video at 1:00 starts off with a generated clip
| of Elon Musk saying he's going to move to Polk County. The
| Sheriff highlights your concerns as well as many others.
|
| Conference video (29:56):
| https://www.youtube.com/watch?v=DHj18pOcXHc
| jokethrowaway wrote:
| Why do you care so much about what people said? Before video
| recording was a thing people didn't have to constantly watch
| their back and monitor what they were saying in fear of
| losing their jobs. What happened at a party, stayed at a
| party.
|
| You may say it's important only for public officials. But why
| is it important? Because you're giving a huge amount of power
| to single individuals and somehow we're taught that's a good
| thing - or at least that it's inevitable for keeping peace or
| to keep crime at bay. What a load of bs. I hope distrust in
| centralisation increases. It should have been there in the
| first place.
| czl wrote:
| You raise a good point, yet it is not what people say that
| matters but what it predicts about them. Modern society is
| built on trust, and the things you want to know but cannot
| observe can often be predicted from what you can observe
| such as things being said.
| ryandrake wrote:
| Maybe relying on video "evidence" to prove something is
| actually the bug/vulnerability, and this technology will
| finally "fix" the bug by calling into question all video
| evidence. I'd rather the tech be widely publicized and out
| there, so people know it's a thing and can be convinced to
| disregard video "evidence", than have it kept secret while the
| public just unknowingly trusts video. Just like people know
| Photoshop is a thing and (hopefully) don't by default believe
| images they see on the Internet.
| simple10 wrote:
| Wow! EMO is impressive. Do they plan on open sourcing it?
|
| The page has a link to GitHub [1] right at the top but the repo
| is basically empty.
|
| [1] https://github.com/HumanAIGC/EMO
| simple10 wrote:
| Issue comments in the EMO repo point to the V-Express repo [1],
| which was released 2 weeks ago and appears to be a fully
| functioning open source project?
|
| [1] https://github.com/tencent-ailab/V-Express
| amelius wrote:
| Looks like it allows only 1 reference image, so I'm not sure how
| realistic this is going to be in practice.
| crazygringo wrote:
| Perhaps realistic physically, but not emotionally.
|
| It's truly bizarre to watch these talking heads because their
| lips are moving, but their eyes and cheeks aren't moving along
| with them, except for blinking.
|
| Real people speak with their whole face, not just their lips.
|
| Of course, to do that "right", you need to actually understand
| the emotional content of what is being spoken. And I'm not
| talking about highly "emotional" content like in TV drama -- even
| in a technical presentation, the speaker's face contains lots of
| emotional signals. Whether warmth, or a sense of humor, or being
| proud, or excitement of what they're about to reveal, or
| curiosity about whether the audience understands, etc.
| kleiba wrote:
| _> It's truly bizarre to watch these talking heads because
| their lips are moving, but their eyes and cheeks aren't moving
| along with them, except for blinking._
|
| Not true?!
|
| Please rewatch the video at the top: especially the eyebrows
| are quite animated, and the contours of the face also change as
| well as the shadows that indicate muscle movement in the face.
| crazygringo wrote:
| It's just not looking real to me at all. Yeah there's a
| little bit of random movement, but none of the patterns of
| movement reflect the ways in which people's faces are
| actually expressive when speaking.
|
| They look kind of lobotomized, sure with maybe some random
| eyebrow raises thrown in. It's nothing like whole-face
| expression.
| ryandrake wrote:
| I think the point of comparison should be news anchors and
| other "talking head" media, not real life humans emoting.
| Maybe it's just me, but people looking directly into and
| speaking in front of a TV camera about boring things also
| look lobotomized and uncanny. Check out your local
| newscasters some time and tell me they're really more
| realistic than these results.
| czl wrote:
| You raise good points about the current version of this tech
| but trained on larger datasets it will likely become _more_
| emotionally realistic than the average person. Video filters
| for "tuning up" emotional expressions will likely be common in
| the future giving all the emotional range that talented actors
| have. If you want this ability it may be democratizing but for
| talented actors it may be dystopian.
| add-sub-mul-div wrote:
| The AI slop shovelware spam posts will continue until morale
| improves.
| closedsans wrote:
| The most comforting thought I've had in a while is that AI just
| somehow fades away. I'm all for AI and use it regularly, but I
| can't help but feel uneasy about where this is all headed.
| m3kw9 wrote:
| Lmao, it is not realistic at all, every movement is grossly
| exaggerated.
| rhelz wrote:
| I had Logo on my Atari 800 back in '82. After having learned
| BASIC, it took a while for me to wrap my head around
| list-oriented programming. After a few weeks of beating my head
| against the manual, however, I got halfway good at it.
|
| Then, my little brother, 12 years younger than me, in 2nd grade,
| sits down and 2 days later he's programming circles around me. He
| could make that Atari dance like Fred Astaire.
|
| And I was the one who wanted to be a computer programmer. He had
| no inclination in that direction at all, and became a
| businessman.
|
| I learned a very painful and embarrassing lesson about how
| learning the wrong programming language can give you brain
| damage.
___________________________________________________________________
(page generated 2024-06-16 23:02 UTC)