[HN Gopher] Generating Expressive Portrait Videos with Audio2Video
___________________________________________________________________
Generating Expressive Portrait Videos with Audio2Video
Author : hackerlight
Score : 57 points
Date : 2024-02-28 02:49 UTC (20 hours ago)
(HTM) web link (humanaigc.github.io)
(TXT) w3m dump (humanaigc.github.io)
| artninja1988 wrote:
| Looks like it's from some of the same authors as this paper:
| https://github.com/HumanAIGC/AnimateAnyone . Which sadly never
| released the code or model.
| noduerme wrote:
| Coming soon to a political scandal near you...
| ashildr wrote:
| This is becoming frightening now. Maybe we need to start
| cryptographically signing original sources, and require others
| to do the same, to prove whether something is a real recording?
| I can spot some details that don't seem to be moving 'right'
| according to physics, but this is extremely convincing.
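| To make that concrete, here is a minimal sketch (Python, using
| the pyca/cryptography library's Ed25519 support; the file name
| and keys are purely illustrative) of a capture device signing a
| recording and anyone with the public key verifying it:
|
|     from cryptography.exceptions import InvalidSignature
|     from cryptography.hazmat.primitives.asymmetric.ed25519 import (
|         Ed25519PrivateKey,
|     )
|
|     def sign_recording(device_key, video_bytes: bytes) -> bytes:
|         # Detached signature over the raw recording bytes.
|         return device_key.sign(video_bytes)
|
|     def verify_recording(public_key, video_bytes: bytes, sig: bytes) -> bool:
|         # True only if the bytes are exactly what the device signed.
|         try:
|             public_key.verify(sig, video_bytes)
|             return True
|         except InvalidSignature:
|             return False
|
|     device_key = Ed25519PrivateKey.generate()
|     clip = open("clip.mp4", "rb").read()   # hypothetical recording
|     sig = sign_recording(device_key, clip)
|     assert verify_recording(device_key.public_key(), clip, sig)
|
| The hard part isn't the signing itself but keeping the private
| key locked inside the capture device's hardware.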
| ShamelessC wrote:
| Gets mentioned a lot. The typical counter is "just take a
| picture/video/recording of the video with your own crypto
| signature". Thwarted.
| spoonjim wrote:
| If LIDAR data is included, faking it is significantly harder.
| ShamelessC wrote:
| It will never work so long as edge cases like that exist. In
| reality, the field of journalism will expand its vetting
| processes as best it can. Otherwise, we're screwed and will
| have to live with the consequences (which may be overblown).
| ashildr wrote:
| The owners of "journalism" already decided that vetting is
| irrelevant and costs them money, so they have mostly been
| routing around that part of the process for quite a while.
| Also: quite a large number of people have already left common
| reality and followed propaganda into cloud-cuckoo-land. But
| right now it would still be possible, at least in theory, to
| disprove a lot of the nonsense. That will completely change,
| and the consequences can't be overblown. I think we need an
| automated vetting infrastructure, a kind of web of trust for
| recorded media, that helps trace a recording back to its
| original source and decide whether we trust the path leading
| there. But I don't have any hope. We already have fake phone
| calls from "the president" and doctored photos; imagine what
| the next elections will look like. Hunter Biden will
| personally admit having a threesome with Hillary and Soros,
| sponsored by RTV.
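| As a toy illustration of that web-of-trust idea (Python; every
| name here is hypothetical, and this is only a sketch of the
| data model, not a real system): each published copy of a clip
| could carry the hash of the file it was derived from, so the
| chain can be walked back to the original and each publisher on
| the path judged on its own reputation.
|
|     import hashlib
|     from dataclasses import dataclass
|     from typing import Optional
|
|     @dataclass
|     class ProvenanceRecord:
|         sha256: str                   # hash of this file's bytes
|         parent_sha256: Optional[str]  # hash of the file it came from
|         publisher: str                # who vouches for this step
|
|     def record_for(data: bytes, parent: Optional[str], publisher: str):
|         return ProvenanceRecord(hashlib.sha256(data).hexdigest(),
|                                 parent, publisher)
|
|     def chain_is_intact(chain: list) -> bool:
|         # Each record must name the previous one as its parent.
|         return all(chain[i].parent_sha256 == chain[i - 1].sha256
|                    for i in range(1, len(chain)))
|
| Deciding whether to trust the publishers on that path is the
| part no amount of code solves.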
| HeatrayEnjoyer wrote:
| Just train a model to generate LIDAR data
| ashildr wrote:
| With some tricks we already have computer-generated 3D models.
| I don't see why we wouldn't get 3D models of complete videos
| too, especially if state actors with a lot of money are
| interested in that.
| johnfernow wrote:
| On smartphones at least, you could require users to use a
| specific app to establish the authenticity of the footage. You
| could publish a signed hash of the app's compiled code and
| compare it against the hash of the installed binary to make
| sure the code was not altered. The app can be open source so
| people can trust it and compile it themselves, but if the hash
| doesn't match, the videos would be considered untrusted. You
| would also have to take measures to ensure that there aren't
| runtime modifications -- a difficult thing to accomplish for
| sure, but something some companies are getting increasingly
| good at.
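| The hash check itself is the easy part; as a rough sketch
| (Python; the published hash and path are placeholders, and this
| alone does nothing about runtime tampering):
|
|     import hashlib
|
|     # Placeholder: in practice this would come from a signed
|     # release manifest for the open-source build.
|     KNOWN_GOOD_SHA256 = "<sha256 of the published release build>"
|
|     def binary_matches_release(path: str) -> bool:
|         # Hash the installed binary and compare to the published value.
|         with open(path, "rb") as f:
|             digest = hashlib.sha256(f.read()).hexdigest()
|         return digest == KNOWN_GOOD_SHA256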
|
| In addition to LIDAR data, throw in gyroscope data (which would
| make recording a screen more obvious) and GPS data (the screen
| would need to be where you claim you are -- you'd also need to
| make sure the device is not rooted or jailbroken, to prevent
| spoofing of GPS) and it becomes even more challenging to fake a
| video. I think securing the app against modification or runtime
| injection is probably the biggest point of focus, but even if
| you were able to defeat all those measures, you'd still need
| models that generate convincing LIDAR, gyroscope and GPS data.
| Not impossible of course, but at that point you need a rooted
| phone that can successfully hide that it is rooted, several
| well-trained models, and the ability to defeat the video app's
| own security measures against both binary and runtime
| modifications.
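| One way to picture the bundling: hash every sensor stream, put
| the hashes in one manifest, and sign the manifest as a unit so
| no stream can be swapped out afterwards. A minimal sketch
| (Python, pyca/cryptography; all names are illustrative):
|
|     import hashlib, json
|     from cryptography.hazmat.primitives.asymmetric.ed25519 import (
|         Ed25519PrivateKey,
|     )
|
|     def signed_capture_manifest(key: Ed25519PrivateKey, video: bytes,
|                                 lidar: bytes, gyro: bytes, gps: bytes):
|         manifest = {
|             "video_sha256": hashlib.sha256(video).hexdigest(),
|             "lidar_sha256": hashlib.sha256(lidar).hexdigest(),
|             "gyro_sha256":  hashlib.sha256(gyro).hexdigest(),
|             "gps_sha256":   hashlib.sha256(gps).hexdigest(),
|         }
|         payload = json.dumps(manifest, sort_keys=True).encode()
|         # One signature covers all streams together.
|         return {"manifest": manifest, "signature": key.sign(payload).hex()}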
|
| On a technical level, it may not be possible to develop an app
| that records videos with LIDAR, gyroscope and GPS data and can
| never be fooled by recording a screen. In practice, I think it
| is possible to develop an app whose authenticity checks nearly
| everyone except maybe state actors would be incapable of
| defeating (the world's richest corporations might also have the
| funds to do so, though I think the odds of a whistleblower or a
| leak are a bit higher there -- maybe Microsoft could gather a
| team of highly talented AI developers to generate fake videos
| passed off as real ones without notice, but I think the
| likelihood that not a single one of those employees would
| reveal it to journalists, the government or the public is low).
|
| I share many concerns about device attestation, as it has the
| potential to limit a user's freedom to do what they want with
| the hardware they bought and the software and services they pay
| for. That said, if I really were dying to use a rooted phone
| (I'm not currently, it being more hassle than it's worth), I
| wouldn't mind buying a second, heavily locked-down device for
| proving that videos I record are real. The device could even be
| powered by entirely open source software, but have a published
| hash for the compiled ISO that is used to install the OS.
| ashildr wrote:
| Yes, but we don't have an accessible vetting infrastructure
| for this. I don't know anything about "ShamelessC" so I have
| no idea if a video of Hillary Clinton drinking the blood of
| children should be trusted or not.
| jerpint wrote:
| Perhaps signing any kind of upload or content with a GPG key
| that proves your identity?
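| For example, a rough sketch of that flow (Python shelling out
| to the standard gpg CLI; the file paths are illustrative, and
| it assumes the uploader's key is already in the keyring):
|
|     import subprocess
|
|     def gpg_sign(path: str) -> str:
|         # Create a detached, ASCII-armored signature next to the file.
|         sig = path + ".asc"
|         subprocess.run(["gpg", "--armor", "--output", sig,
|                         "--detach-sign", path], check=True)
|         return sig
|
|     def gpg_verify(path: str, sig: str) -> bool:
|         # gpg exits non-zero if the signature doesn't match the file.
|         return subprocess.run(["gpg", "--verify", sig, path]).returncode == 0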
| ashildr wrote:
| Maybe someone here on the autistic spectrum can tell whether
| the facial expressions feel "right"? The examples are too
| convincing for my brain to properly focus on analyzing that
| part.
| smusamashah wrote:
| Not autistic or anything, but I got motion sick from
| Sora-generated videos. So much that I don't want to look at
| most of them again, at least on the Sora page in original
| quality.
|
| I could see things getting blurry and some blending. The very
| last video on the page has an issue with the teeth. The
| expressions were as good as an actor's to me, and the lip
| movement looked just perfect.
| CSSer wrote:
| This is impressive overall, but I wonder if someone can provide
| some insight into, or confirmation of, an observation of mine.
|
| I'm having a hard time describing it, but there's a stiff, shiny,
| and plastic feel to a lot of these. You can really see it in the
| video of the woman wearing sunglasses. No one could, or at
| least likely wouldn't, hold their eyebrow muscles that stiff
| for that long. Combined with the head and mouth movements, the
| expressiveness just feels off, almost animatronic. Does anyone
| else get that?
| unraveller wrote:
| If you can describe it, that just means the next version can
| avoid it. But what you describe sounds like modern soap operas,
| which are plastic surgery expositions.
|
| The last Mona Lisa doing Shakespeare and the last AI girl doing
| multiple voices are extremely convincing; I think you
| overestimate just how intensely we look at things once we have
| accepted them. This is great for many things right now and will
| be greater once they can perfect all the subtle movements that
| make video worth watching over audio (easier information
| absorption and recall).
| smusamashah wrote:
| This is amazing; even the breathing can be seen. The next big
| step (or maybe it's done already) is to generate expressive
| audio, and then we're all set for a generated model you can
| have a video call with.
___________________________________________________________________
(page generated 2024-02-28 23:02 UTC)