[HN Gopher] Generating Expressive Portrait Videos with Audio2Video
       ___________________________________________________________________
        
       Generating Expressive Portrait Videos with Audio2Video
        
       Author : hackerlight
       Score  : 57 points
       Date   : 2024-02-28 02:49 UTC (20 hours ago)
        
 (HTM) web link (humanaigc.github.io)
 (TXT) w3m dump (humanaigc.github.io)
        
       | artninja1988 wrote:
       | Looks like it's from some of the same authors as this paper:
       | https://github.com/HumanAIGC/AnimateAnyone . Which sadly never
       | released the code or model.
        
       | noduerme wrote:
       | Coming soon to a political scandal near you...
        
       | ashildr wrote:
        | This is becoming frightening now. Maybe we need to start
        | cryptographically signing original sources and require others
        | to do so, to prove whether something is a real recording? I
        | can spot some details that don't seem to be moving 'right'
        | according to physics, but this is extremely convincing.
        
         | ShamelessC wrote:
         | Gets mentioned a lot. Typical response is "take a
         | picture/video/recording of the video with your own crypto
         | signature". Thwarted.
        
           | spoonjim wrote:
           | If LIDAR data is included, faking it is significantly harder.
        
             | ShamelessC wrote:
             | It will never work so long as edge cases appear. In
             | reality, the field of journalism will expand their vetting
             | processes as best as they can. Otherwise, we're screwed and
             | will have to live with the consequences (which may be
             | overblown).
        
               | ashildr wrote:
               | The owners of "journalism" already decided that vetting
               | is irrelevant and costs their money, so they have been
               | mostly routing around that part of the process for quite
               | a while. Also: Quite a large number of people already
               | left common reality and followed propaganda into cloud-
               | cuckoo-land. But right now it would be still possible to
               | in theory disprove a lot of the nonsense. This will
               | completely change and the consequences can't be overlown.
               | I think that we need an automated vetting infrastructure,
               | a kind of web of trust for recorded media, that helps to
               | track down a recording to the original source and decide
               | whether we trust the path that is leading there. But I
               | don't have any hope. We already have fake phone calls
               | from "the president" and doctored photos, imagine what
               | the next elections will look like. Hunter Biden will
               | personally admit having a threesome with the Hillary and
               | Soros, sponsored by RTV.
        
             | HeatrayEnjoyer wrote:
             | Just train a model to generate LIDAR data
        
               | ashildr wrote:
                | With some tricks we already have computer-generated
                | 3D models. I don't see why we wouldn't get 3D models
                | of complete videos, too, especially if state actors
                | with a lot of money are interested in that.
        
               | johnfernow wrote:
               | On smartphones at least, you could require users to use a
               | specific app to establish the authenticity of the
               | footage. You could have a code hash signature for the app
               | to compare against the current hash and make sure the
               | compiled code was not altered. The app can be open source
               | so people can trust it and compile it themselves, but if
               | the hash doesn't match the videos would be considered
               | untrusted. You would also have to take measures to ensure
               | that there aren't runtime modifications -- a difficult
               | thing to accomplish for sure, but something some
               | companies are getting increasingly good at.
               | 
                | In addition to LIDAR data, throw in gyroscope data
                | (it would make recording a screen more obvious) and
                | GPS data (you would need to be where you say you are
                | -- you would also need to make sure the device is not
                | rooted or jailbroken to prevent GPS spoofing), and it
                | becomes even more
               | challenging to fake a video. I think securing the app
               | against modification or runtime injections is probably
               | the biggest point of focus, but even if you were able to
               | defeat all those measures, you'd still need to have
               | models generate convincing LIDAR, gyroscope and GPS data.
               | Not impossible of course, but at that point you need a
               | rooted phone that is able to successfully hide that it is
               | rooted, several well trained models, and the ability to
               | defeat the video app's own security measures against both
               | binary and runtime modifications.
               | 
               | On a technical level, it may not be possible to develop
               | an app that records videos with LIDAR, gyroscope and GPS
               | data that could not be fooled by recording a screen. In
               | practice, I think it is possible to develop an app that
               | could establish the authenticity of videos that nearly
               | everyone except maybe state actors would be incapable of
               | defeating (maybe the world's richest corporations might
                | also have the funds to do so, though I think the odds
                | of a whistleblower or a leak are a bit higher there --
                | maybe
               | Microsoft could gather a team of highly talented AI
               | developers to generate fake videos passed as real ones
               | without notice, but I think the likelihood of not a
               | single one of the employees revealing that info to
               | journalists, the government or public is low.)
               | 
               | I share many concerns about device attestation, as it has
               | potential to limit a user's freedom to do what they want
               | with the hardware that they bought and software and
               | services that they pay for. That said, if I really was
               | dying to use a rooted phone (I'm not currently, due to it
               | being more hassle than it's worth), I wouldn't mind
               | buying a second, heavily locked down device for proving
               | videos I record are real. The device could even be
               | powered by entirely open source software, but have a hash
               | for the compiled ISO that is used to install the OS.
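
The hash comparison described in the comment above can be sketched in a
few lines. This is only a minimal illustration, not the attestation
scheme itself (real attestation involves hardware-backed keys and a
trusted verifier); `sha256_of` and `is_trusted` are hypothetical names,
and the published digest would come from the project's release page:

```python
import hashlib
import hmac

def sha256_of(path: str) -> str:
    """Compute the SHA-256 digest of a file, reading in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def is_trusted(path: str, published_digest: str) -> bool:
    """Compare the local binary's digest against a published reference,
    using a constant-time comparison."""
    return hmac.compare_digest(sha256_of(path), published_digest)
```

The same shape would apply to the ISO hash mentioned at the end of the
comment: publish the digest of the compiled image, and anyone can
recompute and compare before trusting the install.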
        
           | ashildr wrote:
           | Yes, but we don't have an accessible vetting infrastructure
           | for this. I don't know anything about "ShamelessC" so I have
           | no idea if a video of Hillary Clinton drinking the blood of
           | children should be trusted or not.
        
         | jerpint wrote:
         | Perhaps signing any kind of upload or content with a GPG key
         | that proves your identity?
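
The sign-then-verify flow suggested above can be sketched as follows.
Python's standard library has no public-key signing, so this uses an
HMAC as a symmetric stand-in for a GPG signature (with real GPG, the
signer holds a private key and anyone can verify against the public
key); the key and function names here are made up for illustration:

```python
import hashlib
import hmac

# Hypothetical shared key, standing in for a GPG private/public key pair.
SIGNING_KEY = b"demo-key-not-for-real-use"

def sign(content: bytes) -> str:
    """Produce a detached signature over the uploaded content."""
    return hmac.new(SIGNING_KEY, content, hashlib.sha256).hexdigest()

def verify(content: bytes, signature: str) -> bool:
    """Check that the content still matches its detached signature."""
    return hmac.compare_digest(sign(content), signature)
```

With actual GPG the workflow has the same shape: `gpg --detach-sign
video.mp4` produces a `video.mp4.sig`, and `gpg --verify video.mp4.sig
video.mp4` checks it against the signer's public key.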
        
       | ashildr wrote:
       | Maybe someone more on the autistic spectrum here can tell if they
       | feel that the facial expressions are "right"? The examples are
       | too convincing to my brain to properly focus on analyzing that
       | part.
        
         | smusamashah wrote:
          | Not autistic or anything, but I got motion sick from Sora-
          | generated videos. So much that I don't want to look at most
          | of them again, at least on the Sora page in original quality.
         | 
          | I could see things getting blurry and some blending. The very
          | last video on the page has a teeth issue. The expressions were
          | as good as an actor's for me, and the lip movement looked
          | just perfect.
        
       | CSSer wrote:
       | This is impressive overall, but I wonder if someone can provide
       | some insight into, or confirmation of, an observation of mine.
       | 
       | I'm having a hard time describing it, but there's a stiff, shiny,
       | and plastic feel to a lot of these. You can really see it in the
       | video of the woman wearing sunglasses. No one can, or at least
       | likely wouldn't, hold their eyebrow muscles that stiff for that
       | long. Combined with the head and mouth movements, the
       | expressiveness just feels off, almost animatronic. Does anyone
       | else get that?
        
         | unraveller wrote:
          | If you can describe it, that just means the next version can
          | avoid it. But what you describe sounds like modern soap
          | operas, which are plastic-surgery expositions.
          | 
          | The last Mona Lisa doing Shakespeare and the last AI girl
          | doing multiple voices are extremely convincing; I think you
          | overestimate just how intensely we look at things normally
          | after we accept them. This is great for many things right now
          | and will be greater once they can perfect all the subtle
          | movements that make video worth watching over audio (easier
          | information absorption and recall).
        
       | smusamashah wrote:
        | This is amazing; even the breathing can be seen. The next big
        | step (or maybe it's done already) is to generate expressive
        | audio, and then we are all set for a generated model you can
        | have a video call with.
        
       ___________________________________________________________________
       (page generated 2024-02-28 23:02 UTC)