[HN Gopher] OmniHuman-1: Human Animation Models
       ___________________________________________________________________
        
       OmniHuman-1: Human Animation Models
        
       Author : fofoz
       Score  : 125 points
       Date   : 2025-02-04 10:29 UTC (12 hours ago)
        
 (HTM) web link (omnihuman-lab.github.io)
        
       | vessenes wrote:
       | These look.. great, by and large. Hands are super natural,
       | coherency is really high. Showing off piano chord blocking is a
       | huge flex.
       | 
       | I'd like to play with this! No code, but ByteDance often releases
       | models, so I'm hopeful. It's significantly better than VASA, and
       | looks likely to be an iteration of that architecture.
        
         | liuliu wrote:
         | ByteDance didn't release their text-to-video model, which is
         | the base of this work, so I would think it unlikely.
        
           | echelon wrote:
           | Tencent is releasing a ton of stuff though!
           | 
           | https://aivideo.hunyuan.tencent.com/
           | 
           | GitHub is overflowing with Tencent, Alibaba, and Ant Group
           | models. Typically licensed as Apache 2, and replete with
           | pretrained weights and fine tuning scripts.
        
             | liuliu wrote:
             | The training process in OmniHuman-1 seems to be
             | straightforward to replicate once Tencent releases their
             | image-to-video model too.
        
               | echelon wrote:
               | T2V is already I2V if you're enterprising enough to open
               | up the model and play with the latents. The I2V modality
               | is almost just a trick.
        
               | liuliu wrote:
               | Yes, the LLaVA model can encode an image, and you can encode
               | an image into the 3D VAE space. Without fine-tuning the
               | model, though, you either lose fidelity to the original (if
               | you only use LLaVA's SigLIP to encode) or end up with an
               | image with limited motion (3D VAE-encoded latents as the
               | first frame, then vid2vid).
        
       | golol wrote:
       | Modern operating systems should include by default a very simple
       | private/public key system to sign arbitrary files. I think it
       | should not be very complicated? We badly need this in the age of
       | AI.
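
The sign-then-verify idea in the comment above can be sketched in a
few lines. This is only an illustration under stated assumptions: it
uses a Lamport one-time signature (a hash-based scheme swapped in here
because it needs nothing beyond Python's standard library), not
whatever an operating system would actually ship; a real tool would
more likely use an established scheme such as Ed25519. The function
names keygen, sign, and verify are hypothetical, and each key pair
must sign only ONE file.

```python
import hashlib
import secrets

BITS = 256  # one secret pair per bit of the SHA-256 digest


def keygen():
    """Return (private_key, public_key) for a single signature."""
    # Private key: 256 pairs of random 32-byte secrets.
    sk = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(BITS)]
    # Public key: the SHA-256 hash of every secret.
    pk = [(hashlib.sha256(a).digest(), hashlib.sha256(b).digest()) for a, b in sk]
    return sk, pk


def _digest_bits(data: bytes):
    """Bits of SHA-256(data), most significant bit first."""
    digest = hashlib.sha256(data).digest()
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(BITS)]


def sign(sk, data: bytes):
    """Reveal one secret per digest bit (illustrative; one-time use only)."""
    return [sk[i][bit] for i, bit in enumerate(_digest_bits(data))]


def verify(pk, data: bytes, sig) -> bool:
    """Check that every revealed secret hashes to the matching public value."""
    if len(sig) != BITS:
        return False
    bits = _digest_bits(data)
    return all(
        hashlib.sha256(s).digest() == pk[i][bit]
        for i, (s, bit) in enumerate(zip(sig, bits))
    )


if __name__ == "__main__":
    sk, pk = keygen()
    original = b"raw camera footage bytes"  # stand-in for a file's contents
    sig = sign(sk, original)
    print(verify(pk, original, sig))            # signature matches the file
    print(verify(pk, b"edited footage", sig))   # any change breaks it
```

Note what this does and does not show: a valid signature only proves
that the holder of the private key vouched for these exact bytes. It
says nothing about whether the content is AI generated.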
        
         | Ajedi32 wrote:
         | How would that help?
        
           | ssalka wrote:
           | Auto-watermarking of AI generated content, I would imagine
        
             | Ajedi32 wrote:
             | What does that have to do with signing arbitrary files?
        
         | echelon wrote:
         | That's too much effort and the use cases are what exactly?
         | Helping the prosecution or defense in lawsuits?
         | 
         | People are going to get so used to AI content that it won't
         | really matter. Culture is plastic. This will be the new norm.
         | 
         | Capturing photons to send signals is the new butter churning.
        
       | iandanforth wrote:
       | Many of these have tells, but this one fully crossed the uncanny
       | valley for me. https://www.youtube.com/watch?v=1NU8NzvAxEg&t=16s
       | 
       | Good to know that I need to now assume performances are AI
       | generated even if it's not obvious that they are!
        
         | lm28469 wrote:
         | With the waxy hair and pulsating microphone?
        
           | marci wrote:
           | On a phone, just scrolling?
        
           | aylmao wrote:
           | To be fair, the hair looks quite similar to the original:
           | https://www.youtube.com/watch?v=39_OmBO9jVg
        
         | smusamashah wrote:
         | What's the tell in this one?
         | https://omnihuman-lab.github.io/video/hands2.mp4 or
         | https://omnihuman-lab.github.io/video/hands1.mp4
        
           | mrob wrote:
           | First video: Disappearing and appearing shirt buttons.
           | Disappearing, appearing, and shapeshifting rings. Ear appears
           | to be bluescreened despite the rest of the person appearing
           | to be in front of a real background. Belt buckle slides
           | unnaturally.
           | 
           | Second video: Shadows reveal inconsistent lighting direction.
           | Disappearing and appearing studs on the watch strap. It also
           | has bizarre clothing design with buttons on a non-opening
           | shirt and what seems to be a printed fake weaving pattern
           | that doesn't actually correspond to real weaving, but this
           | could theoretically be made in reality.
        
       | smusamashah wrote:
       | This looks better than EMO (also closed source, from Alibaba Group:
       | https://humanaigc.github.io/emote-portrait-alive/). See the rap
       | example on their page. They apparently have EMO2 now, which
       | doesn't look as believable to me.
       | 
       | EMO covers head + shoulders, while OmniHuman-1 covers the full
       | body and it's looking even better. I would have easily mistaken
       | these for real (especially while doomscrolling) if I were not
       | looking for AI glitches.
       | 
       | UPDATE: Googling "animate bytedance site:github.io" returns many in
       | the same domain (all proprietary). Found a few good ones.
       | 
       | - https://byteaigc.github.io/X-Portrait2/ Very expressive
       | lifelike portrait animations
       | 
       | - https://byteaigc.github.io/x-portrait/ (previous version of the
       | same, has source https://github.com/bytedance/X-Portrait)
       | 
       | - https://loopyavatar.github.io/ (portrait animations, looks
       | good)
       | 
       | - https://cyberhost.github.io/
       | 
       | - https://grisoon.github.io/INFP/
       | 
       | - https://grisoon.github.io/PersonaTalk/
       | 
       | - https://headgap.github.io/
       | 
       | - https://kebii.github.io/MikuDance/ anime animations
        
       | egnehots wrote:
       | This could be used as an incredibly low-bitrate codec for some
       | streaming use cases (video conferencing or podcasts on <3G, for
       | example: just use some keyframes + the audio).
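
A back-of-the-envelope version of that "keyframes + audio" idea, as a
rough sketch only. Every number below is an assumption picked for
illustration (a 16 kbps speech audio stream, one 30 KB keyframe every
10 seconds), not a measurement of OmniHuman-1 or of any real codec,
and stream_kbps is a hypothetical helper.

```python
def stream_kbps(audio_kbps, keyframe_kilobytes, keyframe_interval_s):
    """Total bitrate in kbps for an audio stream plus periodic keyframes."""
    # Convert kilobytes per keyframe to kilobits, then spread over the interval.
    keyframe_kbps = keyframe_kilobytes * 8 / keyframe_interval_s
    return audio_kbps + keyframe_kbps


if __name__ == "__main__":
    # Assumed values: 16 kbps audio, a 30 KB keyframe every 10 seconds.
    print(stream_kbps(audio_kbps=16, keyframe_kilobytes=30, keyframe_interval_s=10))
    # prints 40.0
```

The receiver would then have to regenerate the video from the
keyframes and audio locally, so the bandwidth saved is traded for
compute on the viewing device.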
        
       | kiwiguy1 wrote:
       | I run youtube channels with almost 2 billion views and this
       | actually concerns me. I would love to try this in my
       | productions!!
        
       | emsign wrote:
       | It looks funny.
        
       | smusamashah wrote:
       | What are the tells in most of these videos? I can't point at any
       | in many of them. Hands, teeth, lip sync, body and shoulder
       | movement all look correct. Especially the TED-talk-like
       | presentation examples near the bottom.
        
         | thomastjeffery wrote:
         | Try watching them without audio.
         | 
         | They are all yelling. Even the girl with the cat. Too much
         | energy. Too much expression. Too much pause. The pacing is all
         | the same.
        
       | ggerules wrote:
       | This is a very good attempt at people playing musical
       | instruments.
       | 
       | But there are some subtle timing tells that this is AI
       | generated. Take a look at the singer playing the piano. Timing of
       | the hands with the singer is slightly off. The same goes for the
       | singer and the guitar. I'm not a guitar or piano player, but I
       | do play a lot of different musical instruments at a high level,
       | and the timing looks off: slightly ahead of or behind the actual
       | audio of the piece of music.
        
         | mkagenius wrote:
         | > Timing of the hands with the singer is slightly off.
         | 
         | Sure, only way is up though. I haven't seen this level of
         | realism in Sora or the Google one. Plus, it's synced with audio.
        
       ___________________________________________________________________
       (page generated 2025-02-04 23:01 UTC)