[HN Gopher] OmniHuman-1: Human Animation Models
___________________________________________________________________
OmniHuman-1: Human Animation Models
Author : fofoz
Score : 125 points
Date : 2025-02-04 10:29 UTC (12 hours ago)
(HTM) web link (omnihuman-lab.github.io)
(TXT) w3m dump (omnihuman-lab.github.io)
| vessenes wrote:
| These look.. great, by and large. Hands are super natural,
| coherency is really high. Showing off piano chord blocking is a
| huge flex.
|
| I'd like to play with this! No code, but ByteDance often releases
| models, so I'm hopeful. It's significantly better than VASA, and
| looks likely to be an iteration of that architecture.
| liuliu wrote:
| ByteDance didn't release their text-to-video model, which is
| the base of this work, so I would think unlikely.
| echelon wrote:
| Tencent is releasing a ton of stuff though!
|
| https://aivideo.hunyuan.tencent.com/
|
| GitHub is overflowing with Tencent, Alibaba, and Ant Group
| models. Typically licensed as Apache 2, and replete with
| pretrained weights and fine-tuning scripts.
| liuliu wrote:
| The training process in OmniHuman-1 seems to be
| straightforward to replicate once Tencent releases their
| image-to-video model too.
| echelon wrote:
| T2V is already I2V if you're enterprising enough to open
| up the model and play with the latents. The I2V modality
| is almost just a trick.
| liuliu wrote:
| Yes, the Llava model can encode an image, and you can encode
| an image into 3D VAE space. Without fine-tuning the model,
| though, you are not going to have fidelity to the original
| (if you only use Llava's SigLIP to encode), or you end up with
| an image with limited motion (3D VAE-encoded latents as the
| first frame, then doing vid2vid).
| golol wrote:
| Modern operating systems should include by default a very simple
| private/public key system to sign arbitrary files. I think it
| should not be very complicated? We badly need this in the age of
| AI.
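A minimal sketch of what "sign arbitrary files with a private/public key" means in practice. To stay within the Python standard library it uses a Lamport one-time signature built on SHA-256. That scheme is an assumption made for illustration, not something the comment proposes; a real OS facility would more plausibly use a scheme such as Ed25519. All function names here are hypothetical, and each Lamport key pair must sign only one message.

```python
import hashlib
import secrets


def _h(data: bytes) -> bytes:
    """SHA-256 digest of data."""
    return hashlib.sha256(data).digest()


def generate_keypair():
    """Return (private, public) for a Lamport one-time signature.

    The private key is 256 pairs of random 32-byte secrets; the public
    key is the SHA-256 hash of each secret.
    """
    private = [(secrets.token_bytes(32), secrets.token_bytes(32))
               for _ in range(256)]
    public = [(_h(a), _h(b)) for a, b in private]
    return private, public


def _digest_bits(message: bytes):
    """The 256 bits of SHA-256(message), most significant bit first."""
    digest = _h(message)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]


def sign(message: bytes, private):
    """Reveal one secret per digest bit. Use each key pair only once."""
    return [private[i][bit] for i, bit in enumerate(_digest_bits(message))]


def verify(message: bytes, signature, public) -> bool:
    """Check that every revealed secret hashes to the matching public value."""
    if len(signature) != 256:
        return False
    bits = _digest_bits(message)
    return all(_h(sig) == public[i][bit]
               for i, (sig, bit) in enumerate(zip(signature, bits)))
```

To sign a file, pass its contents (for example `pathlib.Path(name).read_bytes()`) as `message`. Note that a valid signature only shows that the holder of the private key vouched for those exact bytes; it says nothing about how the content was produced.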
| Ajedi32 wrote:
| How would that help?
| ssalka wrote:
| Auto-watermarking of AI generated content, I would imagine
| Ajedi32 wrote:
| What does that have to do with signing arbitrary files?
| echelon wrote:
| That's too much effort and the use cases are what exactly?
| Helping the prosecution or defense in lawsuits?
|
| People are going to get so used to AI content that it won't
| really matter. Culture is plastic. This will be the new norm.
|
| Capturing photons to send signals is the new butter churning.
| iandanforth wrote:
| Many of these have tells, but this one fully crossed the uncanny
| valley for me. https://www.youtube.com/watch?v=1NU8NzvAxEg&t=16s
|
| Good to know that I need to now assume performances are AI
| generated even if it's not obvious that they are!
| lm28469 wrote:
| With the waxy hair and pulsating microphone?
| marci wrote:
| On a phone, just scrolling?
| aylmao wrote:
| To be fair, the hair looks quite similar to the original:
| https://www.youtube.com/watch?v=39_OmBO9jVg
| smusamashah wrote:
| What's the tell in this one?
| https://omnihuman-lab.github.io/video/hands2.mp4 or
| https://omnihuman-lab.github.io/video/hands1.mp4
| mrob wrote:
| First video: Disappearing and appearing shirt buttons.
| Disappearing, appearing, and shapeshifting rings. Ear appears
| to be bluescreened despite the rest of the person appearing
| to be in front of a real background. Belt buckle slides
| unnaturally.
|
| Second video: Shadows reveal inconsistent lighting direction.
| Disappearing and appearing studs on the watch strap. It also
| has bizarre clothing design with buttons on a non-opening
| shirt and what seems to be a printed fake weaving pattern
| that doesn't actually correspond to real weaving, but this
| could theoretically be made in reality.
| smusamashah wrote:
| This looks better than EMO (also closed source, by Alibaba Group:
| https://humanaigc.github.io/emote-portrait-alive/). See the rap
| example on their page. They apparently have EMO2 now, which
| doesn't look as believable to me.
|
| EMO covers head + shoulders, while OmniHuman-1 covers the full
| body and it's looking even better. I would have easily mistaken
| these for real (especially while doom scrolling) if I was not
| looking for AI glitches.
|
| UPDATE: Googling "animate bytedance site:github.io" returns many
| in the same domain (all proprietary). Found a few good ones.
|
| - https://byteaigc.github.io/X-Portrait2/ Very expressive
| lifelike portrait animations
|
| - https://byteaigc.github.io/x-portrait/ (previous version of the
| same, has source https://github.com/bytedance/X-Portrait)
|
| - https://loopyavatar.github.io/ (portrait animations, looks
| good)
|
| - https://cyberhost.github.io/
|
| - https://grisoon.github.io/INFP/
|
| - https://grisoon.github.io/PersonaTalk/
|
| - https://headgap.github.io/
|
| - https://kebii.github.io/MikuDance/ anime animations
| egnehots wrote:
| this could be used as an incredible low bitrate codec for some
| streaming use cases. (video conferencing/podcasts on <3G for ex,
| just use some keyframes + the audio).
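A rough back-of-the-envelope sketch of the "keyframes + audio" idea: the stream carries only an occasional still image plus the audio track, and the receiver would synthesize the motion. The function name and every number in the usage note are illustrative assumptions, not measurements of any real codec or of OmniHuman-1.

```python
def stream_kbps(keyframe_bytes: int, keyframe_interval_s: float,
                audio_kbps: float) -> float:
    """Average bitrate in kbit/s of a stream made of periodic keyframes
    plus a constant-bitrate audio track (hypothetical model)."""
    # Bits per keyframe, spread over the interval between keyframes.
    keyframe_kbps = keyframe_bytes * 8 / 1000 / keyframe_interval_s
    return keyframe_kbps + audio_kbps
```

For example, an assumed 30 kB keyframe every 10 seconds with assumed 16 kbit/s audio gives `stream_kbps(30_000, 10, 16)`, which is 24 kbit/s of keyframes plus 16 kbit/s of audio, 40 kbit/s in total.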
| kiwiguy1 wrote:
| I run YouTube channels with almost 2 billion views and this
| actually concerns me. I would love to try this in my
| productions!!
| emsign wrote:
| It looks funny.
| smusamashah wrote:
| What are the tells in most of these videos? I can't point at any
| in many of them. Hands, teeth, lip sync, body and shoulder
| movement all look correct. Especially the TED-talk-like
| presentation examples near the bottom.
| thomastjeffery wrote:
| Try watching them without audio.
|
| They are all yelling. Even the girl with the cat. Too much
| energy. Too much expression. Too much pause. The pacing is all
| the same.
| ggerules wrote:
| This is a very good attempt with people playing musical
| instruments.
|
| But there are some subtle timing tells that this is AI
| generated. Take a look at the singer playing the piano. Timing of
| the hands with the singer is slightly off. The same goes for the
| singer and the guitar. I'm not a guitar player or piano player,
| but I do play a lot of different musical instruments at a high
| level, and the timing looks off, slightly ahead of or behind the
| actual audio of the piece of music.
| mkagenius wrote:
| > Timing of the hands with the singer is slightly off.
|
| Sure, the only way is up though. I haven't seen this level of
| realism in Sora or the Google one. Plus, it's synced with audio.
___________________________________________________________________
(page generated 2025-02-04 23:01 UTC)