[HN Gopher] Audio2Photoreal
___________________________________________________________________
Audio2Photoreal
Author : wildpeaks
Score : 66 points
Date : 2024-01-04 18:03 UTC (4 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| ilaksh wrote:
| That's amazing. It's a non-commercial license though.
|
| How feasible is it to imitate what this model and codebase are
| doing, to use it in a commercial capacity?
|
| Did they release the dataset?
|
| It would also be nice if Facebook would consider making an API to
| give Heygen and Diarupt some competition, if they aren't going to
| allow commercial use.
|
| Although there will probably be a bunch of people who become
| millionaires using this for their porn gf bot service who just
| don't care about license restrictions.
| pseudosavant wrote:
| Like the rest of Facebook's AI research... I find this
| underwhelming. Not even good enough to trigger uncanny valley
| issues.
| dtauzell wrote:
| Are there some similar models that are currently better?
| pseudosavant wrote:
| I don't know, but I can't imagine having this as a feature in
| any app (Zoom, etc) and leaving it on. That is how most of
| FB's AI research seems. Not good enough to make into a real
| product or feature.
| TaylorAlexander wrote:
| The nature of this type of research is that there are long
| term goals which are currently unachievable with no clear
| concept for how to approach them, so researchers need to
| start putting small pieces together and working out how to
| make it all work smoothly as a single concept. It looks
| like someone had a neural network for mouth movement.
| Someone had one for body movement, etc. Composing multiple
| systems into one teaches us how we can approach more
| complex problems and how to tie things together better than
| just inserting the output of one into the input of
| another.
|
| Long term this type of work helps solve big problems even
| if the intermediate steps don't produce exciting results.
|
| As an example, early image generators were pretty
| uninteresting but today they are widely utilized and
| generally considered impressive. The thing that researchers
| in the field know that the public doesn't is that there's
| 100 boring steps before the exciting release, and some of
| the boring steps are very exciting on a technical level.
| Those intermediate achievements represent 99% of what
| machine learning research actually is and others in the
| field appreciate those works.
| echelon wrote:
| Also CC-NC. They want free feedback, but won't let you use it
| to make anything yourself.
| smusamashah wrote:
| This would be amazing if used in games. Game designers could
| easily create realistic body movement just using audio.
| ArekDymalski wrote:
| Impressive. Even at current state it would make RPGs like Fallout
| or Skyrim sooo much more alive ...
| aantix wrote:
| Why would we want an avatar vs a real video stream of the actual
| person?
| kuschku wrote:
| Being able to have an avatar that fits your voice without
| having to actually look like that has many applications.
|
| Whether you're trans or you just want to join a video call
| early in the morning without dressing up, the applications are
| endless.
|
| In many situations we demand that people dress or present a
| certain way, just out of bullshit social expectations. This is
| one way to eat your cake and have it too.
| zamadatix wrote:
| For those use cases you should be able to get much more
| accurate results using a base video stream. This better fits
| use cases where you're lacking a video stream, and not just
| because you don't want to turn it on.
| kridsdale1 wrote:
| A video stream isn't volumetric.
|
| This is for the metaverse.
| bigfishrunning wrote:
| You could generate the avatar clientside and save a ton of
| bandwidth vs a compressed video stream...
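The bandwidth argument is easy to sanity-check with back-of-the-envelope numbers. The sketch below is not based on anything in the Audio2Photoreal codebase; the parameter count, frame rate, and video bitrate are all rough assumptions, just to show the order of magnitude involved in sending pose parameters instead of pixels.

```python
# Rough comparison: streaming avatar pose parameters vs. compressed video.
# All figures below are assumptions, not measurements.

def kbps(bytes_per_frame, fps):
    """Bandwidth in kilobits per second for a given per-frame payload."""
    return bytes_per_frame * 8 * fps / 1000

# Assumption: ~100 skeletal/facial pose parameters as 4-byte floats, 30 fps.
pose_bw = kbps(100 * 4, 30)   # 400 bytes/frame -> 96 kbps

# Assumption: a typical 720p video-call stream at ~1.5 Mbps.
video_bw = 1500  # kbps

print(f"pose stream:  {pose_bw:.0f} kbps")
print(f"video stream: {video_bw} kbps")
print(f"ratio: ~{video_bw / pose_bw:.0f}x less bandwidth")
```

Even uncompressed, a pose stream comes in around an order of magnitude smaller than compressed video, which is presumably the appeal for multi-person "metaverse" scenes.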
| esafak wrote:
| Old recordings of people without pictures, for one!
| zamadatix wrote:
| Given it's by meta I'm guessing it's related to their metaverse
| goals.
| RobCodeSlayer wrote:
| I'm imagining video game applications where the avatars are
| controlled by both online users and LLMs
| plaguuuuuu wrote:
| Either games or its just interesting research that mostly ties
| in with what FB is doing. Cause there are problems like, e.g.
| imagine the bandwidth requirement of streaming 3D copies of
| like 20 people in a room
|
| it's simply not possible in the near future; even today,
| Zoom/Teams video conferencing is somehow highly compressed and
| shit quality with just low-res 2D video.
| leshokunin wrote:
| Pretty cool. It's going to take a while to make it into a usable
| product though. Having conversations with people flailing their
| hands algorithmically is going to feel weird until it gets more
| natural. Right now it feels like those "blink every n" scripts.
| kridsdale1 wrote:
| Every video game NPC is basically following such an algorithm.
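A "blink every n" idle script of the kind being described here is only a few lines. This is a generic sketch, not anything from the Audio2Photoreal code; the timings and jitter are assumptions, chosen so the blinks land at roughly fixed intervals without looking metronomic.

```python
import random

def blink_times(duration_s, mean_interval_s=4.0, jitter_s=1.0, seed=0):
    """Return timestamps (seconds) at which an idle avatar should blink.

    Each interval is mean_interval_s plus uniform jitter, so blinks are
    regular on average but never perfectly periodic.
    """
    rng = random.Random(seed)
    t, times = 0.0, []
    while True:
        t += mean_interval_s + rng.uniform(-jitter_s, jitter_s)
        if t >= duration_s:
            break
        times.append(round(t, 2))
    return times

print(blink_times(20))  # a handful of blink timestamps within 20 seconds
```

The point of the thread above is exactly that this kind of scheduled motion reads as "scripted" to humans, whereas audio-driven gesture models try to couple the motion to what's actually being said.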
| CrzyLngPwd wrote:
| It's really impressive.
|
| I wonder where it is headed.
| aaroninsf wrote:
| Below the right wing, the world famous Uncanny Valley of Menlo
| Park, one of the seven blunders of the natural world.
| kridsdale1 wrote:
| Goddamn that's cool.
|
| End-state for Winamp visualizers: synthesize an entire living
| world from the audio alone.
___________________________________________________________________
(page generated 2024-01-04 23:00 UTC)