[HN Gopher] Phenaki: A model for generating minutes-long, changi...
___________________________________________________________________
Phenaki: A model for generating minutes-long, changing-prompt
videos from text
Author : vagabund
Score : 53 points
Date : 2022-09-29 18:36 UTC (4 hours ago)
(HTM) web link (phenaki.video)
(TXT) w3m dump (phenaki.video)
| ag8 wrote:
| Wow--this qualitatively feels a lot more impressive than the Meta
| model. The two-minute video is better than anything I've seen in
| video generation on that scale.
| anigbrowl wrote:
| I'm happy about it, because all the celebs who paid $ to churn
| out an 'AI music video!!?!' with Stable Diffusion and whatever
| shitty demo they had lying around are suddenly revealed as
| tryhards chasing the hype cycle rather than innovators.
| anigbrowl wrote:
| This addresses several of the shortcomings in the AI video
| technology that's the current top story on HN. It's entertaining
| to consider the possibility that the explosion of innovation is
| partly due to artificially generated papers and business entities
| that are busily iterating upon each others' capabilities while we
| write micro-editorials about what that means.
| blondin wrote:
| this does not sound too far-fetched. the paper says anonymous
| authors pending review...
| GaggiX wrote:
| I haven't read either paper deeply enough, but I think that
| Make-A-Scene, being conditioned only on image embeddings, is
| incapable of generating videos that require a broader
| understanding than an image embedding can encode, like "Camera
| zooms quickly into the eye of the cat". Make-A-Scene is more
| like text-to-animated-image; this model seems more powerful.
| Hard_Space wrote:
| That long embedded video is the closest T2V has come to
| breaking my cynicism about how long it will take to become (at
| least) coherent.
|
| Check it out on the project page to see the text that formed
| it, or just watch it here:
|
| https://phenaki.video/stories/2_minute_movie.webp
|
| However, there are some flourishes and timing cues that are not
| indicated by the prompt text, so I suspect some manual tweaking
| is at play (which is okay; it's still impressive).
| Hard_Space wrote:
| This appears to be just one plank of a tripartite shock assault
| on the October conference season.
|
| The other two, also by anonymous authors using the same
| formatting, are:
|
| AudioGen: Textually Guided Audio Generation
| https://openreview.net/forum?id=CYK7RfcOzQ4
|
| and
|
| Re-Imagen: Retrieval-Augmented Text-to-Image Generator
| https://openreview.net/forum?id=XSEBx0iSjFQ
|
| There is a samples site for AudioGen, but it is currently flooded
| and inaccessible:
|
| https://anonymous.4open.science/w/iclr2023_samples-CB68/repo...
| abeppu wrote:
| What's the commonality in formatting you're paying attention
| to? I think the conference asks everyone to use their
| template/style.
|
| But the architecture figures look like they have different
| styles. E.g. the Re-Imagen paper uses rows/stacks of small
| colored circles to represent output tensors, and colored
| rectangles of different aspect ratios to indicate shape
| differences, whereas the Phenaki paper uses stacks of squares
| for output tensors, and differently shaped elements to
| distinguish different kinds of components.
|
| https://github.com/ICLR/Master-Template
___________________________________________________________________
(page generated 2022-09-29 23:01 UTC)