[HN Gopher] Text-to-4D Dynamic Scene Generation
___________________________________________________________________
Text-to-4D Dynamic Scene Generation
Author : Sebastian_09
Score : 108 points
Date : 2023-01-27 10:41 UTC (12 hours ago)
(HTM) web link (make-a-video3d.github.io)
(TXT) w3m dump (make-a-video3d.github.io)
| Sebastian_09 wrote:
| Link to paper https://arxiv.org/abs/2301.11280, dynamic
| visualisations only work in Chrome (?)
| jerpint wrote:
| Can confirm it doesn't work on Brave on mobile
| stale2002 wrote:
| Another paper, with no code released?
|
| What's the point then?
| kamray23 wrote:
| It's perfectly reasonable to release a publicly accessible
| paper while keeping the code to yourself, especially if you're
| Meta or OpenAI and wish to commercialize it at some point.
|
| You can recreate things from papers fine. I've done it for
| several projects, it's often nicer than just copy-pasting in
| code and it fixes issues where one side is using Montreal's AI
| toolkit and another is using PyTorch and one other is using
| Keras.
|
| Although for a tool like this, they clearly used pre-trained
| models as a large component, ones with publicly accessible
| weights as well. So replicating it will probably happen in the
| coming months if Meta doesn't (understandably) release the code
| they very clearly plan to use for their own Metaverse product.
| radarsat1 wrote:
| Code is nice, but a paper should be written sufficiently well
| that it gets the ideas across such that the solution can be
| replicated. The _ideas_ are the point, not the implementation.
| smusamashah wrote:
| These videos look a lot like the things, and the way they move,
| that I see in dreams. They are blurry-ish and seem to make sense
| but actually don't, e.g. the running rabbit: its legs are moving
| but it's not. This is almost exactly how I remember dreams: when
| I see people moving, I can rarely notice their limbs moving
| accordingly. When I look at my own hands they might have more
| than five fingers and very vague and blurry hand lines. When I
| try to run, walk, or fly, it's just as weird as these videos.
|
| This reminds me of how the first generation of these kinds of
| image generators were said to be 'dreaming'. It also makes me
| wonder whether our brains really work like these algorithms (or
| whether these algorithms mimic brains very closely).
| littlestymaar wrote:
| I've expected NeRF + Diffusion models for a while, but it looks
| like there's still a lot of work needed before it gets practical.
| GaggiX wrote:
| Performing these optimization processes during inference time
| has never been very practical for generative tasks, as it
| requires a lot of time, memory (to store the gradient) and the
| quality is usually mediocre. I still remember VQGAN+CLIP: the
| optimization process was to find a latent embedding that would
| maximize the cosine similarity between the CLIP encoded image
| and the CLIP encoded prompt. It worked, but it was not very
| practical.
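| For illustration, the kind of inference-time optimization
| described above can be sketched in a few lines. This is a toy
| under stated assumptions, not VQGAN+CLIP itself: a small fixed
| linear map (TOY_WEIGHTS, hypothetical) stands in for "decode the
| latent to an image, then CLIP-encode it", the target vector
| stands in for the encoded prompt, and the gradient is written
| out by hand where a real pipeline would use autograd.

```python
import math

# Hypothetical stand-in for generator + image encoder: maps a 4-d latent
# to a 3-d embedding. The rows are orthonormal, which keeps the toy stable.
TOY_WEIGHTS = [
    [0.5, 0.5, 0.5, 0.5],
    [0.5, -0.5, 0.5, -0.5],
    [0.5, 0.5, -0.5, -0.5],
]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def encode(weights, latent):
    """Toy 'image embedding' of a latent: a plain matrix-vector product."""
    return [dot(row, latent) for row in weights]


def cosine_grad(weights, latent, target):
    """Gradient of cos(encode(latent), target) with respect to the latent."""
    emb = encode(weights, latent)
    emb_norm = math.sqrt(dot(emb, emb))
    tgt_norm = math.sqrt(dot(target, target))
    emb_dot_tgt = dot(emb, target)
    # Derivative of the cosine with respect to the embedding.
    d_emb = [
        t / (emb_norm * tgt_norm) - emb_dot_tgt * e / (emb_norm ** 3 * tgt_norm)
        for e, t in zip(emb, target)
    ]
    # Chain rule through the linear map: multiply by the transposed weights.
    return [
        sum(weights[i][j] * d_emb[i] for i in range(len(weights)))
        for j in range(len(latent))
    ]


def optimize_latent(weights, target, latent, steps=300, lr=0.5):
    """Gradient ascent on cosine similarity; returns (latent, history)."""
    latent = list(latent)
    history = []
    for _ in range(steps):
        history.append(cosine_similarity(encode(weights, latent), target))
        grad = cosine_grad(weights, latent, target)
        latent = [z + lr * g for z, g in zip(latent, grad)]
    history.append(cosine_similarity(encode(weights, latent), target))
    return latent, history


if __name__ == "__main__":
    prompt_embedding = [1.0, 2.0, 2.0]      # stand-in for the encoded prompt
    start = [1.0, -2.0, 0.5, 3.0]           # arbitrary initial latent
    _, history = optimize_latent(TOY_WEIGHTS, prompt_embedding, start)
    print("similarity before:", history[0])
    print("similarity after: ", history[-1])
```

| Even in this toy, every output costs hundreds of gradient steps,
| which is the practicality concern raised above; the real thing
| backpropagates through two large networks at each step.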
| dukeofdoom wrote:
| Getting something that generates multiple angles of the same
| subject in different typical poses would go a long way. I can get
| Midjourney to kind of do this by asking for "multiple angles",
| but it's hit or miss.
| ajjenkins wrote:
| Can someone explain what's 4D about this? Is it 4D because the 3D
| models are animated (moving)?
| spdustin wrote:
| 4D: Height, width, depth, and time.
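| One way to picture that: a dynamic scene is a field over
| (x, y, z, t). A minimal sketch, assuming a hypothetical moving
| sphere sampled onto a grid indexed [time][depth][height][width];
| this is only an illustration of the four axes, not the
| representation used in the paper.

```python
def sphere_density(x, y, z, t, radius=0.3):
    """Density of a hypothetical sphere that slides along x as t goes 0 -> 1."""
    center_x = -0.5 + t
    inside = (x - center_x) ** 2 + y ** 2 + z ** 2 <= radius ** 2
    return 1.0 if inside else 0.0


def sample_scene(field, frames, resolution):
    """Sample field(x, y, z, t) on a regular grid over [-1, 1]^3 x [0, 1].

    Returns nested lists indexed as grid[t][z][y][x].
    """
    coords = [-1.0 + 2.0 * i / (resolution - 1) for i in range(resolution)]
    times = [f / (frames - 1) for f in range(frames)]
    return [
        [[[field(x, y, z, t) for x in coords] for y in coords] for z in coords]
        for t in times
    ]


if __name__ == "__main__":
    grid = sample_scene(sphere_density, frames=3, resolution=5)
    # The occupied cell on the central row moves from left to right over time.
    for frame_index, frame in enumerate(grid):
        print(frame_index, frame[2][2])
```

| A single frame grid[t] is an ordinary 3D volume; the fourth axis
| is what makes the sphere move.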
| [deleted]
| radarsat1 wrote:
| > trained only on Text-Image pairs and unlabeled videos
|
| This is fascinating. It's able to pick up sufficiently on the
| fundamentals of 3D motion from 2D videos, while only needing
| static images with descriptions to infer semantics.
| jackling wrote:
| I really wish these datasets were more openly accessible. I
| always want to try replicating these models but it seems that the
| data is the blocker. Renting the compute needed to create an
| inferior model does not seem to be an issue, it's always the
| data.
| jug wrote:
| Here we go again. The samples look uncannily similar to the early
| text-to-image stuff we had.
___________________________________________________________________
(page generated 2023-01-27 23:01 UTC)