[HN Gopher] Video models are zero-shot learners and reasoners
___________________________________________________________________
Video models are zero-shot learners and reasoners
Author : meetpateltech
Score : 71 points
Date : 2025-09-25 13:13 UTC (9 hours ago)
(HTM) web link (video-zero-shot.github.io)
(TXT) w3m dump (video-zero-shot.github.io)
| liuliu wrote:
 | Very interesting read. I first learned about this method from a
 | random reddit post a while ago, and I'm very happy to see a
 | systematic study of it (I wish I had saved the original post
 | somewhere to reference!).
| ThouYS wrote:
 | Maybe we really are headed toward The One Model that can do it all.
| nothrowaways wrote:
| Multimodal models do it already.
| ricardobeat wrote:
| Is it possible to use a model trained on video to output single
| frames?
| mallowdram wrote:
 | What is specific about this model? These categories aren't what
 | defines intelligence in animal life. Segmentation is a post-hoc
 | assertion imposed on visual science, not necessarily an inside-out
 | process inherent to perception.
|
| These models aren't the path, they're cheap workarounds that
| exclude the senses.
| pvillano wrote:
 | To train an AI to solve problems, you train it to extrapolate the
 | future from a starting state of having a problem and the intention
 | to solve it.
|
| So much falls out of that reframing.
| pvillano wrote:
| Training is first done as a general predictive model: situation
| => result
|
| Then it's fine-tuned on: situation + intent => action => result
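The two-stage scheme this comment describes can be sketched with a toy tabular model (a hypothetical stand-in, not anything from the paper): stage one fits a general predictive model from logged (situation, action, result) transitions, and stage two uses that model, conditioned on an intent, to pick the action predicted to realize the intent. All names here (`fit_predictive_model`, `act`, the door example) are illustrative assumptions.

```python
# Toy sketch of: pretrain a predictive model (situation + action => result),
# then select actions by conditioning on an intent (desired result).
# Purely illustrative; not the method from the linked paper.
from collections import Counter, defaultdict


def fit_predictive_model(transitions):
    """Stage 1: learn the most frequent observed result for each
    (situation, action) pair -- a tabular stand-in for predictive
    pretraining at scale."""
    counts = defaultdict(Counter)
    for situation, action, result in transitions:
        counts[(situation, action)][result] += 1
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}


def act(model, situation, intent, actions):
    """Stage 2: given a situation and an intent (the desired result),
    return an action whose predicted result matches the intent."""
    for action in actions:
        if model.get((situation, action)) == intent:
            return action
    return None  # no known action is predicted to achieve the intent


# Tiny worked example on a two-state toy world.
transitions = [
    ("door_closed", "push", "door_open"),
    ("door_closed", "wait", "door_closed"),
    ("door_open", "wait", "door_open"),
]
model = fit_predictive_model(transitions)
print(act(model, "door_closed", "door_open", ["push", "wait"]))  # push
```

The point of the sketch is the reframing: once a predictive model exists, "solving" an intent reduces to searching for an action whose predicted future matches the intended one.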
| miguel_martin wrote:
| This is incredible.
___________________________________________________________________
(page generated 2025-09-25 23:01 UTC)