[HN Gopher] Video models are zero-shot learners and reasoners
       ___________________________________________________________________
        
       Video models are zero-shot learners and reasoners
        
       Author : meetpateltech
       Score  : 71 points
       Date   : 2025-09-25 13:13 UTC (9 hours ago)
        
 (HTM) web link (video-zero-shot.github.io)
 (TXT) w3m dump (video-zero-shot.github.io)
        
       | liuliu wrote:
       | Very interesting read. I first learned of this method from a random
       | reddit post a while ago and am very happy to see a systematic study
       | of it (wish I had saved the original post somewhere to
       | reference!).
        
       | ThouYS wrote:
       | maybe we really are headed to The One Model that can do it all
        
         | nothrowaways wrote:
         | Multimodal models do it already.
        
       | ricardobeat wrote:
       | Is it possible to use a model trained on video to output single
       | frames?
        
       | mallowdram wrote:
       | What is specific about this model? These categories aren't what
       | defines intelligence in animal life. Segmentation is a post-hoc
       | assertion into visual science, not necessarily an inside-out
       | process inherent to perception.
       | 
       | These models aren't the path, they're cheap workarounds that
       | exclude the senses.
        
       | pvillano wrote:
       | To train an AI to solve problems, you train it to extrapolate the
       | future from a starting state of having a problem and the
       | intention to solve it.
       | 
       | So much falls out of that reframing.
        
         | pvillano wrote:
         | Training is first done as a general predictive model: situation
         | => result
         | 
         | Then it's fine-tuned on: situation + intent => action => result
        
       | miguel_martin wrote:
       | This is incredible.
        
       ___________________________________________________________________
       (page generated 2025-09-25 23:01 UTC)