[HN Gopher] Samurai: Adapting Segment Anything Model for Zero-Sh...
       ___________________________________________________________________
        
       Samurai: Adapting Segment Anything Model for Zero-Shot Visual
       Tracking
        
       Author : GordonS
       Score  : 76 points
       Date   : 2024-11-26 10:16 UTC (4 days ago)
        
 (HTM) web link (yangchris11.github.io)
        
       | GordonS wrote:
       | Full, unabridged title (which adds something important!):
       | 
       | "SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual
       | Tracking with Motion-Aware Memory"
       | 
       | It's the memory part that I find so impressive in the demo
       | videos!
        
       | IshKebab wrote:
       | Very impressive. I wish research like this was more deployable.
       | It always seems to come in the form of a muddy ball of Python,
       | rather than e.g. a C++ or Rust library you could actually deploy
       | in a product.
       | 
       | I get why, but it still seems a shame that there's all this cool
       | ML research that will only make it into actual products in 10
       | years when someone with the resources of Adobe rewrites it in
       | something other than Python.
        
         | HanClinto wrote:
         | I work on deployed embedded ML products using NVIDIA Jetson,
         | and while there are C++ portions, a lot of it (dare I say most
         | of it?) is written in Python. It's fast enough for our embedded
         | processors, and Docker containers make such things very
         | deployable -- even in relatively resource-constrained
         | environments. No, we're not on a Raspberry Pi or an Arduino,
         | but I don't think that SAM2 is going to squeeze down reasonably
         | onto something that size anyways.
         | 
         | If the inference code (TensorRT, TensorFlow, PyTorch, whatever)
         | is fast, then what does it matter what the glue code is written
         | in?
         | 
         | Python has become the common vulgate as a trade language
         | between various disciplines, and I'm all 'bout that.
         | 
         | I've only been working in computer vision for 10-ish years, but
         | even when I started, most research projects were in MATLAB. The
         | fact that universities have shifted away from MATLAB and into
         | Python is a breath of fresh air, lemme' tell ya'.
        
           | stefan_ wrote:
           | > a lot of it (dare I say most of it?) is written in Python
           | 
           | I guess ignorance is bliss once someone has done the work for
           | you of getting it all down into TRT.
        
           | Grosvenor wrote:
           | TIL the Vulgate was a Latin version of the Bible.
           | 
           | From Apple dictionary:
           | 
           | "the principal Latin version of the Bible, prepared mainly by
           | St. Jerome in the late 4th century, and (as revised in 1592)
           | adopted as the official text for the Roman Catholic Church."
        
         | zackangelo wrote:
         | I've been writing all of our transformer implementations in
         | Rust using the Candle crate and it's been great.
         | 
         | While dealing with CUDA and GPUs on servers is never a joy,
         | deploying fully contained Rust binaries instead of a morass of
         | Python scripts has improved the situation for me significantly.
         | 
         | Getting Samurai running on Candle shouldn't be that large of an
         | undertaking. I believe there's already a SAM implementation.
        
       | steinvakt2 wrote:
       | Note that this currently only supports single-object tracking.
       | I tried it for my research project (tracking cells in microscopy
       | videos), but it didn't work well. I guess it's better suited to
       | real-world 3D scenes.
        
       | alberth wrote:
       | Seems great for tracking POI on CCTV.
        
       ___________________________________________________________________
       (page generated 2024-11-30 23:01 UTC)