hngopher.com

       [HN Gopher] Efficient Track Anything
       ___________________________________________________________________
        
       Efficient Track Anything
        
       Author : t55
       Score  : 146 points
       Date   : 2024-12-03 10:36 UTC (6 days ago)
        
 (HTM) web link (yformer.github.io)
 (TXT) w3m dump (yformer.github.io)
        
       | brunorsini wrote:
       | Does anyone know of a method for plugging the output of models
       | like this one into traditional video editing software like Adobe
       | Premiere?
        
         | Infiltrator wrote:
         | Final Cut Pro 11 now has a "Magnetic Mask" tool which performs
         | well. I don't know if this uses the same sort of models, but it
         | is functionally what you're looking for.
        
         | atoav wrote:
         | A thing that "always works" is outputting the transparency as a
         | grayscale (black=transparent, white=opaque) alpha video. You
         | can combine these in an After Effects composition which you can
         | load directly into Premiere:
         | https://helpx.adobe.com/premiere-pro/using/compositing-alpha...
         | 
         | The output of that tool also looks suspiciously like a
         | so-called Cryptomatte, which many 3D tools use to store masks
         | for each of the objects/materials/etc. Blender can read and
         | write those, although I am not sure whether this tool supports
         | outputting those.
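
A minimal sketch of the grayscale-matte idea described above, assuming the segmentation model yields a per-frame integer label map (0 for background, one positive ID per tracked object). That output layout and the function names `mask_to_matte` and `label_map_to_mattes` are assumptions made for illustration, not the tool's documented interface. Each object gets its own 8-bit matte with black as transparent and white as opaque.

```python
import numpy as np

def mask_to_matte(mask):
    """Turn a boolean mask into an 8-bit matte: 0 = transparent, 255 = opaque."""
    return np.where(np.asarray(mask, dtype=bool), 255, 0).astype(np.uint8)

def label_map_to_mattes(labels, background_id=0):
    """Split an integer label map into one matte per object ID.

    The label-map layout is an assumption for this sketch; adapt it to
    whatever the segmentation model actually returns.
    """
    labels = np.asarray(labels)
    return {
        int(obj_id): mask_to_matte(labels == obj_id)
        for obj_id in np.unique(labels)
        if obj_id != background_id
    }

# Tiny 2x2 "frame" with two tracked objects (IDs 1 and 2).
frame_labels = np.array([[0, 1],
                         [2, 1]])
mattes = label_map_to_mattes(frame_labels)
```

Writing each object's mattes out frame by frame as a grayscale video gives the kind of alpha clip an editor can use as a track matte. This is much simpler than a real Cryptomatte, which packs several IDs with coverage values into one file.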
        
           | RoseyWasTaken wrote:
           | Exactly this! I experimented with a depth estimation model.
           | I gave it an image as input and got a grayscale depth map as
           | output, which I later used in Photoshop as a mask for
           | lens-blur simulation.
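
A rough sketch of using a grayscale depth map as a blur mask, as in the comment above. It is not how Photoshop's Lens Blur works: a naive box blur stands in for a real lens-blur kernel, and the convention that 0 means "keep sharp" and 255 means "fully blurred" is an assumption for this example. The function names are hypothetical.

```python
import numpy as np

def box_blur(image, radius):
    """Naive box blur of a 2-D float image, using edge padding at the borders."""
    k = 2 * radius + 1
    padded = np.pad(image, radius, mode="edge")
    out = np.zeros(image.shape, dtype=np.float32)
    # Sum the k*k shifted copies of the image, then divide to get the mean.
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + image.shape[0], dx:dx + image.shape[1]]
    return out / (k * k)

def depth_masked_blur(image, depth, radius=1):
    """Blend sharp and blurred copies of a grayscale image, weighted by depth.

    depth is an 8-bit map; assumed convention: 0 = sharp, 255 = fully blurred.
    """
    sharp = np.asarray(image, dtype=np.float32)
    blurred = box_blur(sharp, radius)
    weight = np.asarray(depth, dtype=np.float32) / 255.0
    return (1.0 - weight) * sharp + weight * blurred

# A single bright pixel in the middle of a dark 5x5 image.
img = np.zeros((5, 5), dtype=np.float32)
img[2, 2] = 9.0
near = depth_masked_blur(img, np.zeros((5, 5), dtype=np.uint8))
far = depth_masked_blur(img, np.full((5, 5), 255, dtype=np.uint8))
```

With an all-zero depth map the image passes through unchanged. With an all-255 map the bright pixel is spread evenly over its 3x3 neighbourhood.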
        
       | steinvakt2 wrote:
       | I wish these things were described more clearly. Is this single
       | object tracking or multi object tracking? Just a week ago SAMURAI
       | was posted here, which is kind of the same thing, promising SOTA
       | tracking performance using SAM2. But it only allows single object
       | tracking, which makes it useless for many medical imaging tasks.
        
         | notreally123123 wrote:
         | If it uses SAM2, it is most likely single-object tracking.
         | What prevents you from running multiple single-object trackers
         | in parallel? This would emulate a multi-object tracker. If you
         | want to be fancy, you add some logic to handle ID switches etc.
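
A hedged sketch of the "several single-object trackers" idea above. Each tracker is modelled as a plain callable that takes a frame and returns a bounding box. That interface, the `MultiTracker` class, and the IoU threshold are all hypothetical and do not come from SAM2 or EfficientTAM. The trackers run one after another here rather than truly in parallel. The only ID-switch logic is a flag on pairs of objects whose boxes overlap heavily, since that is where switches tend to occur; it does not resolve them.

```python
from itertools import combinations
from typing import Callable, Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

class MultiTracker:
    """Runs one single-object tracker per object ID (hypothetical interface)."""

    def __init__(self, trackers: Dict[int, Callable[[object], Box]],
                 iou_threshold: float = 0.5):
        self.trackers = trackers
        self.iou_threshold = iou_threshold

    def step(self, frame) -> Tuple[Dict[int, Box], List[Tuple[int, int]]]:
        # Ask every single-object tracker for its box on this frame.
        boxes = {oid: track(frame) for oid, track in self.trackers.items()}
        # Heavily overlapping pairs are candidates for an ID switch.
        suspects = [
            (i, j) for i, j in combinations(sorted(boxes), 2)
            if iou(boxes[i], boxes[j]) >= self.iou_threshold
        ]
        return boxes, suspects

# Stub trackers that always report the same box, for illustration only.
mt = MultiTracker({
    1: lambda frame: (0.0, 0.0, 10.0, 10.0),
    2: lambda frame: (1.0, 1.0, 11.0, 11.0),
    3: lambda frame: (50.0, 50.0, 60.0, 60.0),
})
boxes, suspects = mt.step(frame=None)
```

Deciding which identity is correct once a pair has been flagged usually needs appearance or motion cues, and that logic is the hard part of multi-object tracking.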
        
           | _Wintermute wrote:
           | The logic to detect and attempt to fix ID switches is
           | unfortunately a huge part of multi-object tracking.
        
           | collingreen wrote:
           | https://old.reddit.com/r/restofthefuckingowl/
        
       | atoav wrote:
       | What I'd love to see is how these tools perform with low depth of
       | field shots, e.g. one actor in shot and one actor out of focus in
       | front of them standing in front of a street with moving traffic.
       | 
       | These kinds of "cinematic" shots are where automatic masking
       | tools typically fall apart.
        
         | t55 wrote:
         | https://arxiv.org/abs/2411.02844 this paper is for you
        
       | wis wrote:
       | It was fun trying out the demo. With the "coffee kettle pouring"
       | video it did really well segmenting the man's hand and arm and
       | tracking them (segmenting them correctly in every frame). But
       | with the "Find the ball cup game" video it lost track of the
       | tracked cup in a strange way: it kept track of it correctly
       | while it went behind other cups, but after it was no longer
       | occluded, it switched to another cup.
       | 
       | It's still impressive to me how it twice kept track between
       | occlusions, but strange how it lost track when it wasn't
       | occluded.
       | 
       | https://i.imgur.com/hOSQBtw.mp4
        
       | datadrivenangel wrote:
       | "On mobile devices such as iPhone 15 Pro Max, our EfficientTAMs
       | can run at ~10 FPS for performing video object segmentation with
       | reasonable quality"
       | 
       | This is pretty impressive! Lowering the compute requirements will
       | allow more applications to be feasible.
        
       | ninalanyon wrote:
       | Was the abstract written by ChatGPT? It's an unreadable wall of
       | text.
        
       ___________________________________________________________________
       (page generated 2024-12-09 23:01 UTC)