[HN Gopher] Efficient Track Anything
___________________________________________________________________
Efficient Track Anything
Author : t55
Score : 146 points
Date : 2024-12-03 10:36 UTC (6 days ago)
(HTM) web link (yformer.github.io)
(TXT) w3m dump (yformer.github.io)
| brunorsini wrote:
| Does anyone know of a method for plugging the output of models
| like this one into traditional video editing software like Adobe
| Premiere?
| Infiltrator wrote:
| Final Cut Pro 11 now has a "Magnetic Mask" tool which performs
| well. I don't know if this uses the same sort of models, but it
| is functionally what you're looking for.
| atoav wrote:
| A thing that "always works" is outputting the transparency as a
| grayscale (black=transparent, white=opaque) alpha video. You
| can combine these in an After Effects composition which you can
| load directly into Premiere: https://helpx.adobe.com/premiere-
| pro/using/compositing-alpha...
|
| The output of that tool also looks suspiciously like a so-called
| Cryptomatte, which many 3D tools use to store masks for each of
| the objects/materials/etc. Blender can read and write those,
| although I am not sure whether this tool supports outputting
| them.
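| A minimal sketch of the grayscale alpha convention described
| above (black=transparent, white=opaque), assuming the model's
| per-frame masks are available as nested lists of 0/1 values.
| The function names are hypothetical, and PGM is used only
| because it can be written with the standard library; the
| resulting image sequence would still need converting to
| whatever format your editor imports.

```python
from pathlib import Path


def mask_to_matte(mask):
    """Map a binary mask (rows of 0/1 or bools) to 8-bit grayscale bytes.

    0 (black) means transparent, 255 (white) means opaque.
    """
    return bytes(255 if px else 0 for row in mask for px in row)


def write_pgm(path, mask):
    """Write one mask as a binary PGM (P5) grayscale image."""
    height, width = len(mask), len(mask[0])
    header = f"P5\n{width} {height}\n255\n".encode("ascii")
    Path(path).write_bytes(header + mask_to_matte(mask))


def write_matte_sequence(masks, out_dir):
    """Write one matte image per frame and return the file paths in order."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, mask in enumerate(masks):
        path = out_dir / f"matte_{index:05d}.pgm"
        write_pgm(path, mask)
        paths.append(path)
    return paths
```

| Zero-padded, sequential file names are used so that tools which
| import numbered image sequences pick the frames up in order.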
| RoseyWasTaken wrote:
| Exactly this! I experimented with a depth estimation model.
| The input was an image and the output was a grayscale depth
| map, which I later used in Photoshop as a mask for a lens-blur
| simulation.
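| A rough sketch of using a grayscale depth map as a blur mask,
| assuming a grayscale image stored as nested lists and a depth
| map where 255 means "far, fully blurred" (an assumption; some
| models invert this). It blends a sharp and a box-blurred copy
| per pixel, which is a simplification and not how Photoshop's
| lens blur works internally. All names are hypothetical.

```python
def box_blur(img, radius=1):
    """Average each pixel with its neighbours inside a square window."""
    height, width = len(img), len(img[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total, count = 0.0, 0
            for yy in range(max(0, y - radius), min(height, y + radius + 1)):
                for xx in range(max(0, x - radius), min(width, x + radius + 1)):
                    total += img[yy][xx]
                    count += 1
            out[y][x] = total / count
    return out


def depth_masked_blur(img, depth, radius=1):
    """Blend sharp and blurred pixels, weighted by depth (0..255).

    depth 0 keeps the sharp pixel, depth 255 uses the blurred pixel.
    """
    blurred = box_blur(img, radius)
    result = []
    for sharp_row, blur_row, depth_row in zip(img, blurred, depth):
        row = []
        for sharp, blur, d in zip(sharp_row, blur_row, depth_row):
            weight = d / 255
            row.append((1 - weight) * sharp + weight * blur)
        result.append(row)
    return result
```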
| steinvakt2 wrote:
| I wish these things were described more clearly. Is this single
| object tracking or multi object tracking? Just a week ago SAMURAI
| was posted here, which is kind of the same thing, promising SOTA
| tracking performance using SAM2. But it only allows single object
| tracking, which makes it useless for many medical imaging tasks.
| notreally123123 wrote:
| If it uses SAM2, it is most likely single-object tracking.
| What prevents you from running multiple single-object trackers
| in parallel? This would emulate a multi-object tracker. If you
| want to be fancy, you add some logic to handle ID switches
| etc.
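| A minimal sketch of that idea, assuming each single-object
| tracker exposes an update(frame) method returning an
| (x1, y1, x2, y2) box. That interface, MultiTracker, and
| ReplayTracker are all hypothetical and not SAM2's API. The
| only ID-switch logic here is flagging pairs of trackers whose
| boxes overlap heavily; resolving such conflicts is left out.

```python
from itertools import combinations


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


class MultiTracker:
    """Run one single-object tracker per object ID on every frame."""

    def __init__(self, trackers, overlap_threshold=0.8):
        self.trackers = dict(trackers)  # object ID -> tracker
        self.overlap_threshold = overlap_threshold

    def update(self, frame):
        boxes = {oid: t.update(frame) for oid, t in self.trackers.items()}
        # Two trackers landing on nearly the same box is a hint that one
        # of them may have switched to the other's object.
        suspects = [
            (a, b)
            for a, b in combinations(sorted(boxes), 2)
            if iou(boxes[a], boxes[b]) >= self.overlap_threshold
        ]
        return boxes, suspects


class ReplayTracker:
    """Toy stand-in for a real tracker: replays precomputed boxes.

    Here the "frame" passed to update() is just a frame index.
    """

    def __init__(self, boxes):
        self.boxes = boxes

    def update(self, frame):
        return self.boxes[frame]
```

| The trackers are called one after another here for simplicity;
| since they share no state, they could equally be run in
| separate processes or batched.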
| _Wintermute wrote:
| The logic to detect and attempt to fix ID switches is
| unfortunately a huge part of multi-object tracking.
| collingreen wrote:
| https://old.reddit.com/r/restofthefuckingowl/
| atoav wrote:
| What I'd love to see is how these tools perform with low depth of
| field shots, e.g. one actor in shot and one actor out of focus in
| front of them standing in front of a street with moving traffic.
|
| These kinds of "cinematic" shots are where automatic masking
| tools typically fall apart.
| t55 wrote:
| https://arxiv.org/abs/2411.02844 this paper is for you
| wis wrote:
| It was fun trying out the demo. With the "coffee kettle pouring"
| video it did really well at segmenting the man's hand and arm and
| tracking them (segmenting them correctly in every frame). But with
| the "Find the ball cup game" video it lost track of the tracked
| cup in a strange way: it kept track of it correctly while it went
| behind other cups, but after it was no longer occluded, it
| switched to another cup.
|
| It's still impressive to me how it twice kept track between
| occlusions, but strange how it lost track when it wasn't
| occluded.
|
| https://i.imgur.com/hOSQBtw.mp4
| datadrivenangel wrote:
| "On mobile devices such as iPhone 15 Pro Max, our EfficientTAMs
| can run at ~10 FPS for performing video object segmentation with
| reasonable quality"
|
| This is pretty impressive! Lowering the compute requirements will
| allow more applications to be feasible.
| ninalanyon wrote:
| Was the abstract written by ChatGPT? It's an unreadable wall of
| text.
___________________________________________________________________
(page generated 2024-12-09 23:01 UTC)