[HN Gopher] Meta Segment Anything Model 3
___________________________________________________________________
Meta Segment Anything Model 3
Author : lukeinator42
Score : 158 points
Date : 2025-11-19 17:14 UTC (5 hours ago)
(HTM) web link (ai.meta.com)
(TXT) w3m dump (ai.meta.com)
| fzysingularity wrote:
| SAM3 is cool - you can already do this more interactively on
| chat.vlm.run [1], and do much more. It's built on our new Orion
| [2] model; we've been able to integrate with SAM and several
| other computer-vision models in a truly composable manner. Video
| segmentation and tracking is also coming soon!
|
| [1] https://chat.vlm.run
|
| [2] https://vlm.run/orion
| visioninmyblood wrote:
| Wow this is actually pretty cool, I was able to segment out the
| people and dog in the same chat.
| https://chat.vlm.run/chat/cba92d77-36cf-4f7e-b5ea-b703e612ea...
| fzysingularity wrote:
| Nice, that's pretty neat.
| yeldarb wrote:
| We (Roboflow) have had early access to this model for the past
| few weeks. It's really, really good. This feels like a seminal
| moment for computer vision. I think there's a real possibility
| this launch goes down in history as "the GPT Moment" for vision.
| The two areas I think this model is going to be transformative in
| the immediate term are for rapid prototyping and distillation.
|
| Two years ago we released autodistill[1], an open source
| framework that uses large foundation models to create training
| data for training small realtime models. I'm convinced the idea
| was right, but too early; there wasn't a big model good enough to
| be worth distilling from back then. SAM3 is finally that model
| (and will be available in Autodistill today).
|
| We are also taking a big bet on SAM3 and have built it into
| Roboflow as an integral part of the entire build and deploy
| pipeline[2], including a brand new product called Rapid[3], which
| reimagines the computer vision pipeline in a SAM3 world. It feels
| really magical to go from an unlabeled video to a fine-tuned
| realtime segmentation model with minimal human intervention in
| just a few minutes (and we rushed the release of our new SOTA
| realtime segmentation model[4] last week because it's the perfect
| lightweight complement to the large & powerful SAM3).
|
| We also have a playground[5] up where you can play with the model
| and compare it to other VLMs.
|
| [1] https://github.com/autodistill/autodistill
|
| [2] https://blog.roboflow.com/sam3/
|
| [3] https://rapid.roboflow.com
|
| [4] https://github.com/roboflow/rf-detr
|
| [5] https://playground.roboflow.com
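The autolabel-then-distill workflow described above can be sketched roughly as follows. This is an illustrative toy, not the actual Autodistill API: `teacher_predict` stands in for a large open-vocabulary segmenter like SAM 3, and the mask strings stand in for real pixel masks.

```python
# Sketch of foundation-model distillation: a large "teacher" model
# auto-labels raw images, and the resulting dataset trains a small
# realtime "student". Names here are illustrative only.

def teacher_predict(image, prompts):
    """Stand-in for a large open-vocabulary segmenter (e.g. SAM 3).
    Returns one (prompt, mask) pair per text prompt."""
    return [(p, f"mask:{p}@{image}") for p in prompts]

def autolabel(images, prompts):
    """Build a training set by running the teacher over unlabeled
    images, one record per (image, prompt) detection."""
    dataset = []
    for image in images:
        for prompt, mask in teacher_predict(image, prompts):
            dataset.append({"image": image, "label": prompt, "mask": mask})
    return dataset

labels = autolabel(["img0.jpg", "img1.jpg"], ["dog", "person"])
# The labeled set would then train a small realtime student model.
```

The point of the pattern is that human effort moves from drawing masks to spot-checking the teacher's output.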
| dangoodmanUT wrote:
| I was trying to figure out from their examples, but how are you
| breaking up the different "things" that you can detect in the
| image? Are you just running it with each prompt individually?
| rocauc wrote:
| The model supports batch inference, so all prompts are sent
| to the model, and we parse the results.
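The batch-then-parse step described above might look like this. The detection dict shape is an assumption for illustration, not SAM 3's actual output format:

```python
def group_by_prompt(detections, prompts):
    """Regroup a flat batch-inference result by the text prompt
    that produced each detection, so each "thing" gets its own
    list. Assumed shape: {"prompt": str, "score": float}."""
    by_prompt = {p: [] for p in prompts}
    for det in detections:
        by_prompt[det["prompt"]].append(det)
    return by_prompt

# Toy flat result, as if all prompts were sent in one batch:
flat = [
    {"prompt": "dog", "score": 0.92},
    {"prompt": "person", "score": 0.88},
    {"prompt": "person", "score": 0.81},
]
groups = group_by_prompt(flat, ["dog", "person"])
```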
| sorenjan wrote:
| SAM3 is probably a great model to distill from when training
| smaller segmentation models, but isn't their DINOv2 a better
| example of a large foundation model to distill from for various
| computer vision tasks? I've seen it used as a starting point
| for models doing segmentation and depth estimation. Maybe
| there's a v3 coming soon?
|
| https://dinov2.metademolab.com/
| xfeeefeee wrote:
| I can't wait until it is easy to rotoscope / greenscreen / mask
| this stuff out accessibly for videos. I had tried Runway ML but
| it was... lacking, and the webui for fixing parts of it had
| similar issues.
|
| I'm curious how this works for hair and transparent/translucent
| things. Probably not the best, but it doesn't seem to be
| mentioned anywhere? Presumably the output is just a hard
| polygon/vector boundary rather than an alpha matte?
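SAM-family models output per-pixel masks rather than alpha mattes, so hair and translucency need post-processing. A crude, dependency-free sketch of the common workaround (feathering the binary mask into fractional alpha; real pipelines would use a proper matting model):

```python
def feather_mask(mask, radius=1):
    """Soften a binary mask (2D list of 0/1) into fractional alpha
    by averaging each pixel over a (2*radius+1)^2 neighborhood.
    A crude stand-in for true alpha matting of hair/translucency."""
    h, w = len(mask), len(mask[0])
    alpha = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = count = 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        total += mask[ny][nx]
                        count += 1
            alpha[y][x] = total / count  # in [0, 1]
    return alpha

alpha = feather_mask([[0, 1], [1, 1]], radius=1)
```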
| nodja wrote:
| I'm pretty sure DaVinci Resolve does this already, and you can
| even track it; idk if it's available in the free version.
| rocauc wrote:
| I tried it on transparent glass mugs, and it does pretty well.
| At least better than other available models:
| https://i.imgur.com/OBfx9JY.png
|
| Curious if you find interesting results -
| https://playground.roboflow.com
| sciencesama wrote:
| Does the license allow for commercial purposes?
| visioninmyblood wrote:
| I just checked, and it seems to be commercially permissible.
| Companies like vlm.run and Roboflow are using it commercially,
| as shown by their comments in this thread. So I guess it can be
| used for commercial purposes.
| rocauc wrote:
| Yes. But also note that redistribution of SAM 3 requires
| using the same SAM 3 license downstream. So libraries that
| attempt to, e.g., relicense the model as AGPL are non-
| compliant.
| rocauc wrote:
| Yes. It's a custom license with an Acceptable Use Policy
| preventing military use and export restrictions. The custom
| license permits commercial use.
| colesantiago wrote:
| Yes, the license allows you to grift for your "AI startup"
| gs17 wrote:
| The 3D mesh generator is really cool too:
| https://ai.meta.com/sam3d/ It's not perfect, but it seems to
| handle occlusion very well (e.g. a person in a chair can be
| separated into a person mesh and a chair mesh) and it's very
| fast.
| Animats wrote:
| It's very impressive. Do they let you export a 3D mesh, though?
| I was only able to export a video. Do you have to buy tokens or
| something to export?
| TheAtomic wrote:
| I couldn't download it. The model appears to be comparable to
| Sparc3D, Hunyuan, etc., but w/o a download, who can say? It is
| much faster, though.
| visioninmyblood wrote:
| You can download it at
| https://github.com/facebookresearch/sam3 and, for 3D,
| https://github.com/facebookresearch/sam-3d-objects
|
| I actually found the easiest way was to run it here
| directly to see if it works for my use case of person
| deidentification https://chat.vlm.run/chat/63953adb-a89a-4c
| 85-ae8f-2d501d30a4...
| modeless wrote:
| The model is open weights, so you can run it yourself.
| dangoodmanUT wrote:
| This model is incredibly impressive. Text is definitely the right
| modality, and now the ability to intertwine it with an LLM
| creates insane unlocks - my mind is already storming with ideas
| of projects that are now not only possible, but trivial.
| HowardStark wrote:
| Curious if anyone has done anything meaningful with SAM2 and
| streaming. SAM3 has built-in streaming support which is _very_
| exciting.
|
| I've seen versions where people use an in-memory FS to write
| frames of the stream for SAM2. Maybe that is good enough?
| rocauc wrote:
| A brief history. SAM 1: visual prompts to create pixel-perfect
| masks in an image; no video, no class names, no open
| vocabulary. SAM 2: visual prompting for tracking on images and
| video; still no open vocab. SAM 3: open-vocabulary concept
| segmentation on images and video.
|
| Roboflow has been long on zero / few shot concept segmentation.
| We've opened up a research preview exploring a SAM 3 native
| direction for creating your own model:
| https://rapid.roboflow.com/
| hodgehog11 wrote:
| This is an incredible model. But once again, we find an
| announcement for a new AI model with highly misleading graphs.
| That SA-Co Gold graph is particularly bad. Looks like I have
| another bad graph example for my introductory stats course...
| clueless wrote:
| With an avg latency of 4 seconds, this still couldn't be used
| in real-time video, correct?
|
| [Update: should have mentioned I got the 4 second from the
| roboflow.com links in this thread]
| Etheryte wrote:
| Didn't see where you got those numbers, but surely that's just
| a problem of throwing more compute at it? From the blog post:
|
| > This excellent performance comes with fast inference -- SAM 3
| runs in 30 milliseconds for a single image with more than 100
| detected objects on an H200 GPU.
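A quick sanity check on those two numbers (the 4 s figure upstream is likely an end-to-end hosted-API round trip, while the 30 ms figure quoted above is raw model time on an H200):

```python
# 30 ms/frame versus a realtime frame budget.
latency_ms = 30
fps = 1000 / latency_ms          # ~33 frames/second of raw model time
budget_24fps_ms = 1000 / 24      # ~41.7 ms available per frame at 24 fps
clears_realtime = latency_ms < budget_24fps_ms
print(round(fps, 1), clears_realtime)
```

So on that hardware the model itself clears a 24 fps budget; the 4 s number would be dominated by network and queueing overhead, not inference.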
| daemonologist wrote:
| First impressions are that this model is _extremely_ good - the
| "zero-shot" text prompted detection is a huge step ahead of what
| we've seen before (both compared to older zero-shot detection
| models and to recent general purpose VLMs like Gemini and Qwen).
| With human supervision I think it's even at the point of being a
| useful teacher model.
|
| I put together a YOLO tune for climbing hold detection a while
| back (trained on 10k labels) and this is 90% as good out of the
| box - just misses some foot chips and low contrast wood holds,
| and can't handle as many instances. It would've saved me a huge
| amount of manual annotation though.
| rocauc wrote:
| As someone who works on a platform that users have used to
| label 1B images, I'm bullish that SAM 3 can automate at least
| 90% of the work. Data prep is flipped: models are now human-
| assisted instead of humans being model-assisted (see
| "autolabel" https://blog.roboflow.com/sam3/). I'm optimistic
| that the majority of users can now start by deploying a model
| and then curating data, instead of the inverse.
| bangaladore wrote:
| Probably still can't get past a Google Captcha when on a VPN. Do
| I click the square with the shoe of the person who's riding the
| motorcycle?
| conception wrote:
| There are services you can get that will bypass those with a
| browser extension for you.
| exe34 wrote:
| Can anyone confirm whether this fits in a 3090? The files look
| to be about 3.5GB, but I can't work out what the overall
| memory needs will be.
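A rough, assumption-laden estimate from the ~3.5 GB checkpoint size mentioned above. Actual peak usage depends on the stored precision, image resolution, and batch size:

```python
# Back-of-envelope VRAM check for a ~3.5 GB checkpoint on a 24 GB
# RTX 3090. The 2x multiplier for activations/workspace is a crude
# rule of thumb, not a measurement.
checkpoint_gb = 3.5
params_if_fp32 = checkpoint_gb * 1024**3 / 4  # ~0.94e9 parameters
params_if_fp16 = checkpoint_gb * 1024**3 / 2  # ~1.88e9 parameters
est_peak_gb = checkpoint_gb * 2               # weights + overhead
fits_in_3090 = est_peak_gb <= 24
print(fits_in_3090)
```

Either way the weights alone are a small fraction of 24 GB, so a 3090 should have headroom unless batch size or resolution is pushed hard.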
| foota wrote:
| Obligatory xkcd: https://xkcd.com/1425/
| maelito wrote:
| Can it detect the speed of a vehicle on any video, unsupervised?
| Benjamin_Dobell wrote:
| For background removal (at least my niche use case of removing
| backgrounds from kids' drawings --
| https://breaka.club/blog/why-were-building-clubs-for-kids) I
| think BiRefNet v2 still works slightly better.
|
| SAM3 seems to trace the images less precisely -- it'll discard
| bits where kids drew outside the lines, which is okay, but it
| also seems to struggle around sharp corners and includes a bit
| of the white page that I'd like cut out.
|
| Of course, SAM3 is _significantly_ more powerful in that it
| does _much_ more than simply cut out images. It seems to be
| able to identify what these kids' drawings represent. That's
| very impressive; AI models are typically trained on photos and
| adult illustrations, and they tend to struggle with children's
| drawings. So I could perhaps still use this for identifying
| content, giving kids more freedom to draw what they like, and
| then, unprompted, attach appropriate behavior to their
| drawings in-game.
| ge96 wrote:
| Dang that seems like it would work great for game asset
| generation regarding 3D
| tonyhart7 wrote:
| This would be good for video editor
___________________________________________________________________
(page generated 2025-11-19 23:00 UTC)