[HN Gopher] Segment Anything Model (SAM) can "cut out" any objec...
___________________________________________________________________
Segment Anything Model (SAM) can "cut out" any object in an image
Author : crakenzak
Score : 322 points
Date : 2023-04-05 15:11 UTC (7 hours ago)
(HTM) web link (segment-anything.com)
(TXT) w3m dump (segment-anything.com)
| swframe2 wrote:
| The best solution I've seen is https://github.com/xuebinqin/DIS.
| You should try the DIS example images at the SAM site.
|
| The main issue I have with DIS is that creating the labels for
| my own dataset is super expensive (I think it might be easier
| to generate the training data using Stable Diffusion rather
| than human labelling).
| jonplackett wrote:
| This would make a great input for ControlNet
| fzliu wrote:
| Computer vision seems to be gravitating heavily towards self-
| attention. While the results here are impressive, I'm not quite
| convinced that vision encoders are the right way forward. I just
| can't wrap my head around how discretizing images, which are
| continuous in two dimensions, into patches is the optimal way
| to do visual recognition.
|
| What's preventing us from taking something like ConvNeXt or a
| hybrid conv/attention model and hooking that up to a decoder
| stack? I feel like the results would be similar if not better.
|
| EDIT: Clarifying that encoder/decoder refers to the transformer
| stack, not an autoencoder.
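|
| For reference, the patch discretization in question is usually
| just a strided convolution; a minimal PyTorch sketch (the sizes
| are illustrative, not SAM's actual configuration):
|
|     import torch
|     import torch.nn as nn
|
|     # One 16x16-stride conv turns the continuous image grid into
|     # discrete patch tokens for the transformer encoder.
|     patch_embed = nn.Conv2d(3, 768, kernel_size=16, stride=16)
|     images = torch.randn(1, 3, 224, 224)        # (B, C, H, W)
|     tokens = patch_embed(images).flatten(2).transpose(1, 2)
|     print(tokens.shape)                         # (1, 196, 768)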
| neodypsis wrote:
| > What's preventing us from taking something like ConvNeXt or a
| hybrid conv/attention model and hooking that up to a decoder? I
| feel like the results would be similar if not better.
|
| You mean like in a U-Net architecture?
| yeldarb wrote:
| Wow, this is pretty epic. I put it through its paces on a pretty
| wide variety of images that have tripped up recent zero-shot
| models[1] and am thoroughly impressed.
|
| We have a similar "Smart Polygon" tool[2] built into Roboflow but
| this is next level. Having the model running in the browser makes
| it so much more fun to use. Stoked it's open source; we're going
| to work on adding it to our annotation tool ASAP.
|
| [1] Some examples from Open Flamingo last week
| https://news.ycombinator.com/item?id=35348500
|
| [2] https://blog.roboflow.com/automated-polygon-labeling-
| compute...
| syntheweave wrote:
| Finally, I'll be able to fill line art with flat colors without
| fussing around with thresholds and painting in boundaries.
|
| (It does have difficulty finding the smallest possible area, but
| it's a significant advance over most existing options: in my
| brief test, it can usually spot the entire silhouette of
| figures, which is where painting a boundary is most tedious.)
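|
| A tiny sketch of that workflow (NumPy/Pillow; the filenames are
| placeholders, and it assumes the SAM mask has been exported as
| a black-and-white image the same size as the art):
|
|     import numpy as np
|     from PIL import Image
|
|     art = np.array(Image.open("lineart.png").convert("RGB"))
|     mask = np.array(Image.open("sam_mask.png").convert("L")) > 127
|     art[mask] = (255, 200, 150)  # flat fill for the selected figure
|     Image.fromarray(art).save("filled.png")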
| MrGando wrote:
| This is great, runs very fast in Chrome for me.
| benatkin wrote:
| They seem to avoid using their own brand a lot. They have a
| zillion domain names, and here they've registered a new one and
| don't use the logo except in the favicon and footer. I've seen
| similar moves, including divesting OSS projects like PyTorch
| and GraphQL, which Google wouldn't do. To me that's a tacit
| admission that the Facebook and Meta names are tarnished. And
| they are, by the content they showed users in Myanmar with the
| algorithmic feed, and by Cambridge Analytica. Maybe the whole
| "Meta" name is no different from the rebranding of Philip
| Morris.
| smoldesu wrote:
| On the one hand, sure. Facebook's brand is about as hip as a
| bag of Werther's Originals.
|
| On the other hand, this is one of those things (like VR) that
| is a distinctly non-Facebook project. It makes no sense to
| position or market this as "Facebook" research. The Homepod
| isn't called the iPod Home for obvious reasons, so it stands to
| reason that Facebook execs realized selling someone a "Facebook
| Quest" sounds like a metaphor for ayahuasca. It's not entirely
| stupid to rebrand, especially considering how diverse (and
| undeniably advanced) they've become in fields like AI and VR.
| oefnak wrote:
| I actively avoid everything that has anything to do with
| Facebook, and I can't be the only one.
| smoldesu wrote:
| Yeah, me too. I also avoid everything Apple and Google
| makes, but I'm not going to pretend like the Alphabet
| rebranding is their attempt at hiding who they are.
| xiphias2 wrote:
| Alphabet wasn't a rebranding: the founding billionaires
| got bored of Google and wanted to take a few billion
| dollars per year out of it to create new toys without
| sharing it with Google.
| throwaway290 wrote:
| Ever used React or PyTorch? Well, this is the same. Developers
| make good stuff regardless of where they work, and good on
| FB for contributing.
|
| But yeah, if you do open source, adding an element of
| corporate branding is a sure way to kill the project.
| That's why it's not called "Apple Swift" or "Microsoft
| TypeScript".
| [deleted]
| thanatropism wrote:
| I was looking into GPU nearest neighbors libraries today and
| turned Faiss down because it said "Facebook". Completely
| irrational, I know.
| aftbit wrote:
| You should use Faiss though, it's good.
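|
| For anyone curious, basic Faiss usage is only a few lines (CPU
| index shown; the GPU path wraps the same API via
| faiss.index_cpu_to_gpu):
|
|     import faiss
|     import numpy as np
|
|     xb = np.random.rand(10000, 64).astype("float32")  # database
|     index = faiss.IndexFlatL2(64)  # exact L2 nearest neighbors
|     index.add(xb)
|     distances, ids = index.search(xb[:5], 4)  # top-4 per query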
| renewiltord wrote:
| I actually have a much more positive impression of Meta because
| of this work. It's hard to describe, but they feel very
| competent. My instant reaction to something being by Meta
| Research is actually to think it's probably going to be
| interesting and good.
| eminence32 wrote:
| The page says at the very top, in a fixed header that is always
| visible (even as you scroll, or browse to other pages):
| "Research by Meta AI"
|
| To me, this feels like they are not avoiding the "Meta" brand
| at all.
| benatkin wrote:
| See my other comment. Of course they needed to have it
| somewhere to score points. These probably aren't people who
| were about to quit, just people with a lowered perception of
| the company compared to one people are mostly proud to work
| at, like Google...
| https://news.ycombinator.com/edit?id=35458445
| blululu wrote:
| What are you talking about? There is a Meta logo favicon, "Meta
| AI" appears in the header, and "Meta AI" is purposefully
| centered in the ABF text. Registering a new domain costs $10,
| compared to the massive pain of involving legal to get
| permission to repurpose an existing domain. It's a new project,
| so why not make a clean start and just get a new website
| instead of going through the full FB/Meta approval process on
| branding?
| benatkin wrote:
| I mentioned the logo. I didn't mention the text because
| perhaps they still want to score points for Meta, so hiding
| it entirely wouldn't make sense. But they avoid the larger
| immediate hangups of the big logo and the domain name.
| fortissimohn wrote:
| They likely meant that Meta was established in part due to
| the Facebook name being tarnished in the first place.
| aftbit wrote:
| I'm out of the loop, what happened in Myanmar?
| aix1 wrote:
| From an Amnesty International report:
|
| Beginning in August 2017, the Myanmar security forces
| undertook a brutal campaign of ethnic cleansing against
| Rohingya Muslims. This report is based on an in-depth
| investigation into Meta (formerly Facebook)'s role in the
| serious human rights violations perpetrated against the
| Rohingya. Meta's algorithms proactively amplified and
| promoted content which incited violence, hatred, and
| discrimination against the Rohingya - pouring fuel on the
| fire of long-standing discrimination and substantially
| increasing the risk of an outbreak of mass violence. The
| report concludes that Meta substantially contributed to
| adverse human rights impacts suffered by the Rohingya and has
| a responsibility to provide survivors with an effective
| remedy.
|
| https://www.amnesty.org/en/documents/ASA16/5933/2022/en/
|
| See also
|
| https://en.wikipedia.org/wiki/Rohingya_genocide
|
| https://en.wikipedia.org/wiki/Rohingya_genocide#Facebook_con.
| ..
| iambateman wrote:
| Welcome to the wild world of corporate IT. Their VP has the
| authority to make a new website if she wants, but has to go
| through a 3-month vetting process to put it on a subdomain.
| lacker wrote:
| As someone who used to work on Facebook open source, that
| makes sense! After all, an insecure subdomain could lead to
| all sorts of problems on facebook.com. Phishing, stealing
| cookies, there's a lot of ways it could go wrong.
|
| Whereas, if one engineer spins up some random static open
| source documentation website on AWS, it really can't go wrong
| in a way that causes trouble for the rest of the company.
| benatkin wrote:
| Meta isn't a typical corporation, though. Ordinary big
| company red tape could have stopped them from indirectly
| displacing thousands based on their religion. (That isn't an
| outlandish claim but is something they actually got sued for,
| though it was dismissed without absolving them of it)
| herval wrote:
| It very much is a typical big corp, and OP is correct. It's
| easier to ship something on a new domain, using AWS and a
| bunch of contractors, than to add a subdomain to
| facebook.com or some other top-level domain
| smoldesu wrote:
| Not to mention, the "ordinary big company red tape" didn't
| stop Coca-Cola from hiring Colombian death squads, Nestle
| from draining the Great Lakes and selling the water back to
| its residents, nor Hershey's from making chocolate from
| cacao farmed with child slave labor.
|
| Relative to the rest of FAANG (or even the Fortune 500),
| Facebook might have the least blood on their hands when
| everything is said and done.
| killerdhmo wrote:
| Um... did you sleep through the last 8+ years of
| handwringing about election interference, Russian / state
| propaganda, live-streamed massacres, and the addiction /
| mental health effects of social media, particularly for
| kids? I can't imagine the other FAANGs come close.
| smoldesu wrote:
| If platforming disinformation and enabling internet
| addiction is equivalent to criminal complicity, then
| Microsoft, Apple, Amazon, and Google all have crimes to
| answer for. Facebook has shit the bed more times than
| they can count on two hands, but unfortunately that's
| kinda table stakes in big tech.
| reaperman wrote:
| Multiple block diagrams and the paper note that one of the inputs
| is supposed to be "text", but none of the example Jupyter
| notebooks or the live demo page show how to use those. I'm
| assuming just run the text into CLIP, take the resulting
| embedding, and throw it directly in as a prompt, which then gets
| re-encoded by the SAM prompt encoder?
|
| > "Prompt encoder. We consider two sets of prompts: sparse
| (points, boxes, text) and dense (masks). We represent points and
| boxes by positional encodings [95] summed with learned embeddings
| for each prompt type and free-form text with an off-the-shelf
| text encoder from CLIP [82]. Dense prompts (i.e., masks) are
| embedded using convolutions and summed element-wise with the
| image embedding."
|
| Edit: Found the answer myself:
| https://github.com/facebookresearch/segment-anything/issues/...
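|
| For context, pulling the text embedding out of CLIP is the easy
| half; the SAM half is the missing piece, since the released
| checkpoints don't include the text-prompt path. A conceptual
| sketch (the CLIP calls are real; the hand-off to SAM is not a
| working API):
|
|     import clip
|     import torch
|
|     model, _ = clip.load("ViT-B/32")
|     with torch.no_grad():
|         text_emb = model.encode_text(clip.tokenize(["a dog"]))
|     # Per the paper, this embedding would be handed to SAM's
|     # prompt encoder like a sparse point/box prompt, but the
|     # public code exposes no entry point for it.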
| [deleted]
| AdilZtn wrote:
| That's amazing! This model is a huge opportunity to create
| annotated data (with decent quality) for just a few dollars.
| People will iterate more quickly with this kind of foundation
| model.
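|
| Concretely, the repo's README shows bulk mask generation in a
| few lines, which is roughly what you'd want for pre-annotation
| (the checkpoint is the ViT-H download from the repo; the image
| filename is a placeholder):
|
|     import cv2
|     from segment_anything import (SamAutomaticMaskGenerator,
|                                   sam_model_registry)
|
|     image = cv2.cvtColor(cv2.imread("photo.jpg"),
|                          cv2.COLOR_BGR2RGB)
|     sam = sam_model_registry["vit_h"](
|         checkpoint="sam_vit_h_4b8939.pth")
|     mask_generator = SamAutomaticMaskGenerator(sam)
|     masks = mask_generator.generate(image)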
| justinator wrote:
| The demo is running slow. Cutting out is an impressive ability
| - am I to assume it also fills in the background? If so, that's
| next level. Maybe that Photoshop monthly subscription will be
| worth it (provided this sort of ability is going to be baked
| into Adobe's AI version soon).
| bobsmooth wrote:
| Seeing it run on a headset is the coolest part. Lots of
| applications for AR.
| ren_engineer wrote:
| What do you think Facebook's gameplan is here? Are they trying
| to commoditize AI by releasing this and Llama as a move against
| OpenAI, Microsoft, and Google? They had to have known the Llama
| weights would be leaked, and now they are releasing this.
| jayd16 wrote:
| Well, there's some patent offense and defense in making and
| releasing research papers. There are recruiting aspects to it.
| It's also a way to commoditize your complement, if you assume
| this sort of stuff brings AR and the metaverse closer to reach.
| high_derivative wrote:
| n=1 (as a mid-profile AI researcher), but for me it's working
| in terms of Meta gaining my respect by open sourcing (despite
| the licensing disasters). They clearly seem to be more
| committed to open source and getting things done now in
| general.
| dragonwriter wrote:
| I think cranking out open source projects like this raises Meta
| AI's profile and helps them attract attention and people. I
| don't think selling AI _qua_ AI is their business plan; selling
| services built on top is. And commoditized AI means that the AI
| vendors don't get to rent-seek on people doing that, whereas
| narrowly controlled monopoly/oligopoly AI would mean that the
| AI vendors extract the value produced by downstream
| applications.
| vagabund wrote:
| I've always half-believed that the relatively open approach
| to industry research in ML was a result of the inherent
| compute-based barrier to entry for productizing a lot of the
| insights. Collaborating on improving the architectural SoTA
| gets the handful of well-capitalized incumbents further ahead
| more quickly, and solidifies their ML moat before new
| entrants can compete.
|
| Probably too cynical, but you can potentially view it as a
| weak form of collusion under the guise of open research.
| dragonwriter wrote:
| This particular model has a very low barrier; it's smaller
| than Stable Diffusion, which already runs easily on consumer
| hardware for inference, though _training_ is more resource-
| intensive (but not out of reach of consumers, whether through
| high-end consumer hardware or affordable cloud resources).
|
| For competitive LLMs targeting text generation, especially
| for training, the compute-based barrier is more significant.
| vagabund wrote:
| Yeah that's fair. I intended my comment to be more of a
| reflection on the culture in general, but the motivations
| in this instance are probably different.
| herval wrote:
| Their main use case for these models seems to be AR. Throwing
| it out in the open might help get external entities to build
| for them, attract talent, etc. Not sure they're that strategic,
| but it's my guess.
| _the_inflator wrote:
| I think Meta's gameplan is complex. Inspiration as well as
| adoption, and not stepping on the toes of regulators, is
| probably another intention. Have a look at PyTorch, for
| example: a massively popular ML framework, with lots of
| interesting projects running on it.
|
| If Meta frequently shares their "algorithms", they take the
| blame out of their usage. After all, who is to blame when
| everybody does "it" and you are very open about it?
|
| Use cases, talent visibility, and talent attraction also play
| a role. After all, Google was so fancied due to its many open
| source projects. "Show, don't tell."
| geenew wrote:
| That's it for me
| crakenzak wrote:
| This goes along with the new Segment Anything Model paper
| Meta AI just released:
|
| Paper: https://scontent-
| sea1-1.xx.fbcdn.net/v/t39.2365-6/10000000_6...
|
| Announcement: https://ai.facebook.com/blog/segment-anything-
| foundation-mod...
|
| Code & Model Weights:
| https://github.com/facebookresearch/segment-anything
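|
| Per the repo's README, prompted prediction looks roughly like
| this (the checkpoint is the ViT-H download; the image filename
| and the click coordinates are placeholders):
|
|     import cv2
|     import numpy as np
|     from segment_anything import SamPredictor, sam_model_registry
|
|     image = cv2.cvtColor(cv2.imread("photo.jpg"),
|                          cv2.COLOR_BGR2RGB)
|     sam = sam_model_registry["vit_h"](
|         checkpoint="sam_vit_h_4b8939.pth")
|     predictor = SamPredictor(sam)
|     predictor.set_image(image)
|     point_coords = np.array([[500, 375]])  # one click, (x, y)
|     point_labels = np.array([1])           # 1 = foreground
|     masks, scores, logits = predictor.predict(
|         point_coords=point_coords,
|         point_labels=point_labels,
|     )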
| lofaszvanitt wrote:
| Why can't they give proper filenames to these research papers?
| This drives me nuts.
| ftxbro wrote:
| If Tim Berners-Lee saw that paper link he would never have
| allowed the URL to be invented.
| LoganDark wrote:
| I'm so shocked by how almost every query parameter is
| required and there's even a freaking signature for validating
| the URL itself.
|
| -Emily
| jauer wrote:
| That paper link is a CDN URL that is dynamically generated to
| point to your closest POP when you load the abstract. It will
| be different for many people and will break eventually.
|
| Abstract:
| https://ai.facebook.com/research/publications/segment-anythi...
| code51 wrote:
| It's interesting that clearly visible text regions, which most
| OCR approaches can't handle properly, also get left out by SAM
| in its automatic predictions.
| dimatura wrote:
| The network architecture and scale don't seem to be a big
| departure from recent SOTA, but a pretty massive amount of
| labeled data went into it. And it seems to work pretty well! The
| browser demo is great. This will probably see a lot of use,
| especially considering the liberal licensing.
| bjacobt wrote:
| I apologize if this is obvious, but are both the model and the
| checkpoint (as referenced in the getting started section of
| the readme) Apache 2.0? Can they be used for commercial
| applications?
| dimatura wrote:
| As far as I can tell, it can. The code itself has a `LICENSE`
| file with the Apache license, and the readme says "The model
| is licensed under the Apache 2.0 license.". Strangely, the
| FAQ in the blog post doesn't address this question, which I
| expect will be frequent.
| phkahler wrote:
| Isn't Apache 2 a free software license without some of the
| GPLv3 things some don't like?
|
| I think a more BSD-like license would be better, or LGPL.
| Either would be more business-friendly.
| MacsHeadroom wrote:
| LGPL is not business-friendly at all. It's among the
| least business-friendly licenses there are. Apache 2.0 is
| slightly more business-friendly than BSD.
|
| With some caveats, software licenses from most to least
| business-friendly roughly go:
|
| Apache > BSD > MIT > MPL > LGPL > GPL > AGPL
| kyle-rb wrote:
| LGPL is more business friendly than GPL; it's literally
| "lesser" GPL.
|
| You can use LGPL in commercial, closed-source projects as
| long as you keep the LGPL code in a separate dynamically
| linked library, e.g. a DLL, and provide a way for users
| to swap it out for their own patched DLL if they wish.
| (Plus some other license terms.)
|
| Also, you can always use LGPL code under the terms of the
| GPL, so there's no way LGPL is more restrictive than GPL.
| MacsHeadroom wrote:
| You're right, that was a mistake. It's been fixed. LGPL >
| GPL
| dang wrote:
| Related:
|
| _Meta New Segmentation Model_ -
| https://news.ycombinator.com/item?id=35453625 - April 2023 (7
| comments)
| richardw wrote:
| Surely this changes the security camera game? No more being
| fooled by clouds going overhead.
| subarctic wrote:
| The demo is pretty cool but it looks like you can just select
| things and have it highlight them in blue - is there a way to
| remove objects from the image and have the background filled in
| behind them?
| dymk wrote:
| You could probably content-aware fill the area that SAM
| identifies with another tool
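|
| A minimal sketch of that glue, e.g. with diffusers' inpainting
| pipeline (the SAM mask saved as a white-on-black image;
| filenames are placeholders):
|
|     import torch
|     from diffusers import StableDiffusionInpaintPipeline
|     from PIL import Image
|
|     pipe = StableDiffusionInpaintPipeline.from_pretrained(
|         "runwayml/stable-diffusion-inpainting",
|         torch_dtype=torch.float16,
|     ).to("cuda")
|     image = Image.open("photo.png").convert("RGB")
|     mask = Image.open("sam_mask.png").convert("RGB")  # white = fill
|     out = pipe(prompt="background",
|                image=image, mask_image=mask).images[0]
|     out.save("object_removed.png")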
| neom wrote:
| Yikes. I went to film school in the early 2000s and spent hours
| and hours on levels/HDR-based masking. I've used the Adobe
| tools recently and they're good... but this is... yikes. I
| wonder how people in their mid-20s learning Photoshop today
| are going to fare in the jobs they graduate into.
| jacquesm wrote:
| Not. This is homing in on the SF UI that Deckard used in Blade
| Runner.
|
| https://www.youtube.com/watch?v=hHwjceFcF2Q
|
| All it takes is a couple of tools glued together and you're
| getting there.
| marstall wrote:
| "gimme a hardcopy right there."
| arduinomancer wrote:
| This exists as a feature on iOS.
|
| You can long-press on an image and it cuts out whatever thing
| it thinks you're pressing on.
|
| They also use it in interesting ways, like making stuff in the
| photo slightly overlap the clock on the lock screen.
|
| Does anyone know if that works the same way as this?
| hbn wrote:
| It would still be nice if iOS had some kind of interface like
| this where you can nudge it in the right direction if it's
| confusing something like a jacket and the background. iOS gives
| its best attempt which is usually pretty good, but if it didn't
| get it right you're basically SOL.
| neom wrote:
| This is: understand everything in the image as elements,
| subjects, or whatever.
| sashank_1509 wrote:
| Extremely impressive system. Blows everything else (including
| CLIP from OpenAI) out of the water. We are inching closer to
| solving Computer Vision!
| wongarsu wrote:
| It's really impressive, and better than anything I've seen, but
| is it really leagues better than whatever Photoshop is using?
|
| Of course, being on GitHub and permissively licensed is huge.
| vanjajaja1 wrote:
| My question exactly, didn't photoshop already solve this like
| 5+ years ago?
| dymk wrote:
| Have you used Photoshop's magic wand tool in the last 5
| years? No, it's nowhere close to this good.
| RomanPushkin wrote:
| Stalin's dream
| https://en.wikipedia.org/wiki/Censorship_of_images_in_the_So...
| cloudking wrote:
| Pretty cool, Runway has a similar green screening feature that
| can 1-click segment a subject from the background across an
| entire video: https://runwayml.com/ai-magic-tools/
| minimaxir wrote:
| You know an AI project is serious when it has its own domain name
| instead of a subdomain.
| syrusakbary wrote:
| This is awesome. If you try the demo they provide [0], the
| inference is handled purely in the client using an ONNX model
| that weighs only around ~8 MB [1] [2].
|
| Really impressive stuff! Congrats to the team that achieved it.
|
| [0] https://segment-anything.com/demo
|
| [1] https://segment-
| anything.com/model/interactive_module_quanti...
|
| [2] https://segment-
| anything.com/model/interactive_module_quanti...
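|
| The same decoder can be driven from Python with onnxruntime.
| A rough sketch (input names follow the repo's ONNX example
| notebook; the model filename is a placeholder, and the dummy
| embedding stands in for the output of running the much larger
| image encoder once per image):
|
|     import numpy as np
|     import onnxruntime as ort
|
|     sess = ort.InferenceSession("sam_onnx_quantized.onnx")
|     embedding = np.zeros((1, 256, 64, 64), np.float32)  # dummy
|     masks, scores, low_res = sess.run(None, {
|         "image_embeddings": embedding,
|         "point_coords": np.array([[[500., 375.]]], np.float32),
|         "point_labels": np.array([[1.]], np.float32),
|         "mask_input": np.zeros((1, 1, 256, 256), np.float32),
|         "has_mask_input": np.zeros(1, np.float32),
|         "orig_im_size": np.array([1200., 1800.], np.float32),
|     })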
| nielsbot wrote:
| Sigh. Does not work in Safari on macOS (ARM). Works in Chrome
| though.
| subarctic wrote:
| It seems to work in firefox on macOS (ARM) fwiw
| georgelyon wrote:
| It seems like the output of this model is masks, but for
| cropping you really need to be able to pull partial color out
| of certain pixels (for example, pulling a translucent object
| out from a colored background). I tried the demo, and it fails
| pretty miserably on a vase. Anyone know of a model that can do
| this well?
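|
| For contrast, a binary mask gives at best a hard 0/255 alpha
| cutout; a minimal sketch (NumPy/Pillow, placeholder filenames):
|
|     import numpy as np
|     from PIL import Image
|
|     rgb = np.array(Image.open("photo.png").convert("RGB"))
|     mask = np.array(Image.open("sam_mask.png").convert("L")) > 127
|     alpha = (mask * 255).astype(np.uint8)
|     rgba = np.dstack([rgb, alpha])  # hard edges, no translucency
|     Image.fromarray(rgba, "RGBA").save("cutout.png")
|
| Recovering fractional per-pixel alpha for something like a
| translucent vase is the image-matting problem, which takes a
| dedicated matting model rather than a segmenter.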
___________________________________________________________________
(page generated 2023-04-05 23:00 UTC)