[HN Gopher] Meta's Segment Anything written with C++ / GGML
___________________________________________________________________
Meta's Segment Anything written with C++ / GGML
Author : ariym
Score : 216 points
Date : 2023-09-05 22:49 UTC (21 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| IshKebab wrote:
| I'm so glad the AI community is finally starting to ditch Python.
| It has held progress back for far too long.
| fsloth wrote:
| In general, if you don't know what you are doing, it's much
| faster to first figure out a good strategy for a solution in a
| language that does not suffer from all of the encumbrances C++
| brings.
|
| Python is really great for fast prototyping. It can be argued
| that most AI products so far are the result of fast prototyping.
| So I'm not sure there is anything wrong with that.
|
| As practical models emerge, at that point it indeed makes sense
| to port them to C++. But I would not in my wildest dreams
| suggest prototyping a data model in C++ unless absolutely
| necessary.
| dbmikus wrote:
| How has Python held it back? Most of the heavy computation
| lifting is done by C extensions/bindings and the models are
| compiled to run on CUDA, etc. What am I missing?
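A minimal sketch (not from the thread, and assuming NumPy as a stand-in for the C extensions mentioned above) of why interpreter overhead largely disappears in practice: once the inner loop is dispatched to compiled code, Python is only orchestrating.

```python
# Compare a pure-Python inner loop with the same computation dispatched
# to NumPy's compiled C core. The results are identical; only the time
# spent in the interpreter differs.
import time
import numpy as np

n = 1_000_000
a = np.random.rand(n)
b = np.random.rand(n)

t0 = time.perf_counter()
dot_py = sum(x * y for x, y in zip(a.tolist(), b.tolist()))  # interpreter executes every op
t_py = time.perf_counter() - t0

t0 = time.perf_counter()
dot_np = float(np.dot(a, b))  # a single call into compiled code
t_np = time.perf_counter() - t0

print(f"pure Python: {t_py:.3f}s  NumPy: {t_np:.4f}s")
```

The same pattern holds for PyTorch or TensorFlow kernels running on CUDA: the Python layer mostly just strings together calls into compiled code.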
| lhl wrote:
| Presumably what you're missing is that IshKebab probably
| doesn't work in AI/ML at all (no links in his profile, but you
| can judge his post history yourself). Anyone can voice an
| opinion, but that doesn't mean it's particularly well informed.
| IshKebab wrote:
| I worked for an AI startup for 5 years until recently. Nice
| try though.
| IshKebab wrote:
| Setting up and deploying models in production or on edge
| devices is much much more complex if you have to deal with
| Python and Conda and whatnot.
| lelag wrote:
| The AI community is nowhere close to ditching Python. Most
| model development and training still uses Python-based
| toolchains (torch, tf...). The new trend is for popular and
| useful models to be ported to more efficient stacks like
| C++/GGML for easier usage and faster inference on consumer
| hardware.
|
| Another popular optimisation is to port models to WASM + GPU,
| because it makes them easy to support on a variety of platforms
| (desktop, mobile...) with a single API while still offering
| great performance (see Google's MediaPipe as an example of
| that).
| IshKebab wrote:
| That's why I said "starting to" not "close to".
| Havoc wrote:
| I'd say discovery and innovation would be slower in a less
| relaxed language. And speed ends up comparable thanks to the
| compiled parts of Python.
| jebarker wrote:
| This is exactly the wrong way around. We've seen the progress
| we've seen because of the adoption of Python. Even now there
| are relatively few people that can write code like this and
| have the ML and math experience to push forward the research.
| unshavedyak wrote:
| Well... damn. Is there a framework like this (or this directly?)
| which can run object detection? People, car types, makes,
| animals, etc?
| yeldarb wrote:
| Yes, GroundingDINO is an open set object detector. There are
| some others (eg DETIC and OWL-ViT) as well.
|
| We've been working on using them (often in conjunction with
| SAM) for auto-labeling datasets to train smaller faster models
| that can run in real-time at the edge:
| https://github.com/autodistill/autodistill
| [deleted]
| fiddlerwoaroof wrote:
| Would this be suitable for labeling images to search by
| keyword (think Apple Photos-like "car" searches to pull up
| photos of cars)?
| lulurennt wrote:
| I think you would want to use something like CLIP
| embeddings for image search.
|
| Really enjoyed using this app for iOS:
| https://github.com/mazzzystar/Queryable HN discussion:
| https://news.ycombinator.com/item?id=34686947
|
| Or explore the dataset stable diffusion was trained on:
| https://news.ycombinator.com/item?id=32655497
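For context on the CLIP suggestion above: CLIP-style search embeds images and text queries into a shared vector space and ranks images by cosine similarity to the query embedding. Below is a hypothetical sketch that assumes the embeddings have already been computed by a CLIP model (the model call itself, e.g. via open_clip, is out of scope); the toy random vectors stand in for real 512-d CLIP embeddings.

```python
# Hypothetical CLIP-style retrieval: given precomputed image embeddings and
# a text-query embedding in the same space, rank images by cosine similarity.
import numpy as np

def search(image_embeddings: np.ndarray, query_embedding: np.ndarray, top_k: int = 3):
    """Return (indices of the top_k most similar images, all similarity scores)."""
    # Normalize so the dot product equals cosine similarity.
    imgs = image_embeddings / np.linalg.norm(image_embeddings, axis=1, keepdims=True)
    q = query_embedding / np.linalg.norm(query_embedding)
    scores = imgs @ q
    return np.argsort(-scores)[:top_k], scores

# Toy stand-ins for real CLIP embeddings (512-d in ViT-B/32).
rng = np.random.default_rng(0)
library = rng.standard_normal((100, 512))          # a "photo library"
query = library[42] + 0.1 * rng.standard_normal(512)  # a query close to image 42

top, scores = search(library, query)
print(top)  # image 42 should rank first
```

In a real app you would embed each photo once at import time and only embed the text query ("car") at search time, which is what makes this fast enough for on-device search.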
| farhanhubble wrote:
| While I love the efficiency gains from these Python-to-C++
| ports, I can't stop thinking about the long tail of subtle bugs
| that will likely infest these libraries forever. But then, the
| Python versions also sit atop C/C++ cores.
| farhanhubble wrote:
| [flagged]
| OccamsMirror wrote:
| Just wait until they're ported to C++ using AI!
| wmf wrote:
| Good news! Deep learning inherently has a long tail of subtle
| bugs (SolidGoldMagikarp anyone?) so no one will care if C++
| introduces a few more.
| hoseja wrote:
| Just because Python silently ignores the bugs doesn't mean
| they're not there.
| fzaninotto wrote:
| Bravo, the demonstration is genuinely impressive!
|
| Next Step: Incorporate this library into image editors like
| Photopea (via WebAssembly) to boost the speed of common selection
| tasks. The magic wand is a tool of the past.
|
| I'd pay for such a feature.
| artninja1988 wrote:
| Big fan of your work GGML friends
| lelag wrote:
| Another GGML model port that I'm pretty excited about is
| https://github.com/PABannier/bark.cpp.
|
| The Bark Python model is very compute-intensive and requires a
| powerful GPU to get bearable inference speed. I really hope
| that bark.cpp with GPU/Metal support and a quantized model can
| bring useful inference speed on a laptop in the near future.
| ariym wrote:
| This is a port of Meta's Segment Anything computer vision
| model, which allows easy segmentation of shapes in images.
| Originally written in Python, it has been ported to C++ by
| Yavor Ivanov using the GGML library created by Georgi Gerganov,
| which is optimized for CPU instead of GPU, specifically Apple
| Silicon M1/M2. The repo is still in its early stages.
| dekhn wrote:
| Do you know how long the image embedding takes? In
| SAM, most of the time is spent generating a very expensive
| embedding (prohibitive for real-time object detection). From
| the timing on your page it looks like yours is also similarly
| slow, but I'm curious how it compares to the pytorch Meta
| implementation.
| yavorgiv wrote:
| I am the creator of the repo.
|
| Depends on the machine, the number of threads selected and the
| model checkpoint used (ViT-B, ViT-L or ViT-H). The video
| demo attached is running on Apple M2 Ultra and using the
| Vit-B model. The generation of the image embedding takes
| ~1.9s there and all the subsequent mask segmentations take
| ~45ms.
|
| However, I am now focusing on improving the inference speed
| by making better use of ggml and trying out quantization.
| Once I make some progress in this direction I will compare to
| other SAM alternatives and benchmark more thoroughly.
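A back-of-envelope reading of the numbers quoted above (~1.9 s for the embedding, ~45 ms per mask): the embedding dominates a single query, but since it is computed once per image, its cost amortizes quickly across interactive prompts.

```python
# Amortized cost per mask, using the figures reported for the M2 Ultra /
# ViT-B demo: one ~1.9 s embedding per image, then ~45 ms per mask query.
embed_s = 1.9
mask_s = 0.045

def avg_cost_per_mask(n_masks: int) -> float:
    """Average wall-clock seconds per mask when extracting n_masks masks
    from a single image."""
    return (embed_s + n_masks * mask_s) / n_masks

print(f"{avg_cost_per_mask(1):.3f}s per mask")   # one mask: embedding dominates
print(f"{avg_cost_per_mask(20):.3f}s per mask")  # 20 masks: cost amortized
```

This is also why the one-off embedding cost matters much more for real-time video (a fresh embedding every frame) than for interactive photo editing.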
| billrobertson42 wrote:
| This is amazing. Thank you!
| accurrent wrote:
| Hmm, I wonder how this compares to stuff like FastSAM and
| MobileSAM. Is quantized SAM better, or are those knock-off
| architectures more performant?
___________________________________________________________________
(page generated 2023-09-06 20:03 UTC)