[HN Gopher] Meta's Segment Anything written with C++ / GGML
       ___________________________________________________________________
        
       Meta's Segment Anything written with C++ / GGML
        
       Author : ariym
       Score  : 216 points
       Date   : 2023-09-05 22:49 UTC (21 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | IshKebab wrote:
       | I'm so glad the AI community is finally starting to ditch Python.
       | It has held progress back for far too long.
        
         | fsloth wrote:
         | In general if you don't know what you are doing, it's much
         | faster to first figure out a good strategy for a solution in a
         | language that does not suffer from all of the encumbrance C++
         | brings in.
         | 
         | Python is really great for fast prototyping. It can be argued
         | most AI products so far are the result of fast prototyping. So
         | I'm not sure there is anything wrong with that.
         | 
         | As practical models emerge, at that point it indeed makes sense
         | to port them to C++. But I would not in my wildest dreams
         | suggest prototyping a data model in C++ unless absolutely
         | necessary.
        
         | dbmikus wrote:
         | How has Python held it back? Most of the heavy computation
         | lifting is done by C extensions/bindings and the models are
         | compiled to run on CUDA, etc. What am I missing?
        
           | lhl wrote:
           | Presumably what you're missing is that IshKebab probably
           | doesn't work in AI/ML at all (no links in his profile, but
           | you can judge his post history yourself). Anyone can voice
           | an opinion, but that doesn't mean it's particularly well
           | informed.
        
             | IshKebab wrote:
             | I worked for an AI startup for 5 years until recently. Nice
             | try though.
        
           | IshKebab wrote:
           | Setting up and deploying models in production or on edge
           | devices is much much more complex if you have to deal with
           | Python and Conda and whatnot.
        
         | lelag wrote:
         | The AI community is nowhere close to ditching Python. Most
         | model development and training still use Python-based
         | toolchains (torch, tf...). The new trend is for popular and
         | useful models to be ported to a more efficient stack like
         | C++/GGML for easier usage and inference speed on consumer
         | hardware.
         | 
         | Another popular optimisation is to port models to WASM + GPU
         | because it makes it easy to support a variety of platforms
         | (desktop, mobile...) with a single API and it can still offer
         | great performance (see Google's MediaPipe as an example of
         | that).
        
           | IshKebab wrote:
           | That's why I said "starting to" not "close to".
        
         | Havoc wrote:
         | I'd say discovery and innovation would be slower in a less
         | relaxed language. And speed ends up comparable thanks to the
         | compiled parts of Python.
        
         | jebarker wrote:
         | This is exactly the wrong way around. We've seen the progress
         | we've seen because of the adoption of Python. Even now there
         | are relatively few people that can write code like this and
         | have the ML and math experience to push forward the research.
        
       | unshavedyak wrote:
       | Well... damn. Is there a framework like this (or this directly?)
       | which can run object detection? People, car types, makes,
       | animals, etc?
        
         | yeldarb wrote:
         | Yes, GroundingDINO is an open set object detector. There are
         | some others (eg DETIC and OWL-ViT) as well.
         | 
         | We've been working on using them (often in conjunction with
         | SAM) for auto-labeling datasets to train smaller faster models
         | that can run in real-time at the edge:
         | https://github.com/autodistill/autodistill
        
           | [deleted]
        
           | fiddlerwoaroof wrote:
           | Would this be suitable for labeling images to search by
           | keyword (think Apple Photos-like "car" searches to pull up
           | photos of cars)?
        
             | lulurennt wrote:
             | I think you would want to use something like CLIP
             | embeddings for image search.
             | 
             | Really enjoyed using this app for iOS:
             | https://github.com/mazzzystar/Queryable HN discussion:
             | https://news.ycombinator.com/item?id=34686947
             | 
             | Or explore the dataset stable diffusion was trained on:
             | https://news.ycombinator.com/item?id=32655497
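
The embedding-based search suggested above can be sketched roughly as follows. This is a minimal illustration, not how Queryable or CLIP is implemented: the three-dimensional vectors, the file names and the `search` helper are all made up for the example. In a real system the vectors would come from a CLIP-style image encoder and text encoder, and would have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_embedding, image_embeddings, top_k=2):
    """Return the names of the top_k images most similar to the query."""
    scored = [
        (cosine_similarity(query_embedding, embedding), name)
        for name, embedding in image_embeddings.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:top_k]]


# Hypothetical toy embeddings standing in for real image-encoder output.
IMAGE_EMBEDDINGS = {
    "beach.jpg": [0.9, 0.1, 0.0],
    "sedan.jpg": [0.1, 0.9, 0.2],
    "truck.jpg": [0.0, 0.8, 0.5],
}

# Hypothetical toy embedding standing in for the text query "car".
CAR_QUERY = [0.0, 1.0, 0.3]

RESULTS = search(CAR_QUERY, IMAGE_EMBEDDINGS)
print(RESULTS)
```

The point of the design is that images are embedded once and stored, so each keyword search only costs one text embedding plus a similarity ranking.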
        
       | farhanhubble wrote:
       | While I love the efficiency from these Python to C++ ports, I
       | can't stop thinking about the long tail of subtle bugs that will
       | likely infest these libraries forever. But then, the Python
       | versions also sit atop C/C++ cores.
        
         | farhanhubble wrote:
         | [flagged]
        
         | OccamsMirror wrote:
         | Just wait until they're ported to C++ using AI!
        
         | wmf wrote:
         | Good news! Deep learning inherently has a long tail of subtle
         | bugs (SolidGoldMagikarp anyone?) so no one will care if C++
         | introduces a few more.
        
         | hoseja wrote:
         | Just because Python silently ignores the bugs doesn't mean
         | they're not there.
        
       | fzaninotto wrote:
       | Bravo, the demonstration is genuinely impressive!
       | 
       | Next Step: Incorporate this library into image editors like
       | Photopea (via WebAssembly) to boost the speed of common selection
       | tasks. The magic wand is a tool of the past.
       | 
       | I'd pay for such a feature.
        
       | artninja1988 wrote:
       | Big fan of your work GGML friends
        
         | lelag wrote:
         | Another GGML model port that I'm pretty excited about is
         | https://github.com/PABannier/bark.cpp.
         | 
         | The Bark Python model is very compute intensive and requires a
         | powerful GPU to get bearable inference speed. I really hope
         | that bark.cpp with GPU/Metal support and quantized models can
         | bring useful inference speed on a laptop in the near future.
        
       | ariym wrote:
       | This is a port of Meta's Segment Anything computer vision model,
       | which allows easy segmentation of shapes in images. It was
       | originally written in Python; Yavor Ivanov has ported it to C++
       | using the GGML library created by Georgi Gerganov, which is
       | optimized for CPU instead of GPU, specifically Apple Silicon
       | M1/M2. The repo is still in its early stage.
        
         | dekhn wrote:
         | Do you know how long it takes to do the image embedding? In
         | SAM, most of the time is spent generating a very expensive
         | embedding (prohibitive for real-time object detection). From
         | the timing on your page it looks like yours is also similarly
         | slow, but I'm curious how it compares to the pytorch Meta
         | implementation.
        
           | yavorgiv wrote:
           | I am the creator of the repo.
           | 
           | Depends on the machine, number of threads selected and the
           | model checkpoint used (ViT-B, ViT-L or ViT-H). The video
           | demo attached is running on Apple M2 Ultra and using the
           | ViT-B model. The generation of the image embedding takes
           | ~1.9s there and all the subsequent mask segmentations take
           | ~45ms.
           | 
           | However, I am now focusing on improving the inference speed
           | by making better use of ggml and trying out quantization.
           | Once I make some progress in this direction I will compare to
           | other SAM alternatives and benchmark more thoroughly.
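
The timing figures quoted in the comment above imply that the embedding cost is paid once per image and then amortized over every mask requested on that image. A small back-of-the-envelope sketch, using only the approximate ~1.9s and ~45ms numbers from the comment (the function names are made up for illustration, and real timings will vary with machine, thread count and checkpoint):

```python
EMBEDDING_SECONDS = 1.9  # one-time image embedding, as quoted (~1.9s)
MASK_SECONDS = 0.045     # each subsequent mask segmentation, as quoted (~45ms)


def total_seconds(num_masks):
    """Total time to embed one image and then segment num_masks prompts on it."""
    return EMBEDDING_SECONDS + num_masks * MASK_SECONDS


def average_seconds_per_mask(num_masks):
    """Average cost per mask once the embedding is spread over all masks."""
    return total_seconds(num_masks) / num_masks


for n in (1, 10, 100):
    print(f"{n:>3} masks: total {total_seconds(n):.3f}s, "
          f"average {average_seconds_per_mask(n):.3f}s per mask")
```

This only helps interactive use on a single still image. For video, each new frame would need its own embedding, which is consistent with the concern raised earlier in the thread about real-time use.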
        
             | billrobertson42 wrote:
             | This is amazing. Thank you!
        
       | accurrent wrote:
       | Hmm, I wonder how this compares to stuff like FastSAM and
       | MobileSAM. Is SAM quantized better, or are those knock-off
       | architectures more performant?
        
       ___________________________________________________________________
       (page generated 2023-09-06 20:03 UTC)