[HN Gopher] CatLIP: CLIP Vision Accuracy with 2.7x Faster Pre-Tr...
       ___________________________________________________________________
        
       CatLIP: CLIP Vision Accuracy with 2.7x Faster Pre-Training on
       Web-Scale Data
        
       Author : panabee
       Score  : 36 points
       Date   : 2024-04-25 17:46 UTC (5 hours ago)
        
 (HTM) web link (arxiv.org)
 (TXT) w3m dump (arxiv.org)
        
       | ggnore7452 wrote:
       | Question: are there any good on-device-sized image embedding
       | models?
       | 
       | I tried https://github.com/unum-cloud/uform, which I do like,
       | especially since they also support languages other than
       | English. Any recommendations for other alternatives?
        
         | philipkglass wrote:
         | I have successfully used OpenCLIP models for embedding and
         | similar-image search. The smallest model listed on that UForm
         | page is 79 million parameters, so I presume that you can use
         | other models of similar size. There are a few OpenCLIP models
         | with 80 million or fewer parameters listed here:
         | 
         | https://github.com/mlfoundations/open_clip/blob/main/docs/mo...
         | 
         | When embeddings are quantized to int8, they still work very
         | well for similarity search (no differences in the top-10
         | results on my test set). I haven't tried quantizing the
         | models themselves.
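         | 
         | A minimal sketch of that workflow with open_clip (the model
         | choice and the symmetric max-abs int8 scheme below are
         | illustrative assumptions, not necessarily the exact setup
         | described above):
         | 
         |   # Embed images with OpenCLIP, quantize the embeddings
         |   # to int8, and compare top-10 neighbors before/after.
         |   import numpy as np
         |   import torch
         |   import open_clip
         |   from PIL import Image
         | 
         |   model, _, preprocess = \
         |       open_clip.create_model_and_transforms(
         |           "ViT-B-32", pretrained="laion2b_s34b_b79k")
         |   model.eval()
         | 
         |   def embed(paths):
         |       tensors = [preprocess(Image.open(p).convert("RGB"))
         |                  for p in paths]
         |       with torch.no_grad():
         |           feats = model.encode_image(torch.stack(tensors))
         |       feats = feats / feats.norm(dim=-1, keepdim=True)
         |       return feats.numpy().astype(np.float32)
         | 
         |   def quantize_int8(x):
         |       # Symmetric: scale by the max absolute value.
         |       scale = np.abs(x).max() / 127.0
         |       q = np.clip(np.round(x / scale), -127, 127)
         |       return q.astype(np.int8), scale
         | 
         |   def top10(query, db):
         |       # Cosine similarity: embeddings are L2-normalized.
         |       return np.argsort(-(db @ query))[:10]
         | 
         |   # db = embed(image_paths); q = embed([query_path])[0]
         |   # qdb, _ = quantize_int8(db); qq, _ = quantize_int8(q)
         |   # exact  = top10(q, db)
         |   # approx = top10(qq.astype(np.float32),
         |   #                qdb.astype(np.float32))
         |   # overlap = len(set(exact) & set(approx))  # ideally 10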
        
       | cs702 wrote:
       | TL;DR: The authors pretrain the model to classify images into
       | WordNet synsets[a] that appear in the caption, using a
       | standard cross-entropy loss. They keep the number of classes
       | relatively small by removing any synsets that appear fewer
       | than 500 times in the dataset's captions. It seems to work
       | well.
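       | 
       | A minimal sketch of that recipe (the NLTK-based synset
       | extraction and the multi-label binary cross-entropy framing
       | below are assumptions about the details, since a caption can
       | contain several synsets):
       | 
       |   # Build a synset vocabulary from captions, then train a
       |   # vision model as a multi-label classifier over it.
       |   # Requires: nltk.download("wordnet")
       |   from collections import Counter
       |   import torch
       |   import torch.nn as nn
       |   from nltk.corpus import wordnet as wn
       | 
       |   def caption_synsets(caption):
       |       # Map each token to its most common noun synset.
       |       syns = []
       |       for tok in caption.lower().split():
       |           matches = wn.synsets(tok, pos=wn.NOUN)
       |           if matches:
       |               syns.append(matches[0].name())
       |       return syns
       | 
       |   def build_vocab(captions, min_count=500):
       |       counts = Counter(s for c in captions
       |                        for s in caption_synsets(c))
       |       kept = [s for s, n in counts.items() if n >= min_count]
       |       return {s: i for i, s in enumerate(kept)}
       | 
       |   def target_vector(caption, vocab):
       |       # Multi-hot target: 1 for each synset in the caption.
       |       y = torch.zeros(len(vocab))
       |       for s in caption_synsets(caption):
       |           if s in vocab:
       |               y[vocab[s]] = 1.0
       |       return y
       | 
       |   criterion = nn.BCEWithLogitsLoss()
       |   # logits = vision_model(images)       # (B, len(vocab))
       |   # loss = criterion(logits, targets)   # (B, len(vocab))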
       | 
       | My immediate question is: why not classify among the entire
       | hierarchy of all WordNet synsets? (See the sketch after the
       | footnote.)
       | 
       | ---
       | 
       | [a] https://wordnet.princeton.edu/
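       | 
       | Classifying over the whole hierarchy would mean labeling
       | every ancestor of every caption synset as well. A minimal
       | sketch of enumerating that ancestry with NLTK (illustrative,
       | not from the paper):
       | 
       |   # Expand a synset into its full hypernym ancestry.
       |   from nltk.corpus import wordnet as wn
       | 
       |   dog = wn.synset("dog.n.01")
       |   ancestors = set(dog.closure(lambda s: s.hypernyms()))
       |   print(sorted(a.name() for a in ancestors))
       |   # ['animal.n.01', 'canine.n.02', 'carnivore.n.01', ...]
       |   # includes 'entity.n.01', the root of the noun hierarchy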
        
       ___________________________________________________________________
       (page generated 2024-04-25 23:01 UTC)