[HN Gopher] DINOv3
       ___________________________________________________________________
        
       DINOv3
        
       Author : reqo
       Score  : 39 points
       Date   : 2025-08-14 20:02 UTC (2 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | beklein wrote:
       | - Blog post: https://ai.meta.com/blog/dinov3-self-supervised-vision-model...
       | - Paper: https://ai.meta.com/research/publications/dinov3/
       | - Hugging Face: https://huggingface.co/collections/facebook/dinov3-68924841b...
        
       | ranger_danger wrote:
       | I have no idea what this even is.
        
         | n3storm wrote:
         | D3NO?
        
         | kaoD wrote:
         | > An extended family of versatile vision foundation models
         | producing high-quality dense features and achieving outstanding
         | performance on various vision tasks including outperforming the
         | specialized state of the art across a broad range of settings,
         | without fine-tuning
        
           | ranger_danger wrote:
           | English, doc
        
           | kevinventullo wrote:
           | To elaborate, this is a foundation model. This basically
           | means it can take an arbitrary image and map it to a
           | high-dimensional space _H_ in which questions about ~arbitrary
           | characteristics of the image become much easier to answer.
           | 
           | For example (and this might be oversimplifying a bit,
           | computer vision people please correct me if I'm wrong) if
           | you're interested in knowing whether or not the image
           | contains a cat, then maybe there is some hyperplane _P_ in
           | _H_ for which images on one side of P do not contain a cat,
           | and images on the other side do contain a cat. And so solving
           | for "Does this image contain a cat?" becomes a much easier
           | problem: all you have to do is figure out what P is. Once you
           | do that, you can pass your image into DINO, take the dot
           | product of the resulting embedding with P's normal vector,
           | add P's offset, and check whether the answer is negative
           | or positive. The point is that finding P is much easier than
           | training your own computer vision model from scratch.
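The hyperplane idea above is what is usually called a linear probe. Below is a minimal sketch, assuming the embeddings are already computed: nothing here calls DINOv3, and the 8-dimensional "embeddings", the hidden cat direction, and every function name are hypothetical stand-ins made up for illustration. Only the hyperplane parameters w and b are learned (here by plain logistic regression); the backbone that would produce the embeddings is treated as frozen.

```python
import math
import random

DIM = 8          # hypothetical embedding size (real DINO features are much larger)
random.seed(0)   # deterministic synthetic data


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_dataset(n, hidden_w):
    """Fake (embedding, label) pairs; label 1 means "contains a cat".

    Stand-in for running real images through a frozen backbone."""
    data = []
    for _ in range(n):
        x = [random.gauss(0.0, 1.0) for _ in range(DIM)]
        data.append((x, 1 if dot(hidden_w, x) > 0 else 0))
    return data


def train_linear_probe(data, epochs=200, lr=0.1):
    """Fit the hyperplane P = {x : w.x + b = 0} with SGD on the log-loss."""
    w = [0.0] * DIM
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            g = sigmoid(dot(w, x) + b) - y   # d(log-loss)/dz
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def contains_cat(embedding, w, b):
    # Which side of P is the embedding on? Positive side => "cat".
    return dot(w, embedding) + b > 0


# Usage: learn P on synthetic data, then measure held-out accuracy.
hidden = [random.gauss(0.0, 1.0) for _ in range(DIM)]
train = make_dataset(200, hidden)
test = make_dataset(100, hidden)
w, b = train_linear_probe(train)
acc = sum(contains_cat(x, w, b) == bool(y) for x, y in test) / len(test)
print(f"held-out accuracy: {acc:.2f}")
```

The design point matches the comment: training touches only DIM + 1 numbers, which is why fitting a probe on good features is far cheaper than training a vision model from scratch.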
        
             | reactordev wrote:
             | If computer vision were semantic search, nailed it. It's a
             | little more complicated than that but - with this new
             | model, not by much :D
        
       | barbolo wrote:
       | That's awesome. DINOv2 was the best image embedder until now.
        
       ___________________________________________________________________
       (page generated 2025-08-14 23:00 UTC)