       ___________________________________________________________________
        
       Show HN: Semantic search over the National Gallery of Art
        
       Author : breadislove
       Score  : 136 points
       Date   : 2025-10-10 20:33 UTC
        
 (HTM) web link (nga.demo.mixedbread.com)
        
       | philipkglass wrote:
       | How does this work? I thought it was probably powered by
       | embeddings and maybe some more traditional search code, but I
       | checked out the linked github repo and I didn't see any
       | model/inference code. The public code is a wrapper that
       | communicates with your commercial API?
       | 
       | Some searches work like magic and others seem to veer off target
       | a lot. For example, "sculpture" and "watercolor" worked just
       | about how I'd expect. "Lamb" showed lambs and sheep. But "otter"
       | showed a random selection of animals.
        
         | breadislove wrote:
         | It is powered by Mixedbread Search, which is powered by our
         | model Omni. Omni is multimodal (text, video, audio, images) and
         | multi-vector, which helps us capture more information.
         | 
         | The search is in beta and we are improving the model. Thank you
         | for reporting the queries that are not working well.
         | 
         | Edit: Re the otter, I just checked and did not find otters in
         | the dataset. To reduce confusion, we should not return any
         | results if the model is not sure.
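The reply above describes Omni only as "multi-vector" and suggests returning nothing when the model is unsure. As a hedged illustration of those two ideas, here is a generic ColBERT-style late-interaction (MaxSim) scorer with a minimum-score cutoff, assuming each artwork is stored as a list of vectors. It is not Mixedbread's implementation; the toy 2-D vectors, item IDs, and the 0.5 threshold are all hypothetical.

```python
import math

Vector = list[float]


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so dot products become cosine similarities."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else v


def maxsim(query_vecs: list[Vector], doc_vecs: list[Vector]) -> float:
    """Late interaction: match each query vector to its most similar document
    vector, then sum those best-match cosine scores over the query vectors."""
    docs = [normalize(d) for d in doc_vecs]
    total = 0.0
    for q in map(normalize, query_vecs):
        total += max(sum(a * b for a, b in zip(q, d)) for d in docs)
    return total


def search(query_vecs: list[Vector], corpus: dict[str, list[Vector]],
           min_score: float) -> list[str]:
    """Rank items by MaxSim and drop anything below the (hypothetical) cutoff,
    so a query with no confident match returns an empty list instead of noise."""
    scored = [(maxsim(query_vecs, vecs), item_id) for item_id, vecs in corpus.items()]
    scored.sort(reverse=True)
    return [item_id for score, item_id in scored if score >= min_score]


# Toy 2-D "embeddings" (hypothetical); real models use hundreds of dimensions.
corpus = {
    "lamb_painting": [[1.0, 0.0], [0.9, 0.1]],
    "beaver_print": [[0.0, 1.0]],
}
print(search([[1.0, 0.0]], corpus, min_score=0.5))   # ['lamb_painting']
print(search([[-1.0, 0.0]], corpus, min_score=0.5))  # []
```

Because MaxSim sums over query vectors, a fixed cutoff behaves differently for short and long queries; dividing the score by `len(query_vecs)` is one simple way to make the threshold length-independent.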
        
           | justincormack wrote:
           | neither "blue pictures" nor "multiples" worked well.
        
             | breadislove wrote:
             | thank you for reporting these. we will improve on them for
             | the next iteration.
        
               | reportrappor wrote:
               | I'll pile on since these are useful. Searching for
               | "fingers and holes" did find me some nice hand drawings,
               | but to me the real gold at the National Gallery is the
               | Bruce Nauman. The nga.gov search knew what I wanted.
        
           | philipkglass wrote:
           | There's at least a little bit of otter in the data. The one
           | relevant result I saw was "Plate 40: Two Otters and a Beaver"
           | by Joris Hoefnagel.
           | 
           | I also expected semantic search to return similar results for
           | "fireworks" and "pyrotechnics," since the latter is a less
           | common synonym for the former. But I got many results for
           | fireworks and just one result for pyrotechnics.
           | 
           | This is still impressive. My impulse is to poke at it with
           | harder cases to try to reason about how it could be
           | implemented. Thanks for your Show HN and for replying to me!
        
             | breadislove wrote:
             | If you find more such cases, please feel free to send them
             | to aamir at the domain name of this Show HN. I would love
             | to see those cases and see how we can improve on them.
             | Thank you so much for the feedback.
        
         | treetalker wrote:
         | Yeah, "naked chicks" returns women with no clothes instead of
         | baby birds.
        
       | yawnxyz wrote:
       | hey, your service is back up again!!! Mixedbread was my favorite
       | tool for so long since your pivot, and I'm so glad y'all are back
        
         | breadislove wrote:
         | We have a lot more things coming up soon. It just took us some
         | time building Mixedbread Search.
        
       | nmitchko wrote:
       | In case anyone wants to do this themselves, check out the
       | pipeline here:
       | https://github.com/isc-nmitchko/iris-document-search
       | 
       | ColNomic and NVIDIA models are great for embedding images, and
       | MUVERA can transform those multi-vector embeddings into single
       | fixed-size (1D) vectors.
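MUVERA turns a set of vectors into one fixed dimensional encoding (FDE) whose dot product approximates multi-vector (Chamfer, i.e. MaxSim-style) similarity, so an ordinary single-vector index can serve multi-vector models. The following is a heavily simplified, stdlib-only sketch of that idea, not the linked pipeline's code: SimHash hyperplanes split the space into 2^k buckets, query vectors are summed per bucket, document vectors are averaged, and blocks from several random repetitions are concatenated. It omits refinements described for MUVERA such as filling empty document buckets and the dimensionality-reducing projections, and the parameter values (`dim=4`, `k=3`, `reps=2`) are arbitrary assumptions.

```python
import random

Vector = list[float]


def make_planes(dim: int, k: int, reps: int, seed: int = 0) -> list[list[Vector]]:
    """k random Gaussian hyperplanes per repetition, shared by queries and docs."""
    rng = random.Random(seed)
    return [[[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k)]
            for _ in range(reps)]


def simhash_bucket(v: Vector, planes: list[Vector]) -> int:
    """Bucket id built from the sign of v's dot product with each hyperplane."""
    bucket = 0
    for p in planes:
        side = 1 if sum(a * b for a, b in zip(v, p)) > 0 else 0
        bucket = (bucket << 1) | side
    return bucket


def fde(vectors: list[Vector], planes_per_rep: list[list[Vector]],
        is_query: bool) -> Vector:
    """Simplified FDE: one block of `dim` floats per bucket per repetition.
    Queries sum their vectors in each bucket; documents average them."""
    dim = len(vectors[0])
    out: Vector = []
    for planes in planes_per_rep:
        sums = [[0.0] * dim for _ in range(2 ** len(planes))]
        counts = [0] * len(sums)
        for v in vectors:
            b = simhash_bucket(v, planes)
            counts[b] += 1
            for i, x in enumerate(v):
                sums[b][i] += x
        for block, count in zip(sums, counts):
            if not is_query and count:
                block = [x / count for x in block]
            out.extend(block)
    return out


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


planes = make_planes(dim=4, k=3, reps=2)
query_fde = fde([[1.0, 2.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]], planes, is_query=True)
doc_fde = fde([[1.0, 2.0, 0.0, 0.0], [0.5, 1.5, 0.0, 0.2]], planes, is_query=False)
print(len(query_fde))           # 2 reps * 2**3 buckets * 4 dims = 64
print(dot(query_fde, doc_fde))  # one dot product stands in for the multi-vector score
```

The encodings are "1D" in the sense the comment uses: every item becomes one flat vector of the same length, regardless of how many token or patch vectors it started with.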
        
         | losteric wrote:
         | > check out the pipeline here
         | 
         | "the pipeline" - seems like this is just a personal hackathon
         | project?
         | 
         | Why these models vs. other multimodal models? Which "nvidia models"?
        
       | dfc wrote:
       | It would be nice if it took you to the NGA page about the item.
       | I can't even copy the text easily to search for it.
       | 
       | "Images of german shepherds" never fails to provide some humor.
        
         | breadislove wrote:
         | Thank you for pointing this out. We will add this tomorrow
         | morning.
        
           | dfc wrote:
           | The results for "Mark Rothko", "Paintings by Mark Rothko",
           | "Paintings similar to mark rothko", etc. do not bring up
           | anything that I was expecting. NGA has a large collection of
           | Rothko paintings, but none of them come up.
           | 
           | This NGA link returns over a thousand pieces by Rothko:
           | https://www.nga.gov/artists/1839-mark-rothko/artworks
        
             | breadislove wrote:
             | Right now we are not including the artist name; that will
             | be done in the next iteration of the model (next week).
             | The search is currently based only on what the model can
             | "see", and it seems the model does not understand the art
             | of Mark Rothko.
             | 
             | The next version can see the image and read the metadata.
             | 
             | A bit more context: we include everything in the latent
             | space (embeddings) without trying to maintain multiple
             | indexes and hack around things. There is still a huge
             | mountain to climb, but this approach seems really promising.
        
               | 4ndrewl wrote:
               | And this seems like a hard limitation of this approach:
               | art (vs. craft) is concerned with interpretation and
               | reception, whereas this is more like Unsplash-for-galleries,
               | in that the searches have to be very literal, I guess?
               | (E.g., searching for something abstract like 'dreams',
               | something you will find depicted in the collection,
               | produces quite a mixed bag of results.)
        
             | iDon wrote:
             | A search for "character studies of old farmers" yielded
             | good results. The results are drawings and engravings,
             | which may reflect the balance of the collection; perhaps
             | this subject appears more in practice studies than in
             | marketable oil paintings.
             | 
             | Since this is a semantic search, using a vector embedding,
             | it will handle meanings better than a text search, which
             | would handle names better.
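iDon's point about meanings versus names is the usual argument for hybrid search. Per breadislove's earlier reply, this demo puts everything into one embedding space instead of maintaining multiple indexes, so the following is not how it works. It is a minimal sketch of reciprocal rank fusion (RRF), one common way to merge a keyword ranking (good at names such as "Rothko") with a vector ranking (good at meanings). The item IDs are hypothetical, and `k=60` is the constant commonly used with RRF.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists: each item earns 1 / (k + rank) from every list it
    appears in (rank starts at 1); items are returned by total score."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, item_id in enumerate(ranking, start=1):
            scores[item_id] = scores.get(item_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score, breaking ties by id for deterministic output.
    return sorted(scores, key=lambda item_id: (-scores[item_id], item_id))


keyword_hits = ["doc_a", "doc_b"]  # e.g. exact match on an artist name
vector_hits = ["doc_c", "doc_a"]   # e.g. nearest neighbours of the query embedding
print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
# ['doc_a', 'doc_c', 'doc_b']
```

Because RRF uses only ranks, it sidesteps the fact that keyword scores and cosine similarities live on different scales; an item found by both retrievers (`doc_a` here) rises to the top.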
        
       | Computer0 wrote:
       | This is neat. I'm not sure how to report queries that are
       | working poorly, as you mentioned. But when I search "Waltz" I
       | am presented with kitchen utensils and only one piece showing
       | dancing folks. Presumably this is due to the artist's name
       | being 'Walton'.
        
         | breadislove wrote:
         | We will add a feedback form tomorrow morning. For now, please
         | feel free to write to aamir at the domain name of the page.
         | Thank you so much! This helps us a lot.
        
         | khaki54 wrote:
         | Tried "Images of german shepherds" and not one appeared on the
         | page of 16 results.
        
       | pogilvie wrote:
       | I built a toy version of something like this a couple-ish years
       | ago for a hackathon. I wrote up a blog of how I did it back then
       | for anyone interested:
       | https://www.patrickogilvie.com/engineering/Image_Search_Engi...
       | 
       | Would be interesting to know how relevant that approach is now.
        
       | ulrikhansen54 wrote:
       | Congrats on the launch, guys. I remember meeting y'all in SF.
       | What happened to your HF model/project?
        
         | breadislove wrote:
         | there is a lot coming
        
       | kvsrh wrote:
       | Is it possible to add other data sources?
        
         | breadislove wrote:
         | Yes. Which ones would you be interested in?
        
       | samdg wrote:
       | I love old stereograms, and was happy to find a couple using this
       | tool!
        
       | adamontherun wrote:
       | love that a search for 'chill vibes sculpture' returned a very
       | chill set of results. nice step change in art search capabilities
        
       | khaki54 wrote:
       | Yale has an amazing one, worth looking at:
       | https://lux.collections.yale.edu/
        
         | ted_dunning wrote:
         | Is that a multi-modal search? Or just textual matching?
         | 
         | I couldn't find any examples that couldn't be explained by
         | simple text matches.
        
       | ted_dunning wrote:
       | Works really well for some artist names (rembrandt, whistler) and
       | exceedingly poorly for others (john singer sargent).
        
       | joki77 wrote:
       | When code and canvas meet, a search is not just about words
       | but about feeling. Between paintings and bars of pixels, the
       | machine tries to understand the answer left unspoken.
        
       | kburman wrote:
       | I recently learned that semantic search embeddings mostly
       | represent topics and concepts, but they don't handle negation or
       | emotion very well.
       | 
       | For example, if you search for "paintings of winter landscapes
       | but without sun and trees," you'll still get results with trees.
       | That's because embeddings capture the presence of concepts like
       | "tree" or "landscape," but not logical relationships like
       | "without" or "not."
       | 
       | Similarly, embeddings aren't great at capturing how something
       | feels. They can tell that "sad poem" and "happy poem" are
       | different mainly because of the words used, not because they
       | truly understand emotional tone.
       | 
       | This happens because most embedding models (like OpenAI's or
       | sentence-transformers) are trained to group things by semantic
       | similarity, not logical meaning or sentiment. Negation, polarity,
       | and affect aren't explicitly represented in the vector space.
       | 
       | Might be common knowledge to some, but it was a cool TIL moment
       | for me, realizing that embeddings are great at what something is
       | about, but not how it feels or what it excludes.
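A toy illustration of how the negation problem kburman describes can arise: if a text embedding behaves roughly like an average of its word vectors, adding "without" shifts the vector only slightly, so "winter landscape without trees" still lands close to "winter landscape trees". The one-hot vocabulary and mean pooling here are hypothetical simplifications, not how OpenAI, sentence-transformers, or Mixedbread models actually work, and, as breadislove replies, models trained specifically on negation can behave differently.

```python
import math

# Hypothetical toy vocabulary: each word gets a one-hot vector.
VOCAB = ["winter", "landscape", "trees", "without", "summer", "beach"]


def embed(text: str) -> list[float]:
    """Mean-pool one-hot word vectors; words outside VOCAB are ignored."""
    words = [w for w in text.lower().split() if w in VOCAB]
    vec = [0.0] * len(VOCAB)
    for w in words:
        vec[VOCAB.index(w)] += 1.0 / len(words)
    return vec


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


with_trees = embed("winter landscape trees")
without_trees = embed("winter landscape without trees")
unrelated = embed("summer beach")
print(round(cosine(with_trees, without_trees), 3))  # 0.866
print(round(cosine(with_trees, unrelated), 3))      # 0.0
```

Real embedding models are far richer than this, but the same geometry may help explain the reported behaviour: the negated query still shares most of its content with the paintings it was meant to exclude.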
        
         | breadislove wrote:
         | That's actually not correct. Embeddings can handle
         | relationships like "without" or "not" when trained for it.
         | You need to scale up the training massively to make it
         | generalize well. The current version of Mixedbread Search
         | supports negatives like "tshirt without stripes". You can
         | check it out in our launch video [1]. We are working on a much
         | more generalized model, which should be able to capture
         | relationships, emotions and much more. The current models are
         | just limited.
         | 
         | [1]: https://www.mixedbread.com/blog/mixedbread-search
        
           | kburman wrote:
           | I was referring specifically to popular embedding models like
           | OpenAI's and sentence-transformers, which (as far as I know)
           | don't reliably handle negation or emotional nuance; they
           | mostly capture topical similarity.
           | 
           | I don't know enough of the underlying math to say for sure
           | whether embeddings can be trained to consistently represent
           | negation, but when I tried the Mixedbread demo myself with a
           | query like "winter landscapes without sun and trees", it
           | still showed me paintings with both sun and trees. So at
           | least in its current form, it doesn't seem to fully handle
           | those semantic relationships yet.
        
       ___________________________________________________________________
       (page generated 2025-10-11 23:01 UTC)