[HN Gopher] Binary vector search is better than FP32 vectors
       ___________________________________________________________________
        
       Binary vector search is better than FP32 vectors
        
       Author : gaocegege
       Score  : 61 points
       Date   : 2024-03-26 01:53 UTC (1 day ago)
        
 (HTM) web link (blog.pgvecto.rs)
 (TXT) w3m dump (blog.pgvecto.rs)
        
       | crucio wrote:
       | How well would this work for 762-dimensional vectors? At 3072,
       | they're starting with such a high number of dimensions that the
       | accuracy loss may not be representative of what others would see.
        
         | esafak wrote:
         | You'd have to look at the precision-recall curves for your data
         | set and make the trade-off. There are studies on this topic.
        
       | refulgentis wrote:
       | Dear reader, your intuition is right: it's not "better" to
       | reduce the information by a factor of 32.
       | 
       | The game the article plays is to do a KNN search to deal with
       | the fact that this flattens similarity significantly.
       | 
       | TL;DR from the field:
       | 
       | - this is extremely helpful for doing a first pass over a _ton_
       | of documents in a resource-constrained environment.
       | 
       | - it is extremely _unhelpful_ unless you're retrieving 10x the
       | documents you want via the binary vectors, then re-ranking the
       | remainder with the FP32 vectors (see the sketch below).
       | 
       | - in general, it's unlikely you need the technique unless you're
       | A) on the edge, i.e. on consumer devices from 3 years ago, or B)
       | you have tens of millions of vectors on a server. All this stuff
       | sounds really fancy, but when you implement it from scratch, you
       | quickly learn "oh, it's 384 numbers I gotta multiply together"
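       | 
       | A rough sketch of that two-pass flow in numpy (sizes, names, and
       | the brute-force scans are illustrative stand-ins, not the
       | article's code; a real setup uses an ANN index for pass 1):
       | 
       |     import numpy as np
       | 
       |     def binarize(vecs):
       |         # keep only the sign of each dimension, packed 8/byte
       |         return np.packbits(vecs > 0, axis=-1)
       | 
       |     def hamming(q_bits, doc_bits):
       |         # Hamming distance from one packed query to every doc
       |         xor = np.bitwise_xor(q_bits, doc_bits)
       |         return np.unpackbits(xor, axis=-1).sum(axis=-1)
       | 
       |     def search(q, docs_fp32, docs_bits, k=10, oversample=10):
       |         # pass 1: cheap Hamming scan over the binary codes,
       |         # keeping ~10x more candidates than we finally want
       |         dist = hamming(binarize(q[None, :]), docs_bits)
       |         cand = np.argsort(dist)[:k * oversample]
       |         # pass 2: exact FP32 cosine on the shortlist only
       |         short = docs_fp32[cand]
       |         norms = np.linalg.norm(short, axis=1)
       |         sims = short @ q / (norms * np.linalg.norm(q))
       |         return cand[np.argsort(-sims)[:k]]
       | 
       |     # toy corpus: 100k random 384-d vectors standing in for
       |     # real embeddings
       |     docs = np.random.randn(100_000, 384).astype(np.float32)
       |     print(search(docs[0], docs, binarize(docs), k=10))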
       | 
       | Source: I do embeddings, locally, to do retrieval for RAG. I
       | "discovered" this about a year ago, and it deeply pains me to see
       | anything that will misinform a lot of people.
       | 
       | Free bonus I haven't seen mentioned elsewhere yet*: you can take
       | the average of the N embeddings forming a document to figure out
       | whether you should look at the N embeddings individually. This
       | does over-smooth, though; e.g. my original test document was the
       | GPT-4 Sparks paper, and its variety of subjects and length (100+
       | pages) meant the average was too smooth to surface the
       | particular example I wanted it to retrieve (the unicorn SVG).
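       | 
       | Concretely, something like this (toy numpy; the threshold and
       | shapes are made up, and unit-normalized chunk embeddings are
       | assumed):
       | 
       |     import numpy as np
       | 
       |     def centroid(chunks):
       |         # mean of a doc's chunk embeddings, re-normalized
       |         c = chunks.mean(axis=0)
       |         return c / np.linalg.norm(c)
       | 
       |     def docs_to_expand(query, docs, threshold=0.25):
       |         # only docs whose centroid is close enough to the
       |         # query get their chunks searched individually
       |         return [i for i, d in enumerate(docs)
       |                 if float(query @ centroid(d)) > threshold]
       | 
       |     rng = np.random.default_rng(0)
       |     docs = [rng.standard_normal((n, 384)) for n in (3, 5)]
       |     docs = [d / np.linalg.norm(d, axis=1, keepdims=True)
       |             for d in docs]
       |     print(docs_to_expand(docs[0][0], docs))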
       | 
       | * edited to clarify given the reply. also my reaction to reading
       | it, "that dude rocks!!", made me wanna go off a little bit: if
       | you're not in AI/ML, don't be intimidated by it when it's wrapped
       | in layers of obscure vocabulary. once you have the time to go do
       | it, things you would have thought were "stupid" turn out to be
       | just fine. and you find like-minded souls. It's exhilarating
        
         | moralestapia wrote:
         | >I'll throw in a free bonus no one has mentioned yet, at least
         | publicly: you can take the average of the N embeddings forming
         | a document to figure out if you should look at the N embeddings
         | individually.
         | 
         | Hey, I do this! :D
         | 
         | But I've seen it mentioned in several places, it wasn't
         | completely my own idea.
        
         | sgt101 wrote:
         | >Free bonus I haven't seen mentioned elsewhere yet*
         | 
         | This is a good idea. Is it a little like a version of HNSW?
        
         | dleeftink wrote:
         | Binning and averaging is a well-known technique I'd say,
         | whether applied to PCA components or even raw distributional
         | data sorted over some dimension.
         | 
         | A related technique that I've seen speed up retrieval tasks
         | immensely is merging frequently co-occurring binned embeddings
         | into new bins (similar to Byte Pair Encoding) and downweighting
         | the resulting bins when the variance between their constituent
         | dimensions is high.
         | 
         | This effectively yields 'bins of interest' where documents
         | agree on some dimension(s), while filtering out documents that
         | only contain information common to most.
        
         | nblgbg wrote:
         | I agree with the comment. Especially if you index these
         | quantized vectors with HNSW, you will get even less precision.
         | Retrieving a lot more candidates and then doing kNN with the
         | FP32 vectors is the way to go!
        
       | barefeg wrote:
       | This technique had a very recent resurgence via
       | https://txt.cohere.com/int8-binary-embeddings/. Hugging Face also
       | covered the technique here:
       | https://huggingface.co/blog/embedding-quantization
       | It seems like a very good trade-off compared to the shorter
       | embeddings, which require fine-tuning via the Matryoshka
       | technique. On the other hand, Nils Reimers suggests that trivial
       | quantization of the full-precision embeddings is not as good as
       | using "compression friendly" embeddings like Cohere's Embed V3.
       | Does anyone know what's the difference in precision between
       | trivial quantization and optimized embeddings?
        
       | rurban wrote:
       | April 1 jokes already? Storing floats as bits, great idea!
        
         | marginalia_nu wrote:
         | I think the technique is best understood as a special case of
         | random projection.
         | 
         | It's well understood that there is a class of dimensionality-
         | reduction techniques that preserve distances well (via the J-L
         | lemma), owing to the counterintuitive fact that most high-
         | dimensional vectors are nearly orthogonal.
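         | 
         | For intuition: sign bits from Gaussian random projections are
         | the classic SimHash construction, where the expected fraction
         | of differing bits between two vectors is theta/pi. A quick
         | numpy check (my own toy example, not from the article):
         | 
         |     import numpy as np
         | 
         |     rng = np.random.default_rng(0)
         |     d, n_planes = 3072, 4096
         | 
         |     # two unit vectors at a moderate angle
         |     x = rng.standard_normal(d)
         |     x /= np.linalg.norm(x)
         |     y = x + 0.5 * rng.standard_normal(d)
         |     y /= np.linalg.norm(y)
         |     theta = np.arccos(np.clip(x @ y, -1.0, 1.0))
         | 
         |     # one sign bit per random Gaussian hyperplane
         |     R = rng.standard_normal((n_planes, d))
         |     bx, by = (R @ x) > 0, (R @ y) > 0
         | 
         |     print("differing bits:", np.mean(bx != by))
         |     print("theta / pi    :", theta / np.pi)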
        
         | riku_iki wrote:
         | bit embeddings are different from float ones; they are trained
         | separately.
        
       | heipei wrote:
       | To the people dismissing the idea of binarising vectors: Fair
       | criticism, but consider the fact that you can also train a model
       | with a loss function that approaches a binary behaviour, i.e. so
       | that the magnitude per dimension plays an insignificant role and
       | only the sign of the dimension carries information. In that case
       | you can use the binary vector for search and ranking.
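       | 
       | One common way to get that behaviour is a deep-hashing-style
       | quantization penalty that pulls every coordinate toward +/-1; a
       | toy PyTorch sketch (my own illustration, not any particular
       | model's loss):
       | 
       |     import torch
       | 
       |     def binarization_penalty(h, lam=0.1):
       |         # pulls each coordinate toward +/-1 so that keeping
       |         # only sign(h) discards little information
       |         return lam * ((h.abs() - 1.0) ** 2).mean()
       | 
       |     # in a real setup this is added to the model's usual
       |     # contrastive / ranking loss during training
       |     h = torch.randn(32, 1024, requires_grad=True)
       |     loss = binarization_penalty(torch.tanh(h))
       |     loss.backward()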
        
         | mjburgess wrote:
         | in practice, learning_rate * grad(L) is mostly about the
         | direction anyway.
        
       | jn2clark wrote:
       | What is accuracy in this case? Is it meant to be recall, or some
       | other evaluation metric?
        
       ___________________________________________________________________
       (page generated 2024-03-27 23:01 UTC)