[HN Gopher] Binary vector search is better than FP32 vectors
___________________________________________________________________
Binary vector search is better than FP32 vectors
Author : gaocegege
Score : 61 points
Date   : 2024-03-26 01:53 UTC (1 day ago)
(HTM) web link (blog.pgvecto.rs)
(TXT) w3m dump (blog.pgvecto.rs)
| crucio wrote:
| How well would this work for 762-dimensional vectors? At 3072,
| they're starting with such a high number of dimensions that the
| accuracy loss may not be representative of what others would see.
| esafak wrote:
| You'd have to look at the precision-recall curves for your data
| set and make the trade-off. There are studies on this topic.
| refulgentis wrote:
| Dear reader, your intuition is right: it's not "better" to reduce
| the information by a factor of 32.
|
| The game the article plays is to do a KNN search to deal with
| the fact that this flattens similarity significantly.
|
| TL;DR from the field:
|
| - this is extremely helpful for doing a first pass over a _ton_
| of documents in a resource-constrained environment.
|
| - it is extremely _unhelpful_ unless you're retrieving 10x the
| documents you want via binary, then doing re-ranking via the
| FP32 vectors to rank the remaining (rough sketch below).
|
| - in general, it's unlikely you need the technique unless you're
| A) on the edge, i.e. on consumer devices from 3 years ago, or B)
| you have tens of millions of vectors on a server. All this stuff
| sounds really fancy, but when you implement it from scratch, you
| quickly learn "oh, it's 384 numbers I gotta multiply together".
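|
| Rough sketch of the two-pass flow in Python (toy numpy, made-up
| sizes, not the article's code):
|
|     import numpy as np
|
|     def to_bits(v):
|         # keep only the sign: 1 bit per dimension, packed to uint8
|         return np.packbits(v > 0, axis=-1)
|
|     def hamming(a, b):
|         # popcount of XOR = number of differing sign bits
|         return np.unpackbits(a ^ b, axis=-1).sum(-1)
|
|     docs = np.random.randn(100_000, 384).astype(np.float32)
|     bits = to_bits(docs)                  # ~1/32 of the memory
|     q = np.random.randn(384).astype(np.float32)
|
|     # pass 1: cheap Hamming scan, over-fetch ~10x what you need
|     cand = np.argsort(hamming(bits, to_bits(q)))[:100]
|     # pass 2: exact FP32 re-rank of the shortlist
|     top10 = cand[np.argsort(docs[cand] @ q)[::-1][:10]]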
|
| Source: I do embeddings, locally, to do retrieval for RAG. I
| "discovered" this about a year ago, and it deeply pains me to see
| anything that will misinform a lot of people.
|
| Free bonus I haven't seen mentioned elsewhere yet*: you can take
| the average of the N embeddings forming a document to figure out
| if you should look at the N embeddings individually. This does
| over-smooth too, e.g. my original test document is the GPT-4
| sparks paper, and the variety of subjects mentioned and its
| length (100+ pages) meant it was over-smoothed when searching
| for the particular example I wanted it to retrieve (the unicorn
| SVG).
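|
| In code the coarse filter is just a mean per document (sketch,
| my own variable names, assumes unit-norm chunk embeddings):
|
|     import numpy as np
|
|     def doc_summary(chunks):
|         # one vector per document: average then re-normalize
|         m = chunks.mean(axis=0)
|         return m / np.linalg.norm(m)
|
|     def search(q, docs, top_docs=5, top_chunks=3):
|         # docs: list of (N_i, d) arrays of chunk embeddings
|         means = np.stack([doc_summary(c) for c in docs])
|         keep = np.argsort(means @ q)[::-1][:top_docs]
|         hits = []
|         for i in keep:   # only now look at individual chunks
|             best = np.argsort(docs[i] @ q)[::-1][:top_chunks]
|             hits += [(i, j, float(docs[i][j] @ q)) for j in best]
|         return sorted(hits, key=lambda h: -h[2])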
|
| * edited to clarify given the reply. also my reaction to reading
| it, "that dude rocks!!", made me wanna go off a little bit: if
| you're not in AI/ML, don't be intimidated by it when it's wrapped
| in layers of obscure vocabulary. once you have the time to go do
| it, things you would have thought were "stupid" turn out to be
| just fine. and you find like-minded souls. It's exhilarating
| moralestapia wrote:
| >I'll throw in a free bonus no one has mentioned yet, at least
| publicly: you can take the average of the N embeddings forming
| a document to figure out if you should look at the N embeddings
| individually.
|
| Hey, I do this! :D
|
| But I've seen it mentioned in several places, it wasn't
| completely my own idea.
| sgt101 wrote:
| >Free bonus I haven't seen mentioned elsewhere yet*
|
| This is a good idea. Is it a little like a version of HNSW?
| dleeftink wrote:
| Binning and averaging is a well-known technique I'd say,
| whether applied to PCA components or even raw distributional
| data sorted over some dimension.
|
| A related technique that I've seen speed up retrieval tasks
| immensely is merging frequently co-occurring binned embeddings
| into new bins (similar to Byte Pair Encoding) and downweighting
| the resulting bins when the variance between their constituent
| dimensions is high.
|
| This effectively yields 'bins of interest' where documents
| agree on some dimension(s) while filtering out documents that
| contain information that is general to most.
| nblgbg wrote:
| I agree with the comment. Especially if you index these
| quantized vectors with HNSW you will get even less precision.
| Retrieving a lot more and doing a kNN with the FP32 vectors is
| the way to go!
| barefeg wrote:
| This technique had a very recent resurgence via
| https://txt.cohere.com/int8-binary-embeddings/. Hugging Face
| also covered it here
| https://huggingface.co/blog/embedding-quantization. It seems
| like a very good tradeoff compared to the shorter embeddings,
| which require fine-tuning via the Matryoshka technique. On the
| other hand, Nils Reimers suggests that trivial quantization of
| the full-precision embeddings is not as good as using
| "compression friendly" embeddings like Cohere's Embed V3. Does
| anyone know what the difference in precision is between trivial
| quantization and optimized embeddings?
| rurban wrote:
| April 1 jokes already? Storing floats as bits, great idea!
| marginalia_nu wrote:
| I think the technique is best understood as a special case of
| random projection.
|
| It's well understood that there is a class of dimensionality
| reduction techniques that preserve distances well (via the J-L
| lemma), owing to the counterintuitive fact that most high-
| dimensional vectors are nearly orthogonal.
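|
| A quick way to see it (toy Python, my own numbers; this is the
| SimHash-style sign-of-a-random-projection variant rather than
| the article's sign-of-each-dimension):
|
|     import numpy as np
|
|     rng = np.random.default_rng(0)
|     d, n_bits = 768, 1024
|     R = rng.standard_normal((n_bits, d))   # random hyperplanes
|
|     def sketch(v):
|         return (R @ v) > 0                 # one bit per hyperplane
|
|     a, b = rng.standard_normal(d), rng.standard_normal(d)
|     cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
|     ham = np.count_nonzero(sketch(a) != sketch(b))
|     # Pr[bits differ] = angle / pi, so Hamming estimates the angle
|     print(np.arccos(cos), np.pi * ham / n_bits)  # nearly equal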
| riku_iki wrote:
| Bit embeddings are different from float embeddings; they are
| trained separately.
| heipei wrote:
| To the people dismissing the idea of binarising vectors: fair
| criticism, but consider that you can also train a model with a
| loss function that approaches binary behaviour, i.e. so that the
| magnitude per dimension plays an insignificant role and only the
| sign of the dimension carries information. In that case you can
| use the binary vector for search and ranking.
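|
| One way that can look in training (sketch in PyTorch; the exact
| penalty term is my assumption, not any particular vendor's
| recipe):
|
|     import torch
|     import torch.nn.functional as F
|
|     def binary_friendly_loss(emb_a, emb_b, labels, lam=0.1):
|         # labels: 1.0 for matching pairs, 0.0 otherwise
|         # the usual similarity objective on the float embeddings
|         sim = F.cosine_similarity(emb_a, emb_b)
|         task = F.binary_cross_entropy_with_logits(sim * 10, labels)
|         # plus a term pushing every coordinate toward +/-1, so the
|         # magnitude stops mattering and sign(emb) is nearly lossless
|         quant = ((emb_a.abs() - 1) ** 2).mean() \
|               + ((emb_b.abs() - 1) ** 2).mean()
|         return task + lam * quant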
| mjburgess wrote:
| in practice, learning_rate * grad(L) is mostly about the
| direction anyway.
| jn2clark wrote:
| What is accuracy in this case? Is it meant to be recall, or some
| other evaluation metric?
___________________________________________________________________
(page generated 2024-03-27 23:01 UTC)