[HN Gopher] Hierarchical Navigable Small World: a scalable neare...
___________________________________________________________________
Hierarchical Navigable Small World: a scalable nearest neighbor
search
Author : wilsonzlin
Score : 112 points
Date : 2024-09-30 07:56 UTC (3 days ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| 3abiton wrote:
    | Interesting read. I worked on graph theory for a while, a long
    | time ago, and did not even know some of this terminology.
| sakras wrote:
| What really made HNSW click for me was when someone described it
| as a "skiplist graph". It really does seem to share a lot of the
| same properties and intuition as a simple skiplist, where
    | descending levels correspond to finer-grained navigation through
| the space.
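The skiplist analogy can be made concrete: both skiplists and HNSW assign each element a maximum level drawn from a geometric distribution, so higher levels are exponentially sparser. A minimal sketch (the parameter names `p` and `max_level` are illustrative, not from the article):

```python
import random

def random_level(p=0.5, max_level=16):
    """Draw an insertion level from a geometric distribution, exactly
    as a skiplist does. HNSW draws layers the same way (the paper uses
    floor(-ln(uniform()) * mL), which is the same idea)."""
    level = 0
    while random.random() < p and level < max_level:
        level += 1
    return level

random.seed(0)
levels = [random_level() for _ in range(10000)]
# About half the nodes live only on level 0, a quarter reach level 1,
# an eighth reach level 2, ... so upper levels form a sparse "express lane".
```

Searching then starts at the sparsest level and descends, just as a skiplist lookup does.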
|
    | A couple of other thoughts with this perspective:
    | - If a drawing of a skiplist is 2D, HNSW seems to be a 3D
    | generalization. It makes me wonder if you could apply an
    | HNSW-like scheme to a traditional relational database index
    | (maybe a multicolumn index?).
    | - If a skiplist is a probabilistic alternative to a B+-tree,
    | what's HNSW a probabilistic alternative to?
|
| One thing that still mystifies me is how in the context of vector
| indexing, the HNSW seems to "bootstrap" itself by using itself to
| find the neighbors of a newly-inserted point. It's not clear to
| me why that doesn't diverge to some garbage results as the index
| grows.
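The "bootstrap" behaviour described above can be sketched in a few lines: each new point is located by greedily searching the graph built so far, then linked to nearby points. This is a heavily simplified single-layer sketch (no hierarchy, no candidate-list search, no edge pruning, fixed entry point), just to illustrate why insertion uses the index itself:

```python
import math
import random

def dist(a, b):
    return math.dist(a, b)

def greedy_search(graph, points, entry, q):
    """Greedy walk: repeatedly move to the neighbour closest to q,
    stopping when no neighbour improves on the current node."""
    cur = entry
    while True:
        nxt = min(graph[cur], key=lambda v: dist(points[v], q), default=cur)
        if dist(points[nxt], q) < dist(points[cur], q):
            cur = nxt
        else:
            return cur

def insert(graph, points, q, M=4):
    """Insert q by searching the graph built so far: earlier points
    route the search that places later points."""
    points.append(q)
    new = len(points) - 1
    graph[new] = set()
    if new == 0:
        return  # first point: nothing to search yet
    near = greedy_search(graph, points, 0, q)
    # Link to the M closest candidates around the landing point.
    # (Real HNSW keeps a beam of candidates and prunes edges; omitted.)
    cands = {near} | graph[near]
    for v in sorted(cands, key=lambda v: dist(points[v], q))[:M]:
        graph[new].add(v)
        graph[v].add(new)

random.seed(42)
graph, points = {}, []
for _ in range(50):
    insert(graph, points, (random.random(), random.random()))
```

Intuitively it doesn't diverge because each search only needs the existing graph to be navigable, and each insertion adds edges that keep it navigable; the paper's heuristics (layer assignment, neighbour pruning) exist to preserve exactly that property.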
| j-pb wrote:
| > if a skiplist is a probabilistic alternative to a B+-tree,
| what's HNSW a probabilistic alternative to?
|
| I don't think that analogy is accurate.
|
    | A prolly tree could also be considered a probabilistic version
    | of a B-tree, and a skiplist could be considered a probabilistic
    | version of a trie, so you certainly have probabilistic vs. non-
    | probabilistic data structures, but there isn't really a 1-to-1
    | correspondence, so your inference of a missing X doesn't make
    | that much sense.
|
    | Here's a better dichotomy imho: both the probabilistic and the
    | non-probabilistic data structures above share a commonality.
    | They operate on totally ordered data.
|
| So a skiplist is actually 1D.
|
    | HNSW and other approximate k-nearest neighbour algorithms,
    | however, operate over metric spaces, where elements have a
    | distance/similarity but no meaningful order.[1]
|
    | Now if you ask for the deterministic equivalent of an
    | approximate-kNN algorithm/data structure, then that would
    | simply be anything in the exact kNN category.
|
| 1: https://math.stackexchange.com/questions/1429373/why-is-
| ther...
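To make the distinction concrete, here is a small numeric check (a sketch; the choice of random 2D points is arbitrary) that Euclidean distance satisfies the metric axioms while the points themselves carry no useful total order:

```python
import itertools
import math
import random

random.seed(1)
pts = [(random.random(), random.random()) for _ in range(15)]

def d(a, b):
    return math.dist(a, b)  # Euclidean distance, a metric

# The metric axioms hold for every sampled triple/pair of points...
for a, b, c in itertools.permutations(pts, 3):
    assert d(a, b) <= d(a, c) + d(c, b) + 1e-12  # triangle inequality
assert all(d(a, b) == d(b, a) for a, b in itertools.combinations(pts, 2))
assert all(d(a, a) == 0.0 for a in pts)

# ...but unlike totally ordered keys (numbers, strings), sorting the
# points by their coordinates says nothing about proximity, so
# B-tree/skiplist-style ordered structures do not apply directly.
```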
| janalsncm wrote:
    | HNSW was the first approximate nearest neighbor search algorithm
    | I encountered. There are others that are useful in different
    | contexts (e.g. depending on your data size and available memory).
| And you can combine them sometimes. This video is a really
| awesome overview of them:
|
| https://youtu.be/SKrHs03i08Q
| WhitneyLand wrote:
    | For anyone new to this, the topic has gained interest in recent
    | years alongside the rise of AI. Many ML models project text,
    | images, etc. into a high-dimensional embedding space as vectors
    | (arrays of numbers). It then becomes possible to numerically
    | compare these vectors to see how closely related they are, in
    | other words to find their semantic similarity.
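As a toy illustration of comparing embeddings numerically, cosine similarity is one common measure (the 4-dimensional vectors below are invented for illustration, not real model outputs):

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 means same
    direction, 0.0 means orthogonal (unrelated)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Toy "embeddings", invented for illustration only:
cat = [0.9, 0.1, 0.3, 0.0]
kitten = [0.85, 0.15, 0.35, 0.05]
car = [0.1, 0.9, 0.0, 0.4]

# A good embedding model would place related concepts closer:
assert cosine_similarity(cat, kitten) > cosine_similarity(cat, car)
```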
|
    | This nifty feature leads to the need to scale up: for example,
    | given many vectors in a database, which one is most similar to
    | my word or image? Applications like this need ways to run such
    | searches efficiently, and HNSW is one prominent technique for
    | doing so.
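The baseline being scaled past here is the exact linear scan, which is simple but costs O(n) distance computations per query; a minimal sketch:

```python
import math
import random

def brute_force_knn(query, vectors, k):
    """Exact k-nearest-neighbour search by scanning every vector:
    O(n * d) per query. This linear cost is what approximate
    indexes such as HNSW are built to avoid at large n."""
    order = sorted(range(len(vectors)),
                   key=lambda i: math.dist(query, vectors[i]))
    return order[:k]

random.seed(3)
db = [(random.random(), random.random()) for _ in range(1000)]
nearest = brute_force_knn((0.5, 0.5), db, k=5)  # indices of the 5 closest
```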
___________________________________________________________________
(page generated 2024-10-03 23:02 UTC)