[HN Gopher] Hierarchical Navigable Small World: a scalable neare...
       ___________________________________________________________________
        
       Hierarchical Navigable Small World: a scalable nearest neighbor
       search
        
       Author : wilsonzlin
       Score  : 112 points
       Date   : 2024-09-30 07:56 UTC (3 days ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | 3abiton wrote:
       | Interesting read, I worked on graph theory for a while some long
       | time ago and did not even know some of these terminologies.
        
       | sakras wrote:
       | What really made HNSW click for me was when someone described it
       | as a "skiplist graph". It really does seem to share a lot of the
       | same properties and intuition as a simple skiplist, where
       | descending levels corresponds to finer-grained navigation through
       | the space.
       | 
       | A couple of other thoughts with this perspective: - if a drawing
       | of a skiplist is 2D, HNSW seems to be a 3D generalization. It
       | makes me wonder if you could apply an HNSW-like scheme for a
       | traditional relational database index (maybe a multicolumn
       | index?). - if a skiplist is a probabilistic alternative to a
       | B+-tree, what's HNSW a probabilistic alternative to?
       | 
       | One thing that still mystifies me is how in the context of vector
       | indexing, the HNSW seems to "bootstrap" itself by using itself to
       | find the neighbors of a newly-inserted point. It's not clear to
       | me why that doesn't diverge to some garbage results as the index
       | grows.
        
         | j-pb wrote:
         | > if a skiplist is a probabilistic alternative to a B+-tree,
         | what's HNSW a probabilistic alternative to?
         | 
         | I don't think that analogy is accurate.
         | 
         | A prolly tree could also be considered a probabilistic version
         | of a b-tree and a skiplist could be considered a probabilistic
         | version of a trie, so you certainly have probabilistic vs. non-
         | probabilistic datastructures, but there isn't really a 1-to-1
         | correspondence, so your inference of a missing X doesn't make
         | that much sense.
         | 
         | Here's a better dichotomy imho. Both of these probabilistic and
         | non-probabilistic datastructures share a commonality. They
         | operate on totally ordered data.
         | 
         | So a skiplist is actually 1D.
         | 
         | HNSW and other approximate k-nearesr neighbour algorithms
         | however operate over metric spaces, where elements have a
         | distance/similiarity but no meaningful order.[1]
         | 
         | Now if you ask for the deterministic equivalent of a
         | approximate-kNN algorithm/datastructure, then that would simply
         | be anything in the kNN category.
         | 
         | 1: https://math.stackexchange.com/questions/1429373/why-is-
         | ther...
        
       | janalsncm wrote:
       | HNSW was the first approximate nearest neighbor search algorithm
       | I encountered. There are others that are useful in different
       | contexts (i.e. depending on your data size and available memory).
       | And you can combine them sometimes. This video is a really
       | awesome overview of them:
       | 
       | https://youtu.be/SKrHs03i08Q
        
       | WhitneyLand wrote:
       | For anyone new to this, the topic has gained interest in recent
       | years alongside the rise of AI. Many ML models project text,
       | images, etc, into an embedding layer in high dimensional space as
       | a vector (an array of numbers). It then becomes possible to
       | numerically compare these vectors to see how closely related they
       | are, in others words to find the semantic similarity.
       | 
       | This nifty feature leads to the need to scale up. For example,
       | given many vectors in a database which one is most similar to my
       | word or image. Applications like this need ways to efficiently
       | scale these kinds of searches and HNSW is one prominent technique
       | to enable this.
        
       ___________________________________________________________________
       (page generated 2024-10-03 23:02 UTC)