[HN Gopher] Show HN: Vicinity - Fast, Lightweight Nearest Neighb...
       ___________________________________________________________________
        
       Show HN: Vicinity - Fast, Lightweight Nearest Neighbors with
       Flexible Back Ends
        
       We've just open-sourced Vicinity, a lightweight approximate nearest
       neighbors (ANN) search package that allows for fast experimentation
       with, and comparison of, a large number of well-known algorithms.
       Main features:

       - Lightweight: the base package only uses NumPy.
       - Unified interface: use any of the supported algorithms and
         backends through a single interface; HNSW, Annoy, FAISS, and many
         more algorithms and libraries are supported.
       - Easy evaluation: evaluate the performance of your backend with a
         simple function that measures queries per second vs. recall.
       - Serialization: save and load your index for persistence.

       After working with a large number of ANN libraries over the years,
       we found it increasingly cumbersome to learn the interface,
       features, quirks, and limitations of every library. After writing
       custom evaluation code to measure speed and performance for the
       100th time to compare libraries, we decided to build this as a way
       to use a large number of algorithms and libraries through a unified,
       simple interface that allows for quick comparison and evaluation.

       We are curious to hear your feedback! Are there any algorithms you
       use that are missing? Any extra evaluation metrics that would be
       useful?
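
A "NumPy-only" base backend can be as simple as exact (brute-force) search. The sketch below shows one under stated assumptions: cosine similarity as the metric, and a plain `.npy` file for persistence. The class `BruteForceIndex` and its methods are hypothetical illustrations, not Vicinity's actual API.

```python
import os
import tempfile

import numpy as np


class BruteForceIndex:
    """Hypothetical NumPy-only exact index using cosine similarity."""

    def __init__(self, vectors: np.ndarray, normalize: bool = True) -> None:
        vectors = np.asarray(vectors, dtype=np.float32)
        if normalize:
            # Unit-length rows make a dot product equal to cosine similarity.
            norms = np.linalg.norm(vectors, axis=1, keepdims=True)
            vectors = vectors / np.maximum(norms, 1e-12)
        self.vectors = vectors

    def query(self, queries: np.ndarray, k: int) -> np.ndarray:
        """Return indices of the k most similar stored vectors for each query row."""
        queries = np.atleast_2d(np.asarray(queries, dtype=np.float32))
        norms = np.linalg.norm(queries, axis=1, keepdims=True)
        sims = (queries / np.maximum(norms, 1e-12)) @ self.vectors.T
        k = min(k, sims.shape[1])
        # argpartition selects the top k per row without a full sort ...
        top = np.argpartition(-sims, k - 1, axis=1)[:, :k]
        # ... and only those k candidates are then sorted by similarity.
        order = np.argsort(-np.take_along_axis(sims, top, axis=1), axis=1)
        return np.take_along_axis(top, order, axis=1)

    def save(self, path: str) -> None:
        # Stored vectors are already normalized, so they are saved as-is.
        np.save(path, self.vectors)

    @classmethod
    def load(cls, path: str) -> "BruteForceIndex":
        return cls(np.load(path), normalize=False)


# Usage: build, query, and round-trip the index through disk.
rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 64))
index = BruteForceIndex(data)
print(index.query(data[:3], k=5)[:, 0])  # each vector is its own nearest neighbor: [0 1 2]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "index.npy")
    index.save(path)
    restored = BruteForceIndex.load(path)
    print(np.array_equal(restored.vectors, index.vectors))  # True
```

Exact search gives perfect recall but scales linearly with the number of stored vectors, which is the trade-off ANN backends such as HNSW or Annoy are meant to improve on.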
        
       Author : Pringled
       Score  : 35 points
       Date   : 2024-12-01 16:15 UTC (6 hours ago)
        
       | aravindputrevu wrote:
       | Some questions:
       | 
       | 1. When you say backends, do you plan to integrate with some
       | "vector" stores, e.g. as a client?
       | 2. Also, are there any benchmarks?
       | 3. Lastly, why Python?
        
         | newfocogi wrote:
         | Not author of the library, but the documentation lists the
         | backends here:
         | https://github.com/MinishLab/vicinity?tab=readme-ov-file#sup...
         | 
         | So these are nearest neighbor search implementations, not
         | database backends.
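
To make the distinction concrete, a "backend" in this sense is a search implementation hidden behind one common interface. The sketch below assumes a minimal interface consisting of a constructor that takes the vectors and a `query(vector, k)` method; `Backend`, `BACKENDS`, and `build_index` are hypothetical names, and the two adapters are NumPy stand-ins where a real adapter would wrap a library such as Annoy or FAISS.

```python
from typing import Callable, Dict, Protocol

import numpy as np


class Backend(Protocol):
    """Hypothetical interface that every backend adapter implements."""

    def query(self, vector: np.ndarray, k: int) -> np.ndarray: ...


class EuclideanBruteForce:
    """Stand-in adapter; a real one would wrap a third-party ANN library."""

    def __init__(self, vectors: np.ndarray) -> None:
        self.vectors = np.asarray(vectors, dtype=np.float64)

    def query(self, vector: np.ndarray, k: int) -> np.ndarray:
        dists = np.linalg.norm(self.vectors - vector, axis=1)
        return np.argsort(dists)[:k]


class CosineBruteForce:
    """Second stand-in adapter: different metric, same methods."""

    def __init__(self, vectors: np.ndarray) -> None:
        v = np.asarray(vectors, dtype=np.float64)
        self.vectors = v / np.linalg.norm(v, axis=1, keepdims=True)

    def query(self, vector: np.ndarray, k: int) -> np.ndarray:
        sims = self.vectors @ (vector / np.linalg.norm(vector))
        return np.argsort(-sims)[:k]


# Callers choose a backend by name and never touch its library-specific API.
BACKENDS: Dict[str, Callable[[np.ndarray], Backend]] = {
    "euclidean": EuclideanBruteForce,
    "cosine": CosineBruteForce,
}


def build_index(backend: str, vectors: np.ndarray) -> Backend:
    return BACKENDS[backend](vectors)


rng = np.random.default_rng(0)
data = rng.normal(size=(500, 8))
for name in BACKENDS:
    index = build_index(name, data)
    print(name, index.query(data[7], k=3))
```

Swapping algorithms then means changing one string, while evaluation code written against `query` stays the same.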
        
         | Pringled wrote:
         | 1: that could be something for the future, but at the moment
         | this is just meant as a way to quickly try out and evaluate
         | various algorithms and libraries without having to learn the
         | syntax for them (we call those backends).
         | 
         | 2: we adopted the same methodology as ann-benchmarks for our
         | evaluation, so technically the benchmarks there are valid for
         | the backends we support. However it's a good suggestion to add
         | those explicitly to the repo, I'll add a todo for that.
         | 
         | 3: mainly because (a) it's the language we are most comfortable
         | developing in, (b) it's the most widely used and adopted
         | language for ML, and (c) (almost) all the algorithms we support
         | are already written in C/C++/Cython.
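
A queries-per-second vs. recall measurement in the spirit of the ann-benchmarks methodology mentioned above can be sketched as follows. This is a simplification: recall is computed as set overlap with the exact k nearest neighbors, whereas ann-benchmarks counts a returned neighbor as correct when its distance falls within a small epsilon of the k-th true neighbor's distance, so numbers can differ slightly on data with near-ties. Queries are timed one at a time. All function names, including the deliberately crude `half_scan` "approximate" search, are hypothetical.

```python
import time
from typing import Callable, Tuple

import numpy as np


def exact_knn(data: np.ndarray, queries: np.ndarray, k: int) -> np.ndarray:
    """Ground-truth neighbors by brute-force squared Euclidean distance."""
    # ||q - x||^2 = ||q||^2 - 2 q.x + ||x||^2, shape (n_queries, n_data)
    d2 = (
        (queries ** 2).sum(axis=1)[:, None]
        - 2.0 * queries @ data.T
        + (data ** 2).sum(axis=1)[None, :]
    )
    return np.argsort(d2, axis=1)[:, :k]


def recall_at_k(found: np.ndarray, truth: np.ndarray) -> float:
    """Fraction of the true k nearest neighbors that the search returned."""
    hits = sum(len(set(f) & set(t)) for f, t in zip(found.tolist(), truth.tolist()))
    return hits / truth.size


def evaluate(
    search: Callable[[np.ndarray, int], np.ndarray],
    queries: np.ndarray,
    truth: np.ndarray,
    k: int,
) -> Tuple[float, float]:
    """Time queries one at a time; return (queries per second, recall@k)."""
    start = time.perf_counter()
    found = [search(q, k) for q in queries]
    elapsed = time.perf_counter() - start
    return len(queries) / elapsed, recall_at_k(np.array(found), truth)


rng = np.random.default_rng(0)
data = rng.normal(size=(2000, 32))
queries = rng.normal(size=(50, 32))
truth = exact_knn(data, queries, k=10)


def exact_search(q: np.ndarray, k: int) -> np.ndarray:
    return exact_knn(data, q[None, :], k)[0]


def half_scan(q: np.ndarray, k: int) -> np.ndarray:
    # Deliberately crude "approximate" search: only scans the first half of the data.
    return exact_knn(data[:1000], q[None, :], k)[0]


for name, fn in [("exact", exact_search), ("half scan", half_scan)]:
    qps, recall = evaluate(fn, queries, truth, k=10)
    print(f"{name}: {qps:.0f} QPS, recall@10 = {recall:.2f}")
```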
        
       | bravura wrote:
       | This is great.
       | 
       | I think the next step might be to add some sugar that lets you
       | run a random or fixed grid of hyperparameters and get a report of
       | accuracy and speed for your specific dataset.
        
         | Pringled wrote:
         | Thanks! This is actually something we have already been
         | experimenting with a bit (basically auto-tuning on a specific
         | dataset). It turned out to be quite complicated given how many
         | index and parameter combinations a grid search produces, which
         | makes it very costly on larger datasets. That is why we first
         | opted for this approach, where you evaluate a chosen index and
         | parameter set, but it's definitely something we are still
         | planning to do.
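
The cost described here can be seen in a small grid search. The sketch below uses a toy inverted-file (IVF) index that picks random data points as centroids instead of training them with k-means, and it tunes two hypothetical parameters, `n_lists` and `n_probe`. Every combination triggers an index build plus a full evaluation pass, so the work grows with the product of the grid sizes.

```python
import itertools
import time
from typing import Dict, List, Tuple

import numpy as np


def build_ivf(data: np.ndarray, n_lists: int, rng: np.random.Generator):
    """Toy IVF: random data points serve as centroids (no k-means training)."""
    centroids = data[rng.choice(len(data), size=n_lists, replace=False)]
    d2 = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    assignment = d2.argmin(axis=1)
    # One inverted list of row ids per centroid.
    lists = [np.flatnonzero(assignment == c) for c in range(n_lists)]
    return centroids, lists


def search_ivf(data, centroids, lists, q, k, n_probe):
    """Scan only the n_probe lists whose centroids are closest to q."""
    probe = np.argsort(((centroids - q) ** 2).sum(axis=1))[:n_probe]
    candidates = np.concatenate([lists[c] for c in probe])
    d2 = ((data[candidates] - q) ** 2).sum(axis=1)
    return candidates[np.argsort(d2)[:k]]


def grid_search(data, queries, k, grid: Dict[str, List[int]], seed: int = 0):
    """Build and evaluate one index per parameter combination."""
    truth = [set(np.argsort(((data - q) ** 2).sum(axis=1))[:k].tolist()) for q in queries]
    names = list(grid)
    rows: List[Tuple[dict, float, float]] = []
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        # Every combination pays for a full index build: the cost grows with the grid.
        centroids, lists = build_ivf(data, params["n_lists"], np.random.default_rng(seed))
        start = time.perf_counter()
        found = [search_ivf(data, centroids, lists, q, k, params["n_probe"]) for q in queries]
        qps = len(queries) / (time.perf_counter() - start)
        recall = float(np.mean([len(t & set(f.tolist())) / k for t, f in zip(truth, found)]))
        rows.append((params, qps, recall))
    return rows


rng = np.random.default_rng(1)
data = rng.normal(size=(2000, 16))
queries = rng.normal(size=(20, 16))
grid = {"n_lists": [16, 64], "n_probe": [1, 4, 16]}
for params, qps, recall in grid_search(data, queries, k=10, grid=grid):
    print(params, f"{qps:.0f} QPS", f"recall@10={recall:.2f}")
```

One obvious saving, under these assumptions, is to build once per build-time parameter (`n_lists`) and sweep only the search-time parameter (`n_probe`) against it; the sketch rebuilds every time to keep the cost structure visible.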
        
       | antman wrote:
       | What does it mean that insertion is only supported for a few of
       | the indexes? Also, will this allow hybrid search for the backends
       | that support it?
        
         | Pringled wrote:
         | Some backends/algorithms don't natively support dynamic
         | inserts and require you to rebuild your index when you want to
         | add vectors to it (Annoy and PyNNDescent are the only backends
         | that don't support it).
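
For a build-once backend, one common workaround is to buffer new vectors, search the buffer by brute force, and rebuild the full index once the buffer grows too large. The sketch below assumes that strategy. `RebuildOnInsert`, `StaticBruteForce`, and `max_buffer` are hypothetical names, and `StaticBruteForce` merely stands in for a library index that cannot be modified after it is built.

```python
from typing import Callable, List

import numpy as np


class StaticBruteForce:
    """Stand-in for a build-once backend: it has query() but no add()."""

    def __init__(self, vectors: np.ndarray) -> None:
        self.vectors = vectors

    def query(self, vector: np.ndarray, k: int) -> np.ndarray:
        return np.argsort(np.linalg.norm(self.vectors - vector, axis=1))[:k]


class RebuildOnInsert:
    """Hypothetical wrapper that gives a build-once index an add() method.

    New vectors wait in a buffer that is searched by brute force; once the
    buffer exceeds max_buffer rows, the whole index is rebuilt from scratch.
    """

    def __init__(self, vectors, build: Callable, max_buffer: int = 100) -> None:
        self.build = build
        self.vectors = np.asarray(vectors, dtype=np.float64)
        self.index = build(self.vectors)
        self.buffer = np.empty((0, self.vectors.shape[1]))
        self.max_buffer = max_buffer
        self.rebuilds = 0

    def add(self, new_vectors: np.ndarray) -> None:
        self.buffer = np.vstack([self.buffer, np.atleast_2d(new_vectors)])
        if len(self.buffer) > self.max_buffer:
            # Full rebuild: this is the step that gets expensive on large indexes.
            self.vectors = np.vstack([self.vectors, self.buffer])
            self.buffer = np.empty((0, self.vectors.shape[1]))
            self.index = self.build(self.vectors)
            self.rebuilds += 1

    def _vector(self, i: int) -> np.ndarray:
        n = len(self.vectors)
        return self.vectors[i] if i < n else self.buffer[i - n]

    def query(self, vector: np.ndarray, k: int) -> List[int]:
        # Buffered vectors get ids after the indexed ones, in insertion order.
        ids = [int(i) for i in self.index.query(vector, k)]
        if len(self.buffer):
            d = np.linalg.norm(self.buffer - vector, axis=1)
            ids += [len(self.vectors) + int(i) for i in np.argsort(d)[:k]]
        # Merge both candidate lists by exact distance and keep the best k.
        ids.sort(key=lambda i: float(np.linalg.norm(self._vector(i) - vector)))
        return ids[:k]


rng = np.random.default_rng(0)
index = RebuildOnInsert(rng.normal(size=(100, 4)), StaticBruteForce, max_buffer=10)
new = rng.normal(size=(5, 4))
index.add(new)
print(index.query(new[2], k=1), index.rebuilds)  # [102] 0 (found via the buffer)
index.add(rng.normal(size=(10, 4)))
print(index.query(new[2], k=1), index.rebuilds)  # [102] 1 (found after a rebuild)
```

The `max_buffer` threshold trades query latency (a larger brute-force buffer) against how often the expensive rebuild runs.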
         | 
         | Hybrid search is a really cool idea though; it's not something
         | we support at the moment, but definitely something we could
         | investigate and add as an upcoming feature, thanks for the
         | suggestion!
        
       | davnn wrote:
       | That's actually quite similar to the nearness library [1]. The
       | main difference appears to be vicinity's focus on simplicity
       | while nearness tries to expose most of the functionality of the
       | underlying backends.
       | 
       | [1] https://github.com/davnn/nearness
        
       ___________________________________________________________________
       (page generated 2024-12-01 23:00 UTC)