[HN Gopher] Show HN: A fast HNSW implementation in Rust
       ___________________________________________________________________
        
       Show HN: A fast HNSW implementation in Rust
        
       Author : xcyto
       Score  : 75 points
       Date   : 2024-03-14 17:14 UTC (5 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | jallmann wrote:
       | How does this compare to hsnwlib - is it faster?
       | https://github.com/nmslib/hnswlib
        
         | esafak wrote:
         | Also compare with qdrant's Rust implementation; they tout their
         | performance.
         | https://github.com/qdrant/qdrant/tree/master/lib/segment/src...
        
         | Chio wrote:
         | From a quick survey of the implementation probably not very
         | well since, for example, it is using dynamic dispatching for
         | all distance calculations and there are a lot of allocations in
         | the hot-path.
         | 
         | Maybe it would be better to post this repository as a reference
         | / teaching implementation of HNSW.
        
       | dureuill wrote:
       | Hello, I have a few questions:
       | 
       | - how much time to insert 15 millions of vectors of 768 f32?
       | 
       | - how much RAM needed for this operation?
       | 
       | - if inserting another vector, how incremental is the insertion?
       | Is it faster than reindexing the 15M + 1 vectors from scratch?
       | 
       | - does the structure need to stay in RAM or can it be efficiently
       | queried from a serialized representation?
       | 
       | - how fast is the search in the 15M vectors on average?
        
         | teaearlgraycold wrote:
         | I can answer #3. HNSW will allow for incremental index
         | rebuilding. So each additional insert is a sublinear, but
         | greater than constant time, operation.
        
       | brink wrote:
       | As a Rust dev myself, can we stop the trend of adding "in Rust"
       | to titles?
       | 
       | The loud part should be the thing that was built, not the thing
       | it was built with.
       | 
       | That said, this is super cool. I have a project that I can
       | definitely benefit from this. :)
        
         | tomrod wrote:
         | As a pythonista and wannabe Rustaceans, I personally love
         | seeing high performance tooling built in Rust!
        
         | sophacles wrote:
         | > As a Rust dev myself, can we stop the trend of adding "in
         | Rust" to titles?
         | 
         | tbf it's the headline of the Readme.
         | 
         | More generally I largely agree for software. I don't really
         | agree for library code like this though... I actually care
         | about what language a library is implemented in when I'm coding
         | up a project that might use it.
        
         | returningfory2 wrote:
         | This project is a library so it's actually relevant? The
         | language libraries are written in is one of their most
         | important features because it determines if/when you can use
         | them...
        
         | IshKebab wrote:
         | I strongly disagree. If I'm looking for one of these libraries
         | then the language is written in is extremely relevant in almost
         | all cases. If I'm writing a Rust program then this is very
         | relevant but if it said "in Java" then I would only use it if I
         | had literally no other choice.
         | 
         | FFI is generally very painful.
        
         | echelon wrote:
         | As a Rust dev myself, please don't stop. It absolutely matters
         | that this is Rust and is available for me to use. I wouldn't
         | have clicked otherwise.
        
       | random42 wrote:
       | Great job shipping. The README / repo however could definitely
       | benefit with a benchmarking report in them.
        
       | leod wrote:
       | Happy to see people working on vector search in Rust. Keep it up!
       | 
       | As far as HNSW implementations go, this one appears to be almost
       | entirely unfinished. Node insertion logic is missing
       | (https://github.com/swapneel/hnsw-rust/blob/b8ef946bd76112250...)
       | and so is the base layer beam search.
        
       | kernelsanderz wrote:
       | If people are interested in fast, scalable HNSW implementations
       | which have solid rust bindings then I'd highly recommend checking
       | out usearch https://github.com/unum-cloud/usearch
        
       | jbarrow wrote:
       | I put together a slow (but readable!) HNSW implementation in
       | python to really understand how it works:
       | https://github.com/jbarrow/tinyhnsw
       | 
       | Indexing time isn't great, but query time is surprisingly good
       | for it being written in unoptimized python and numpy.
        
       | svapnil wrote:
       | Great job! And nice name btw :) https://github.com/svapnil
        
       | doomfist wrote:
       | I think qdrant might have a rust implementation of it as well.
        
       | simonw wrote:
       | Since this doesn't have documentation yet I piped the code
       | through Claude 3 Opus and asked it to write some.
       | 
       | https://gist.github.com/simonw/9ff9a0ab8ab64e8aa8d160c4294c0...
       | 
       | You don't have a license on the code yet (weirdly Claude
       | hallucinated MIT).
       | 
       | Notes on how I generated this in the comments on that Gist.
        
       | dochtman wrote:
       | This is in pretty early stages. Might consider instant-distance
       | which I wrote a few years ago and which is in production use at
       | instantdomainsearch.com:
       | 
       | https://github.com/instant-labs/instant-distance
       | 
       | There are Python bindings, too:
       | 
       | https://pypi.org/project/instant-distance/
        
       ___________________________________________________________________
       (page generated 2024-03-14 23:00 UTC)