[HN Gopher] Show HN: A fast HNSW implementation in Rust
___________________________________________________________________
Show HN: A fast HNSW implementation in Rust
Author : xcyto
Score : 75 points
Date : 2024-03-14 17:14 UTC (5 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| jallmann wrote:
| How does this compare to hsnwlib - is it faster?
| https://github.com/nmslib/hnswlib
| esafak wrote:
| Also compare with qdrant's Rust implementation; they tout their
| performance.
| https://github.com/qdrant/qdrant/tree/master/lib/segment/src...
| Chio wrote:
| From a quick survey of the implementation probably not very
| well since, for example, it is using dynamic dispatching for
| all distance calculations and there are a lot of allocations in
| the hot-path.
|
| Maybe it would be better to post this repository as a reference
| / teaching implementation of HNSW.
| dureuill wrote:
| Hello, I have a few questions:
|
| - how much time to insert 15 millions of vectors of 768 f32?
|
| - how much RAM needed for this operation?
|
| - if inserting another vector, how incremental is the insertion?
| Is it faster than reindexing the 15M + 1 vectors from scratch?
|
| - does the structure need to stay in RAM or can it be efficiently
| queried from a serialized representation?
|
| - how fast is the search in the 15M vectors on average?
| teaearlgraycold wrote:
| I can answer #3. HNSW will allow for incremental index
| rebuilding. So each additional insert is a sublinear, but
| greater than constant time, operation.
| brink wrote:
| As a Rust dev myself, can we stop the trend of adding "in Rust"
| to titles?
|
| The loud part should be the thing that was built, not the thing
| it was built with.
|
| That said, this is super cool. I have a project that I can
| definitely benefit from this. :)
| tomrod wrote:
| As a pythonista and wannabe Rustaceans, I personally love
| seeing high performance tooling built in Rust!
| sophacles wrote:
| > As a Rust dev myself, can we stop the trend of adding "in
| Rust" to titles?
|
| tbf it's the headline of the Readme.
|
| More generally I largely agree for software. I don't really
| agree for library code like this though... I actually care
| about what language a library is implemented in when I'm coding
| up a project that might use it.
| returningfory2 wrote:
| This project is a library so it's actually relevant? The
| language libraries are written in is one of their most
| important features because it determines if/when you can use
| them...
| IshKebab wrote:
| I strongly disagree. If I'm looking for one of these libraries
| then the language is written in is extremely relevant in almost
| all cases. If I'm writing a Rust program then this is very
| relevant but if it said "in Java" then I would only use it if I
| had literally no other choice.
|
| FFI is generally very painful.
| echelon wrote:
| As a Rust dev myself, please don't stop. It absolutely matters
| that this is Rust and is available for me to use. I wouldn't
| have clicked otherwise.
| random42 wrote:
| Great job shipping. The README / repo however could definitely
| benefit with a benchmarking report in them.
| leod wrote:
| Happy to see people working on vector search in Rust. Keep it up!
|
| As far as HNSW implementations go, this one appears to be almost
| entirely unfinished. Node insertion logic is missing
| (https://github.com/swapneel/hnsw-rust/blob/b8ef946bd76112250...)
| and so is the base layer beam search.
| kernelsanderz wrote:
| If people are interested in fast, scalable HNSW implementations
| which have solid rust bindings then I'd highly recommend checking
| out usearch https://github.com/unum-cloud/usearch
| jbarrow wrote:
| I put together a slow (but readable!) HNSW implementation in
| python to really understand how it works:
| https://github.com/jbarrow/tinyhnsw
|
| Indexing time isn't great, but query time is surprisingly good
| for it being written in unoptimized python and numpy.
| svapnil wrote:
| Great job! And nice name btw :) https://github.com/svapnil
| doomfist wrote:
| I think qdrant might have a rust implementation of it as well.
| simonw wrote:
| Since this doesn't have documentation yet I piped the code
| through Claude 3 Opus and asked it to write some.
|
| https://gist.github.com/simonw/9ff9a0ab8ab64e8aa8d160c4294c0...
|
| You don't have a license on the code yet (weirdly Claude
| hallucinated MIT).
|
| Notes on how I generated this in the comments on that Gist.
| dochtman wrote:
| This is in pretty early stages. Might consider instant-distance
| which I wrote a few years ago and which is in production use at
| instantdomainsearch.com:
|
| https://github.com/instant-labs/instant-distance
|
| There are Python bindings, too:
|
| https://pypi.org/project/instant-distance/
___________________________________________________________________
(page generated 2024-03-14 23:00 UTC)