[HN Gopher] Show HN: Model2vec-Rs - Fast Static Text Embeddings ...
___________________________________________________________________
Show HN: Model2vec-Rs - Fast Static Text Embeddings in Rust
Hey HN! We've just open-sourced model2vec-rs, a Rust crate for
loading and running Model2Vec static embedding models with zero
Python dependencies. It lets you embed text at very high
throughput, for example in a Rust-based microservice or CLI tool,
and can be used for semantic search, retrieval, RAG, or any other
text embedding use case.
Main features:
- Rust-native inference: load any Model2Vec model from Hugging
  Face or a local path with StaticModel::from_pretrained(...).
- Tiny footprint: the crate itself is only ~1.7 MB, with embedding
  models between 7 and 30 MB.
Performance (single-threaded on CPU):
- Python: ~4650 embeddings/sec
- Rust: ~8000 embeddings/sec (~1.7x speedup)
First open-source project in Rust for us, so it would be great to
get some feedback!
Author : Tananon
Score : 46 points
Date : 2025-05-18 15:01 UTC (7 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
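For context, a minimal usage sketch of the crate, based on its
README at the time of the post; the exact from_pretrained argument
order (repo or path, token, normalize, subfolder) and the use of
anyhow for error handling are assumptions, so check the crate docs:

    use anyhow::Result;
    use model2vec_rs::model::StaticModel;

    fn main() -> Result<()> {
        // Load a Model2Vec model from the Hugging Face Hub (or a local path).
        // Assumed arguments: (repo_or_path, hf_token, normalize, subfolder).
        let model = StaticModel::from_pretrained(
            "minishlab/potion-base-8M", None, None, None,
        )?;

        // Embed a batch of sentences; each embedding is a Vec<f32>.
        let sentences = vec![
            "Hello world".to_string(),
            "Static embeddings are fast".to_string(),
        ];
        let embeddings = model.encode(&sentences);
        println!(
            "{} embeddings of dimension {}",
            embeddings.len(),
            embeddings[0].len()
        );
        Ok(())
    }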
| noahbp wrote:
| What is your preferred static text embedding model?
|
| For someone looking to build a large embedding search, fast
| static embeddings seem like a good deal, but almost too good to
| be true. What quality tradeoff are you seeing with these models
| versus embedding models with attention mechanisms?
| Tananon wrote:
| It depends a bit on the task and language, but my go-to is
| usually minishlab/potion-base-8M for every task except
| retrieval (classification, clustering, etc.). For retrieval,
| minishlab/potion-retrieval-32M works best. If performance is
| critical, minishlab/potion-base-32M is best, although it's a bit
| bigger (~100 MB).
|
| There's definitely a quality trade-off. We have extensive
| benchmarks here: https://github.com/MinishLab/model2vec/blob/ma
| in/results/REA.... potion-base-32M reaches ~92% of the
| performance of MiniLM while being much faster (about 70x faster
| on CPU). It depends a bit on your constraints: if you have
| limited hardware and very high throughput requirements, these
| models still let you produce decent-quality embeddings, but of
| course an attention-based model will be better, albeit more
| expensive.
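A rough sketch of how the retrieval model mentioned above could
back a small semantic search, assuming the same from_pretrained and
encode API as in the earlier sketch; the cosine helper and argument
order are illustrative, not the crate's own API:

    use anyhow::Result;
    use model2vec_rs::model::StaticModel;

    // Cosine similarity between two embedding vectors.
    fn cosine(a: &[f32], b: &[f32]) -> f32 {
        let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
        let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
        let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
        dot / (na * nb + 1e-12)
    }

    fn main() -> Result<()> {
        // Assumed arguments: (repo_or_path, hf_token, normalize, subfolder).
        let model = StaticModel::from_pretrained(
            "minishlab/potion-retrieval-32M", None, None, None,
        )?;

        let docs = vec![
            "Rust is a systems programming language.".to_string(),
            "The weather in Paris is mild in spring.".to_string(),
        ];
        let doc_embs = model.encode(&docs);

        let query = vec!["Which language is good for systems work?".to_string()];
        let query_embs = model.encode(&query);
        let query_emb = &query_embs[0];

        // Rank documents by cosine similarity to the query.
        let mut ranked: Vec<(usize, f32)> = doc_embs
            .iter()
            .map(|e| cosine(query_emb, e))
            .enumerate()
            .collect();
        ranked.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
        println!("best match: {}", docs[ranked[0].0]);
        Ok(())
    }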
| refulgentis wrote:
| Thanks man, this is incredible work; really appreciate the
| details you went into.
|
| I've been chewing on whether there was a miracle that could
| make embeddings 10x faster for my search app that uses
| minilmv3, and it sounds like there is :) I never would have
| dreamed. I'll definitely be trying potion-base in my library
| for Flutter x ONNX.
|
| EDIT: I was thanking you for thorough benchmarking, then it
| dawned on me you were on the team that built the model -
| _fantastic_ work, I can't wait to try this.
|
| I am but a poor caveman mobile developer by origin; I convert
| minilm to ONNX using some old one-off notebook from 3 years
| ago. Dunno if that even matters, but if it does, I'd love
| "official" / "blessed" ONNX versions from you all if it's easy.
| Havoc wrote:
| Surprised it is so much faster. I would have thought the Python
| one was C under the hood.
| Tananon wrote:
| Indeed, I also didn't expect it to be so much faster! I think
| it's because most of the time is actually spent on tokenization
| (which also happens in Rust in the Python package), but there
| is some transfer overhead between Rust and Python. The other
| operations should be about the same speed, I think.
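To illustrate the point about where the time goes: a static
embedding model reduces inference to a table lookup plus mean
pooling over token vectors, so tokenization is the only nontrivial
step. A toy sketch of that pooling step, with a made-up vocabulary
and dimensions (not the crate's actual code):

    // Mean-pool the embedding-table rows selected by the token ids.
    fn embed(token_ids: &[u32], table: &[Vec<f32>], dim: usize) -> Vec<f32> {
        let mut out = vec![0.0f32; dim];
        for &id in token_ids {
            for (o, v) in out.iter_mut().zip(&table[id as usize]) {
                *o += *v;
            }
        }
        if !token_ids.is_empty() {
            let n = token_ids.len() as f32;
            for o in out.iter_mut() {
                *o /= n;
            }
        }
        out
    }

    fn main() {
        // Hypothetical 2-dimensional table for a 3-token vocabulary.
        let table = vec![
            vec![1.0, 0.0],
            vec![0.0, 1.0],
            vec![0.5, 0.5],
        ];
        let ids = vec![0u32, 2];
        println!("{:?}", embed(&ids, &table, 2)); // [0.75, 0.25]
    }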
___________________________________________________________________
(page generated 2025-05-18 23:00 UTC)