hngopher.com

       [HN Gopher] Find 'Abbey Road when type 'Beatles abbey rd': Fuzzy...
       ___________________________________________________________________
        
       Find 'Abbey Road when type 'Beatles abbey rd': Fuzzy/Semantic
       search in Postgres
        
       Author : nethalo
       Score  : 76 points
       Date   : 2026-01-21 18:24 UTC (5 days ago)
        
 (HTM) web link (rendiment.io)
        
       | lbrito wrote:
       | I was just starting to learn about embeddings for a very similar
       | use on my project. Newbie question: what are pros/cons of using
       | an API like gpt Ada to calculate the embeddings, compared to
       | importing some model on Python and running it locally like in
       | this article?
        
         | alright2565 wrote:
         | Do you want it to run on your CPU, or someone else's GPU?
         | 
         | Is the local model's quality sufficient for your use case, or
         | do you need something higher quality?
        
         | storystarling wrote:
         | The main trade-off I found is the RAM footprint on your backend
         | workers. If you run the model locally, every Celery worker
         | needs to load it into memory, so you end up needing much larger
         | instances just to handle the overhead.
         | 
         | With Ada your workers stay lightweight. For a bootstrapped
         | project, I found it easier to pay the small API cost than to
         | manage the infrastructure complexity of fat worker nodes.
        
       | gingerlime wrote:
       | Great post. Explains the concepts just enough that they click
       | without going too deep, shows practical implementation examples,
       | how it fits together. Simple, clear and ultimately useful. (to me
       | at least)
        
       | cess11 wrote:
       | I found fuzzy search in Manticore to be straightforward and
       | pretty good. Might be a decent alternative if one perceives the
       | ceremony in TFA as a bit much.
        
       | pinkmuffinere wrote:
       | The rewritten title is confusing imo. Can I propose:
       | 
       | "Finding 'Abbey Road' given 'beatles abbey rd' search with
       | Postgres"
        
         | pinkmuffinere wrote:
         | (The missing close-apostrophe, and the use of "type" are what
         | really confuse me in the original submission)
        
       | fsckboy wrote:
       | these days i find myself yearning to type "Beatles abbey rd" and
       | find only "Beatles abbey rd"
        
         | Manfred wrote:
         | Especially with small datasets it's more important to be exact
         | at the expense of a user having to fix a typo.
        
         | storystarling wrote:
         | I learned this the hard way on a book platform I'm working on.
         | While semantic search is useful for discovery, we found that
         | prioritizing exact matches is critical. It seems users get
         | pretty frustrated if they type a specific title and get a list
         | of conceptually similar results instead of the actual book. We
         | ended up having to tune the ranking to heavily favor literal
         | string matches over the vector distance to keep people from
         | bouncing.
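
A minimal sketch of the kind of ranking tweak described above, not the
commenter's actual code. The names `Candidate`, `literal_score`, `rank`
and the weight of 10.0 are hypothetical, and the vector distance is
assumed to be a cosine distance in [0, 2] that some earlier step has
already computed.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    title: str
    vector_distance: float  # assumed: cosine distance to the query, 0.0 = identical


def literal_score(query: str, title: str) -> float:
    """Return 1.0 for an exact title match, 0.5 for a substring match, else 0.0."""
    q, t = query.casefold().strip(), title.casefold().strip()
    if q == t:
        return 1.0
    if q and q in t:
        return 0.5
    return 0.0


def rank(query: str, candidates: list[Candidate], literal_weight: float = 10.0) -> list[Candidate]:
    """Sort best-first. With distances in [0, 2] and a weight of 10,
    any literal match outranks every purely semantic match."""
    def score(c: Candidate) -> float:
        # Higher is better: literal bonus minus the vector distance.
        return literal_weight * literal_score(query, c.title) - c.vector_distance

    return sorted(candidates, key=score, reverse=True)


candidates = [
    Candidate("A Brief History of Time", 0.10),  # closest by vector distance
    Candidate("Dune Messiah", 0.20),
    Candidate("Dune", 0.35),                     # exact title, but farthest vector
]
ranked = rank("dune", candidates)
print([c.title for c in ranked])
```

Here the exact title comes first even though its made-up vector distance
is the worst of the three, which is the behaviour the comment says users
expect.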
        
           | fsckboy wrote:
           | everything you are saying rings perfectly true to me but
           | there's an additional problem I encounter. (i'm going to make
           | up my example because i'm too lazy to check but you'll get the
           | idea) say you want to look up "Alexander the Great"...
           | 
           | ...God help you if Brad Pitt and/or the Jonas Brothers ever
           | played a role with exactly that name-match. The web and
           | search (and the culture?) have become super biased toward
           | video especially commercial offerings, and the sorting ranked
           | by popularity means pages and pages of virtually identical
           | content about that which you are not interested in.
        
             | digiown wrote:
             | Related but I wish Wikipedia would provide a filter against
             | movies, music, pop culture related topics. They take up a
             | huge amount of the namespace for things for whatever reason
             | and often direct me to unintended pages.
        
               | cess11 wrote:
               | https://en.wikipedia.org/w/index.php?search=melancholia+-
               | mov...
        
           | drsalt wrote:
           | why did you have to learn this the hard way?
        
             | storystarling wrote:
             | complaining customers...
        
         | qingcharles wrote:
         | I remember eBay 30 years ago when it would show you whatever
         | you typed in. Compared to 2026 where it only shows you
         | everything except the thing you typed in.
        
       | TeamDman wrote:
       | for 50,000 rows I'd much rather just use fzf/nucleo/tv against
       | json files instead of dealing with database schemas. When it
       | comes to dealing with embedding vectors rather than plaintext
       | then it gets slightly more annoying but still feels like such a
       | pain in the ass to go full database when really it could still be
       | a bunch of flat open files.
       | 
       | More of a perspective from just trying to index crap on my own
       | machine vs building a SaaS
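
For the flat-file route, a rough sketch using only the Python standard
library rather than fzf, nucleo or tv. The JSON content, the `by_key`
mapping and the `fuzzy_lookup` helper are made up for illustration, and
`difflib` scoring is not the same algorithm those tools use.

```python
import json
from difflib import get_close_matches

# Stand-in for the contents of a JSON file on disk (hypothetical data).
raw = """
[
  {"title": "Abbey Road"},
  {"title": "OK Computer"},
  {"title": "The Dark Side of the Moon"}
]
"""
records = json.loads(raw)

# Lowercased title -> record, so matching is case-insensitive.
by_key = {r["title"].lower(): r for r in records}


def fuzzy_lookup(query: str, n: int = 3, cutoff: float = 0.6) -> list[dict]:
    """Return up to n records whose titles resemble the query."""
    keys = get_close_matches(query.lower(), list(by_key), n=n, cutoff=cutoff)
    return [by_key[k] for k in keys]


print(fuzzy_lookup("abbey rd"))
```

This scans every key on each lookup, which is usually fine at the
tens-of-thousands scale mentioned above but gives no index to lean on as
the data grows.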
        
       | danielfalbo wrote:
       | > Abbey Road
       | 
       | > The Dark Side of the Moon
       | 
       | > OK Computer
       | 
       | Those are my 3 personal records ever. I feel so average now...
        
         | tialaramex wrote:
         | The other two are popular but "Dark Side of the Moon" in
         | particular was _extremely_ popular. Like, top 10 albums ever
         | level popular.
        
       | esafak wrote:
       | tl;dr: A demo of pg_trgm (fuzzy matcher) + pgvector (vector
       | search).
        
         | TurdF3rguson wrote:
         | Sounds nice but I'm not sure that trigram brings anything to
         | the table that vector didn't already bring.
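
For readers unfamiliar with what trigram matching measures, here is a
small Python approximation of the idea behind pg_trgm's similarity
score. It follows the documented behaviour (lowercase, split into
alphanumeric words, pad each word with two leading spaces and one
trailing space, compare the trigram sets) but it is a sketch, not the
extension's code, and may differ from Postgres on edge cases such as
non-ASCII text.

```python
import re


def trigrams(text: str) -> set[str]:
    """Collect the 3-character windows of each padded, lowercased word."""
    grams: set[str] = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = f"  {word} "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams in either string."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


print(similarity("abbey rd", "Abbey Road"))   # 7 shared of 13 distinct trigrams
print(similarity("abbey rd", "OK Computer"))  # no shared trigrams
```

The score is purely about shared character sequences, so it rewards
abbreviations and typos of the same words and knows nothing about
meaning; whether that adds value on top of vector search is the question
raised above.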
        
       | timlod wrote:
       | FWIW, the performance considerations section is a little
       | simplistic, and probably assumes that exact dataset/problem.
       | 
       | For GIN for example, performance depends a lot on the size of the
       | search input (the fewer characters, the more rows to compare) as
       | well as the number of rows/size of the index.
       | 
       | It also mentions GiST (another type of index which isn't
       | mentioned anywhere else in the article).
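
The point about input size can be illustrated with a toy inverted index
in Python. This is only an intuition aid: it is not how Postgres
implements GIN, it skips pg_trgm's word padding, and the rows and
helper names (`char_trigrams`, `build_index`, `candidate_rows`) are
invented for the example.

```python
from collections import defaultdict


def char_trigrams(text: str) -> set[str]:
    """Plain 3-character windows over the lowercased text (no padding)."""
    s = text.lower()
    return {s[i:i + 3] for i in range(len(s) - 2)}


def build_index(rows: list[str]) -> dict[str, set[int]]:
    """Map each trigram to the ids of the rows that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for row_id, text in enumerate(rows):
        for gram in char_trigrams(text):
            index[gram].add(row_id)
    return index


def candidate_rows(index: dict[str, set[int]], query: str, n_rows: int) -> set[int]:
    """Rows containing every trigram of the query; these still need rechecking."""
    grams = char_trigrams(query)
    if not grams:
        # Under three characters there are no trigrams, so nothing is pruned.
        return set(range(n_rows))
    postings = [index.get(g, set()) for g in grams]
    return set.intersection(*postings)


rows = [
    "Abbey Road",
    "Abbey Lincoln Sings",
    "Westminster Abbey",
    "Road to Nowhere",
    "OK Computer",
]
index = build_index(rows)

for query in ("ab", "abb", "abbey r"):
    print(repr(query), len(candidate_rows(index, query, len(rows))))
```

In this toy data the two-character query prunes nothing, the
three-character query leaves three candidates, and the longer query
leaves one, so the amount of rechecking shrinks as the input grows.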
        
       | augusteo wrote:
       | On the API vs local model question:
       | 
       | We went with API embeddings for a similar use case. The cold-
       | start latency of local models across multiple workers ate more
       | money in compute than just paying per-token. Plus you avoid the
       | operational overhead of model updates.
       | 
       | The hybrid approach in this article is smart. Fuzzy matching
       | catches 80% of cases instantly, embeddings handle the rest. No
       | need to run expensive vector search on every query.
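
One way to read the fuzzy-first, embeddings-second flow described above,
as a hedged Python sketch. `difflib.SequenceMatcher` stands in for the
trigram matcher, `fake_semantic_search` is a placeholder for a real
embedding lookup (for example a pgvector query), and the 0.6 threshold
is an arbitrary assumption rather than a value from the article.

```python
from difflib import SequenceMatcher
from typing import Callable, Optional


def fuzzy_best(query: str, titles: list[str]) -> tuple[Optional[str], float]:
    """Return the title most similar to the query and its score in [0, 1]."""
    best, best_score = None, 0.0
    for title in titles:
        score = SequenceMatcher(None, query.lower(), title.lower()).ratio()
        if score > best_score:
            best, best_score = title, score
    return best, best_score


def search(
    query: str,
    titles: list[str],
    semantic_search: Callable[[str], str],
    threshold: float = 0.6,
) -> tuple[str, str]:
    """Try the cheap fuzzy match first; call the embedding path only on a miss."""
    title, score = fuzzy_best(query, titles)
    if title is not None and score >= threshold:
        return title, "fuzzy"
    return semantic_search(query), "semantic"


def fake_semantic_search(query: str) -> str:
    # Placeholder: a real version would embed the query and run a vector search.
    return "The Dark Side of the Moon"


titles = ["Abbey Road", "OK Computer", "The Dark Side of the Moon"]
print(search("abbey rd", titles, fake_semantic_search))
```

The embedding call is only reached when no title clears the threshold,
so the threshold decides how often the more expensive path runs.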
        
         | TurdF3rguson wrote:
         | Those text embeddings are dirt cheap. You can do around 1M
         | titles on the cloudflare embedding model I used last time
         | without exceeding daily free tier.
        
           | augusteo wrote:
           | yeah exactly. even OpenAI/Gemini are really cheap too
        
       ___________________________________________________________________
       (page generated 2026-01-27 10:01 UTC)