[HN Gopher] Show HN: I wrote a full text search engine in Go
       ___________________________________________________________________
        
       Show HN: I wrote a full text search engine in Go
        
       Author : novocayn
       Score  : 62 points
       Date   : 2025-10-09 17:09 UTC (5 hours ago)
        
        
       | kdawkins wrote:
       | This is very cool! Your readme is interesting and well written - I
       | didn't know I could be so interested in the internals of a full
       | text search engine :)
       | 
       | What was the motivation to kick this project off? Learning or are
       | you using it somehow?
        
         | novocayn wrote:
         | I'm learning the internals of FTS engines while building a
         | vector database from scratch. Needed a solid FTS index, so I
         | built one myself :)
         | 
         | It ended up being a clean, reusable component, so I decided to
         | carve it out into a standalone project
         | 
         | The README is mostly notes from my Notion pages, glad you found
         | it interesting!
        
           | n_u wrote:
           | What are you building a vector database from scratch for?
        
             | novocayn wrote:
             | Mostly wanted a refresher on GPU accelerated indexes and
             | Vector DB internals. And maybe along the way, build an easy
             | on-ramp for folks who want to understand how these work
             | under the hood
        
       | add-sub-mul-div wrote:
       | Why did you create this new account if there's already 3 existing
       | accounts promoting your stuff and only your stuff?
        
         | novocayn wrote:
         | Because running a three-account bot-net farm is fun :D Okay,
         | jk, please don't mod me out.
         | 
         | One's for browsing HN at work, the other's for home, and the
         | third one has a username I'm not too fond of.
         | 
         | I'll stick to this one :) I might have some karma on the older
         | ones, but honestly, HN is just as fun from everywhere
        
       | wolfgarbe wrote:
       | Great work! Would be interesting to see how it compares to Lucene
       | performance-wise, e.g. with a benchmark like
       | https://github.com/quickwit-oss/search-benchmark-game
        
         | novocayn wrote:
         | Thanks! Honestly, given it's hacked together in a weekend not
         | sure it'd measure up to Lucene/Bleve in any serious way.
         | 
         | I intended this to be an easy on-ramp for folks who want to get
         | a feel for how FTS engines work under the hood :)
        
           | llllm wrote:
           | Not _that_ long ago Bleve was also hacked together over a few
           | weekends.
           | 
           | I appreciate the technical depth of the readme, but I'm not
           | sure it fits your easy on-ramp framing.
           | 
           | Keep going and keep sharing.
        
       | n_u wrote:
       | Cool project!
       | 
       | I see you are using a positional index rather than doing bi-word
       | matching to support positional queries.
       | 
       | Positional indexes can be a lot larger than non-positional. What
       | is the ratio of the size of all documents to the size of the
       | positional inverted index?
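
For readers following the question above, here is a minimal sketch in Go of what a positional inverted index stores and how a phrase query can use it. The names (PositionalIndex, HasPhrase, PositionCount) are hypothetical and not taken from the project, and the tokenizer is a deliberately naive lowercase-and-split. The sketch keeps one position per token occurrence, so the index grows with the total token count of the corpus, which is why the size ratio is worth asking about.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"unicode"
)

// PositionalIndex maps term -> document ID -> ascending token positions.
// Hypothetical type for illustration; not the project's actual layout.
type PositionalIndex map[string]map[int][]int

// tokenize lowercases the text and splits on anything that is not a
// letter or digit. Real engines use far more careful tokenizers.
func tokenize(text string) []string {
	return strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// Add records every token position of text under docID.
func (idx PositionalIndex) Add(docID int, text string) {
	for pos, term := range tokenize(text) {
		if idx[term] == nil {
			idx[term] = make(map[int][]int)
		}
		idx[term][docID] = append(idx[term][docID], pos)
	}
}

// contains reports whether x is present in an ascending slice.
func contains(sorted []int, x int) bool {
	i := sort.SearchInts(sorted, x)
	return i < len(sorted) && sorted[i] == x
}

// HasPhrase reports whether the phrase's tokens occur at consecutive
// positions in docID, using only the index (no document text needed).
func (idx PositionalIndex) HasPhrase(docID int, phrase string) bool {
	terms := tokenize(phrase)
	if len(terms) == 0 {
		return false
	}
	for _, start := range idx[terms[0]][docID] {
		matched := true
		for offset, term := range terms {
			if !contains(idx[term][docID], start+offset) {
				matched = false
				break
			}
		}
		if matched {
			return true
		}
	}
	return false
}

// PositionCount returns the total number of stored positions, a rough
// proxy for how much larger this is than a non-positional index.
func (idx PositionalIndex) PositionCount() int {
	n := 0
	for _, docs := range idx {
		for _, positions := range docs {
			n += len(positions)
		}
	}
	return n
}

func main() {
	idx := PositionalIndex{}
	idx.Add(1, "the quick brown fox")
	idx.Add(2, "brown quick fox")
	fmt.Println(idx.HasPhrase(1, "quick brown"))
	fmt.Println(idx.HasPhrase(2, "quick brown"))
	fmt.Println(idx.PositionCount())
}
```

A non-positional index would store only the document IDs per term; the per-occurrence positions are the extra cost being asked about.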
        
         | novocayn wrote:
         | Observation is spot on. Biword matching would definitely ease
         | this. Stealing bi-word matching for a future iteration, tysm :D
        
           | n_u wrote:
           | Well bi-word matching requires that you still have all of the
           | documents stored to verify the full phrase occurs in the
           | document rather than just the bi-words. So it isn't always
           | better.
           | 
           | For example the phrase query "United States of America"
           | doesn't occur in the document "The United States is named
           | after states of the North American continent. The capital of
           | America is Washington DC". But "United States", "states of"
           | and "of America" all appear in it.
           | 
           | There's a tradeoff because we still have to fetch the full
           | document text (or some positional structure) for the
           | filtered-down candidate documents containing all of the
           | bi-word pairs. So it requires a second stage of disk I/O. But
           | as I understand it, most practitioners assume you can get away
           | with fewer IOPS vs. a positional index, since that info only
           | has to be fetched for a much smaller filtered-down candidate
           | set rather than for the whole posting list.
           | 
           | But that's why I was curious about the storage ratio of your
           | positional index.
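
The "United States of America" example above can be sketched in Go as a two-stage check. This is an illustration under assumptions, not the project's code: BiwordIndex, IsCandidate, and containsPhrase are hypothetical names, and the tokenizer is a naive lowercase-and-split. Stage one consults only the bi-word index; stage two verifies the phrase against the stored document text.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// tokenize lowercases the text and splits on non-letter, non-digit runes.
func tokenize(text string) []string {
	return strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// biwords returns every adjacent token pair joined by a single space.
func biwords(tokens []string) []string {
	pairs := make([]string, 0, len(tokens))
	for i := 0; i+1 < len(tokens); i++ {
		pairs = append(pairs, tokens[i]+" "+tokens[i+1])
	}
	return pairs
}

// BiwordIndex maps a bi-word to the set of documents containing it.
// Hypothetical type for illustration only.
type BiwordIndex map[string]map[int]bool

// Add indexes every bi-word of text under docID.
func (idx BiwordIndex) Add(docID int, text string) {
	for _, pair := range biwords(tokenize(text)) {
		if idx[pair] == nil {
			idx[pair] = make(map[int]bool)
		}
		idx[pair][docID] = true
	}
}

// IsCandidate is stage one: docID qualifies if it contains every
// bi-word of the phrase, wherever those bi-words occur.
func (idx BiwordIndex) IsCandidate(docID int, phrase string) bool {
	pairs := biwords(tokenize(phrase))
	if len(pairs) == 0 {
		return false
	}
	for _, pair := range pairs {
		if !idx[pair][docID] {
			return false
		}
	}
	return true
}

// containsPhrase is stage two: scan the stored document text for the
// phrase tokens in consecutive order.
func containsPhrase(text, phrase string) bool {
	doc, want := tokenize(text), tokenize(phrase)
	if len(want) == 0 {
		return false
	}
	for i := 0; i+len(want) <= len(doc); i++ {
		match := true
		for j := range want {
			if doc[i+j] != want[j] {
				match = false
				break
			}
		}
		if match {
			return true
		}
	}
	return false
}

func main() {
	doc := "The United States is named after states of the North American " +
		"continent. The capital of America is Washington DC"
	idx := BiwordIndex{}
	idx.Add(1, doc)

	phrase := "United States of America"
	fmt.Println("candidate:", idx.IsCandidate(1, phrase)) // candidate: true
	fmt.Println("verified:", containsPhrase(doc, phrase)) // verified: false
}
```

The document passes stage one because "united states", "states of", and "of america" each appear somewhere in it, and it is rejected only once the text itself is checked. That second lookup is the extra I/O described above.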
        
       | eudoxus wrote:
       | Would love to hear how this compares to another popular go based
       | full text search engine (with a not _too_ dissimilar name)
       | https://github.com/blevesearch/bleve?
        
         | novocayn wrote:
         | Bleve is an absolute beast! Built with <3 at Couchbase. Fun
         | fact: the folks who maintain it sit right across from me at
         | work.
        
       | Copenjin wrote:
       | Did you vibe code this? A few things here and there are a bit of
       | a giveaway imho.
        
         | fatty_patty89 wrote:
         | What makes you think so?
        
           | niux wrote:
           | Probably the commit history.
        
             | novocayn wrote:
             | Yayiee, the "can't prove it" Doakes Dexter meme, making it
             | to HN
        
         | novocayn wrote:
         | On my way to make a Dexter meme on this
         | 
         | When you think OP vibe-coded the project but can't prove it yet
         | 
         | https://x.com/FG_Artist/status/1974267168855392371
        
         | haute_cuisine wrote:
         | I put Overview section from the Readme into an AI content
         | detector and it says 92% AI. Some comment blocks inside
         | codebase are rated as 100% AI generated.
        
           | novocayn wrote:
           | Claude: "You're absolutely right" :D
        
           | novocayn wrote:
           | > comment blocks inside codebase
           | 
           | Is vibe-commented a thing yet? :D
           | 
           | Wanted to give fellow readers a good on-ramp for
           | understanding the FTS internals. Figured leaning into
           | readability wouldn't hurt
           | 
           | For me this makes the structure super easy to grok at a
           | glance
           | 
           | https://github.com/wizenheimer/blaze/blob/27d6f9b3cd228f5865.
           | ..
           | 
           | That said, totally fair read on the comments. Curious if they
           | helped/landed the way I intended, or if a multi-part blog
           | series would've worked better :)
        
         | ge96 wrote:
         | Another possible tell (not saying this is vibe coded) is when
         | every function is documented, almost too many comments
        
           | novocayn wrote:
           | Ohh, I thought that inline comments would make it grokkable
           | and be a low-friction way in. Seems this didn't land the way
           | I intended :'
           | 
           | Would a multi-part blog have been better?
        
       | oldgregg wrote:
       | looks great! would love to see benchmark with bleve and a
       | lightweight vector implementation.
        
         | novocayn wrote:
         | tysm, would try pairing it with HNSW and IVF, halfway through
         | :)
        
       | Xeoncross wrote:
       | I really liked the README, that was a good use of AI.
       | 
       | If you're interested in the idea of writing a database, I
       | recommend you check out
       | https://github.com/thomasjungblut/go-sstables which includes
       | sstables, a skiplist, a recordio format and other database
       | building blocks like a write-ahead log.
       | 
       | Also https://github.com/BurntSushi/fst which has a great blog
       | post explaining its compression (and has been ported to Go),
       | which is really helpful for autocomplete/typeahead when
       | recommending searches to users or doing spelling correction for
       | search inputs.
        
         | novocayn wrote:
         | tysm, i love this, FST is vv cool
        
       | mwsherman wrote:
       | Shameless plug, you may wish to do Lucene-style tokenizing using
       | the Unicode standard:
       | https://github.com/clipperhouse/uax29/tree/master/words
        
         | novocayn wrote:
         | Got to admit, initial impressions, this is pretty neat, will
         | spend some time with this. Thanks for the link :)
        
       ___________________________________________________________________
       (page generated 2025-10-09 23:00 UTC)