[HN Gopher] Show HN: I wrote a full text search engine in Go
___________________________________________________________________
Author : novocayn
Score : 62 points
Date : 2025-10-09 17:09 UTC (5 hours ago)
(HTM) web link (github.com)
| kdawkins wrote:
| This is very cool! Your README is interesting and well written - I
| didn't know I could be so interested in the internals of a full
| text search engine :)
|
| What was the motivation to kick this project off? Learning or are
| you using it somehow?
| novocayn wrote:
| I'm learning the internals of FTS engines while building a
| vector database from scratch. Needed a solid FTS index, so I
| built one myself :)
|
| It ended up being a clean, reusable component, so I decided to
| carve it out into a standalone project
|
| The README is mostly notes from my Notion pages, glad you found
| it interesting!
| n_u wrote:
| What are you building a vector database from scratch for?
| novocayn wrote:
| Mostly wanted a refresher on GPU-accelerated indexes and
| Vector DB internals. And maybe along the way, build an easy
| on-ramp for folks who want to understand how these work
| under the hood
| add-sub-mul-div wrote:
| Why did you create this new account if there are already three
| existing accounts promoting your stuff and only your stuff?
| novocayn wrote:
| Because running a three-account botnet farm is fun :D Okay,
| jk, please don't mod me out.
|
| One's for browsing HN at work, the other's for home, and the
| third one has a username I'm not too fond of.
|
| I'll stick to this one :) I might have some karma on the older
| ones, but honestly, HN is just as fun from everywhere
| wolfgarbe wrote:
| Great work! Would be interesting to see how it compares to Lucene
| performance-wise, e.g. with a benchmark like
| https://github.com/quickwit-oss/search-benchmark-game
| novocayn wrote:
| Thanks! Honestly, given it was hacked together in a weekend,
| I'm not sure it'd measure up to Lucene/Bleve in any serious way.
|
| I intended this to be an easy on-ramp for folks who want to get
| a feel for how FTS engines work under the hood :)
| llllm wrote:
| Not _that_ long ago Bleve was also hacked together over a few
| weekends.
|
| I appreciate the technical depth of the readme, but I'm not
| sure it fits your easy on-ramp framing.
|
| Keep going and keep sharing.
| n_u wrote:
| Cool project!
|
| I see you are using a positional index rather than doing bi-word
| matching to support positional queries.
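For readers new to the idea, a positional index stores, for each term, the documents it appears in and the token positions within each document; a phrase query then checks that the phrase terms sit at consecutive positions. The Go sketch below is a minimal in-memory illustration under stated assumptions: the names PositionalIndex, Add, PhraseQuery and the naive letter/digit tokenizer are hypothetical and are not taken from the project's code.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"unicode"
)

// PositionalIndex maps term -> docID -> sorted token positions.
// Hypothetical in-memory layout for illustration only.
type PositionalIndex map[string]map[int][]int

// tokenize lowercases and splits on any rune that is not a letter or digit.
func tokenize(text string) []string {
	return strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// Add records every token of text with its position. Positions are appended
// in increasing order, so each list stays sorted (assumes each docID is added once).
func (idx PositionalIndex) Add(docID int, text string) {
	for pos, term := range tokenize(text) {
		if idx[term] == nil {
			idx[term] = map[int][]int{}
		}
		idx[term][docID] = append(idx[term][docID], pos)
	}
}

// hasPos reports whether the sorted positions contain p, via binary search.
func hasPos(positions []int, p int) bool {
	i := sort.SearchInts(positions, p)
	return i < len(positions) && positions[i] == p
}

// PhraseQuery returns the sorted IDs of documents in which the phrase
// terms occur at consecutive positions.
func (idx PositionalIndex) PhraseQuery(phrase string) []int {
	terms := tokenize(phrase)
	if len(terms) == 0 {
		return nil
	}
	var hits []int
	for docID, starts := range idx[terms[0]] {
		for _, start := range starts {
			matched := true
			for i := 1; i < len(terms); i++ {
				if !hasPos(idx[terms[i]][docID], start+i) {
					matched = false
					break
				}
			}
			if matched {
				hits = append(hits, docID)
				break
			}
		}
	}
	sort.Ints(hits)
	return hits
}

func main() {
	idx := PositionalIndex{}
	idx.Add(1, "The United States is named after states of the North American continent. The capital of America is Washington DC")
	idx.Add(2, "The United States of America declared independence in 1776")
	fmt.Println(idx.PhraseQuery("United States of America")) // [2]
}
```

Every occurrence of every term carries a position here, which is exactly why positional postings grow larger than docID-only postings.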
|
| Positional indexes can be a lot larger than non-positional. What
| is the ratio of the size of all documents to the size of the
| positional inverted index?
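One rough way to estimate the ratio being asked about is to encode the postings the way many engines do (delta-encoded integers packed as variable-length varints) and divide by the raw text size. The sketch below assumes that encoding, a whitespace tokenizer, and an uncompressed term dictionary; these are illustrative choices, not the project's actual on-disk format, and the numbers it prints depend entirely on the toy corpus.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"sort"
	"strings"
)

// postings maps term -> docID -> sorted positions (hypothetical layout).
type postings map[string]map[int][]int

// build indexes docs with a simple lowercase + whitespace tokenizer.
func build(docs []string) postings {
	p := postings{}
	for id, doc := range docs {
		for pos, term := range strings.Fields(strings.ToLower(doc)) {
			if p[term] == nil {
				p[term] = map[int][]int{}
			}
			p[term][id] = append(p[term][id], pos)
		}
	}
	return p
}

// indexBytes estimates the encoded index size: raw term bytes for the
// dictionary plus, per posting, a delta-encoded uvarint docID and, when
// withPositions is set, a uvarint count followed by delta-encoded positions.
func indexBytes(p postings, withPositions bool) int {
	total := 0
	for term, byDoc := range p {
		total += len(term)
		ids := make([]int, 0, len(byDoc))
		for id := range byDoc {
			ids = append(ids, id)
		}
		sort.Ints(ids)
		var buf []byte
		prevID := 0
		for _, id := range ids {
			buf = binary.AppendUvarint(buf, uint64(id-prevID))
			prevID = id
			if withPositions {
				positions := byDoc[id]
				buf = binary.AppendUvarint(buf, uint64(len(positions)))
				prevPos := 0
				for _, pos := range positions {
					buf = binary.AppendUvarint(buf, uint64(pos-prevPos))
					prevPos = pos
				}
			}
		}
		total += len(buf)
	}
	return total
}

func main() {
	docs := []string{
		"the united states of america declared independence",
		"the capital of america is washington dc",
	}
	raw := 0
	for _, d := range docs {
		raw += len(d)
	}
	p := build(docs)
	fmt.Printf("non-positional index / text: %.2f\n", float64(indexBytes(p, false))/float64(raw))
	fmt.Printf("positional index / text:     %.2f\n", float64(indexBytes(p, true))/float64(raw))
}
```

Comparing the two ratios on a real corpus, rather than this toy one, is what would answer the question meaningfully.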
| novocayn wrote:
| Spot-on observation. Bi-word matching would definitely ease
| this. Stealing bi-word matching for a future iteration, tysm :D
| n_u wrote:
| Well, bi-word matching requires that you still have all of the
| documents stored to verify the full phrase occurs in the
| document rather than just the bi-words. So it isn't always
| better.
|
| For example the phrase query "United States of America"
| doesn't occur in the document "The United States is named
| after states of the North American continent. The capital of
| America is Washington DC". But "United States", "states of"
| and "of America" all appear in it.
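The false positive described above can be reproduced with a toy bi-word index: every adjacent word pair maps to a set of document IDs, and a phrase query intersects the sets for all of its pairs. This is a minimal sketch; BiwordIndex, Add, and Candidates are hypothetical names, and no positions are stored, which is the whole point of the comparison.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"unicode"
)

// tokenize lowercases and splits on any rune that is not a letter or digit.
func tokenize(text string) []string {
	return strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// BiwordIndex maps "w1 w2" -> set of docIDs (hypothetical, no positions).
type BiwordIndex map[string]map[int]bool

// Add indexes every adjacent pair of tokens in text.
func (idx BiwordIndex) Add(docID int, text string) {
	toks := tokenize(text)
	for i := 0; i+1 < len(toks); i++ {
		key := toks[i] + " " + toks[i+1]
		if idx[key] == nil {
			idx[key] = map[int]bool{}
		}
		idx[key][docID] = true
	}
}

// Candidates intersects the doc sets of every bi-word in the phrase.
// The result may contain false positives; it is only a filter.
func (idx BiwordIndex) Candidates(phrase string) []int {
	toks := tokenize(phrase)
	if len(toks) < 2 {
		return nil // single-term queries would go to a regular term index
	}
	var result map[int]bool
	for i := 0; i+1 < len(toks); i++ {
		docs := idx[toks[i]+" "+toks[i+1]]
		if result == nil {
			result = map[int]bool{}
			for d := range docs {
				result[d] = true
			}
			continue
		}
		for d := range result {
			if !docs[d] {
				delete(result, d)
			}
		}
	}
	out := make([]int, 0, len(result))
	for d := range result {
		out = append(out, d)
	}
	sort.Ints(out)
	return out
}

func main() {
	idx := BiwordIndex{}
	idx.Add(1, "The United States is named after states of the North American continent. The capital of America is Washington DC")
	idx.Add(2, "The United States of America declared independence in 1776")
	fmt.Println(idx.Candidates("United States of America")) // [1 2]: doc 1 is a false positive
}
```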
|
| There's a tradeoff because we still have to fetch the full
| document text (or some positional structure) for the
| filtered-down candidate documents containing all of the bi-word
| pairs. So it requires a second stage of disk I/O. But as I
| understand it, most practitioners assume you can get away with
| fewer IOPS than with a positional index, since that info only has
| to be fetched for the much smaller filtered-down candidate set
| rather than for the whole posting list.
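The second stage described above can be sketched as a verification pass: take the candidate IDs produced by a bi-word filter, fetch each candidate's stored text, and keep only documents where the phrase tokens occur contiguously. In this minimal sketch a map stands in for the document store, so the "disk I/O" is just a lookup; Verify and containsPhrase are hypothetical names, and the candidate list is hard-coded as if a bi-word filter had returned it.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// tokenize lowercases and splits on any rune that is not a letter or digit.
func tokenize(text string) []string {
	return strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// containsPhrase reports whether phraseToks occur contiguously in docToks.
func containsPhrase(docToks, phraseToks []string) bool {
	if len(phraseToks) == 0 {
		return false
	}
outer:
	for i := 0; i+len(phraseToks) <= len(docToks); i++ {
		for j, t := range phraseToks {
			if docToks[i+j] != t {
				continue outer
			}
		}
		return true
	}
	return false
}

// Verify keeps only the candidates whose stored text really contains the
// phrase. Only the candidate set is fetched, never the whole corpus.
func Verify(store map[int]string, candidates []int, phrase string) []int {
	pt := tokenize(phrase)
	var out []int
	for _, id := range candidates {
		text, ok := store[id] // stand-in for the second-stage disk read
		if ok && containsPhrase(tokenize(text), pt) {
			out = append(out, id)
		}
	}
	return out
}

func main() {
	store := map[int]string{
		1: "The United States is named after states of the North American continent. The capital of America is Washington DC",
		2: "The United States of America declared independence in 1776",
	}
	candidates := []int{1, 2} // as if returned by a bi-word filter
	fmt.Println(Verify(store, candidates, "United States of America")) // [2]
}
```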
|
| But that's why I was curious about the storage ratio of your
| positional index.
| eudoxus wrote:
| Would love to hear how this compares to another popular Go-based
| full text search engine (with a not _too_ dissimilar name)
| https://github.com/blevesearch/bleve?
| novocayn wrote:
| Bleve is an absolute beast! Built with <3 at Couchbase. Fun
| fact: the folks who maintain it sit right across from me at
| work
| Copenjin wrote:
| Did you vibe code this? A few things here and there are a bit of
| a giveaway imho.
| fatty_patty89 wrote:
| What makes you think so?
| niux wrote:
| Probably the commit history.
| novocayn wrote:
| Yayiee, the "can't prove it" Doakes Dexter meme, making it
| to HN
| novocayn wrote:
| On my way to make a Dexter meme on this
|
| When you think OP vibe-coded the project but can't prove it yet
|
| https://x.com/FG_Artist/status/1974267168855392371
| haute_cuisine wrote:
| I put the Overview section from the README into an AI content
| detector and it says 92% AI. Some comment blocks inside the
| codebase are rated as 100% AI-generated.
| novocayn wrote:
| Claude: "You're absolutely right" :D
| novocayn wrote:
| > comment blocks inside codebase
|
| Is vibe-commented a thing yet? :D
|
| Wanted to give fellow readers a good on-ramp for
| understanding the FTS internals. Figured leaning into
| readability wouldn't hurt
|
| For me this makes the structure super easy to grok at a
| glance
|
| https://github.com/wizenheimer/blaze/blob/27d6f9b3cd228f5865...
|
| That said, totally fair read on the comments. Curious whether
| they landed the way I intended, or if a multi-part blog series
| would've worked better :)
| ge96 wrote:
| Another possible tell (not saying this is vibe coded) is when
| every function is documented, with almost too many comments
| novocayn wrote:
| Ohh, I thought that inline comments would make it grokkable
| and be a low-friction way in. Seems this didn't land the way
| I intended :'
|
| Would a multi-part blog have been better?
| oldgregg wrote:
| looks great! would love to see a benchmark with Bleve and a
| lightweight vector implementation.
| novocayn wrote:
| tysm, I'll try pairing it with HNSW and IVF, halfway through
| :)
| Xeoncross wrote:
| I really liked the README, that was a good use of AI.
|
| If you're interested in the idea of writing a database, I
| recommend you check out https://github.com/thomasjungblut/go-sstables
| which includes sstables, a skiplist, a recordio format and other
| database building blocks like a write-ahead log.
|
| Also https://github.com/BurntSushi/fst which has a great blog
| post explaining its compression (and has been ported to Go), which
| is really helpful for autocomplete/typeahead when recommending
| searches to users or doing spelling correction for search inputs.
| novocayn wrote:
| tysm, i love this, FST is vv cool
| mwsherman wrote:
| Shameless plug, you may wish to do Lucene-style tokenizing using
| the Unicode standard:
| https://github.com/clipperhouse/uax29/tree/master/words
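For contrast, here is the kind of naive tokenizer many toy engines start with: split on anything that is not a letter or digit. It is shown only as a baseline and does not use the linked package. Under the UAX #29 word-boundary rules, "can't" and "3.14" each stay a single word, whereas this baseline breaks them into "can"/"t" and "3"/"14". The function name naiveTokenize is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// naiveTokenize splits on any rune that is not a letter or digit, then
// lowercases each token. A simplistic baseline, NOT UAX #29 segmentation.
func naiveTokenize(text string) []string {
	fields := strings.FieldsFunc(text, func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	for i, f := range fields {
		fields[i] = strings.ToLower(f)
	}
	return fields
}

func main() {
	// Splits "can't" and "3.14", which UAX #29 would keep whole.
	fmt.Printf("%q\n", naiveTokenize("Can't round 3.14, naïvely!"))
}
```

Unicode-aware letter checks keep "naïvely" intact, but the splitting rules themselves are where a standards-based segmenter earns its keep.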
| novocayn wrote:
| Got to admit, initial impressions: this is pretty neat. I'll
| spend some time with this. Thanks for the link :)
___________________________________________________________________
(page generated 2025-10-09 23:00 UTC)