[HN Gopher] Building a custom code search index in Go
___________________________________________________________________
Building a custom code search index in Go
Author : boyter
Score : 74 points
Date : 2022-11-23 10:17 UTC (12 hours ago)
(HTM) web link (boyter.org)
(TXT) w3m dump (boyter.org)
| anxiously wrote:
| Is fitting the index in RAM really that important? Obviously it
| is fast, but if you can get away with storing it on a fast disk
| like an NVMe Gen4 drive then why not?
| GaryNumanVevo wrote:
| Extremely important: search indexes are cache-optimized. Live-
| update indexes even more so.
| alecthomas wrote:
| Great read, but a basic search yields zero results:
|
| https://searchcode.com/?q=kong.New+lang%3Ago
|
| Same search on GitHub:
|
| https://github.com/search?type=code&q=kong.New
|
| Edit: there also doesn't seem to be any ranking at all, such as
| exact word matches being boosted
| alecthomas wrote:
| > But let me know where it's not doing what you expect and I'll
| fix it.
|
| I would expect > 0 results for the above search
|
| Same search on SourceGraph:
| https://sourcegraph.com/search?q=context:global+kong.New&pat...
| boyter wrote:
| Ah I see a vanity search! The ones that always cause issues
| :)
|
| It's just down to it not being in the index, I shall ensure I
| add it just for you based on this.
|
| Done. That repository https://github.com/alecthomas/kong will
| get picked up when I kick off the indexing again (sometime
| next week once all the activity dies down)
| boyter wrote:
| Probably because I don't prioritise GitHub anymore. Their own
| search is great, but it might get picked up eventually.
|
| There is ranking, first by a pre-ranked popularity of the
| repository and secondly by tf-idf of the trigrams. It's
| weighted towards longer matches in the display as well.
|
| But let me know where it's not doing what you expect and I'll
| fix it.
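
The two-level ranking described in the comment above (repository popularity first, then tf-idf over trigrams) could look roughly like the Go sketch below. This is not searchcode's actual code: the `Doc` type, the `Popularity` field, the `rank` and `tfidf` helpers, and the exact tf-idf formula are all assumptions made for illustration, and the display weighting towards longer matches is not modelled.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Doc is a hypothetical indexed file. Popularity stands in for the
// pre-computed repository rank mentioned in the thread.
type Doc struct {
	Name       string
	Popularity int
	Content    string
}

// trigrams returns every overlapping 3-byte window of s.
func trigrams(s string) []string {
	if len(s) < 3 {
		return nil
	}
	out := make([]string, 0, len(s)-2)
	for i := 0; i+3 <= len(s); i++ {
		out = append(out, s[i:i+3])
	}
	return out
}

// tfidf sums tf*idf over the query trigrams found in doc.
// df maps a trigram to the number of documents containing it; n is the corpus size.
// The smoothing log(1 + n/df) is an assumed variant, chosen so idf stays positive.
func tfidf(query []string, doc Doc, df map[string]int, n int) float64 {
	grams := trigrams(doc.Content)
	if len(grams) == 0 {
		return 0
	}
	counts := map[string]int{}
	for _, g := range grams {
		counts[g]++
	}
	score := 0.0
	for _, q := range query {
		if counts[q] == 0 {
			continue
		}
		tf := float64(counts[q]) / float64(len(grams))
		idf := math.Log(1 + float64(n)/float64(df[q]))
		score += tf * idf
	}
	return score
}

// rank keeps documents sharing at least one trigram with the query and
// orders them by popularity first, tf-idf score second.
func rank(query string, docs []Doc) []Doc {
	df := map[string]int{}
	for _, d := range docs {
		seen := map[string]bool{}
		for _, g := range trigrams(d.Content) {
			if !seen[g] {
				seen[g] = true
				df[g]++
			}
		}
	}
	type scored struct {
		doc   Doc
		score float64
	}
	var hits []scored
	q := trigrams(query)
	for _, d := range docs {
		if s := tfidf(q, d, df, len(docs)); s > 0 {
			hits = append(hits, scored{d, s})
		}
	}
	sort.SliceStable(hits, func(i, j int) bool {
		if hits[i].doc.Popularity != hits[j].doc.Popularity {
			return hits[i].doc.Popularity > hits[j].doc.Popularity
		}
		return hits[i].score > hits[j].score
	})
	out := make([]Doc, len(hits))
	for i, h := range hits {
		out[i] = h.doc
	}
	return out
}

func main() {
	docs := []Doc{
		{"a.go", 10, "kong.New()"},
		{"b.go", 10, "kong.New kong.New"},
		{"c.go", 50, "x := kong.New"},
		{"d.go", 99, "unrelated"},
	}
	for _, d := range rank("kong.New", docs) {
		fmt.Println(d.Name, d.Popularity)
	}
}
```

In this toy corpus the popular but non-matching file is dropped, the match from the more popular repository comes first, and tf-idf only breaks the tie between the two equally popular files. A real index would look trigrams up in posting lists rather than rescanning every document per query.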
| hu3 wrote:
| Wow the search is screamingly fast. And it's custom made! I
| enjoyed the writing. Thanks for taking the time to condense the
| knowledge involved in words.
| boyter wrote:
| I really did try to make it as fast as I could. Always happy to
| write it down too.
| encryptluks2 wrote:
| Congratulations to the author. It seems like they have excelled
| at a very fast pace from using third party solutions to building
| their own in a short time. I look forward to their progress and
| seeing where this goes, to maybe becoming an excellent open
| source copilot alternative.
| boyter wrote:
| I don't know about a fast pace, but I did have fun with it!
| AlchemistCamp wrote:
| Very cool to see this here, Ben! It was fun hearing the ins and
| outs of your work on this in the TZ discord, and the final result
| is _fast_.
|
| Also, off-topic but as you know, I recently tried out your scc
| tool and am eagerly awaiting its support for Elixir templates
| (.eex, .heex)! You said it was a day from done a while back and
| would go out in the next release. What's the release schedule
| like?
|
| https://github.com/boyter/scc
| boyter wrote:
| It's actually sitting on my hdd. Just need to finish it off. I
| got sidetracked.
| Darkskiez wrote:
| See also https://codesearch.debian.net/ -
| https://github.com/Debian/dcs for a similar project that may fit
| your needs better. I've not compared them both, but I use dcs
| frequently
| boyter wrote:
| The blog posts about it are great too. This one
| https://michael.stapelberg.ch/posts/2019-09-29-dcs-positiona...
| in particular.
___________________________________________________________________
(page generated 2022-11-23 23:01 UTC)