[HN Gopher] Show HN: Sourcebot, an open-source Sourcegraph alter...
___________________________________________________________________
Show HN: Sourcebot, an open-source Sourcegraph alternative
Hi HN,

We're Brendan and Michael, the creators of Sourcebot
(https://github.com/sourcebot-dev/sourcebot). Sourcebot is an
open-source code search tool that allows you to quickly search
across many large codebases. Check out our demo video here:
https://youtu.be/mrIFYSB_1F4, or try it for yourself on our demo
site here: https://demo.sourcebot.dev

In prior roles, we both felt the pain of searching across hundreds
of multi-million-line codebases. Local tools like grep were
ill-suited since you often only had a handful of codebases checked
out at a time. Sourcegraph (https://sourcegraph.com/) solves this
issue by indexing a collection of codebases in the background and
exposing a web-based search interface. It is the de facto search
solution for medium to large orgs, but is often cited as expensive
($49 per user / month) and recently went closed source
(https://news.ycombinator.com/item?id=41296481). That's why we
built Sourcebot.

We designed Sourcebot to be:

- Easily deployed: we provide a single, self-contained Docker image
  (https://github.com/sourcebot-dev/sourcebot/pkgs/container/so...).
- Fast & scalable: designed to minimize search times (current
  average is ~73ms) across many large repositories.
- Cross code-host support: we currently support syncing public &
  private repositories in GitHub and GitLab.
- Quality UI: we like to think that a good-looking dev tool is more
  pleasant to use.
- Open source: Sourcebot is free to use by anyone.

Under the hood, we use Zoekt (https://github.com/sourcegraph/zoekt)
as our code search engine, which was originally authored by Han-Wen
Nienhuys and is now maintained by Sourcegraph
(https://sourcegraph.com/blog/sourcegraph-accepting-zoekt-mai...).
Zoekt works by building a trigram index from the source code,
enabling extremely fast regular expression matching. Russ Cox has a
great article on how trigram indexes work if you're interested:
https://swtch.com/~rsc/regexp/regexp4.html

In the shorter term,
there are several improvements we want to make, like:

- Improving how we communicate indexing progress (this is currently
  non-existent, so it's not obvious how long things will take).
- UX improvements like search history, query syntax highlighting &
  suggestions, etc.
- Small QOL improvements like bookmarking code snippets.
- Support for more code hosts (e.g., BitBucket, SourceForge, ADO,
  etc.)

In the longer term, we want to investigate how we could go beyond
just traditional code search by leveraging machine learning to
enable experiences like semantic code search ("where is system X
located?") and code explanations ("how does system X interact with
system Y?"). You could think of this as a copilot being embedded
into Sourcebot. Our hunch is that this will be useful to devs,
especially when packaged with the traditional code search, but let
us know what you think.

Give it a try: https://github.com/sourcebot-dev/sourcebot. Cheers!
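
For readers curious about the trigram index idea mentioned in the
post, here is a minimal Python sketch of the general technique. It
is not Zoekt's or Sourcebot's actual implementation: the function
names (trigrams, build_index, search) and the sample files are
made up for illustration, and it only handles literal substring
queries. Turning a full regular expression into a trigram query is
the harder part, and is what the Russ Cox article covers.

```python
import re
from collections import defaultdict


def trigrams(text):
    """Return the set of all 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_index(docs):
    """Map each trigram to the set of document names that contain it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for gram in trigrams(text):
            index[gram].add(name)
    return index


def search(index, docs, literal):
    """Find documents containing `literal`, using the index as a prefilter."""
    grams = trigrams(literal)
    if not grams:
        # Query shorter than 3 chars: no trigrams to filter on, scan everything.
        candidates = set(docs)
    else:
        # A document can only match if it contains every trigram of the query.
        candidates = set.intersection(*(index.get(g, set()) for g in grams))
    # The prefilter can produce false positives, so confirm with a real match.
    pattern = re.compile(re.escape(literal))
    return sorted(name for name in candidates if pattern.search(docs[name]))


# Hypothetical sample "repository" contents, purely for illustration.
docs = {
    "a.go": "func fetchUser(id int) {}",
    "b.py": "def fetch_user(user_id): pass",
    "c.js": "function render() {}",
}
index = build_index(docs)
print(search(index, docs, "fetch"))  # ['a.go', 'b.py']
```

The point of the prefilter is that only documents sharing every
trigram with the query are ever scanned, so the expensive matching
step runs on a small candidate set rather than on every file.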
Author : bshzzle
Score : 122 points
Date : 2024-10-01 16:56 UTC (6 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| ashobeiri wrote:
| This is really exciting. Happy to see someone building an open
| source solution in this space
| j4coh wrote:
| Cool to see someone carrying on the dream after Sourcegraph lost
| their way.
| bastawhiz wrote:
| I haven't followed SG closely. Other than licensing, what have
| they done to fall out of favor?
| mattfat5 wrote:
| This is well done, thanks for the share.
| jmakov wrote:
| Can somebody share the use case of this? Why not just use your
| IDE?
| bshzzle wrote:
| Yeah, it's a fair question - an IDE is often more convenient when
| you have the code checked out locally. This becomes a pain when
| you work in an organization with potentially hundreds of
| repositories that you need to search across (e.g., an org stores
| their 100+ microservices in separate repos, and you need to
| find all places where they make a request to your service).
| metadaemon wrote:
| Finding examples of how others implement similar logic is my
| biggest use case for code searching, but since GitHub copied
| Sourcegraph, I don't have much of a need for these self-hosted
| solutions.
| eptcyka wrote:
| I cannot run Xcode on Linux, I cannot run Visual Studio on
| Linux, I might not have an IDE set up for the language that I
| want to inspect. Many reasons. Also, some languages practically
| require arbitrary code execution to make a build, which I'd
| much prefer to shove into an isolated VM.
| threecheese wrote:
| Regarding your response to "why not use an IDE?": do you have any
| other product-like use cases that interest you? The one you
| mention - search across many repositories - makes a lot of sense
| for organizations that have (for example) a GitHub Enterprise
| installation and want to investigate or make changes across
| multiple components. This is definitely relevant to me, and so I
| wonder what other cool things I can do with it?
| bshzzle wrote:
| I think in the immediate term, we would like to talk to as many
| people as we can that have this "search across many repos"
| problem such that we can dial in the core search experience.
|
| Looking beyond the immediate, I think there is a lot of fertile
| ground with respect to making engineering teams more efficient
| beyond just regular code search. Semantic code search for
| example is one of those features that I really wish I had when
| I was at my last job - would have made onboarding onto new
| codebases much easier.
|
| Would love to hear more about your use cases:
| brendan@sourcebot.dev
| planb wrote:
| Great work! Any plans to add Gitea/Forgejo (self-hosted) support?
| bshzzle wrote:
| Thanks! Yeah, we would definitely like to support more code
| hosts. If you have a sec, could you open an issue so we can
| track it?
| TavsiE9s wrote:
| Any plans for non-GitHub/GitLab integrations? Gitea/Gogs/etc.
| maybe?
| bshzzle wrote:
| Yes, definitely - mind opening an issue so we can track it?
| IshKebab wrote:
| Nice! Still not _quite_ as good as grep.app from an interface
| point of view. They have instant search-as-you-type results over
| all of GitHub.
|
| It's not open source but I use it all the time. Far superior to
| GitHub's search.
| richardw wrote:
| Anyone know how companies like this keep tabs on so many of
| the GitHub repos? I assume very distributed crawling/cloning.
| morgante wrote:
| Awesome to see another open source player in the space,
| especially after Sourcegraph went closed source.
|
| It looks like you're working on this full-time (and it's a lot of
| work to build great code search, as I know from working on my own
| product).
|
| What are your plans for monetizing / building a sustainable
| business without inevitably going closed source like Sourcegraph?
| bshzzle wrote:
| Currently, we don't have any plans of monetizing - the main
| focus for us right now is building something that people want
| to use :)
| asdev wrote:
| Sourcegraph is dead with the advent of LLMs and AI coding tools,
| right? GitHub cross-repo search is also not bad anymore.
| esafak wrote:
| Wrong. Unless you want to feed the LLM your entire codebase,
| which is usually infeasible, you need to be able to retrieve
| relevant context, which relies on understanding the codebase,
| as Sourcegraph does. Sourcegraph has a product that does
| precisely this, called Cody.
| zdw wrote:
| Does this make a copy of each repo on ingest?
|
| Can it work against in-place repos, for example if hosted on the
| same server as a code forge installation?
___________________________________________________________________
(page generated 2024-10-01 23:00 UTC)