[HN Gopher] Show HN: Sourcebot - Self-hosted Perplexity for your...
       ___________________________________________________________________
        
       Show HN: Sourcebot - Self-hosted Perplexity for your codebase
        
        Hi HN,

        We're Brendan and Michael, the creators of Sourcebot
        (https://www.sourcebot.dev/), a self-hosted code understanding tool
        for large codebases. We originally launched on HN 9 months ago with
        code search (https://news.ycombinator.com/item?id=41711032), and
        we're excited to share our newest feature: Ask Sourcebot.

        Ask Sourcebot is an agentic search tool that lets you ask complex
        questions about your entire codebase in natural language and
        returns a structured response with inline citations back to your
        code. Some types of questions you might ask:

        - "How does authentication work in this codebase? What library is
          being used? What providers can a user log in with?"
          (https://demo.sourcebot.dev/~/chat/cmdpjkrbw000bnn7s8of2dm11)

        - "When should I use channels vs. mutexes in Go? Find real usages
          of both and include them in your answer"
          (https://demo.sourcebot.dev/~/chat/cmdpiuqhu000bpg7s9hprio4w)

        - "How are shards laid out in memory in the Zoekt code search
          engine?"
          (https://demo.sourcebot.dev/~/chat/cmdm9nkck000bod7sqy7c1efb)

        - "How do I call C from Rust?"
          (https://demo.sourcebot.dev/~/chat/cmdpjy06g000pnn7ssf4nk60k)

        You can try it yourself on our demo site
        (https://demo.sourcebot.dev/~) or check out our demo video
        (https://youtu.be/olc2lyUeB-Q).

        How is this any different from existing tools like Cursor or
        Claude Code?
        - Sourcebot solely focuses on _code understanding_. We believe
          that, more than ever, the main bottleneck development teams face
          is not writing code, it's acquiring the necessary context to
          make quality changes that are cohesive within the wider
          codebase. This is true regardless of whether the author is a
          human or an LLM.

        - As opposed to being in your IDE or terminal, Sourcebot is a web
          app. This allows us to play to the strengths of the web: rich UX
          and ubiquitous access. We put a ton of work into taking the best
          parts of IDEs (code navigation, file explorer, syntax
          highlighting) and packaging them with a custom UX (rich Markdown
          rendering, inline citations, @ mentions) that is easily
          shareable between team members.

        - Sourcebot can maintain an up-to-date index of thousands of repos
          hosted on GitHub, GitLab, Bitbucket, Gerrit, and other hosts.
          This allows you to ask questions about repositories without
          checking them out locally, which is especially helpful when
          ramping up on unfamiliar parts of the codebase or working with
          systems that are typically spread across multiple repositories,
          e.g., microservices.

        - You can BYOK (Bring Your Own API Key) to any supported reasoning
          model. We currently support 11 different model providers (like
          Amazon Bedrock and Google Vertex), and plan to add more.

        - Sourcebot is self-hosted, fair source, and free to use.
        Under the hood, we expose our existing regular expression search,
        code navigation, and file reading APIs to an LLM as tool calls. We
        instruct the LLM via a system prompt to gather the necessary
        context through these tools to sufficiently answer the user's
        question, and then to provide a concise, structured response. This
        includes inline citations, which are just structured data that the
        LLM can embed in its response and that can then be identified on
        the client and rendered appropriately. We built this on some
        amazing libraries like the Vercel AI SDK v5, CodeMirror,
        react-markdown, and Slate.js, among others. (A rough sketch of
        this tool-calling loop is included at the end of this post.)

        This architecture is intentionally simple. We decided not to
        introduce any additional techniques like vector embeddings,
        multi-agent graphs, etc., since we wanted to push the limits of
        what we could do with what we had on hand. We plan on revisiting
        our approach as we get user feedback on what works (and what
        doesn't).

        We are really excited about pushing the envelope of code
        understanding. Give it a try:
        https://github.com/sourcebot-dev/sourcebot. Cheers!
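
        To make the tool-calling loop concrete, here is a minimal,
        illustrative sketch (not Sourcebot's actual code) of exposing
        search and file-reading tools to an LLM with the Vercel AI SDK v5.
        The searchRepos/readSourceFile helpers, the model choice, and the
        citation marker format are assumptions for illustration.

          // Illustrative sketch only -- not Sourcebot's implementation.
          import { streamText, tool, stepCountIs } from "ai";
          import { openai } from "@ai-sdk/openai"; // any BYOK provider
          import { z } from "zod";

          // Hypothetical stand-ins for Sourcebot's internal search and
          // file reading APIs.
          declare function searchRepos(query: string): Promise<string>;
          declare function readSourceFile(
            repo: string,
            path: string
          ): Promise<string>;

          const result = streamText({
            model: openai("gpt-4o"),
            system:
              "Answer questions about the indexed codebase. Gather " +
              "context with the provided tools before answering. Cite " +
              "code with markers like {{cite repo path startLine " +
              "endLine}} so the client can render inline citations.",
            prompt: "How does authentication work in this codebase?",
            tools: {
              searchCode: tool({
                description: "Regex search across indexed repositories",
                inputSchema: z.object({ query: z.string() }),
                execute: async ({ query }) => searchRepos(query),
              }),
              readFile: tool({
                description: "Read a file from an indexed repository",
                inputSchema: z.object({
                  repo: z.string(),
                  path: z.string(),
                }),
                execute: async ({ repo, path }) =>
                  readSourceFile(repo, path),
              }),
            },
            stopWhen: stepCountIs(10), // allow multiple tool-call steps
          });

          // The streamed Markdown can then be scanned on the client for
          // citation markers and rendered as links back to the code.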
        
       Author : bshzzle
       Score  : 57 points
       Date   : 2025-07-30 14:44 UTC (1 days ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | cuzinluver wrote:
       | Love that it's free to use
        
       | Alifatisk wrote:
       | I thought this had anything to do with Perplexity
        
         | bshzzle wrote:
          | We used Perplexity as a mental model since there is some
          | overlap, e.g., an LLM using search and citing its sources,
          | it's a web app, etc.
        
       | nkmnz wrote:
        | How does this compare to ingesting all your code into some RAG
        | tool and using that in a chat? I understand the citations part,
        | which is a cool feature indeed, but tools for graph-RAG in
        | particular, such as graphiti
        | (https://github.com/getzep/graphiti), can deliver so much more
        | information than the code repository alone, since a graph can
        | also store info about collaborators, infrastructure, metrics,
        | logs, etc.
        
         | bshzzle wrote:
          | You certainly could create an embedding of your code and then
          | hook it up to Open WebUI or an equivalent chat interface -
          | we've actually spoken to some teams that have rolled their own
          | custom solution like that!
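          |
          | For illustration, a rough sketch of that kind of DIY setup
          | (assuming the AI SDK's embed/embedMany/cosineSimilarity
          | helpers and an OpenAI embedding model; chunking and storage
          | are left out) might look like:
          |
          |   // Rough DIY sketch, not Sourcebot code: embed code chunks
          |   // once, then retrieve the most similar chunks per question.
          |   import { embed, embedMany, cosineSimilarity } from "ai";
          |   import { openai } from "@ai-sdk/openai";
          |
          |   const model =
          |     openai.textEmbeddingModel("text-embedding-3-small");
          |
          |   async function indexChunks(chunks: string[]) {
          |     const { embeddings } = await embedMany({
          |       model,
          |       values: chunks,
          |     });
          |     return chunks.map((text, i) => ({
          |       text,
          |       embedding: embeddings[i],
          |     }));
          |   }
          |
          |   async function topMatches(
          |     question: string,
          |     index: { text: string; embedding: number[] }[],
          |     k = 5
          |   ) {
          |     const { embedding } = await embed({ model, value: question });
          |     return index
          |       .map((c) => ({
          |         ...c,
          |         score: cosineSimilarity(embedding, c.embedding),
          |       }))
          |       .sort((a, b) => b.score - a.score)
          |       .slice(0, k);
          |   }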
         | 
         | From a product POV: our main focus with Sourcebot is providing
         | a world-class DX and UX so that it is really easy to use.
         | Practically speaking, for DX: a sys-admin should be able to
         | throw Sourcebot up into their cluster in minutes with minimal
         | maintenance overhead. For UX: provide a snappy interface that
         | is minimal and gets out of your way.
         | 
         | From a technology POV: vector embeddings (and techniques like
         | graph-RAG) are definitely something we are going to investigate
         | as a means of improving the agent's ability to find relevant
         | context fast. Bringing in additional context sources (like git
         | history, logs, GitHub issues, etc.) is also something we plan
         | to investigate. It's a really fascinating problem :)
        
       | drcongo wrote:
        | This looks pretty neat. Just spotted in the docs that it has an
        | MCP server too; however, I haven't found anything in the docs
       | about using a locally hosted model. Running this on a box in the
       | corner of the office would be great, but external AI providers
       | would be a deal breaker.
        
         | bshzzle wrote:
         | Running Sourcebot with a self-hosted LLM is something we plan
         | to support and have documented in the golden path very soon, so
         | stay tuned.
         | 
          | We are using the Vercel AI SDK, which supports Ollama via a
          | community provider, but that provider doesn't support v5 yet
          | (which is what Sourcebot is on):
          | https://v5.ai-sdk.dev/providers/community-providers/ollama
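          |
          | In the meantime, a rough sketch of a possible workaround
          | (assuming Ollama's OpenAI-compatible endpoint and the
          | @ai-sdk/openai-compatible package; the model name and port
          | are just examples):
          |
          |   // Rough sketch, not an official recipe: point the AI SDK
          |   // at a locally hosted model via Ollama's OpenAI-compatible
          |   // API.
          |   import { generateText } from "ai";
          |   import { createOpenAICompatible } from "@ai-sdk/openai-compatible";
          |
          |   const ollama = createOpenAICompatible({
          |     name: "ollama",
          |     baseURL: "http://localhost:11434/v1", // default Ollama port
          |   });
          |
          |   const { text } = await generateText({
          |     model: ollama("llama3.1"),
          |     prompt: "Summarize what this repository does.",
          |   });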
        
       | cobbzilla wrote:
        | Love this idea. The docs are good, I just need to read them
        | better :)
       | 
       | Trying it out now. Keep it fully open source and nicely pluggable
       | and I'll keep being a fan!
        
         | bshzzle wrote:
         | Ah I was just replying to your previous comment - I'm guessing
         | you found this? ;)
         | https://docs.sourcebot.dev/docs/connections/local-repos
         | 
         | Thanks for the support!
        
       | dchuk wrote:
       | In reading the docs, it doesn't look like the MCP server supports
       | the Ask Sourcebot capability. Is that correct or am I missing
       | something in the docs? Is that planned to be added?
        
         | bshzzle wrote:
          | Yeah, they are currently separate - the MCP server exposes the
          | same tools that Ask Sourcebot uses, but the actual LLM call
          | happens on the MCP client. It would be interesting to merge
          | them though - maybe have an Exa-style MCP tool that lets MCP
          | clients ask questions similar to how we are doing it with Ask
          | Sourcebot.
         | 
         | Would be great to hear more about your use case though.
        
       | hahaxdxd123 wrote:
       | I got this set up and working in basically 5 minutes. Going to
       | try to set it up at work. Super cool! It seems like the open
        | source version already has a bunch of features - how do you plan
       | on making sure you can sustainably support it?
        
       | prepend wrote:
       | So can I use Functional Source licensed code in internal products
       | if I'm a commercial org?
        
       ___________________________________________________________________
       (page generated 2025-07-31 23:00 UTC)