[HN Gopher] Show HN: GitMCP is an automatic MCP server for every...
       ___________________________________________________________________
        
       Show HN: GitMCP is an automatic MCP server for every GitHub repo
        
       Author : liadyo
       Score  : 171 points
       Date   : 2025-04-03 18:28 UTC (1 days ago)
        
 (HTM) web link (gitmcp.io)
 (TXT) w3m dump (gitmcp.io)
        
       | liadyo wrote:
       | We built an open source remote MCP server that can automatically
       | serve documentation from every Github project. Simply replace
       | github.com with gitmcp.io in the repo URL - and you get a remote
       | MCP server that serves and searches the documentation from this
       | repo (llms.txt, llms-full.txt, readme.md, etc). Works with
       | github.io as well. Repo here: https://github.com/idosal/git-mcp
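The host swap the author describes can be sketched as a tiny helper. This is a minimal sketch, assuming the rewrite works as stated in the comment (`to_gitmcp_url` is a hypothetical name, not part of git-mcp; the `github.io` mapping follows the project README):

```python
# Sketch of the URL rewrite: swap the host of a GitHub repo URL for
# gitmcp.io to get the corresponding remote MCP endpoint.
from urllib.parse import urlparse, urlunparse

def to_gitmcp_url(repo_url: str) -> str:
    parts = urlparse(repo_url)
    if parts.netloc == "github.com":
        # github.com/{owner}/{repo} -> gitmcp.io/{owner}/{repo}
        return urlunparse(parts._replace(netloc="gitmcp.io"))
    if parts.netloc.endswith(".github.io"):
        # {owner}.github.io/{repo} -> {owner}.gitmcp.io/{repo}
        owner = parts.netloc.split(".")[0]
        return urlunparse(parts._replace(netloc=f"{owner}.gitmcp.io"))
    raise ValueError("not a github.com or github.io URL")

print(to_gitmcp_url("https://github.com/idosal/git-mcp"))
# -> https://gitmcp.io/idosal/git-mcp
```

The resulting URL is what you would paste into an MCP client's server configuration.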
        
         | nlawalker wrote:
         | _> searches the documentation from this repo (llms.txt, llms-
         | full.txt, readme.md, etc)_
         | 
         | What does _etc_ include? Does this operate on a _single content
         | file_ from the specified GitHub repo?
        
       | lukew3 wrote:
        | Cool project! I would probably call it an MCP server for every
        | GitHub repo though, since "project" could be confused with
        | GitHub Projects, which is their work planning/tracking tool.
        
         | liadyo wrote:
         | Thanks!
        
       | the_arun wrote:
       | But why would we need an MCP server for a github repo? Sorry, I
       | am unable to understand the use case.
        
         | liadyo wrote:
         | It's very helpful when working with a specific
         | technology/library, and you want to access the project's
         | llms.txt, readme, search the docs, etc from within the IDE
          | using the MCP client. Check it out, for example, with the
         | langgraph docs: https://gitmcp.io/#github-pages-demo It really
         | improves the development experience.
        
         | qainsights wrote:
         | Same here. Can't we just give the repo URL in Cursor/Windsurf
         | to use the search tool to get the context? :thinking:
        
           | adusal wrote:
           | As an example, some repositories have huge documents (in some
           | cases a few MBs) that agents won't process today. GitMCP
           | offers semantic search out of the box.
        
           | jwblackwell wrote:
          | Yeah, this is one fundamental reason I don't see MCP taking
          | off. The only real use cases will just be built natively
          | into the tools.
        
             | hobofan wrote:
              | Yes, they could be, but then you rely 100% on the
              | client tools doing that well, which they aren't always
              | good at, and they also have to reinvent the wheel on
              | what are becoming essentially commodity features.
             | 
             | E.g. one of the biggest annoyances for me with cursor was
             | external documentation indexing, where you hand it the
              | website of a specific library and then it crawls and
             | indexes that. That feature has been completely broken for
              | me (always aborting with a crawl error). Now with an MCP
             | server, I can just use one that is specialized in this kind
             | of documentation indexing, where I also have the ability to
             | tinker with it if it breaks, and then can use that in all
             | my agentic coding tools that need it (which also allows me
             | to transfer more work to background/non-IDE workflows).
        
           | cruffle_duffle wrote:
           | MCP servers present a structured interface for accessing
           | something and (often) a structured result.
           | 
           | You tell the LLM to visit your GitHub repository via http and
           | it gets back... unstructured, unfocused content not designed
           | with an LLM's context window in mind.
           | 
           | With the MCP server the LLM can initiate a structured
           | interface request and get back structured replies... so
           | instead of HTML (or text extracted from HTML) it gets JSON or
           | something more useful.
        
             | cgio wrote:
              | Is HTML less structured than JSON? I thought that with
              | LLMs the specific schema matters less than the structure
              | itself.
        
               | cruffle_duffle wrote:
               | Just trying to explain it to you made me think of a very
               | good reason why an MCP is preferable to just telling it
               | to fetch a page. When you tell ChatGPT or Sonnet or even
               | cursor/windsurf/whatever to fetch a website do you know
               | exactly what it is fetching? Does it load the raw html
               | into the context? Does it parse the page and return just
               | the text? What about the navigation elements, footer and
               | other "noise" or does it have the LLM itself waste
               | precious context window trying to figure the page out? Is
               | it loading the entire page into context or truncating it?
               | If it is truncated, how is the truncation being done?
               | 
               | With an MCP there is no question about what gets fed to
               | the model. It's exactly what you programmed to feed into
               | it.
               | 
               | I'd argue that right there is one of the key reasons
               | you'd want to use MCP over prompting it to fetch a page.
               | 
               | There are many others too though like exposing your
               | database via MCP rather than having it run random "psql"
               | commands and then parsing whatever the command returns.
               | Another thing is letting it paw through splunk logs using
                | an MCP, which provides both a structured way for the
               | to write queries and handle the results... note that even
               | calling out to your shell is done via an MCP.
               | 
               | It's also a stateful protocol, though I haven't really
               | explored that aspect.
               | 
               | It's one of those things that once you play with it
               | you'll go "oh yeah, I see how this fits into the puzzle".
               | Once you see it though, it becomes pretty cool.
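To make "structured interface request" concrete: MCP messages are JSON-RPC 2.0, and a tool invocation uses the `tools/call` method. The tool name and arguments below are illustrative placeholders, not GitMCP's actual schema:

```python
import json

# Rough shape of an MCP tool call, as opposed to handing the model a
# raw HTML page. The client knows exactly what enters the context
# window, because it controls both the request and the reply format.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "search_documentation",          # illustrative tool name
        "arguments": {"query": "how do I configure checkpoints?"},
    },
}

# The server answers with structured content blocks rather than a
# page full of navigation, footers, and other noise.
response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "content": [{"type": "text", "text": "...matching doc excerpts..."}]
    },
}

print(json.dumps(request, indent=2))
```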
        
         | SkyPuncher wrote:
          | One case I've found valuable is dropping a reference to a PR
         | that's relevant to my work.
         | 
         | I'll tell it to look at that PR to gain context about what was
         | previously changed.
        
         | ramoz wrote:
         | Right, because agents can utilize git natively.
         | 
         | If this is for navigating/searching github in a fine-grained
         | way, then totally cool and useful.
        
         | scosman wrote:
         | It's one of my favourite MCP use cases. I have cloned projects
         | and used the file browser MCP for this, but this looks great.
         | 
          | It allows you to ask questions about how an entire system
          | works. For example, the other day: "This GitHub action
          | requires the binary X. Is it in the repo, downloaded, built
          | on deploy, or something else?" Or: "What tools does this
          | repo use to implement full-text search? Give me an overview."
        
       | qwertox wrote:
       | That is a complex webserver. https://github.com/idosal/git-
       | mcp/tree/main/api
       | 
       | What about private repos in, let's say GitLab or Bitbucket
       | instances, or something simpler?
       | 
       | A Dockerfile could be helpful to get it running locally.
        
         | liadyo wrote:
          | Yes, this is a fully remote MCP server, so the need for SSE
          | support makes the implementation quite complex. The MCP spec
          | has been updated to use HTTP streaming, but clients don't
          | support it yet.
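For context on why SSE support complicates things: with the SSE transport, each JSON-RPC message is pushed to the client as an `event:`/`data:` frame over a long-lived HTTP response, which is awkward to keep open on serverless platforms. A minimal sketch of the frame format:

```python
import json

def sse_frame(message: dict) -> str:
    # One SSE frame per JSON-RPC message; a blank line ends the frame.
    return f"event: message\ndata: {json.dumps(message)}\n\n"

frame = sse_frame({"jsonrpc": "2.0", "method": "notifications/initialized"})
print(frame)
```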
        
           | TechDebtDevin wrote:
           | Gemini does I believe. On my list of todos is to add this to
           | my fork of mcp-go.
        
             | vessenes wrote:
             | +1 for this, I'm so so tired of writing my MCP code in
             | python.
        
               | prophesi wrote:
               | If you're in Elixir land, Hermes MCP[0] is a fantastic
               | library for building out MCP clients/servers with both
               | SSE/HTTP support. And will be quite robust given the
               | scalability and fault tolerance of the BEAM.
               | 
               | [0] https://github.com/cloudwalk/hermes-mcp
        
               | vessenes wrote:
               | ooh cool. Sadly I am far from Elixir land. MCP starting
               | out as largely STDIO definitely has made things harder
               | for server-side engineers. I expect this will sort itself
               | out this year though.
        
       | pfista wrote:
       | Do you auto-generate specific MCP tools for the repo? Curious
       | what the queries you would use with an AI agent to get a response
       | back.
       | 
       | I'm building my own hosted MCP solution (https://skeet.build) and
       | have been deliberately choosing which tools to expose depending
       | on the use case- since there are tool limits due to the context
       | window for apps like Cursor.
        
       | pcwelder wrote:
        | Why not have a single MCP server that takes in the repo path
        | or URL in the tool call args? Changing the config in Claude
        | Desktop is painful every time.
        
         | vessenes wrote:
          | I agree - I'd like that option as well.
        
         | liadyo wrote:
         | Yes! The generic form is also supported of course.
         | https://gitmcp.io/docs does exactly that:
         | https://github.com/idosal/git-mcp?tab=readme-ov-file#usage
        
       | thomasfromcdnjs wrote:
       | This is awesome, well done!
        
       | kiitos wrote:
       | > Simply change the domain from github.com or github.io to
       | gitmcp.io and get instant AI context for any GitHub repository.
       | 
       | What does this mean? How does it work? How can I understand how
       | it works? The requirements, limitations, constraints? The landing
       | page tells me nothing! Worse, it doesn't have any links or
       | suggestions as to how I could possibly learn how it works.
       | 
       | > Congratulations! The chosen GitHub project is now fully
       | accessible to your AI.
       | 
       | What does this mean??
       | 
       | > GitMCP serves as a bridge between your GitHub repository's
       | documentation and AI assistants by implementing the Model Context
       | Protocol (MCP). When an AI assistant requires information from
       | your repository, it sends a request to GitMCP. GitMCP retrieves
       | the relevant content and provides semantic search capabilities,
       | ensuring efficient and accurate information delivery.
       | 
        | MCP is a protocol that defines a number of concrete resource
        | types (tools, prompts, etc.) -- each of which has very
        | specific behaviors, semantics, etc. -- and none of which are
        | identified by this project's documentation as what it actually
        | implements!
       | 
       | Specifically what aspects of the MCP are you proxying here?
       | Specifically how do you parse a repo's data and transform it into
       | whatever MCP resources you're supporting? I looked for this
       | information and found it nowhere?
        
         | broodbucket wrote:
          | As someone who is obviously not the target audience, I feel
          | like literally anything on this page that could lead me to
          | an explanation of what MCP is would be nice, while we're
          | talking about what the landing page doesn't tell you. Even
          | just one of the MCP mentions being a link to
          | modelcontextprotocol.io would be fine.
         | 
         | Or maybe I'm so out of the loop it's as obvious as "git" is, I
         | dunno.
        
           | fragmede wrote:
           | It's fair to be curious, but at some point it's also
           | reasonable to expect people are capable of using Google to
           | look up unfamiliar terms. I'm not gatekeeping--just, like,
           | put in a bit of effort?
           | 
           | Threads like this work better when they can go deeper without
           | rehashing the basics every time.
        
             | johannes1234321 wrote:
              | Having a link to the MCP website wouldn't be "rehashing"
              | - it's how the web was once supposed to work.
        
             | matthewdgreen wrote:
             | I took a brief look at the MCP documentation today, and
             | left looking confused. At a high level that protocol looks
             | like a massive swiss-army knife that could potentially do
             | everything, and the use-case in TFA looks like it's
             | implementing one very specific tool within that large
             | swiss-army knife. Both need better explanation.
        
       | eagleinparadise wrote:
       | Getting "@ SSE error: undefined" in Cursor for a repo I added. Is
        | there also not a way to force an MCP server to be used? Haiku
       | doesn't pick it up in Cursor.
        
         | adusal wrote:
         | The error usually isn't an issue since the agent can use the
         | tools regardless. It's a by-product of the current
         | implementation's serverless nature and SSE's limitations. We
         | are looking into alternative solutions.
        
         | adusal wrote:
         | Update: We've upgraded our resources to accommodate the growing
         | traffic!
        
       | creddit wrote:
       | How does this differ from the reference Github MCP server?
       | 
       | https://github.com/modelcontextprotocol/servers/tree/main/sr...
       | 
       | EDIT: Oh wait, lol, I looked closer and it seems that the
       | difference is that the server runs on your server instead which
       | is like the single most insane thing I can think of someone
       | choosing to do when the reference Github MCP server exists.
        
         | creddit wrote:
         | This literally looks like spyware to me. Crazy.
        
         | adusal wrote:
         | Just to be clear, GitMCP isn't a repository management tool.
         | Its sole purpose is to make documentation accessible to AI in
         | ways the current tools do not (e.g., semantic search, not
         | necessarily limited to a repository), with minimal overhead for
         | users. GitMCP itself is a free, public, open-source repository.
         | The tool doesn't have access to PII and doesn't store agent
         | queries.
        
       | fallat wrote:
       | Ok, wow.
       | 
       | MCP is REALLY taking off FAST D:
        
       | xena wrote:
       | How do I opt-out for my repos?
        
         | scosman wrote:
         | Do you think that should be an option? I totally get opting out
         | of crawlers, search or training but this is different.
         | 
         | But should the author be able to opt out of a tool used for
         | manually initiated queries? I can't say "don't use grep" on my
         | repo.
        
           | xena wrote:
           | Grep is a tool. This is a service.
        
             | scosman wrote:
             | Yes. But is that the line?
             | 
             | Crawling makes sense (automated traffic) but this isn't
             | automated, it's user initiated. Search indexing makes sense
             | (this isn't that). Training makes sense (this isn't that).
             | 
              | It should have an honest user agent so servers can
              | filter, for sure.
             | 
              | If I'm allowed 'git clone X && grep -r' against a
              | service, why can't I do the same with MCP?
        
       | fzysingularity wrote:
       | While I like the seamless integration with GitHub, I'd imagine
       | this doesn't fully take advantage of the stateful nature of MCP.
       | 
        | A really powerful git repo x MCP integration would be to
        | automatically set up the GitHub repo's library/environment and
        | be able to interact with that library, making it more stateful
        | and significantly more powerful.
        
       | ianpurton wrote:
       | Some context.
       | 
       | 1. Some LLMs support function calling. That means they are given
       | a list of tools with descriptions of those tools.
       | 
       | 2. Rather than answering your question in one go, the LLM can say
       | it wants to call a function.
       | 
       | 3. Your client (developer tool etc) will call that function and
       | pass the results to the LLM.
       | 
        | 4. The LLM will continue and either complete the conversation
        | or call more tools (functions).
       | 
       | 5. MCP is gaining traction as a standard way of adding
       | tools/functions to LLMs.
       | 
       | GitMCP
       | 
       | I haven't looked too deeply but I can guess.
       | 
        | 1. Will have a bunch of API endpoints that the LLM can call
        | to look at your code - probably stuff like get_file,
        | get_folder, etc.
       | 
       | 2. When you ask the LLM for example "Tell me how to add
       | observability to the code", the LLM can make calls to get the
       | code and start to look at it.
       | 
       | 3. The LLM can keep on making calls to GitMCP until it has enough
       | context to answer the question.
       | 
       | Hope this helps.
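Steps 1-5 above boil down to a loop. This is a sketch of that control flow only; `llm` and `call_tool` are hypothetical stand-ins for a real model API and an MCP client:

```python
def run(llm, call_tool, user_question: str) -> str:
    """Drive a function-calling conversation until the model answers."""
    messages = [{"role": "user", "content": user_question}]
    while True:
        # The model sees the tool list plus the conversation so far.
        reply = llm(messages)
        if reply["type"] == "tool_call":
            # Step 3: the client executes the tool and feeds the
            # result back to the model.
            result = call_tool(reply["name"], reply["arguments"])
            messages.append({"role": "tool", "name": reply["name"],
                             "content": result})
        else:
            # Step 4: no more tool calls; the model answers directly.
            return reply["content"]
```

An agent asking GitMCP about a repo would go around this loop once per `get_file`-style call until it has enough context.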
        
         | sandbags wrote:
         | I've been wanting to write this somewhere and this seems as
         | good a place as any to start.
         | 
         | Is it just me or is MCP a really bad idea?
         | 
          | We seem to have spent the last 10 years trying to make
          | computing more secure, and now people are using node & npx -
          | tools with a less-than-flawless safety story - to install
          | tools and make them available to a black-box LLM that they
          | trust to be non-harmful. On what basis, even regarding
          | accidental harm, I am not sure.
         | 
         | I am not sure if horrified is the right word.
        
       | sivaragavan wrote:
       | I see the appeal of it. It is a good start. But I don't think
       | it's quite useful yet. This proves to be a great distribution
       | model for an MCP project.
       | 
        | FWIW, this project creates two tools for a GitHub repo on
        | demand: fetch_cosmos_sdk_documentation and
        | search_cosmos_sdk_documentation.
       | 
       | These tools would be available for the MCP client to call when it
       | needs information. The search tool didn't quite work for me, but
       | the fetch did. It pulled the readme and made it available to the
       | MCP client. Like I said before, it's not so helpful at the
       | moment. But I am interested in the possibilities.
        
         | sdesol wrote:
         | Full Disclosure: I built an indexing engine for Git and GitHub
         | that can process repos at scale and my words should be taken
         | with scepticism.
         | 
          | I think using MCP is an interesting idea, but the heavy
          | lifting that can provide insights is not in MCP itself. For
          | fetch and search to work effectively, the MCP server will
          | need quality context to know what to consider. I'm biased,
          | but I have looked deeply into chunking documents, and given
          | how the LLM landscape is evolving, I don't think chunking
          | makes a lot of sense any more (for code at least).
         | 
         | I've committed to generating short and long overviews for
         | directories and files. Short overviews are two to three
         | sentences. And long overviews are two to three paragraphs.
          | Given how effectively newer LLMs can process 100,000 tokens
          | or fewer, you can feed the model a short overview of every
          | file and directory to determine which files to sub-query
          | with - that is, which long overviews to load into context
          | for the sub-query.
         | 
          | I also believe most projects in the future will start to
          | produce READMEs for LLMs that are verbose and not easy for
          | humans to grok, but rich in detail for LLMs. You may not
          | want the LLM to generate the code for you, but the LLM can
          | certainly help us navigate complex/unfamiliar code in a
          | semantic manner, which can be a game changer for onboarding.
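The two-tier overview scheme described above can be sketched as two passes; every name here is hypothetical, and `llm_select` stands in for a model call that picks relevant files:

```python
def pick_context(llm_select, short_overviews: dict, long_overviews: dict,
                 question: str) -> dict:
    # Pass 1: all short overviews (2-3 sentences each) fit comfortably
    # in a ~100k-token window, so the model sees the whole repo map.
    chosen = llm_select(question, short_overviews)  # -> list of paths
    # Pass 2: only the chosen files' long overviews (2-3 paragraphs)
    # are loaded into context for the sub-query.
    return {path: long_overviews[path] for path in chosen}
```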
        
       ___________________________________________________________________
       (page generated 2025-04-04 23:02 UTC)