[HN Gopher] Show HN: GitMCP is an automatic MCP server for every...
___________________________________________________________________
Show HN: GitMCP is an automatic MCP server for every GitHub repo
Author : liadyo
Score : 171 points
Date   : 2025-04-03 18:28 UTC (1 day ago)
(HTM) web link (gitmcp.io)
(TXT) w3m dump (gitmcp.io)
| liadyo wrote:
| We built an open source remote MCP server that can automatically
| serve documentation from every Github project. Simply replace
| github.com with gitmcp.io in the repo URL - and you get a remote
| MCP server that serves and searches the documentation from this
| repo (llms.txt, llms-full.txt, readme.md, etc). Works with
| github.io as well. Repo here: https://github.com/idosal/git-mcp
| nlawalker wrote:
| _> searches the documentation from this repo (llms.txt, llms-
| full.txt, readme.md, etc)_
|
| What does _etc_ include? Does this operate on a _single content
| file_ from the specified GitHub repo?
| lukew3 wrote:
| Cool project! I would probably call it an MCP server for every
| Github repo though, since "project" could be confused with GitHub
| Projects, their work planning/tracking tool.
| liadyo wrote:
| Thanks!
| the_arun wrote:
| But why would we need an MCP server for a github repo? Sorry, I
| am unable to understand the use case.
| liadyo wrote:
| It's very helpful when working with a specific
| technology/library, and you want to access the project's
| llms.txt, readme, search the docs, etc from within the IDE
| using the MCP client. Check it out, for example, with the
| langgraph docs: https://gitmcp.io/#github-pages-demo It really
| improves the development experience.
| qainsights wrote:
| Same here. Can't we just give the repo URL in Cursor/Windsurf
| to use the search tool to get the context? :thinking:
| adusal wrote:
| As an example, some repositories have huge documents (in some
| cases a few MBs) that agents won't process today. GitMCP
| offers semantic search out of the box.
| jwblackwell wrote:
| Yeah, this is one fundamental reason I don't see MCP taking
| off. The few real use cases will just be built natively into
| the tools.
| hobofan wrote:
| Yes, they could be, but then you rely entirely on the client
| tools doing a good job of it, which they often don't, and they
| also have to reinvent the wheel on what are becoming
| essentially commodity features.
|
| E.g. one of the biggest annoyances for me with cursor was
| external documentation indexing, where you hand it the
| website of a specific library and then it crawls and
| indexes that. That feature has been completely broken for
| me (always aborting with a crawl error). Now with an MCP
| server, I can just use one that is specialized in this kind
| of documentation indexing, where I also have the ability to
| tinker with it if it breaks, and then can use that in all
| my agentic coding tools that need it (which also allows me
| to transfer more work to background/non-IDE workflows).
| cruffle_duffle wrote:
| MCP servers present a structured interface for accessing
| something and (often) a structured result.
|
| You tell the LLM to visit your GitHub repository via http and
| it gets back... unstructured, unfocused content not designed
| with an LLM's context window in mind.
|
| With the MCP server the LLM can initiate a structured
| interface request and get back structured replies... so
| instead of HTML (or text extracted from HTML) it gets JSON or
| something more useful.
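| A toy illustration of that contrast (not the real MCP SDK; the
| payloads are made up): the same "get docs" request answered as raw
| markup versus as a structured tool result.

```python
import json

def fetch_as_html() -> str:
    # What a plain HTTP fetch hands the model: markup plus nav/footer noise.
    return ("<html><body><nav>Home | About</nav>"
            "<h1>Docs</h1><p>Install with pip.</p>"
            "<footer>(c) 2025</footer></body></html>")

def fetch_as_tool_result() -> str:
    # What an MCP-style tool can hand the model: a focused, structured reply.
    return json.dumps({
        "source": "README.md",
        "section": "Installation",
        "content": "Install with pip.",
    })

raw = fetch_as_html()
structured = json.loads(fetch_as_tool_result())
```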
| cgio wrote:
| Is html less structured than json? I thought with LLMs the
| schematic of structure is less relevant than the structure
| itself.
| cruffle_duffle wrote:
| Just trying to explain it to you made me think of a very
| good reason why an MCP is preferable to just telling it
| to fetch a page. When you tell ChatGPT or Sonnet or even
| cursor/windsurf/whatever to fetch a website do you know
| exactly what it is fetching? Does it load the raw html
| into the context? Does it parse the page and return just
| the text? What about the navigation elements, footer and
| other "noise" or does it have the LLM itself waste
| precious context window trying to figure the page out? Is
| it loading the entire page into context or truncating it?
| If it is truncated, how is the truncation being done?
|
| With an MCP there is no question about what gets fed to
| the model. It's exactly what you programmed to feed into
| it.
|
| I'd argue that right there is one of the key reasons
| you'd want to use MCP over prompting it to fetch a page.
|
| There are many others too though like exposing your
| database via MCP rather than having it run random "psql"
| commands and then parsing whatever the command returns.
| Another thing is letting it paw through splunk logs using
| an MCP, which provides both a structured way for the LLM
| to write queries and handle the results... note that even
| calling out to your shell is done via an MCP.
|
| It's also a stateful protocol, though I haven't really
| explored that aspect.
|
| It's one of those things that once you play with it
| you'll go "oh yeah, I see how this fits into the puzzle".
| Once you see it though, it becomes pretty cool.
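| The point about knowing exactly what gets fed to the model can be
| sketched as below (hypothetical handler, not GitMCP's actual code):
| truncation becomes an explicit, documented rule instead of whatever
| the client happens to do.

```python
MAX_CHARS = 2000  # deliberate, visible budget for the context window

def fetch_documentation(doc_text: str) -> dict:
    """Return at most MAX_CHARS of the document, and say so explicitly."""
    truncated = len(doc_text) > MAX_CHARS
    return {
        "content": doc_text[:MAX_CHARS],
        "truncated": truncated,      # the model is told, not left guessing
        "total_chars": len(doc_text),
    }

result = fetch_documentation("x" * 5000)
```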
| SkyPuncher wrote:
| One case I've found valuable is dropping a reference to a PR
| that's relevant to my work.
|
| I'll tell it to look at that PR to gain context about what was
| previously changed.
| ramoz wrote:
| Right, because agents can utilize git natively.
|
| If this is for navigating/searching github in a fine-grained
| way, then totally cool and useful.
| scosman wrote:
| It's one of my favourite MCP use cases. I have cloned projects
| and used the file browser MCP for this, but this looks great.
|
| It allows you to ask questions about how an entire system
| works. For example the other day "this GitHub action requires
| the binary X. Is it in the repo, downloading it, or building it
| on deploy, or something else?" Or "what tools does this repo
| use to implement full text search? Give me an overview."
| qwertox wrote:
| That is a complex webserver.
| https://github.com/idosal/git-mcp/tree/main/api
|
| What about private repos in, let's say GitLab or Bitbucket
| instances, or something simpler?
|
| A Dockerfile could be helpful to get it running locally.
| liadyo wrote:
| Yes, this is a fully remote MCP server, so the need for SSE
| support makes the implementation quite complex. The MCP spec has
| been updated to use streamable HTTP, but clients do not support
| it yet.
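| For reference, SSE is just a long-lived HTTP response framed as
| text/event-stream: an optional "event:" line, "data:" lines, and a
| blank-line terminator. A minimal formatter (illustrative, not
| git-mcp's code):

```python
def format_sse_event(data: str, event: str = "") -> str:
    """Frame one server-sent event: optional 'event:' line, one 'data:'
    line per payload line, terminated by a blank line."""
    lines = []
    if event:
        lines.append(f"event: {event}")
    for part in data.splitlines() or [""]:
        lines.append(f"data: {part}")
    return "\n".join(lines) + "\n\n"

msg = format_sse_event('{"jsonrpc": "2.0"}', event="message")
```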
| TechDebtDevin wrote:
| Gemini does I believe. On my list of todos is to add this to
| my fork of mcp-go.
| vessenes wrote:
| +1 for this, I'm so so tired of writing my MCP code in
| python.
| prophesi wrote:
| If you're in Elixir land, Hermes MCP[0] is a fantastic
| library for building out MCP clients/servers with both
| SSE/HTTP support. And will be quite robust given the
| scalability and fault tolerance of the BEAM.
|
| [0] https://github.com/cloudwalk/hermes-mcp
| vessenes wrote:
| ooh cool. Sadly I am far from Elixir land. MCP starting
| out as largely STDIO definitely has made things harder
| for server-side engineers. I expect this will sort itself
| out this year though.
| pfista wrote:
| Do you auto-generate specific MCP tools for the repo? Curious
| what the queries you would use with an AI agent to get a response
| back.
|
| I'm building my own hosted MCP solution (https://skeet.build) and
| have been deliberately choosing which tools to expose depending
| on the use case, since there are tool limits due to the context
| window for apps like Cursor.
| pcwelder wrote:
| Why not have a single mcp server that takes in the repo path or
| url in the tool call args? Changing config in Claude Desktop is
| painful every time.
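| The generic shape being asked for here, sketched with a toy tool
| registry (not the real git-mcp API): one tool whose argument names
| the repo, so the client config never changes.

```python
TOOLS = {}

def tool(fn):
    """Register a function as a callable tool by name."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def fetch_documentation(repo_url: str) -> dict:
    """Fetch docs for whatever repo the call names, not one baked in."""
    owner_repo = repo_url.removeprefix("https://github.com/")
    return {"repo": owner_repo,
            "doc": f"(README for {owner_repo} would go here)"}

call = TOOLS["fetch_documentation"]("https://github.com/idosal/git-mcp")
```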
| vessenes wrote:
| I agree - I'd like that option as well.
| liadyo wrote:
| Yes! The generic form is also supported of course.
| https://gitmcp.io/docs does exactly that:
| https://github.com/idosal/git-mcp?tab=readme-ov-file#usage
| thomasfromcdnjs wrote:
| This is awesome, well done!
| kiitos wrote:
| > Simply change the domain from github.com or github.io to
| gitmcp.io and get instant AI context for any GitHub repository.
|
| What does this mean? How does it work? How can I understand how
| it works? The requirements, limitations, constraints? The landing
| page tells me nothing! Worse, it doesn't have any links or
| suggestions as to how I could possibly learn how it works.
|
| > Congratulations! The chosen GitHub project is now fully
| accessible to your AI.
|
| What does this mean??
|
| > GitMCP serves as a bridge between your GitHub repository's
| documentation and AI assistants by implementing the Model Context
| Protocol (MCP). When an AI assistant requires information from
| your repository, it sends a request to GitMCP. GitMCP retrieves
| the relevant content and provides semantic search capabilities,
| ensuring efficient and accurate information delivery.
|
| MCP is a protocol that defines a number of concrete resource
| types (tools, prompts, etc.) -- each of which have very specific
| behaviors, semantics, etc. -- and none of which are identified by
| this project's documentation as what it actually implements!
|
| Specifically what aspects of the MCP are you proxying here?
| Specifically how do you parse a repo's data and transform it into
| whatever MCP resources you're supporting? I looked for this
| information and found it nowhere?
| broodbucket wrote:
| As someone who is obviously not the target audience, I feel
| like literally anything on this page that could lead me to
| explain what MCP is would be nice, while we're talking about
| what the landing page doesn't tell you. Even just one of the
| MCP mentions being a link to modelcontextprotocol.io would be
| fine.
|
| Or maybe I'm so out of the loop it's as obvious as "git" is, I
| dunno.
| fragmede wrote:
| It's fair to be curious, but at some point it's also
| reasonable to expect people are capable of using Google to
| look up unfamiliar terms. I'm not gatekeeping--just, like,
| put in a bit of effort?
|
| Threads like this work better when they can go deeper without
| rehashing the basics every time.
| johannes1234321 wrote:
| Having a Link to the mcp website won't be "rehashing" but
| how the web once was supposed to be.
| matthewdgreen wrote:
| I took a brief look at the MCP documentation today, and
| left looking confused. At a high level that protocol looks
| like a massive swiss-army knife that could potentially do
| everything, and the use-case in TFA looks like it's
| implementing one very specific tool within that large
| swiss-army knife. Both need better explanation.
| eagleinparadise wrote:
| Getting "@ SSE error: undefined" in Cursor for a repo I added. Is
| there also not a way to force an MCP server to be used? Haiku
| doesn't pick it up in Cursor.
| adusal wrote:
| The error usually isn't an issue since the agent can use the
| tools regardless. It's a by-product of the current
| implementation's serverless nature and SSE's limitations. We
| are looking into alternative solutions.
| adusal wrote:
| Update: We've upgraded our resources to accommodate the growing
| traffic!
| creddit wrote:
| How does this differ from the reference Github MCP server?
|
| https://github.com/modelcontextprotocol/servers/tree/main/sr...
|
| EDIT: Oh wait, lol, I looked closer and it seems that the
| difference is that the server runs on your server instead, which
| is like the single most insane thing I can think of someone
| choosing to do when the reference GitHub MCP server exists.
| creddit wrote:
| This literally looks like spyware to me. Crazy.
| adusal wrote:
| Just to be clear, GitMCP isn't a repository management tool.
| Its sole purpose is to make documentation accessible to AI in
| ways the current tools do not (e.g., semantic search, not
| necessarily limited to a repository), with minimal overhead for
| users. GitMCP itself is a free, public, open-source repository.
| The tool doesn't have access to PII and doesn't store agent
| queries.
| fallat wrote:
| Ok, wow.
|
| MCP is REALLY taking off FAST D:
| xena wrote:
| How do I opt-out for my repos?
| scosman wrote:
| Do you think that should be an option? I totally get opting out
| of crawlers, search or training but this is different.
|
| But should the author be able to opt out of a tool used for
| manually initiated queries? I can't say "don't use grep" on my
| repo.
| xena wrote:
| Grep is a tool. This is a service.
| scosman wrote:
| Yes. But is that the line?
|
| Crawling makes sense (automated traffic) but this isn't
| automated, it's user initiated. Search indexing makes sense
| (this isn't that). Training makes sense (this isn't that).
|
| It should have an honest user agent so servers can filter,
| for sure.
|
| If I'm allowed 'git clone X && grep -r' against a service,
| why can't I do the same with MCP?
| fzysingularity wrote:
| While I like the seamless integration with GitHub, I'd imagine
| this doesn't fully take advantage of the stateful nature of MCP.
|
| A really powerful git repo x MCP integration would be to
| automatically set up the GitHub repo's library/environment and be
| able to interact with that library, making it more stateful and
| significantly more powerful.
| ianpurton wrote:
| Some context.
|
| 1. Some LLMs support function calling. That means they are given
| a list of tools with descriptions of those tools.
|
| 2. Rather than answering your question in one go, the LLM can say
| it wants to call a function.
|
| 3. Your client (developer tool etc) will call that function and
| pass the results to the LLM.
|
| 4. The LLM will continue and either complete the conversation or
| call more tools (functions)
|
| 5. MCP is gaining traction as a standard way of adding
| tools/functions to LLMs.
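| The five steps above can be sketched as a loop (stubbed model and a
| made-up get_file tool, illustrative only):

```python
def fake_llm(messages: list) -> dict:
    # A real model decides this; the stub requests one tool, then answers.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": {"name": "get_file",
                              "args": {"path": "README.md"}}}
    return {"answer": "The README explains the project."}

def get_file(path: str) -> str:
    return f"(contents of {path})"

TOOLS = {"get_file": get_file}

messages = [{"role": "user", "content": "Tell me about this repo."}]
while True:
    reply = fake_llm(messages)
    if "answer" in reply:                         # step 4: conversation done
        break
    call = reply["tool_call"]                     # step 2: model wants a tool
    result = TOOLS[call["name"]](**call["args"])  # step 3: client runs it
    messages.append({"role": "tool", "content": result})
```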
|
| GitMCP
|
| I haven't looked too deeply but I can guess.
|
| 1. Will have a bunch of API endpoints that the LLM can call to
| look at your code - probably stuff like get_file, get_folder,
| etc.
|
| 2. When you ask the LLM for example "Tell me how to add
| observability to the code", the LLM can make calls to get the
| code and start to look at it.
|
| 3. The LLM can keep on making calls to GitMCP until it has enough
| context to answer the question.
|
| Hope this helps.
| sandbags wrote:
| I've been wanting to write this somewhere and this seems as
| good a place as any to start.
|
| Is it just me or is MCP a really bad idea?
|
| We seem to have spent the last 10 years trying to make
| computing more secure and now people are using node & npx -
| tools with a less than flawless safety story - to install tools
| and make them available to a black box LLM that they trust to
| be non-harmful. On what basis they extend that trust, even
| regarding accidental harm, I am not sure.
|
| I am not sure if horrified is the right word.
| sivaragavan wrote:
| I see the appeal of it. It is a good start. But I don't think
| it's quite useful yet. This proves to be a great distribution
| model for an MCP project.
|
| FWIW, this project creates two tools for a GitHub repo on demand:
|     fetch_cosmos_sdk_documentation
|     search_cosmos_sdk_documentation
|
| These tools would be available for the MCP client to call when it
| needs information. The search tool didn't quite work for me, but
| the fetch did. It pulled the readme and made it available to the
| MCP client. Like I said before, it's not so helpful at the
| moment. But I am interested in the possibilities.
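| The per-repo naming observed above (fetch_X_documentation /
| search_X_documentation) could be generated along these lines
| (hypothetical sketch, not the project's actual code):

```python
def tools_for_repo(repo: str) -> list:
    """Derive tool names from a repo name, e.g. 'cosmos-sdk' ->
    fetch_cosmos_sdk_documentation / search_cosmos_sdk_documentation."""
    slug = repo.replace("-", "_").replace(".", "_")
    return [f"fetch_{slug}_documentation", f"search_{slug}_documentation"]

names = tools_for_repo("cosmos-sdk")
```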
| sdesol wrote:
| Full Disclosure: I built an indexing engine for Git and GitHub
| that can process repos at scale and my words should be taken
| with scepticism.
|
| I think using MCP is an interesting idea, but the heavy lifting
| that can provide insights is not in MCP itself. For fetch and
| search to work effectively, the MCP server will need quality
| context to know what to consider. I'm biased, but I looked hard
| at chunking documents, and given how the LLM landscape is
| evolving, I don't think chunking makes a lot of sense any more
| (for code at least).
|
| I've committed to generating short and long overviews for
| directories and files. Short overviews are two to three
| sentences. And long overviews are two to three paragraphs.
| Given how effectively newer LLMs can process 100,000 tokens or
| less, you can feed it a short overview for all
| files/directories to determine what files to sub query with.
| That is, what long overviews to load into context for the sub
| query.
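| The two-stage scheme described above, sketched with toy data
| (illustrative; the overview text and file names are invented):
| short overviews fan out over every file, long overviews load only
| for the files selected for the sub query.

```python
OVERVIEWS = {
    "auth/login.py":   {"short": "Handles user login and sessions.",
                        "long": "Two to three paragraphs about login flow..."},
    "search/index.py": {"short": "Builds the full-text search index.",
                        "long": "Two to three paragraphs about indexing..."},
}

def select_files(query: str) -> list:
    # Stage 1: cheap pass over all short overviews (a real system would
    # let the LLM pick; here we keyword-match).
    return [p for p, o in OVERVIEWS.items() if query in o["short"].lower()]

def load_context(query: str) -> str:
    # Stage 2: concatenate long overviews for the chosen files only.
    return "\n".join(OVERVIEWS[p]["long"] for p in select_files(query))

ctx = load_context("search")
```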
|
| I also believe most projects in the future will start to
| produce READMEs for LLMs that are verbose and not easy for
| humans to grok, but rich in detail for LLMs. You may not want
| the LLM to generate the code for you, but the LLM can certainly
| help us navigate complex/unfamiliar code in a semantic manner,
| which can be a game changer for onboarding.
___________________________________________________________________
(page generated 2025-04-04 23:02 UTC)