[HN Gopher] Show HN: Replace "hub" by "ingest" in GitHub URLs fo...
___________________________________________________________________
Show HN: Replace "hub" by "ingest" in GitHub URLs for a prompt-
friendly extract
Gitingest is an open-source micro dev-tool that I made over the last
week. It turns any public GitHub repository into a text extract
that you can easily give to your favourite LLM. Today I added this
URL trick to make it even easier to use!
How I use it myself:
- Quickly generate a README.md boilerplate for a project
- Ask LLMs questions about an undocumented codebase
It is still very much a work in progress, and I plan to add many
more options (file size limits, exclude patterns...) and a public
API. I hope this tool can help you. Your feedback is very valuable
to help me prioritize, and contributions are welcome!
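The URL trick from the title can be sketched in a few lines of Python. This is only an illustration of the host swap described above; the function name `to_gitingest` is made up for this example and is not part of the project.

```python
from urllib.parse import urlsplit, urlunsplit


def to_gitingest(url: str) -> str:
    """Swap the github.com host for gitingest.com, keeping the rest of the URL."""
    parts = urlsplit(url)
    if parts.netloc.lower() not in ("github.com", "www.github.com"):
        raise ValueError(f"not a GitHub URL: {url}")
    # Only the host changes; path, query and fragment are passed through.
    return urlunsplit(
        (parts.scheme, "gitingest.com", parts.path, parts.query, parts.fragment)
    )


print(to_gitingest("https://github.com/cyclotruc/gitingest"))
```

Whether every GitHub path (branches, subdirectories) is accepted on the other side is not stated in the post, so the sketch makes no promise about that.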
Author : cyclotruc
Score : 143 points
Date : 2024-12-05 15:24 UTC (7 hours ago)
(HTM) web link (gitingest.com)
(TXT) w3m dump (gitingest.com)
| Exuma wrote:
| Isn't there a limit on prompt size? How would you actually use
| this? I'm not very up to speed on this stuff.
| lolinder wrote:
| Most projects would be way too big to put into a prompt--even
| if technically you're within the official context window, those
| are often misleading--the actual window where input is actually
| useful is usually much smaller than advertised.
|
| What you can do with something like this is store it in a
| database and then query it for relevant chunks, which you then
| feed to the LLM as needed.
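A minimal sketch of the "store it, then query for relevant chunks" idea from the comment above. It assumes a plain keyword-overlap score in place of a real database or embedding search, and the names `chunk_lines` and `top_chunks` are invented for this example.

```python
import re
from collections import Counter


def chunk_lines(text, max_lines=40):
    """Split an extract into chunks of at most max_lines lines."""
    lines = text.splitlines()
    return ["\n".join(lines[i:i + max_lines]) for i in range(0, len(lines), max_lines)]


def identifiers(s):
    """Count identifier-like words, lowercased."""
    return Counter(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", s.lower()))


def top_chunks(chunks, question, k=3):
    """Return up to k chunks sharing the most words with the question."""
    q = identifiers(question)
    scored = []
    for idx, chunk in enumerate(chunks):
        words = identifiers(chunk)
        score = sum(min(words[w], q[w]) for w in q)
        scored.append((score, idx))
    # Highest score first; ties keep the original chunk order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [chunks[i] for score, i in scored[:k] if score > 0]


extract = "def parse_url(u):\n    return u\ndef render_tree(t):\n    return t\n"
chunks = chunk_lines(extract, max_lines=2)
print(top_chunks(chunks, "where is render_tree defined"))
```

Only the selected chunks would then be pasted into the prompt, which keeps the input well below the advertised context window.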
| jackstraw14 wrote:
| Ideally let the LLM chunk it up and figure out when to use
| those chunks.
| tom1337 wrote:
| I wonder about building a local version of this which resolves
| the dependency paths of the file you're currently working on to
| a certain level, so the LLM gains more context from related
| files instead of just the whole repo (which could be insane if
| you use a monorepo).
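One way the dependency idea above might look, as a rough sketch only: it assumes a Python codebase held in memory as a `{module_name: source}` mapping and follows plain `import` statements breadth-first up to a chosen depth. The names `local_imports` and `related_modules` are hypothetical.

```python
import ast
from collections import deque


def local_imports(source, known):
    """Names imported by `source` that are also modules of this project."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            names = [node.module]
        else:
            continue
        found.update(name for name in names if name in known)
    return found


def related_modules(sources, start, max_depth=1):
    """Breadth-first walk of local imports; returns {module: depth}."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        mod = queue.popleft()
        if seen[mod] == max_depth:
            continue  # do not expand past the requested level
        for dep in sorted(local_imports(sources[mod], sources)):
            if dep not in seen:
                seen[dep] = seen[mod] + 1
                queue.append(dep)
    return seen


sources = {
    "app": "import db\nimport os\n",
    "db": "from models import User\n",
    "models": "x = 1\n",
}
print(related_modules(sources, "app", max_depth=1))
```

Third-party and standard-library imports are skipped because they are not keys in `sources`; relative imports are ignored in this sketch.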
| xnx wrote:
| Gemini Pro has a 2 million character context window which is
| ~1000 pages of code.
| modelorona wrote:
| Very cool! I will try this over the weekend with a new android
| app to see what kind of README I can generate.
|
| Do you have any plans to expand it?
| cyclotruc wrote:
| Yes I want to add a way to target a token count to control your
| LLM costs
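Targeting a token count could be approximated as below. This is a sketch, not the planned feature: it assumes roughly 4 characters per token (a crude heuristic; real tokenizers differ) and takes the smallest files first. `pick_files_within_budget` is a made-up name.

```python
def pick_files_within_budget(files, max_tokens, chars_per_token=4):
    """Greedily keep the smallest files until the estimated budget is spent.

    files: mapping of path -> file text.
    Returns (chosen_paths, estimated_tokens_used).
    """
    chosen, used = [], 0
    for path, text in sorted(files.items(), key=lambda kv: (len(kv[1]), kv[0])):
        cost = -(-len(text) // chars_per_token)  # ceiling division
        if used + cost > max_tokens:
            break  # files are sorted by size, so later ones will not fit either
        chosen.append(path)
        used += cost
    return chosen, used


files = {"a.py": "x" * 40, "b.py": "y" * 400, "c.md": "z" * 8}
print(pick_files_within_budget(files, max_tokens=20))
```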
| matt3210 wrote:
| The example buttons are a nice touch
| spencerchubb wrote:
| Github already has a way to get the raw text files
| barbazoo wrote:
| All of them in one operation? How?
| johnisgood wrote:
| I think he is confusing "plain" or "raw" view, so probably
| not all of them.
| cyclotruc wrote:
| Hey! OP here: gitingest is getting a lot of love right now, sorry
| if it's unstable but please tell me what goes wrong so I can fix
| it!
| nfilzi wrote:
| Looks neat! From what I understood, it's like zipping up your
| codebase in a streamlined TXT version for LLMs to ingest better?
|
| What would you say are the differences from using something like
| Cursor, which has access to your codebase already?
| cyclotruc wrote:
| It's in the same lane, just sometimes you need a quick and
| handy way to get that streamlined TXT from a public repo
| without leaving your browser
| anamexis wrote:
| It seems to be broken, getting errors like "Error processing
| repository: Path ../tmp/pallets-flask does not exist"
| cyclotruc wrote:
| Thank you, I'll look into it
| ComputerGuru wrote:
| Instead of a copy icon, it would be better to just generate the
| entire content as plaintext in the result (not in an html div on
| a rich html page) so the entire url could be used as an
| attachment or its contents piped directly into an agent/tool.
|
| Ctrl-a + ctrl-c would remain fast.
| vallode wrote:
| Agreed, it's a missed opportunity: being able to change a URL
| from github.com/cyclotruc/gitingest to
| gitingest.com/cyclotruc/gitingest and simply receive the result
| as plain text. A very useful little tool nonetheless.
| cyclotruc wrote:
| Yeah I'm going to do that very soon with the API :)
| wwoessi wrote:
| for that you can use https://uithub.com (g -> u)
|
| - for browsers it shows HTML
| - for curl it gets raw text
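The thread does not say how uithub tells browsers and curl apart. One common approach, assumed here purely for illustration, is to look at the request's `Accept` header: browsers list `text/html` explicitly, while curl sends `*/*` by default. The function names are hypothetical.

```python
def wants_html(accept_header: str) -> bool:
    """True when the client explicitly lists text/html in its Accept header."""
    media_types = [
        part.split(";")[0].strip().lower() for part in accept_header.split(",")
    ]
    return "text/html" in media_types


def pick_format(accept_header: str) -> str:
    """Choose the response body format for a request."""
    return "html" if wants_html(accept_header) else "text"


browser = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
print(pick_format(browser))  # a typical browser Accept header
print(pick_format("*/*"))    # curl's default Accept header
```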
| prophesi wrote:
| Since the site was hugged to death by HN, this appears to be the
| repo[0] for anyone wanting to run it locally.
|
| [0] https://github.com/cyclotruc/gitingest
| bryant wrote:
| and of course, using the repo as an input for the service
| renders this[1]
|
| [1] https://gitingest.com/cyclotruc/gitingest
| mdaniel wrote:
| // Fetch stars when page loads
| fetchGitHubStars();
|
| I _do not_ understand why in the world so much of the code is
| related to poking the GH api to fetch the star count
| cyclotruc wrote:
| I know the code is not great, but contributions are very
| much welcome because there's a lot of low-hanging fruit
| johnisgood wrote:
| Probably generated by AI, prompted by no- or junior dev.
| This is my opinion, of course, but it looks like code
| generated by an LLM.
| moralestapia wrote:
| This is really nice, congrats on shipping.
|
| I also really like this idea in general of APIs being domains,
| eventually making the web a giant supercomputer.
|
| Edit: There is literally nothing wrong with this comment but feel
| free to keep downvoting, only 5,600 clicks to go!
| Mockapapella wrote:
| https://uithub.com is also a good one for this. They also have an
| API with more options.
| fastball wrote:
| Might be good to have some filtering as well. I added a repo that
| has a heap of localized docs that don't make much sense to ingest
| into an LLM but probably use up a majority of the tokens.
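The kind of filtering asked for above could be done with glob-style exclude patterns. A minimal sketch using Python's standard `fnmatch` module follows; the pattern list and the helper name `keep_path` are examples, not options the tool is known to have.

```python
from fnmatch import fnmatchcase


def keep_path(path, exclude_patterns):
    """True when the path matches none of the exclude patterns."""
    # fnmatchcase's "*" also matches "/" characters, so "docs/fr/*"
    # covers everything below docs/fr/.
    return not any(fnmatchcase(path, pattern) for pattern in exclude_patterns)


paths = [
    "src/main.py",
    "docs/en/intro.md",
    "docs/fr/intro.md",
    "docs/de/guide/setup.md",
]
exclude = ["docs/fr/*", "docs/de/*"]
print([p for p in paths if keep_path(p, exclude)])
```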
| Cedricgc wrote:
| Does this use the txtar format created for developing the go
| language?
|
| I actually use txtar with a custom CLI to quickly copy multiple
| files to my clipboard and paste it into an LLM chat. I try not to
| get too far from the chat paradigm so I can stay flexible with
| which LLM provider I use
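For readers who have not seen txtar: it is a plain-text archive in which each file starts with a `-- name --` marker line followed by its contents. The sketch below writes that layout from Python; it is an illustration of the format, not the commenter's CLI, and the thread does not say whether gitingest uses txtar.

```python
def to_txtar(files, comment=""):
    """Render a {name: text} mapping in txtar layout.

    An optional free-form comment comes first, then one
    "-- name --" marker line per file, followed by its data.
    """
    def fix_newline(text):
        # Non-empty sections end with a newline so the next marker
        # always starts on its own line.
        return text if not text or text.endswith("\n") else text + "\n"

    out = [fix_newline(comment)]
    for name, data in files.items():
        out.append(f"-- {name} --\n")
        out.append(fix_newline(data))
    return "".join(out)


print(to_txtar({"a.txt": "hello", "b/c.go": "package c\n"}), end="")
```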
| maleldil wrote:
| If I understand correctly, this sounds like
| https://github.com/simonw/files-to-prompt/.
|
| It's quite useful, with some filtering options (hidden files,
| gitignore, extensions) and support for Claude-style tags.
| Fokamul wrote:
| Nothing against gitingest.com, but this is really the peak of
| technology. Having LLMs which require feeding them info with
| copy & paste, the peak of efficiency too. OMFG.
| wwoessi wrote:
| Hi, great tool!
|
| I made https://uithub.com 2 months ago. Its speciality is the
| fact that seeing a repo's raw extract is a matter of changing 'g'
| to 'u'. It also works for subdirectories, so if you just want the
| docs of Upstash QStash, for example, just go to
| https://uithub.com/upstash/docs/tree/main/qstash
|
| Great to see this keeps being worthwhile!
| Arcuru wrote:
| That looks awesome. You didn't mention it but uithub.com also
| has an API, I can definitely see myself using this for a new
| tool.
| nonethewiser wrote:
| I implemented this same idea in bash for local use. Useful but
| only up to a certain size of codebase.
| lukejagg wrote:
| Is Unicode really the best way to display the file structure?
| The special Unicode characters are encoded into 2 tokens, so I
| doubt it would function better overall for larger repos.
| shawnz wrote:
| Also, even if different characters were used, the 2D ASCII art
| style representation of the directory tree in general strikes
| me as something that's not going to be easily interpreted by an
| LLM, which might not have a conception of how characters are
| laid out in 2D space
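An alternative to a box-drawing tree, sketched under the assumption that the tool has the list of file paths at hand: either emit the full paths one per line (`sorted(paths)`), or use a space-indented outline with no special characters. Whether either form tokenizes or reads better for a given model is untested here, and `indented_tree` is a made-up helper.

```python
def indented_tree(paths, indent="  "):
    """Render slash-separated paths as a space-indented outline."""
    lines, seen = [], set()
    for path in sorted(paths):
        parts = path.split("/")
        for depth in range(len(parts)):
            prefix = "/".join(parts[:depth + 1])
            if prefix in seen:
                continue  # this directory was already printed
            seen.add(prefix)
            is_dir = depth < len(parts) - 1
            lines.append(indent * depth + parts[depth] + ("/" if is_dir else ""))
    return "\n".join(lines)


paths = ["src/app.py", "src/util/io.py", "README.md"]
print(indented_tree(paths))
```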
___________________________________________________________________
(page generated 2024-12-05 23:00 UTC)