[HN Gopher] Show HN: Transform your codebase into a single Markd...
       ___________________________________________________________________
        
       Show HN: Transform your codebase into a single Markdown doc for
       feeding into AI
        
       CodeWeaver is a command-line tool designed to weave your codebase
       into a single, easy-to-navigate Markdown document. It recursively
       scans a directory, generating a structured representation of your
       project's file hierarchy and embedding the content of each file
       within code blocks. This tool simplifies codebase sharing,
       documentation, and integration with AI/ML code analysis tools by
       providing a consolidated and readable Markdown output.
        
       Author : tesserato
       Score  : 189 points
       Date   : 2025-02-14 13:23 UTC (9 hours ago)
        
 (HTM) web link (tesserato.web.app)
 (TXT) w3m dump (tesserato.web.app)
        
       | ainiriand wrote:
       | My codebase sitting at 4M lines: hold my spaghetti.
        
         | nahco314 wrote:
         | This is self-promotional, but
         | https://github.com/nahco314/feed-llm has a TUI to choose what
         | to give to the LLM. There are many similar tools out there,
         | but I think this approach is relatively effective for larger
         | codebases.
        
         | ycombinatornews wrote:
         | You can ask Cursor to use information from a specific folder
         | (aka your 4M lines) and it would summarize it and use that.
         | 
         | Not a replacement for the full 4M lines, but it might work for
         | some tasks/prompts.
        
       | pmx wrote:
       | How does this compare to / differ from
       | https://github.com/yamadashy/repomix ?
        
         | akoculu wrote:
         | or https://github.com/azer/llmcat
        
           | imdsm wrote:
           | I simply have a bash script called printall which takes in
           | some args, and outputs markdown codeblocks with filenames and
           | a tree. One of hundreds of scripts built up over the years.
        
             | akoculu wrote:
             | if you add fzf to speed up file / folder selection, you'll
             | have your own llmcat :)
        
         | ycombinatornews wrote:
         | My question exactly. Repomix seems to be a tested utility for
         | something like that.
        
         | apineda wrote:
         | There is also https://github.com/regenrek/codefetch which I
         | personally like
        
         | ActVen wrote:
         | Same question here. I have found repomix to get the job done
         | really well.
        
         | tesserato wrote:
         | Some advantages of CodeWeaver are that it is compiled, so it
         | might be faster; you can grab a compatible executable from the
         | releases section instead of using `go install`, so there are
         | no dependencies. You can manually specify what to exclude via a
         | comma-separated list of regular expressions, so it might be
         | more flexible. I have never used Repomix, so those assumptions
         | might not hold. On the other hand, Repomix seems to be far more
         | complete, a full-fledged solution for converting source code to
         | monolithic representations. I wrote CodeWeaver because I only
         | needed something that worked and that, occasionally, I could
         | trust to keep sensitive data away from sketchy LLMs (and I
         | wasn't aware of other solutions).
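         | 
         | A minimal Python sketch of the comma-separated regex exclusion
         | idea mentioned here. It is not CodeWeaver's actual code (which
         | is Go): the names `parse_patterns` and `is_excluded` are
         | hypothetical, and matching anywhere in the relative path is an
         | assumption.

```python
import re


def parse_patterns(spec):
    """Split a comma-separated string into compiled regular expressions."""
    return [re.compile(p.strip()) for p in spec.split(",") if p.strip()]


def is_excluded(rel_path, patterns):
    """True if any pattern matches somewhere in the relative path."""
    return any(p.search(rel_path) for p in patterns)


# Hypothetical exclusion list: VCS metadata, secrets, and lock files.
patterns = parse_patterns(r"\.git/, \.env$, .*\.lock$")
paths = [".git/config", "src/main.go", ".env", "go.lock", "docs/env.md"]
kept = [p for p in paths if not is_excluded(p, patterns)]
```

         | One caveat of this simple split: a pattern that itself contains
         | a comma (such as a `{1,3}` quantifier) would be cut in two.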
        
         | SillyUsername wrote:
         | or https://github.com/bodo-run/yek
        
       | fragmede wrote:
       | Which like, kinda neat that it exists, but who's using tooling
       | that bad that they're manually copying and pasting that much code
       | into, what, a web browser text entry box?
       | 
       | Use better tools people!
        
         | rorytbyrne wrote:
         | This seems useful for _building_ new tools. It's not strictly
         | an end-user tool.
        
           | dazzawazza wrote:
           | Exactly, the LLM-RAG boffins are all over stuff like this.
        
         | nahco314 wrote:
         | I have always used o1 pro and deep research, but these are only
         | available through the web UI. There is no doubt that Cursor and
         | others have a better UI, but the demand for this type of tool
         | exists because OpenAI does not release an API for them.
        
       | mkagenius wrote:
       | I have one for CVEs in case there are security folks here. It
       | recursively follows the reference links to find details, such as
       | the commit diff that fixed the vulnerability, and generates a
       | single JSON file.
       | 
       | 1. https://github.com/BandarLabs/cveingest
        
       | rorytbyrne wrote:
       | Does anyone know of tools that go the other direction? i.e.
       | taking a technical writeup (scientific paper, architecture docs,
       | or similar) and emitting a candidate codebase.
        
         | elashri wrote:
         | Maybe I don't understand but isn't this what you use LLMs for?
        
         | codazoda wrote:
         | I don't know of a tool but I've had some success doing this
         | with a one shot short prompt. I say something like, "Here's a
         | readme. Develop this in Go." Followed by the readme.
         | 
         | I've been getting complete working code with this strategy but
         | I'm creating projects that are relatively simple.
         | 
         | I also notice that I have to give a little deeper context about
         | "how" it should work, which I normally wouldn't do.
        
         | lgas wrote:
         | Yes, I often use one LLM to generate a PRD and then include it
         | in the codebase, then ask Cursor agent to implement some part
         | of the system using the PRD as a reference. It can't emit an
         | entire codebase in one shot (unless it's a trivial project like
         | "build me a flappy bird clone") but you can use it as
         | scaffolding to manage implementing a whole project in chunks.
        
       | tecleandor wrote:
       | As a note, CodeWeaver might be a confusing name, as CodeWeavers
       | (the Wine development company) has existed since 1996... (
       | https://en.wikipedia.org/wiki/CodeWeavers )
        
         | teekert wrote:
         | My first thought: Is this somehow using Wine?
         | 
         | It's not mentioned on the page but is it using [0] in the
         | background? Edit -> It's a Go program so I guess not.
         | 
         | [0] https://github.com/microsoft/markitdown
        
       | retropragma wrote:
       | I really want a tool like this that can extract a function and
       | its dependency graph (to a certain depth maybe, and/or exclude
       | node_modules).
       | 
       | I wrote this library [1] and hope to add the fine-grained
       | "reference resolution" utility to it at some point, which could
       | make implementing such a tool a lot simpler.
       | 
       | [1]: https://github.com/aleclarson/ts-module-graph
        
       | reddalo wrote:
       | Unfortunate naming, given that CodeWeavers is already a company
       | making a Windows "emulator" for Linux and macOS. [1]
       | 
       | [1] https://www.codeweavers.com/
        
         | lgas wrote:
         | All names are taken. There's no need to point this out every
         | time.
        
           | Rexxar wrote:
           | Some are more confusing than others.
        
           | anamexis wrote:
           | Not all names are registered trademarks for software.
        
         | Arch-TK wrote:
         | CodeWeavers are actually making wine, not just some "emulator".
         | They then distribute this along with some QOL tools as a
         | commercial product called CrossOver.
        
       | Keyframe wrote:
       | Wouldn't it be wonderful to have a tool where you interact with
       | AI interactively through the codebase via IDE / vim / emacs tree?
       | Say, you open your codebase and start with prompts and AI+tool
       | navigates to a function or a place where it needs to and modifies
       | stuff while chatting to you about it? Or you jump to somewhere,
       | highlight where you are to scope down the focus of it (while it
       | still retains all of the code in history/memory). Sort of like
       | pair programming. It sounds so obvious that I'm almost sure I've
       | missed that already existing somewhere. I think I tried google's
       | thing (forgot the name) but it sucked / wasn't that.
        
         | hk__2 wrote:
         | I tried various solutions but I still haven't found a chat tool
         | that allows me to navigate a large monorepo. I'd like to be
         | able to say "open the file where there is the function to do
         | <xyz>", but current tools don't understand that.
        
           | lgas wrote:
           | This works fine in Cursor. As far as I know, you can't say
           | "open the file..." but you can say "where is the function to
           | do <xyz>" and it'll include a link to the file in its
           | response and then you can click to open it.
        
         | zknowledge wrote:
         | Apologies if I'm missing something, but aren't you describing
         | Cursor/Copilot/Windsurf?
        
           | Keyframe wrote:
           | you're not. looks like that's kind of it, but would the thing
           | have the context of the whole project when I'm in a
           | file/class/function? With copilot, in my case, it was so far
           | mostly like a fancy autocomplete that has immediate vicinity
           | in its memory where it would be vastly more useful if it had
           | the context of the whole project / all files.
        
             | cjonas wrote:
               | Cursor indexes the entire codebase with embeddings. It
               | works well in small single-app projects.
        
               | kohlerm wrote:
               | it is also the "right thing to do" IMHO.
        
           | ajoseps wrote:
           | the vscode extension cline also does this
        
         | meesles wrote:
         | This doesn't sound good to me, you end up with a large codebase
         | that no human has actually laid eyes on. When you get a bug
         | weird enough that you can't reason the LLM through it, then
         | what? What if a bug is because of interactions between two
         | systems, and you don't own one of them? What if there's an
         | issue due to convoluted business process failures, that just
         | end in a bug report like "my data is missing!"? I honestly
         | think in the latter case, the LLM will just fix a 'bug' and
         | miss the forest for the trees.
         | 
         | I prefer the idea of the other comment reply where you use AI
         | as a tool to explore a codebase and assist you, not something
         | you instruct to do the work. It can accelerate you building
         | that experience and intuition at a level we've never been able
         | to do before.
        
           | Keyframe wrote:
           | Nothing like that at all. For example, I have a few codebases
           | that are kind of large (for a certain definition of large)
           | where I know the code, since I either wrote it or participated
           | heavily in it. Talking snippets at a time loses a ton of
           | context; you'd get better solutions offered if the tool had,
           | well... the whole context.
        
         | squeegee_scream wrote:
         | I think you're describing Aider.chat. There are 2 Emacs
         | packages for it, one official and a very recent fork. Aider is
         | a cli so it works great with vim as well.
         | 
         | In Emacs I've had good experience with gptel as well but I
         | prefer aider for the coding workflow
        
           | Keyframe wrote:
           | I'll check it out, thanks!
        
       | causal wrote:
       | This could be a lot better. The example linked in the Github
       | README is a markdown file full of binary garbage because it also
       | tried to convert gzip files to markdown.
       | 
       | Pretty big flag that this isn't ready for primetime.
        
         | tesserato wrote:
         | Thank you for pointing that out. Just fixed it.
        
       | tempoponet wrote:
       | A new tool like this comes out every week, and that's great! But
       | I think it's fair to ask how this compares to popular ones like
       | RepoMix? Anyone keeping an eye on this space will want to know
       | why this is different from what's already out there and being
       | used.
        
         | tesserato wrote:
         | I actually wrote this a couple of months ago, so perhaps
         | nothing similar existed back then (I remember doing some
         | research back then, mostly focused on VS Code plugins).
         | Nevertheless, the idea was also to test how Golang could
         | facilitate the distribution of such micro tools throughout the
         | internal team, so I probably would have still made it. It is
         | nice to know that similar tools exist. I'll take a look at
         | them.
        
       | atum47 wrote:
       | Damn, I did that the other day but manually. I just cat
       | everything from a folder in the order that I wanted and fed it to
       | ChatGPT so it could write a README for tiny.js
        
       | lars512 wrote:
       | I've been enjoying `files-to-prompt` by Simon Willison:
       | https://github.com/simonw/files-to-prompt
        
         | franze wrote:
         | here is mine
         | 
         | https://github.com/franzenzenhofer/thisismy
         | 
         | supports files, recursive directories, .gitignore and
         | .thisismyignore and online resources / URLs + tree commands
         | 
         | also available as a chrome extension
         | https://thisismy.franzai.com/
        
       | rapind wrote:
       | Somewhat related. I built an Elm app all in one file as an
       | experiment and to see if I like it. It's a little over 7k lines
       | and I'm occasionally adding more to it.
       | 
       | It's actually pretty straightforward if you're in a language with
       | lexical scoping, and it simplifies some things, like includes /
       | cyclical, no modules, no hunting through files, etc.
       | 
       | I feel like this set up could integrate really well w/ AI models.
       | 
       | I've found that the only real limitation, at least in my
       | experiment, was a lack of decent editor support. I use vim so
       | this wasn't really much of an issue for me with many great ways
       | to navigate a file, and a combination of vertical and horizontal
       | splits on a large screen, but when I opened it up in other
       | "modern" editors the ergonomics fell apart quite a bit.
       | 
       | I think the biggest downside was re-using variable names between
       | large scopes occasionally made it hard to find the reference I
       | wanted (E.g. i, x, key, val), but again, better editor support
       | allowing you to limit your search to within the current scope
       | would help. Also easily mitigated with more verbose throwaway
       | variable naming.
        
         | squeegee_scream wrote:
         | I write Elm and use Emacs primarily, and sometimes neovim. Are
         | you using lsp in vim? You're doing it right by staying in one
         | file until it hurts, that's the recommendation for Elm, but I
         | can't recall if I've had issues using go-to-def or other lsp
           | functions like you're describing
        
           | rapind wrote:
           | No LSP. It honestly doesn't speed me up any. I already have
           | the standard library memorized, plus some of the common
           | community lib methods (List.Extra) and my typing speed is
           | faster than I can think anyways.
           | 
           | I'm thinking the same approach would also work well in F#,
           | Haskell, OCaml.
        
         | Aurornis wrote:
         | > no hunting through files, etc.
         | 
         | It's easy to switch to files by name with a few keystrokes.
         | Files are names to group things I'm looking for.
         | 
         | I would much rather do that than try to search through a 7,000
         | line file for what I need.
         | 
         | > I feel like this set up could integrate really well w/ AI
         | models.
         | 
         | Massive files or too many files break AI models. Grouping
         | functionality into smaller files and including only relevant
         | files is key. The file and folder names can be hints about
         | where to find the right files to include.
        
           | rapind wrote:
           | > I would much rather do that than try to search through a
           | 7,000 line file for what I need.
           | 
           | I mean I'm not arguing for it as a best practice. I did it as
           | an experiment (as I stated), and discovered it's actually
           | really easy, and snappy for me to navigate in Vim. Mileage
           | may vary with other editors. Have you tried it?
           | 
           | > Massive files or too many files break AI models
           | 
           | The context window is growing faster than I code! With the
           | latest Gemini at least it's much larger, at 1-2 mil tokens.
           | I'm sure we'll hit a ceiling though, but I also think we may
           | find some context caching / RAG type optimizations eventually.
        
         | cruffle_duffle wrote:
         | The big problem with that is you'll eventually blow your
         | context window feeding the model with stuff that it mostly
         | doesn't need in order to complete its task.
        
           | rapind wrote:
           | I can't think of anything I'd want to add to the context for
           | Elm at least, assuming the standard libraries are already in
           | the model (or can be added via RAG). Gemini is 2m tokens now
           | and I expect this will grow at least until it's no longer
           | meaningful.
        
       | crisbal_ wrote:
       | I use the following for feeding into AI:
       | 
       |   find . -print -exec cat {} \; -exec echo \;
       | 
       | Which will return for each file (and subfolders) the filename and
       | then the content of the file.
       | 
       | Then `| pbcopy` to copy to clipboard and paste it into ChatGPT or
       | similar.
        
         | DrPhish wrote:
         | That's very nice and compact. I do the same with a short bash
         | script, but wrap each file in triple-backticks and attempt to
         | put the correct language label on each eg:
         | 
         | Filename: demo.py
         | 
         | ```python
         | ...python code here...
         | ```
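         | 
         | A rough Python sketch of that wrapping approach, for anyone who
         | wants a starting point. It is not the bash script mentioned
         | above: the extension-to-label table and the function names
         | `render_file` and `render_tree` are assumptions made up for
         | illustration.

```python
from pathlib import Path

# Assumed extension-to-label mapping; extend as needed.
LANG_BY_EXT = {".py": "python", ".go": "go", ".sh": "bash", ".md": "markdown"}

FENCE = "`" * 3  # built indirectly so this example can sit inside a fence


def render_file(path, text):
    """Return 'Filename: ...' followed by the text in a labeled fence."""
    label = LANG_BY_EXT.get(Path(path).suffix, "")
    body = text if text.endswith("\n") else text + "\n"
    return f"Filename: {path}\n\n{FENCE}{label}\n{body}{FENCE}\n"


def render_tree(root):
    """Concatenate every regular file under root, in sorted order."""
    root = Path(root)
    parts = []
    for p in sorted(root.rglob("*")):
        if p.is_file():
            rel = p.relative_to(root).as_posix()
            # Undecodable bytes are replaced rather than raising.
            parts.append(render_file(rel, p.read_text(errors="replace")))
    return "\n".join(parts)


demo = render_file("demo.py", "print('hi')")
```

         | Unknown extensions get an unlabeled fence. Piping the result of
         | `render_tree(".")` to the clipboard is left to the shell.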
        
           | mbonnet wrote:
           | Mind sharing the script?
        
           | genewitch wrote:
           | Seconded because just having something autowrapped like that
           | and put on the clipboard would save me time: release the
           | snyder cut, er, bash script!
        
         | singpolyma3 wrote:
         | I guess this only works for very small codebase?
        
           | OsrsNeedsf2P wrote:
           | Correct, but it's the same as what OP shared.
           | 
           | You should use Aider/Cursor for proper indexing/intelligent
           | codebase referencing
        
             | boredemployee wrote:
             | not sure if it's cursor's fault, but very often it doesn't
             | give me the real or complete code of my codebase when auto
             | editing/auto completing.
             | 
             | any tips?
        
             | soco wrote:
             | I'm still puzzled how come people are convinced by Cursor,
             | while my experience was meh at best. Can it index your
             | stuff? Okay, it can. Can it refactor a simple function? No,
             | it cannot; it can't even rename a damn Java class. How can
             | I then trust it to generate code based on my codebase? So,
             | what is your use case then? Or can anybody point me to some
             | blog/articles/videos showing some _real_ use cases for
             | Cursor? Real as in, something that it demonstrably can do?
        
               | risyachka wrote:
               | I think you know the correct answer:)
        
       | schaefer wrote:
       | Wait, just one question...
       | 
       | Can I call this c++ code "machine code" now?
        
       | __mharrison__ wrote:
       | Interesting. I've been converting Jupyter notebooks into markdown
       | for the same purpose. Am considering making a custom tool.
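       | 
       | A minimal Python sketch of such a conversion, assuming the
       | standard .ipynb JSON layout (a "cells" list whose entries carry
       | "cell_type" and "source"). The function name
       | `notebook_to_markdown` is hypothetical; cell outputs and raw
       | cells are simply dropped here.

```python
import json

FENCE = "`" * 3


def notebook_to_markdown(nb_json, lang="python"):
    """Render markdown cells as-is and code cells inside fences."""
    nb = json.loads(nb_json)
    chunks = []
    for cell in nb.get("cells", []):
        src = cell.get("source", "")
        # The notebook format stores source as a string or a list of lines.
        text = "".join(src) if isinstance(src, list) else src
        if cell.get("cell_type") == "markdown":
            chunks.append(text)
        elif cell.get("cell_type") == "code":
            chunks.append(f"{FENCE}{lang}\n{text}\n{FENCE}")
    return "\n\n".join(chunks) + "\n"


# Tiny hand-written notebook used only for illustration.
sample = json.dumps({
    "cells": [
        {"cell_type": "markdown", "source": ["# Title\n", "Some notes."]},
        {"cell_type": "code", "source": ["x = 1\n", "print(x)"], "outputs": []},
    ],
    "nbformat": 4,
})
markdown = notebook_to_markdown(sample)
```

       | Dropping outputs keeps the result small, which matters when
       | notebooks embed images or large tables.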
        
         | tesserato wrote:
         | I also have this use case, and would be interested in such a
         | tool. If you intend to write your tool in Golang, consider
         | instead extending CodeWeaver.
        
       | cjonas wrote:
       | I could see this being quite useful in the background for apps
       | like cursor when they need to perform a full codebase search. I
       | imagine it could be more effective in breaking up larger
       | codebases where embeddings start to fall out. If you could fit
       | the entire document into context, you'd be able to "point the
       | model" in the right direction.
       | 
       | The challenge is maintaining it... But you'd maybe ask the model
       | to do that incrementally on every commit, or just throw it away
       | and regenerate from scratch occasionally.
        
       | tribeca18 wrote:
       | https://www.repoprompt.com is better. You need more granular
       | control if you're planning to use this in real large codebases.
        
       | maurycy wrote:
       | find . -type f -name '*.py' -exec sh -c 'echo "# $1"; cat "$1";
       | echo ""' _ {} \; | pbcopy
        
       | lornajane wrote:
       | For extra points, compile your docs into one file and feed it
       | that as well.
       | 
       | (unless the reason you're giving AI the code is that you don't
       | have any docs for either humans or machines)
        
       | squeegee_scream wrote:
       | This is great, but I'm pretty sure this is trivial using Emacs
       | and org mode. You could then use pandoc to convert org to
       | markdown
        
         | lgas wrote:
         | It's trivial using a number of approaches, eg. a simple bash or
         | python script. But I think there's still a fair amount of value
         | in building a common tool for these sorts of things. Everyone
         | that builds their own one off solution will inevitably
         | encounter more and more of the edge cases (oh I need to honor
         | .gitignore... oh, I need to be able to override .gitignore and
         | include some ignored things... oh I need to deal with huge
         | files... etc) and with a common tool the tool can collect the
         | ways of dealing with all of these edge cases.
         | 
         | Now no one will need something that can handle all of the edge
         | cases, but whatever edge cases they need to be handled will
         | already be handled. The overall time and frustration saved this
         | way can be huge.
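         | 
         | To make those edge cases concrete, here is a small Python
         | sketch of a single include/skip decision. Everything in it is
         | an assumption for illustration: the name `should_include`, the
         | 100 kB cutoff, and the use of `fnmatch` globs, which are much
         | simpler than real .gitignore semantics.

```python
from fnmatch import fnmatch

MAX_BYTES = 100_000  # assumed cutoff for "huge" files


def should_include(rel_path, size, ignore, force_include=()):
    """Decide whether a file belongs in the bundle."""
    # An explicit include overrides both the ignore list and the size cap.
    if any(fnmatch(rel_path, pat) for pat in force_include):
        return True
    if any(fnmatch(rel_path, pat) for pat in ignore):
        return False
    return size <= MAX_BYTES


ignore = ["node_modules/*", "*.log", "dist/*"]
files = {
    "src/app.ts": 2_000,
    "node_modules/x/index.js": 500,
    "dist/bundle.js": 900_000,
    "debug.log": 50,
    "data/huge.csv": 5_000_000,
}
selected = [p for p, n in files.items()
            if should_include(p, n, ignore, force_include=["dist/bundle.js"])]
```

         | Here the forced include brings back one ignored file, while
         | the oversized CSV is skipped by the size cap.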
        
       | therealmarv wrote:
       | I use aider /copy-context command for that
       | 
       | https://aider.chat/docs/usage/copypaste.html
       | 
       | and with /paste you can apply the changes.
        
       | beklein wrote:
       | Tip: If you ever need to do this on a public GitHub repository
       | you can use "gitingest".
       | 
       | This will open a website that creates a copy of all the file
       | contents of the repo (code, docs, ...) It's a great tool to use
       | when using new/obscure code with LLMs in my opinion.
       | 
       | The UX is just so easy and great: change the URL from
       | <https://github.com/user_name/repo_name> to
       | <https://gitingest.com/user_name/repo_name>
       | 
       | //edit: fixed URLs
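       | 
       | For scripting the same URL swap, a tiny Python sketch. The
       | helper name `to_gitingest` is made up, and it assumes only the
       | host needs to change, as described above.

```python
from urllib.parse import urlsplit, urlunsplit


def to_gitingest(url):
    """Swap the github.com host for gitingest.com, keeping the path."""
    parts = urlsplit(url)
    if parts.netloc.lower() not in ("github.com", "www.github.com"):
        raise ValueError(f"not a GitHub URL: {url}")
    return urlunsplit(parts._replace(netloc="gitingest.com"))


converted = to_gitingest("https://github.com/user_name/repo_name")
```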
        
         | mkagenius wrote:
         | I copied the UX to my https://gitpodcast.com (creates a podcast
         | on a GitHub repo; same trick, replace `hub` with `podcast`)
        
       | skeledrew wrote:
       | This is like a rediscovery of an org-mode capability that has
       | existed for decades, and doesn't do as much.
        
         | hatmatrix wrote:
         | Is it? I use org-babel regularly but wasn't aware of it -
         | what's the function called? As great as org-mode / org-babel
         | is, the user base is too small to not be overlooked.
        
       | ActVen wrote:
       | Any unique benefits over using this vs something like Repomix?
       | https://github.com/yamadashy/repomix
        
         | tesserato wrote:
         | CodeWeaver is compiled, so it might be faster. Also, you can
         | grab a compatible executable from the releases section, and
         | you're good to go, instead of using `go install` so, no
         | dependencies. Personally, I considered following the
         | `.gitignore` route but found that manually specifying what to
         | exclude via a comma-separated list of regular expressions
         | provided me with the flexibility I needed (initial setup might
         | be a bit tedious, though, but, then again, you can use an LLM
         | for that).
        
       | resters wrote:
       | See the script I created that does something similar with a few
       | improvements for large projects:
       | 
       | https://paste.mozilla.org/9rD95yAy
       | 
       | I would like to be able to create sets of files that I can easily
       | send to the clipboard in this kind of format. The files could
       | correspond to the ones relevant to a particular feature, etc.
       | They don't always fall under the same subtree of the source code,
       | and the entire source code is too big for the context.
        
         | roskelld wrote:
         | Link says snippet deleted.
        
           | resters wrote:
           | I made a better one that lets you add the files/paths and
           | refresh and copy to the clipboard:
           | 
           | https://paste.mozilla.org/omP4EKE8
        
       | Conasg wrote:
       | I made a similar tool in Golang,
       | https://github.com/foresturquhart/grimoire. It tries to be a bit
       | cleverer, by prioritising files that have had many commits,
       | respecting .gitignore files, and excluding useless content like
       | binaries or vector images.
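       | 
       | For the binary-exclusion part only, a common heuristic can be
       | sketched in Python as below. This is not taken from grimoire;
       | the function name `looks_binary` and the 8 KiB sample size are
       | assumptions, and the commit-count prioritisation is not shown.

```python
def looks_binary(data, sample_size=8192):
    """Heuristic: a NUL byte or undecodable UTF-8 in the first bytes."""
    sample = data[:sample_size]
    if b"\x00" in sample:
        return True
    try:
        sample.decode("utf-8")
    except UnicodeDecodeError as err:
        # A multi-byte character cut off by the sample boundary is not
        # evidence of binary content, so tolerate errors at the very end.
        return err.start < len(sample) - 3
    return False


gzip_header = b"\x1f\x8b\x08\x00\x00\x00\x00\x00"  # start of a gzip stream
text_bytes = "package main // héllo\n".encode("utf-8")
flags = (looks_binary(gzip_header), looks_binary(text_bytes))
```

       | A check like this would also keep gzip files out of the
       | generated Markdown.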
        
         | tesserato wrote:
         | I can think of no use case where binaries are desired in such
         | representation, so I might bake binary exclusion into
         | CodeWeaver as well. SVGs, on the other hand, might be wanted
         | sometimes, in web design contexts. I'll take a look at your
         | implementation and see what I can learn.
        
           | franze wrote:
           | thisismy has a -g option for greedy which then also takes
           | binaries
        
         | codecraze wrote:
         | Nice! Written in go. I like that :)
        
       | Alifatisk wrote:
       | There is also repo2txt.simplebasedomain.com/local.html
        
       | OsrsNeedsf2P wrote:
       | This thread has convinced me that Aider/Cursor need to do more
       | marketing.
        
         | larusso wrote:
         | Maybe. But maybe some like the more disconnected way of coding
         | with ai.
        
           | lgas wrote:
           | Why? It's just moving more of the grunt work of shuffling
           | things around to the human?
        
             | larusso wrote:
             | For me it's about still feeling in control. And the fact that
             | I don't want to inject it into every workflow. I'm open to
             | AI and use it daily. But my terms may be different than
             | others'. I want to control what I share and how. People have
             | secrets and other things in a project. I sometimes rename
             | things because the AI should only deal with the big
             | picture. Paint me paranoid but that's how it is for me.
        
         | esafak wrote:
         | The future is not evenly distributed.
        
         | rane wrote:
         | Cursor is all the rage. Nobody talks about Aider, sadly.
        
       | forrest321 wrote:
       | I created something similar.
       | https://github.com/forrest321/code2text
        
       | megadragon9 wrote:
       | I would say the demand for this kind of tool definitely exists.
       | Good work! From a rough glance it looks pretty similar to another
       | tool that I've been using https://github.com/mufeedvh/code2prompt
        
       | stan_kirdey wrote:
       | Nice! Built something similar in Rust that supports local and
       | remote repos: https://crates.io/crates/r2md
        
         | tesserato wrote:
         | I thought of using Rust, but ultimately chose Go. I'll take a
         | look and see how something similar came out in Rust!
        
           | jdironman wrote:
           | Something I didn't dig to find, but is it possible for these
           | applications to also respect .gitignores? Might be a handy
           | flag!
        
       | mtrovo wrote:
       | Anybody with experience of using something like this with a big
       | codebase and Gemini 2M context window? I tried a while ago
       | (before 2.0 Flash) to solve some refactoring tasks and even after
       | spending some time on prompt wrangling I didn't manage to get
       | good results out of it.
       | 
       | I don't know what kind of agent architecture Cursor uses
       | internally but it seems much better designed at finding where
       | changes need to be made.
        
         | tesserato wrote:
         | In my experience with feeding large codebases to Gemini, simple
         | tasks work ok (enumerate where such and such happens, find
         | where a certain function is called, list TODOs throughout the
         | code, etc), but tasks that require a bit more logic are
         | trickier. Nevertheless, I had some success with moderately
         | complex refactoring tasks in Python codebases.
        
       | the_king wrote:
       | Files-to-prompt has been a surprisingly useful tool for this kind
       | of workflow.
       | 
       | https://github.com/simonw/files-to-prompt
        
       | narmiouh wrote:
       | If I'm reading this correctly, why include all code into the
       | markdown? It's almost like the AI model that would use this is
       | necessarily using all concatenated code plus explanation of the
       | code, I'm not sure which is better because the LLM then already
       | has access to the entire code as part of markdown?
        
       | emmelaich wrote:
       | Is this related to https://gitingest.com/ at all? Which seems to
       | be a service doing a similar thing.
        
         | tesserato wrote:
         | It is not. Others have commented pointing to services similar
         | to this one, though.
        
         | BoorishBears wrote:
         | There are a ridiculous number of projects doing this.
         | 
         | I'm always baffled by the response they get since doing this is
         | also the most impractical, poorly scaling, way to insert an LLM
         | into your development process.
         | 
         | On one hand if you realize that, there may be times where you
         | get lucky with the size of a codebase and the nature of your
         | questions and it works acceptably.
         | 
         | But on the other, this feels like the kind of thing someone
         | who's hearing others rave about the utility of AI will try with
         | too large of a codebase, insert the result into ChatGPT, and
         | then get an LLM underperforming because it's being flooded with
         | irrelevant context for every basic operation it's being asked
         | to do.
         | 
         | There are very few times when providing the entire codebase in
         | the context window instead of the relevant code to a single
         | operation makes sense.
        
       | thesurlydev wrote:
       | I literally just wrote something similar called techdocs[1]. It is
       | written in Rust and uses Claude to generate a README. It includes
       | an API and a CLI.
       | 
       | [1] https://github.com/thesurlydev/techdocs
        
         | tesserato wrote:
         | Nice! I thought of using Rust. I'll check how you implemented
         | it.
        
       | strizzo wrote:
       | There's ClipSource for VSCode that does this
        
       | strizzo wrote:
       | I made the same thing for VSCode two weeks ago and called it
       | ClipSource. It's in the extensions marketplace:
       | https://marketplace.visualstudio.com/items?itemName=Strizzo....
       | You can right-click a directory in the workspace and copy all of
       | its content as Markdown.
        
       | mmanfrin wrote:
       | I built a simple tool to do something similar. It's meant for a
       | monorepo and will build each subfolder into a text file
       | (subfolder-code.txt) that you can upload to AIs.
       | 
       | https://github.com/manfrin/bundle-codebases
       | 
       | I don't see much merit in things like markdown or syntax
       | highlighting as that's just extra noise for the AI. My script
       | tries to cut down on any extraneous data since the things I'm
       | working on are near the context limit of consumer AIs.
       | 
       | My script also ignores anything in .gitignore and will take a
       | .codebundlerwhitelist (I hate this name and have meant to change
       | it) to only bundle files matching patterns you specify.
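
A minimal sketch of the kind of bundler described in the comment above: skip ignored paths, optionally keep only whitelisted ones, and emit plain text with no Markdown fences. This is not the actual bundle-codebases script. The function names, the `==>` file header, and the use of `fnmatch` globs are all assumptions made for illustration, and `fnmatch` only approximates real .gitignore semantics (no negation, no anchoring).

```python
import fnmatch
from pathlib import Path


def load_patterns(path: Path) -> list[str]:
    """Read one glob pattern per line, skipping blanks and # comments."""
    if not path.is_file():
        return []
    lines = (line.strip() for line in path.read_text().splitlines())
    # Drop a trailing "/" so "node_modules/" matches the directory name.
    return [line.rstrip("/") for line in lines if line and not line.startswith("#")]


def matches_any(rel_path: str, patterns: list[str]) -> bool:
    """True if the whole path, or any single component of it, matches a pattern."""
    parts = rel_path.split("/")
    return any(
        fnmatch.fnmatch(rel_path, pat) or any(fnmatch.fnmatch(p, pat) for p in parts)
        for pat in patterns
    )


def bundle(root: Path, ignore: list[str], allow: list[str]) -> str:
    """Concatenate the selected files under root as plain text (hypothetical format)."""
    chunks = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root).as_posix()
        if matches_any(rel, ignore):
            continue
        # An empty allow list means "keep everything that is not ignored".
        if allow and not matches_any(rel, allow):
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        chunks.append(f"==> {rel}\n{text}\n")
    return "\n".join(chunks)
```

One way to call it would be `bundle(Path("services/api"), load_patterns(Path(".gitignore")), load_patterns(Path(".codebundlerwhitelist")))`, where `services/api` is a made-up subfolder name.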
        
         | antirez wrote:
         | Not just extra noise, but also extra tokens.
        
           | mmanfrin wrote:
           | Exactly.
        
       | sandGorgon wrote:
       | How does this compare to code2prompt or files2prompt? Any
       | benchmarks on which one works better for LLMs?
        
       | adityamwagh wrote:
       | Another alternative is Gitingest [0]. What are the differences?
       | 
       | [0] https://gitingest.com/
        
       | Terretta wrote:
       | _CodeWeavers is a software company that focuses on Wine
       | development and sells a proprietary version of Wine called
       | CrossOver for running Windows applications on macOS, ChromeOS and
       | Linux._
       | 
       | https://en.wikipedia.org/wiki/CodeWeavers
       | 
       | The trademark is active. It's an (R), not just a (TM): registered,
       | not just trademarked. To keep it, they have to demonstrate that
       | they defend it.
       | 
       | https://www.trademarkia.com/codeweavers-76546826
       | 
       | While this project drops the final "s", you don't get to launch
       | an OS called "Window". The test is a fuzzy match based on
       | likelihood of confusion.
        
         | jychang wrote:
         | Yeah, I was thinking "what do the Wine guys have to do with
         | this?"
         | 
         | This project is definitely going to get C&D'd.
        
       | _puk wrote:
       | Whilst the pendulum seems well on its way to swinging from
       | microservices back to monoliths, I'm thinking we'll end up in a
       | place that limits the volume and complexity of the code in a
       | single service so that it's just large enough to encompass a
       | point of single responsibility.
       | 
       | Then we can easily drop in and out of using LLMs in the code
       | space.
       | 
       | Service Oriented Architecture lends itself well to the limited
       | context of these models.
       | 
       | Maybe we can revive literate programming and simply build
       | everything from a single markdown document..
        
         | azthecx wrote:
         | Microservices lend themselves to architectural decisions that
         | LLMs are just not trained to understand.
         | 
         | It's one thing for it to be trained on billions of LOC and be
         | useful; it's another for it to have a dataset of high enough
         | quality to give it the context and understanding of something
         | like Kafka partition ordering and its possible interactions with
         | something like a database and at-least-once delivery. It will
         | give you an explanation of those things in isolation, but not in
         | combination.
        
       | nunodonato wrote:
       | This kind of context is really useful for LLMs, but in any
       | significant project, including all code in this manner will
       | easily exceed context limitations. I've been wanting to do
       | something like this for my PHP projects, but instead of dumping
       | the entire files, I would just create a map of each file's method
       | signatures, variables, etc. That should give good enough
       | information about what each file is used for and can do, while
       | being small enough to be ingested by AI.
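
The signature-map idea in the comment above can be sketched as follows. The comment is about PHP; this sketch swaps in Python sources because the standard-library `ast` module can parse them without extra dependencies. The function name `signature_map` and the output format are hypothetical, and only module-level and class-level definitions are listed (variables and nested functions are left out).

```python
import ast


def signature_map(source: str) -> list[str]:
    """Return one line per class, function, or method defined in source."""
    entries = []

    def visit(node: ast.AST, prefix: str) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.ClassDef):
                entries.append(f"class {prefix}{child.name}")
                # Recurse so methods are listed as ClassName.method.
                visit(child, f"{prefix}{child.name}.")
            elif isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                args = ast.unparse(child.args)
                returns = f" -> {ast.unparse(child.returns)}" if child.returns else ""
                entries.append(f"def {prefix}{child.name}({args}){returns}")

    visit(ast.parse(source), "")
    return entries
```

Running this over every file and joining the results under each filename gives a compact outline that drops function bodies entirely, which is the trade-off the comment describes: far fewer tokens, at the cost of the model not seeing the implementation.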
        
         | panarky wrote:
         | _> including all code in this manner will easily exceed context
         | limitations_
         | 
         | The context window for Gemini 2.0 Flash can handle roughly
         | 50000 lines of code, and 2.0 Pro can handle twice that.
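
Context limits are stated in tokens, so a figure in lines of code depends on how long the lines are. A rough way to check whether a dump fits is sketched below. The 4-characters-per-token ratio is an assumption (a common rule of thumb, not a property of any particular tokenizer), and the function names and the `reserve` parameter are hypothetical.

```python
import math

# Assumption: rough average; real tokenizers vary with language and content.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Estimate the token count of text from its character length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_in_context(texts: list[str], token_budget: int, reserve: int = 0) -> bool:
    """True if all texts plus a reserve for the prompt and reply fit the budget."""
    total = sum(estimate_tokens(t) for t in texts)
    return total + reserve <= token_budget
```

For an exact answer, count tokens with the tokenizer of the model you plan to use; the estimate here is only good enough to tell "comfortably fits" from "far too large".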
        
       | novemp wrote:
       | How do you do the opposite of this? Transform your markdown files
       | into a codebase that AI can't leech off of?
        
       | hatmatrix wrote:
       | Such functionality would be useful for developing some scripts
       | and then converting to a Quarto document [1].
       | 
       | [1] https://quarto.org/
        
         | mbonnet wrote:
         | Second hooray for Quarto. Great tool.
        
       ___________________________________________________________________
       (page generated 2025-02-14 23:00 UTC)