https://github.com/gr-b/repogather
gr-b / repogather: Easily copy all relevant source files in a repository to clipboard. For use in LLM code understanding and generation workflows. License: MIT.

# repogather

repogather is a command-line tool that copies all relevant files (with their relative paths) in a repository to the clipboard. It is intended to be used in LLM code understanding or code generation workflows. It uses gpt-4o-mini (configurable) to decide file relevance, but can also be used without an LLM to return all files, with non-AI filters (such as excluding tests or config files).
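To make the non-LLM mode concrete, here is a minimal sketch of filtering files by path fragment and joining the survivors with their relative paths. It is not repogather's actual implementation: the function names `is_excluded` and `format_for_clipboard`, the `--- path ---` header format, and the in-memory `files` mapping are all assumptions made for illustration, and the final copy-to-clipboard step is left out.

```python
def is_excluded(rel_path: str, fragments: list[str]) -> bool:
    """Return True if any fragment occurs anywhere in the relative path."""
    return any(fragment in rel_path for fragment in fragments)


def format_for_clipboard(files: dict[str, str], fragments: list[str]) -> str:
    """Join the non-excluded files into one text blob, each under its relative path.

    `files` maps a relative path to that file's contents. The header format
    used here is hypothetical; the README does not specify the real one.
    """
    sections = []
    for rel_path in sorted(files):
        if is_excluded(rel_path, fragments):
            continue
        sections.append(f"--- {rel_path} ---\n{files[rel_path]}")
    return "\n\n".join(sections)


if __name__ == "__main__":
    sample = {
        "src/app.py": "print('hi')\n",
        "tests/test_app.py": "assert True\n",
        "legacy_code/old.py": "x = 1\n",
    }
    # Only src/app.py survives the two exclusion fragments.
    print(format_for_clipboard(sample, ["tests/", "legacy_code"]))
```

Substring matching is used because the tool's own exclusion option is described in terms of path fragments; a real implementation would also need to read files from disk and honor .gitignore.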
## Features

* Filters and analyzes code files in a repository
* Excludes test and configuration files by default (with options to include them)
* Filters out common ecosystem-specific directories and files (e.g., node_modules, venv)
* Respects .gitignore rules (with option to include ignored files)
* Handles repositories of any size by splitting content into multiple requests when necessary
* Estimates token count and API usage cost before processing
* Uses OpenAI's GPT models to evaluate file relevance
* Supports various methods of providing the OpenAI API key
* Copies relevant files and their contents to the clipboard
* Can return all files without LLM analysis
* Allows custom exclusion of files or directories

## Installation

Install repogather using pip:

    pip install repogather

## Setup

Set up your OpenAI API key using one of the following methods:

* As an environment variable: `export OPENAI_API_KEY=your_api_key_here`
* In a .env file in your current working directory: `OPENAI_API_KEY=your_api_key_here`
* Provide it as a command-line argument when running the tool (see Usage section)

## Usage

After installation, you can run repogather from the command line:

    repogather [QUERY] [OPTIONS]

### Options

* `--include-test`: Include test files in the analysis
* `--include-config`: Include configuration files in the analysis
* `--include-ecosystem`: Include ecosystem-specific files and directories (e.g., node_modules, venv)
* `--include-gitignored`: Include files that are gitignored
* `--exclude PATTERN`: Exclude files containing the specified path fragment (can be used multiple times)
* `--relevance-threshold THRESHOLD`: Set the relevance threshold (0-100, default: 50)
* `--model MODEL`: Specify the OpenAI model to use (default: gpt-4o-mini-2024-07-18)
* `--openai-key KEY`: Provide the OpenAI API key directly
* `--all`: Return all files without using LLM analysis

### Examples

1. Analyze files with a query:

       repogather "Find files related to user authentication" --include-config --relevance-threshold 70 --model gpt-4o-2024-08-06

   This command will:

   1. Search for files related to user authentication
   2. Include configuration files in the search
   3. Only return files with a relevance score of 70 or higher
   4. Use the GPT-4o model from August 2024 for analysis

2. Return all files without LLM analysis, including ecosystem files but excluding a specific directory:

       repogather --all --include-test --include-config --include-ecosystem --include-gitignored --exclude "legacy_code"

   This command will:

   1. Gather all code files in the repository
   2. Include test, config, and ecosystem-specific files in the output
   3. Include files that would normally be ignored by .gitignore
   4. Exclude any files or directories containing "legacy_code" in their path
   5. Copy all gathered files to the clipboard without using LLM analysis

## How It Works

repogather performs the following steps:

1. Scans the current directory and its subdirectories for code files
2. Filters out test, configuration, ecosystem-specific, and gitignored files (unless included via options)
3. Applies any custom exclusion patterns
4. If the `--all` option is used, returns all filtered files
5. Otherwise:
   a. Counts the tokens in the filtered files and estimates the API usage cost
   b. Displays information about large files (>30,000 tokens) and directories (>100,000 tokens)
   c. Asks for user confirmation before proceeding
   d. If the total tokens exceed the model's limit, splits the content into multiple requests
   e. Sends the file contents and the query to the specified OpenAI model
   f. Processes the model's response to rank files by relevance
   g. Filters the files by the specified relevance threshold
6. Copies the relevant file paths and contents to the clipboard

## Note

repogather requires an active OpenAI API key when using LLM analysis.
It will prompt you to confirm the expected cost of the query (in input tokens) before proceeding. When using the --all option, no API key is required. repogather handles repositories of any size by splitting the content into multiple requests when necessary. This allows for analysis of large codebases without hitting API token limits.
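As a rough illustration of splitting content into multiple requests and filtering by relevance threshold, the sketch below packs files greedily into batches under a token budget and then keeps the files whose score meets the threshold. It is a sketch under stated assumptions and may not match what repogather does internally: `estimate_tokens` uses a crude four-characters-per-token heuristic instead of a real tokenizer, the function names are hypothetical, and the scores are supplied by hand where the tool would obtain them from the model.

```python
def estimate_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: assume roughly 4 characters per token."""
    return max(1, len(text) // 4)


def split_into_batches(files: dict[str, str], max_tokens: int) -> list[list[str]]:
    """Greedily group file paths so each batch stays within max_tokens.

    A single file larger than the budget still gets a batch of its own.
    """
    batches: list[list[str]] = []
    current: list[str] = []
    current_tokens = 0
    for path, content in files.items():
        tokens = estimate_tokens(content)
        # Start a new batch when adding this file would exceed the budget.
        if current and current_tokens + tokens > max_tokens:
            batches.append(current)
            current, current_tokens = [], 0
        current.append(path)
        current_tokens += tokens
    if current:
        batches.append(current)
    return batches


def filter_by_threshold(scores: dict[str, int], threshold: int = 50) -> list[str]:
    """Return paths scoring at or above the threshold, most relevant first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [path for path, score in ranked if score >= threshold]


if __name__ == "__main__":
    # Three files of about 100 estimated tokens each, with a budget of 250 per request.
    files = {"a.py": "x" * 400, "b.py": "y" * 400, "c.py": "z" * 400}
    print(split_into_batches(files, max_tokens=250))
    # Hand-written scores standing in for the model's relevance ratings.
    print(filter_by_threshold({"a.py": 80, "b.py": 40, "c.py": 50}, threshold=50))
```

Greedy packing keeps the example short. A real tool would also have to reserve part of each request's budget for the query and instructions, and merge the scores returned by the separate requests before applying the threshold.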