https://github.com/junruxiong/IncarnaMind
junruxiong/IncarnaMind (public repository): Connect and chat with your multiple documents (pdf and txt) through GPT and Claude LLMs in a minute. License: Apache-2.0. 58 stars, 1 fork, 2 branches, 0 tags, 38 commits.

# IncarnaMind

## In a Nutshell

IncarnaMind enables you to chat with your personal documents (PDF, TXT) using Large Language Models (LLMs) like GPT (architecture overview). While OpenAI has recently launched a fine-tuning API for GPT models, fine-tuning doesn't enable the base pretrained models to learn new data, and the responses can be prone to factual hallucinations. Our Sliding Window Chunking mechanism and Ensemble Retriever enable efficient querying of both fine-grained and coarse-grained information within your ground truth documents to augment the LLMs.

Please feel free to use it. Feedback and new feature suggestions are welcome.

Powered by Langchain and Chroma DB.

## Demo

Demo.mp4

## Challenges Addressed

* Fixed Chunking: Our Sliding Window Chunking technique provides a balanced solution in terms of time, computing power, and performance.
* Precision vs. Semantics: Small chunks enable fine-grained information retrieval, while large chunks focus on coarse-grained data. We leverage both embedding-based and BM25 methods for a hybrid search approach.
* Single-Document Limitation: IncarnaMind supports multi-document querying, breaking the one-document-at-a-time barrier.
* Stability: We use Chains instead of Agents to ensure stable parsing across different LLMs.

## Key Features

* Adaptive Chunking: Dynamically adjust the size and position of text chunks to improve retrieval augmented generation (RAG).
* Multi-Document Conversational QA: Perform simple and multi-hop queries across multiple documents simultaneously.
* File Compatibility: Supports both PDF and TXT file formats.
* LLM Model Compatibility: Supports both OpenAI GPT and Anthropic Claude models.

## Architecture

### High Level Architecture

[image: High Level Architecture]

### Sliding Window Chunking

[image: Sliding Window Chunking]

## Getting Started

### 1. Installation

Installation is simple: you just need to run a few commands.

#### 1.0. Prerequisites

* 3.8 <= Python < 3.11 with Conda
* OpenAI API Key or Anthropic Claude API Key
* And of course, your own documents.

#### 1.1. Clone the repository

    git clone https://github.com/junruxiong/IncarnaMind
    cd IncarnaMind

#### 1.2. Setup

Create a Conda virtual environment:

    conda create -n IncarnaMind python=3.10

Activate it:

    conda activate IncarnaMind

Install all requirements:

    pip install -r requirements.txt

Set up your API keys (OpenAI and/or Anthropic) in the configparser.ini file:

    [tokens]
    OPENAI_API_KEY = sk-(replace_me)
    ANTHROPIC_API_KEY = sk-(replace_me)

(Optional) Set up your custom parameters in the configparser.ini file:

    [parameters]
    PARAMETERS 1 = (replace_me)
    PARAMETERS 2 = (replace_me)
    ...
    PARAMETERS n = (replace_me)

### 2. Usage

#### 2.1. Upload and process your files

Put all your files into the /data directory (please name each file correctly to maximize the performance). You can delete the example files in the /data directory first. Then run the following command to ingest all data:

    python docs2db.py

#### 2.2. Run

To start the conversation, run:

    python main.py

#### 2.3. Chat and ask any questions

Wait for the script to prompt for your input, as shown below:

    Human:

#### 2.4. Others

When you start a chat, the system automatically generates an IncarnaMind.log file. To change the logging settings, edit the [logging] section of the configparser.ini file:

    [logging]
    enabled = True
    level = INFO
    filename = IncarnaMind.log
    format = %(asctime)s [%(levelname)s] %(name)s: %(message)s

## Limitations

* Citation is not supported in the current version, but support will be released soon.
* Limited asynchronous capabilities.

## Upcoming Features

* Frontend UI interface
* OCR support
* Asynchronous optimization
* Support open source LLMs
* Support more document formats

## License

Apache 2.0 License

## Citation

If you want to cite our work, please use the following bibtex entry:

    @misc{IncarnaMind2023,
      author = {Junru Xiong},
      title = {IncarnaMind},
      year = {2023},
      publisher = {GitHub},
      journal = {GitHub Repository},
      howpublished = {\url{https://github.com/junruxiong/IncarnaMind}}
    }
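
As a side note on the `[logging]` section of configparser.ini described under "2.4. Others": the sketch below shows one way such a section could be read with Python's standard `configparser` and `logging` modules. It is not taken from IncarnaMind's source code. The function names `load_logging_settings` and `configure_logging` are hypothetical, and the fallback values are assumptions. It also illustrates a detail that is easy to miss: the `format` value contains `%(asctime)s`-style placeholders, which `configparser` would try to interpolate unless interpolation is turned off.

```python
import configparser
import logging

# Sample text mirroring the [logging] section shown in the README.
SAMPLE_INI = """
[logging]
enabled = True
level = INFO
filename = IncarnaMind.log
format = %(asctime)s [%(levelname)s] %(name)s: %(message)s
"""


def load_logging_settings(ini_text: str) -> dict:
    """Parse the [logging] section into a plain dict (hypothetical helper)."""
    # interpolation=None keeps "%(asctime)s" literal. With the default
    # interpolation, configparser would treat it as a reference to another
    # option named "asctime" and raise an error when the value is read.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    section = parser["logging"]
    level_name = section.get("level", fallback="INFO").upper()
    return {
        "enabled": section.getboolean("enabled", fallback=False),
        # Map the level name to its numeric value; assume INFO if unknown.
        "level": getattr(logging, level_name, logging.INFO),
        "filename": section.get("filename", fallback="IncarnaMind.log"),
        "format": section.get("format", fallback="%(message)s"),
    }


def configure_logging(settings: dict) -> None:
    """Apply the parsed settings to the root logger (hypothetical helper)."""
    if not settings["enabled"]:
        # Suppress all log records when logging is switched off.
        logging.disable(logging.CRITICAL)
        return
    logging.basicConfig(
        level=settings["level"],
        filename=settings["filename"],
        format=settings["format"],
    )


if __name__ == "__main__":
    print(load_logging_settings(SAMPLE_INI))
```

To apply the settings, call `configure_logging(load_logging_settings(text))`, where `text` is the content of your own configparser.ini. How IncarnaMind itself loads this file may differ.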