ai-sidekick / sidekick (Public)
https://github.com/ai-sidekick/sidekick

Open source ETL framework for retrieval augmented generation (RAG). Sync data from SaaS tools to vector stores, where they can be easily queried by GPT apps.

Website: www.getsidekick.ai/
License: GPL-3.0 license
Stars: 333, Forks: 26, Issues: 1, Pull requests: 0
Branch: main (2 branches, 0 tags), 139 commits
Latest commit: 8d4376e, "Update README.md", Ayan-Bandyopadhyay, Mar 30, 2023

Files

Name              Latest commit message                                         Commit time
client            replace Content1 etc with links                               March 5, 2023 21:58
discord           Merge branch 'main' of https://github.com/getbuff/Buff        March 28, 2023 19:43
sidekick-server   Update README.md                                              March 30, 2023 14:29
.gitignore        Update config for discord bot                                 February 9, 2023 07:02
=1.3.0            Initial MVP that lets you search through your Slack messages  January 11, 2023 22:36
LICENSE           Initial commit                                                January 11, 2023 15:45
README-draft.md   Create README-draft.md                                        March 27, 2023 16:56
README.md         Update README.md                                              March 30, 2023 12:25
logo.png          Add new logo                                                  January 24, 2023 18:54
screenshot.png    add new screenshot                                            January 25, 2023 08:36

README.md

Connect your SaaS tools to a vector database and keep your data synced

Sidekick is a framework for integrating with SaaS tools like Salesforce, GitHub, Notion, and Zendesk, and for syncing data between these tools and a vector store. You can use the integrations and chunkers built by the community to get started quickly, or build new integrations and write custom chunkers for different content types based on Sidekick's DataConnector and DataChunker specs.

Demo

Get an API key to test out a hosted version by joining our Slack community. Post in the #api-keys channel to request a new key. You can test it out on some pre-ingested developer docs by tagging the Sidekick bot in the #sidekick-demo channel.

Features

* Scrape HTML pages and chunk them
* Load Markdown files from a GitHub repo and chunk them
* Connect to Weaviate vector store and load chunks
* FastAPI endpoints to query vector store directly, or perform Q&A with OpenAI models
* Slackbot interface to perform Q&A with OpenAI models

Upcoming

* DataConnector and DataChunker abstractions to make it easier to contribute new connectors/chunkers
* Connect to Pinecone, Milvus, and Qdrant vector stores

Getting Started - 15 min

To run Sidekick locally:

1. Install Python 3.10, if not already installed.
2. Clone the repository:
       git clone https://github.com/ai-sidekick/sidekick.git
3. Navigate to the sidekick-server directory:
       cd /path/to/sidekick/sidekick-server
4. Install poetry:
       pip install poetry
5. Create a new virtual environment with Python 3.10:
       poetry env use python3.10
6. Install poetry-dotenv:
       poetry self add poetry-dotenv
7. Activate the virtual environment:
       poetry shell
8. Install app dependencies:
       poetry install
9. Set the required environment variables in a .env file in sidekick-server:
       DATASTORE=weaviate
       # Can be any string when running locally, e.g. 22c443d6-0653-43de-9490-450cd4a9836f
       BEARER_TOKEN=
       OPENAI_API_KEY=
       # Optional, defaults to http://127.0.0.1
       WEAVIATE_HOST=
       # Optional, defaults to 8080. Should be set to 443 for Weaviate Cloud
       WEAVIATE_PORT=
       # e.g. MarkdownChunk
       WEAVIATE_INDEX=
   Weaviate is currently the only supported data store. You can run Weaviate locally with Docker or set up a sandbox cluster to get a Weaviate host address.
10. Create a file app_config.py in the sidekick-server directory. It should contain an object app_config that maps each bearer token to a product_id:
       app_config = {
           "22c443d6-0653-43de-9490-450cd4a9836f": {
               "product_id": "salesforce"
           }
       }
    The product_id should be a unique identifier for the source of your data.
11. Run the API locally:
       poetry run start
12. Access the API documentation at http://0.0.0.0:8000/docs and test the API endpoints (make sure to add your bearer token).

For support and questions, join our Slack community.

API Endpoints

The server is based on FastAPI, so you can view the interactive API documentation at /docs when you are running it locally. These are the available API endpoints:

* /upsert-web-data: Takes a URL as input, uses Playwright to crawl the webpage (and any linked webpages), and loads the pages into the vector store.
* /query: Queries the vector database with a string. You can filter by source type (web, markdown, etc.) and set the maximum number of chunks returned.
* /ask-llm: Gets an answer to a question from an LLM, based on the data in the vector store. The response includes the sources used in the answer, the user's intent, and whether the question is answerable from the content in your vector store.

Contributing

Sidekick is open for contribution! To add a new data connector, follow these steps:

1. Create a new folder under connectors named <source>-connector, where <source> is the name of the source you are connecting to.
2. Add a file load.py to this folder with a function load_data that returns List[DocumentChunk].
3. Create a new endpoint in /server/main.py that calls load_data.
4. Add the new source type in models/models.py.

Acknowledgments

* The boilerplate for this project is based on the ChatGPT Retrieval Plugin
* The licensing for this project is inspired by Airbyte's licensing model
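
As an illustration of the /query endpoint listed under API Endpoints, here is a minimal sketch of how a client might build an authenticated request. The request body is an assumption: the README does not document it, so this sketch borrows the "queries" / "query" / "top_k" shape from the ChatGPT Retrieval Plugin that the project credits for its boilerplate. The base URL, the helper names build_query_request and send, and the 30 second timeout are also hypothetical. Check the interactive documentation at /docs for the actual schema before relying on it.

```python
import json
import urllib.request

# ASSUMPTION: the local server started with "poetry run start" listens on port 8000.
BASE_URL = "http://127.0.0.1:8000"


def build_query_request(bearer_token: str, query: str, top_k: int = 3) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /query endpoint."""
    # ASSUMPTION: payload shape borrowed from the ChatGPT Retrieval Plugin;
    # Sidekick's real schema may differ, see /docs on a running server.
    payload = {"queries": [{"query": query, "top_k": top_k}]}
    return urllib.request.Request(
        url=f"{BASE_URL}/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # The token must be one of the keys defined in app_config.py.
            "Authorization": f"Bearer {bearer_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request to a running server and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With the server running, send(build_query_request(token, "How do I install the server?")) would perform the call. Building the request separately from sending it keeps the header and payload easy to inspect without a live server.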

Topics: chatbot, openai, conversational-ai, weaviate, etl-pipelines, chatgpt, langchain
Watchers: 8 watching
Releases: No releases published
Packages: No packages published
Contributors (3): Jason Fan (@jasonwcfan), Ayan Bandyopadhyay (@Ayan-Bandyopadhyay), Teddy Ni (@Teddarific)
Languages: Python 86.9%, TypeScript 11.3%, HTML 1.2%, Other 0.6%
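
The Contributing steps ask for a load.py file with a load_data function that returns List[DocumentChunk]. The following is a minimal sketch of what such a file could look like, not code from the repository. The DocumentChunk dataclass is a stand-in for the real model in models/models.py, whose fields are not shown in this README. The field names, the "example" source type, the load_data parameters, and the paragraph-packing strategy are all assumptions made for illustration.

```python
# Hypothetical connectors/example-connector/load.py
from dataclasses import dataclass
from typing import List


@dataclass
class DocumentChunk:
    """Stand-in for the real DocumentChunk in models/models.py (fields are assumed)."""
    text: str
    source_id: str
    source_type: str


def load_data(raw_text: str, source_id: str, max_chars: int = 500) -> List[DocumentChunk]:
    """Split text on blank lines and pack paragraphs into chunks of about max_chars.

    A single paragraph longer than max_chars is kept whole rather than cut.
    """
    chunks: List[DocumentChunk] = []
    buffer = ""
    for paragraph in (p.strip() for p in raw_text.split("\n\n")):
        if not paragraph:
            continue
        candidate = f"{buffer}\n\n{paragraph}" if buffer else paragraph
        if buffer and len(candidate) > max_chars:
            # Adding this paragraph would overflow: emit the buffer, start a new one.
            chunks.append(DocumentChunk(buffer, source_id, "example"))
            buffer = paragraph
        else:
            buffer = candidate
    if buffer:
        chunks.append(DocumentChunk(buffer, source_id, "example"))
    return chunks
```

A real connector would fetch raw_text from the source's API instead of taking it as an argument, and the new endpoint in /server/main.py would pass the returned chunks on to the vector store.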