https://github.com/NeumTry/NeumAI

NeumTry / NeumAI (Public)

Neum AI is a best-in-class framework to manage the creation and synchronization of vector embeddings at large scale.

neum.ai | License: Apache-2.0 | 169 stars | 9 forks | 43 commits | Latest commit: "Fixed typos in readme (#20)" by kevinco26 (bf1c6d1, Nov 21, 2023)
# Neum AI

Homepage | Documentation | Blog | Discord | Twitter

Neum AI is a data platform that helps developers leverage their data to contextualize Large Language Models through Retrieval Augmented Generation (RAG). This includes extracting data from existing data sources such as document storage and NoSQL databases, processing the contents into vector embeddings, and ingesting the vector embeddings into vector databases for similarity search.

It provides a comprehensive solution for RAG that scales with your application and reduces the time spent integrating services like data connectors, embedding models and vector databases.

## Features

* High-throughput distributed architecture to handle billions of data points. Allows high degrees of parallelization to optimize embedding generation and ingestion.
* Built-in data connectors to common data sources, embedding services and vector stores.
* Real-time synchronization of data sources to ensure your data is always up to date.
* Customizable data pre-processing in the form of loading, chunking and selecting.
* Cohesive data management to support hybrid retrieval with metadata. Neum AI automatically augments and tracks metadata to provide a rich retrieval experience.

## Getting Started

### Neum AI Cloud

Sign up today at dashboard.neum.ai. See our quickstart to get started.

The Neum AI Cloud supports a large-scale, distributed architecture to run millions of documents through vector embedding.
For the full set of features, see: Cloud vs Local

### Local Development

Install the `neumai` package:

```bash
pip install neumai
```

To create your first data pipeline, visit our quickstart.

At a high level, a pipeline consists of one or more sources to pull data from, one embed connector to vectorize the content, and one sink connector to store the resulting vectors. The following snippet builds each of these pieces and runs a pipeline:

```python
from neumai.DataConnectors.WebsiteConnector import WebsiteConnector
from neumai.Shared.Selector import Selector
from neumai.Loaders.HTMLLoader import HTMLLoader
from neumai.Chunkers.RecursiveChunker import RecursiveChunker
from neumai.Sources.SourceConnector import SourceConnector
from neumai.EmbedConnectors import OpenAIEmbed
from neumai.SinkConnectors import WeaviateSink
from neumai.Pipelines import Pipeline

# Data connector: fetch a web page and keep its URL as metadata.
website_connector = WebsiteConnector(
    url = "https://www.neum.ai/post/retrieval-augmented-generation-at-scale",
    selector = Selector(
        to_metadata=['url']
    )
)

# Source: the connector plus how to load and chunk its content.
source = SourceConnector(
    data_connector = website_connector,
    loader = HTMLLoader(),
    chunker = RecursiveChunker()
)

# Embed connector: fill in your OpenAI API key.
openai_embed = OpenAIEmbed(
    api_key = "",
)

# Sink connector: replace the placeholders with your Weaviate details.
weaviate_sink = WeaviateSink(
    url = "your-weaviate-url",
    api_key = "your-api-key",
    class_name = "your-class-name",
)

pipeline = Pipeline(
    sources=[source],
    embed=openai_embed,
    sink=weaviate_sink
)
pipeline.run()

# Query the sink for the chunks most similar to the question.
results = pipeline.search(
    query="What are the challenges with scaling RAG?",
    number_of_results=3
)

for result in results:
    print(result.metadata)
```

### Self-host

If you are interested in deploying Neum AI to your own cloud, contact us at founders@tryneum.com.

We will soon publish an open-source self-hosted version that leverages the framework's architecture for high-throughput data processing.
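The pipeline shape described under Local Development (one or more sources, one embed step, one sink) can be illustrated without any external services. The sketch below is not the `neumai` implementation: `toy_source`, `toy_embed`, `InMemorySink` and `run_pipeline` are hypothetical names invented for this example, and the "embedding" is a plain bag-of-words count rather than a real model. It only shows how chunks and their metadata might flow from source to sink and back out through a similarity search.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)


def toy_source(documents):
    """Hypothetical source: yields one Chunk per sentence, tagged with its URL."""
    for url, text in documents.items():
        for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
            if sentence:
                yield Chunk(text=sentence, metadata={"url": url})


def toy_embed(text):
    """Hypothetical embed step: bag-of-words counts, standing in for a real model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class InMemorySink:
    """Hypothetical sink: keeps (vector, chunk) pairs in a list."""

    def __init__(self):
        self.rows = []

    def store(self, vector, chunk):
        self.rows.append((vector, chunk))

    def search(self, query_vector, number_of_results):
        ranked = sorted(self.rows, key=lambda row: cosine(query_vector, row[0]), reverse=True)
        return [chunk for _, chunk in ranked[:number_of_results]]


def run_pipeline(sources, embed, sink):
    """Pull chunks from every source, embed them, store them; return the count stored."""
    stored = 0
    for source in sources:
        for chunk in source:
            sink.store(embed(chunk.text), chunk)
            stored += 1
    return stored


docs = {
    "https://example.com/a": "Scaling RAG is hard. Embedding costs grow with data volume.",
    "https://example.com/b": "Vector databases store embeddings for similarity search.",
}
sink = InMemorySink()
stored = run_pipeline([toy_source(docs)], toy_embed, sink)
results = sink.search(toy_embed("challenges with scaling RAG"), number_of_results=1)
for result in results:
    print(result.text, result.metadata)
```

Keeping the metadata attached to each chunk all the way into the sink is what lets a search result point back to its origin, which is the role `to_metadata=['url']` appears to play in the snippet above.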
## Roadmap

Connectors

* [ ] MySQL - Source
* [ ] GitHub - Source
* [ ] Google Drive - Source
* [ ] Hugging Face - Embedding
* [ ] LanceDB - Sink
* [ ] Milvus - Sink
* [ ] Chroma - Sink

Search

* [ ] Retrieval feedback
* [ ] Filter support
* [ ] Unified Neum AI filters
* [ ] Self-Query Retrieval (w/ Metadata attributes generation)

Extensibility

* [ ] Langchain / Llama Index Document to Neum Document converter
* [ ] Custom chunking and loading

Experimental

* [ ] Async metadata augmentation
* [ ] Chat history connector
* [ ] Structured (SQL and GraphQL) search connector

Additional tooling for Neum AI can be found here:

* neumai-tools: contains pre-processing tools for loading and chunking data before generating vector embeddings.

Contributors: @ddematheu, @kevinco26
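Chunking comes up several times in this README (the Features list, `RecursiveChunker`, and neumai-tools). As a rough illustration of the general idea behind recursive chunking, here is a minimal sketch. It is not the code used by `neumai` or neumai-tools: the function name `recursive_chunk`, the separator order and the `max_chars` default are all assumptions made for this example. It splits on the coarsest separator first and only falls back to finer ones for pieces that are still too long.

```python
def recursive_chunk(text, max_chars=80, separators=("\n\n", "\n", ". ", " ")):
    """Split text into chunks of at most max_chars characters.

    Tries the coarsest separator first and recurses with finer ones only
    for pieces that are still too long. Separators at chunk boundaries
    are dropped from the output.
    """
    if len(text) <= max_chars:
        return [text] if text.strip() else []
    if not separators:
        # No separator left: fall back to a hard cut every max_chars characters.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

    head, *rest = separators
    chunks, current = [], ""
    for piece in text.split(head):
        # Greedily pack pieces into the current chunk while they fit.
        candidate = piece if not current else current + head + piece
        if len(candidate) <= max_chars:
            current = candidate
            continue
        if current.strip():
            chunks.append(current)
        if len(piece) <= max_chars:
            current = piece
        else:
            # The piece alone is too long: retry it with finer separators.
            chunks.extend(recursive_chunk(piece, max_chars, tuple(rest)))
            current = ""
    if current.strip():
        chunks.append(current)
    return chunks


sample = (
    "First paragraph about loading.\n\n"
    "Second paragraph. It has two sentences that together run past the limit."
)
chunks = recursive_chunk(sample, max_chars=40)
for chunk in chunks:
    print(len(chunk), repr(chunk))
```

Splitting on paragraph and sentence boundaries before falling back to words tends to keep related text in the same chunk, which generally helps retrieval quality compared with fixed-width cuts.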