https://github.com/NeumTry/NeumAI

NeumTry / NeumAI Public

Neum AI is a best-in-class framework to manage the creation and
synchronization of vector embeddings at large scale.

neum.ai

License

Apache-2.0 license
169 stars, 9 forks
Latest commit

kevinco26: Fixed typos in readme (#20), bf1c6d1, Nov 21, 2023

Git stats

  * 43 commits

Files

  * neumai-tools | Refactored neumai package for latest langchain and
    openai dependencie... | November 14, 2023 16:31
  * neumai | 0.0.28 (#17) | November 20, 2023 15:27
  * .gitignore | package neumai (#1) | November 13, 2023 22:27
  * CONTRIBUTING.md | Contributions init | November 17, 2023 12:32
  * LICENSE | Create LICENSE | November 2, 2023 10:56
  * README.md | Fixed typos in readme (#20) | November 21, 2023 10:08

README.md

                               Neum AI

         Homepage | Documentation | Blog | Discord | Twitter

Neum AI is a data platform that helps developers leverage their data
to contextualize Large Language Models through Retrieval Augmented
Generation (RAG). This includes extracting data from existing data
sources like document storage and NoSQL databases, processing the
contents into vector embeddings, and ingesting the vector embeddings
into vector databases for similarity search.

It provides a comprehensive solution for RAG that can scale with
your application and reduce the time spent integrating services like
data connectors, embedding models and vector databases.

Features

  * High-throughput distributed architecture to handle billions of
    data points. Allows high degrees of parallelization to optimize
    embedding generation and ingestion.
  * Built-in data connectors to common data sources, embedding
    services and vector stores.
  * Real-time synchronization of data sources to ensure your data
    is always up to date.
  * Customizable data pre-processing in the form of loading,
    chunking and selecting.
  * Cohesive data management to support hybrid retrieval with
    metadata. Neum AI automatically augments and tracks metadata to
    provide a rich retrieval experience.
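
The pre-processing steps named above (selecting and chunking) can be
sketched in plain Python. This is a minimal illustration, not Neum
AI's implementation: the Chunk type, the helper functions and the
to_embed parameter are hypothetical names introduced here, and the
fixed-size overlapping window is just one simple chunking strategy.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """A piece of text plus the metadata carried with it (hypothetical type)."""
    text: str
    metadata: dict = field(default_factory=dict)


def select(record: dict, to_embed: list[str], to_metadata: list[str]) -> tuple[str, dict]:
    """Pick which fields become embedded text and which become metadata."""
    text = " ".join(str(record[key]) for key in to_embed)
    metadata = {key: record[key] for key in to_metadata}
    return text, metadata


def chunk_text(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into windows of `size` characters that overlap by `overlap`."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    pieces = []
    start = 0
    while start < len(text):
        pieces.append(text[start:start + size])
        if start + size >= len(text):
            break  # the current window already reaches the end of the text
        start += size - overlap
    return pieces


def preprocess(record: dict, to_embed: list[str], to_metadata: list[str],
               size: int = 40, overlap: int = 10) -> list[Chunk]:
    """Select fields, chunk the text, and attach metadata to every chunk."""
    text, metadata = select(record, to_embed, to_metadata)
    return [
        Chunk(text=piece, metadata={**metadata, "chunk_index": index})
        for index, piece in enumerate(chunk_text(text, size, overlap))
    ]


# Assumed example record; the URL and field names are placeholders.
record = {"url": "https://example.com/post", "body": "x" * 100}
chunks = preprocess(record, to_embed=["body"], to_metadata=["url"])
```

Copying the selected metadata onto every chunk is what lets a later
search filter or display results by fields such as the source URL.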

Getting Started

Neum AI Cloud

Sign up today at dashboard.neum.ai. See our quickstart to get
started.

The Neum AI Cloud supports a large-scale, distributed architecture to
run millions of documents through vector embedding. For the full set
of features see: Cloud vs Local

Local Development

Install the neumai package:

pip install neumai

To create your first data pipeline, visit our quickstart.

At a high level, a pipeline consists of one or more sources to pull
data from, one embed connector to vectorize the content, and one sink
connector to store the resulting vectors. The following snippet
creates each of these and runs a pipeline:

  from neumai.DataConnectors.WebsiteConnector import WebsiteConnector
  from neumai.Shared.Selector import Selector
  from neumai.Loaders.HTMLLoader import HTMLLoader
  from neumai.Chunkers.RecursiveChunker import RecursiveChunker
  from neumai.Sources.SourceConnector import SourceConnector
  from neumai.EmbedConnectors import OpenAIEmbed
  from neumai.SinkConnectors import WeaviateSink
  from neumai.Pipelines import Pipeline

  # Data connector: pull a web page and keep its URL as metadata.
  website_connector = WebsiteConnector(
      url="https://www.neum.ai/post/retrieval-augmented-generation-at-scale",
      selector=Selector(
          to_metadata=['url']
      )
  )
  # Source: load the page as HTML and split it into chunks.
  source = SourceConnector(
      data_connector=website_connector,
      loader=HTMLLoader(),
      chunker=RecursiveChunker()
  )

  # Embed connector: vectorize each chunk with OpenAI.
  openai_embed = OpenAIEmbed(
      api_key="<OPEN AI KEY>",
  )

  # Sink connector: store the vectors in Weaviate.
  weaviate_sink = WeaviateSink(
      url="your-weaviate-url",
      api_key="your-api-key",
      class_name="your-class-name",
  )

  # Assemble the pipeline from the three parts and run it.
  pipeline = Pipeline(
      sources=[source],
      embed=openai_embed,
      sink=weaviate_sink
  )
  pipeline.run()

  # Query the sink for the stored chunks closest to the question.
  results = pipeline.search(
      query="What are the challenges with scaling RAG?",
      number_of_results=3
  )

  for result in results:
      print(result.metadata)
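
To make the shape of a pipeline (sources, one embed step, one sink)
easier to see without API keys or a vector database, here is a rough
stand-alone sketch. It does not use the neumai package: ToyPipeline,
InMemorySink, embed, cosine and docs_source are hypothetical names,
and the bag-of-words "embedding" is a toy stand-in for a real
embedding model.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemorySink:
    """Hypothetical sink that keeps (vector, text, metadata) rows in a list."""

    def __init__(self):
        self.rows = []

    def store(self, vector, text, metadata):
        self.rows.append((vector, text, metadata))

    def search(self, query_vector, number_of_results):
        ranked = sorted(self.rows,
                        key=lambda row: cosine(query_vector, row[0]),
                        reverse=True)
        return ranked[:number_of_results]


class ToyPipeline:
    """Hypothetical pipeline: many sources, one embed function, one sink."""

    def __init__(self, sources, embed_fn, sink):
        self.sources = sources
        self.embed_fn = embed_fn
        self.sink = sink

    def run(self) -> int:
        """Pull every (text, metadata) pair, embed it, store it; return the count."""
        stored = 0
        for source in self.sources:
            for text, metadata in source():
                self.sink.store(self.embed_fn(text), text, metadata)
                stored += 1
        return stored

    def search(self, query, number_of_results=3):
        return self.sink.search(self.embed_fn(query), number_of_results)


def docs_source():
    """Assumed example source yielding (text, metadata) pairs."""
    yield "scaling retrieval pipelines is hard", {"url": "https://example.com/a"}
    yield "cats sleep most of the day", {"url": "https://example.com/b"}


pipeline = ToyPipeline([docs_source], embed, InMemorySink())
pipeline.run()
for vector, text, metadata in pipeline.search("why is scaling retrieval hard", 1):
    print(metadata["url"])  # prints https://example.com/a
```

Keeping the embed step and the sink behind small interfaces is what
allows either to be swapped for a different model or vector store
without touching the sources.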

Self-host

If you are interested in deploying Neum AI to your own cloud, contact
us at founders@tryneum.com.

We will soon publish an open-source self-hosting option that
leverages the framework's architecture to do high-throughput data
processing.

Roadmap

Connectors

  * [ ] MySQL - Source
  * [ ] GitHub - Source
  * [ ] Google Drive - Source
  * [ ] Hugging Face - Embedding
  * [ ] LanceDB - Sink
  * [ ] Milvus - Sink
  * [ ] Chroma - Sink

Search

  * [ ] Retrieval feedback
  * [ ] Filter support
  * [ ] Unified Neum AI filters
  * [ ] Self-Query Retrieval (w/ Metadata attributes generation)

Extensibility

  * [ ] LangChain / LlamaIndex Document to Neum Document converter
  * [ ] Custom chunking and loading

Experimental

  * [ ] Async metadata augmentation
  * [ ] Chat history connector
  * [ ] Structured (SQL and GraphQL) search connector

Additional tooling for Neum AI can be found here:

  * neumai-tools: contains pre-processing tools for loading and
    chunking data before generating vector embeddings.

Topics

python data database ai ops pipeline etl retrieval embeddings 
data-engineering vectors rag mlops vector-database llm chatgpt llmops

Contributors 2

  * ddematheu
  * kevinco26

Languages

  * Python 100.0%
