[HN Gopher] Korvus: Single-Query RAG with Postgres
       ___________________________________________________________________
        
       Korvus: Single-Query RAG with Postgres
        
       Author : levkk
       Score  : 112 points
       Date   : 2024-07-11 16:35 UTC (6 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | levkk wrote:
       | Hey fellow open-source enthusiasts,
       | 
       | We built Korvus, an open-source RAG (Retrieval-Augmented
       | Generation) pipeline that consolidates the entire RAG workflow -
       | from embedding generation to text generation - into a single SQL
       | query, significantly reducing architectural complexity and
       | latency.
       | 
       | Here are some of the highlights:
       | 
       | - Full RAG pipeline (embedding generation, vector search,
       | reranking, and text generation) in one SQL query
       | 
       | - SDKs for Python, JavaScript, and Rust (more languages planned)
       | 
       | - Built on PostgreSQL, leveraging pgvector and pgml
       | 
       | - Open-source, with support for open models
       | 
       | - Designed for high performance and scalability
       | 
       | Korvus utilizes Postgres' advanced features to perform complex
       | RAG operations natively within the database. We're also the
       | developers of PostgresML, so we're big advocates of in-database
       | machine learning. This approach eliminates the need for external
       | services and API calls, potentially reducing latency by orders of
       | magnitude compared to traditional microservice architectures.
       | It's how our founding team built and scaled the ML platform at
       | Instacart.
       | 
       | We're eager to get feedback from the community and welcome
       | contributions. Check out our GitHub repo for more details, and
       | feel free to hit us up in our Discord!
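       | 
       | Here's roughly what a single rag() call looks like with the
       | Python SDK (a simplified sketch; the models and some parameter
       | names are illustrative, see the docs for exact usage):
       | 
       |     from korvus import Collection, Pipeline
       |     import asyncio
       | 
       |     collection = Collection("korvus_demo")
       |     pipeline = Pipeline("v1", {
       |         "text": {
       |             "splitter": {"model": "recursive_character"},
       |             "semantic_search": {"model": "intfloat/e5-small-v2"},
       |         },
       |     })
       | 
       |     # one query: embed, search, aggregate context, and generate
       |     rag_query = {
       |         "CONTEXT": {
       |             "vector_search": {
       |                 "query": {"fields": {"text": {"query": "What is Korvus?"}}},
       |                 "document": {"keys": ["id"]},
       |                 "limit": 1,
       |             },
       |             "aggregate": {"join": "\n"},
       |         },
       |         "chat": {
       |             "model": "meta-llama/Meta-Llama-3-8B-Instruct",
       |             "messages": [{
       |                 "role": "user",
       |                 "content": "Given the context:\n{CONTEXT}\nAnswer: What is Korvus?",
       |             }],
       |             "max_tokens": 100,
       |         },
       |     }
       | 
       |     async def main():
       |         await collection.add_pipeline(pipeline)
       |         await collection.upsert_documents([
       |             {"id": "1", "text": "Korvus does RAG in one SQL query."},
       |         ])
       |         print(await collection.rag(rag_query, pipeline))
       | 
       |     asyncio.run(main())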
        
         | kaspermarstal wrote:
         | Very cool! I assume you use Postgres' native full-text search
         | capabilities? Any plans for BM25 or similar? This would make
         | Korvus the end-game for open-source RAG IMO.
        
         | mdaniel wrote:
         | I find it misleading to use an f-string containing encoded
         | `{CONTEXT}` <https://github.com/postgresml/korvus/blob/bce269a2
         | 0a1dbea933...>, and after digging into TFM
         | <https://postgresml.org/docs/open-
         | source/korvus/guides/rag#si...> it seems it is not, in fact, an
         | f-string artifact but rather the literal characters
         | "{"+"CONTEXT"+"}" and are the same in all the language
         | bindings?
         | 
         | IMHO it would be much clearer if you just used the normal %s
         | for the "outer" string and left the _implicit_ f-string syntax
         | as it is, e.g.
         | 
         |     {
         |         "role": "user",
         |         # this is not an f-string, is rather replaced by TODO FIXME
         |         "content": "Given the context\n:{CONTEXT}\nAnswer the question: %s" % query,
         |     },
         | 
         | The way the example (in both the readme and the docs) is
         | written, it seems to imply I can put my own fields as siblings
         | to the chat key and they, too, will be resolved:
         | 
         |     results = await collection.rag(
         |         {
         |             "EXAMPLE": {
         |                 "uh-huh": True
         |             },
         |             "CONTEXT": {
         |                 "vector_search": {
         |                     "query": {
         |                         "fields": {"text": {"query": query}},
         |                     },
         |                     "document": {"keys": ["id"]},
         |                     "limit": 1,
         |                 },
         |                 "aggregate": {"join": "\n"},
         |             },
         |             "chat": {
         |                 "messages": [{"content": "Given Context:\n{CONTEXT}\nAn Example:\n{EXAMPLE}"}],
         |             },
         |         }
         |     )
         | 
         | One could not fault the user for thinking such a thing since
         | the *API* docs say "see the *GUIDE*" :-(
         | https://postgresml.org/docs/open-source/korvus/api/collectio...
        
           | smarvin2 wrote:
           | This section of the docs may be confusing. What you described
           | will actually almost work. See:
           | https://postgresml.org/docs/open-
           | source/korvus/guides/rag#ra...
        
       | hahahacorn wrote:
       | Very cool. I see more languages planned in your comment. Are you
       | looking for community help developing SDKs in other languages?
       | After spending an entire Saturday running a RAG pipeline for a
       | POC for a "fun" side project, I definitely would've loved to have
       | been able to use this instead.
       | 
       | I spent too long reading Python docs because I haven't touched
       | the language since 2019. Happy to help develop a Ruby SDK!
        
         | pqdbr wrote:
         | +1 for a Ruby SDK!
        
           | smarvin2 wrote:
           | We would love help developing a Ruby SDK! We programmatically
           | generate our Python, JavaScript, and C bindings from our Rust
           | library. Check out the rust-bridge folder for more info on
           | how we do that.
        
       | naveen_k wrote:
       | This looks exciting! Will definitely be testing it out in the
       | coming days.
       | 
       | I see you offer re-ranking using local models; will there be
       | built-in support for making re-ranking calls to external services
       | such as Cohere in the future?
        
         | smarvin2 wrote:
         | Great question! Making calls to external services is not
         | something we plan to support. The point of Korvus is to write
         | SQL queries that take advantage of the pgml and pgvector
         | extensions. Making calls to external services is something that
         | could be done by users after retrieval.
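         | 
         | As a rough sketch of that pattern (the result field names here
         | are assumptions, check the vector_search docs for the actual
         | shape), reranking with Cohere after retrieval could look like:
         | 
         |     import cohere
         | 
         |     async def rerank(collection, pipeline, query):
         |         # retrieve candidates with Korvus, then rerank externally
         |         hits = await collection.vector_search(
         |             {"query": {"fields": {"text": {"query": query}}}, "limit": 10},
         |             pipeline,
         |         )
         |         chunks = [hit["chunk"] for hit in hits]  # field name assumed
         |         co = cohere.Client("YOUR_API_KEY")
         |         reranked = co.rerank(
         |             model="rerank-english-v3.0",
         |             query=query,
         |             documents=chunks,
         |             top_n=3,
         |         )
         |         return [chunks[r.index] for r in reranked.results]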
        
       | lecha wrote:
       | Interesting! Is there a way to deploy this on AWS RDS?
        
         | klysm wrote:
         | I'd imagine it just comes down to whether or not the
         | extensions are allowed
        
           | smarvin2 wrote:
           | Unfortunately, the pgml extension does not work on AWS RDS,
           | so there is not.
        
       | jiocrag wrote:
       | Is there any way to deploy this to an existing Postgres
       | database, or does it need to use the Docker instance?
        
         | smarvin2 wrote:
         | You can totally use an existing postgres database. Just make
         | sure to install the pgvector and pgml postgres extensions and
         | it will work!
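         | 
         | For example, with psycopg (the connection string is a
         | placeholder, and the extension binaries must already be
         | installed on the server):
         | 
         |     import psycopg
         | 
         |     with psycopg.connect("postgres://user:pass@host:5432/db") as conn:
         |         # enable the extensions Korvus relies on
         |         conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
         |         conn.execute("CREATE EXTENSION IF NOT EXISTS pgml")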
        
       | stavros wrote:
       | This looks great, thanks! After being disappointed by how flaky
       | gpt-4-turbo's RAG is, I want to set up my own, so this came at
       | the right time.
       | 
       | One question: Can I use an external model (i.e. get the raw RAG
       | snippets, or prompt text)? Or does it have to be the one
       | specified in Korvus?
        
         | smarvin2 wrote:
         | You can use Korvus for search and feed the results to an
         | external model.
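         | 
         | A rough sketch of that flow (the vector_search result shape is
         | an assumption here; adapt it to the actual return value):
         | 
         |     from openai import OpenAI
         | 
         |     async def answer(collection, pipeline, question):
         |         # get raw RAG snippets from Korvus...
         |         hits = await collection.vector_search(
         |             {"query": {"fields": {"text": {"query": question}}}, "limit": 3},
         |             pipeline,
         |         )
         |         context = "\n".join(hit["chunk"] for hit in hits)
         |         # ...then prompt any external model with them
         |         client = OpenAI()
         |         resp = client.chat.completions.create(
         |             model="gpt-4o",
         |             messages=[{
         |                 "role": "user",
         |                 "content": f"Context:\n{context}\n\nQuestion: {question}",
         |             }],
         |         )
         |         return resp.choices[0].message.content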
        
       | unixhero wrote:
       | Is RAG the new DAG? _outoftheloop_
        
         | helsinki wrote:
         | Zero mention of what a RAG is in the README. No clue here.
        
           | jzig wrote:
           | From the README:
           | 
           | > Korvus is an all-in-one, open-source RAG (Retrieval-
           | Augmented Generation) pipeline...
        
           | brylie wrote:
           | Retrieval Augmented Generation uses text that is stored in a
           | database to augment user prompts that are sent to a
           | generative AI, like a large language model. The retrieval
           | results are based on their similarities to the user input.
           | The goal is to improve the output of the generative AI by
           | providing more information in the input (user prompt +
           | retrieval results). For example, we can provide the LLM
           | details from an internal knowledge base so it can generate
           | responses that are specific to an organization rather than
           | based on general information. It may also reduce errors and
           | improve the relevancy of the model output, depending on the
           | context.
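           | 
           | A toy illustration of the flow (deliberately simplified;
           | real systems use learned embeddings rather than word
           | overlap):
           | 
           |     def retrieve(query, docs, top_k=2):
           |         # score documents by word overlap with the query
           |         q = set(query.lower().split())
           |         return sorted(docs, key=lambda d: -len(q & set(d.lower().split())))[:top_k]
           | 
           |     docs = [
           |         "Our refund window is 30 days.",
           |         "Support is available 9-5 EST.",
           |         "The cafeteria serves lunch at noon.",
           |     ]
           |     question = "What is the refund window?"
           |     context = "\n".join(retrieve(question, docs))
           |     prompt = f"Context:\n{context}\n\nQuestion: {question}"
           |     # `prompt` is what gets sent to the generative model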
        
       | simonw wrote:
       | Does this work by running an LLM such as Llama directly on the
       | database server? If so, does that mean that your database and the
       | LLM are competing for the same CPU and memory resources?
       | 
       | Can it run the LLM on a GPU?
        
         | smarvin2 wrote:
         | It does work by running the LLM on the database server, but
         | you can configure the LLM to run on the GPU.
        
       | nkmnz wrote:
       | This sounds very promising, but let me ask an honest question: to
       | me, it seems like databases are the hardest part to scale in your
       | average IT infrastructure. How much work does it add to the
       | database if you let it make all the ML related work as well? How
       | much work is saved by reducing the number of necessary queries?
        
         | CuriouslyC wrote:
         | This is a read workload that can be easily horizontally scaled.
         | The reduction in dev and infrastructure complexity is well
         | worth the slight increase in DB provisioning.
        
       | thawkins wrote:
       | What LLM system does it use to run models? Does it support
       | Ollama?
        
       ___________________________________________________________________
       (page generated 2024-07-11 23:00 UTC)